scispacy

[ScispaCy](https://allenai.github.io/scispacy/) is a Python package containing spaCy models for processing biomedical, scientific, or clinical text. It includes [[named entity recognition]] models trained on the following corpora:

- [CRAFT corpus](https://github.com/UCDenver-ccp/CRAFT): a collection of 97 articles from the PubMed Central Open Access subset, each annotated along several axes spanning structural, coreference, and concept annotation.
- [JNLPBA corpus](https://huggingface.co/datasets/jnlpba/jnlpba): derived from the GENIA version 3.02 corpus (Kim et al., 2003), which was built from a controlled MEDLINE search using the MeSH terms "human", "blood cells", and "transcription factors". From the results, 2,000 abstracts were selected and hand-annotated according to a small taxonomy of 48 classes based on a chemical classification; 36 terminal classes were used to annotate the GENIA corpus.
- [BC5CDR corpus](https://huggingface.co/datasets/bigbio/bc5cdr): the BioCreative V Chemical Disease Relation (CDR) dataset, a large corpus of human annotations of all chemicals, diseases, and their interactions in 1,500 PubMed articles.
- BIONLP13CG corpus
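
Each of these NER models loads like any other spaCy pipeline. The following is a minimal sketch, not an official recipe: the package names in `NER_MODELS` are the ones published by the scispaCy project, while the `extract_entities` helper and its default corpus are hypothetical names chosen for illustration. It assumes `spacy` and the chosen model package have already been installed.

```python
# Corpus name -> scispaCy NER model package trained on that corpus.
# Each package has to be installed separately before it can be loaded.
NER_MODELS = {
    "CRAFT": "en_ner_craft_md",
    "JNLPBA": "en_ner_jnlpba_md",
    "BC5CDR": "en_ner_bc5cdr_md",
    "BIONLP13CG": "en_ner_bionlp13cg_md",
}


def extract_entities(text, corpus="BC5CDR"):
    """Return (entity text, label) pairs found by the model for `corpus`.

    Hypothetical helper for illustration; raises KeyError for an unknown
    corpus and OSError if the model package is not installed.
    """
    model_name = NER_MODELS[corpus]

    # Imported lazily so the mapping above is usable without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

Calling `extract_entities("Aspirin can cause gastric bleeding.")` would run the BC5CDR model, which labels chemical and disease mentions. The label set differs per model, so the corpus should be chosen according to the entity types needed.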