PubTator3 API
This document describes the PubTator3 API used by BioMCP for searching biomedical literature and retrieving article details with annotations. Understanding this API provides context for how BioMCP's article commands function.
Overview
PubTator3 is a web-based tool that annotates biomedical entities in PubMed abstracts and PMC full-text articles. BioMCP uses the PubTator3 API to search for articles and to retrieve them together with their annotated entities (genes, variants, diseases, chemicals, etc.).
CLI Documentation: For information on using these APIs through the BioMCP command line interface, see the Articles CLI Documentation.
API Workflow
The PubTator3 integration follows a three-step workflow:
- Entity Autocomplete: Get standardized entity identifiers
- Search: Find articles using entity identifiers and keywords
- Fetch: Retrieve full article details by PMID
API Endpoints
Entity Autocomplete API
Endpoint:
https://www.ncbi.nlm.nih.gov/research/pubtator3-api/entity/autocomplete/
This endpoint helps normalize entity names to their standard identifiers, improving search precision.
Parameters
Parameter | Description | Example |
---|---|---|
query | Text to autocomplete | BRAF |
concept | Entity type | GENE, CHEMICAL, DISEASE, etc. |
limit | Number of results to return | 2 |
Example Request and Response
curl "https://www.ncbi.nlm.nih.gov/research/pubtator3-api/entity/autocomplete/?query=BRAF&concept=GENE&limit=2"
Response:
[
{
"_id": "@GENE_BRAF",
"biotype": "gene",
"name": "BRAF",
"description": "All Species",
"match": "Matched on name <m>BRAF</m>"
},
{
"_id": "@GENE_BRAFP1",
"biotype": "gene",
"name": "BRAFP1",
"description": "All Species",
"match": "Matched on name <m>BRAFP1</m>"
}
]
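The request above can be reproduced from Python. The following is a minimal sketch using only the standard library; the helper names (`build_autocomplete_url`, `first_entity_id`, `autocomplete`) are hypothetical and are not part of BioMCP or PubTator3. It assumes the response is a JSON list shaped like the example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AUTOCOMPLETE_URL = (
    "https://www.ncbi.nlm.nih.gov/research/pubtator3-api/entity/autocomplete/"
)


def build_autocomplete_url(query, concept=None, limit=None):
    """Build the autocomplete URL; optional parameters are omitted when None."""
    params = {"query": query}
    if concept is not None:
        params["concept"] = concept
    if limit is not None:
        params["limit"] = limit
    return AUTOCOMPLETE_URL + "?" + urlencode(params)


def first_entity_id(candidates):
    """Return the `_id` of the first candidate, or None for an empty list."""
    return candidates[0]["_id"] if candidates else None


def autocomplete(query, concept=None, limit=None, timeout=10):
    """Send the request and decode the JSON body (requires network access)."""
    url = build_autocomplete_url(query, concept, limit)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `first_entity_id(autocomplete("BRAF", "GENE", 2))` would then yield an identifier that can be passed to the search endpoint.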
Entity Search API
Endpoint: https://www.ncbi.nlm.nih.gov/research/pubtator3-api/search/
This endpoint allows searching for PMIDs (PubMed IDs) based on entity identifiers and keywords.
Parameters
Parameter | Description | Example |
---|---|---|
text | Entity identifier or text query | @CHEMICAL_remdesivir |
Example Request and Response
curl "https://www.ncbi.nlm.nih.gov/research/pubtator3-api/search/?text=@CHEMICAL_remdesivir"
Response (truncated):
{
"results": [
{
"_id": "37711410",
"pmid": 37711410,
"title": "Remdesivir.",
"journal": "Hosp Pharm",
"authors": ["Levien TL", "Baker DE"],
"date": "2023-10-01T00:00:00Z",
"doi": "10.1177/0018578721999804",
"meta_date_publication": "2023 Oct",
"meta_volume": "58"
}
// More results...
]
}
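To feed search results into the fetch step, only the PMIDs are needed. The sketch below builds the search URL and pulls the `pmid` field out of a response shaped like the one above. The function names are hypothetical. Note that `urlencode` percent-encodes the leading `@` as `%40`, which is the standard encoding for that character in a query string.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.ncbi.nlm.nih.gov/research/pubtator3-api/search/"


def build_search_url(text):
    """Build the search URL for an entity identifier or free-text query."""
    return SEARCH_URL + "?" + urlencode({"text": text})


def extract_pmids(search_response):
    """Collect the `pmid` of every result; entries without one are skipped."""
    return [
        hit["pmid"]
        for hit in search_response.get("results", [])
        if "pmid" in hit
    ]


# Usage with a response trimmed from the example above.
sample = {"results": [{"_id": "37711410", "pmid": 37711410, "title": "Remdesivir."}]}
pmids = extract_pmids(sample)
```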
Article Fetch API
Endpoint:
https://www.ncbi.nlm.nih.gov/research/pubtator3-api/publications/export/biocjson
This endpoint retrieves detailed information about specific articles, including annotations.
Parameters
Parameter | Description | Example |
---|---|---|
pmids | List of PubMed IDs to retrieve | 29355051 |
full | Whether to include full text (when available) | true |
Example Request
curl "https://www.ncbi.nlm.nih.gov/research/pubtator3-api/publications/export/biocjson?pmids=29355051&full=true"
Response format (truncated):
{
"PubTator3": [
{
"_id": "29355051|PMC6142073",
"id": "6142073",
"infons": {},
"passages": [
{
"infons": {
"name_3": "surname:Hu;given-names:Minghua",
"name_2": "surname:Luo;given-names:Xia",
"name_1": "surname:Luo;given-names:Shuang",
"article-id_pmid": "29355051"
// More metadata...
}
}
// More passages...
]
}
]
}
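Author names in this format are packed into `name_N` keys as `surname:...;given-names:...` strings. The following sketch shows one way to unpack them. It assumes every author entry follows the pattern visible in the truncated example, and the function names are hypothetical.

```python
def parse_author(value):
    """Turn 'surname:Luo;given-names:Shuang' into 'Shuang Luo'."""
    fields = dict(
        part.split(":", 1) for part in value.split(";") if ":" in part
    )
    full_name = f"{fields.get('given-names', '')} {fields.get('surname', '')}"
    return full_name.strip()


def list_authors(document):
    """Return author names from all passages, ordered by their name_N index."""
    authors = []
    for passage in document.get("passages", []):
        infons = passage.get("infons", {})
        keys = [k for k in infons if k.startswith("name_") and k[5:].isdigit()]
        for key in sorted(keys, key=lambda k: int(k[5:])):
            authors.append(parse_author(infons[key]))
    return authors


# Usage with the passage metadata shown in the example response.
document = {
    "passages": [
        {
            "infons": {
                "name_3": "surname:Hu;given-names:Minghua",
                "name_2": "surname:Luo;given-names:Xia",
                "name_1": "surname:Luo;given-names:Shuang",
                "article-id_pmid": "29355051",
            }
        }
    ]
}
authors = list_authors(document)
```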
Entity Types
PubTator3 annotates several types of biomedical entities:
- Genes/Proteins: Gene or protein names (e.g., BRAF, TP53)
- Genetic Variants: Genetic variations (e.g., BRAF V600E)
- Diseases: Disease names and conditions (e.g., Melanoma)
- Chemicals/Drugs: Chemical substances or drugs (e.g., Vemurafenib)
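In the identifiers shown in the examples on this page (`@GENE_BRAF`, `@CHEMICAL_remdesivir`), the entity type appears as an upper-case prefix. The sketch below splits an identifier on that basis. The `@TYPE_name` layout is inferred from those examples only and is an assumption, not a documented guarantee. `split_entity_id` is a hypothetical helper.

```python
def split_entity_id(entity_id):
    """Split '@GENE_BRAF' into ('GENE', 'BRAF').

    Assumes the '@TYPE_name' layout seen in the examples; anything else
    raises ValueError rather than being guessed at.
    """
    if not entity_id.startswith("@") or "_" not in entity_id:
        raise ValueError(f"unexpected entity identifier: {entity_id!r}")
    concept, _, name = entity_id[1:].partition("_")
    return concept, name
```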
Integration Strategy for BioMCP
The recommended workflow for integrating with PubTator3 in BioMCP is:
- Entity Normalization: Use the autocomplete API to convert user-provided entity names to standardized identifiers
- Literature Search: Use the search API with these identifiers to find relevant PMIDs
- Data Retrieval: Fetch detailed article data with annotations using the fetch API
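Normalizing names first means the search runs against standardized identifiers rather than free-text synonyms, which keeps entity handling consistent and improves search precision.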
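The three steps can be chained as in the sketch below. This is an illustration under stated assumptions, not BioMCP's actual implementation: `find_annotated_articles` and the injected `get_json` callable (any function that takes a URL and returns decoded JSON) are hypothetical, and joining several PMIDs with commas is an assumption about how the `pmids` list is encoded.

```python
from urllib.parse import urlencode

BASE = "https://www.ncbi.nlm.nih.gov/research/pubtator3-api"


def find_annotated_articles(name, concept, get_json, max_articles=5):
    """Normalize a name, search for PMIDs, then fetch annotated articles."""
    # Step 1: entity normalization via autocomplete.
    query = urlencode({"query": name, "concept": concept, "limit": 1})
    candidates = get_json(f"{BASE}/entity/autocomplete/?{query}")
    if not candidates:
        return []
    entity_id = candidates[0]["_id"]

    # Step 2: literature search using the standardized identifier.
    hits = get_json(f"{BASE}/search/?" + urlencode({"text": entity_id}))
    pmids = [str(hit["pmid"]) for hit in hits.get("results", [])][:max_articles]
    if not pmids:
        return []

    # Step 3: fetch all selected PMIDs in one request
    # (comma-separated list is an assumption).
    query = urlencode({"pmids": ",".join(pmids)})
    export = get_json(f"{BASE}/publications/export/biocjson?{query}")
    return export.get("PubTator3", [])
```

Passing `get_json` in as an argument keeps the workflow logic separate from HTTP concerns such as throttling, caching, and retries.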
Authentication
The PubTator3 API is public and does not require authentication for basic usage. However, there are rate limits in place to prevent abuse.
Rate Limits and Best Practices
- Request Limits: Approximately 30 requests per minute
- Batch Requests: For article retrieval, batch multiple PMIDs in a single request
- Caching: Implement caching to minimize repeated requests
- Specific Queries: Use specific entity names rather than general terms for better results
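Batching and request spacing can be sketched as follows. The `chunked` helper and `Throttle` class are hypothetical names. The default of 30 requests per minute mirrors the approximate figure above, and the batch size is left to the caller because this page does not state a maximum.

```python
import time


def chunked(pmids, size):
    """Yield consecutive slices of `pmids`, each with at most `size` items."""
    for start in range(0, len(pmids), size):
        yield pmids[start:start + size]


class Throttle:
    """Space calls evenly so they stay under `max_per_minute`."""

    def __init__(self, max_per_minute=30, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Calling `throttle.wait()` before each request, and sending one fetch request per batch from `chunked(pmids, size)`, keeps the request count low without changing the rest of the workflow.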
Error Handling
Common error responses:
- 400: Invalid parameters
- 404: Articles not found
- 429: Rate limit exceeded
- 500: Server error
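One way to handle these codes is to retry the transient ones (429 and 500) with increasing delays and let the others propagate. The sketch below assumes requests are made with `urllib.request`, which raises `urllib.error.HTTPError` for these statuses. `call_with_retry` and its default delays are hypothetical choices, not values documented by PubTator3.

```python
import time
from urllib.error import HTTPError

RETRYABLE = {429, 500}  # rate limit exceeded, server error


def call_with_retry(request, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call `request()`; on a retryable HTTP error, wait and try again.

    Delays double after each failure (base_delay, 2 * base_delay, ...).
    Non-retryable errors such as 400 or 404 are raised immediately.
    """
    for attempt in range(attempts):
        try:
            return request()
        except HTTPError as err:
            last_attempt = attempt == attempts - 1
            if err.code not in RETRYABLE or last_attempt:
                raise
            sleep(base_delay * 2 ** attempt)
```

Here `request` is any zero-argument callable, for example `lambda: urlopen(url, timeout=10).read()`.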
More Information
For complete API documentation, visit the PubTator3 API Documentation.