BioThings Suite API Reference¶
The BioThings Suite provides unified access to biomedical annotations for genes, variants, diseases, and drugs through a consistent API.
Usage Examples¶
For practical examples using the BioThings APIs, see:
Overview¶
BioMCP integrates with four BioThings APIs:
- MyGene.info: Gene annotations and functional information
- MyVariant.info: Genetic variant annotations and clinical significance
- MyDisease.info: Disease ontology and terminology mappings
- MyChem.info: Drug/chemical properties and mechanisms
All APIs share:
- RESTful JSON interface
- No authentication required
- Elasticsearch-based queries
- Comprehensive data aggregation
MyGene.info¶
Base URL¶
https://mygene.info/v3/
Key Endpoints¶
Gene Query¶
Parameters:
- q: Query string (gene symbol, name, or ID)
- fields: Specific fields to return
- species: Limit to species (default: human, mouse, rat)
- size: Number of results (default: 10)
Example:
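A minimal sketch of how the parameters above combine into a request URL. It assumes the query endpoint is `query` under the base URL (the same path the MyVariant.info and MyDisease.info snippets later in this document use) and that v3 is the current MyGene.info API version. The helper name `build_gene_query_url` is hypothetical, and the code only assembles the URL without touching the network.

```python
from urllib.parse import urlencode

# Assumption: v3 is the current MyGene.info API version.
MYGENE_BASE = "https://mygene.info/v3/"


def build_gene_query_url(q, fields=None, species=None, size=None):
    """Assemble a gene query URL; parameters left as None are omitted."""
    params = {"q": q}
    if fields:
        params["fields"] = ",".join(fields)
    if species:
        params["species"] = species
    if size is not None:
        params["size"] = size
    return MYGENE_BASE + "query?" + urlencode(params)


url = build_gene_query_url("BRAF", fields=["symbol", "name"], species="human", size=5)
print(url)
```

The resulting URL can be passed to any HTTP client; `urlencode` percent-encodes the comma between field names, which the server decodes transparently.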
Gene Annotation¶
Gene ID formats:
- Entrez Gene ID: 673
- Ensembl ID: ENSG00000157764
- Gene Symbol: BRAF
Example:
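One way to handle the three ID formats above is to guess the format from the shape of the input before deciding how to look it up. The sketch below is illustrative only: `classify_gene_id` is a hypothetical helper, and the Ensembl pattern assumes human gene IDs of the form `ENSG` followed by 11 digits. In practice a symbol may need to be resolved to an Entrez or Ensembl ID through the query endpoint first.

```python
import re


def classify_gene_id(gene_id):
    """Guess the ID format from its shape: 'entrez', 'ensembl', or 'symbol'."""
    text = str(gene_id).strip()
    if text.isdigit():
        return "entrez"
    # Assumption: human Ensembl gene IDs (ENSG + 11 digits) only.
    if re.fullmatch(r"ENSG\d{11}", text):
        return "ensembl"
    return "symbol"


for example in (673, "ENSG00000157764", "BRAF"):
    print(example, "->", classify_gene_id(example))
```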
Important Fields¶
Field | Description | Example |
---|---|---|
symbol | Official gene symbol | "BRAF" |
name | Full gene name | "B-Raf proto-oncogene" |
entrezgene | NCBI Entrez ID | 673 |
summary | Functional description | "This gene encodes..." |
genomic_pos | Chromosomal location | {"chr": "7", "start": 140433812} |
pathway | Pathway memberships | {"kegg": [...], "reactome": [...]} |
go | Gene Ontology terms | {"BP": [...], "MF": [...], "CC": [...]} |
MyVariant.info¶
Base URL¶
https://myvariant.info/v1/
Key Endpoints¶
Variant Query¶
Query syntax:
- Gene + variant: dbnsfp.genename:BRAF AND dbnsfp.hgvsp:p.V600E
- rsID: dbsnp.rsid:rs121913529
- Genomic: _id:chr7:g.140453136A>T
Example:
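The query syntax above can be generated with small string helpers rather than assembled by hand. This is a sketch, not part of any client library: both function names are hypothetical, and the field names are taken directly from the query syntax list.

```python
def gene_protein_change_query(gene, protein_change):
    """Combine a gene symbol and an HGVS protein change with AND."""
    return f"dbnsfp.genename:{gene} AND dbnsfp.hgvsp:{protein_change}"


def rsid_query(rsid):
    """Build a query that matches a dbSNP rsID."""
    return f"dbsnp.rsid:{rsid}"


# Parameters ready to send to the variant query endpoint.
params = {"q": gene_protein_change_query("BRAF", "p.V600E"), "size": 10}
print(params["q"])
print(rsid_query("rs121913529"))
```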
Variant Annotation¶
ID formats:
- HGVS genomic: chr7:g.140453136A>T
- dbSNP: rs121913529
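The HGVS genomic format shown above can be built from the usual chromosome, position, reference, and alternate columns. The sketch below covers single-nucleotide substitutions only; `hgvs_snv_id` is a hypothetical helper, and insertions, deletions, and other variant types are out of scope.

```python
def hgvs_snv_id(chrom, pos, ref, alt):
    """Format a single-nucleotide substitution as chr<N>:g.<pos><ref>><alt>."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # accept both "7" and "chr7"
    ref, alt = ref.upper(), alt.upper()
    valid = set("ACGT")
    if len(ref) != 1 or len(alt) != 1 or ref not in valid or alt not in valid:
        raise ValueError("only single-nucleotide substitutions are supported")
    return f"chr{chrom}:g.{int(pos)}{ref}>{alt}"


print(hgvs_snv_id(7, 140453136, "A", "T"))
```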
Important Fields¶
Field | Description | Example |
---|---|---|
clinvar | Clinical significance | {"clinical_significance": "Pathogenic"} |
dbsnp | dbSNP annotations | {"rsid": "rs121913529"} |
cadd | CADD scores | {"phred": 35} |
gnomad_exome | Population frequency | {"af": {"af": 0.00001}} |
dbnsfp | Functional predictions | {"polyphen2": "probably_damaging"} |
Query Filters¶
# Clinical significance
q = "clinvar.clinical_significance:pathogenic"
# Frequency filters
q = "gnomad_exome.af.af:<0.01" # Rare variants
# Gene-specific
q = "dbnsfp.genename:BRCA1 AND cadd.phred:>20"
MyDisease.info¶
Base URL¶
https://mydisease.info/v1/
Key Endpoints¶
Disease Query¶
Example:
Disease Annotation¶
ID formats:
- MONDO: MONDO:0007254
- DOID: DOID:1909
- OMIM: OMIM:155600
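Because all three ID formats above share a `PREFIX:LOCAL` shape, a small validator can reject malformed input before a request is made. This is a sketch under that assumption; `split_disease_id` and `KNOWN_PREFIXES` are hypothetical names, and the prefix set is limited to the formats listed above.

```python
# Prefixes taken from the ID formats listed above.
KNOWN_PREFIXES = {"MONDO", "DOID", "OMIM"}


def split_disease_id(disease_id):
    """Split 'PREFIX:LOCAL' into (prefix, local_id), validating the prefix."""
    prefix, sep, local_id = disease_id.partition(":")
    prefix = prefix.upper()
    if not sep or not local_id or prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unrecognized disease ID: {disease_id!r}")
    return prefix, local_id


print(split_disease_id("MONDO:0007254"))
print(split_disease_id("DOID:1909"))
```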
Important Fields¶
Field | Description | Example |
---|---|---|
mondo | MONDO ontology | {"id": "MONDO:0007254", "label": "melanoma"} |
disease_ontology | Disease Ontology | {"id": "DOID:1909"} |
synonyms | Alternative names | ["malignant melanoma", "MM"] |
xrefs | Cross-references | {"omim": ["155600"], "mesh": ["D008545"]} |
phenotypes | HPO terms | [{"hpo_id": "HP:0002861"}] |
MyChem.info¶
Base URL¶
https://mychem.info/v1/
Key Endpoints¶
Drug Query¶
Example:
Drug Annotation¶
ID formats:
- DrugBank: DB00619
- ChEMBL: CHEMBL941
- Name: imatinib
Important Fields¶
Field | Description | Example |
---|---|---|
drugbank | DrugBank data | {"id": "DB00619", "name": "Imatinib"} |
chembl | ChEMBL data | {"molecule_chembl_id": "CHEMBL941"} |
chebi | ChEBI ontology | {"id": "CHEBI:45783"} |
drugcentral | Indications | {"indications": [...]} |
pharmacology | Mechanism | {"mechanism_of_action": "BCR-ABL inhibitor"} |
Common Query Patterns¶
1. Gene to Variant Pipeline¶
import requests

# Step 1: Get gene info (673 is the Entrez Gene ID for BRAF;
# the annotation endpoint expects an Entrez or Ensembl ID)
gene_response = requests.get(
    "https://mygene.info/v3/gene/673",
    params={"fields": "symbol,genomic_pos"}
)
# Step 2: Find variants in gene
variant_response = requests.get(
    "https://myvariant.info/v1/query",
    params={
        "q": "dbnsfp.genename:BRAF",
        "fields": "clinvar.clinical_significance,gnomad_exome.af",
        "size": 100
    }
)
2. Disease Synonym Expansion¶
import requests

# Get all synonyms for a disease
disease_response = requests.get(
    "https://mydisease.info/v1/query",
    params={
        "q": "melanoma",
        "fields": "mondo,synonyms,xrefs"
    }
)
# Extract all names
all_names = ["melanoma"]
for hit in disease_response.json()["hits"]:
    if "synonyms" in hit:
        all_names.extend(hit["synonyms"])
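The loop above can collect the same synonym more than once when several hits overlap. A sketch of a deduplicating variant follows; `collect_names` is a hypothetical helper, and the sample response is hand-written to match the shape the snippet above expects rather than taken from a live query.

```python
def collect_names(term, response_json):
    """Return the search term plus every synonym, deduplicated case-insensitively."""
    names = [term]
    seen = {term.lower()}
    for hit in response_json.get("hits", []):
        synonyms = hit.get("synonyms", [])
        if isinstance(synonyms, str):  # tolerate a single string value
            synonyms = [synonyms]
        for name in synonyms:
            if name.lower() not in seen:
                seen.add(name.lower())
                names.append(name)
    return names


# Hand-written sample, not real API output.
sample = {
    "hits": [
        {"synonyms": ["malignant melanoma", "MM"]},
        {"synonyms": ["Melanoma", "malignant melanoma"]},
        {"mondo": {}},  # a hit with no synonyms field
    ]
}
print(collect_names("melanoma", sample))
```

Order of first appearance is preserved, so the original search term always stays at the front of the list.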
3. Drug Target Lookup¶
import requests

# Find drugs targeting a gene
drug_response = requests.get(
    "https://mychem.info/v1/query",
    params={
        "q": "drugcentral.targets.gene_symbol:BRAF",
        "fields": "drugbank.name,chembl.pref_name",
        "size": 50
    }
)
Rate Limits and Best Practices¶
Rate Limits¶
- Default: 1,000 requests/hour per IP
- Batch queries: Up to 1,000 IDs per request
- No authentication: Public access
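To stay under the hourly limit quoted above, a client can space its requests evenly. The sketch below is one possible approach, not something the APIs require: the `Throttle` class is hypothetical, and even spacing (3.6 seconds between calls at 1,000 requests/hour) is stricter than a limit that may allow short bursts. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, max_per_hour=1000, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(max_per_hour=1000)
print(f"minimum spacing: {throttle.min_interval:.1f} s")
```

Calling `throttle.wait()` immediately before each HTTP request is enough; the first call returns at once.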
Best Practices¶
1. Use Field Filtering¶
# Good - only request needed fields
params = {"fields": "symbol,name,summary"}
# Bad - returns all fields
params = {}
2. Batch Requests¶
import requests

# Entrez Gene IDs for BRAF, KRAS, and EGFR
gene_ids = ["673", "3845", "1956"]
# Good - single request for multiple genes
response = requests.post(
    "https://mygene.info/v3/gene",
    json={"ids": gene_ids}
)
# Bad - multiple individual requests
for gene_id in gene_ids:
    requests.get(f"https://mygene.info/v3/gene/{gene_id}")
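When there are more IDs than one batch request allows (1,000 per request, per the limits above), the list has to be split first. A minimal sketch of that step follows; `chunked` is a hypothetical helper and the IDs are placeholders generated for illustration.

```python
def chunked(ids, batch_size=1000):
    """Yield consecutive slices of `ids`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


gene_ids = [str(n) for n in range(1, 2501)]  # placeholder IDs, not a real gene list
batches = list(chunked(gene_ids))
print([len(batch) for batch in batches])
```

Each batch can then be sent as the `ids` payload of a single POST request.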
3. Handle Missing Data¶
# Check for field existence
if "clinvar" in variant and "clinical_significance" in variant["clinvar"]:
    significance = variant["clinvar"]["clinical_significance"]
else:
    significance = "Not available"
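The same check generalizes to any depth with a helper that walks a dotted path. This is a sketch; `get_nested` is a hypothetical name, and it handles nested dictionaries only (list-valued fields would need extra handling).

```python
def get_nested(doc, dotted_path, default=None):
    """Follow a dotted path through nested dicts; return `default` if any key is absent."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hand-written sample record for illustration.
variant = {"clinvar": {"clinical_significance": "Pathogenic"}, "cadd": {}}
print(get_nested(variant, "clinvar.clinical_significance", "Not available"))
print(get_nested(variant, "cadd.phred", "Not available"))
```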
Error Handling¶
Common Errors¶
404 Not Found¶
400 Bad Request¶
429 Rate Limited¶
Error Handling Code¶
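import time

import requests


def query_biothings(api_url, query_params, max_retries=3):
    """Query a BioThings endpoint, retrying with exponential backoff on 429."""
    for attempt in range(max_retries + 1):
        try:
            response = requests.get(api_url, params=query_params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            status = e.response.status_code
            if status == 404:
                return {"error": "Not found", "query": query_params}
            if status == 429 and attempt < max_retries:
                # Exponential backoff: wait 1s, 2s, 4s, ... before retrying
                time.sleep(2 ** attempt)
                continue
            # Any other HTTP error, or retries exhausted
            raise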
Data Sources¶
Each BioThings API aggregates data from multiple sources:
MyGene.info Sources¶
- NCBI Entrez Gene
- Ensembl
- UniProt
- KEGG, Reactome, WikiPathways
- Gene Ontology
MyVariant.info Sources¶
- dbSNP
- ClinVar
- gnomAD
- CADD
- PolyPhen-2, SIFT
- COSMIC
MyDisease.info Sources¶
- MONDO
- Disease Ontology
- OMIM
- MeSH
- HPO
MyChem.info Sources¶
- DrugBank
- ChEMBL
- ChEBI
- PubChem
- DrugCentral
Advanced Features¶
Full-Text Search¶
# Search across all fields
params = {
    "q": "lung cancer EGFR",  # Searches all text fields
    "fields": "symbol,name,summary"
}
Faceted Search¶
# Get aggregations
params = {
    "q": "clinvar.clinical_significance:pathogenic",
    "facets": "dbnsfp.genename",
    "size": 0  # Only return facets
}
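Reading the aggregation back out might look like the sketch below. It assumes the response carries a top-level `facets` object keyed by field name, each with a `terms` list of `{"term", "count"}` entries; that layout is an assumption here and should be checked against a real response. `facet_counts` is a hypothetical helper and the sample counts are made up.

```python
def facet_counts(response_json, facet_field):
    """Return {term: count} for one facet, or an empty dict if it is absent."""
    facet = response_json.get("facets", {}).get(facet_field, {})
    return {entry["term"]: entry["count"] for entry in facet.get("terms", [])}


# Hand-written sample with made-up counts; the layout is an assumption.
sample = {
    "total": 3,
    "hits": [],
    "facets": {
        "dbnsfp.genename": {
            "terms": [{"term": "BRCA2", "count": 2}, {"term": "BRCA1", "count": 1}]
        }
    },
}
print(facet_counts(sample, "dbnsfp.genename"))
```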
Scrolling Large Results¶
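A minimal sketch of scrolling through a large result set. It assumes the BioThings convention of sending `fetch_all=true` on the first request and passing the returned `_scroll_id` back as `scroll_id` on each following request until a page comes back without hits. The `fetch` argument is a hypothetical stand-in for an HTTP GET that returns parsed JSON; `fake_fetch` replaces it here so the example runs without network access.

```python
def scroll_hits(fetch, url, query):
    """Yield every hit for `query`, following scroll IDs until a page has no hits."""
    page = fetch(url, {"q": query, "fetch_all": "true"})
    while page.get("hits"):
        yield from page["hits"]
        scroll_id = page.get("_scroll_id")
        if not scroll_id:
            break
        page = fetch(url, {"scroll_id": scroll_id})


def fake_fetch(url, params):
    """Stand-in for a real HTTP call: serves two pages, then an empty one."""
    pages = {
        None: {"_scroll_id": "s1", "hits": [{"_id": "a"}, {"_id": "b"}]},
        "s1": {"_scroll_id": "s2", "hits": [{"_id": "c"}]},
        "s2": {"hits": []},
    }
    return pages[params.get("scroll_id")]


query_url = "https://myvariant.info/v1/query"
ids = [hit["_id"] for hit in scroll_hits(fake_fetch, query_url, "dbnsfp.genename:BRAF")]
print(ids)
```

With a real client, `fetch` could be something like `lambda url, params: requests.get(url, params=params).json()`. Because the generator yields hits one at a time, callers never need to hold the full result set in memory.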
Integration Tips¶
1. Caching Strategy¶
- Cache gene/drug/disease lookups (stable)
- Don't cache variant queries (frequently updated)
- Use ETags for conditional requests
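The first two points can be sketched as a small time-to-live cache: stable lookups go through it with a long TTL, while variant queries bypass it. `TTLCache` is a hypothetical in-memory helper, not part of any BioThings client, and ETag handling is not shown. The clock is injectable so expiry can be checked without waiting.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch()` on a miss or expiry."""
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value


gene_cache = TTLCache(ttl=24 * 3600)  # assumed one-day TTL for stable gene lookups
record = gene_cache.get_or_fetch("gene:673", lambda: {"symbol": "BRAF"})
print(record)
```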
2. Parallel Requests¶
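import asyncio
import aiohttp

async def fetch_json(session, url):
    # Read the body inside the context manager so the connection is released
    async with session.get(url) as response:
        return await response.json()

async def fetch_all(session, urls):
    # Run all requests concurrently; results keep the order of urls
    tasks = [fetch_json(session, url) for url in urls]
    return await asyncio.gather(*tasks)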
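Unbounded parallelism can run into the rate limits described earlier, so it may help to cap the number of requests in flight. The sketch below uses only `asyncio`; `gather_limited` is a hypothetical helper, and `fake_fetch` stands in for a real HTTP call so the example runs without network access.

```python
import asyncio


async def gather_limited(coro_factories, limit=5):
    """Run the coroutines with at most `limit` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(factory) for factory in coro_factories))


async def fake_fetch(gene_id):
    """Stand-in for an HTTP request; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return {"_id": gene_id}


async def main():
    gene_ids = ["673", "3845", "1956"]  # sample inputs
    # Bind each ID as a default argument so every factory keeps its own value.
    factories = [lambda g=g: fake_fetch(g) for g in gene_ids]
    return await gather_limited(factories, limit=2)


results = asyncio.run(main())
print([r["_id"] for r in results])
```

Passing zero-argument factories rather than already-created coroutines means no request starts until the semaphore admits it.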