Backend Services Reference Overview
BioMCP integrates with multiple biomedical databases and services to provide comprehensive research capabilities. This reference documents the underlying APIs and their capabilities.
Service Categories
Literature and Publications
- PubTator3: Biomedical literature with entity annotations
- Europe PMC: Preprints from bioRxiv and medRxiv
Clinical Trials
Biomedical Annotations
- BioThings Suite:
- MyGene.info - Gene annotations
- MyVariant.info - Variant annotations
- MyDisease.info - Disease ontology
- MyChem.info - Drug/chemical data
Cancer Genomics
- cBioPortal: Cancer genomics portal with mutation data
- TCGA: The Cancer Genome Atlas (via MyVariant.info)
Variant Effect Prediction
- AlphaGenome: Google DeepMind's AI for regulatory predictions
API Authentication
Service |
Authentication Required |
Type |
Rate Limits |
PubTator3 |
No |
Public |
3 requests/second |
ClinicalTrials.gov |
No |
Public |
50,000 requests/day |
NCI CTS API |
Yes |
API Key |
1,000 requests/day |
BioThings APIs |
No |
Public |
1,000 requests/hour |
cBioPortal |
Optional |
Token |
Higher with token |
AlphaGenome |
Yes |
API Key |
Contact provider |
Data Flow Architecture
User Query → BioMCP Tools → Backend APIs → Unified Response
Example Flow:
1. User: "Find articles about BRAF mutations"
2. BioMCP: article_searcher tool
3. APIs Called:
- PubTator3 (articles)
- cBioPortal (mutation data)
- Europe PMC (preprints)
4. Response: Integrated results with citations
Service Reliability
Primary Services
- PubTator3: 99.9% uptime, updated daily
- ClinicalTrials.gov: 99.5% uptime, updated daily
- BioThings APIs: 99.9% uptime, real-time data
Fallback Strategies
- Cache frequently accessed data
- Implement exponential backoff
- Use alternative endpoints when available
Common Integration Patterns
1. Entity Recognition Enhancement
PubTator3 → Extract entities → BioThings → Get detailed annotations
2. Variant to Trial Pipeline
MyVariant.info → Get gene → ClinicalTrials.gov → Find relevant trials
3. Comprehensive Gene Analysis
MyGene.info → Basic info
cBioPortal → Cancer mutations
PubTator3 → Literature
AlphaGenome → Predictions
Response Times (typical)
- PubTator3: 200-500ms
- ClinicalTrials.gov: 300-800ms
- BioThings APIs: 100-300ms
- cBioPortal: 200-600ms
- AlphaGenome: 1-3 seconds
Optimization Strategies
- Batch requests when APIs support it
- Cache static data (gene names, ontologies)
- Parallelize independent API calls
- Use pagination for large result sets
Error Handling
Common Error Types
- Rate Limiting: 429 errors, implement backoff
- Invalid Parameters: 400 errors, validate inputs
- Service Unavailable: 503 errors, retry with delay
- Authentication: 401 errors, check API keys
{
"error": {
"code": "RATE_LIMIT_EXCEEDED",
"message": "API rate limit exceeded",
"retry_after": 3600
}
}
- Identifiers: HGNC symbols, rsIDs, NCT numbers, PMIDs
- Coordinates: GRCh38 genomic positions
- Terms: MeSH, MONDO, HPO ontologies
- JSON: Primary format for all APIs
- XML: Available for some services
- TSV/CSV: Export options for bulk data
Update Frequencies
Service |
Update Frequency |
Data Lag |
PubTator3 |
Daily |
1-2 days |
ClinicalTrials.gov |
Daily |
Real-time |
NCI CTS |
Daily |
1 day |
BioThings |
Real-time |
Minutes |
cBioPortal |
Quarterly |
3-6 months |
Best Practices
1. API Key Management
- Store keys securely
- Rotate keys periodically
- Monitor usage against limits
2. Error Recovery
- Implement retry logic
- Log failed requests
- Provide fallback data
3. Data Validation
- Verify gene symbols
- Validate genomic coordinates
- Check identifier formats
- Cache when appropriate
- Batch similar requests
- Use appropriate page sizes
Getting Started
- Review individual service documentation
- Obtain necessary API keys
- Test endpoints with sample data
- Implement error handling
- Monitor usage and performance
Support Resources