How to Get Comprehensive Variant Annotations¶

This guide demonstrates how to retrieve and interpret genetic variant information using BioMCP's integrated databases.

Overview¶

BioMCP provides variant annotations from multiple sources:

MyVariant.info: Core variant database with clinical significance (BioThings Reference)
External Annotations: TCGA cancer data, 1000 Genomes population frequencies
cBioPortal Integration: Cancer-specific mutation context (API Reference)
BioThings Links: Connected gene, disease, and drug information (BioThings Suite)

Basic Variant Lookup¶

Search by rsID¶

Find variant information using dbSNP identifiers:

# CLI
biomcp variant get rs121913529

# Python
variant = await client.variants.get("rs121913529")

# MCP Tool
variant_getter(variant_id="rs121913529")

Search by HGVS Notation¶

Use standard HGVS notation:

# Protein change
variant = await variant_getter("NP_004324.2:p.Val600Glu")

# Coding DNA change
variant = await variant_getter("NM_004333.4:c.1799T>A")

# Genomic coordinates
variant = await variant_getter("NC_000007.13:g.140453136A>T")

Search by Genomic Position¶

# Search by coordinates
variants = await variant_searcher(
    chromosome="7",
    start=140453136,
    end=140453136,
    assembly="hg38"  # or hg19
)

Understanding Variant Annotations¶

Clinical Significance¶

# Get variant details
variant = await variant_getter("rs121913529")

# Check clinical significance
print(f"Clinical Significance: {variant.clinical_significance}")
# Output: "Pathogenic"

print(f"ClinVar Review Status: {variant.review_status}")
# Output: "reviewed by expert panel"

Population Frequencies¶

# Access frequency data
if variant.frequencies:
    print("Population Frequencies:")
    print(f"  gnomAD: {variant.frequencies.gnomad}")
    print(f"  1000 Genomes: {variant.frequencies.thousand_genomes}")
    print(f"  ExAC: {variant.frequencies.exac}")

Functional Predictions¶

# In silico predictions
if variant.predictions:
    print(f"CADD Score: {variant.predictions.cadd}")
    print(f"PolyPhen: {variant.predictions.polyphen}")
    print(f"SIFT: {variant.predictions.sift}")

Advanced Variant Searches¶

Filter by Clinical Significance¶

# Find pathogenic BRCA1 variants
pathogenic_variants = await variant_searcher(
    gene="BRCA1",
    significance="pathogenic",
    limit=20
)

# Multiple significance levels
variants = await variant_searcher(
    gene="TP53",
    significance=["pathogenic", "likely_pathogenic"]
)

Filter by Frequency¶

Find rare variants:

# Rare variants (MAF < 1%)
rare_variants = await variant_searcher(
    gene="CFTR",
    frequency_max=0.01,
    significance="pathogenic"
)

# Ultra-rare variants
ultra_rare = await variant_searcher(
    gene="SCN1A",
    frequency_max=0.0001
)

Filter by Prediction Scores¶

# High-impact variants
high_impact = await variant_searcher(
    gene="MLH1",
    cadd_score_min=20,  # CADD > 20 suggests deleteriousness
    polyphen_prediction="probably_damaging"
)

External Database Integration¶

For technical details on external data sources, see the BioThings Suite Reference.

TCGA Cancer Data¶

Variants automatically include TCGA annotations when available:

variant = await variant_getter("rs121913529", include_external=True)

# Check TCGA data
if variant.external_data.get("tcga"):
    tcga = variant.external_data["tcga"]
    print(f"TCGA Studies: {tcga['study_count']}")
    print(f"Cancer Types: {', '.join(tcga['cancer_types'])}")
    print(f"Sample Count: {tcga['sample_count']}")

1000 Genomes Project¶

Population-specific frequencies:

# Access 1000 Genomes data
if variant.external_data.get("thousand_genomes"):
    tg_data = variant.external_data["thousand_genomes"]
    print("Population Frequencies:")
    for pop, freq in tg_data["populations"].items():
        print(f"  {pop}: {freq}")

Ensembl VEP Annotations¶

# Consequence predictions
if variant.consequences:
    for consequence in variant.consequences:
        print(f"Gene: {consequence.gene}")
        print(f"Impact: {consequence.impact}")
        print(f"Consequence: {consequence.consequence_terms}")

Integration with Other BioMCP Tools¶

BioMCP's unified architecture allows seamless integration between variant data and other biomedical information. For implementation details, see the Transport Protocol Guide.

Variant to Gene Information¶

# Get variant
variant = await variant_getter("rs121913529")

# Get associated gene details
gene_symbol = variant.gene.symbol  # "BRAF"
gene_info = await gene_getter(gene_symbol)

print(f"Gene: {gene_info.name}")
print(f"Function: {gene_info.summary}")

Variant to Disease Context¶

# Find disease associations
diseases = variant.disease_associations

for disease in diseases:
    # Get detailed disease info
    disease_info = await disease_getter(disease.name)
    print(f"Disease: {disease_info.name}")
    print(f"Definition: {disease_info.definition}")
    print(f"Synonyms: {', '.join(disease_info.synonyms)}")

Variant to Clinical Trials¶

# Search trials for specific variant
gene = variant.gene.symbol
mutation = variant.protein_change  # e.g., "V600E"

trials = await trial_searcher(
    other_terms=[f"{gene} {mutation}", f"{gene} mutation"],
    recruiting_status="OPEN"
)

Practical Workflows¶

Workflow 1: Cancer Variant Analysis¶

async def analyze_cancer_variant(hgvs: str):
    # Think about the analysis
    await think(
        thought=f"Analyzing cancer variant {hgvs}",
        thoughtNumber=1
    )

    # Get variant details
    variant = await variant_getter(hgvs, include_external=True)

    # Get gene context
    gene = await gene_getter(variant.gene.symbol)

    # Search for targeted therapies
    drugs = await search(
        query=f"drugs.targets:{variant.gene.symbol}",
        domain="drug"
    )

    # Find relevant trials
    trials = await trial_searcher(
        other_terms=[
            variant.gene.symbol,
            variant.protein_change,
            "targeted therapy"
        ],
        recruiting_status="OPEN"
    )

    # Search literature
    articles = await article_searcher(
        genes=[variant.gene.symbol],
        variants=[hgvs],
        keywords=["therapy", "treatment", "resistance"]
    )

    return {
        "variant": variant,
        "gene": gene,
        "potential_drugs": drugs,
        "clinical_trials": trials,
        "literature": articles
    }

Workflow 2: Rare Disease Variant¶

async def rare_disease_variant_analysis(gene: str, phenotype: str):
    # Find all pathogenic variants
    variants = await variant_searcher(
        gene=gene,
        significance=["pathogenic", "likely_pathogenic"],
        frequency_max=0.001  # Rare
    )

    # Analyze each variant
    results = []
    for v in variants[:10]:  # Top 10
        # Get full annotations
        full_variant = await variant_getter(v.id)

        # Check phenotype associations
        if phenotype.lower() in str(full_variant.phenotypes).lower():
            results.append({
                "variant": full_variant,
                "phenotype_match": True,
                "frequency": full_variant.frequencies.gnomad or 0
            })

    # Sort by relevance
    results.sort(key=lambda x: x["frequency"])
    return results

Workflow 3: Pharmacogenomics¶

async def pharmacogenomic_analysis(drug_name: str):
    # Get drug information
    drug = await drug_getter(drug_name)

    # Find pharmGKB annotations
    pgx_variants = []

    # Search for drug-related variants
    if drug.targets:
        for target in drug.targets:
            variants = await variant_searcher(
                gene=target,
                keywords=[drug_name, "pharmacogenomics", "drug response"]
            )
            pgx_variants.extend(variants)

    # Get detailed annotations
    annotated = []
    for v in pgx_variants:
        full = await variant_getter(v.id)
        if full.pharmacogenomics:
            annotated.append(full)

    return {
        "drug": drug,
        "pgx_variants": annotated,
        "affected_genes": list(set(v.gene.symbol for v in annotated))
    }

Interpreting Results¶

Clinical Actionability¶

def assess_actionability(variant):
    """Determine if variant is clinically actionable"""

    actionable = False
    reasons = []

    # Check pathogenicity
    if variant.clinical_significance in ["pathogenic", "likely_pathogenic"]:
        actionable = True
        reasons.append("Pathogenic variant")

    # Check for drug associations
    if variant.drug_associations:
        actionable = True
        reasons.append(f"Associated with {len(variant.drug_associations)} drugs")

    # Check guidelines
    if variant.clinical_guidelines:
        actionable = True
        reasons.append("Clinical guidelines available")

    return {
        "actionable": actionable,
        "reasons": reasons,
        "recommendations": variant.clinical_guidelines
    }

Report Generation¶

def generate_variant_report(variant):
    """Create a clinical variant report"""

    report = f"""
## Variant Report: {variant.id}

### Basic Information
- **Gene**: {variant.gene.symbol}
- **Protein Change**: {variant.protein_change or "N/A"}
- **Genomic Location**: chr{variant.chr}:{variant.pos}
- **Reference**: {variant.ref} → **Alternate**: {variant.alt}

### Clinical Significance
- **Status**: {variant.clinical_significance}
- **Review**: {variant.review_status}
- **Last Updated**: {variant.last_updated}

### Population Frequency
- **gnomAD**: {variant.frequencies.gnomad or "Not found"}
- **1000 Genomes**: {variant.frequencies.thousand_genomes or "Not found"}

### Predictions
- **CADD Score**: {variant.predictions.cadd or "N/A"}
- **PolyPhen**: {variant.predictions.polyphen or "N/A"}
- **SIFT**: {variant.predictions.sift or "N/A"}

### Associated Conditions
{format_conditions(variant.conditions)}

### Clinical Resources
- **ClinVar**: {variant.clinvar_url}
- **dbSNP**: {variant.dbsnp_url}
"""
    return report

Best Practices¶

1. Use Multiple Identifiers¶

# Try multiple formats if one fails
identifiers = [
    "rs121913529",
    "NM_004333.4:c.1799T>A",
    "7:140453136:A:T"
]

for id in identifiers:
    try:
        variant = await variant_getter(id)
        break
    except:
        continue

2. Check Data Completeness¶

# Not all variants have all annotations
if variant.frequencies:
    # Use frequency data
    pass
else:
    # Note that frequency unavailable
    pass

3. Consider Assembly Versions¶

# Specify genome assembly
variants_hg38 = await variant_searcher(
    chromosome="7",
    start=140453136,
    assembly="hg38"
)

variants_hg19 = await variant_searcher(
    chromosome="7",
    start=140153336,  # Different coordinate!
    assembly="hg19"
)

Troubleshooting¶

Variant Not Found¶

Check notation: Ensure proper HGVS format
Try alternatives: rsID, genomic coordinates, protein change
Verify gene symbol: Use official HGNC symbols

Missing Annotations¶

Not all variants have all data types
Rare variants may lack population frequencies
Novel variants won't have ClinVar data

Performance Issues¶

Use pagination for large searches
Limit external data requests when not needed
Cache frequently accessed variants

Next Steps¶

Learn to predict variant effects
Explore article searches for variant literature
Set up logging and monitoring