
AlphaGenome API Reference

Google DeepMind's AlphaGenome provides AI-powered predictions of variant effects on gene regulation, chromatin accessibility, and splicing.

Usage Guide

For a step-by-step tutorial on using AlphaGenome for variant effect prediction, see How to Predict Variant Effects with AlphaGenome.

Overview

AlphaGenome predicts regulatory effects of genetic variants by analyzing:

  • Gene expression changes in nearby genes
  • Chromatin accessibility alterations
  • Splicing pattern modifications
  • Enhancer and promoter activity
  • Transcription factor binding
  • 3D chromatin interactions

Note: AlphaGenome is an optional integration that requires a separate installation and an API key.

Authentication

Obtaining an API Key

  1. Visit https://deepmind.google.com/science/alphagenome
  2. Register for non-commercial research use
  3. Accept terms of service
  4. Receive API key via email

API Key Usage

Environment Variable:

export ALPHAGENOME_API_KEY="your-key-here"

Per-Request:

result = await alphagenome_predictor(
    chromosome="chr7",
    position=140753336,
    reference="A",
    alternate="T",
    api_key="your-key-here"  # Overrides environment
)
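
The precedence described above (a per-request key overrides the environment variable) can be sketched as follows. `resolve_api_key` is a hypothetical helper written for illustration; it is not part of BioMCP or the AlphaGenome package, and the real tool may resolve keys differently.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the per-request key if given, else the environment variable.

    Hypothetical helper for illustration only; not a BioMCP or AlphaGenome API.
    """
    # An explicit argument takes priority over ALPHAGENOME_API_KEY.
    key = api_key or os.environ.get("ALPHAGENOME_API_KEY")
    if not key:
        raise ValueError(
            "No AlphaGenome API key: pass api_key or set ALPHAGENOME_API_KEY"
        )
    return key
```

Failing early with a clear message avoids spending a network round trip on a request that cannot authenticate.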

Installation

AlphaGenome requires separate installation:

# Clone and install
git clone https://github.com/google-deepmind/alphagenome.git
cd alphagenome
pip install .

# Verify installation
python -c "import alphagenome; print('AlphaGenome installed')"

API Interface

Prediction Endpoint

The AlphaGenome API is accessed through the BioMCP alphagenome_predictor tool.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| chromosome | str | Yes | Chromosome (e.g., "chr7") |
| position | int | Yes | 1-based genomic position |
| reference | str | Yes | Reference allele |
| alternate | str | Yes | Alternate allele |
| interval_size | int | No | Analysis window in bp (default: 131072) |
| tissue_types | list[str] | No | UBERON tissue codes |
| significance_threshold | float | No | Log2FC threshold (default: 0.5) |
| api_key | str | No | AlphaGenome API key |
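
One way to keep these parameters and their defaults in one place is a small container that turns into keyword arguments. `PredictionRequest` and `to_kwargs` are hypothetical names introduced here; only the field names and default values come from the parameter list above.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PredictionRequest:
    """Hypothetical container mirroring the documented parameters."""

    chromosome: str
    position: int
    reference: str
    alternate: str
    interval_size: int = 131072          # documented default
    tissue_types: Optional[List[str]] = None
    significance_threshold: float = 0.5  # documented default
    api_key: Optional[str] = None

    def to_kwargs(self) -> dict:
        """Drop unset optional fields so the tool's own defaults apply."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Under these assumptions, a call would look like `await alphagenome_predictor(**PredictionRequest("chr7", 140753336, "A", "T").to_kwargs())`.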

Interval Sizes

| Size (bp) | Use Case | Description |
|---|---|---|
| 2,048 | Promoter | TSS and promoter variants |
| 16,384 | Local | Proximal regulatory elements |
| 131,072 | Standard | Enhancer-promoter interactions |
| 524,288 | Long-range | Distal regulatory elements |
| 1,048,576 | TAD-level | Topological domain effects |
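
When the distance between a variant and the element of interest (for example a TSS) is known, the window can be chosen from that distance. The sketch below assumes the window is centered on the variant, so it reaches half the interval size in each direction; that centering is an assumption, not something stated above, and `smallest_covering_interval` is a hypothetical helper.

```python
SUPPORTED_INTERVALS = (2048, 16384, 131072, 524288, 1048576)


def smallest_covering_interval(distance_bp: int) -> int:
    """Pick the smallest listed interval whose half-width reaches distance_bp.

    Assumes a window centered on the variant (an assumption for this sketch).
    """
    for size in SUPPORTED_INTERVALS:
        if size // 2 >= abs(distance_bp):
            return size
    raise ValueError(f"{distance_bp} bp exceeds the largest supported window")
```

Smaller windows keep the analysis focused, so picking the smallest interval that still covers the target is a reasonable starting point.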

Tissue Codes

AlphaGenome supports tissue-specific predictions using UBERON ontology:

| Tissue | UBERON Code | Description |
|---|---|---|
| Breast | UBERON:0000310 | Mammary gland tissue |
| Liver | UBERON:0002107 | Hepatic tissue |
| Prostate | UBERON:0002367 | Prostate gland |
| Brain | UBERON:0000955 | Neural tissue |
| Lung | UBERON:0002048 | Pulmonary tissue |
| Colon | UBERON:0001155 | Colonic mucosa |
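
A lookup keyed by tissue name avoids hard-coding UBERON identifiers at each call site. The codes below are copied from the list above; `tissue_codes_for` is a hypothetical helper, and the tissues it knows about are limited to these six.

```python
TISSUE_CODES = {
    "breast": "UBERON:0000310",
    "liver": "UBERON:0002107",
    "prostate": "UBERON:0002367",
    "brain": "UBERON:0000955",
    "lung": "UBERON:0002048",
    "colon": "UBERON:0001155",
}


def tissue_codes_for(names):
    """Translate tissue names into UBERON codes, rejecting unknown names."""
    codes = []
    for name in names:
        key = name.strip().lower()
        if key not in TISSUE_CODES:
            raise KeyError(f"No UBERON code listed for tissue: {name!r}")
        codes.append(TISSUE_CODES[key])
    return codes
```

The returned list can be passed directly as `tissue_types`.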

Response Format

Gene Expression Predictions

{
  "gene_expression": [
    {
      "gene_name": "BRAF",
      "gene_id": "ENSG00000157764",
      "distance_to_tss": 1234,
      "log2_fold_change": 1.25,
      "confidence": 0.89,
      "tissue": "UBERON:0000310"
    }
  ]
}

Interpretation:

  • log2_fold_change > 1.0: Strong increase (more than 2x)
  • 0.5 < log2_fold_change ≤ 1.0: Moderate increase (about 1.4x to 2x)
  • log2_fold_change < -1.0: Strong decrease (below 0.5x)
  • -1.0 ≤ log2_fold_change < -0.5: Moderate decrease (about 0.5x to 0.7x)
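
The fold-change figures follow from fold change = 2^(log2FC). A minimal sketch of the conversion and of the thresholds listed above; both function names are hypothetical and the labels are only shorthand for these categories.

```python
def fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


def classify_expression_change(log2_fc: float) -> str:
    """Apply the thresholds above, checking the strong cutoffs first."""
    if log2_fc > 1.0:
        return "strong increase"
    if log2_fc > 0.5:
        return "moderate increase"
    if log2_fc < -1.0:
        return "strong decrease"
    if log2_fc < -0.5:
        return "moderate decrease"
    return "below threshold"
```

For the BRAF example response, `fold_change(1.25)` is roughly 2.38, which falls in the strong-increase category.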

Chromatin Accessibility

{
  "chromatin_accessibility": [
    {
      "region_type": "enhancer",
      "coordinates": "chr7:140450000-140451000",
      "accessibility_change": 0.75,
      "peak_height_change": 1.2,
      "tissue": "UBERON:0000310"
    }
  ]
}

Interpretation:

  • Positive values: Increased accessibility (open chromatin)
  • Negative values: Decreased accessibility (closed chromatin)

Splicing Predictions

{
  "splicing": [
    {
      "event_type": "exon_skipping",
      "affected_exon": "ENST00000288602.6:exon14",
      "delta_psi": -0.35,
      "splice_site_strength_change": -2.1
    }
  ]
}

PSI (Percent Spliced In):

  • delta_psi > 0: Increased exon inclusion
  • delta_psi < 0: Increased exon skipping
  • |delta_psi| > 0.1: Treated here as potentially biologically significant
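
These rules can be applied to the splicing response shown above with a few lines. The sketch assumes the response is available as a plain dict with the same keys as the JSON example; `summarize_splicing` is a hypothetical helper.

```python
PSI_THRESHOLD = 0.1  # cutoff from the list above


def summarize_splicing(response: dict, threshold: float = PSI_THRESHOLD) -> list:
    """Describe splicing events whose |delta_psi| exceeds the threshold."""
    summaries = []
    for event in response.get("splicing", []):
        delta = event["delta_psi"]
        if abs(delta) <= threshold:
            continue  # too small to report
        direction = "increased inclusion" if delta > 0 else "increased skipping"
        summaries.append(
            f"{event['affected_exon']}: {direction} (delta_psi={delta:+.2f})"
        )
    return summaries
```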

Usage Examples

Basic Prediction

# Predict BRAF V600E effects
result = await alphagenome_predictor(
    chromosome="chr7",
    position=140753336,
    reference="A",
    alternate="T"
)

# Process results
for gene in result.gene_expression:
    if abs(gene.log2_fold_change) > 1.0:
        print(f"{gene.gene_name}: {gene.log2_fold_change:.2f} log2FC")

Tissue-Specific Analysis

# Compare effects across tissues
tissues = {
    "breast": "UBERON:0000310",
    "lung": "UBERON:0002048",
    "brain": "UBERON:0000955"
}

results = {}
for tissue_name, tissue_code in tissues.items():
    results[tissue_name] = await alphagenome_predictor(
        chromosome="chr17",
        position=7577120,
        reference="G",
        alternate="A",
        tissue_types=[tissue_code]
    )

Promoter Variant Analysis

# Use small window for promoter variants
result = await alphagenome_predictor(
    chromosome="chr7",
    position=5569100,  # Near ACTB promoter
    reference="C",
    alternate="T",
    interval_size=2048  # 2kb window
)

# Check for promoter effects
promoter_effects = [
    g for g in result.gene_expression
    if abs(g.distance_to_tss) < 1000
]

Enhancer Variant Analysis

# Use larger window for enhancer variants
result = await alphagenome_predictor(
    chromosome="chr8",
    position=128748315,  # MYC enhancer region
    reference="G",
    alternate="A",
    interval_size=524288  # 512kb window
)

# Analyze chromatin changes
enhancer_changes = [
    c for c in result.chromatin_accessibility
    if c.region_type == "enhancer" and abs(c.accessibility_change) > 0.5
]

Best Practices

1. Choose Appropriate Interval Size

def select_interval_size(variant_type):
    """Select interval based on variant type"""
    intervals = {
        "promoter": 2048,
        "splice_site": 16384,
        "enhancer": 131072,
        "intergenic": 524288,
        "structural": 1048576
    }
    return intervals.get(variant_type, 131072)

2. Handle Missing Predictions

# Not all variants affect gene expression
if not result.gene_expression:
    print("No gene expression changes predicted")
    # Check chromatin or splicing effects instead

3. Filter by Significance

# Focus on significant changes
significance_threshold = 0.5  # Matches the default Log2FC threshold
significant_genes = [
    g for g in result.gene_expression
    if abs(g.log2_fold_change) > significance_threshold
    and g.confidence > 0.8
]

4. Validate Input

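def validate_variant(chrom, pos, ref, alt):
    """Validate variant format (SNVs only)"""
    # Check chromosome format ("chrom" avoids shadowing the built-in chr())
    if not chrom.startswith("chr"):
        raise ValueError("Chromosome must start with 'chr'")

    # Check alleles: each must be a single A/C/G/T base, and they must differ
    valid_bases = set("ACGT")
    if ref not in valid_bases or alt not in valid_bases:
        raise ValueError("Invalid nucleotide")
    if ref == alt:
        raise ValueError("Reference and alternate alleles are identical")

    # Check position (1-based, so 0 and negative values are invalid)
    if pos < 1:
        raise ValueError("Position must be 1-based")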

Integration Patterns

VUS Classification Pipeline

async def classify_vus(variant):
    """Classify variant of unknown significance"""

    # 1. Predict regulatory effects
    predictions = await alphagenome_predictor(
        chromosome=variant.chr,
        position=variant.pos,
        reference=variant.ref,
        alternate=variant.alt
    )

    # 2. Score impact
    max_expression = max(
        abs(g.log2_fold_change) for g in predictions.gene_expression
    ) if predictions.gene_expression else 0

    max_chromatin = max(
        abs(c.accessibility_change) for c in predictions.chromatin_accessibility
    ) if predictions.chromatin_accessibility else 0

    # 3. Classify
    if max_expression > 2.0 or max_chromatin > 1.5:
        return "High regulatory impact"
    elif max_expression > 1.0 or max_chromatin > 0.75:
        return "Moderate regulatory impact"
    else:
        return "Low regulatory impact"

Multi-Variant Analysis

async def analyze_variant_set(variants, target_gene):
    """Analyze multiple variants affecting a gene"""

    results = []
    for variant in variants:
        prediction = await alphagenome_predictor(
            chromosome=variant["chr"],
            position=variant["pos"],
            reference=variant["ref"],
            alternate=variant["alt"]
        )

        # Find target gene effect
        for gene in prediction.gene_expression:
            if gene.gene_name == target_gene:
                results.append({
                    "variant": f"{variant['chr']}:{variant['pos']}",
                    "effect": gene.log2_fold_change,
                    "confidence": gene.confidence
                })
                break

    # Sort by effect size
    return sorted(results, key=lambda x: abs(x["effect"]), reverse=True)

Limitations

Technical Limitations

  • Species: Human only (GRCh38)
  • Variant Types: SNVs only (no indels/SVs)
  • Sequence Context: Requires reference match
  • Computation Time: 1-3 seconds per variant
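
Because only SNVs are supported, it can help to separate unsupported variants before submitting anything. A minimal sketch, assuming variants are dicts with "ref" and "alt" keys as in the Multi-Variant Analysis example; `partition_snvs` is a hypothetical helper.

```python
def partition_snvs(variants):
    """Split variants into (snvs, skipped) based on allele shape only."""
    bases = {"A", "C", "G", "T"}
    snvs, skipped = [], []
    for variant in variants:
        ref = variant["ref"].upper()
        alt = variant["alt"].upper()
        # An SNV has single-base ref and alt alleles that differ.
        if ref in bases and alt in bases and ref != alt:
            snvs.append(variant)
        else:
            skipped.append(variant)
    return snvs, skipped
```

This checks allele shape only; it does not verify that the reference allele matches the GRCh38 sequence.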

Biological Limitations

  • Cell Type: Predictions are tissue-specific approximations
  • Environmental Factors: Does not account for environmental or physiological conditions
  • Epistasis: Single variant effects only
  • Temporal: No developmental stage consideration

Error Handling

Common Errors

try:
    result = await alphagenome_predictor(...)
except AlphaGenomeError as e:
    if "API key" in str(e):
        # Handle missing/invalid key
        pass
    elif "Invalid sequence" in str(e):
        # Handle sequence errors
        pass
    elif "Rate limit" in str(e):
        # Handle rate limiting
        pass

Retry Logic

import asyncio

async def predict_with_retry(params, max_retries=3):
    """Retry on transient failures"""
    for attempt in range(max_retries):
        try:
            return await alphagenome_predictor(**params)
        except Exception:
            # Out of attempts: surface the last error to the caller
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, ...

Performance Optimization

Batch Processing

import asyncio

async def batch_predict(variants, batch_size=10):
    """Process variants in batches"""
    results = []

    for i in range(0, len(variants), batch_size):
        batch = variants[i:i + batch_size]
        batch_results = await asyncio.gather(*[
            alphagenome_predictor(**v) for v in batch
        ])
        results.extend(batch_results)

        # Rate limiting
        if i + batch_size < len(variants):
            await asyncio.sleep(1)

    return results
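
An alternative to fixed batches is to cap the number of in-flight requests with a semaphore, so a slow request does not hold up the rest of its batch. This is a sketch under assumptions: `predict_bounded` is a hypothetical helper, the predictor is passed in as an argument so the snippet stays self-contained, and the limit of 5 is arbitrary rather than a documented rate limit.

```python
import asyncio


async def predict_bounded(variants, predictor, max_concurrent=5):
    """Run predictor over variants with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(variant):
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await predictor(**variant)

    # gather() returns results in the same order as the input variants.
    return await asyncio.gather(*(run_one(v) for v in variants))
```

Usage would look like `results = await predict_bounded(variants, alphagenome_predictor)`.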

Caching Strategy

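_prediction_cache = {}

async def get_cached_prediction(chrom, pos, ref, alt, interval):
    """Cache predictions for repeated queries"""
    # functools.lru_cache would store the coroutine object, which can only
    # be awaited once, so cache the awaited result in a dict instead.
    key = (chrom, pos, ref, alt, interval)
    if key not in _prediction_cache:
        _prediction_cache[key] = await alphagenome_predictor(
            chromosome=chrom,
            position=pos,
            reference=ref,
            alternate=alt,
            interval_size=interval
        )
    return _prediction_cache[key]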

Support Resources