Benchmarks

This page defines practical benchmark patterns for BioMCP.

The objective is reproducible command behavior, not synthetic leaderboard numbers.

Benchmark goals

  • Compare latency across entity types (gene, variant, trial, article).
  • Track output-size drift.
  • Validate cache behavior.
  • Catch regressions in command startup overhead.

Baseline command set

Use a fixed baseline so runs are comparable over time.

biomcp get gene BRAF
biomcp get variant "BRAF V600E"
biomcp get trial NCT02576665
biomcp search article -g BRAF --limit 5
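
To keep the baseline identical from run to run, one option is a small wrapper script. The sketch below simply replays the baseline commands above and saves each raw output; the script and directory names are arbitrary, and it assumes biomcp is on PATH.

#!/usr/bin/env bash
# baseline.sh - run the fixed baseline set and save raw outputs (sketch).
set -euo pipefail
outdir="baseline-$(date +%Y%m%d)"
mkdir -p "$outdir"
i=0
while IFS= read -r cmd; do
  i=$((i + 1))
  printf 'running: %s\n' "$cmd"
  eval "$cmd" > "$outdir/cmd-$i.out"
done <<'EOF'
biomcp get gene BRAF
biomcp get variant "BRAF V600E"
biomcp get trial NCT02576665
biomcp search article -g BRAF --limit 5
EOF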

Latency measurement

Use repeated runs and report the median and a measure of spread.

Example with hyperfine:

hyperfine -m 10 'biomcp get gene BRAF' 'biomcp search trial -c melanoma --limit 5'

Recommendations:

  • run on stable network,
  • avoid mixed-background workloads,
  • capture raw command outputs for auditing, as in the sketch below.
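
One way to cover the last two points is to export hyperfine's statistics and keep the raw command output alongside them. The file names below are arbitrary.

# Export timing statistics to JSON for later comparison.
hyperfine -m 10 --export-json latency-gene.json 'biomcp get gene BRAF'

# Keep the raw command output so result changes can be audited later.
biomcp get gene BRAF > raw-gene-braf.md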

Output-size tracking

Track markdown and JSON sizes independently.

biomcp get gene BRAF | wc -c
biomcp --json get gene BRAF | wc -c
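
A simple way to track drift over time is to append byte counts to a CSV on each run. This is a minimal sketch; the file name and column layout are arbitrary.

# Append dated byte counts for markdown and JSON output to a CSV.
md_bytes=$(biomcp get gene BRAF | wc -c | tr -d ' ')
json_bytes=$(biomcp --json get gene BRAF | wc -c | tr -d ' ')
echo "$(date +%Y-%m-%d),BRAF,$md_bytes,$json_bytes" >> output-sizes.csv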

Why this matters:

  • markdown size impacts prompt context cost,
  • JSON size affects API and queue throughput,
  • sudden growth can signal response-shape regressions.

Cache-behavior checks

Compare a first (cold) call with an immediate second (warm) call for cache-eligible endpoints.

biomcp get article 22663011
biomcp get article 22663011

Capture timing for both runs to verify expected improvement.
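
A minimal cold-versus-warm sketch follows. It assumes the cache can be reset by deleting a local cache directory; the path shown is a placeholder, so confirm your platform's actual location (see the next section).

# Reset the cache for a cold first run; the directory below is an assumed example.
rm -rf "${XDG_CACHE_HOME:-$HOME/.cache}/biomcp"

time biomcp get article 22663011   # cold: expected to hit the upstream API
time biomcp get article 22663011   # warm: expected to be noticeably faster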

Date-validation contract checks

Invalid dates should fail before network calls.

Examples:

biomcp search article -g BRAF --since 2024-13-01 --limit 1
biomcp search article -g BRAF --since 2024-02-30 --limit 1

Expected behavior (checked by the sketch after this list):

  • immediate validation error,
  • no long wait for upstream API timeouts.
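
The following sketch asserts that an invalid date fails quickly with a non-zero exit status. The five-second threshold is an arbitrary assumption, not a documented contract.

# Expect a non-zero exit status and a fast failure (threshold is arbitrary).
start=$(date +%s)
if biomcp search article -g BRAF --since 2024-13-01 --limit 1; then
  echo "FAIL: invalid date was accepted" >&2
fi
elapsed=$(( $(date +%s) - start ))
[ "$elapsed" -le 5 ] || echo "FAIL: validation took ${elapsed}s" >&2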

Cache-directory contract checks

Downloaded artifacts should use platform cache paths.

Use command help and health output to validate paths in your environment.
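
As a rough sanity check, the paths below are the conventional platform cache roots; the biomcp subdirectory name is an assumption, so confirm the real location from the command's help or health output.

# Conventional cache roots; the "biomcp" subdirectory name is assumed.
ls "${XDG_CACHE_HOME:-$HOME/.cache}/biomcp" 2>/dev/null   # Linux
ls "$HOME/Library/Caches/biomcp" 2>/dev/null              # macOS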

Reporting template

When sharing benchmark results, include the following (a collection sketch appears after the list):

  • command list,
  • machine details,
  • network assumptions,
  • cache warm/cold status,
  • median and p95 latency,
  • output byte counts.
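
A small collection sketch is shown below; it gathers the machine, cache, and size fields into a single report file. File names are examples, and the cold/warm status is recorded manually.

# Assemble a simple benchmark report (file names and fields are examples).
{
  echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "machine: $(uname -srm)"
  echo "cache: warm"                                   # record cold/warm manually
  echo "markdown bytes: $(biomcp get gene BRAF | wc -c | tr -d ' ')"
  echo "json bytes: $(biomcp --json get gene BRAF | wc -c | tr -d ' ')"
} > benchmark-report.txt

# Median latency can be read directly from hyperfine's exported JSON
# (e.g. latency-gene.json produced earlier); p95 can be computed from
# its per-run times.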

Caveats

  • Public APIs can throttle unpredictably.
  • Upstream schema changes can affect command duration.
  • Geographic distance to API hosts influences baseline latency.

Suggested cadence

  • Run quick checks on every release candidate.
  • Run fuller benchmark suites before major parser or rendering changes.