Benchmarks

This page defines practical benchmark patterns for BioMCP.

The objective is reproducible command behavior, not synthetic leaderboard numbers.

Benchmark goals

  • Compare latency across entity types (gene, variant, trial, article).
  • Track output-size drift.
  • Validate cache behavior.
  • Catch regressions in command startup overhead.

Baseline command set

Use a fixed baseline so runs are comparable over time.

biomcp get gene BRAF
biomcp get variant "BRAF V600E"
biomcp get trial NCT02576665
biomcp search article -g BRAF --limit 5
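
To keep the baseline identical from run to run, one option is a small wrapper script. The sketch below simply replays the baseline commands above and saves each raw output; the script and directory names are arbitrary, and it assumes biomcp is on PATH.

#!/usr/bin/env bash
# baseline.sh - run the fixed baseline set and save raw outputs (sketch).
set -euo pipefail
outdir="baseline-$(date +%Y%m%d)"
mkdir -p "$outdir"
i=0
while IFS= read -r cmd; do
  i=$((i + 1))
  printf 'running: %s\n' "$cmd"
  eval "$cmd" > "$outdir/cmd-$i.out"
done <<'EOF'
biomcp get gene BRAF
biomcp get variant "BRAF V600E"
biomcp get trial NCT02576665
biomcp search article -g BRAF --limit 5
EOF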

Latency measurement

Use repeated runs and report the median and a measure of spread.

Example with hyperfine:

hyperfine -m 10 'biomcp get gene BRAF' 'biomcp search trial -c melanoma --limit 5'

Recommendations:

  • run on stable network,
  • avoid mixed-background workloads,
  • capture raw command outputs for auditing, as in the sketch below.
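
One way to cover the last two points is to export hyperfine's statistics and keep the raw command output alongside them. The file names below are arbitrary.

# Export timing statistics to JSON for later comparison.
hyperfine -m 10 --export-json latency-gene.json 'biomcp get gene BRAF'

# Keep the raw command output so result changes can be audited later.
biomcp get gene BRAF > raw-gene-braf.md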

Output-size tracking

Track markdown and JSON sizes independently.

biomcp get gene BRAF | wc -c
biomcp --json get gene BRAF | wc -c
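
A simple way to track drift over time is to append byte counts to a CSV on each run. This is a minimal sketch; the file name and column layout are arbitrary.

# Append dated byte counts for markdown and JSON output to a CSV.
md_bytes=$(biomcp get gene BRAF | wc -c | tr -d ' ')
json_bytes=$(biomcp --json get gene BRAF | wc -c | tr -d ' ')
echo "$(date +%Y-%m-%d),BRAF,$md_bytes,$json_bytes" >> output-sizes.csv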

Why this matters:

  • markdown size impacts prompt context cost,
  • JSON size affects API and queue throughput,
  • sudden growth can signal response-shape regressions.

Cache-behavior checks

Compare a first (cold) call with an immediate second (warm) call for cache-eligible endpoints.

biomcp get article 22663011
biomcp get article 22663011

Capture timing for both runs to verify expected improvement.
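
A minimal cold-versus-warm sketch follows. It assumes the cache can be reset by deleting a local cache directory; the path shown is a placeholder, so confirm your platform's actual location (see the next section).

# Reset the cache for a cold first run; the directory below is an assumed example.
rm -rf "${XDG_CACHE_HOME:-$HOME/.cache}/biomcp"

time biomcp get article 22663011   # cold: expected to hit the upstream API
time biomcp get article 22663011   # warm: expected to be noticeably faster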

Date-validation contract checks

Invalid dates should fail before network calls.

Examples:

biomcp search article -g BRAF --since 2024-13-01 --limit 1
biomcp search article -g BRAF --since 2024-02-30 --limit 1

Expected behavior (checked by the sketch after this list):

  • immediate validation error,
  • no long wait for upstream API timeouts.
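
The following sketch asserts that an invalid date fails quickly with a non-zero exit status. The five-second threshold is an arbitrary assumption, not a documented contract.

# Expect a non-zero exit status and a fast failure (threshold is arbitrary).
start=$(date +%s)
if biomcp search article -g BRAF --since 2024-13-01 --limit 1; then
  echo "FAIL: invalid date was accepted" >&2
fi
elapsed=$(( $(date +%s) - start ))
[ "$elapsed" -le 5 ] || echo "FAIL: validation took ${elapsed}s" >&2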

Cache-directory contract checks

Downloaded artifacts should use platform cache paths.

Use command help and health output to validate paths in your environment.
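
As a rough sanity check, the paths below are the conventional platform cache roots; the biomcp subdirectory name is an assumption, so confirm the real location from the command's help or health output.

# Conventional cache roots; the "biomcp" subdirectory name is assumed.
ls "${XDG_CACHE_HOME:-$HOME/.cache}/biomcp" 2>/dev/null   # Linux
ls "$HOME/Library/Caches/biomcp" 2>/dev/null              # macOS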

Reporting template

When sharing benchmark results, include the following (a collection sketch appears after the list):

  • command list,
  • machine details,
  • network assumptions,
  • cache warm/cold status,
  • median and p95 latency,
  • output byte counts.
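
A small collection sketch is shown below; it gathers the machine, cache, and size fields into a single report file. File names are examples, and the cold/warm status is recorded manually.

# Assemble a simple benchmark report (file names and fields are examples).
{
  echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "machine: $(uname -srm)"
  echo "cache: warm"                                   # record cold/warm manually
  echo "markdown bytes: $(biomcp get gene BRAF | wc -c | tr -d ' ')"
  echo "json bytes: $(biomcp --json get gene BRAF | wc -c | tr -d ' ')"
} > benchmark-report.txt

# Median latency can be read directly from hyperfine's exported JSON
# (e.g. latency-gene.json produced earlier); p95 can be computed from
# its per-run times.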

Caveats

  • Public APIs can throttle unpredictably.
  • Upstream schema changes can affect command duration.
  • Geographic distance to API hosts influences baseline latency.

Suggested cadence

  • Run quick checks on every release candidate.
  • Run fuller benchmark suites before major parser or rendering changes.