Article¶
Use article commands for literature retrieval by disease, gene, drug, and identifier.
Typical article workflow¶
- search a topic,
- choose an identifier,
- retrieve default summary,
- request full text or annotations only when needed.
Search articles¶
By gene and disease:
By keyword:
Tune keyword-bearing relevance:
biomcp search article -k "Hirschsprung disease ganglion cells" --ranking-mode hybrid --weight-semantic 0.5 --weight-lexical 0.2 --limit 5
By date:
Exclude preprints when supported by source metadata:
Query formulation¶
Turn a natural-language literature question into two parts:
- Put a known gene, disease, or drug in
-g/--gene,-d/--disease, or--drug. - Put mechanisms, phenotypes, outcomes, datasets, and other free-text concepts in
-k/--keyword. - If the question is asking which gene, disease, or drug fits the evidence and you do not know the entity yet, do not guess a typed flag. Start with keyword-only article search or run
biomcp discover "<question>"first. - Use
--type reviewfor synthesis questions, list-style questions, and dataset surveys.
Known anchor only:
Known anchor plus mechanism or process:
Unknown-entity disease-identification question:
Known drug plus mechanism:
Dataset or method question:
Multi-source federation¶
Article search fans out to PubTator3, Europe PMC, and PubMed by default when
the filter set is compatible. Known gene, disease, and drug anchors
participate in that typed route. When a non-empty keyword is present, BioMCP
also adds LitSense2 to the federated route. Semantic Scholar can still join
the same query when the filter set is compatible. BioMCP merges duplicates
across PMID, PMCID, and DOI where possible. S2_API_KEY upgrades the Semantic
Scholar leg to authenticated requests at 1 req/sec; without it, BioMCP uses
the shared unauthenticated pool at 1 req/2sec. Search results are still
deduplicated by PMID when BioMCP can resolve one.
Default --sort relevance is mode-aware:
- Keyword-bearing queries default to
--ranking-mode hybrid, using0.4*semantic + 0.3*lexical + 0.2*citations + 0.1*positionwith the LitSense2-derived semantic signal. - Entity-only queries default to
--ranking-mode lexical, preserving the existing calibrated PubMed rescue plus lexical directness comparator. --ranking-mode semanticsorts the LitSense2-derived semantic signal first and falls back to the lexical comparator for deterministic ties.- Rows without LitSense2 provenance contribute
ranking.semantic_score = 0in semantic-aware ranking modes. --weight-semantic,--weight-lexical,--weight-citations, and--weight-positionretune the hybrid formula.
Markdown preserves the merged rank order, and JSON includes row-level
matched_sources, ranking, citation_count, and
influential_citation_count.
Use --source <all, pubtator, europepmc, pubmed, litsense2> to select one
backend or keep the default federated search.
Default article search excludes confirmed retractions unless you pass
--include-retracted. Sources that do not expose retraction metadata still
participate in the search, and JSON search rows keep the tri-state contract:
"is_retracted": true, false, or null.
--type, --open-access, and --no-preprints are backend-compatibility
constraints rather than universal filters across every article source.
--type on --source all uses Europe PMC + PubMed when --open-access and
--no-preprints are both absent. If you add --open-access or
--no-preprints, PubMed becomes ineligible and BioMCP surfaces the Europe
PMC-only note in markdown, JSON, and debug-plan output instead of silently
pretending the filter applies across every source.
To search a single backend:
biomcp search article -g BRAF --source pubtator --limit 5
biomcp search article -g BRAF --source europepmc --limit 5
biomcp search article -g BRAF --source pubmed --limit 5
Get an article¶
Supported IDs are PMID (digits only), PMCID (e.g., PMC9984800), and DOI
(e.g., 10.1056/NEJMoa1203421). Publisher PIIs (e.g., S1535610826000103) are not
indexed by PubMed or Europe PMC and cannot be resolved.
Default article output can include an optional Semantic Scholar section with
TLDR text, influence counts, and open-access PDF metadata when that paper
resolves in Semantic Scholar. S2_API_KEY makes those requests authenticated;
without it, BioMCP uses the shared pool. search article --source now supports
all, pubtator, europepmc, pubmed, and litsense2; Semantic Scholar
remains an automatic compatible leg rather than a directly selectable backend.
Request specific sections¶
Full text section:
This prints a local Saved to: path for cached full-text Markdown when PMC
full text is available.
Annotation section:
Semantic Scholar TLDR section:
Helper commands¶
biomcp article entities 22663011 # extract annotated entities via PubTator
biomcp article batch 22663011 24200969 # compact multi-article summary cards
biomcp article citations 22663011 --limit 3 # Semantic Scholar citation graph
biomcp article references 22663011 --limit 3 # Semantic Scholar reference graph
biomcp article recommendations 22663011 --limit 3 # Semantic Scholar related papers
article batch works without S2_API_KEY and echoes the original
requested_id together with resolved PMID/PMCID/DOI fields. When Semantic
Scholar data is available, the batch helper can add optional TLDR and citation
metadata. S2_API_KEY makes that enrichment authenticated and more reliable.
Use article batch as the default follow-up after search article when you
already have several shortlisted PMIDs or DOIs.
The Semantic Scholar graph helpers also work without S2_API_KEY, but they use
the shared pool and can fail fast on HTTP 429 with guidance to set the key for
a dedicated rate limit. Citations usually work broadly; references and
recommendations can be sparse or empty for paywalled papers because of
publisher elision in the Semantic Scholar graph.
Caching behavior¶
Downloaded content is stored in the BioMCP cache directory. This avoids repeated large payload downloads during iterative workflows.
JSON mode¶
biomcp --json get article 22663011
biomcp --json search article -g BRAF --limit 3
biomcp --json article batch 22663011 24200969
JSON article responses include _meta.next_commands and _meta.section_sources,
so article workflows can promote the next likely pivots and preserve section
provenance without scraping markdown. JSON search article responses also echo
query, sort, semantic_scholar_enabled, and row-level ranking/provenance
metadata. In relevance mode, ranking metadata now includes the effective mode
plus normalized lexical, citation, and position components; semantic-aware
rows expose ranking.semantic_score as the LitSense2-derived signal and use
0 when LitSense2 did not match. Hybrid rows also include the composite
score. JSON article batch responses are a bare array of compact cards so
callers can map results back to the original input order.
Practical tips¶
- Start with narrow
--limitvalues. - Add a disease term when gene-only search is too broad.
- Use section requests to avoid oversized responses.
- Use
biomcp get article <id> tldrwhen you want only the optional Semantic Scholar section.