Article
Use article commands for literature retrieval by disease, gene, drug, and identifier.
Typical article workflow
- search a topic,
- choose an identifier,
- retrieve default summary,
- request full text or annotations only when needed.
Search articles
By gene and disease:
By keyword:
Tune keyword-bearing relevance:
biomcp search article -k "Hirschsprung disease ganglion cells" --ranking-mode hybrid --weight-semantic 0.5 --weight-lexical 0.2 --limit 5
By date:
By year range:
Exclude preprints when supported by source metadata:
biomcp search article -g BRAF --no-preprints --limit 5
Query formulation
Turn a natural-language literature question into two parts:
- Put a known gene, disease, or drug in -g/--gene, -d/--disease, or --drug.
- Put mechanisms, phenotypes, outcomes, datasets, and other free-text concepts in -k/--keyword.
- If the question asks which gene, disease, or drug fits the evidence and you do not know the entity yet, do not guess a typed flag. Start with a keyword-only article search or run biomcp discover "<question>" first.
- Question-format article terms are acceptable: PubMed ESearch strips bounded filler words from unfielded gene, disease, drug, and keyword terms, but only inside the PubMed request; query echoes and non-PubMed sources keep the original wording.
- Use --type review for synthesis questions, list-style questions, and dataset surveys.
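A minimal Python sketch of that decomposition, assuming you have already split the question yourself: known anchors become typed flags, free-text concepts go to -k, and an entity you do not know is left out rather than guessed. build_article_search is a hypothetical helper that only assembles an argument list from flags documented on this page; it does not call BioMCP.

```python
from typing import Optional


def build_article_search(
    gene: Optional[str] = None,
    disease: Optional[str] = None,
    drug: Optional[str] = None,
    keyword: Optional[str] = None,
    review: bool = False,
    limit: int = 5,
) -> list[str]:
    """Assemble a `biomcp search article` argv from a decomposed question.

    Pass only entities you already know; leave the rest as None so the
    search stays keyword-only instead of guessing a typed flag.
    """
    args = ["biomcp", "search", "article"]
    if gene:
        args += ["-g", gene]
    if disease:
        args += ["-d", disease]
    if drug:
        args += ["--drug", drug]
    if keyword:
        args += ["-k", keyword]  # mechanisms, phenotypes, outcomes, datasets
    if review:
        args += ["--type", "review"]  # synthesis, list-style, dataset surveys
    args += ["--limit", str(limit)]
    return args


# Unknown-entity question: keyword-only, no typed flag.
print(build_article_search(keyword="Hirschsprung disease ganglion cells"))
```

If biomcp is on your PATH, the list can be passed directly to subprocess.run.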
Keyword-only searches can also return exact entity suggestions. When the whole
keyword exactly matches a gene, drug, or disease vocabulary label or alias,
BioMCP can add a typed get gene, get drug, or get disease follow-up in
See also, _meta.next_commands, and JSON _meta.suggestions[]. The
structured suggestion object includes command, reason, and sections.
Multi-concept phrases such as BRAF V600E or lung cancer immunotherapy do
not get direct entity suggestions, and searches that already use -g, -d,
or --drug suppress the exact suggestion.
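The exact-match rule reduces to a whole-string lookup. In this sketch the VOCAB table, the case-insensitive comparison, and the biomcp get <type> <name> command format are illustrative assumptions rather than BioMCP internals; it shows why a multi-concept phrase never matches and why a typed flag suppresses the suggestion.

```python
from typing import Optional

# Hypothetical vocabulary: lowercase label or alias -> (entity type, canonical name).
VOCAB = {
    "braf": ("gene", "BRAF"),
    "imatinib": ("drug", "imatinib"),
    "gleevec": ("drug", "imatinib"),  # alias resolves to the same drug
    "melanoma": ("disease", "melanoma"),
}


def exact_entity_suggestion(keyword: str, typed_flag_used: bool) -> Optional[str]:
    """Return a typed follow-up only when the WHOLE keyword is one known label."""
    if typed_flag_used:
        return None  # -g, -d, or --drug already anchor the search
    hit = VOCAB.get(keyword.strip().lower())
    if hit is None:
        return None  # e.g. "BRAF V600E" is not a single vocabulary label
    entity_type, name = hit
    return f"biomcp get {entity_type} {name}"


print(exact_entity_suggestion("Gleevec", typed_flag_used=False))
```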
For agent loops, --session <token> lets JSON article search compare the
current keyword with the previous successful article keyword search for the
same local token. The token is not a secret; use a short non-identifying label
such as lit-review-1. When post-stopword term overlap is at least 60%,
BioMCP can add JSON-only _meta.suggestions[] fallbacks after exact entity
suggestions: prior article batch, discover, and a date-narrowed retry when
available. Session baselines expire after 10 minutes. Markdown output is
unchanged.
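A rough sketch of the session loop check, assuming a tiny hypothetical stopword list and an assumed overlap metric (the share of the current query's content terms that also appeared in the previous one); BioMCP's exact tokenizer and metric are not documented here. The 60% threshold and 10-minute expiry come from the text above, and SessionBaselines is a hypothetical name.

```python
import time
from typing import Optional

# Hypothetical stopword list; the real list is not documented on this page.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "on", "with", "what", "which", "is", "are"}
OVERLAP_THRESHOLD = 0.60       # "at least 60%"
SESSION_TTL_SECONDS = 10 * 60  # baselines expire after 10 minutes


def content_terms(keyword: str) -> set[str]:
    return {t for t in keyword.lower().split() if t not in STOPWORDS}


def term_overlap(current: str, previous: str) -> float:
    """Assumed metric: share of the current query's terms already in the previous query."""
    cur = content_terms(current)
    if not cur:
        return 0.0
    return len(cur & content_terms(previous)) / len(cur)


class SessionBaselines:
    """Remembers the last successful keyword search per session token."""

    def __init__(self) -> None:
        self._last: dict[str, tuple[str, float]] = {}

    def record_success(self, token: str, keyword: str, now: Optional[float] = None) -> None:
        self._last[token] = (keyword, time.time() if now is None else now)

    def looks_like_loop(self, token: str, keyword: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._last.get(token)
        if entry is None:
            return False
        previous, recorded_at = entry
        if now - recorded_at > SESSION_TTL_SECONDS:
            return False  # stale baseline is ignored
        return term_overlap(keyword, previous) >= OVERLAP_THRESHOLD


s = SessionBaselines()
s.record_success("lit-review-1", "Oncotype DX review", now=0.0)
print(s.looks_like_loop("lit-review-1", "Oncotype DX recurrence review", now=60.0))
```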
Known anchor only:
Known anchor plus mechanism or process:
Unknown-entity disease-identification question:
Known drug plus mechanism:
Dataset or method question:
Multi-source federation
Article search fans out to PubTator3, Europe PMC, and PubMed by default when
the filter set is compatible. Known gene, disease, and drug anchors
participate in that typed route. When a non-empty keyword is present, BioMCP
also adds LitSense2 to the federated route. Semantic Scholar can still join
the same query when the filter set is compatible. BioMCP merges duplicates
across PMID, PMCID, and DOI where possible. S2_API_KEY upgrades the Semantic
Scholar leg to authenticated requests at 1 request per second; without it,
BioMCP uses the shared unauthenticated pool at 1 request every 2 seconds.
Search results are deduplicated by PMID whenever BioMCP can resolve one.
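Merging on any shared identifier can be sketched with an index from (field, value) to merged row. This is an illustration under assumptions, not BioMCP's merge code: the row shape (pmid, pmcid, doi, and sources keys) is hypothetical, the identifiers in the sample are placeholders, and a row that bridges two already-merged rows simply joins the first one.

```python
from typing import Optional

ID_FIELDS = ("pmid", "pmcid", "doi")


def merge_duplicates(rows: list[dict]) -> list[dict]:
    """Merge rows that share any PMID, PMCID, or DOI, keeping first-seen order.

    Identifiers are compared case-insensitively (DOIs are case-insensitive).
    Simplification: a row that bridges two already-merged rows joins the first.
    """
    merged: list[dict] = []
    index: dict[tuple[str, str], int] = {}  # (field, normalized value) -> merged position
    for row in rows:
        keys = [(f, str(row[f]).lower()) for f in ID_FIELDS if row.get(f)]
        pos: Optional[int] = next((index[k] for k in keys if k in index), None)
        if pos is None:
            # Copy the row so later merges never mutate the caller's data.
            merged.append({**row, "sources": list(row.get("sources", []))})
            pos = len(merged) - 1
        else:
            target = merged[pos]
            for f in ID_FIELDS:
                if not target.get(f) and row.get(f):
                    target[f] = row[f]  # fill identifiers the first copy lacked
            for src in row.get("sources", []):
                if src not in target["sources"]:
                    target["sources"].append(src)
        for k in keys:
            index.setdefault(k, pos)
    return merged


rows = [
    {"pmid": "1001", "sources": ["pubtator"]},
    {"pmid": "1001", "doi": "10.1000/abc", "sources": ["europepmc"]},
    {"doi": "10.1000/ABC", "sources": ["pubmed"]},
    {"pmid": "1002", "sources": ["pubmed"]},
]
merged = merge_duplicates(rows)
print(len(merged), merged[0]["sources"])  # → 2 ['pubtator', 'europepmc', 'pubmed']
```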
Default --sort relevance is mode-aware:
- Keyword-bearing queries default to --ranking-mode hybrid, scoring 0.4*semantic + 0.3*lexical + 0.2*citations + 0.1*position with the LitSense2-derived semantic signal.
- Entity-only queries default to --ranking-mode lexical, preserving the existing calibrated PubMed rescue plus lexical directness comparator.
- --ranking-mode semantic sorts by the LitSense2-derived semantic signal first and falls back to the lexical comparator for deterministic ties.
- Rows without LitSense2 provenance contribute ranking.semantic_score = 0 in semantic-aware ranking modes.
- --weight-semantic, --weight-lexical, --weight-citations, and --weight-position retune the hybrid formula.
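The default hybrid formula is a weighted sum of normalized components. This sketch assumes each row carries those components in a ranking dict (a hypothetical layout) and that a --weight-* override leaves the other weights at their defaults; a row with no LitSense2 match simply has no semantic entry, which counts as 0.

```python
from typing import Optional

# Default hybrid weights from the formula above.
DEFAULT_WEIGHTS = {"semantic": 0.4, "lexical": 0.3, "citations": 0.2, "position": 0.1}


def hybrid_score(components: dict[str, float], overrides: Optional[dict[str, float]] = None) -> float:
    """Weighted sum of normalized components; a missing component counts as 0.

    overrides mimics --weight-* flags (assumption: unspecified weights keep defaults).
    """
    weights = {**DEFAULT_WEIGHTS, **(overrides or {})}
    return sum(w * components.get(name, 0.0) for name, w in weights.items())


def rank_hybrid(rows: list[dict], overrides: Optional[dict[str, float]] = None) -> list[dict]:
    # sorted() stays stable with reverse=True, so equal scores keep merged order.
    return sorted(rows, key=lambda r: hybrid_score(r["ranking"], overrides), reverse=True)


rows = [
    {"id": "a", "ranking": {"semantic": 0.9, "lexical": 0.1}},
    {"id": "b", "ranking": {"lexical": 1.0, "citations": 1.0}},  # no LitSense2 match
]
print([r["id"] for r in rank_hybrid(rows)])  # → ['b', 'a']
print([r["id"] for r in rank_hybrid(rows, {"semantic": 0.5, "lexical": 0.2})])  # → ['a', 'b']
```

The second ordering uses the same semantic and lexical weights as the tuning example earlier on this page, under the same keep-other-defaults assumption.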
Markdown preserves the merged rank order, and JSON includes row-level
matched_sources, ranking, citation_count, and
influential_citation_count.
Use --source <all, pubtator, europepmc, pubmed, litsense2> to select one
backend or keep the default federated search.
BioMCP caps each federated source's contribution after deduplication and before
ranking. Default: 40% of --limit on federated pools with at least three
surviving primary sources. Rows count against their primary source after
deduplication. Use --max-per-source <N> to override that cap, use
--max-per-source 0 for the default cap explicitly, and set it equal to
--limit to disable capping.
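The cap arithmetic can be sketched as follows. Two details are assumptions because the text does not state them: 40% is rounded down with a floor of 1, and pools with fewer than three surviving primary sources get no default cap. The primary_source key is hypothetical.

```python
from collections import Counter
from typing import Optional


def per_source_cap(limit: int, surviving_sources: int, max_per_source: Optional[int] = None) -> int:
    """Effective per-source cap (rounding and small-pool behavior are assumptions)."""
    if max_per_source:  # positive override; None or 0 means "default cap"
        return max_per_source
    if surviving_sources >= 3:
        return max(1, limit * 2 // 5)  # 40% of --limit, in integer arithmetic
    return limit


def apply_cap(rows: list[dict], limit: int, max_per_source: Optional[int] = None) -> list[dict]:
    """Keep at most `cap` deduplicated rows per primary source, preserving order."""
    surviving = len({r["primary_source"] for r in rows})
    cap = per_source_cap(limit, surviving, max_per_source)
    used: Counter = Counter()
    kept = []
    for row in rows:
        src = row["primary_source"]
        if used[src] < cap:
            kept.append(row)
            used[src] += 1
    return kept


rows = [{"primary_source": s} for s in ["pubmed"] * 4 + ["europepmc"] * 2 + ["pubtator"] * 2]
print(len(apply_cap(rows, limit=5)))                    # → 6 (default cap of 2 per source)
print(len(apply_cap(rows, limit=5, max_per_source=5)))  # → 8 (cap equals --limit: nothing dropped)
```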
Default article search excludes confirmed retractions unless you pass
--include-retracted. Sources that do not expose retraction metadata still
participate in the search, and JSON search rows keep the tri-state contract:
"is_retracted": true, false, or null.
--type, --open-access, and --no-preprints are backend-compatibility
constraints rather than universal filters across every article source.
--type on --source all uses Europe PMC + PubMed when --open-access and
--no-preprints are both absent. If you add --open-access or
--no-preprints, PubMed becomes ineligible and BioMCP surfaces the Europe
PMC-only note in markdown, JSON, and debug-plan output instead of silently
pretending the filter applies across every source.
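Both filtering rules can be expressed as small pure functions. The sketch below assumes search rows are already parsed from JSON (null becomes None) and covers only the --type case for source eligibility; the function names, note text, and placeholder pmid values are hypothetical.

```python
import json
from typing import Optional


def drop_confirmed_retractions(rows: list[dict], include_retracted: bool = False) -> list[dict]:
    """is_retracted is tri-state: True, False, or None (no retraction metadata).

    Only confirmed retractions are dropped; rows with unknown status stay.
    """
    if include_retracted:
        return list(rows)
    return [r for r in rows if r.get("is_retracted") is not True]


def sources_for_type_filter(open_access: bool = False, no_preprints: bool = False) -> tuple[list[str], Optional[str]]:
    """Backends for --type on --source all, plus a note to surface (or None)."""
    if open_access or no_preprints:
        # PubMed cannot honor these constraints, so it becomes ineligible.
        return ["europepmc"], "Europe PMC only: PubMed cannot apply --open-access/--no-preprints"
    return ["europepmc", "pubmed"], None


rows = json.loads(
    '[{"pmid": "1", "is_retracted": true},'
    ' {"pmid": "2", "is_retracted": false},'
    ' {"pmid": "3", "is_retracted": null}]'
)
print([r["pmid"] for r in drop_confirmed_retractions(rows)])  # → ['2', '3']
print(sources_for_type_filter(no_preprints=True)[0])  # → ['europepmc']
```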
To search a single backend:
biomcp search article -g BRAF --source pubtator --limit 5
biomcp search article -g BRAF --source europepmc --limit 5
biomcp search article -g BRAF --source pubmed --limit 5
To force a tighter federated balance:
biomcp search article -g BRAF --max-per-source 1 --limit 5
Get an article
Supported IDs are PMID (digits only), PMCID (e.g., PMC9984800), and DOI
(e.g., 10.1056/NEJMoa1203421). Publisher PIIs (e.g., S1535610826000103) are not
indexed by PubMed or Europe PMC and cannot be resolved.
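Telling the supported identifier types apart is mostly a matter of shape. The regular expressions below are common heuristics (the DOI pattern follows the widely used 10.<registrant>/<suffix> form) and are assumptions, not BioMCP's own parser.

```python
import re
from typing import Optional

PMID_RE = re.compile(r"\d+")
PMCID_RE = re.compile(r"PMC\d+")
DOI_RE = re.compile(r"10\.\d{4,9}/\S+")  # common DOI heuristic, not a full validator


def classify_article_id(raw: str) -> Optional[str]:
    value = raw.strip()
    if PMID_RE.fullmatch(value):
        return "pmid"
    if PMCID_RE.fullmatch(value):
        return "pmcid"
    if DOI_RE.fullmatch(value):
        return "doi"
    return None  # e.g. publisher PIIs such as S1535610826000103


for example in ["22663011", "PMC9984800", "10.1056/NEJMoa1203421", "S1535610826000103"]:
    print(example, classify_article_id(example))
```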
Default article output can include an optional Semantic Scholar section with
TLDR text, influence counts, and open-access PDF metadata when that paper
resolves in Semantic Scholar. S2_API_KEY makes those requests authenticated;
without it, BioMCP uses the shared pool. search article --source supports
all, pubtator, europepmc, pubmed, and litsense2; Semantic Scholar
remains an automatic compatible leg rather than a directly selectable backend.
Request specific sections
Full text section:
biomcp get article 22663011 fulltext
This uses the default article full-text ladder: XML first, then PMC HTML when
the XML path misses for a PMCID-backed article. It never falls back to PDF.
When full text resolves, BioMCP prints a local Saved to: path for cached
Markdown and surfaces the winning source label (Europe PMC XML, PMC HTML,
etc.) in markdown and JSON provenance.
Opt in to the final PDF rung only when you want the last-resort open-access PDF path after XML and PMC HTML both miss:
biomcp get article 22663011 fulltext --pdf
With --pdf, BioMCP can use the Semantic Scholar open-access PDF URL as the
final fallback and labels the winner as Semantic Scholar PDF. --pdf is only
valid with the fulltext section; biomcp get article 22663011 --pdf is
rejected instead of silently doing nothing.
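The ladder order can be sketched with stub fetchers. The fetcher functions and their signatures are hypothetical, and the XML rung is collapsed into a single fetcher labeled Europe PMC XML for brevity; the rung order, the PMCID requirement for PMC HTML, and the --pdf opt-in and validation follow the text above.

```python
from typing import Callable, Optional

Fetcher = Callable[[str], Optional[str]]  # returns Markdown, or None on a miss


def validate_pdf_flag(sections: list[str], pdf: bool) -> None:
    """Reject --pdf unless the fulltext section was requested."""
    if pdf and "fulltext" not in sections:
        raise ValueError("--pdf is only valid with the fulltext section")


def resolve_fulltext(
    article_id: str,
    has_pmcid: bool,
    fetch_xml: Fetcher,
    fetch_pmc_html: Fetcher,
    fetch_s2_pdf: Fetcher,
    allow_pdf: bool = False,
) -> Optional[tuple[str, str]]:
    """Return (source label, text) from the first rung that hits, else None."""
    rungs: list[tuple[str, Fetcher]] = [("Europe PMC XML", fetch_xml)]
    if has_pmcid:
        rungs.append(("PMC HTML", fetch_pmc_html))  # HTML rung needs a PMCID
    if allow_pdf:
        rungs.append(("Semantic Scholar PDF", fetch_s2_pdf))  # opt-in last resort
    for label, fetch in rungs:
        text = fetch(article_id)
        if text is not None:
            return label, text
    return None


def miss(_id: str) -> Optional[str]:
    return None


def hit(_id: str) -> Optional[str]:
    return "# Full text"


print(resolve_fulltext("PMC9984800", True, miss, hit, hit))  # → ('PMC HTML', '# Full text')
```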
Annotation section:
Semantic Scholar TLDR section:
biomcp get article 22663011 tldr
Helper commands
biomcp article entities 22663011 # extract annotated entities via PubTator
biomcp article batch 22663011 24200969 # compact multi-article summary cards
biomcp article citations 22663011 --limit 3 # Semantic Scholar citation graph
biomcp article references 22663011 --limit 3 # Semantic Scholar reference graph
biomcp article recommendations 22663011 --limit 3 # Semantic Scholar related papers
article batch works without S2_API_KEY and echoes the original
requested_id together with resolved PMID/PMCID/DOI fields. When Semantic
Scholar data is available, the batch helper can add optional TLDR and citation
metadata. S2_API_KEY makes that enrichment authenticated and more reliable.
Use article batch as the default follow-up after search article when you
already have several shortlisted PMIDs or DOIs.
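Because each card echoes requested_id, a caller can key the batch output by what it asked for. The sample below stands in for biomcp --json article batch output; its resolved-ID key names (pmid, pmcid) are assumptions, and index_batch_cards is a hypothetical helper.

```python
import json


def index_batch_cards(batch_json: str) -> dict[str, dict]:
    """Key compact cards by the requested_id each one echoes back."""
    cards = json.loads(batch_json)
    if not isinstance(cards, list):
        raise ValueError("expected a bare JSON array of cards")
    # dicts preserve insertion order, so this also keeps the input order.
    return {card["requested_id"]: card for card in cards}


# Hypothetical card shapes; only requested_id is documented on this page.
sample = json.dumps([
    {"requested_id": "22663011", "pmid": "22663011"},
    {"requested_id": "PMC9984800", "pmcid": "PMC9984800"},
])
cards = index_batch_cards(sample)
print(list(cards))  # → ['22663011', 'PMC9984800']
```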
The Semantic Scholar graph helpers also work without S2_API_KEY, but they use
the shared pool and can fail fast on HTTP 429 with guidance to set the key for
a dedicated rate limit. Citations usually work broadly; references and
recommendations can be sparse or empty for paywalled papers because of
publisher elision in the Semantic Scholar graph.
Caching behavior
Downloaded content is stored in the BioMCP cache directory. This avoids repeated large payload downloads during iterative workflows.
JSON mode
biomcp --json get article 22663011
biomcp --json search article -g BRAF --limit 3
biomcp --json search article -k "Oncotype DX review" --session lit-review-1 --limit 5
biomcp --json article batch 22663011 24200969
JSON article responses include _meta.next_commands and _meta.section_sources,
so article workflows can promote the next likely pivots and preserve section
provenance without scraping markdown. JSON search article responses also echo
query, sort, semantic_scholar_enabled, and row-level ranking/provenance
metadata. In relevance mode, ranking metadata includes the effective mode
plus normalized lexical, citation, and position components; semantic-aware
rows expose ranking.semantic_score as the LitSense2-derived signal and use
0 when LitSense2 did not match. Hybrid rows also include the composite
score. Keyword-only article searches with an exact gene, drug, or disease
label or alias match may include _meta.suggestions[] objects with command,
reason, and sections; same-session keyword loop-breaker suggestions include
command and reason but omit sections. _meta.next_commands remains the
executable string command list.
JSON article batch responses are a bare array of compact cards, so callers
can map results back to the original input order.
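A sketch of reading follow-ups from a JSON search response, assuming the top-level object carries _meta as described above. The sample payload and its reason strings are hypothetical; defaulting sections to an empty list reflects that loop-breaker suggestions omit that field.

```python
import json
import shlex


def follow_ups(response_json: str) -> tuple[list[str], list[dict]]:
    """Pull _meta.next_commands and normalized _meta.suggestions[] from a response."""
    meta = json.loads(response_json).get("_meta", {})
    commands = list(meta.get("next_commands", []))
    suggestions = [
        # Loop-breaker suggestions omit "sections", so default it to [].
        {"command": s["command"], "reason": s["reason"], "sections": s.get("sections", [])}
        for s in meta.get("suggestions", [])
    ]
    return commands, suggestions


# Hypothetical response body; values are illustrative only.
sample = json.dumps({
    "_meta": {
        "next_commands": ["biomcp get gene BRAF"],
        "suggestions": [
            {"command": "biomcp get gene BRAF", "reason": "exact gene match", "sections": []},
            {"command": "biomcp article batch 22663011 24200969", "reason": "same-session loop"},
        ],
    }
})
commands, suggestions = follow_ups(sample)
print(shlex.split(commands[0]))  # argv list suitable for subprocess.run
```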
Practical tips
- Start with narrow --limit values.
- Add a disease term when gene-only search is too broad.
- Use section requests to avoid oversized responses.
- Use biomcp get article <id> tldr when you want only the optional Semantic Scholar section.