Article¶

Use article commands for literature retrieval by disease, gene, drug, and identifier.

Typical article workflow¶

search a topic,
choose an identifier,
retrieve default summary,
request full text or annotations only when needed.

Search articles¶

By gene and disease:

biomcp search article -g BRAF -d melanoma --limit 5

By keyword:

biomcp search article -k "immunotherapy resistance" --limit 5

Tune keyword-bearing relevance:

biomcp search article -k "Hirschsprung disease ganglion cells" --ranking-mode hybrid --weight-semantic 0.5 --weight-lexical 0.2 --limit 5

By date:

biomcp search article -g BRAF --since 2024-01-01 --limit 5

By year range:

biomcp search article -k "BRAF melanoma" --year-min 2000 --year-max 2013 --limit 5

Exclude preprints when supported by source metadata:

biomcp search article -g BRAF --since 2024-01-01 --no-preprints --limit 5

Query formulation¶

Turn a natural-language literature question into two parts:

Put a known gene, disease, or drug in -g/--gene, -d/--disease, or --drug.
Put mechanisms, phenotypes, outcomes, datasets, and other free-text concepts in -k/--keyword.
If the question is asking which gene, disease, or drug fits the evidence and you do not know the entity yet, do not guess a typed flag. Start with keyword-only article search or run biomcp discover "<question>" first.
Question-format article terms are acceptable: PubMed ESearch cleans bounded filler words from unfielded gene, disease, drug, and keyword terms provider-locally, while query echoes and non-PubMed sources keep the original wording.
Use --type review for synthesis questions, list-style questions, and dataset surveys.

Keyword-only searches can also return exact entity suggestions. When the whole keyword exactly matches a gene, drug, or disease vocabulary label or alias, BioMCP can add a typed get gene, get drug, or get disease follow-up in See also, _meta.next_commands, and JSON _meta.suggestions[]. The structured suggestion object includes command, reason, and sections. Multi-concept phrases such as BRAF V600E or lung cancer immunotherapy do not get direct entity suggestions, and searches that already use -g, -d, or --drug suppress the exact suggestion.

For agent loops, --session <token> lets JSON article search compare the current keyword with the previous successful article keyword search for the same local token. The token is not a secret; use a short non-identifying label such as lit-review-1. When post-stopword term overlap is at least 60%, BioMCP can add JSON-only _meta.suggestions[] fallbacks after exact entity suggestions: prior article batch, discover, and a date-narrowed retry when available. Session baselines expire after 10 minutes. Markdown output is unchanged.

Known anchor only:

biomcp search article -g BRAF --limit 5

Known anchor plus mechanism or process:

biomcp search article -g TP53 -k "apoptosis gene regulation" --limit 5

Unknown-entity disease-identification question:

biomcp search article -k '"cafe-au-lait spots" neurofibromas disease' --type review --limit 5

Known drug plus mechanism:

biomcp search article --drug amiodarone -k "photosensitivity mechanism" --limit 5

Dataset or method question:

biomcp search article -k "TCGA mutation analysis dataset" --type review --limit 5

Multi-source federation¶

Article search fans out to PubTator3, Europe PMC, and PubMed by default when the filter set is compatible. Known gene, disease, and drug anchors participate in that typed route. When a non-empty keyword is present, BioMCP also adds LitSense2 to the federated route. Semantic Scholar can still join the same query when the filter set is compatible. BioMCP merges duplicates across PMID, PMCID, and DOI where possible. S2_API_KEY upgrades the Semantic Scholar leg to authenticated requests at 1 req/sec; without it, BioMCP uses the shared unauthenticated pool at 1 req/2sec. Search results are still deduplicated by PMID when BioMCP can resolve one.

Default --sort relevance is mode-aware:

Keyword-bearing queries default to --ranking-mode hybrid, using 0.4*semantic + 0.3*lexical + 0.2*citations + 0.1*position with the LitSense2-derived semantic signal.
Entity-only queries default to --ranking-mode lexical, preserving the existing calibrated PubMed rescue plus lexical directness comparator.
--ranking-mode semantic sorts the LitSense2-derived semantic signal first and falls back to the lexical comparator for deterministic ties.
Rows without LitSense2 provenance contribute ranking.semantic_score = 0 in semantic-aware ranking modes.
--weight-semantic, --weight-lexical, --weight-citations, and --weight-position retune the hybrid formula.

Markdown preserves the merged rank order, and JSON includes row-level matched_sources, ranking, citation_count, and influential_citation_count.

Use --source <all, pubtator, europepmc, pubmed, litsense2> to select one backend or keep the default federated search. BioMCP caps each federated source's contribution after deduplication and before ranking. Default: 40% of --limit on federated pools with at least three surviving primary sources. Rows count against their primary source after deduplication. Use --max-per-source <N> to override that cap, use --max-per-source 0 for the default cap explicitly, and set it equal to --limit to disable capping. Default article search excludes confirmed retractions unless you pass --include-retracted. Sources that do not expose retraction metadata still participate in the search, and JSON search rows keep the tri-state contract: "is_retracted": true, false, or null. --type, --open-access, and --no-preprints are backend-compatibility constraints rather than universal filters across every article source. --type on --source all uses Europe PMC + PubMed when --open-access and --no-preprints are both absent. If you add --open-access or --no-preprints, PubMed becomes ineligible and BioMCP surfaces the Europe PMC-only note in markdown, JSON, and debug-plan output instead of silently pretending the filter applies across every source.

To search a single backend:

biomcp search article -g BRAF --source pubtator --limit 5
biomcp search article -g BRAF --source europepmc --limit 5
biomcp search article -g BRAF --source pubmed --limit 5

To force a tighter federated balance:

biomcp search article -k "Kartagener syndrome ciliopathy" --limit 50 --max-per-source 10

Get an article¶

Supported IDs are PMID (digits only), PMCID (e.g., PMC9984800), and DOI (e.g., 10.1056/NEJMoa1203421). Publisher PIIs (e.g., S1535610826000103) are not indexed by PubMed or Europe PMC and cannot be resolved.

biomcp get article 22663011

Default article output can include an optional Semantic Scholar section with TLDR text, influence counts, and open-access PDF metadata when that paper resolves in Semantic Scholar. S2_API_KEY makes those requests authenticated; without it, BioMCP uses the shared pool. search article --source now supports all, pubtator, europepmc, pubmed, and litsense2; Semantic Scholar remains an automatic compatible leg rather than a directly selectable backend.

Request specific sections¶

Full text section:

biomcp get article 22663011 fulltext

This uses the default article full-text ladder: XML first, then PMC HTML when the XML path misses for a PMCID-backed article. It never falls back to PDF. When full text resolves, BioMCP prints a local Saved to: path for cached Markdown and surfaces the winning source label (Europe PMC XML, PMC HTML, etc.) in markdown and JSON provenance.

Opt in to the final PDF rung only when you want the last-resort open-access PDF path after XML and PMC HTML both miss:

biomcp get article 22663011 fulltext --pdf

With --pdf, BioMCP can use the Semantic Scholar open-access PDF URL as the final fallback and labels the winner as Semantic Scholar PDF. --pdf is only valid with the fulltext section; biomcp get article 22663011 --pdf is rejected instead of silently doing nothing.

Annotation section:

biomcp get article 22663011 annotations

Semantic Scholar TLDR section:

biomcp get article 22663011 tldr

Helper commands¶

biomcp article entities 22663011   # extract annotated entities via PubTator
biomcp article batch 22663011 24200969          # compact multi-article summary cards
biomcp article citations 22663011 --limit 3         # Semantic Scholar citation graph
biomcp article references 22663011 --limit 3        # Semantic Scholar reference graph
biomcp article recommendations 22663011 --limit 3   # Semantic Scholar related papers

article batch works without S2_API_KEY and echoes the original requested_id together with resolved PMID/PMCID/DOI fields. When Semantic Scholar data is available, the batch helper can add optional TLDR and citation metadata. S2_API_KEY makes that enrichment authenticated and more reliable. Use article batch as the default follow-up after search article when you already have several shortlisted PMIDs or DOIs.

The Semantic Scholar graph helpers also work without S2_API_KEY, but they use the shared pool and can fail fast on HTTP 429 with guidance to set the key for a dedicated rate limit. Citations usually work broadly; references and recommendations can be sparse or empty for paywalled papers because of publisher elision in the Semantic Scholar graph.

Caching behavior¶

Downloaded content is stored in the BioMCP cache directory. This avoids repeated large payload downloads during iterative workflows.

JSON mode¶

biomcp --json get article 22663011
biomcp --json search article -g BRAF --limit 3
biomcp --json search article -k "Oncotype DX review" --session lit-review-1 --limit 5
biomcp --json article batch 22663011 24200969

JSON article responses include _meta.next_commands and _meta.section_sources, so article workflows can promote the next likely pivots and preserve section provenance without scraping markdown. JSON search article responses also echo query, sort, semantic_scholar_enabled, and row-level ranking/provenance metadata. In relevance mode, ranking metadata now includes the effective mode plus normalized lexical, citation, and position components; semantic-aware rows expose ranking.semantic_score as the LitSense2-derived signal and use 0 when LitSense2 did not match. Hybrid rows also include the composite score. Keyword-only article searches with an exact gene, drug, or disease label/alias match may include _meta.suggestions[] objects with command, reason, and sections; same-session keyword loop-breaker suggestions include command and reason and omit sections. _meta.next_commands remains the executable string command list. JSON article batch responses are a bare array of compact cards so callers can map results back to the original input order.

Practical tips¶

Start with narrow --limit values.
Add a disease term when gene-only search is too broad.
Use section requests to avoid oversized responses.
Use biomcp get article <id> tldr when you want only the optional Semantic Scholar section.