`sciencescraper.pmc.pmc_scrape`

Functions for retrieving the raw text of PubMed Central articles.

Module Contents

`fetch_pmc_article`(pmc_id)	Fetches an article from PMC given a PMC ID.
`parse_pmc_article`(pmc_article, chunk_size)	Parses an article fetched from PMC.
`get_article_info`(pmc_id[, chunk_size])	Fetches and parses an article from PMC given a PMC ID.
`get_full_text`(pmc_id[, chunk_size])	Fetches the full text of an article from PMC given a PMC ID.

sciencescraper.pmc.pmc_scrape.fetch_pmc_article(pmc_id)

Fetches an article from PMC given a PMC ID
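This page does not say how the article is retrieved. As a rough sketch of one way to do it, the code below builds a request against the public NCBI E-utilities EFetch service and downloads the article XML. Whether `fetch_pmc_article` uses this service is an assumption, and `build_efetch_url` and `fetch_article_xml` are hypothetical helper names, not part of sciencescraper.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities EFetch endpoint. Assumption: sciencescraper may use a
# different endpoint or client; this is only an illustration.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmc_id):
    """Build an EFetch URL for a PMC ID such as 'PMC1234567' (hypothetical helper)."""
    numeric_id = pmc_id.strip().upper()
    # The pmc database is queried here with the numeric part of the ID.
    if numeric_id.startswith("PMC"):
        numeric_id = numeric_id[3:]
    if not numeric_id.isdigit():
        raise ValueError(f"not a PMC ID: {pmc_id!r}")
    return f"{EFETCH_URL}?{urlencode({'db': 'pmc', 'id': numeric_id})}"


def fetch_article_xml(pmc_id, timeout=30.0):
    """Download the raw article XML as bytes (performs a network request)."""
    with urlopen(build_efetch_url(pmc_id), timeout=timeout) as response:
        return response.read()
```

Keeping URL construction separate from the network call makes the ID handling easy to check without contacting NCBI.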

sciencescraper.pmc.pmc_scrape.parse_pmc_article(pmc_article, chunk_size)

Parses an article from PMC

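Parameters:

pmc_article – The article fetched from PMC
chunk_size (int) – The size of the chunks to split the full text into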

Returns:

article – The parsed article

Return type:

dict
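The keys of the returned dict are not listed on this page. The sketch below shows how a PMC article in JATS XML could be reduced to a dict using only the standard library. The function name `parse_article_xml` and the keys `title` and `full_text` are hypothetical, chosen for illustration, and chunking is left out.

```python
import xml.etree.ElementTree as ET


def parse_article_xml(xml_text):
    """Extract a title and body text from JATS-style XML (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    # JATS stores the title in <article-title>; it may contain inline markup,
    # so collect all nested text rather than only the element's own text.
    title_el = root.find(".//article-title")
    title = "".join(title_el.itertext()).strip() if title_el is not None else ""

    # Gather every paragraph under <body>, collapsing runs of whitespace.
    paragraphs = [
        " ".join("".join(p.itertext()).split())
        for p in root.iterfind(".//body//p")
    ]
    full_text = "\n".join(p for p in paragraphs if p)

    # Hypothetical keys; the real parse_pmc_article may return different ones.
    return {"title": title, "full_text": full_text}
```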

sciencescraper.pmc.pmc_scrape.get_article_info(pmc_id, chunk_size=None)

Fetches and parses an article from PMC given a PMC ID

Parameters:

pmc_id (str) – The PMC ID of the article
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.

Returns:

article – The parsed article

Return type:

dict
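The unit of `chunk_size` and the way chunks appear in the result are not stated here. As a minimal sketch, assuming the size counts characters and that `None` means "do not split", fixed-size chunking could look like the following. `chunk_text` is a hypothetical helper, not a sciencescraper function.

```python
def chunk_text(text, chunk_size=None):
    """Split text into consecutive pieces of at most chunk_size characters."""
    # Assumption: None disables chunking and the text comes back as one piece.
    if chunk_size is None:
        return [text]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the text in strides of chunk_size; the last piece may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Character-based slicing is the simplest option but can cut words in half; a real implementation might split on sentences or tokens instead.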

sciencescraper.pmc.pmc_scrape.get_full_text(pmc_id, chunk_size=None)

Fetches the full text of an article from PMC given a PMC ID

Parameters:

pmc_id (str) – The PMC ID of the article
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.

Returns:

full_text – The full text of the article

Return type:

str