sciencescraper.pmc.pmc_scrape
Functions for retrieving the raw text of PubMed Central articles.
Module Contents
Functions
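| fetch_pmc_article(pmc_id) | Fetches an article from PMC given a PMC ID |
| parse_pmc_article(pmc_article, chunk_size) | Parses an article from PMC |
| get_article_info(pmc_id[, chunk_size]) | Fetches and parses an article from PMC given a PMC ID |
| get_full_text(pmc_id[, chunk_size]) | Fetches the full text of an article from PMC given a PMC ID |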
- sciencescraper.pmc.pmc_scrape.fetch_pmc_article(pmc_id)
Fetches an article from PMC given a PMC ID
- Parameters:
pmc_id (str) – The PMC ID of the article
- Returns:
soup – The article as a BeautifulSoup object
- Return type:
BeautifulSoup
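This page does not say how the article is retrieved. One common route is NCBI's E-utilities efetch endpoint with db=pmc, and the sketch below assumes that backend. The helpers build_efetch_url and fetch_article_xml are hypothetical and not part of sciencescraper; only fetch_article_xml performs a network request.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed backend: NCBI E-utilities efetch (this page does not confirm it).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmc_id: str) -> str:
    """Build an efetch URL for a single PMC article (hypothetical helper)."""
    # Accept "PMC1234567" or "1234567"; the pmc database keys records by the numeric part.
    numeric_id = pmc_id.strip().upper().removeprefix("PMC")
    if not (numeric_id.isascii() and numeric_id.isdigit()):
        raise ValueError(f"not a PMC ID: {pmc_id!r}")
    query = urlencode({"db": "pmc", "id": numeric_id})
    return f"{EFETCH_URL}?{query}"


def fetch_article_xml(pmc_id: str, timeout: float = 30.0) -> str:
    """Download the article's XML as text (hypothetical helper; hits the network)."""
    with urlopen(build_efetch_url(pmc_id), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The returned string could then be wrapped with BeautifulSoup(xml_text, "xml") (the "xml" parser requires lxml) to obtain an object comparable to the documented return value. Without an API key, NCBI asks clients to stay under 3 requests per second, so batch jobs should throttle calls.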
- sciencescraper.pmc.pmc_scrape.parse_pmc_article(pmc_article, chunk_size)
Parses a PMC article into a dictionary
- Parameters:
pmc_article (BeautifulSoup) – The article as a BeautifulSoup object
chunk_size (int) – The size of the chunks to split the full text into
- Returns:
article – The parsed article
- Return type:
dict
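The keys of the returned dict are not documented on this page. As an illustration only, the sketch below parses JATS-style PMC XML into a dict with hypothetical keys (pmc_id, title, abstract, body). It swaps in the standard library's xml.etree.ElementTree for BeautifulSoup so it runs without third-party packages, and it ignores chunk_size.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _text(element: Optional[ET.Element]) -> str:
    """All text inside element with whitespace collapsed ("" if element is missing)."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def parse_article_xml(xml_text: str) -> dict:
    """Parse one JATS-style PMC article into a plain dict (hypothetical keys)."""
    root = ET.fromstring(xml_text)
    # efetch responses wrap articles in <pmc-articleset>; accept a bare <article> too.
    article = root if root.tag == "article" else root.find(".//article")
    if article is None:
        raise ValueError("no <article> element found")
    return {
        "pmc_id": _text(article.find(".//article-meta/article-id[@pub-id-type='pmc']")),
        "title": _text(article.find(".//title-group/article-title")),
        "abstract": [_text(p) for p in article.findall(".//abstract//p")],
        "body": [_text(p) for p in article.findall("./body//p")],
    }
```

Real PMC records carry far more markup (multiple ID types, back matter, supplementary material), so a production parser would need more selectors than these.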
- sciencescraper.pmc.pmc_scrape.get_article_info(pmc_id, chunk_size=None)
Fetches and parses an article from PMC given a PMC ID
- Parameters:
pmc_id (str) – The PMC ID of the article
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.
- Returns:
article – The parsed article
- Return type:
dict
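The unit of chunk_size is not specified on this page. Assuming it counts characters and that the default None means "return the text unsplit", a fixed-size splitter could look like this; chunk_text is a hypothetical helper, not the library's implementation.

```python
from typing import Optional


def chunk_text(text: str, chunk_size: Optional[int] = None) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    # Assumption: None means "do not chunk", mirroring the documented default.
    if chunk_size is None:
        return [text]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Slicing past the end is safe, so the last chunk may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

For example, chunk_text("abcdefg", 3) returns ["abc", "def", "g"]. A splitter intended for retrieval or language-model input would more likely break on sentence or token boundaries, which this sketch does not attempt.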
- sciencescraper.pmc.pmc_scrape.get_full_text(pmc_id, chunk_size=None)
Fetches the full text of an article from PMC given a PMC ID
- Parameters:
pmc_id (str) – The PMC ID of the article
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.
- Returns:
full_text – The full text of the article
- Return type:
str
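How get_full_text assembles its string is not documented here. A minimal sketch, assuming JATS-style XML input and that section titles and paragraphs should be kept in document order, separated by blank lines; full_text_from_xml is a hypothetical helper that uses the standard library's ElementTree rather than BeautifulSoup.

```python
import xml.etree.ElementTree as ET


def _collect_blocks(element: ET.Element, blocks: list[str]) -> None:
    """Append the text of each <title>/<p> under element, in document order."""
    for child in element:
        if child.tag in ("title", "p"):
            # Flatten inline markup (<italic>, <xref>, ...) and collapse whitespace.
            text = " ".join("".join(child.itertext()).split())
            if text:
                blocks.append(text)
        else:
            # Recurse into containers such as <sec>, but never into a <p>.
            _collect_blocks(child, blocks)


def full_text_from_xml(xml_text: str) -> str:
    """Return the article body as one string with blank lines between blocks."""
    root = ET.fromstring(xml_text)
    body = root.find(".//body")
    if body is None:
        return ""
    blocks: list[str] = []
    _collect_blocks(body, blocks)
    return "\n\n".join(blocks)
```

The walk stops descending once it reaches a <title> or <p>, so paragraphs nested inside a list within a paragraph are not emitted twice. Figure and table captions also contain <title> and <p> elements and would be included; excluding them would need an extra check on fig and table-wrap tags.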