sciencescraper.pmc.pmc_extract

Functions that extract information from the raw text of PubMed Central articles.

Module Contents

Functions

get_title(pmc_article)

Returns the title of the article

get_authors(pmc_article)

Returns the authors of the article

get_journal(pmc_article)

Returns the journal of the article

get_publisher(pmc_article)

Returns the publisher of the article

get_article_type(pmc_article)

Returns the article type of the article

get_doi(pmc_article)

Returns the DOI of the article

get_pmc_id(pmc_article)

Returns the PMC ID of the article

get_date(pmc_article)

Returns the date of the article

get_url(pmc_article)

Returns the URL of the article

get_keywords(pmc_article)

Returns the keywords of the article

get_abstract(pmc_article)

Returns the abstract of the article

get_intro(pmc_article)

Returns the introduction section of the article

get_methods(pmc_article)

Returns the methods section of the article

get_discussion(pmc_article)

Returns the discussion section of the article

clean_references(pmc_article)

Removes the reference numbers from the article, which can clutter the text and impair both readability and analysis by a machine learning model.

sciencescraper.pmc.pmc_extract.get_title(pmc_article)

Returns the title of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

title – The title of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_authors(pmc_article)

Returns the authors of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

authors – The authors of the article

Return type:

list
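This page does not show how the author list is assembled. As a rough sketch only, the snippet below uses the standard-library xml.etree.ElementTree on a hand-written JATS-style fragment instead of a BeautifulSoup object. The sample XML, the "given names, then surname" formatting, and the helper name sketch_get_authors are all assumptions made for illustration and may not match what get_authors actually does.

```python
import xml.etree.ElementTree as ET

# Hand-written JATS-style fragment (assumed sample data, not a real article).
SAMPLE_XML = """
<article>
  <front>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Lovelace</surname><given-names>Ada</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Turing</surname><given-names>Alan</given-names></name>
        </contrib>
        <contrib contrib-type="editor">
          <name><surname>Hopper</surname><given-names>Grace</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""


def sketch_get_authors(xml_text):
    """Hypothetical helper: return author names as a list of strings."""
    root = ET.fromstring(xml_text)
    authors = []
    for contrib in root.iter("contrib"):
        # Keep only contributors explicitly marked as authors.
        if contrib.get("contrib-type") != "author":
            continue
        given = contrib.findtext("name/given-names", default="").strip()
        surname = contrib.findtext("name/surname", default="").strip()
        full_name = " ".join(part for part in (given, surname) if part)
        if full_name:
            authors.append(full_name)
    return authors


authors = sketch_get_authors(SAMPLE_XML)
print(authors)
```

The sketch returns a list, matching the documented return type, and skips the contributor marked as an editor.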

sciencescraper.pmc.pmc_extract.get_journal(pmc_article)

Returns the journal of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

journal – The journal of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_publisher(pmc_article)

Returns the publisher of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

publisher – The publisher of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_article_type(pmc_article)

Returns the article type of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

article_type – The article type of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_doi(pmc_article)

Returns the DOI of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

doi – The DOI of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_pmc_id(pmc_article)

Returns the PMC ID of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

pmc_id – The PMC ID of the article

Return type:

str
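Both identifier getters return a string, but the lookup itself is not documented here. The sketch below shows one plausible approach, assuming the article follows the JATS convention of article-id elements distinguished by a pub-id-type attribute. It uses the standard-library xml.etree.ElementTree rather than BeautifulSoup, and the helper name sketch_get_article_id, the sample XML, and the placeholder identifier values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment with placeholder identifiers (assumed sample data).
SAMPLE_XML = """
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmc">PMC0000000</article-id>
      <article-id pub-id-type="doi">10.1234/example.doi</article-id>
    </article-meta>
  </front>
</article>
"""


def sketch_get_article_id(xml_text, id_type):
    """Hypothetical helper: return the first article-id of the given type."""
    root = ET.fromstring(xml_text)
    for node in root.iter("article-id"):
        if node.get("pub-id-type") == id_type:
            return (node.text or "").strip()
    # No identifier of this type was found in the fragment.
    return None


doi = sketch_get_article_id(SAMPLE_XML, "doi")
pmc_id = sketch_get_article_id(SAMPLE_XML, "pmc")
print(doi, pmc_id)
```

Returning None for a missing identifier is a design choice of this sketch; how the library handles a missing DOI or PMC ID is not stated on this page.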

sciencescraper.pmc.pmc_extract.get_date(pmc_article)

Returns the date of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

date – The date of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_url(pmc_article)

Returns the URL of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

url – The URL of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_keywords(pmc_article)

Returns the keywords of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

keywords – The keywords of the article

Return type:

list
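As an illustration of what a keyword list might look like, the sketch below collects kwd elements from a hand-written JATS-style fragment. It relies on the standard-library xml.etree.ElementTree instead of BeautifulSoup, and the sample XML and the helper name sketch_get_keywords are assumptions, not part of sciencescraper.

```python
import xml.etree.ElementTree as ET

# Hand-written JATS-style fragment (assumed sample data, not a real article).
SAMPLE_XML = """
<article><front><article-meta>
  <kwd-group>
    <kwd>text mining</kwd>
    <kwd> open access </kwd>
    <kwd><italic>Drosophila</italic> genetics</kwd>
  </kwd-group>
</article-meta></front></article>
"""


def sketch_get_keywords(xml_text):
    """Hypothetical helper: return keywords as a list of stripped strings."""
    root = ET.fromstring(xml_text)
    keywords = []
    for kwd in root.iter("kwd"):
        # itertext() also picks up text nested in inline markup such as <italic>.
        text = "".join(kwd.itertext()).strip()
        if text:
            keywords.append(text)
    return keywords


keywords = sketch_get_keywords(SAMPLE_XML)
print(keywords)
```

Joining itertext() rather than reading kwd.text keeps keywords intact when part of the phrase is wrapped in formatting tags.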

sciencescraper.pmc.pmc_extract.get_abstract(pmc_article)

Returns the abstract of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

abstract – The abstract of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_intro(pmc_article)

Returns the introduction section of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

intro – The introduction section of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_methods(pmc_article)

Returns the methods section of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

methods – The methods section of the article

Return type:

str

sciencescraper.pmc.pmc_extract.get_discussion(pmc_article)

Returns the discussion section of the article

Parameters:

pmc_article (BeautifulSoup) – The article as a BeautifulSoup object

Returns:

discussion – The discussion section of the article

Return type:

str
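The three section getters (introduction, methods, discussion) each return a single string, but this page does not say how a section is identified. One plausible approach, sketched below under the assumption that sections are sec elements with a title child, is to match the section heading against a keyword and join its paragraphs. The helper name sketch_get_section, the keyword matching, and the sample XML are hypothetical, and the sketch uses the standard-library xml.etree.ElementTree rather than BeautifulSoup.

```python
import xml.etree.ElementTree as ET

# Hand-written JATS-style body (assumed sample data, not a real article).
SAMPLE_XML = """
<article>
  <body>
    <sec>
      <title>Introduction</title>
      <p>Text mining helps summarise the literature.</p>
      <p>We focus on open-access articles.</p>
    </sec>
    <sec>
      <title>Materials and Methods</title>
      <p>Articles were parsed into <italic>structured</italic> fields.</p>
    </sec>
  </body>
</article>
"""


def sketch_get_section(xml_text, heading_keyword):
    """Hypothetical helper: return the text of the first matching section."""
    root = ET.fromstring(xml_text)
    for sec in root.iter("sec"):
        title = sec.findtext("title", default="")
        # Case-insensitive substring match on the section heading.
        if heading_keyword.lower() in title.lower():
            paragraphs = ["".join(p.itertext()).strip() for p in sec.findall("p")]
            return "\n".join(paragraphs)
    # No section heading matched the keyword.
    return None


intro = sketch_get_section(SAMPLE_XML, "introduction")
methods = sketch_get_section(SAMPLE_XML, "methods")
print(intro)
print(methods)
```

Heading text varies between journals ("Methods", "Materials and Methods", and so on), so a substring match like this is only a starting point.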

sciencescraper.pmc.pmc_extract.clean_references(pmc_article)

Removes the reference numbers from the article, which can clutter the text and impair both readability and analysis by a machine learning model.
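To make the idea of removing reference numbers concrete, here is a minimal sketch that strips bracketed numeric citations from plain text with a regular expression. It is not the library's implementation: clean_references takes a BeautifulSoup object, while the hypothetical sketch_clean_references below works on a plain string, and the citation pattern (forms such as [3], [1, 2], or [4-6]) is an assumption about how references appear.

```python
import re

# Matches bracketed numeric citations such as [3], [1, 2], or [4-6],
# together with any whitespace immediately before them.
# The pattern is an assumption; real articles may use other citation styles.
CITATION_PATTERN = re.compile(r"\s*\[\d+(?:\s*[,\-]\s*\d+)*\]")


def sketch_clean_references(text):
    """Hypothetical helper: drop bracketed reference numbers from text."""
    return CITATION_PATTERN.sub("", text)


sample = "Gene expression varies widely [1, 2]. Earlier work [3-5] agrees."
cleaned = sketch_clean_references(sample)
print(cleaned)
```

Consuming the whitespace before each citation avoids leaving a stray space in front of the following punctuation.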