sciencescraper.sciencedirect.scidir_extract

Functions that extract information from the raw text of ScienceDirect articles.

Module Contents

Functions

get_title(xml_text)

Get the title of a ScienceDirect article from the article's raw XML text.

get_authors(xml_text)

Get the authors of a ScienceDirect article from the article's raw XML text.

get_journal(xml_text)

Get the journal of a ScienceDirect article from the article's raw XML text.

get_publisher(xml_text)

Get the publisher of a ScienceDirect article from the article's raw XML text.

get_article_type(xml_text)

Get the article type of a ScienceDirect article from the article's raw XML text.

get_date(xml_text)

Get the date of a ScienceDirect article from the article's raw XML text.

get_url(xml_text)

Get the URL of a ScienceDirect article from the article's raw XML text.

get_doi(xml_text)

Get the DOI of a ScienceDirect article from the article's raw XML text.

get_pii(xml_text)

Get the PII of a ScienceDirect article from the article's raw XML text.

get_open_access(xml_text)

Get the open access status of a ScienceDirect article from the article's raw XML text.

get_keywords(xml_text)

Get the keywords of a ScienceDirect article from the article's raw XML text.

get_abstract(xml_text)

Get the abstract of a ScienceDirect article from the article's raw XML text.

get_methods(xml_text)

Get the methods section of a ScienceDirect article from the article's raw XML text.

get_results(xml_text)

Get the results section of a ScienceDirect article from the article's raw XML text.

get_discussion(xml_text)

Get the discussion section of a ScienceDirect article from the article's raw XML text.

get_references(xml_text)

Get the references section of a ScienceDirect article from the article's raw XML text.
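
Because every function listed above takes the same `xml_text` argument, several of them can be applied to one article in a loop. The following is a minimal sketch of that pattern, not part of sciencescraper: `build_record` is a hypothetical helper, and since this page does not state how the functions behave when a field is missing, the sketch simply records `None` if an extractor raises.

```python
from typing import Callable, Dict


def build_record(
    xml_text: str, extractors: Dict[str, Callable[[str], object]]
) -> Dict[str, object]:
    """Apply each extractor to the same raw XML text and collect the results.

    An extractor that raises is recorded as None so that one missing field
    does not discard the whole record (a design choice of this sketch).
    """
    record: Dict[str, object] = {}
    for field, extract in extractors.items():
        try:
            record[field] = extract(xml_text)
        except Exception:
            record[field] = None
    return record


# With the library installed, the extractors would be the documented functions:
#   from sciencescraper.sciencedirect import scidir_extract as sx
#   extractors = {"title": sx.get_title, "doi": sx.get_doi, "authors": sx.get_authors}

# Stand-in extractor so the sketch runs without the library:
print(build_record("<article/>", {"length": len}))  # {'length': 10}
```

Passing the extractors as a dictionary keeps the choice of fields in one place and makes the helper easy to test with stand-in callables.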

sciencescraper.sciencedirect.scidir_extract.get_title(xml_text)

Get the title of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The title of the article.

Return type:

str
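
To illustrate the kind of lookup such a function performs, here is a sketch using only the standard library. It is not the library's implementation: `title_from_xml` is a hypothetical name, and the sketch assumes the title sits in a Dublin Core `dc:title` element, which may not match the XML that sciencescraper actually receives.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace; assumed here, not confirmed by this page.
DC = "{http://purl.org/dc/elements/1.1/}"


def title_from_xml(xml_text: str) -> str:
    """Return the text of the first dc:title element, or "" if there is none."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//{DC}title")
    if node is None:
        return ""
    # itertext() also collects text nested in inline markup such as <sup>;
    # split/join collapses line breaks and repeated spaces.
    return " ".join("".join(node.itertext()).split())


sample = """<article xmlns:dc="http://purl.org/dc/elements/1.1/">
  <coredata>
    <dc:title>
      A study of   XML parsing
    </dc:title>
  </coredata>
</article>"""

print(title_from_xml(sample))  # A study of XML parsing
```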

sciencescraper.sciencedirect.scidir_extract.get_authors(xml_text)

Get the authors of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

A list of the article's authors, each formatted as "First Name Last Name".

Return type:

list
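
The "First Name Last Name" format can be produced as in the sketch below. It rests on assumptions that this page does not confirm: that each author is stored in a Dublin Core `dc:creator` element and written as "Last, First". `authors_from_xml` is a hypothetical name, not the library's function.

```python
import xml.etree.ElementTree as ET
from typing import List

# Dublin Core namespace; assumed here, not confirmed by this page.
DC = "{http://purl.org/dc/elements/1.1/}"


def authors_from_xml(xml_text: str) -> List[str]:
    """Return author names as "First Last", in document order."""
    root = ET.fromstring(xml_text)
    authors = []
    for node in root.iter(f"{DC}creator"):
        raw = (node.text or "").strip()
        if not raw:
            continue  # skip empty creator elements
        if "," in raw:
            # Assumed storage format "Last, First": swap the two parts.
            last, first = (part.strip() for part in raw.split(",", 1))
            raw = f"{first} {last}"
        authors.append(raw)
    return authors


sample = """<article xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Curie, Marie</dc:creator>
  <dc:creator>Alan Turing</dc:creator>
  <dc:creator></dc:creator>
</article>"""

print(authors_from_xml(sample))  # ['Marie Curie', 'Alan Turing']
```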

sciencescraper.sciencedirect.scidir_extract.get_journal(xml_text)

Get the journal of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The journal of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_publisher(xml_text)

Get the publisher of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The publisher of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_article_type(xml_text)

Get the article type of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The article type of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_date(xml_text)

Get the date of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The date of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_url(xml_text)

Get the URL of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The URL of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_doi(xml_text)

Get the DOI of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The DOI of the article.

Return type:

str
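
A DOI string is often turned into a resolvable link through the `https://doi.org/` resolver. The sketch below shows one way to do that; `doi_to_url` is a hypothetical helper, the list of prefixes it strips is an assumption about how a DOI might be written, and `10.1000/xyz123` is a placeholder rather than a real article.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Build a doi.org link from a DOI, tolerating a few common prefixes."""
    cleaned = doi.strip()
    # Prefixes a DOI string might carry (assumed, not taken from this page).
    for prefix in ("https://doi.org/", "https://dx.doi.org/",
                   "http://dx.doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(cleaned, safe="/")


print(doi_to_url("doi:10.1000/xyz123"))  # https://doi.org/10.1000/xyz123
```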

sciencescraper.sciencedirect.scidir_extract.get_pii(xml_text)

Get the PII of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The PII of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_open_access(xml_text)

Get the open access status of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The open access status of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_keywords(xml_text)

Get the keywords of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

A list of the keywords of the article.

Return type:

list

sciencescraper.sciencedirect.scidir_extract.get_abstract(xml_text)

Get the abstract of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The abstract of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_methods(xml_text)

Get the methods section of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The methods section of the article.

Return type:

str
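
Extracting a named section usually means finding the section whose heading matches a pattern and joining its paragraphs. The sketch below illustrates that idea on a simplified, made-up schema (`section`, `section-title`, `para`); the real ScienceDirect XML uses namespaced elements, and `section_text` is a hypothetical helper rather than the library's code.

```python
import re
import xml.etree.ElementTree as ET


def section_text(xml_text: str, heading_pattern: str) -> str:
    """Return the paragraphs of the first section whose title matches the pattern."""
    root = ET.fromstring(xml_text)
    pattern = re.compile(heading_pattern, re.IGNORECASE)
    for section in root.iter("section"):
        title = section.find("section-title")
        if title is None or not pattern.search("".join(title.itertext())):
            continue
        # Flatten inline markup and collapse whitespace in each paragraph.
        paragraphs = [
            " ".join("".join(p.itertext()).split()) for p in section.iter("para")
        ]
        return "\n".join(p for p in paragraphs if p)
    return ""  # no matching section


sample = """<article>
  <section>
    <section-title>1. Introduction</section-title>
    <para>Background.</para>
  </section>
  <section>
    <section-title>2. Materials and methods</section-title>
    <para>Samples were collected.</para>
    <para>Data were <italic>analysed</italic>.</para>
  </section>
</article>"""

print(section_text(sample, r"methods?"))
# Samples were collected.
# Data were analysed.
```

Matching on the heading text rather than on section position tolerates articles that number or order their sections differently; a different pattern, such as `r"results"`, selects a different section in the same way.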

sciencescraper.sciencedirect.scidir_extract.get_results(xml_text)

Get the results section of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The results section of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_discussion(xml_text)

Get the discussion section of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The discussion section of the article.

Return type:

str

sciencescraper.sciencedirect.scidir_extract.get_references(xml_text)

Get the references section of a ScienceDirect article from the article’s raw XML text.

Parameters:

xml_text (str) – The raw XML text of an article.

Returns:

The titles of the references listed in the references section of the article.

Return type:

str