`sciencescraper.sciencedirect.scidir_scrape`

Functions for retrieving the raw text of ScienceDirect articles.

Module Contents

Functions

`get_article_info`(api_key[, doi, pii, url, chunk_size])	Get structured information (metadata and main sections) about a ScienceDirect article using the ScienceDirect API.
`get_full_text`(api_key[, doi, pii, url, chunk_size])	Get the full text of a ScienceDirect article using the ScienceDirect API.
`get_xml_doi`(api_key, doi)	Get the raw XML text from an article using the ScienceDirect API and the article's DOI.
`get_xml_pii`(api_key, pii)	Get the raw XML text from an article using the ScienceDirect API and the article's PII.
`get_xml_url`(api_key, url)	Get the raw XML text from an article using the ScienceDirect API and the article's URL.

sciencescraper.sciencedirect.scidir_scrape.get_article_info(api_key, doi=None, pii=None, url=None, chunk_size=None)

Get structured information (metadata and main sections) about a ScienceDirect article using the ScienceDirect API.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str, optional) – The DOI of the article to be scraped.
pii (str, optional) – The PII of the article to be scraped.
url (str, optional) – The URL of the article to be scraped.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.

Returns:

A dictionary containing the title, authors, journal, year, URL, open access status, keywords, abstract, methods, results, discussion, and references of the article.

Return type:

dict
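
As a rough illustration of working with the returned dictionary, the sketch below builds a one-line summary from it. The key names (`title`, `authors`, `journal`, `year`) and the list-of-strings shape of `authors` are assumptions inferred from the field list above, not confirmed by the library, and `summarize_article` is a hypothetical helper.

```python
def summarize_article(info):
    """Build a one-line summary from an article-info dictionary.

    The key names are assumed; missing keys fall back to placeholders.
    """
    authors = info.get("authors") or []
    if isinstance(authors, str):
        # Tolerate a single author given as a plain string.
        authors = [authors]
    if len(authors) > 2:
        author_text = f"{authors[0]} et al."
    else:
        author_text = " and ".join(authors) or "Unknown author"
    title = info.get("title", "Untitled")
    journal = info.get("journal", "unknown journal")
    year = info.get("year", "n.d.")
    return f"{author_text} ({year}). {title}. {journal}."


# Hand-written sample data, not real API output.
sample = {
    "title": "An example article",
    "authors": ["A. Author", "B. Author", "C. Author"],
    "journal": "Example Journal",
    "year": 2020,
}
print(summarize_article(sample))
```

Reading keys with `.get()` keeps the helper usable even if some fields are absent for a given article.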

sciencescraper.sciencedirect.scidir_scrape.get_full_text(api_key, doi=None, pii=None, url=None, chunk_size=None)

Get the full text of a ScienceDirect article using the ScienceDirect API.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str, optional) – The DOI of the article to be scraped.
pii (str, optional) – The PII of the article to be scraped.
url (str, optional) – The URL of the article to be scraped.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.

Returns:

The full text of the article.

Return type:

str
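
This reference does not say whether `chunk_size` counts characters, words, or tokens, or in what form the chunks are returned. The sketch below shows one plausible reading, fixed-width character chunks, using a hypothetical `chunk_text` helper. It is not the library's implementation.

```python
def chunk_text(text, chunk_size=None):
    """Split text into consecutive pieces of at most chunk_size characters.

    Assumption: chunk_size is measured in characters. With chunk_size=None
    the text is returned as a single chunk.
    """
    if chunk_size is None:
        return [text]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


print(chunk_text("abcdefg", 3))
```

The last chunk may be shorter than `chunk_size` when the text length is not an exact multiple.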

sciencescraper.sciencedirect.scidir_scrape.get_xml_doi(api_key, doi)

Get the raw XML text from an article using the ScienceDirect API and the article’s DOI.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str) – The DOI of the article to be scraped.

Returns:

The raw XML text of the article.

Return type:

str

Raises:

requests.exceptions.HTTPError – If the request to the ScienceDirect API fails.
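
For orientation, a DOI-based request of this kind might look roughly like the sketch below. It assumes Elsevier's Article Retrieval endpoint (`https://api.elsevier.com/content/article/doi/...`) with the `X-ELS-APIKey` header and an XML `Accept` type. Whether `get_xml_doi` builds its request this way is not stated here, and `build_doi_request` and `fetch_xml_by_doi` are hypothetical names.

```python
from urllib.parse import quote

# Assumed base URL of Elsevier's Article Retrieval API; the library may differ.
ARTICLE_API_BASE = "https://api.elsevier.com/content/article"


def build_doi_request(api_key, doi):
    """Return the (url, headers) pair for a DOI lookup without sending it."""
    # Keep "/" unescaped so the DOI prefix and suffix stay readable in the path.
    url = f"{ARTICLE_API_BASE}/doi/{quote(doi, safe='/')}"
    headers = {"X-ELS-APIKey": api_key, "Accept": "text/xml"}
    return url, headers


def fetch_xml_by_doi(api_key, doi, timeout=30):
    """Send the request and return the response body as text."""
    import requests  # third-party; imported lazily so the builder works without it

    url, headers = build_doi_request(api_key, doi)
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.exceptions.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text


# Placeholder key and DOI, for illustration only.
url, headers = build_doi_request("YOUR_API_KEY", "10.1016/j.example.2020.01.001")
print(url)
```

Separating request construction from sending makes the URL and headers easy to inspect without a network call or a valid key.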

sciencescraper.sciencedirect.scidir_scrape.get_xml_pii(api_key, pii)

Get the raw XML text from an article using the ScienceDirect API and the article’s PII.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
pii (str) – The PII of the article to be scraped.

Returns:

The raw XML text of the article.

Return type:

str

Raises:

requests.exceptions.HTTPError – If the request to the ScienceDirect API fails.
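
Once the raw XML string is in hand, it can be parsed with the standard library. The sketch below pulls Dublin Core `dc:title` elements out of a small hand-written sample. The sample's structure is an assumption for illustration, since the real response layout is not shown in this reference, and `extract_titles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, assumed to be used for the article title.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def extract_titles(xml_text):
    """Return the stripped text of every dc:title element in the document."""
    root = ET.fromstring(xml_text)
    return [
        element.text.strip()
        for element in root.iterfind(".//dc:title", DC_NS)
        if element.text
    ]


# Hand-written sample, not real API output.
sample_xml = """<response xmlns:dc="http://purl.org/dc/elements/1.1/">
  <coredata>
    <dc:title> An example article </dc:title>
  </coredata>
</response>"""
print(extract_titles(sample_xml))
```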

sciencescraper.sciencedirect.scidir_scrape.get_xml_url(api_key, url)

Get the raw XML text from an article using the ScienceDirect API and the article’s URL.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
url (str) – The URL of the article to be scraped.

Returns:

The raw XML text of the article.

Return type:

str

Raises:

requests.exceptions.HTTPError – If the request to the ScienceDirect API fails.
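
ScienceDirect article URLs commonly carry the article's PII in a `/pii/<identifier>` path segment, so one way a URL-based lookup could work is to extract that identifier first. The sketch below illustrates the idea with a hypothetical `pii_from_url` helper. Whether `get_xml_url` does this internally is an assumption.

```python
import re
from urllib.parse import urlparse

# Matches the identifier that follows "/pii/" in the URL path.
_PII_PATTERN = re.compile(r"/pii/([A-Za-z0-9]+)")


def pii_from_url(url):
    """Return the PII embedded in an article URL, or None if none is found."""
    path = urlparse(url).path  # drops any query string or fragment
    match = _PII_PATTERN.search(path)
    return match.group(1) if match else None


# Placeholder URL with a made-up PII.
print(pii_from_url("https://www.sciencedirect.com/science/article/pii/S0000000000000000"))
```

Returning `None` instead of raising lets the caller decide how to handle URLs that do not point at an article page.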
