sciencescraper.sciencedirect.scidir_scrape

Functions for retrieving the raw text of ScienceDirect articles.

Module Contents

Functions

get_article_info(api_key[, doi, pii, url, chunk_size])

Get the metadata and sectioned content of a ScienceDirect article as a dictionary, using the ScienceDirect API.

get_full_text(api_key[, doi, pii, url, chunk_size])

Get the full text of a ScienceDirect article using the ScienceDirect API.

get_xml_doi(api_key, doi)

Get the raw XML text from an article using the ScienceDirect API and the article's DOI.

get_xml_pii(api_key, pii)

Get the raw XML text from an article using the ScienceDirect API and the article's PII.

get_xml_url(api_key, url)

Get the raw XML text from an article using the ScienceDirect API and the article's URL.

sciencescraper.sciencedirect.scidir_scrape.get_article_info(api_key, doi=None, pii=None, url=None, chunk_size=None)

Get the metadata and sectioned content of a ScienceDirect article as a dictionary, using the ScienceDirect API.

Parameters:
  • api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.

  • doi (str, optional) – The DOI of the article to be scraped.

  • pii (str, optional) – The PII of the article to be scraped.

  • url (str, optional) – The URL of the article to be scraped.

  • chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.

Returns:

A dictionary containing the title, authors, journal, year, URL, open access status, keywords, abstract, methods, results, discussion, and references of the article.

Return type:

dict
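The signature accepts doi, pii, and url as optional keywords, but this page does not say what happens when none or several are passed. The following is a minimal sketch that assumes exactly one identifier should be supplied. Both pick_identifier and fetch_article_info are hypothetical helpers for illustration, not part of sciencescraper.

```python
def pick_identifier(doi=None, pii=None, url=None):
    """Return a one-item dict holding the single identifier that was supplied."""
    supplied = {
        name: value
        for name, value in (("doi", doi), ("pii", pii), ("url", url))
        if value is not None
    }
    # Assumption: exactly one identifier is expected per call.
    if len(supplied) != 1:
        raise ValueError(
            f"expected exactly one of doi, pii, url; got {sorted(supplied) or 'none'}"
        )
    return supplied


def fetch_article_info(api_key, **identifier):
    """Validate the identifier, then call get_article_info."""
    # Imported lazily so pick_identifier can be used without the package installed.
    from sciencescraper.sciencedirect.scidir_scrape import get_article_info

    return get_article_info(api_key, **pick_identifier(**identifier))
```

A call such as fetch_article_info(api_key, doi="10.1016/j.example.2020.01.001") (a placeholder DOI) would then return the dictionary described above, or raise ValueError before any request is made if the identifiers are ambiguous.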

sciencescraper.sciencedirect.scidir_scrape.get_full_text(api_key, doi=None, pii=None, url=None, chunk_size=None)

Get the full text of a ScienceDirect article using the ScienceDirect API.

Parameters:
  • api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.

  • doi (str, optional) – The DOI of the article to be scraped.

  • pii (str, optional) – The PII of the article to be scraped.

  • url (str, optional) – The URL of the article to be scraped.

  • chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.

Returns:

The full text of the article.

Return type:

str
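This page does not state the unit of chunk_size (characters, words, or tokens) or how chunked text is represented in the returned str. If predictable chunks are needed, one option is to call get_full_text with the default chunk_size=None and split the result locally. The sketch below does a plain character-based split. split_into_chunks is a hypothetical helper and is not claimed to match the library's own chunking.

```python
def split_into_chunks(text, chunk_size=None):
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size is None:
        # Mirror the documented default: no chunking, one piece.
        return [text]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Example with a short stand-in string instead of a real article.
pieces = split_into_chunks("abcdefg", 3)
print(pieces)  # ['abc', 'def', 'g']
```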

sciencescraper.sciencedirect.scidir_scrape.get_xml_doi(api_key, doi)

Get the raw XML text from an article using the ScienceDirect API and the article’s DOI.

Parameters:
  • api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.

  • doi (str) – The DOI of the article to be scraped.

Returns:

The raw XML text of the article.

Return type:

str

Raises:

requests.exceptions.HTTPError – If the request to the ScienceDirect API fails.
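Because the function returns the XML as a str, it can be inspected with the standard library. The sketch below collects the text of every element with a given local name, ignoring namespaces. SAMPLE_XML is a made-up stand-in, not the actual ScienceDirect schema, and find_text_by_local_name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_text_by_local_name(xml_text, name):
    """Return the stripped text of every element whose local name equals name."""
    root = ET.fromstring(xml_text)
    return [
        "".join(element.itertext()).strip()
        for element in root.iter()
        if local_name(element.tag) == name
    ]


# Illustrative document only; real responses will have a different structure.
SAMPLE_XML = """<doc xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example title</dc:title>
  <body><para>First paragraph.</para><para>Second paragraph.</para></body>
</doc>"""

print(find_text_by_local_name(SAMPLE_XML, "title"))  # ['An example title']
```

In practice, xml_text would be the string returned by get_xml_doi(api_key, doi), and the element names to search for would need to be read off an actual response.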

sciencescraper.sciencedirect.scidir_scrape.get_xml_pii(api_key, pii)

Get the raw XML text from an article using the ScienceDirect API and the article’s PII.

Parameters:
  • api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.

  • pii (str) – The PII of the article to be scraped.

Returns:

The raw XML text of the article.

Return type:

str

Raises:

requests.exceptions.HTTPError – If the request to the ScienceDirect API fails.

sciencescraper.sciencedirect.scidir_scrape.get_xml_url(api_key, url)

Get the raw XML text from an article using the ScienceDirect API and the article’s URL.

Parameters:
  • api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.

  • url (str) – The URL of the article to be scraped.

Returns:

The raw XML text of the article.

Return type:

str

Raises:

requests.exceptions.HTTPError – If the request to the ScienceDirect API fails.
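The three get_xml_* functions differ only in the kind of identifier they accept, so a caller holding a mixed list of identifiers may want a single entry point. The sketch below guesses the identifier type with a simple heuristic (an http(s) prefix for URLs, a "10." prefix plus a slash for DOIs, anything else treated as a PII). The heuristic, classify_identifier, and fetch_xml are assumptions made for illustration and are not part of the library.

```python
def classify_identifier(identifier):
    """Guess whether a string is a URL, a DOI, or a PII (heuristic only)."""
    value = identifier.strip()
    if value.lower().startswith(("http://", "https://")):
        return "url"
    if value.startswith("10.") and "/" in value:
        return "doi"
    # Assumption: anything that is neither a URL nor a DOI is treated as a PII.
    return "pii"


def fetch_xml(api_key, identifier):
    """Dispatch to get_xml_doi, get_xml_pii, or get_xml_url based on the guess."""
    # Imported lazily so classify_identifier works without the package installed.
    from sciencescraper.sciencedirect import scidir_scrape

    fetchers = {
        "doi": scidir_scrape.get_xml_doi,
        "pii": scidir_scrape.get_xml_pii,
        "url": scidir_scrape.get_xml_url,
    }
    # Any requests.exceptions.HTTPError raised by the fetcher propagates to the caller.
    return fetchers[classify_identifier(identifier)](api_key, identifier.strip())
```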