`sciencescraper.sciencedirect`#

This subpackage contains modules for scraping articles from ScienceDirect.

Submodules#

scidir_extract: Functions to extract information from the raw XML text of a ScienceDirect article.
scidir_clean: Functions to clean the text extracted from ScienceDirect articles.
scidir_scrape: Functions for retrieving the clean text of ScienceDirect articles.
scidir_search: Functions for searching for articles on ScienceDirect.

The main function of this subpackage is get_article_info in scidir_scrape. It retrieves the full text of a ScienceDirect article through the ScienceDirect API and returns a dictionary containing the article's title, authors, journal, year, URL, open access status, keywords, abstract, methods, results, discussion, and references.

search_scidir in scidir_search searches ScienceDirect for articles matching a query string, optionally starting from a given start date.
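The two functions described above are naturally used together: search for DOIs, then fetch each article. The following is a minimal sketch of that workflow, not part of the package. `scrape_query` is a hypothetical helper, and the query string and start date in the comments are illustrative. The library functions are passed in as callables so the helper can be exercised without an API key.

```python
from typing import Callable, Dict, List


def scrape_query(
    search: Callable[[], List[str]],
    fetch: Callable[[str], Dict],
) -> Dict[str, Dict]:
    """Map each DOI returned by `search` to the dictionary returned by `fetch`."""
    results: Dict[str, Dict] = {}
    for doi in search():
        results[doi] = fetch(doi)
    return results


# Intended wiring (assumes the package is installed and API_KEY is valid):
# from sciencescraper.sciencedirect import get_article_info, search_scidir
# articles = scrape_query(
#     lambda: search_scidir(API_KEY, "solid-state batteries", startDate="2024-01-01"),
#     lambda doi: get_article_info(API_KEY, doi=doi),
# )
```

Each call to `fetch` is one API request, so long result lists may run into whatever rate limits apply to the key in use.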

Package Contents#

Functions#

`get_article_info`(api_key[, doi, pii, url, chunk_size])	Get the metadata and full text of a ScienceDirect article, as a dictionary, using the ScienceDirect API.
`get_full_text`(api_key[, doi, pii, url, chunk_size])	Get the full text of a ScienceDirect article using the ScienceDirect API.
`check_new_articles`(api_key, query, days)	Check for new articles in Elsevier's ScienceDirect database and notify the user of any new articles.
`search_scidir`(api_key, query[, sortBy, startDate, ...])	Get articles from Elsevier's ScienceDirect database that are relevant to a specified search query.

sciencescraper.sciencedirect.get_article_info(api_key, doi=None, pii=None, url=None, chunk_size=None)[source]#

Get the metadata and full text of a ScienceDirect article, as a dictionary, using the ScienceDirect API.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str, optional) – The DOI of the article to be scraped.
pii (str, optional) – The PII of the article to be scraped.
url (str, optional) – The URL of the article to be scraped.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.

Returns:

A dictionary containing the title, authors, journal, year, URL, open access status, keywords, abstract, methods, results, discussion, and references of the article.

Return type:

dict
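The signature accepts three optional identifiers (doi, pii, url), and this page does not say what happens when none or several are supplied. As a caller-side convention only, and not documented library behavior, the sketch below checks that exactly one is given before building the keyword arguments. `article_identifier` is a hypothetical helper.

```python
from typing import Dict, Optional


def article_identifier(
    doi: Optional[str] = None,
    pii: Optional[str] = None,
    url: Optional[str] = None,
) -> Dict[str, str]:
    """Return a one-item keyword dict naming the single identifier supplied."""
    candidates = {"doi": doi, "pii": pii, "url": url}
    supplied = {name: value for name, value in candidates.items() if value}
    if len(supplied) != 1:
        raise ValueError("supply exactly one of doi, pii, or url")
    return supplied


# Intended use (assumes the package is installed and API_KEY is valid):
# from sciencescraper.sciencedirect import get_article_info
# info = get_article_info(API_KEY, **article_identifier(doi="10.1016/..."))
```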

sciencescraper.sciencedirect.get_full_text(api_key, doi=None, pii=None, url=None, chunk_size=None)[source]#

Get the full text of a ScienceDirect article using the ScienceDirect API.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str, optional) – The DOI of the article to be scraped.
pii (str, optional) – The PII of the article to be scraped.
url (str, optional) – The URL of the article to be scraped.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is None.

Returns:

The full text of the article.

Return type:

str
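This page does not state the unit of chunk_size or how chunked text is represented in the returned string. If you need full control over chunking, one option is to call get_full_text with chunk_size left at None and split the result yourself. The sketch below splits by character count. `split_into_chunks` is a hypothetical helper and does not reproduce the package's own chunking.

```python
from typing import List


def split_into_chunks(text: str, size: int) -> List[str]:
    """Split `text` into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return [text[i:i + size] for i in range(0, len(text), size)]


# Intended use (assumes the package is installed and API_KEY is valid):
# from sciencescraper.sciencedirect import get_full_text
# chunks = split_into_chunks(get_full_text(API_KEY, doi="10.1016/..."), 2000)
```

Splitting on raw character counts can cut words or sentences in half. A sentence-aware splitter may suit downstream text processing better.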

sciencescraper.sciencedirect.check_new_articles(api_key, query, days)[source]#

Check for new articles in Elsevier’s ScienceDirect database and notify the user of any new articles.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
query (str) – The search query to be used to search for new articles.
days (int) – The number of days to look back when searching for new articles.

Returns:

A list of dictionaries containing the title, authors, journal, year, URL, open access status, keywords, abstract, methods, results, discussion, and references of the new articles.

Return type:

list of dict
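A look-back window in days corresponds to a start date of the form accepted by search_scidir ('YYYY-MM-DD'). The sketch below shows that conversion, assuming the window is counted in calendar days back from today. How check_new_articles computes its window internally is not stated on this page, and `start_date_for` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Optional


def start_date_for(days: int, today: Optional[date] = None) -> str:
    """Return the date `days` days before `today`, formatted as YYYY-MM-DD."""
    if days < 0:
        raise ValueError("days must be non-negative")
    reference = today if today is not None else date.today()
    return (reference - timedelta(days=days)).isoformat()


# Intended use (assumes the package is installed and API_KEY is valid):
# from sciencescraper.sciencedirect import search_scidir
# dois = search_scidir(API_KEY, "graphene", sortBy="date", startDate=start_date_for(7))
```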

sciencescraper.sciencedirect.search_scidir(api_key, query, sortBy='relevance', startDate=None, max_results=25, offset=0)[source]#

Get articles from Elsevier’s ScienceDirect database that are relevant to a specified search query.

Parameters:

api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
query (str) – The search query to be used to search for articles.
sortBy (str, optional) – The sorting order for the search results. Options are "relevance" (sort by relevance) and "date" (sort by date). Default is "relevance".
startDate (str, optional) – The start date for the search query in the format ‘YYYY-MM-DD’.
max_results (int, optional) – The maximum number of results to return. Default is 25. Permitted values: 10, 25, 50, 100.
offset (int, optional) – The number of results to skip. Default is 0.

Returns:

A list of the DOIs of the articles.

Return type:

list
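Because max_results is limited to 10, 25, 50, or 100, collecting more results means paging with offset. The sketch below requests successive pages until a limit is reached or a page comes back empty. `collect_dois` and `search_page` are hypothetical names, and the stop-on-empty-page rule is an assumption about how the end of the results appears.

```python
from typing import Callable, List

PERMITTED_PAGE_SIZES = (10, 25, 50, 100)  # values listed for max_results


def collect_dois(
    search_page: Callable[[int, int], List[str]],
    limit: int,
    page_size: int = 25,
) -> List[str]:
    """Gather up to `limit` DOIs by requesting successive pages.

    `search_page(max_results, offset)` is a caller-supplied callable that
    forwards to search_scidir and returns one page of DOIs.
    """
    if page_size not in PERMITTED_PAGE_SIZES:
        raise ValueError(f"page_size must be one of {PERMITTED_PAGE_SIZES}")
    dois: List[str] = []
    offset = 0
    while len(dois) < limit:
        page = search_page(page_size, offset)
        if not page:  # assumed signal that no results remain
            break
        dois.extend(page)
        offset += page_size
    return dois[:limit]


# Intended use (assumes the package is installed and API_KEY is valid):
# from sciencescraper.sciencedirect import search_scidir
# dois = collect_dois(
#     lambda n, off: search_scidir(API_KEY, "perovskite", max_results=n, offset=off),
#     limit=60,
# )
```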
