Getting Started
This page details how to get started with ScienceScraper. ScienceScraper is a Python package that allows you to search for and scrape scientific articles from ScienceDirect and PMC.
Installation
ScienceScraper requires an environment with Python 3.9 or higher.
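If you are unsure which interpreter your environment uses, a quick check along these lines may help. This is only a sketch: the helper name `python_is_supported` and the constant `MIN_PYTHON` are illustrative and are not part of ScienceScraper.

```python
import sys

# Minimum version taken from the requirement stated above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or higher is required, found {found}")
```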
First, clone the repository from GitHub:
git clone https://github.com/peptide-digest/ScienceScraper
Next, navigate to the root directory of the repository and install the package with the following command:
pip install -e .
This will install the package in editable mode, meaning that you can make changes to the package and see the changes reflected in your environment without having to reinstall the package. Installation is now complete!
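One way to confirm that the install worked is to ask Python whether it can locate the package. The sketch below uses only the standard library. The helper `is_installed` is a hypothetical name introduced here, and the snippet assumes the importable package name is `sciencescraper`, as used in the Usage section.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be located for import."""
    return importlib.util.find_spec(package_name) is not None


# Reports False if the install step was skipped or ran in a different environment.
print("sciencescraper installed:", is_installed("sciencescraper"))
```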
Usage
Once installed, you can use the package by importing the sciencescraper module or its submodules: sciencescraper.sciencedirect and sciencescraper.pmc.
If sciencescraper is imported, the following functions are available:
- ScienceDirect:
  - search_scidir
  - get_scidir_article_info
  - get_scidir_full_text
  - check_new_scidir_articles
- PMC:
  - search_pmc
  - get_pmc_article_info
  - get_pmc_full_text
  - check_new_pmc_articles
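As a rough sanity check, you could confirm that the top-level names listed above are actually exposed by your installed copy. This is a minimal sketch: `EXPECTED_FUNCTIONS` simply repeats the list above, and `missing_attributes` is a hypothetical helper, not something the package provides.

```python
import importlib

# Names copied from the list above.
EXPECTED_FUNCTIONS = [
    "search_scidir",
    "get_scidir_article_info",
    "get_scidir_full_text",
    "check_new_scidir_articles",
    "search_pmc",
    "get_pmc_article_info",
    "get_pmc_full_text",
    "check_new_pmc_articles",
]


def missing_attributes(module, names):
    """Return the names that are not attributes of the given module."""
    return [name for name in names if not hasattr(module, name)]


try:
    ss = importlib.import_module("sciencescraper")
except ImportError:
    print("sciencescraper is not installed in this environment")
else:
    print("missing functions:", missing_attributes(ss, EXPECTED_FUNCTIONS))
```

An empty list means every documented function was found.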
ScienceDirect Scraping Only
If you only want to use the functions related to ScienceDirect, you can import the sciencescraper.sciencedirect module, giving you access to the following functions:
search_scidir
get_article_info
get_full_text
check_new_articles
PMC Scraping Only
If you only want to use the functions related to PMC, you can import the sciencescraper.pmc module, giving you access to the following functions:
search_pmc
get_article_info
get_full_text
check_new_articles
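Both submodules expose functions with the same names (get_article_info, get_full_text, check_new_articles), so if you need both in one script, importing each under its own alias keeps the calls unambiguous. The sketch below shows one way to do that. The aliases `ss_scidir` and `ss_pmc` are just a naming choice, and the `try`/`except` fallback is an addition here so the snippet also runs where the package is not installed.

```python
try:
    # Distinct aliases avoid a clash between the identically named functions.
    import sciencescraper.sciencedirect as ss_scidir
    import sciencescraper.pmc as ss_pmc
except ImportError:
    # Package not available in this environment.
    ss_scidir = None
    ss_pmc = None

if ss_scidir is not None and ss_pmc is not None:
    # The same function name now resolves through the alias of each source.
    print(ss_scidir.get_article_info.__module__)
    print(ss_pmc.get_article_info.__module__)
```

Avoid `from sciencescraper.sciencedirect import *` followed by the same for `sciencescraper.pmc`, since the second import would silently shadow the first.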
Examples
Here are some examples of how to import the package and use its functions:
# Search ScienceDirect using the top-level package
import sciencescraper as ss
search_results = ss.search_scidir('covid-19', max_results=100)

# Retrieve information for a ScienceDirect article by DOI
import sciencescraper as ss
api_key = 'your_api_key'
doi = '10.1016/j.str.2020.04.005'
article_info = ss.get_scidir_article_info(api_key, doi=doi)

# Retrieve the full text using only the ScienceDirect submodule
import sciencescraper.sciencedirect as ss_scidir
api_key = 'your_api_key'
doi = '10.1016/j.str.2020.04.005'
full_text = ss_scidir.get_full_text(api_key, doi=doi)

# Retrieve article information using only the PMC submodule
import sciencescraper.pmc as ss_pmc
article_info = ss_pmc.get_article_info('10680866')