ScienceScraper

sciencescraper.pmc.pmc_search

Functions for searching for articles on PubMed Central.

Module Contents

Functions

search_pmc(query[, sort, mindate, maxdate, reldate, ...])

Searches PMC for articles matching a query.

check_new_articles(query, days[, chunk_size])

Get open access articles from PubMed Central published within a given number of days before the current date.

notify_new_articles(articles)

Notify the user of new articles.

sciencescraper.pmc.pmc_search.search_pmc(query, sort='relevance', mindate=None, maxdate=None, reldate=None, retstart=0, retmax=20)

Searches PMC for articles matching a query.

Parameters:
  • query (str) – The query to search for.

  • sort (str, optional) – The sorting order for the search results. Options are:
      • “relevance”: sort by relevance (default)
      • “pub_date”: sort by publication date in descending order
      • “JournalName”: sort by journal name in ascending order
      • “Author”: sort by first author in ascending order

  • mindate (str, optional) – The minimum date for the search results, in the format “YYYY/MM/DD”, “YYYY/MM”, or “YYYY”. Must be used together with maxdate.

  • maxdate (str, optional) – The maximum date for the search results, in the format “YYYY/MM/DD”, “YYYY/MM”, or “YYYY”. Must be used together with mindate.

  • reldate (str, optional) – The number of days to search back from the current date.

  • retstart (int, optional) – The index of the first article to return (default 0).

  • retmax (int, optional) – The maximum number of articles to return (default 20).
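The sort and date constraints above can be checked before any request is sent. The parameter names match those of NCBI's E-utilities ESearch endpoint, so this sketch assumes that search_pmc forwards them to ESearch with db=pmc; this page does not confirm that. The helper build_esearch_params is hypothetical and is not part of sciencescraper.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Sort options documented for search_pmc.
SORT_OPTIONS = {"relevance", "pub_date", "JournalName", "Author"}

# Accepts "YYYY", "YYYY/MM", or "YYYY/MM/DD".
DATE_PATTERN = re.compile(r"^\d{4}(/\d{2}(/\d{2})?)?$")


def build_esearch_params(
    query: str,
    sort: str = "relevance",
    mindate: Optional[str] = None,
    maxdate: Optional[str] = None,
    reldate: Optional[str] = None,
    retstart: int = 0,
    retmax: int = 20,
) -> dict:
    """Hypothetical helper: validate search_pmc-style arguments and return
    an ESearch-style query dict (assumes the db=pmc endpoint)."""
    if sort not in SORT_OPTIONS:
        raise ValueError(f"unknown sort option: {sort!r}")
    # mindate and maxdate must be given together, or not at all.
    if (mindate is None) != (maxdate is None):
        raise ValueError("mindate and maxdate must be provided together")
    params = {"db": "pmc", "term": query, "sort": sort,
              "retstart": retstart, "retmax": retmax}
    if mindate is not None:
        for value in (mindate, maxdate):
            if not DATE_PATTERN.match(value):
                raise ValueError(f"bad date format: {value!r}")
        params["mindate"] = mindate
        params["maxdate"] = maxdate
    if reldate is not None:
        params["reldate"] = reldate
    return params


# Build (but do not send) an ESearch URL for a date-bounded query.
params = build_esearch_params("CRISPR", sort="pub_date",
                              mindate="2023/01", maxdate="2024")
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)
```

Passing only one of mindate or maxdate raises a ValueError, which mirrors the pairing rule stated in the parameter list.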

Returns:

pmc_ids – The PMC IDs of the search results

Return type:

list
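Because retstart and retmax select a window of results, you can page through them to collect more than retmax IDs. This sketch assumes that search_pmc returns a list shorter than retmax once the results run out. The helper collect_pmc_ids is hypothetical. It takes the search function as an argument, so it can be tried without network access.

```python
from typing import Callable, List


def collect_pmc_ids(
    search: Callable[..., List[str]],
    query: str,
    total: int,
    page_size: int = 20,
) -> List[str]:
    """Hypothetical helper: page through results with retstart/retmax until
    `total` IDs are collected or a page comes back short."""
    ids: List[str] = []
    while len(ids) < total:
        # Never request more IDs than are still needed.
        retmax = min(page_size, total - len(ids))
        page = search(query, retstart=len(ids), retmax=retmax)
        ids.extend(page)
        if len(page) < retmax:
            break  # fewer results than requested: nothing left to fetch
    return ids
```

With the real function, a call such as collect_pmc_ids(search_pmc, "CRISPR", total=50) would request retstart 0, 20, and 40, with retmax 20, 20, and 10, provided that each page comes back full.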

sciencescraper.pmc.pmc_search.check_new_articles(query, days, chunk_size=None)[source]#

Get open access articles from PubMed Central that have been published after a specified date.

Parameters:
  • query (str) – The query to search for.

  • days (int) – The number of days to search back from the current date.

  • chunk_size (int, optional) – The size of the chunks into which the full text is split.
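This page does not say whether chunk_size counts characters, words, or tokens, or what happens when it is omitted. The following is a minimal sketch under two assumptions: the unit is characters, and None means the text is not split. split_into_chunks is a hypothetical helper that only illustrates fixed-size chunking.

```python
from typing import List, Optional


def split_into_chunks(text: str, chunk_size: Optional[int] = None) -> List[str]:
    """Hypothetical helper: split text into consecutive pieces of at most
    chunk_size characters; chunk_size=None keeps the text whole."""
    if chunk_size is None:
        return [text]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the text in strides of chunk_size characters.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Only the final chunk can be shorter than chunk_size.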

Returns:

pmc_articles – A list of dictionaries containing article information

Return type:

list of dict
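The days parameter counts back from the current date, which is the same kind of window that reldate describes for search_pmc. This page does not say whether check_new_articles uses reldate or an explicit date range internally. The hypothetical date_window helper below just shows how a days value maps onto the “YYYY/MM/DD” mindate/maxdate format documented for search_pmc.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def date_window(days: int, today: Optional[date] = None) -> Tuple[str, str]:
    """Hypothetical helper: express a look-back of `days` days as a
    (mindate, maxdate) pair in YYYY/MM/DD format."""
    today = today or date.today()
    start = today - timedelta(days=days)
    return start.strftime("%Y/%m/%d"), today.strftime("%Y/%m/%d")
```

Passing a fixed today keeps the result reproducible. For example, date_window(7, date(2024, 3, 5)) gives ("2024/02/27", "2024/03/05").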

sciencescraper.pmc.pmc_search.notify_new_articles(articles)

Notify the user of new articles.

Parameters:

articles (list of dict) – A list of dictionaries containing the title, authors, journal, year, URL, open access status, keywords, abstract, methods, results, discussion, and references of the new articles.
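This page does not describe how the notification is delivered or which dictionary keys hold each field. The following is a minimal sketch that builds a plain-text summary. It assumes keys named "title", "journal", "year", and "url"; these key names and the format_notification helper are hypothetical.

```python
from typing import Dict, List


def format_notification(articles: List[Dict]) -> str:
    """Hypothetical helper: build a plain-text summary of new articles.
    Assumes each dict may have 'title', 'journal', 'year', and 'url' keys."""
    if not articles:
        return "No new articles found."
    lines = [f"{len(articles)} new article(s):"]
    for article in articles:
        # .get() keeps the summary usable when a field is missing.
        entry = (
            f"- {article.get('title', 'Untitled')} "
            f"({article.get('journal', 'unknown journal')}, {article.get('year', 'n.d.')})"
        )
        url = article.get("url")
        if url:
            entry += f" {url}"
        lines.append(entry)
    return "\n".join(lines)
```

Calling print(format_notification(articles)) would give a console-style notification. Email or desktop delivery would need a separate transport.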


© Copyright 2024, Joshua Blomgren, Elizabeth Gilson, Jeffrey Jacob. Project structure based on the Computational Molecular Science Python Cookiecutter version 1.1.

Created using Sphinx 7.3.7.

Built with the PyData Sphinx Theme 0.15.2.