:py:mod:`peptidedigest.article_processing`
==========================================

.. py:module:: peptidedigest.article_processing

.. autoapi-nested-parse::

   Functions to process and summarize articles with a language model and store the results in a database.


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   peptidedigest.article_processing.process_scidir_article
   peptidedigest.article_processing.process_multiple_scidir_articles
   peptidedigest.article_processing.process_pmc_article
   peptidedigest.article_processing.process_multiple_pmc_articles


.. py:function:: process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False)

   Process a ScienceDirect article, summarize it using the model, and store the resulting information in the database.

   :Parameters: * **database** (*str*) -- The database in which to store the processed article information.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to process the article.
                * **api_key** (*str*) -- The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
                * **doi** (*str, optional*) -- The DOI of the article to be processed.
                * **pii** (*str, optional*) -- The PII of the article to be processed.
                * **url** (*str, optional*) -- The URL of the article to be processed.
                * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
                * **update** (*bool, optional*) -- If True, the article will be updated in the database if it already exists. Default is False.

   :returns: None. The processed article information is stored in the database.
   :rtype: None
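
   The signature accepts ``doi``, ``pii``, and ``url``, all optional, and this reference does not state what happens when none or several of them are given. A minimal caller-side sketch, assuming you want to pass exactly one identifier, is shown below. The helper ``scidir_identifier_kwargs`` is hypothetical and not part of ``peptidedigest``, and the DOI is a placeholder.

   .. code-block:: python

      def scidir_identifier_kwargs(doi=None, pii=None, url=None):
          """Return a dict holding the single identifier that was supplied.

          Hypothetical caller-side helper; not part of peptidedigest.
          """
          supplied = {
              name: value
              for name, value in (("doi", doi), ("pii", pii), ("url", url))
              if value is not None
          }
          if len(supplied) != 1:
              # Assumed policy: refuse ambiguous or empty input before calling the API.
              raise ValueError(
                  f"expected exactly one of doi, pii, or url; got {sorted(supplied) or 'none'}"
              )
          return supplied


      # Placeholder DOI, for illustration only.
      kwargs = scidir_identifier_kwargs(doi="10.1016/j.example.2024.01.001")

   The resulting dict can then be unpacked into the documented call, for example ``process_scidir_article(database, tokenizer, model, api_key, **kwargs)``.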


.. py:function:: process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False)

   Process multiple ScienceDirect articles, summarize them using the model, and store the resulting information in the database.

   :Parameters: * **database** (*str*) -- The database in which to store the processed articles' information.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to process the articles.
                * **api_key** (*str*) -- The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
                * **dois** (*list of str, optional*) -- The DOIs of the articles to be processed.
                * **piis** (*list of str, optional*) -- The PIIs of the articles to be processed.
                * **urls** (*list of str, optional*) -- The URLs of the articles to be processed.
                * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
                * **update** (*bool, optional*) -- If True, the articles will be updated in the database if they already exist. Default is False.

   :returns: None. The processed articles' information is stored in the database.
   :rtype: None
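
   This reference does not say whether the batch function continues after one article fails. If per-article error reporting matters, a caller could loop over the single-article function instead. The sketch below shows that pattern under stated assumptions: ``process_each`` and ``fake_process`` are hypothetical names, and ``fake_process`` is only a stand-in so the example runs without network access or a model.

   .. code-block:: python

      def process_each(process_one, identifiers, **common_kwargs):
          """Call ``process_one`` once per identifier and collect failures.

          Hypothetical helper; ``process_one`` stands in for a single-article call.
          """
          failures = {}
          for identifier in identifiers:
              try:
                  process_one(identifier, **common_kwargs)
              except Exception as exc:
                  # Record the error and keep going so one bad article
                  # does not stop the rest of the batch.
                  failures[identifier] = exc
          return failures


      def fake_process(doi, update=False):
          """Stand-in for a real single-article call (illustration only)."""
          if not doi.startswith("10."):
              raise ValueError(f"not a DOI: {doi!r}")


      failures = process_each(fake_process, ["10.1016/j.example.1", "bad-id"], update=False)

   With the real library, ``process_one`` could be a small wrapper such as ``lambda doi, **kw: process_scidir_article(database, tokenizer, model, api_key, doi=doi, **kw)``.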


.. py:function:: process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False)

   Process a PubMed Central article, summarize it using the model, and store the resulting information in the database.

   :Parameters: * **database** (*str*) -- The database in which to store the processed article information.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to process the article.
                * **pmc_id** (*str*) -- The PMC ID of the article to be processed.
                * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
                * **update** (*bool, optional*) -- If True, the article will be updated in the database if it already exists. Default is False.

   :returns: None. The processed article information is stored in the database.
   :rtype: None
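
   The ``chunk_size`` parameter controls how the full text is split before summarization, but this reference does not state its unit (characters or tokens) or whether splits respect sentence boundaries. The following sketch only illustrates the general idea, assuming fixed-width character chunks; ``split_into_chunks`` is a hypothetical helper, not the library's implementation.

   .. code-block:: python

      def split_into_chunks(text, chunk_size=4200):
          """Split ``text`` into consecutive pieces of at most ``chunk_size`` characters.

          Assumption: chunk_size counts characters; the library may count tokens instead.
          """
          if chunk_size <= 0:
              raise ValueError("chunk_size must be positive")
          return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


      full_text = "x" * 10000  # placeholder for an article's full text
      chunks = split_into_chunks(full_text)
      chunk_lengths = [len(chunk) for chunk in chunks]

   A smaller ``chunk_size`` yields more chunks, and therefore more model calls per article.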


.. py:function:: process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False)

   Process multiple PubMed Central articles, summarize them using the model, and store the resulting information in the database.

   :Parameters: * **database** (*str*) -- The database in which to store the processed articles' information.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to process the articles.
                * **pmc_ids** (*list of str*) -- The PMC IDs of the articles to be processed.
                * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
                * **update** (*bool, optional*) -- If True, the articles will be updated in the database if they already exist. Default is False.

   :returns: None. The processed articles' information is stored in the database.
   :rtype: None
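
   Because each article is summarized by the model, passing the same ID twice may repeat expensive work; this reference does not say whether duplicates in ``pmc_ids`` are filtered. A minimal sketch of cleaning the list on the caller's side follows. ``unique_ids`` is a hypothetical helper and the IDs are placeholders.

   .. code-block:: python

      def unique_ids(pmc_ids):
          """Strip whitespace, drop empty strings, and remove duplicates, keeping order."""
          stripped = (pmc_id.strip() for pmc_id in pmc_ids)
          # dict.fromkeys keeps the first occurrence of each key in insertion order.
          return list(dict.fromkeys(s for s in stripped if s))


      # Placeholder IDs, for illustration only.
      ids = unique_ids(["PMC0000001", " PMC0000002", "PMC0000001", ""])

   The cleaned list can then be passed as the ``pmc_ids`` argument of ``process_multiple_pmc_articles``.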