:py:mod:`peptidedigest`
=======================

.. py:module:: peptidedigest

.. autoapi-nested-parse::

   LLM-based summarization of scientific articles on computational peptide research.


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   article_db/index.rst
   article_processing/index.rst
   clean_text/index.rst
   model_prompts/index.rst


Package Contents
----------------


Functions
~~~~~~~~~

.. autoapisummary::

   peptidedigest.create_database
   peptidedigest.get_article
   peptidedigest.get_articles
   peptidedigest.check_article_exists
   peptidedigest.delete_article
   peptidedigest.insert_article
   peptidedigest.update_article
   peptidedigest.process_scidir_article
   peptidedigest.process_multiple_scidir_articles
   peptidedigest.process_pmc_article
   peptidedigest.process_multiple_pmc_articles
   peptidedigest.summarize_article_segments
   peptidedigest.summarize_article_meta
   peptidedigest.score_texts_peptide_research


.. py:function:: create_database(name)

   Create a SQLite database with the given name.

   :Parameters: **name** (*str*) -- The name of the database to create.

   :returns: The database is created in the current working directory.
   :rtype: None
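
   As a rough illustration of what creating such a database can look like with
   the standard-library ``sqlite3`` module, the sketch below creates a database
   file containing a single table. The helper name ``sketch_create_database``,
   the table name ``articles``, and its columns are assumptions made for this
   example; the schema that ``create_database`` actually builds is not
   documented here.

   .. code-block:: python

      import os
      import sqlite3
      import tempfile


      def sketch_create_database(path):
          """Create a SQLite file with a hypothetical ``articles`` table."""
          conn = sqlite3.connect(path)  # creates the file if it does not exist
          try:
              conn.execute(
                  "CREATE TABLE IF NOT EXISTS articles ("
                  "doi TEXT PRIMARY KEY, pmc_id TEXT, title TEXT)"
              )
              conn.commit()
          finally:
              conn.close()


      # Use a temporary directory so the example leaves no files behind
      # in the current working directory.
      demo_path = os.path.join(tempfile.mkdtemp(), "demo_articles.db")
      sketch_create_database(demo_path)

      # List the tables that now exist in the new database file.
      check = sqlite3.connect(demo_path)
      table_names = [
          row[0]
          for row in check.execute(
              "SELECT name FROM sqlite_master WHERE type = 'table'"
          )
      ]
      check.close()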


.. py:function:: get_article(database, doi=None, pmc_id=None)

   Get the article information and model responses for a given DOI or PMC ID.

   :Parameters: * **database** (*str*) -- The name of the database to retrieve the article from.
                * **doi** (*str, optional*) -- The DOI of the article to retrieve.
                * **pmc_id** (*str, optional*) -- The PMC ID of the article to retrieve.

   :returns: A dictionary containing the article information and model responses.
   :rtype: dict
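
   The lookup by either identifier can be pictured with a small ``sqlite3``
   sketch. Everything here is illustrative: the helper
   ``sketch_get_article``, the ``articles`` table, its columns, and the
   placeholder identifiers are assumptions, and raising ``ValueError`` when
   neither identifier is given is a choice made for the sketch, not documented
   behavior of ``get_article``.

   .. code-block:: python

      import sqlite3


      def sketch_get_article(conn, doi=None, pmc_id=None):
          """Return one article row as a dict, or None if nothing matches."""
          if doi is None and pmc_id is None:
              raise ValueError("provide either doi or pmc_id")
          # The column name comes from this fixed pair, never from user input.
          column, value = ("doi", doi) if doi is not None else ("pmc_id", pmc_id)
          conn.row_factory = sqlite3.Row  # rows become addressable by column name
          row = conn.execute(
              f"SELECT * FROM articles WHERE {column} = ?", (value,)
          ).fetchone()
          return dict(row) if row is not None else None


      # In-memory database with a hypothetical schema and a placeholder record.
      conn = sqlite3.connect(":memory:")
      conn.execute(
          "CREATE TABLE articles (doi TEXT PRIMARY KEY, pmc_id TEXT, title TEXT)"
      )
      conn.execute(
          "INSERT INTO articles VALUES (?, ?, ?)",
          ("10.1000/example", "PMC0000000", "Example title"),
      )

      by_doi = sketch_get_article(conn, doi="10.1000/example")
      by_pmc = sketch_get_article(conn, pmc_id="PMC0000000")
      missing = sketch_get_article(conn, doi="10.1000/absent")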


.. py:function:: get_articles(database)

   Get all articles from the database.

   :Parameters: **database** (*str*) -- The name of the database to retrieve the articles from.

   :returns: A list of dictionaries containing the article information and model responses.
   :rtype: list


.. py:function:: check_article_exists(database, value, column)

   Check if an article with the given value in the specified column exists in the database.

   :Parameters: * **database** (*str*) -- The name of the database to check for the article.
                * **value** (*str*) -- The value to check for in the specified column.
                * **column** (*str*) -- The column to check for the value.

   :returns: True if the article exists, False otherwise.
   :rtype: bool
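
   Because the column is supplied by the caller, an existence check of this
   kind usually has to guard the column name separately from the value. The
   sketch below shows one way to do that with ``sqlite3``; the helper name,
   the ``articles`` table, and the set of allowed columns are hypothetical and
   do not describe how ``check_article_exists`` is implemented.

   .. code-block:: python

      import sqlite3

      # Hypothetical column names; identifiers cannot be bound as SQL
      # parameters, so they are validated against a fixed set instead.
      ALLOWED_COLUMNS = {"doi", "pmc_id", "title"}


      def sketch_article_exists(conn, value, column):
          """Return True if any row has ``value`` in ``column``."""
          if column not in ALLOWED_COLUMNS:
              raise ValueError(f"unsupported column: {column!r}")
          row = conn.execute(
              f"SELECT 1 FROM articles WHERE {column} = ? LIMIT 1", (value,)
          ).fetchone()
          return row is not None


      exists_conn = sqlite3.connect(":memory:")
      exists_conn.execute(
          "CREATE TABLE articles (doi TEXT PRIMARY KEY, pmc_id TEXT, title TEXT)"
      )
      exists_conn.execute(
          "INSERT INTO articles VALUES (?, ?, ?)",
          ("10.1000/example", "PMC0000000", "Example title"),
      )

      found = sketch_article_exists(exists_conn, "PMC0000000", "pmc_id")
      not_found = sketch_article_exists(exists_conn, "10.1000/absent", "doi")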


.. py:function:: delete_article(database, doi=None, pmc_id=None)

   Delete an article and its model responses from the database.

   :Parameters: * **database** (*str*) -- The name of the database to delete the article from.
                * **doi** (*str, optional*) -- The DOI of the article to delete.
                * **pmc_id** (*str, optional*) -- The PMC ID of the article to delete.

   :returns: The article and model responses are deleted from the database.
   :rtype: None


.. py:function:: insert_article(database, article_info, model_responses=None)

   Insert an article and its model responses into the database.

   :Parameters: * **database** (*str*) -- The name of the database to insert the article into.
                * **article_info** (*dict*) -- A dictionary containing the article information.
                * **model_responses** (*dict, optional*) -- A dictionary containing the model responses for the article.

   :returns: The article and model responses are inserted into the database.
   :rtype: None


.. py:function:: update_article(database, doi, model_responses)

   Update the model responses for an article in the database.

   :Parameters: * **database** (*str*) -- The name of the database to update the article in.
                * **doi** (*str*) -- The DOI of the article to update.
                * **model_responses** (*dict*) -- A dictionary containing the updated model responses.

   :returns: The model responses for the article are updated in the database.
   :rtype: None


.. py:function:: process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False)

   Process a ScienceDirect article, summarize the article using the model, and store the information in the database.

   :Parameters: * **database** (*str*) -- The database to store the processed article information.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to process the article.
                * **api_key** (*str*) -- The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
                * **doi** (*str, optional*) -- The DOI of the article to be processed.
                * **pii** (*str, optional*) -- The PII of the article to be processed.
                * **url** (*str, optional*) -- The URL of the article to be processed.
                * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
                * **update** (*bool, optional*) -- If True, the article will be updated in the database if it already exists. Default is False.

   :returns: The processed article information is stored in the database.
   :rtype: None
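
   The effect of ``chunk_size`` can be pictured with a small splitting sketch.
   It assumes that the size is counted in characters and that chunks break on
   whitespace; neither assumption is confirmed by this reference, and the
   helper ``sketch_split_into_chunks`` is not part of the package.

   .. code-block:: python

      def sketch_split_into_chunks(text, chunk_size=4200):
          """Greedily pack whitespace-separated words into chunks.

          Each chunk holds at most ``chunk_size`` characters, except when a
          single word is longer than ``chunk_size``; such a word becomes a
          chunk of its own.
          """
          if chunk_size <= 0:
              raise ValueError("chunk_size must be positive")
          chunks, current, length = [], [], 0
          for word in text.split():
              # One extra character for the joining space after the first word.
              extra = len(word) + (1 if current else 0)
              if current and length + extra > chunk_size:
                  chunks.append(" ".join(current))
                  current, length = [word], len(word)
              else:
                  current.append(word)
                  length += extra
          if current:
              chunks.append(" ".join(current))
          return chunks


      sample_text = "peptide " * 1000
      sample_chunks = sketch_split_into_chunks(sample_text, chunk_size=50)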


.. py:function:: process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False)

   Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.

   :Parameters: * **database** (*str*) -- The database to store the processed article information.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to process the articles.
                * **api_key** (*str*) -- The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
                * **dois** (*list of str, optional*) -- The DOIs of the articles to be processed.
                * **piis** (*list of str, optional*) -- The PIIs of the articles to be processed.
                * **urls** (*list of str, optional*) -- The URLs of the articles to be processed.
                * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
                * **update** (*bool, optional*) -- If True, the articles will be updated in the database if they already exist. Default is False.

   :returns: The processed article information is stored in the database.
   :rtype: None
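
   How a batch run might honor the ``update`` flag can be sketched as a loop
   over identifiers. The helper and its callbacks are hypothetical, and
   skipping articles that already exist when ``update`` is False is an
   assumption; this reference only states what happens when ``update`` is
   True.

   .. code-block:: python

      def sketch_process_many(identifiers, exists, process_one, update=False):
          """Process each identifier, skipping stored ones unless update is set.

          ``exists`` and ``process_one`` are caller-supplied callables that
          stand in for a database check and a single-article pipeline.
          """
          processed, skipped = [], []
          for identifier in identifiers:
              if exists(identifier) and not update:
                  skipped.append(identifier)
                  continue
              process_one(identifier)
              processed.append(identifier)
          return processed, skipped


      # Placeholder identifiers; a set and a list stand in for the database
      # and the processing step.
      already_stored = {"PMC0000001"}
      handled = []
      batch = ["PMC0000001", "PMC0000002"]

      first_run = sketch_process_many(
          batch, already_stored.__contains__, handled.append
      )
      forced_run = sketch_process_many(
          batch, already_stored.__contains__, handled.append, update=True
      )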


.. py:function:: process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False)

   Process a PubMed Central article, summarize the article using the model, and store the information in the database.

   :Parameters: * **database** (*str*) -- The database to store the processed article information.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to process the article.
                * **pmc_id** (*str*) -- The PMC ID of the article to be processed.
                * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
                * **update** (*bool, optional*) -- If True, the article will be updated in the database if it already exists. Default is False.

   :returns: The processed article information is stored in the database.
   :rtype: None


.. py:function:: process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False)

   Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.

   :Parameters: * **database** (*str*) -- The database to store the processed article information.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to process the articles.
                * **pmc_ids** (*list of str*) -- The PMC IDs of the articles to be processed.
                * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
                * **update** (*bool, optional*) -- If True, the articles will be updated in the database if they already exist. Default is False.

   :returns: The processed article information is stored in the database.
   :rtype: None


.. py:function:: summarize_article_segments(fulltext, tokenizer, model)

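   Summarize a scientific article into bullet points and a concise summary.

   :Parameters: * **fulltext** (*list of str*) -- A list of text chunks from a scientific article.
                * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
                * **model** (*transformers.PreTrainedModel*) -- The model to use to summarize the article.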

   :returns: * **final_summary** (*str*) -- A concise summary of the scientific article.
             * **bullet_points** (*str*) -- Bullet points summarizing the scientific article.
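
   One common pattern for producing both outputs from a list of chunks is to
   summarize each chunk into a bullet point and then summarize the bullet
   points again. The sketch below shows that pattern with a toy stand-in for
   the model call. Whether ``summarize_article_segments`` works this way is
   not stated in this reference, and ``sketch_summarize_segments`` and
   ``first_sentence`` are hypothetical names.

   .. code-block:: python

      def sketch_summarize_segments(chunks, summarize):
          """Two-pass summary: one bullet per chunk, then a summary of bullets.

          ``summarize`` is any callable mapping a string to a shorter string;
          in a real pipeline it would wrap a tokenizer and a model.
          """
          bullet_lines = [f"- {summarize(chunk)}" for chunk in chunks]
          bullet_points = "\n".join(bullet_lines)
          final_summary = summarize(bullet_points)
          return final_summary, bullet_points


      def first_sentence(text):
          """Toy stand-in for a model call: keep text up to the first period."""
          return text.split(".")[0].strip() + "."


      demo_chunks = [
          "Peptides fold. More detail follows.",
          "Docking is fast. Extra detail follows.",
      ]
      demo_summary, demo_bullets = sketch_summarize_segments(
          demo_chunks, first_sentence
      )

   Passing the summarizer in as a callable keeps the two-pass structure
   separate from any particular model, which makes the flow easy to exercise
   without loading one.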


.. py:function:: summarize_article_meta(fulltext, tokenizer, model)


.. py:function:: score_texts_peptide_research(texts_to_score, summary, bullet_points, metadata, tokenizer, model)