:py:mod:`peptidedigest`
=======================

.. py:module:: peptidedigest

.. autoapi-nested-parse::

   LLM for summarization of scientific articles related to computational peptides.


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   article_db/index.rst
   article_processing/index.rst
   clean_text/index.rst
   model_prompts/index.rst


Package Contents
----------------


Functions
~~~~~~~~~

.. autoapisummary::

   peptidedigest.create_database
   peptidedigest.get_article
   peptidedigest.get_articles
   peptidedigest.check_article_exists
   peptidedigest.delete_article
   peptidedigest.insert_article
   peptidedigest.update_article
   peptidedigest.process_scidir_article
   peptidedigest.process_multiple_scidir_articles
   peptidedigest.process_pmc_article
   peptidedigest.process_multiple_pmc_articles
   peptidedigest.summarize_article_segments
   peptidedigest.summarize_article_meta
   peptidedigest.score_texts_peptide_research


.. py:function:: create_database(name)

   Create a SQLite database with the given name.

   :Parameters:

       **name** (*str*) -- The name of the database to create.

   :returns: The database is created in the current working directory.
   :rtype: None


.. py:function:: get_article(database, doi=None, pmc_id=None)

   Get the article information and model responses for a given DOI or PMC ID.

   :Parameters:

       * **database** (*str*) -- The name of the database to retrieve the article from.
       * **doi** (*str, optional*) -- The DOI of the article to retrieve.
       * **pmc_id** (*str, optional*) -- The PMC ID of the article to retrieve.

   :returns: A dictionary containing the article information and model responses.
   :rtype: dict


.. py:function:: get_articles(database)

   Get all articles from the database.

   :Parameters:

       **database** (*str*) -- The name of the database to retrieve the articles from.

   :returns: A list of dictionaries containing the article information and model responses.
   :rtype: list


.. py:function:: check_article_exists(database, value, column)

   Check if an article with the given value in the specified column exists in the database.

   :Parameters:

       * **database** (*str*) -- The name of the database to check for the article.
       * **value** (*str*) -- The value to check for in the specified column.
       * **column** (*str*) -- The column to check for the value.

   :returns: True if the article exists, False otherwise.
   :rtype: bool


.. py:function:: delete_article(database, doi=None, pmc_id=None)

   Delete an article and its model responses from the database.

   :Parameters:

       * **database** (*str*) -- The name of the database to delete the article from.
       * **doi** (*str, optional*) -- The DOI of the article to delete.
       * **pmc_id** (*str, optional*) -- The PMC ID of the article to delete.

   :returns: The article and model responses are deleted from the database.
   :rtype: None


.. py:function:: insert_article(database, article_info, model_responses=None)

   Insert an article and its model responses into the database.

   :Parameters:

       * **database** (*str*) -- The name of the database to insert the article into.
       * **article_info** (*dict*) -- A dictionary containing the article information.
       * **model_responses** (*dict, optional*) -- A dictionary containing the model responses for the article.

   :returns: The article and model responses are inserted into the database.
   :rtype: None


.. py:function:: update_article(database, doi, model_responses)

   Update the model responses for an article in the database.

   :Parameters:

       * **database** (*str*) -- The name of the database to update the article in.
       * **doi** (*str*) -- The DOI of the article to update.
       * **model_responses** (*dict*) -- A dictionary containing the updated model responses.

   :returns: The model responses for the article are updated in the database.
   :rtype: None


.. py:function:: process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False)

   Process a ScienceDirect article, summarize the article using the model, and store the information in the database.

   :Parameters:

       * **database** (*str*) -- The database to store the processed article information.
       * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
       * **model** (*transformers.PreTrainedModel*) -- The model to use to process the article.
       * **api_key** (*str*) -- The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
       * **doi** (*str, optional*) -- The DOI of the article to be processed.
       * **pii** (*str, optional*) -- The PII of the article to be processed.
       * **url** (*str, optional*) -- The URL of the article to be processed.
       * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
       * **update** (*bool, optional*) -- If True, the article will be updated in the database if it already exists. Default is False.

   :returns: The processed article information is stored in the database.
   :rtype: None


.. py:function:: process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False)

   Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.

   :Parameters:

       * **database** (*str*) -- The database to store the processed article information.
       * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
       * **model** (*transformers.PreTrainedModel*) -- The model to use to process the articles.
       * **api_key** (*str*) -- The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
       * **dois** (*list of str, optional*) -- The DOIs of the articles to be processed.
       * **piis** (*list of str, optional*) -- The PIIs of the articles to be processed.
       * **urls** (*list of str, optional*) -- The URLs of the articles to be processed.
       * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
       * **update** (*bool, optional*) -- If True, the articles will be updated in the database if they already exist. Default is False.

   :returns: The processed article information is stored in the database.
   :rtype: None


.. py:function:: process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False)

   Process a PubMed Central article, summarize the article using the model, and store the information in the database.

   :Parameters:

       * **database** (*str*) -- The database to store the processed article information.
       * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
       * **model** (*transformers.PreTrainedModel*) -- The model to use to process the article.
       * **pmc_id** (*str*) -- The PMC ID of the article to be processed.
       * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
       * **update** (*bool, optional*) -- If True, the article will be updated in the database if it already exists. Default is False.

   :returns: The processed article information is stored in the database.
   :rtype: None


.. py:function:: process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False)

   Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.

   :Parameters:

       * **database** (*str*) -- The database to store the processed article information.
       * **tokenizer** (*transformers.PreTrainedTokenizer*) -- The tokenizer to use for the model.
       * **model** (*transformers.PreTrainedModel*) -- The model to use to process the articles.
       * **pmc_ids** (*list of str*) -- The PMC IDs of the articles to be processed.
       * **chunk_size** (*int, optional*) -- The size of the chunks to split the full text into. Default is 4200.
       * **update** (*bool, optional*) -- If True, the articles will be updated in the database if they already exist. Default is False.

   :returns: The processed article information is stored in the database.
   :rtype: None


.. py:function:: summarize_article_segments(fulltext, tokenizer, model)

   Summarize a scientific article into bullet points and a concise summary.

   :Parameters:

       **fulltext** (*list of str*) -- A list of text chunks from a scientific article.

   :returns:

       * **final_summary** (*str*) -- A concise summary of the scientific article.
       * **bullet_points** (*str*) -- Bullet points summarizing the scientific article.


.. py:function:: summarize_article_meta(fulltext, tokenizer, model)


.. py:function:: score_texts_peptide_research(texts_to_score, summary, bullet_points, metadata, tokenizer, model)
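
The ``update`` flag on the ``process_*`` functions and the ``check_article_exists`` lookup suggest a check-then-insert workflow. The following is a minimal sketch of that pattern using only the standard-library ``sqlite3`` module. It is not the package's implementation: the ``articles`` table, its columns, the ``ALLOWED_COLUMNS`` whitelist, and every ``sketch_*`` function are hypothetical names introduced here for illustration, and the real schema may differ.

```python
import sqlite3

# Hypothetical schema and column whitelist: the real peptidedigest
# tables and columns may differ.
ALLOWED_COLUMNS = {"doi", "pmc_id"}


def sketch_create_database(conn):
    """Create a minimal, hypothetical articles table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "doi TEXT, pmc_id TEXT, title TEXT)"
    )


def sketch_check_article_exists(conn, value, column):
    """Return True if any row has `value` in `column`."""
    # Column names cannot be bound as SQL parameters, so validate them
    # against a fixed whitelist before interpolating into the query.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column!r}")
    row = conn.execute(
        f"SELECT 1 FROM articles WHERE {column} = ? LIMIT 1", (value,)
    ).fetchone()
    return row is not None


def sketch_process_article(conn, article_info, update=False):
    """Insert an article; skip an existing DOI unless update is True."""
    doi = article_info["doi"]
    exists = sketch_check_article_exists(conn, doi, "doi")
    if exists and not update:
        return "skipped"
    if exists:
        # Replace the stored row when an update was requested.
        conn.execute("DELETE FROM articles WHERE doi = ?", (doi,))
    conn.execute(
        "INSERT INTO articles (doi, pmc_id, title) VALUES (?, ?, ?)",
        (doi, article_info.get("pmc_id"), article_info["title"]),
    )
    return "updated" if exists else "inserted"


# Placeholder identifiers, not real articles.
conn = sqlite3.connect(":memory:")
sketch_create_database(conn)
article = {"doi": "10.0000/example", "pmc_id": "PMC0000000", "title": "Example"}

first = sketch_process_article(conn, article)
second = sketch_process_article(conn, article)
third = sketch_process_article(conn, article, update=True)
row_count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
print(first, second, third, row_count)
```

In this sketch a repeated call with the default ``update=False`` leaves the stored row untouched, while ``update=True`` replaces it, so the table holds one row per DOI either way.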