peptidedigest
LLM for summarization of scientific articles related to computational peptides.
Submodules
Package Contents
Functions
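| create_database(name) | Create a SQLite database with the given name. |
| get_article(database, doi=None, pmc_id=None) | Get the article information and model responses for a given DOI or PMC ID. |
| get_articles(database) | Get all articles from the database. |
| check_article_exists(database, value, column) | Check if an article with the given value in the specified column exists in the database. |
| delete_article(database, doi=None, pmc_id=None) | Delete an article and its model responses from the database. |
| insert_article(database, article_info, model_responses=None) | Insert an article and its model responses into the database. |
| update_article(database, doi, model_responses) | Update the model responses for an article in the database. |
| process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False) | Process a ScienceDirect article, summarize the article using the model, and store the information in the database. |
| process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False) | Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database. |
| process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False) | Process a PubMed Central article, summarize the article using the model, and store the information in the database. |
| process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False) | Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database. |
| summarize_article_segments(fulltext, tokenizer, model) | Summarize a scientific article into bullet points and a concise summary. |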
- peptidedigest.create_database(name)
- Create a SQLite database with the given name.
- Parameters:
- name (str) – The name of the database to create. 
- Returns:
- The database is created in the current working directory. 
- Return type:
- None 
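
As a rough illustration of what creating such a database can involve, the sketch below uses Python's standard `sqlite3` module. The function name `create_database_sketch`, the `articles` table, and its columns are assumptions made for this example; they are not taken from the package and its real schema may differ.

```python
import sqlite3


def create_database_sketch(name):
    """Create a SQLite file containing one hypothetical `articles` table."""
    # sqlite3.connect creates the file if it does not exist yet
    conn = sqlite3.connect(name)
    try:
        # Hypothetical schema: identifiers plus the stored model responses
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS articles (
                id INTEGER PRIMARY KEY,
                doi TEXT UNIQUE,
                pmc_id TEXT UNIQUE,
                title TEXT,
                summary TEXT,
                bullet_points TEXT
            )
            """
        )
        conn.commit()
    finally:
        conn.close()


def list_tables(name):
    """Return the names of all tables in the database file."""
    conn = sqlite3.connect(name)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Because the statement uses `CREATE TABLE IF NOT EXISTS`, calling the sketch twice on the same file is harmless.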
 
- peptidedigest.get_article(database, doi=None, pmc_id=None)
- Get the article information and model responses for the article identified by a DOI or PMC ID.
- Parameters:
- database (str) – The name of the database to retrieve the article from. 
- doi (str) – The DOI of the article to retrieve. 
- pmc_id (str) – The PMC ID of the article to retrieve. 
 
- Returns:
- A dictionary containing the article information and model responses. 
- Return type:
- dict 
 
- peptidedigest.get_articles(database)
- Get all articles from the database.
- Parameters:
- database (str) – The name of the database to retrieve the articles from. 
- Returns:
- A list of dictionaries containing the article information and model responses. 
- Return type:
- list 
 
- peptidedigest.check_article_exists(database, value, column)
- Check if an article with the given value in the specified column exists in the database.
- Parameters:
- database (str) – The name of the database to check for the article. 
- value (str) – The value to check for in the specified column. 
- column (str) – The column to check for the value. 
 
- Returns:
- True if the article exists, False otherwise. 
- Return type:
- bool 
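
An existence check that takes a column name has to treat the column differently from the value, because SQL placeholders can bind values but not identifiers. The sketch below shows one way to handle that with `sqlite3`. The name `article_exists_sketch`, the `articles` table, and the `ALLOWED_COLUMNS` set are assumptions for illustration, and the sketch takes an open connection rather than a database name so it can be tried against an in-memory database.

```python
import sqlite3

# Hypothetical set of searchable columns; the package's real schema may differ.
ALLOWED_COLUMNS = {"doi", "pmc_id", "title"}


def article_exists_sketch(conn, value, column):
    """Return True if any row in `articles` has `value` in `column`."""
    # Column names cannot be bound as parameters, so check them against
    # a fixed set before interpolating into the query text.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"Unsupported column: {column!r}")
    query = f"SELECT 1 FROM articles WHERE {column} = ? LIMIT 1"
    # The value itself is bound with a placeholder
    return conn.execute(query, (value,)).fetchone() is not None
```

Rejecting unknown column names keeps user-supplied text out of the SQL string, while the value still goes through normal parameter binding.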
 
- peptidedigest.delete_article(database, doi=None, pmc_id=None)
- Delete an article and its model responses from the database.
- Parameters:
- database (str) – The name of the database to delete the article from. 
- doi (str) – The DOI of the article to delete. 
- pmc_id (str) – The PMC ID of the article to delete. 
 
- Returns:
- The article and model responses are deleted from the database. 
- Return type:
- None 
 
- peptidedigest.insert_article(database, article_info, model_responses=None)
- Insert an article and its model responses into the database.
- Parameters:
- database (str) – The name of the database to insert the article into. 
- article_info (dict) – A dictionary containing the article information. 
- model_responses (dict) – A dictionary containing the model responses for the article. 
 
- Returns:
- The article and model responses are inserted into the database. 
- Return type:
- None 
 
- peptidedigest.update_article(database, doi, model_responses)
- Update the model responses for an article in the database.
- Parameters:
- database (str) – The name of the database to update the article in. 
- doi (str) – The DOI of the article to update. 
- model_responses (dict) – A dictionary containing the updated model responses. 
 
- Returns:
- The model responses for the article are updated in the database. 
- Return type:
- None 
 
- peptidedigest.process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False)
- Process a ScienceDirect article, summarize the article using the model, and store the information in the database.
- Parameters:
- database (str) – The database to store the processed article information. 
- tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model. 
- model (transformers.PreTrainedModel) – The model to use to process the article. 
- api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/. 
- doi (str, optional) – The DOI of the article to be processed. 
- pii (str, optional) – The PII of the article to be processed. 
- url (str, optional) – The URL of the article to be processed. 
- chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200. 
- update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False. 
 
- Returns:
- The processed article information is stored in the database. 
- Return type:
- None 
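
The `chunk_size` parameter controls how the full text is split before summarization. The sketch below shows one plausible splitting strategy. It assumes `chunk_size` counts characters and that chunks break on whitespace; the documentation above does not say whether the package counts characters or tokens, and `split_into_chunks` is a hypothetical helper, not a function of the package.

```python
def split_into_chunks(text, chunk_size=4200):
    """Split `text` into chunks of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # Hard-split any single word that is longer than a whole chunk
        while len(word) > chunk_size:
            if current:
                chunks.append(" ".join(current))
                current, current_len = [], 0
            chunks.append(word[:chunk_size])
            word = word[chunk_size:]
        # The +1 accounts for the joining space in a non-empty chunk
        extra = len(word) + (1 if current else 0)
        if current_len + extra > chunk_size:
            # Current chunk is full: close it and start a new one
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Under these assumptions, a smaller `chunk_size` produces more, shorter chunks, which means more model calls per article but less text per call.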
 
- peptidedigest.process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False)
- Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.
- Parameters:
- database (str) – The database to store the processed article information. 
- tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model. 
- model (transformers.PreTrainedModel) – The model to use to process the articles. 
- api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/. 
- dois (list of str, optional) – The DOIs of the articles to be processed. 
- piis (list of str, optional) – The PIIs of the articles to be processed. 
- urls (list of str, optional) – The URLs of the articles to be processed. 
- chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200. 
- update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False. 
 
- Returns:
- The processed article information is stored in the database. 
- Return type:
- None 
 
- peptidedigest.process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False)
- Process a PubMed Central article, summarize the article using the model, and store the information in the database.
- Parameters:
- database (str) – The database to store the processed article information. 
- tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model. 
- model (transformers.PreTrainedModel) – The model to use to process the article. 
- pmc_id (str) – The PMC ID of the article to be processed. 
- chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200. 
- update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False. 
 
- Returns:
- The processed article information is stored in the database. 
- Return type:
- None 
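
The `update` flag only matters when the article is already in the database. The sketch below spells out one reading of that behavior as a small decision function. The name `decide_action` is hypothetical, and skipping an existing article when `update` is False is an assumption; the documentation above only states what happens when `update` is True.

```python
def decide_action(exists, update=False):
    """Return which database action to take for an article."""
    if not exists:
        # New article: always insert
        return "insert"
    # Existing article: refresh only when explicitly requested (assumed)
    return "update" if update else "skip"
```

A caller could combine such a check with `check_article_exists` before running the model, which avoids summarizing an article whose result would be discarded.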
 
- peptidedigest.process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False)
- Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.
- Parameters:
- database (str) – The database to store the processed article information. 
- tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model. 
- model (transformers.PreTrainedModel) – The model to use to process the articles. 
- pmc_ids (list of str) – The PMC IDs of the articles to be processed. 
- chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200. 
- update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False. 
 
- Returns:
- The processed article information is stored in the database. 
- Return type:
- None 
 
- peptidedigest.summarize_article_segments(fulltext, tokenizer, model)
- Summarize a scientific article into bullet points and a concise summary.
- Parameters:
- fulltext (list of str) – A list of text chunks from a scientific article. 
- tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model. 
- model (transformers.PreTrainedModel) – The model to use to summarize the article. 
- Returns:
- final_summary (str) – A concise summary of the scientific article. 
- bullet_points (str) – Bullet points summarizing the scientific article. 
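
The documentation does not say how the chunk summaries are combined. One common pattern for summarizing a list of chunks is to summarize each chunk separately and then summarize the combined results. The sketch below illustrates that pattern only; it is not the package's implementation. `summarize_segments_sketch` and `first_sentence` are hypothetical names, and `summarize_fn` stands in for whatever call the tokenizer and model would make.

```python
def summarize_segments_sketch(chunks, summarize_fn):
    """Summarize each chunk, then summarize the combined partial summaries."""
    # Map step: one partial summary per non-empty chunk
    partial = [summarize_fn(chunk) for chunk in chunks if chunk.strip()]
    # In this sketch the bullet points are the partial summaries, one per line
    bullet_points = "\n".join(f"- {p}" for p in partial)
    # Reduce step: condense the partial summaries into one final summary
    final_summary = summarize_fn(" ".join(partial)) if partial else ""
    return final_summary, bullet_points


def first_sentence(text):
    """Toy stand-in for a model call: keep the text up to the first period."""
    head, _, _ = text.partition(".")
    head = head.strip()
    return head + "." if head else ""
```

Passing the summarizer in as a callable keeps the control flow testable without loading a model; a real callable would wrap tokenization, generation, and decoding.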
 
 
