peptidedigest
LLM-based summarization of scientific articles related to computational peptide research.
Submodules
Package Contents
Functions
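| create_database(name) | Create a SQLite database with the given name. |
| get_article(database, doi=None, pmc_id=None) | Get the article information and model responses for a given DOI or PMC ID. |
| get_articles(database) | Get all articles from the database. |
| check_article_exists(database, value, column) | Check if an article with the given value in the specified column exists in the database. |
| delete_article(database, doi=None, pmc_id=None) | Delete an article and its model responses from the database. |
| insert_article(database, article_info, model_responses=None) | Insert an article and its model responses into the database. |
| update_article(database, doi, model_responses) | Update the model responses for an article in the database. |
| process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False) | Process a ScienceDirect article, summarize the article using the model, and store the information in the database. |
| process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False) | Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database. |
| process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False) | Process a PubMed Central article, summarize the article using the model, and store the information in the database. |
| process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False) | Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database. |
| summarize_article_segments(fulltext, tokenizer, model) | Summarize a scientific article into bullet points and a concise summary. |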
- peptidedigest.create_database(name)
Create a SQLite database with the given name.
- Parameters:
name (str) – The name of the database to create.
- Returns:
The database is created in the current working directory.
- Return type:
None
- peptidedigest.get_article(database, doi=None, pmc_id=None)
Get the article information and model responses for a given DOI or PMC ID.
- Parameters:
database (str) – The name of the database to retrieve the article from.
doi (str, optional) – The DOI of the article to retrieve.
pmc_id (str, optional) – The PMC ID of the article to retrieve.
- Returns:
A dictionary containing the article information and model responses.
- Return type:
dict
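A lookup that accepts either identifier can be sketched with the standard-library sqlite3 module. This is only an illustration: the table name `articles`, its columns, and the helper `fetch_article` are hypothetical, and the schema that peptidedigest actually uses is not documented here.

```python
import sqlite3


def fetch_article(conn, doi=None, pmc_id=None):
    """Return one article row as a dict, or None if nothing matches.

    Hypothetical helper: the `articles` table and its columns are assumed.
    """
    if doi is None and pmc_id is None:
        raise ValueError("Provide a DOI or a PMC ID.")
    # Pick the lookup column from the identifier that was supplied.
    column, value = ("doi", doi) if doi is not None else ("pmc_id", pmc_id)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    row = conn.execute(
        f"SELECT * FROM articles WHERE {column} = ?", (value,)
    ).fetchone()
    return dict(row) if row is not None else None


# Demo on an in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (doi TEXT, pmc_id TEXT, title TEXT, summary TEXT)")
conn.execute(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    ("10.1000/example", "PMC0000001", "Example title", "Example summary"),
)
article = fetch_article(conn, pmc_id="PMC0000001")
print(article["title"])
```

The column name is chosen from two fixed strings inside the helper, so only the identifier value is ever bound from caller input.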
- peptidedigest.get_articles(database)
Get all articles from the database.
- Parameters:
database (str) – The name of the database to retrieve the articles from.
- Returns:
A list of dictionaries containing the article information and model responses.
- Return type:
list
- peptidedigest.check_article_exists(database, value, column)
Check if an article with the given value in the specified column exists in the database.
- Parameters:
database (str) – The name of the database to check for the article.
value (str) – The value to check for in the specified column.
column (str) – The column to check for the value.
- Returns:
True if the article exists, False otherwise.
- Return type:
bool
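Because the column is passed as an argument, an existence check of this kind has to treat it differently from the value: SQL placeholders bind values, not identifiers. The sketch below validates the column against an allow-list before building the query. The helper `article_exists`, the `articles` table, and the `ALLOWED_COLUMNS` set are assumptions made for illustration, not the package's implementation.

```python
import sqlite3

# Assumed set of searchable columns; the real schema may differ.
ALLOWED_COLUMNS = {"doi", "pmc_id", "title"}


def article_exists(conn, value, column):
    """Return True if any row in the assumed `articles` table has column == value."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"Unsupported column: {column!r}")
    # The column name is interpolated only after validation; the value is bound.
    row = conn.execute(
        f"SELECT 1 FROM articles WHERE {column} = ? LIMIT 1", (value,)
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (doi TEXT, pmc_id TEXT, title TEXT)")
conn.execute("INSERT INTO articles VALUES ('10.1000/example', 'PMC0000001', 'Example')")

found = article_exists(conn, "10.1000/example", "doi")
missing = article_exists(conn, "10.1000/other", "doi")
```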
- peptidedigest.delete_article(database, doi=None, pmc_id=None)
Delete an article and its model responses from the database.
- Parameters:
database (str) – The name of the database to delete the article from.
doi (str, optional) – The DOI of the article to delete.
pmc_id (str, optional) – The PMC ID of the article to delete.
- Returns:
The article and model responses are deleted from the database.
- Return type:
None
- peptidedigest.insert_article(database, article_info, model_responses=None)
Insert an article and its model responses into the database.
- Parameters:
database (str) – The name of the database to insert the article into.
article_info (dict) – A dictionary containing the article information.
model_responses (dict, optional) – A dictionary containing the model responses for the article.
- Returns:
The article and model responses are inserted into the database.
- Return type:
None
- peptidedigest.update_article(database, doi, model_responses)
Update the model responses for an article in the database.
- Parameters:
database (str) – The name of the database to update the article in.
doi (str) – The DOI of the article to update.
model_responses (dict) – A dictionary containing the updated model responses.
- Returns:
The model responses for the article are updated in the database.
- Return type:
None
- peptidedigest.process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False)
Process a ScienceDirect article, summarize the article using the model, and store the information in the database.
- Parameters:
database (str) – The database to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the article.
api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str, optional) – The DOI of the article to be processed.
pii (str, optional) – The PII of the article to be processed.
url (str, optional) – The URL of the article to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False.
- Returns:
The processed article information is stored in the database.
- Return type:
None
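The chunk_size parameter controls how the full text is split before summarization. The documentation does not say whether the size is counted in characters or tokens, so the sketch below assumes characters and splits on whitespace so that words stay intact. The helper name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text, chunk_size=4200):
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters.

    Assumption: chunk_size counts characters. A single word longer than
    chunk_size is kept whole in its own chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks, current, length = [], [], 0
    for word in text.split():
        # +1 accounts for the joining space when the chunk is non-empty.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > chunk_size:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = split_into_chunks("alpha beta gamma delta", chunk_size=11)
print(chunks)  # ['alpha beta', 'gamma delta']
```

A smaller chunk_size yields more, shorter chunks, which generally means more model calls per article.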
- peptidedigest.process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False)
Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.
- Parameters:
database (str) – The database to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the articles.
api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
dois (list of str, optional) – The DOIs of the articles to be processed.
piis (list of str, optional) – The PIIs of the articles to be processed.
urls (list of str, optional) – The URLs of the articles to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False.
- Returns:
The information for each processed article is stored in the database.
- Return type:
None
- peptidedigest.process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False)
Process a PubMed Central article, summarize the article using the model, and store the information in the database.
- Parameters:
database (str) – The database to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the article.
pmc_id (str) – The PMC ID of the article to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False.
- Returns:
The processed article information is stored in the database.
- Return type:
None
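The effect of the update flag can be illustrated with a small skip-or-overwrite routine. This is a sketch under stated assumptions: the helper `store_summary`, the `articles` table, and its two columns are hypothetical, and the package may organize this logic differently.

```python
import sqlite3


def store_summary(conn, pmc_id, summary, update=False):
    """Insert a summary, or overwrite an existing one only when update=True.

    Returns "inserted", "updated", or "skipped". Table and column names are assumed.
    """
    exists = conn.execute(
        "SELECT 1 FROM articles WHERE pmc_id = ?", (pmc_id,)
    ).fetchone() is not None
    if exists and not update:
        return "skipped"  # leave the stored summary untouched
    if exists:
        conn.execute(
            "UPDATE articles SET summary = ? WHERE pmc_id = ?", (summary, pmc_id)
        )
        conn.commit()
        return "updated"
    conn.execute(
        "INSERT INTO articles (pmc_id, summary) VALUES (?, ?)", (pmc_id, summary)
    )
    conn.commit()
    return "inserted"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (pmc_id TEXT PRIMARY KEY, summary TEXT)")
results = [
    store_summary(conn, "PMC0000001", "first"),
    store_summary(conn, "PMC0000001", "second"),
    store_summary(conn, "PMC0000001", "third", update=True),
]
stored = conn.execute("SELECT summary FROM articles").fetchone()[0]
print(results, stored)  # ['inserted', 'skipped', 'updated'] third
```

In a summarization pipeline, running the existence check before invoking the model would avoid an expensive model call for articles that end up skipped.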
- peptidedigest.process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False)
Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.
- Parameters:
database (str) – The database to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the articles.
pmc_ids (list of str) – The PMC IDs of the articles to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False.
- Returns:
The information for each processed article is stored in the database.
- Return type:
None
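Batch processing of this kind amounts to applying a single-article routine to each identifier in turn. The sketch below shows one way to do that while recording failures instead of stopping at the first one. Whether peptidedigest continues past a failed article is not documented; `process_many` and `fake_process` are hypothetical names used only for illustration.

```python
def process_many(ids, process_one):
    """Call process_one for each ID; return (succeeded, failed) without aborting on errors."""
    succeeded, failed = [], {}
    for article_id in ids:
        try:
            process_one(article_id)
        except Exception as exc:  # broad on purpose: keep the batch going
            failed[article_id] = str(exc)
        else:
            succeeded.append(article_id)
    return succeeded, failed


def fake_process(pmc_id):
    """Stand-in for a single-article processing call; rejects malformed IDs."""
    if not pmc_id.startswith("PMC"):
        raise ValueError(f"not a PMC ID: {pmc_id}")


succeeded, failed = process_many(["PMC0000001", "12345", "PMC0000002"], fake_process)
print(succeeded)  # ['PMC0000001', 'PMC0000002']
print(failed)     # {'12345': 'not a PMC ID: 12345'}
```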
- peptidedigest.summarize_article_segments(fulltext, tokenizer, model)
Summarize a scientific article into bullet points and a concise summary.
- Parameters:
fulltext (list of str) – A list of text chunks from a scientific article.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to summarize the article.
- Returns:
final_summary (str) – A concise summary of the scientific article.
bullet_points (str) – Bullet points summarizing the scientific article.
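Summarizing a list of chunks into bullet points plus one concise summary is commonly done in two passes: summarize each chunk, then condense the partial summaries. The sketch below shows that pattern with a plain callable standing in for the tokenizer and model. How peptidedigest actually prompts the model and derives the two outputs is not documented, so `summarize_segments` and `first_sentence` are hypothetical.

```python
def summarize_segments(chunks, summarize):
    """Summarize each chunk, then condense the partial summaries.

    `summarize` is any callable str -> str; here it stands in for a
    tokenizer/model pair. Returns (final_summary, bullet_points).
    """
    partials = [summarize(chunk) for chunk in chunks]
    bullet_points = "\n".join(f"- {p}" for p in partials)
    final_summary = summarize(" ".join(partials))
    return final_summary, bullet_points


def first_sentence(text):
    """Toy summarizer: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


final_summary, bullet_points = summarize_segments(
    ["Peptides fold. Details follow.", "Models predict binding. More text."],
    first_sentence,
)
print(bullet_points)
print(final_summary)
```

Passing the summarizer as a callable keeps the two-pass structure separate from any particular model, so the sketch runs without downloading model weights.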