`peptidedigest`#

An LLM-based tool for summarizing scientific articles on computational peptide research.

Submodules#

Package Contents#

Functions#

`create_database`(name)	Create a SQLite database with the given name.
`get_article`(database[, doi, pmc_id])	Get the article information and model responses for a given DOI or PMC ID.
`get_articles`(database)	Get all articles from the database.
`check_article_exists`(database, value, column)	Check if an article with the given value in the specified column exists in the database.
`delete_article`(database[, doi, pmc_id])	Delete an article and its model responses from the database.
`insert_article`(database, article_info[, model_responses])	Insert an article and its model responses into the database.
`update_article`(database, doi, model_responses)	Update the model responses for an article in the database.
`process_scidir_article`(database, tokenizer, model, api_key)	Process a ScienceDirect article, summarize the article using the model, and store the information in the database.
`process_multiple_scidir_articles`(database, tokenizer, ...)	Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.
`process_pmc_article`(database, tokenizer, model, pmc_id)	Process a PubMed Central article, summarize the article using the model, and store the information in the database.
`process_multiple_pmc_articles`(database, tokenizer, ...)	Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.
`summarize_article_segments`(fulltext, tokenizer, model)	Summarize a scientific article into bullet points and a concise summary.
`summarize_article_meta`(fulltext, tokenizer, model)
`score_texts_peptide_research`(texts_to_score, summary, ...)

peptidedigest.create_database(name)[source]#

Create a SQLite database with the given name.

Parameters:

name (str) – The name of the database to create.

Returns:

The database is created in the current working directory.

Return type:

None
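
As a rough illustration of what creating a SQLite database by name involves, here is a minimal sketch using Python's standard `sqlite3` module. It is not the package's implementation: the function name `create_database_sketch`, the `articles` table, and its columns are all hypothetical, and whether the real function appends a file extension to `name` is not stated in this reference.

```python
import os
import sqlite3


def create_database_sketch(name):
    """Create (or open) a SQLite file called `name` in the current directory."""
    # sqlite3.connect creates the file if it does not exist yet.
    conn = sqlite3.connect(name)
    try:
        # Hypothetical schema: the real table and column names are assumptions.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles ("
            "doi TEXT, pmc_id TEXT, title TEXT)"
        )
        conn.commit()
    finally:
        conn.close()
    # Return the absolute path only so the caller can see where the file landed.
    return os.path.abspath(name)
```

Because the path is relative, the file ends up in whatever directory the process is running from, which matches the "current working directory" note above.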

peptidedigest.get_article(database, doi=None, pmc_id=None)[source]#

Get the article information and model responses for a given DOI or PMC ID.

Parameters:

database (str) – The name of the database to retrieve the article from.
doi (str) – The DOI of the article to retrieve.
pmc_id (str) – The PMC ID of the article to retrieve.

Returns:

A dictionary containing the article information and model responses.

Return type:

dict
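
The lookup by either identifier can be pictured with the sketch below. It is an illustration under stated assumptions, not the package's code: it takes an open `sqlite3` connection instead of a database name, the `articles` table and its columns are hypothetical, and requiring exactly one of `doi` or `pmc_id` is a design choice made for this example only.

```python
import sqlite3


def get_article_sketch(conn, doi=None, pmc_id=None):
    """Return one article row as a dict, looked up by DOI or PMC ID."""
    # Assumption for this sketch: exactly one identifier must be supplied.
    if (doi is None) == (pmc_id is None):
        raise ValueError("Provide exactly one of doi or pmc_id")
    # The column name comes from a fixed pair, so the f-string below is safe.
    column, value = ("doi", doi) if doi is not None else ("pmc_id", pmc_id)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    row = conn.execute(
        f"SELECT * FROM articles WHERE {column} = ?", (value,)
    ).fetchone()
    return dict(row) if row is not None else None
```

Returning `None` for a missing article is also an assumption; the reference does not say what the real function does in that case.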

peptidedigest.get_articles(database)[source]#

Get all articles from the database.

Parameters:

database (str) – The name of the database to retrieve the articles from.

Returns:

A list of dictionaries containing the article information and model responses.

Return type:

list

peptidedigest.check_article_exists(database, value, column)[source]#

Check if an article with the given value in the specified column exists in the database.

Parameters:

database (str) – The name of the database to check for the article.
value (str) – The value to check for in the specified column.
column (str) – The column to check for the value.

Returns:

True if the article exists, False otherwise.

Return type:

bool
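
An existence check keyed on a caller-supplied column could look like the sketch below. SQL placeholders cannot stand in for column names, so this example validates `column` against a fixed set before building the query. The table name, the allowed columns, and the use of an open connection in place of a database name are assumptions made for illustration; the package's actual behavior may differ.

```python
import sqlite3

# Hypothetical column names; the real schema is not documented in this reference.
ALLOWED_COLUMNS = {"doi", "pmc_id", "title"}


def article_exists_sketch(conn, value, column):
    """Return True if any row in `articles` has `value` in `column`."""
    # Reject unknown columns before interpolating the name into the SQL text.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"Unknown column: {column!r}")
    cursor = conn.execute(
        f"SELECT 1 FROM articles WHERE {column} = ? LIMIT 1", (value,)
    )
    return cursor.fetchone() is not None
```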

peptidedigest.delete_article(database, doi=None, pmc_id=None)[source]#

Delete an article and its model responses from the database.

Parameters:

database (str) – The name of the database to delete the article from.
doi (str) – The DOI of the article to delete.
pmc_id (str) – The PMC ID of the article to delete.

Returns:

The article and model responses are deleted from the database.

Return type:

None

peptidedigest.insert_article(database, article_info, model_responses=None)[source]#

Insert an article and its model responses into the database.

Parameters:

database (str) – The name of the database to insert the article into.
article_info (dict) – A dictionary containing the article information.
model_responses (dict) – A dictionary containing the model responses for the article.

Returns:

The article and model responses are inserted into the database.

Return type:

None

peptidedigest.update_article(database, doi, model_responses)[source]#

Update the model responses for an article in the database.

Parameters:

database (str) – The name of the database to update the article in.
doi (str) – The DOI of the article to update.
model_responses (dict) – A dictionary containing the updated model responses.

Returns:

The model responses for the article are updated in the database.

Return type:

None

peptidedigest.process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False)[source]#

Process a ScienceDirect article, summarize the article using the model, and store the information in the database.

Parameters:

database (str) – The database to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the article.
api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str, optional) – The DOI of the article to be processed.
pii (str, optional) – The PII of the article to be processed.
url (str, optional) – The URL of the article to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False.

Returns:

The processed article information is stored in the database.

Return type:

None
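
To make the `chunk_size` parameter concrete, the sketch below splits a text into chunks no longer than a given size without cutting words. It assumes the size is measured in characters; this reference does not say whether the package counts characters or tokens, and the function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text, chunk_size=4200):
    """Split `text` on whitespace into chunks of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # A word costs its own length plus one joining space if the chunk is non-empty.
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > chunk_size:
            # The word does not fit: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        chunks.append(" ".join(current))
    # Note: a single word longer than chunk_size becomes its own oversized chunk.
    return chunks
```

The resulting list of strings has the same shape as the `fulltext` argument that `summarize_article_segments` accepts.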

peptidedigest.process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False)[source]#

Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.

Parameters:

database (str) – The database to store the processed articles information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the articles.
api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
dois (list of str, optional) – The DOIs of the articles to be processed.
piis (list of str, optional) – The PIIs of the articles to be processed.
urls (list of str, optional) – The URLs of the articles to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False.

Returns:

The processed articles information is stored in the database.

Return type:

None

peptidedigest.process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False)[source]#

Process a PubMed Central article, summarize the article using the model, and store the information in the database.

Parameters:

database (str) – The database to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the article.
pmc_id (str) – The PMC ID of the article to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False.

Returns:

The processed article information is stored in the database.

Return type:

None

peptidedigest.process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False)[source]#

Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.

Parameters:

database (str) – The database to store the processed articles information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the articles.
pmc_ids (list of str) – The PMC IDs of the articles to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False.

Returns:

The processed articles information is stored in the database.

Return type:

None

peptidedigest.summarize_article_segments(fulltext, tokenizer, model)[source]#

Summarize a scientific article into bullet points and a concise summary.

Parameters:

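fulltext (list of str) – A list of text chunks from a scientific article.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to summarize the article.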

Returns:

final_summary (str) – A concise summary of the scientific article.
bullet_points (str) – Bullet points summarizing the scientific article.
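
One common way to turn a list of chunks into bullet points plus a single summary is a two-stage "summarize each chunk, then summarize the summaries" pass. The sketch below shows that pattern with a pluggable `summarize` callable. This reference does not describe the algorithm the package uses, so treat this purely as an assumption; `summarize_segments_sketch` and the toy `first_sentence` stand-in are hypothetical names, and no real model is called.

```python
def summarize_segments_sketch(fulltext, summarize):
    """Return (final_summary, bullet_points) for a list of text chunks."""
    # First stage: condense each chunk independently.
    chunk_points = [summarize(chunk) for chunk in fulltext]
    bullet_points = "\n".join(f"- {point}" for point in chunk_points)
    # Second stage: condense the combined per-chunk results into one summary.
    final_summary = summarize(" ".join(chunk_points))
    return final_summary, bullet_points


def first_sentence(text):
    """Toy stand-in for a model call: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."
```

In practice `summarize` would wrap a tokenizer and model call; keeping it as a parameter lets the control flow be exercised without loading a model.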

peptidedigest.summarize_article_meta(fulltext, tokenizer, model)[source]#

peptidedigest.score_texts_peptide_research(texts_to_score, summary, bullet_points, metadata, tokenizer, model)[source]#
