`peptidedigest.article_processing`

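Functions to process ScienceDirect and PubMed Central articles: each function summarizes an article with the supplied model and stores the result in a database.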

Module Contents

Functions

`process_scidir_article`(database, tokenizer, model, api_key)	Process a ScienceDirect article, summarize the article using the model, and store the information in the database.
`process_multiple_scidir_articles`(database, tokenizer, ...)	Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.
`process_pmc_article`(database, tokenizer, model, pmc_id)	Process a PubMed Central article, summarize the article using the model, and store the information in the database.
`process_multiple_pmc_articles`(database, tokenizer, ...)	Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.

peptidedigest.article_processing.process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False)

Process a ScienceDirect article, summarize the article using the model, and store the information in the database.

Parameters:

database (str) – The database in which to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the article.
api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str, optional) – The DOI of the article to be processed.
pii (str, optional) – The PII of the article to be processed.
url (str, optional) – The URL of the article to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False.

Returns:

The processed article information is stored in the database.

Return type:

None
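
A minimal calling sketch follows. The source does not say how `doi`, `pii`, and `url` interact, so the hypothetical helper `scidir_identifier` takes the conservative route of passing exactly one of them; this is a caller-side convention, not documented library behaviour. The model name, the database value, the DOI, the environment variable name, and the choice of `AutoModelForCausalLM` are all placeholders or assumptions.

```python
import os


def scidir_identifier(doi=None, pii=None, url=None):
    """Return a one-item dict holding the single identifier that was supplied.

    Hypothetical helper: requiring exactly one identifier is an assumption,
    not something the peptidedigest documentation states.
    """
    supplied = {
        name: value
        for name, value in (("doi", doi), ("pii", pii), ("url", url))
        if value is not None
    }
    if len(supplied) != 1:
        raise ValueError("supply exactly one of doi, pii, or url")
    return supplied


def main():
    # Imports live inside main() so the helper above can be used and tested
    # without transformers or peptidedigest installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peptidedigest.article_processing import process_scidir_article

    model_name = "your-model-name"  # placeholder, not a real checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: a causal language model; use the class that fits your model.
    model = AutoModelForCausalLM.from_pretrained(model_name)

    process_scidir_article(
        "articles.db",                   # placeholder database value
        tokenizer,
        model,
        os.environ["ELSEVIER_API_KEY"],  # placeholder variable name
        **scidir_identifier(doi="10.1016/placeholder"),
    )
```

Calling `main()` runs the full pipeline; the function returns `None`, so any result has to be read back from the database.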

peptidedigest.article_processing.process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False)

Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.

Parameters:

database (str) – The database in which to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the articles.
api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
dois (list of str, optional) – The DOIs of the articles to be processed.
piis (list of str, optional) – The PIIs of the articles to be processed.
urls (list of str, optional) – The URLs of the articles to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False.

Returns:

The processed article information is stored in the database.

Return type:

None

peptidedigest.article_processing.process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False)

Process a PubMed Central article, summarize the article using the model, and store the information in the database.

Parameters:

database (str) – The database in which to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the article.
pmc_id (str) – The PMC ID of the article to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False.

Returns:

The processed article information is stored in the database.

Return type:

None
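
To illustrate what `chunk_size` controls, here is a minimal sketch of splitting text into fixed-size pieces. It assumes the size is counted in characters and that chunks do not overlap; the source specifies neither. `split_into_chunks` is a hypothetical name, not the library's implementation.

```python
def split_into_chunks(text, chunk_size=4200):
    """Split text into consecutive pieces of at most chunk_size characters.

    Assumption: character-based, non-overlapping chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        text[start:start + chunk_size]
        for start in range(0, len(text), chunk_size)
    ]


full_text = "x" * 10000
chunks = split_into_chunks(full_text)
print([len(chunk) for chunk in chunks])  # [4200, 4200, 1600]
```

Under these assumptions, a smaller `chunk_size` produces more pieces for the same article.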

peptidedigest.article_processing.process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False)

Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.

Parameters:

database (str) – The database in which to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the articles.
pmc_ids (list of str) – The PMC IDs of the articles to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False.

Returns:

The processed article information is stored in the database.

Return type:

None
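
When one failing article should not stop a batch, a caller can loop over the single-article function instead. The sketch below is a hypothetical wrapper, not how `process_multiple_pmc_articles` is implemented (the source does not describe its error handling). `process_one` stands for any callable with the signature of `process_pmc_article`, and `fake_process` is a stand-in used only to make the example runnable.

```python
def process_each(process_one, database, tokenizer, model, pmc_ids,
                 chunk_size=4200, update=False):
    """Call process_one for every ID; return a dict of failed IDs and their errors."""
    failures = {}
    for pmc_id in pmc_ids:
        try:
            process_one(database, tokenizer, model, pmc_id,
                        chunk_size=chunk_size, update=update)
        except Exception as exc:  # broad on purpose: keep the batch going
            failures[pmc_id] = exc
    return failures


def fake_process(database, tokenizer, model, pmc_id, chunk_size=4200, update=False):
    """Stand-in for process_pmc_article that rejects IDs without a 'PMC' prefix."""
    if not pmc_id.startswith("PMC"):
        raise ValueError(f"unexpected ID: {pmc_id}")


failed = process_each(fake_process, "articles.db", None, None,
                      ["PMC0000001", "bad-id"])
print(sorted(failed))  # ['bad-id']
```

In real use, `process_pmc_article` would be passed as `process_one` together with a loaded tokenizer and model.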
