peptidedigest.article_processing
Functions to process and analyze articles using the model.
Module Contents
Functions
| process_scidir_article | Process a ScienceDirect article, summarize the article using the model, and store the information in the database. |
| process_multiple_scidir_articles | Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database. |
| process_pmc_article | Process a PubMed Central article, summarize the article using the model, and store the information in the database. |
| process_multiple_pmc_articles | Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database. |
- peptidedigest.article_processing.process_scidir_article(database, tokenizer, model, api_key, doi=None, pii=None, url=None, chunk_size=4200, update=False)
Process a ScienceDirect article, summarize the article using the model, and store the information in the database.
- Parameters:
database (str) – The database in which to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the article.
api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
doi (str, optional) – The DOI of the article to be processed.
pii (str, optional) – The PII of the article to be processed.
url (str, optional) – The URL of the article to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False.
- Returns:
The processed article information is stored in the database.
- Return type:
None
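The chunk_size parameter indicates that the full text is split into pieces before it is summarized. A minimal sketch of that idea follows, assuming chunk_size counts characters (the library may count tokens instead). The helper split_into_chunks is a hypothetical name for illustration and is not part of peptidedigest.

```python
def split_into_chunks(text, chunk_size=4200):
    """Split text into consecutive pieces of at most chunk_size characters.

    Assumption: chunk_size is measured in characters. The real library
    may measure it differently (for example, in tokens).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the text in strides of chunk_size; the last piece may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Placeholder text standing in for an article's full text.
full_text = "x" * 10000
chunks = split_into_chunks(full_text)
chunk_lengths = [len(chunk) for chunk in chunks]
print(chunk_lengths)
```

Under this assumption, a smaller chunk_size produces more and shorter pieces, so the model is called more often per article.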
- peptidedigest.article_processing.process_multiple_scidir_articles(database, tokenizer, model, api_key, dois=None, piis=None, urls=None, chunk_size=4200, update=False)
Process multiple ScienceDirect articles, summarize the articles using the model, and store the information in the database.
- Parameters:
database (str) – The database in which to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the articles.
api_key (str) – The API key for the ScienceDirect API. API keys can be obtained by creating an account at https://dev.elsevier.com/.
dois (list of str, optional) – The DOIs of the articles to be processed.
piis (list of str, optional) – The PIIs of the articles to be processed.
urls (list of str, optional) – The URLs of the articles to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False.
- Returns:
The information for the processed articles is stored in the database.
- Return type:
None
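This section does not say whether one failing article stops a batch. Callers who need per-article error reporting can loop over a single-article function themselves. The sketch below shows that caller-side pattern. The names process_each and fake_process_one are hypothetical, the DOIs are placeholders, and the stand-in replaces the real call so the example runs without an API key or a model.

```python
def process_each(process_one, identifiers, **shared_kwargs):
    """Call process_one once per identifier and collect failures.

    Returns a dict mapping each failed identifier to its error message.
    This is a caller-side pattern, not the library's documented behaviour.
    """
    failures = {}
    for identifier in identifiers:
        try:
            process_one(identifier, **shared_kwargs)
        except Exception as exc:
            # Record the problem and continue with the next identifier.
            failures[identifier] = str(exc)
    return failures


# Hypothetical stand-in for a single-article call; it only records its arguments.
processed = []


def fake_process_one(doi, update=False):
    if not doi.startswith("10."):
        raise ValueError("not a DOI: " + doi)
    processed.append((doi, update))


# Placeholder identifiers, not real articles.
failures = process_each(
    fake_process_one,
    ["10.0000/example-a", "bad-id", "10.0000/example-b"],
    update=True,
)
print(processed)
print(failures)
```

To use the pattern with the real function, process_one could be a small wrapper that passes each identifier to process_scidir_article together with the database, tokenizer, model, and api_key arguments.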
- peptidedigest.article_processing.process_pmc_article(database, tokenizer, model, pmc_id, chunk_size=4200, update=False)
Process a PubMed Central article, summarize the article using the model, and store the information in the database.
- Parameters:
database (str) – The database in which to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the article.
pmc_id (str) – The PMC ID of the article to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the article will be updated in the database if it already exists. Default is False.
- Returns:
The processed article information is stored in the database.
- Return type:
None
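The update flag decides what happens when an article is already in the database. The following sketch illustrates that skip-or-overwrite behaviour with an in-memory SQLite database. The use of SQLite, the table layout, and the store_summary helper are assumptions for illustration; the library's actual schema is not documented in this section.

```python
import sqlite3


def store_summary(conn, article_id, summary, update=False):
    """Store a summary; overwrite an existing row only when update is True.

    Returns True if the database was written to, False if the row was left alone.
    Hypothetical helper: the table and column names are illustrative only.
    """
    row = conn.execute(
        "SELECT 1 FROM articles WHERE article_id = ?", (article_id,)
    ).fetchone()
    if row is not None and not update:
        # The article already exists and no update was requested: skip it.
        return False
    conn.execute(
        "INSERT OR REPLACE INTO articles (article_id, summary) VALUES (?, ?)",
        (article_id, summary),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (article_id TEXT PRIMARY KEY, summary TEXT)")

# "PMC0000000" is a placeholder ID, not a real article.
first = store_summary(conn, "PMC0000000", "first summary")
second = store_summary(conn, "PMC0000000", "second summary")
third = store_summary(conn, "PMC0000000", "third summary", update=True)
stored = conn.execute(
    "SELECT summary FROM articles WHERE article_id = ?", ("PMC0000000",)
).fetchone()[0]
print(first, second, third, stored)
```

In this sketch the second call leaves the existing row untouched, and only the call with update=True replaces it.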
- peptidedigest.article_processing.process_multiple_pmc_articles(database, tokenizer, model, pmc_ids, chunk_size=4200, update=False)
Process multiple PubMed Central articles, summarize the articles using the model, and store the information in the database.
- Parameters:
database (str) – The database in which to store the processed article information.
tokenizer (transformers.PreTrainedTokenizer) – The tokenizer to use for the model.
model (transformers.PreTrainedModel) – The model to use to process the articles.
pmc_ids (list of str) – The PMC IDs of the articles to be processed.
chunk_size (int, optional) – The size of the chunks to split the full text into. Default is 4200.
update (bool, optional) – If True, the articles will be updated in the database if they already exist. Default is False.
- Returns:
The information for the processed articles is stored in the database.
- Return type:
None