peptidedigest.clean_text
#
Functions to clean text data.
Module Contents#
Functions#
|
Splits a given text into chunks of approximately 'chunk_size' words. |
|
Cleans a summary text by removing unwanted patterns and phrases. |
|
Extract peptides, proteins, domains of interest, chemistry discussed, |
- peptidedigest.clean_text.split_into_chunks(text, chunk_size)[source]#
Splits a given text into chunks of approximately ‘chunk_size’ words.
- Parameters:
text (str) – The text to split into chunks.
chunk_size (int) – The approximate number of words to include in each chunk.
- Returns:
chunks – A list of text chunks, each containing approximately ‘chunk_size’ words.
- Return type:
list of str
- peptidedigest.clean_text.clean_summary(summary_text)[source]#
Cleans a summary text by removing unwanted patterns and phrases.
- Parameters:
summary_text (str) – The summary text to clean.
- Returns:
cleaned_summary – The cleaned summary text.
- Return type:
str
- peptidedigest.clean_text.extract_metadata(metadata_text)[source]#
Extract peptides, proteins, domains of interest, chemistry discussed, biology discussed, and computational methods discussed from the model metadata text.
- Parameters:
metadata_text (str) – The model metadata text to be parsed.
- Returns:
A dictionary containing the extracted metadata as lists.
- Return type:
dict