`peptidedigest.clean_text`

Functions to clean text data.

Module Contents

Functions

`split_into_chunks`(text, chunk_size)	Splits a given text into chunks of approximately 'chunk_size' words.
`clean_summary`(summary_text)	Cleans a summary text by removing unwanted patterns and phrases.
`extract_metadata`(metadata_text)	Extracts peptides, proteins, domains of interest, chemistry discussed, biology discussed, and computational methods discussed from the model metadata text.

peptidedigest.clean_text.split_into_chunks(text, chunk_size)

Splits a given text into chunks of approximately ‘chunk_size’ words.

Parameters:

text (str) – The text to split into chunks.
chunk_size (int) – The approximate number of words to include in each chunk.

Returns:

chunks – A list of text chunks, each containing approximately ‘chunk_size’ words.

Return type:

list of str
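The behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: `split_into_chunks_sketch` is a hypothetical stand-in that assumes words are whitespace-delimited and that every chunk except possibly the last holds exactly `chunk_size` words. The real function says "approximately", so it may break chunks differently (for example at sentence boundaries).

```python
from __future__ import annotations


def split_into_chunks_sketch(text: str, chunk_size: int) -> list[str]:
    """Hypothetical stand-in for split_into_chunks (assumption, not the real code)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Split on any whitespace, then regroup the words chunk_size at a time.
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


chunks = split_into_chunks_sketch("one two three four five", 2)
print(chunks)  # ['one two', 'three four', 'five']
```

With the actual package installed, the equivalent call would presumably be `peptidedigest.clean_text.split_into_chunks(text, chunk_size)`, using the signature documented above.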

peptidedigest.clean_text.clean_summary(summary_text)

Cleans a summary text by removing unwanted patterns and phrases.

Parameters:

summary_text (str) – The summary text to clean.

Returns:

cleaned_summary – The cleaned summary text.

Return type:

str
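The documentation does not list which patterns and phrases are removed, so the following is only a sketch of the general approach. Both the function name `clean_summary_sketch` and the entries in `_UNWANTED_PATTERNS` are assumptions made for illustration; the patterns the library actually strips may differ.

```python
import re

# Hypothetical patterns, chosen for illustration only (not taken from peptidedigest).
_UNWANTED_PATTERNS = [
    r"^\s*summary\s*:\s*",                      # a leading "Summary:" label
    r"here is (?:a|the) summary[^.:]*[.:]\s*",  # a boilerplate lead-in phrase
]


def clean_summary_sketch(summary_text: str) -> str:
    """Hypothetical stand-in for clean_summary (assumption, not the real code)."""
    cleaned = summary_text
    for pattern in _UNWANTED_PATTERNS:
        cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", cleaned).strip()


cleaned_summary = clean_summary_sketch("Summary:  The peptide binds\n the receptor.")
print(cleaned_summary)  # The peptide binds the receptor.
```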

peptidedigest.clean_text.extract_metadata(metadata_text)

Extracts peptides, proteins, domains of interest, chemistry discussed, biology discussed, and computational methods discussed from the model metadata text.

Parameters:

metadata_text (str) – The model metadata text to be parsed.

Returns:

A dictionary containing the extracted metadata as lists.

Return type:

dict
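Neither the layout of the model metadata text nor the keys of the returned dictionary are documented here, so the sketch below rests entirely on assumptions: it supposes one `Label: item, item` line per category, and the labels and dictionary keys in `FIELDS` are hypothetical. It is meant only to show the shape of the result, a dictionary whose values are lists.

```python
# Hypothetical dictionary keys mapped to hypothetical labels in the input text.
FIELDS = {
    "peptides": "Peptides",
    "proteins": "Proteins",
    "domains": "Domains of interest",
    "chemistry": "Chemistry discussed",
    "biology": "Biology discussed",
    "computational_methods": "Computational methods discussed",
}


def extract_metadata_sketch(metadata_text: str) -> dict:
    """Hypothetical stand-in for extract_metadata (assumption, not the real code)."""
    result = {key: [] for key in FIELDS}
    labels = {label.lower(): key for key, label in FIELDS.items()}
    for line in metadata_text.splitlines():
        # Split "Label: a, b" into the label and its comma-separated values.
        label, sep, values = line.partition(":")
        key = labels.get(label.strip().lower())
        if sep and key is not None:
            result[key] = [v.strip() for v in values.split(",") if v.strip()]
    return result


metadata = extract_metadata_sketch(
    "Peptides: GLP-1, oxytocin\nProteins: GLP1R\nChemistry discussed: stapling"
)
print(metadata["peptides"])  # ['GLP-1', 'oxytocin']
```

Categories that do not appear in the input are returned as empty lists in this sketch; whether the library does the same is not stated in the documentation.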
