PeptideDigest Setup
===================

Importing the Necessary Packages
--------------------------------

First, we need to load the following packages: PeptideDigest and `ScienceScraper`_.
ScienceScraper prepares clean input data for PeptideDigest and searches ScienceDirect
and PMC for articles of interest. It is installed as a dependency of PeptideDigest,
so you do not need to install it separately.

In a Jupyter notebook or Python shell, run the following:

.. tab-set-code::

   .. code-block:: python

      import peptidedigest as pd
      import sciencescraper as ss

Initializing the Model
----------------------

Next, let's load the tokenizer and model for PeptideDigest. Given the path to the
model, we can load both:

.. tab-set-code::

   .. code-block:: python

      from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
      import torch

      model_path = "path/to/model"
      tokenizer = AutoTokenizer.from_pretrained(model_path)

      # BitsAndBytesConfig holds quantization options (e.g. load_in_8bit=True)
      model = AutoModelForCausalLM.from_pretrained(
          model_path, quantization_config=BitsAndBytesConfig()
      )

      # Move the model to the GPU; skip this step if quantization is enabled,
      # since quantized weights are placed on the GPU during loading
      model = model.to("cuda")

Create the database
-------------------

PeptideDigest uses a SQLite database to store both the article data scraped from
ScienceDirect and PMC and the output generated by the model. Because generating
model output takes time, storing it in a database lets us retrieve article
information and summaries later without regenerating them.

To create a database, we can use the :func:`peptidedigest.create_database` function.
It takes the path to the database file and creates a new database at that location.
If the file already exists, it will not be overwritten.

.. tab-set-code::

   .. code-block:: python

      db_path = "path/to/article_database.db"
      pd.create_database(db_path)

This creates a new database called ``article_database.db`` at the specified path.
To see the database schema, please refer to `Database <./database.html>`_.

Now that everything is set up, we can start scraping articles and generating summaries!
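
Verifying the setup
-------------------

Before scraping, it can be helpful to confirm that the model responds to a prompt.
The snippet below is a minimal sketch rather than part of the PeptideDigest API:
the prompt text and generation settings are placeholders, and it assumes the
``tokenizer`` and ``model`` objects created above are available.

.. tab-set-code::

   .. code-block:: python

      import torch

      # Illustrative prompt; any short text works for a smoke test
      prompt = "Peptide therapeutics are"
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

      with torch.no_grad():
          output_ids = model.generate(**inputs, max_new_tokens=20)

      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))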
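
Similarly, we can check that :func:`peptidedigest.create_database` created the
database file and its tables using Python's built-in :mod:`sqlite3` module. The
exact table names depend on the PeptideDigest schema described on the
`Database <./database.html>`_ page; this sketch simply lists whatever tables exist.

.. tab-set-code::

   .. code-block:: python

      import sqlite3

      # List the tables in the newly created database
      with sqlite3.connect(db_path) as conn:
          tables = conn.execute(
              "SELECT name FROM sqlite_master WHERE type='table'"
          ).fetchall()

      print([name for (name,) in tables])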