PeptideDigest Setup
===================

Importing the Necessary Packages
--------------------------------

First, we need to load the following packages: PeptideDigest and `ScienceScraper`_.
ScienceScraper prepares clean input data for PeptideDigest and searches ScienceDirect
and PMC for articles of interest. It is installed as a dependency of PeptideDigest,
so you do not need to install it separately.

In a Jupyter notebook or Python shell, run the following:

.. tab-set-code::

   .. code-block:: python

      import peptidedigest as pd
      import sciencescraper as ss

Initializing the Model
----------------------

Next, let's load the tokenizer and model for PeptideDigest. Given the path to the
model, we can load both:

.. tab-set-code::

   .. code-block:: python

      from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
      import torch

      model_path = "path/to/model"
      tokenizer = AutoTokenizer.from_pretrained(model_path)

      # BitsAndBytesConfig holds quantization options (e.g. load_in_8bit=True)
      model = AutoModelForCausalLM.from_pretrained(
          model_path, quantization_config=BitsAndBytesConfig()
      )

      # Move the model to the GPU; skip this step if quantization is enabled,
      # since quantized weights are placed on the GPU during loading
      model = model.to("cuda")

Create the database
-------------------

PeptideDigest uses a SQLite database to store both the article data scraped from
ScienceDirect and PMC and the output generated by the model. Because generating
model output takes time, storing it in a database lets us retrieve article
information and summaries later without regenerating them.

To create a database, we can use the :func:`peptidedigest.create_database` function.
It takes the path to the database file and creates a new database at that location.
If the file already exists, it will not be overwritten.

.. tab-set-code::

   .. code-block:: python

      db_path = "path/to/article_database.db"
      pd.create_database(db_path)

This creates a new database called ``article_database.db`` at the specified path.
To see the database schema, please refer to `Database <./database.html>`_.

Now that everything is set up, we can start scraping articles and generating summaries!
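
Verifying the setup
-------------------

Before scraping, it can be helpful to confirm that the model responds to a prompt.
The snippet below is a minimal sketch rather than part of the PeptideDigest API:
the prompt text and generation settings are placeholders, and it assumes the
``tokenizer`` and ``model`` objects created above are available.

.. tab-set-code::

   .. code-block:: python

      import torch

      # Illustrative prompt; any short text works for a smoke test
      prompt = "Peptide therapeutics are"
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

      with torch.no_grad():
          output_ids = model.generate(**inputs, max_new_tokens=20)

      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))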
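
Similarly, we can check that :func:`peptidedigest.create_database` created the
database file and its tables using Python's built-in :mod:`sqlite3` module. The
exact table names depend on the PeptideDigest schema described on the
`Database <./database.html>`_ page; this sketch simply lists whatever tables exist.

.. tab-set-code::

   .. code-block:: python

      import sqlite3

      # List the tables in the newly created database
      with sqlite3.connect(db_path) as conn:
          tables = conn.execute(
              "SELECT name FROM sqlite_master WHERE type='table'"
          ).fetchall()

      print([name for (name,) in tables])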