metadata
title: AI Class Tutor -- Dev
description: An LLM based AI class tutor with RAG on DL4DS course
emoji: 🐢
colorFrom: red
colorTo: green
sdk: docker
app_port: 7860

DL4DS Tutor 🏃

Check out the configuration reference at Hugging Face Spaces Config Reference.

A "production" version of the Tutor is running live at DL4DS Tutor from the Hugging Face Space. It is deployed automatically by this Actions Workflow on every push to the main branch of this repo.

A "development" version of the Tutor is running live at DL4DS Tutor -- Dev from this Hugging Face Space. It is deployed automatically by this Actions Workflow on every push to the dev_branch branch of this repo.

Running Locally

Please view docs/setup.md for more information on setting up the project.

  1. Clone the Repository

    git clone https://github.com/DL4DS/dl4ds_tutor
    
  2. Put your data under the storage/data directory

    • Add URLs to the urls.txt file.
    • Add any PDF files to the storage/data directory.
  3. Test Data Loading (Optional)

    cd code
    python -m modules.dataloader.data_loader
    
  4. Create the Vector Database

    cd code
    python -m modules.vectorstore.store_manager
    
    • Note: Rerun the above command whenever you add new data to the storage/data directory or update the storage/data/urls.txt file.
  5. Run the Chainlit App

    chainlit run main.py
    

See the docs for more information.
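
The note in step 4 says the vector database must be rebuilt whenever the data changes. A minimal sketch of one way to check for that, assuming the script is run from the repository root and that comparing file modification times is a good enough signal; the helper names `newest_mtime` and `vector_db_is_stale` are hypothetical and not part of this repo:

```python
from pathlib import Path


def newest_mtime(directory):
    """Return the latest modification time of any file under directory, or None."""
    root = Path(directory)
    if not root.is_dir():
        return None
    times = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    return max(times) if times else None


def vector_db_is_stale(data_dir="storage/data", store_dir="vectorstores"):
    """Return True if data_dir holds files newer than everything in store_dir.

    Assumption: the directory names follow the File Structure section of this
    README; the project itself may track changes differently.
    """
    data_time = newest_mtime(data_dir)
    store_time = newest_mtime(store_dir)
    if data_time is None:
        return False  # no data files, so nothing to index
    if store_time is None:
        return True  # data exists but no vector database has been written yet
    return data_time > store_time


if __name__ == "__main__":
    if vector_db_is_stale():
        print("Data changed: rerun python -m modules.vectorstore.store_manager from code/")
    else:
        print("Vector database looks up to date")
```

Modification times are only a heuristic: deleting a file from storage/data does not update any timestamp that this check reads, so a rebuild may still be needed in that case.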

File Structure

code/
 ├── modules
 │   ├── chat                # Contains the chatbot implementation
 │   ├── chat_processor      # Contains the implementation to process and log the conversations
 │   ├── config              # Contains the configuration files
 │   ├── dataloader          # Contains the implementation to load the data from the storage directory
 │   ├── retriever           # Contains the implementation to create the retriever
 │   └── vectorstore         # Contains the implementation to create the vector database
 ├── public
 │   ├── logo_dark.png       # Dark theme logo
 │   ├── logo_light.png      # Light theme logo
 │   └── test.css            # Custom CSS file
 └── main.py

 
docs/                        # Contains the documentation to the codebase and methods used

storage/
 ├── data                    # Store files and URLs here
 ├── logs                    # Logs directory, includes logs on vector DB creation, tutor logs, and chunks logged in JSON files
 └── models                  # Local LLMs are loaded from here

vectorstores/                # Stores the created vector databases

.env                         # Must be created manually; stores the API keys
  • code/modules/vectorstore/vectorstore.py: Instantiates the VectorStore class to create the vector database.
  • code/modules/vectorstore/store_manager.py: Instantiates the VectorStoreManager class to manage the vector database, and all associated methods.
  • code/modules/retriever/retriever.py: Instantiates the Retriever class to create the retriever.
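
The .env file listed above holds the API keys. This README does not say how the project reads it, so the following is only a sketch of one standard-library way to load simple KEY=VALUE lines into the environment; the helper name `load_env_file` is hypothetical, and the project may rely on a different loader.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and copy them into os.environ.

    Hypothetical helper, not part of this repo. Returns the parsed pairs.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        # Variables already set in the environment win over the file.
        os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    names = sorted(load_env_file())
    # Print only the variable names so secrets never reach the terminal.
    print("Loaded variables:", ", ".join(names) if names else "(none)")
```

Using `setdefault` keeps values exported in the shell (or set as Space secrets) from being overwritten by the local file.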

Docker

The Hugging Face Space is built using the Dockerfile in the repository. To run the app locally, build and run the image from Dockerfile.dev instead.

docker build --tag dev -f Dockerfile.dev .
docker run -it --rm -p 8000:8000 dev

Contributing

Please create an issue if you have any suggestions or improvements. To work on it, create a branch and open a pull request against the main branch.

Please view docs/contribute.md for more information on contributing.