---
title: AI Class Tutor
description: An LLM based AI class tutor with RAG on DL4DS course
emoji: 🐢
colorFrom: red
colorTo: green
sdk: docker
app_port: 7860
---
# DL4DS Tutor

Check out the configuration reference at [Hugging Face Spaces Config Reference](https://huggingface.co/docs/hub/spaces-config-reference).

A live deployment of the Tutor is available at [DL4DS Tutor on Hugging Face](https://dl4ds-dl4ds-tutor.hf.space/); the corresponding Space is hosted on Hugging Face [here](https://huggingface.co/spaces/dl4ds/dl4ds_tutor).

## Running Locally

Please view `docs/setup.md` for more information on setting up the project.

1. **Clone the Repository**
   ```bash
   git clone https://github.com/DL4DS/dl4ds_tutor
   ```

2. **Put your data under the `storage/data` directory**
   - Add URLs to the `storage/data/urls.txt` file.
   - Add PDF files to the `storage/data` directory.

3. **Test Data Loading (Optional)**
   ```bash
   cd code
   python -m modules.dataloader.data_loader
   ```

4. **Create the Vector Database**
   ```bash
   cd code
   python -m modules.vectorstore.store_manager
   ```
   - Note: Rerun the above command whenever you add new data to the `storage/data` directory or update the `storage/data/urls.txt` file.

5. **Run the Chainlit App**
   ```bash
   chainlit run main.py  # run from the code/ directory, where main.py lives
   ```
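
Before building the vector database in step 4, it can help to confirm what is actually sitting in `storage/data`. The snippet below is a minimal sketch of such a check, not part of the repository: `summarize_data_dir` is a hypothetical helper, and treating lines that start with `#` in `urls.txt` as comments is an assumption rather than documented behavior of the project's data loader.

```python
from pathlib import Path


def summarize_data_dir(data_dir):
    """Return (urls, pdf_names) found directly under data_dir.

    Hypothetical helper for illustration; the project's own loader
    lives in code/modules/dataloader and may behave differently.
    """
    data_dir = Path(data_dir)
    urls_file = data_dir / "urls.txt"
    urls = []
    if urls_file.is_file():
        for line in urls_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines; skipping '#' lines is an assumption.
            if line and not line.startswith("#"):
                urls.append(line)
    # Only top-level PDF files are listed, sorted by name.
    pdfs = sorted(p.name for p in data_dir.glob("*.pdf"))
    return urls, pdfs


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves.
    urls, pdfs = summarize_data_dir("storage/data")
    print(f"{len(urls)} URL(s) and {len(pdfs)} PDF file(s) found")
```

If the script reports zero URLs and zero PDFs, the vector database would likely be built from an empty corpus, so it is worth fixing the data directory first.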

See the [docs](https://github.com/DL4DS/dl4ds_tutor/tree/main/docs) for more information.

## File Structure

```plaintext
code/
 ├── modules
 │   ├── chat                # Contains the chatbot implementation
 │   ├── chat_processor      # Contains the implementation to process and log the conversations
 │   ├── config              # Contains the configuration files
 │   ├── dataloader          # Contains the implementation to load the data from the storage directory
 │   ├── retriever           # Contains the implementation to create the retriever
 │   └── vectorstore         # Contains the implementation to create the vector database
 ├── public
 │   ├── logo_dark.png       # Dark theme logo
 │   ├── logo_light.png      # Light theme logo
 │   └── test.css            # Custom CSS file
 └── main.py

docs/                        # Contains the documentation to the codebase and methods used

storage/
 ├── data                    # Store files and URLs here
 ├── logs                    # Logs directory, includes logs on vector DB creation, tutor logs, and chunks logged in JSON files
 └── models                  # Local LLMs are loaded from here

vectorstores/                # Stores the created vector databases

.env                         # This needs to be created, store the API keys here
```
- `code/modules/vectorstore/vectorstore.py`: Defines the `VectorStore` class, which creates the vector database.
- `code/modules/vectorstore/store_manager.py`: Defines the `VectorStoreManager` class, which manages the vector database, along with all associated methods.
- `code/modules/retriever/retriever.py`: Defines the `Retriever` class, which creates the retriever.
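
The `.env` file listed in the tree above has to be created by hand and holds the API keys. As a rough illustration of what such a file looks like to a program, here is a standard-library sketch that parses `KEY=VALUE` lines and exports them as environment variables. `load_env_file` is a hypothetical helper, and the key name in the usage comment is a placeholder; the README does not say which keys the project expects or how it loads them, so check `docs/setup.md` for the real names.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from path and export them via os.environ.

    Illustrative sketch only; variables already set in the
    environment are left untouched.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Ignore blank lines, comments, and lines without an '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop optional surrounding quotes from the value.
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


# Usage (placeholder key name, not taken from the project):
#   a .env file containing  EXAMPLE_API_KEY="abc123"
#   makes load_env_file() return {"EXAMPLE_API_KEY": "abc123"}
```

Keeping keys in `.env` rather than in source files means they stay out of version control, provided `.env` is excluded from commits.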


## Docker 

The Hugging Face Space is built from the `Dockerfile` in the repository. To run the app locally in Docker, use the `Dockerfile.dev` file instead.

```bash
docker build --tag dev -f Dockerfile.dev .
docker run -it --rm -p 8000:8000 dev
```

## Contributing

If you have suggestions or improvements, please open an issue. To work on it, create a branch and open a pull request against the `main` branch.

Please view `docs/contribute.md` for more information on contributing.