File size: 3,500 Bytes
7a233a3
b9be4de
7a233a3
b9be4de
7a233a3
db6b619
7a233a3
849b2ae
7a233a3
 
 
 
db6b619
7a233a3
 
 
db6b619
9a7da99
 
 
 
 
 
 
7a233a3
 
 
 
 
 
db6b619
9a7da99
7a233a3
 
 
db6b619
 
849b2ae
7a233a3
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f1da7ee
 
 
7f989d6
 
 
 
 
 
 
 
da88571
7f989d6
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# DL4DS Tutor πŸƒ

Check out the configuration reference at [Hugging Face Spaces Config Reference](https://huggingface.co/docs/hub/spaces-config-reference).

You can find an implementation of the Tutor at [DL4DS Tutor on Hugging Face](https://dl4ds-dl4ds-tutor.hf.space/), which is hosted on Hugging Face [here](https://huggingface.co/spaces/dl4ds/dl4ds_tutor).

## Running Locally

1. **Clone the Repository**
   ```bash
   git clone https://github.com/DL4DS/dl4ds_tutor
   ```

2. **Put your data under the `storage/data` directory**
   - Add URLs in the `urls.txt` file.
   - Add other PDF files in the `storage/data` directory.

3. **To test Data Loading (Optional)**
   ```bash
   cd code
   python -m modules.dataloader.data_loader
   ```

4. **Create the Vector Database**
   ```bash
   cd code
   python -m modules.vectorstore.store_manager
   ```
   - Note: You need to run the above command when you add new data to the `storage/data` directory, or if the `storage/data/urls.txt` file is updated.
   - Alternatively, you can set `["vectorstore"]["embedd_files"]` to `True` in the `code/modules/config/config.yaml` file, which will embed files from the storage directory every time you run the below chainlit command.

5. **Run the Chainlit App**
   ```bash
   chainlit run main.py
   ```

See the [docs](https://github.com/DL4DS/dl4ds_tutor/tree/main/docs) for more information.

## File Structure

```plaintext
code/
 β”œβ”€β”€ modules
 β”‚   β”œβ”€β”€ chat                # Contains the chatbot implementation
 β”‚   β”œβ”€β”€ chat_processor      # Contains the implementation to process and log the conversations
 β”‚   β”œβ”€β”€ config              # Contains the configuration files
 β”‚   β”œβ”€β”€ dataloader          # Contains the implementation to load the data from the storage directory
 β”‚   β”œβ”€β”€ retriever           # Contains the implementation to create the retriever
 β”‚   └── vectorstore         # Contains the implementation to create the vector database
 β”œβ”€β”€ public
 β”‚   β”œβ”€β”€ logo_dark.png       # Dark theme logo
 β”‚   β”œβ”€β”€ logo_light.png      # Light theme logo
 β”‚   └── test.css            # Custom CSS file
 └── main.py

 
docs/                        # Contains the documentation to the codebase and methods used

storage/
 β”œβ”€β”€ data                    # Store files and URLs here
 β”œβ”€β”€ logs                    # Logs directory, includes logs on vector DB creation, tutor logs, and chunks logged in JSON files
 └── models                  # Local LLMs are loaded from here

vectorstores/                # Stores the created vector databases

.env                         # This needs to be created, store the API keys here
```
- `code/modules/vectorstore/vectorstore.py`: Instantiates the `VectorStore` class to create the vector database.
- `code/modules/vectorstore/store_manager.py`: Instantiates the `VectorStoreManager:` class to manage the vector database, and all associated methods.
- `code/modules/retriever/retriever.py`: Instantiates the `Retriever` class to create the retriever.


## Docker 

The HuggingFace Space is built using the `Dockerfile` in the repository. To run it locally, use the `Dockerfile.dev` file.

```bash
docker build --tag dev  -f Dockerfile.dev .
docker run -it --rm -p 8000:8000 dev
```

## Contributing

Please create an issue if you have any suggestions or improvements, and start working on it by creating a branch and by making a pull request to the main branch.