# DocumentIQA: Scientific Document Insight QA
## Introduction
Question answering on scientific documents using LLMs (OpenAI, Mistral, Llama 2, etc.).
This application is the frontend for testing the RAG (Retrieval-Augmented Generation) approach on scientific documents that we are developing at NIMS.
Unlike most projects, we focus on scientific articles and use [Grobid](https://github.com/kermitt2/grobid) for text extraction instead of a raw PDF-to-text converter. The raw converter, which is comparable to what most other solutions use, can extract only the full text.
**Work in progress**
Demo: https://document-insights.streamlit.app/
## Getting started
- Select the model+embedding combination you want to use (for Llama 2, you must accept its license on both meta.com and Hugging Face; see [here](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)).
- Enter your API key (OpenAI or Hugging Face).
- Upload a scientific article as a PDF document. A spinner or loading indicator is shown while processing is in progress.
- Once the spinner stops, you can start asking questions.
![screenshot1.png](docs%2Fimages%2Fscreenshot1.png)
### Options
#### Context size
Changes the number of embedded text chunks that are considered when generating a response. Each text chunk is around 250 tokens, which results in around 1000 tokens being used for each question.
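The arithmetic behind these figures can be sketched as follows. This is only a rough estimate: the function name is hypothetical, and the value of four chunks is an inference from the 250 and 1000 token figures above, not a documented default.

```python
# Approximate chunk size, taken from the description above.
APPROX_TOKENS_PER_CHUNK = 250


def estimated_context_tokens(context_size: int,
                             tokens_per_chunk: int = APPROX_TOKENS_PER_CHUNK) -> int:
    """Rough number of retrieved-text tokens sent along with one question.

    Hypothetical helper for illustration only; it ignores the tokens used by
    the question itself and by any prompt template.
    """
    if context_size < 0:
        raise ValueError("context_size must be non-negative")
    return context_size * tokens_per_chunk


# Assumption: four chunks, which matches the ~1000 tokens quoted above.
for size in (2, 4, 8):
    print(f"context size {size}: ~{estimated_context_tokens(size)} tokens")
```

Raising the context size gives the model more of the document to draw on, at the cost of proportionally more tokens per question.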
#### Query mode
By default, the mode is set to LLM (Large Language Model), which enables question answering. You can ask questions about the document content directly, and the system will answer them using content from the document.
If you switch the mode to "Embedding", the system returns the specific chunks from the document that are semantically related to your query. This mode helps to investigate why answers are sometimes unsatisfactory or incomplete.
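What the "Embedding" mode returns can be illustrated with a minimal sketch of similarity-based chunk retrieval. It is not the application's actual code: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, and the function names and sample chunks are made up for the example.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: plain word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_chunks(query: str, chunks: list[str], context_size: int) -> list[str]:
    """Return the `context_size` chunks most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks,
                    key=lambda chunk: cosine(query_vec, toy_embed(chunk)),
                    reverse=True)
    return ranked[:context_size]


# Made-up chunks, standing in for text extracted from an uploaded article.
chunks = [
    "the sample was annealed at 500 K for two hours",
    "superconducting critical temperature was measured at 39 K",
    "we thank our funding agency",
]
query = "critical temperature of the superconducting sample"
for chunk in top_chunks(query, chunks, context_size=2):
    print(chunk)
```

Inspecting the retrieved chunks this way shows whether a poor answer comes from retrieval (the relevant passage was never selected) or from the language model itself.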
## Acknowledgement
This project is developed at the [National Institute for Materials Science](https://www.nims.go.jp) (NIMS) in Japan in collaboration with the [Lambard-ML-Team](https://github.com/Lambard-ML-Team).