<h1 align=center> Contextual RAG </h1>

![anthropic blog post](https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F2496e7c6fedd7ffaa043895c23a4089638b0c21b-3840x2160.png&w=3840&q=75)

This is an approach proposed by Anthropic in a recent [blog post](https://www.anthropic.com/news/contextual-retrieval). It improves retrieval by prepending to each document chunk a short, chunk-specific context that situates the chunk within the whole document.

<h2 align=center> Problems </h2>

As one may gather from the explanation, each chunk must be contextualized with respect to the rest of the document. In the naive approach, the whole document therefore has to be passed into the prompt along with every chunk. There are two problems with this:

1. This would be very expensive in terms of input token count.
2. For models with smaller context windows, the whole document may not fit. (Further, there is a sense in which fitting a whole document into a model's context window defeats the point of performing RAG.)
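
To make the first problem concrete, here is a rough back-of-the-envelope sketch. The function name and the token figures are illustrative assumptions, not measurements from this notebook.

```python
def naive_context_cost(doc_tokens: int, chunk_tokens: int) -> int:
    """Total input tokens when every chunk prompt also carries the full document."""
    # Ceiling division: number of chunks needed to cover the document.
    n_chunks = -(-doc_tokens // chunk_tokens)
    # Each call sends the whole document plus one chunk.
    return n_chunks * (doc_tokens + chunk_tokens)


# Hypothetical figures: a 40,000-token document split into 500-token chunks.
total = naive_context_cost(doc_tokens=40_000, chunk_tokens=500)
print(total)  # 80 calls x 40,500 tokens = 3,240,000 input tokens
```

The cost grows roughly with the square of the document length, since both the number of calls and the size of each call scale with it.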


<h2 align=center> Whole Document Summarization </h2>

The solution I have come up with is to first summarize the document to a more manageable size, and then pass that summary into the prompt in place of the whole document.
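
A minimal sketch of how the summary might then be used, assuming plain string templates. `CONTEXT_TEMPLATE`, `build_context_prompt`, and `contextualize` are hypothetical names introduced for illustration; they are not part of LangChain or of Anthropic's post.

```python
# Hypothetical prompt: the document summary stands in for the full document.
CONTEXT_TEMPLATE = (
    "Document summary:\n{summary}\n\n"
    "Chunk:\n{chunk}\n\n"
    "Write one or two sentences situating this chunk within the document."
)


def build_context_prompt(summary: str, chunk: str) -> str:
    """Build the prompt that asks the model for a chunk-specific context."""
    return CONTEXT_TEMPLATE.format(summary=summary, chunk=chunk)


def contextualize(chunk: str, context: str) -> str:
    """Prepend the generated context to the chunk before embedding or indexing."""
    return f"{context.strip()}\n\n{chunk}"
```

Each chunk prompt now costs roughly the summary length plus the chunk length, rather than the full document length plus the chunk length.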

<h3 align=center> Refine </h3>
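
The refine strategy summarizes the first chunk, then repeatedly updates that running summary with each following chunk. The loop below is a library-free sketch of the idea, not LangChain's implementation; `llm` is assumed to be any callable that maps a prompt string to a completion string, and the prompt wording is made up for illustration.

```python
from typing import Callable, Sequence


def refine_summarize(chunks: Sequence[str], llm: Callable[[str], str]) -> str:
    """Summarize the first chunk, then fold every later chunk into the summary."""
    if not chunks:
        return ""
    # Initial pass: summarize the first chunk on its own.
    summary = llm(f"Summarize the following text.\n\nTEXT: {chunks[0]}\nSUMMARY:")
    # Refine passes: each call sees the running summary plus one new chunk.
    for chunk in chunks[1:]:
        summary = llm(
            f"Existing summary:\n{summary}\n\n"
            f"New text:\n{chunk}\n\n"
            "Update the summary so that it also covers the new text."
        )
    return summary
```

Because every call sees only one chunk and the running summary, the input to each call stays bounded no matter how long the document is. The price is that the calls must run sequentially.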

In [1]:
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain_text_splitters import CharacterTextSplitter
from langchain.document_loaders import PyMuPDFLoader

In [2]:
from langchain.chains.summarize import load_summarize_chain

In [3]:
# from langchain_google_genai import ChatGoogleGenerativeAI
# import os
# from dotenv import load_dotenv

# if not load_dotenv():
#     print("API keys may not have been loaded successfully")
# google_api_key = os.getenv("GOOGLE_API_KEY")

# llm = ChatGoogleGenerativeAI(model="gemini-pro", api_key=google_api_key)

In [4]:
from langchain_ollama.llms import OllamaLLM

# A lightweight model for local inference
llm = OllamaLLM(model="llama3.2:1b-instruct-q4_K_M")

In [5]:
loader = PyMuPDFLoader("data/State Machines.pdf")
docs = loader.load()

In [21]:
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=8000, chunk_overlap=0)
split_docs = text_splitter.split_documents(docs)

In [None]:
output_key = "output_text"

In [22]:
prompt = """
                  Please provide a very comprehensive summary of the following text
                  while maintaining lower-level detail.
                  
                  TEXT: {text}
                  SUMMARY:
                  """

question_prompt = PromptTemplate(
    template=prompt, input_variables=["text"]
)

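# The refine step receives the running summary as {existing_answer} and the
# next chunk as {text}. Without {existing_answer} in the template, each step
# would ignore the earlier summary and only summarize the latest chunk.
refine_prompt_template = """
              You are refining a summary of a longer document.

              EXISTING SUMMARY: {existing_answer}

              Update the existing summary using the additional text delimited by triple backquotes.
              Your goal will be to give a high level overview while also expounding on some finer details of the text.

              ```{text}```

              Have your answer in about 1500 words.
              """


refine_template = PromptTemplate(
    template=refine_prompt_template, input_variables=["existing_answer", "text"]
)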

# Load refine chain
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=question_prompt,
    refine_prompt=refine_template,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key=output_key,
)
# Calling the chain object directly is deprecated; invoke() is the current entry point.
result = chain.invoke({"input_documents": split_docs}, return_only_outputs=True)

In [23]:
from IPython.display import display, Markdown

In [24]:
display(Markdown(result[output_key]))

Here is a summary of the text:

A state machine is a mathematical model that describes how an output signal is generated from an input signal step-by-step. It consists of five main components: 

1. States (representing different states or conditions)
2. Inputs (input signals, such as letters or symbols)
3. Outputs (output signals, which represent the actual output based on the input and state)
4. Update function (a way to modify the current state based on the inputs and outputs)
5. Initial State (the starting point of the machine)

An example is given where a state machine is defined with three states: States, Inputs, Outputs. The initial state is also provided as an option.

The key points are:

* Time is not involved in this model; instead, step numbers refer to the order in which steps occur.
* Each input signal can be represented by an infinite sequence of symbols, such as a natural number sequence (e.g., 0 -> Inputs).
* The state machine evolves or "moves" from one state to another based on the inputs and outputs.

This model is used for various applications, including control systems, data processing, and communication systems.

<h3 align=center> Remarks </h3>

Refine is properly configured, but when running it against the hosted Gemini API we ran into this error:

```python
ResourceExhausted: 429 Resource has been exhausted (e.g. check quota).
```

This is a quota limit on the part of our LLM provider, not a problem with the code.
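
If a hosted model is still needed, one common mitigation for rate-limit errors is to retry with exponential backoff. The helper below is a generic sketch: `call_with_backoff` and its parameters are hypothetical and not part of LangChain or any provider SDK, and retrying will not help once a quota is fully exhausted.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with doubling delays."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            # Out of attempts: surface the original error to the caller.
            if attempt == retries - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

In this notebook it could wrap the chain call, for example `call_with_backoff(lambda: chain.invoke({"input_documents": split_docs}))`, ideally with `retry_on` narrowed to the provider's rate-limit exception.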

<h3 align=center> Next Steps </h3>

The best approach will be to use local models to achieve this kind of heavy inference. For that we will turn to either **Ollama** or Hugging Face **Transformers**.