Spaces: izammohammed/geminsights

izammohammed committed on Feb 8

Commit dd65c5d • 1 Parent(s): 773d205

added all of the files

Files changed (6)

README.md +1 -13
app.py +127 -1
credentials.json +1 -0
prompt.txt +34 -0
requirements.txt +17 -0
utils.py +11 -0

README.md CHANGED

@@ -1,13 +1 @@
----
-title: Geminsights
-emoji: 👀
-colorFrom: pink
-colorTo: pink
-sdk: streamlit
-sdk_version: 1.31.0
-app_file: app.py
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


+[Original repository](https://github.com/izam-mohammed/GemInsights)

app.py CHANGED

	@@ -1 +1,127 @@
-hola.py

+import streamlit as st
+import pandas as pd
+import os
+from utils import save_json, load_json
+from markdown import markdown
+from autoviz import AutoViz_Class
+import base64
+from google.cloud import aiplatform
+import vertexai
+from vertexai.preview.generative_models import GenerativeModel, Part
+import json
+#setup cloud
+aiplatform.init(
+    project = "geminsights",
+    location="us-central1"
+    )
+json_file = json.loads(st.secrets["credentials"], strict=False)
+with open("credentials.json", "w") as f:
+    json.dump(json_file, f, indent=2)
+os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials.json"
+dataframe = None
+st.title("GemInsights 📊")
+st.caption('A Gemini-powered data analysis tool to get insights from data 🔥')
+file = st.file_uploader(
+    "Pick a dataframe", type=["csv", "xlsx"], accept_multiple_files=False
+)
+if file is not None:
+    _, extension = os.path.splitext(file.name)
+    if extension == ".csv":
+        dataframe = pd.read_csv(file)
+    else:
+        dataframe = pd.read_excel(file)
+    st.write(dataframe.head())
+    st.write(f"uploaded a dataframe with shape {dataframe.shape}")
+if file is not None:
+    text_input = st.text_input(
+        "Enter something about the data 👇",
+        label_visibility="visible",
+        disabled=False,
+        placeholder="eg:- This is a sales dataframe",
+    )
+    option = st.selectbox(
+        "Which is the target column ? 🎯",
+        tuple(list(dataframe.columns)),
+        index=None,
+        placeholder="Select one column in here",
+    )
+def plot(dataframe, target):
+    AV = AutoViz_Class()
+    dft = AV.AutoViz(
+        "",
+        sep=",",
+        depVar=target,
+        dfte=dataframe,
+        header=0,
+        verbose=2,
+        lowess=False,
+        chart_format="jpg",
+        max_rows_analyzed=500,
+        max_cols_analyzed=20,
+        save_plot_dir="plots",
+    )
+def prompt_make(dataframe, target, info):
+    # Collect every chart saved for this target as an image part.
+    images = []
+    image_dir = f"plots/{target}"
+    for image_file in os.listdir(image_dir):
+        image_path = os.path.join(image_dir, image_file)
+        with open(image_path, "rb") as f:
+            img = f.read()
+        # Part.from_data takes the raw bytes; a base64 encode/decode round trip is a no-op.
+        images.append(Part.from_data(img, mime_type="image/jpeg"))
+    # Read the prompt as text; bytes formatted into an f-string would render as b'...'.
+    with open("prompt.txt", "r", encoding="utf-8") as file:
+        data = file.read()
+    prompt = f"{data}\n Here is some information related to the dataset - '{info}'"
+    return prompt, images
+def generate_res(prompt, images):
+    print("prompting ...")
+    model = GenerativeModel("gemini-pro-vision")
+    responses = model.generate_content(
+        [prompt]+images,
+        generation_config={
+            "max_output_tokens": 2048,
+            "temperature": 0.4,
+            "top_p": 1,
+            "top_k": 32
+        },
+    )
+    return responses.text
+def generate(dataframe, text_input, option):
+    plot(dataframe, option)
+    prompt, images = prompt_make(dataframe, option, text_input)
+    res = generate_res(prompt, images)
+    return res
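+if st.button("Get Insights", type="primary"):
+    if dataframe is None or option is None:
+        # text_input and option only exist after a file has been uploaded.
+        st.warning("Upload a file and select a target column first.")
+    else:
+        st.write("generating insights ⏳ ... ")
+        # running the pipeline
+        response = generate(dataframe, text_input, option)
+        res = markdown(response)
+        st.markdown(res, unsafe_allow_html=True)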
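Review note: the credential setup near the top of app.py parses a JSON string from the Streamlit secret, writes it to disk, and points `GOOGLE_APPLICATION_CREDENTIALS` at the file. A minimal stdlib-only sketch of that step follows. The helper name `write_credentials` and the fake secret are assumptions for illustration, standing in for `st.secrets["credentials"]`. Passing `strict=False` lets the parser accept raw control characters, such as literal newlines in a private key, inside JSON strings.

```python
import json
import os
import tempfile


def write_credentials(secret_text: str, path: str) -> dict:
    """Parse a JSON secret and write it to `path` (hypothetical helper)."""
    # strict=False tolerates raw control characters inside JSON strings.
    info = json.loads(secret_text, strict=False)
    with open(path, "w") as f:
        json.dump(info, f, indent=2)
    # Google client libraries read this variable to locate the key file.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
    return info


# Placeholder secret with a raw newline inside a string value (not a real key).
fake_secret = '{"type": "service_account", "private_key": "line1\nline2"}'

cred_path = os.path.join(tempfile.mkdtemp(), "credentials.json")
info = write_credentials(fake_secret, cred_path)
```

With the default `strict=True`, the same string would be rejected because of the raw newline, which may be why the original code passes `strict=False`.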

credentials.json ADDED

	@@ -0,0 +1 @@


+

prompt.txt ADDED

	@@ -0,0 +1,34 @@

+Act as an intelligent data analyst who communicates with clients in simple English and clear messages.
+Give a maximum of 10 insights from the data.
+We are building an end-to-end application that visualizes datasets internally, and we aim to extract valuable insights from these visualizations using an LLM. The insights generated should be beneficial to both companies and end users. It's crucial that the model refrains from explicitly mentioning the images and provides information in a clear, detailed, and actionable manner.
+Give the insights by considering the following points.
+Here are important notes for output generation:
+- Analyze the visual elements within the dataset using the visualizations.
+- Identify and describe any prominent trends, patterns, or anomalies observed in the visual representations.
+- Derive insights that are specifically relevant to the industry or domain associated with the dataset.
+- Emphasize actionable information that could be of value to companies operating in that industry.
+- Explore the possibility of making predictions based on the visual content.
+- Formulate insights that would be valuable from an end-user perspective.
+- Consider how the extracted information can enhance user experience, decision-making, or engagement.
+- Do not mention the images directly in your responses. Focus on conveying insights without explicitly stating the visual content.
+- Ensure that the insights are presented in a language suitable for technical and non-technical audiences. I encourage you to give clear, detailed explanations.
+- Prioritize insights that are actionable and can contribute to informed decision-making for both businesses and end-users.
+- If there are any recognized design patterns or industry standards applicable to the analysis, please incorporate and explain them.
+Note to Model:
+- Do not explicitly reference the images in your responses.
+- Focus on providing clear, detailed, and actionable insights.
+- Ensure that the insights are presented in a language suitable for technical and non-technical audiences.
+Remember to adapt the prompt based on the specific details of your dataset and the objectives of your application.
+Give the most important actionable insights rather than all of them. Give them as bullet points. Don't mention the visualizations or plots in the output.
+Don't use too much statistical jargon either.
+Output example:
+  if the visualization indicates customer churn data: give a response like this -
+    - The male customers are staying so long in the business
+    - You have to focus on the happiness rate of each customer
+    - Customers who are longer than 2 years tend to stay longer with the business
+    - Customers in the kid's products category are leaving too early.
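Review note: app.py reads this file and appends the user's description of the dataset to form the final prompt. A minimal sketch of that assembly is below. It assumes a hypothetical `build_prompt` helper and a stand-in template file, and it reads the file in text mode. If the file were read in binary mode, formatting the result into an f-string would embed the bytes repr (`b'...'`) in the prompt.

```python
import tempfile
from pathlib import Path


def build_prompt(prompt_path: str, info: str) -> str:
    """Join the prompt template with the user's dataset description (hypothetical helper)."""
    # Text mode yields a str, so the template is inserted verbatim.
    template = Path(prompt_path).read_text(encoding="utf-8")
    return f"{template}\nHere is some information about the dataset: '{info}'"


# Stand-in template; the real app ships prompt.txt next to app.py.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Act as an intelligent data analyst.")

prompt = build_prompt(tmp.name, "This is a sales dataframe")
```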

requirements.txt ADDED

	@@ -0,0 +1,17 @@

+google-generativeai
+pandas
+numpy
+matplotlib
+seaborn
+python-box
+pexpect
+streamlit
+dataframe_image
+jinja2
+PyYAML
+autoviz
+ipython
+google-cloud-aiplatform
+markdown
+llama-index
+openpyxl

utils.py ADDED

	@@ -0,0 +1,11 @@

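+import json
+from box import ConfigBox
+def load_json(file):
+    # Read a JSON file and wrap it for attribute-style access.
+    with open(file) as f:
+        content = json.load(f)
+    return ConfigBox(content)
+def save_json(file, content):
+    # Write content to a JSON file with 4-space indentation.
+    with open(file, "w") as f:
+        json.dump(content, f, indent=4)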
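Review note: a save-then-load round trip is a cheap way to exercise the two helpers in utils.py. The sketch below uses only the standard library and makes two assumptions: the parameters are named `path` and `data`, and `load_json` returns a plain `dict` rather than a `ConfigBox`, so it runs without `python-box` installed.

```python
import json
import os
import tempfile


def save_json(path, data):
    """Write `data` to `path` as indented JSON."""
    with open(path, "w") as f:
        json.dump(data, f, indent=4)


def load_json(path):
    """Read JSON from `path`; returns a plain dict here, not a ConfigBox."""
    with open(path) as f:
        return json.load(f)


# Round trip through a temporary file with illustrative values.
path = os.path.join(tempfile.mkdtemp(), "example.json")
save_json(path, {"target": "churn", "rows": 500})
loaded = load_json(path)
```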