---
title: KB-VQA
emoji: π₯
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: false
license: apache-2.0
---
------
# Demonstration Environment
The project demo app is available on the [**KB-VQA HF Space**](https://huggingface.co/spaces/m7mdal7aj/KB-VQA), and the full source code can be browsed [here](https://huggingface.co/spaces/m7mdal7aj/KB-VQA/tree/main).
To run the demo app locally, run `streamlit run app.py` from the root of the local code repository. This launches the whole app; however, the **Run Inference Tool** requires a GPU.
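
Because the Run Inference tool needs a GPU, it can help to check for one before launching the app. The snippet below is a minimal sketch, not part of the project code: `gpu_available` and `launch_hint` are hypothetical helper names, and the check assumes either that PyTorch is installed (it uses `torch.cuda.is_available()`) or, failing that, that the presence of `nvidia-smi` on the `PATH` is a reasonable hint of an NVIDIA GPU.

```python
import importlib.util
import shutil


def gpu_available() -> bool:
    """Best-effort check for a CUDA-capable GPU (hypothetical helper)."""
    # Prefer PyTorch's own check when torch is installed (assumption:
    # the project environment provides PyTorch).
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
            return bool(torch.cuda.is_available())
        except Exception:
            pass  # Broken install: fall through to the heuristic below.
    # Fallback heuristic only: NVIDIA driver tools found on PATH.
    return shutil.which("nvidia-smi") is not None


def launch_hint(has_gpu: bool) -> str:
    """Return a short message describing what the local demo can do."""
    if has_gpu:
        return "GPU detected: all tools, including Run Inference, should be usable."
    return "No GPU detected: the app will start, but Run Inference needs a GPU."


if __name__ == "__main__":
    print(launch_hint(gpu_available()))
    print("Start the demo with: streamlit run app.py")
```

Running this script before `streamlit run app.py` only reports what to expect; it does not change how the app behaves.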
## Project File Structure
Each main Python module of the project is extensively documented to explain the module's role and how to use it, along with its corresponding classes and functions.
Below is the overall file structure of the project:
<pre>
KB-VQA
├── Files: Various files required for the demo, such as sample images, the dissertation report, etc.
├── models
│   ├── deformable-detr-detic: DETIC Object Detection Model.
│   └── yolov5: YOLOv5 Object Detection Model (baseline).
├── my_model
│   ├── KBVQA.py: The central module implementing the designed model architecture for the Knowledge-Based Visual Question Answering (KB-VQA) project.
│   ├── state_manager.py: Manages the user interface and session state to facilitate the Run Inference tool of the Streamlit demo app.
│   ├── LLAMA2
│   │   └── LLAMA2_model.py: Loads the LLaMA-2 model to be fine-tuned.
│   ├── captioner
│   │   └── image_captioning.py: Provides functionality for generating captions for images.
│   ├── detector
│   │   └── object_detection.py: Detects objects in images using object detection models.
│   ├── fine_tuner
│   │   ├── fine_tuner.py: Main fine-tuning script for LLaMA-2 Chat models.
│   │   ├── fine_tuning_data_handler.py: Handles and prepares the data for fine-tuning LLaMA-2 Chat models.
│   │   └── fine_tuning_data
│   │       ├── fine_tuning_data_detic.csv: Fine-tuning data prepared by the prompt engineering module using the DETIC detector.
│   │       └── fine_tuning_data_yolov5.csv: Fine-tuning data prepared by the prompt engineering module using the YOLOv5 detector.
│   ├── results
│   │   ├── Demo_Images: Contains a pool of images used for the demo app.
│   │   ├── evaluation.py: Provides a comprehensive framework for evaluating the KB-VQA model.
│   │   ├── demo.py: Provides a comprehensive framework for visualizing and demonstrating the results of the KB-VQA evaluation.
│   │   └── evaluation_results.xlsx: Contains all the evaluation results based on the evaluation data.
│   ├── tabs
│   │   ├── home.py: Displays an introduction to the application with a brief background and a description of the demo tools.
│   │   ├── results.py: Manages the interactive Streamlit demo for visualizing model evaluation results and analysis.
│   │   ├── run_inference.py: Responsible for the 'run inference' tool to test and use the fine-tuned models.
│   │   ├── model_arch.py: Displays the model architecture and the accompanying abstract and design details.
│   │   └── dataset_analysis.py: Provides tools for visualizing dataset analyses.
│   ├── utilities
│   │   ├── ui_manager.py: Manages the user interface for the Streamlit application, handling the creation and navigation of various tabs.
│   │   └── gen_utilities.py: Provides a collection of utility functions and classes commonly used across various parts of the project.
│   └── config: All configuration files are kept separate and stored as ".py" files for easy reading; this will change after the project submission.
│       ├── kbvqa_config.py: Configuration parameters for the main KB-VQA model.
│       ├── LLAMA2_config.py: Configuration parameters for the LLaMA-2 model.
│       ├── captioning_config.py: Configuration parameters for the captioning model (InstructBLIP).
│       ├── dataset_config.py: Configuration parameters for the dataset processing.
│       ├── evaluation_config.py: Configuration parameters for the KB-VQA model evaluation.
│       ├── fine_tuning_config.py: Configurable parameters for the fine-tuning module.
│       └── inference_config.py: Configurable parameters for the Run Inference tool in the demo app.
├── app.py: Main entry point for Streamlit (first page in the Streamlit app).
├── README.md: This file.
└── requirements.txt: Requirements file for the whole project, including all the requirements for running the demo app in the Hugging Face Space environment.
</pre>
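
As a quick sanity check of a local clone against the structure above, the following minimal sketch reports which of a few key files are missing. It is illustrative only and not part of the project: `EXPECTED_PATHS` is a hand-picked subset of the tree, and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path

# A hand-picked subset of the files listed in the tree above (illustrative,
# not an exhaustive manifest of the repository).
EXPECTED_PATHS = [
    "app.py",
    "requirements.txt",
    "my_model/KBVQA.py",
    "my_model/state_manager.py",
    "my_model/tabs/run_inference.py",
    "my_model/config/kbvqa_config.py",
]


def missing_paths(root) -> list:
    """Return the entries of EXPECTED_PATHS that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the root of the local code repository.
    missing = missing_paths(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files are present.")
```

An empty result suggests the clone is complete enough to try `streamlit run app.py`; a non-empty one points at files to re-fetch.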
**Author:** [**Mohammed Bin Ali Alhaj**](https://www.linkedin.com/in/m7mdal7aj)

**Email:** m7md.7.al7aj@gmail.com