import streamlit as st
import streamlit.components.v1 as components


def run_home() -> None:
    """
    Display the home page of the Knowledge-Based Visual Question Answering (KB-VQA) Streamlit application.

    Renders the welcome text, the background section, the tool overview, and the credits.

    Returns:
        None
    """

    st.markdown("<br>" * 2, unsafe_allow_html=True)

    st.markdown("""
    <div style="text-align: justify;">
    Welcome to the interactive application for the **Knowledge-Based Visual Question Answering (KB-VQA)**
    project. This application is part of a
    [Master’s dissertation in Artificial Intelligence](https://info.online.bath.ac.uk/msai/) at the
    [University of Bath](https://www.bath.ac.uk/). I invite you to explore VQA and the intersection of
    visual perception, language understanding, and current AI research.
    </div>""",
                unsafe_allow_html=True)

    st.markdown("<br>", unsafe_allow_html=True)
    st.markdown("### Background")
    with st.expander("Read Background"):
        st.write("""
        <div style="text-align: justify;">
        Since **Alan Turing** proposed it in 1950, the **Turing Test** has been a fundamental benchmark for
        evaluating machine intelligence against human standards. As technology evolves, so too must the criteria
        for assessing AI. The **Visual Turing Test** is a modern extension that brings visual cognition
        within the scope of AI evaluation. At the forefront of this advancement is **Visual Question Answering
        (VQA)**, a field that challenges AI systems to perceive, comprehend, and articulate insights about
        visual inputs in natural language. This progression reflects the complex interplay between perception
        and cognition that characterizes human intelligence, positioning VQA as a crucial metric for gauging
        AI’s ability to emulate human-like understanding.
        Mature VQA systems hold transformative potential across various domains. In robotics, VQA systems can
        enhance autonomous decision-making by enabling robots to interpret and respond to visual cues. In
        medical imaging and diagnosis, VQA systems can assist healthcare professionals by interpreting
        complex medical images and answering diagnostic questions, thereby improving both the speed and
        accuracy of medical assessments. In manufacturing, VQA systems can optimize
        quality control by enabling automated systems to identify defects and ensure product
        consistency with minimal human intervention. These applications underscore the importance of developing
        robust VQA capabilities, as they push the boundaries of the Visual Turing Test and bring us closer to
        human-like AI cognition.
        Unlike other vision-language tasks, VQA requires solving many Computer Vision sub-tasks along the way,
        including **Object recognition**, **Object detection**, **Attribute classification**, **Scene
        classification**, **Counting**, **Activity recognition**, **Spatial relationships among objects**,
        and **Common-sense reasoning**. Standard VQA questions usually do not require external factual knowledge
        and only in rare cases require common-sense reasoning. Moreover, when a question does require such
        knowledge, VQA models cannot derive it from existing VQA datasets. **Knowledge-Based Visual Question
        Answering (KB-VQA)** was introduced to address this gap. KB-VQA is a relatively new extension of VQA
        whose datasets pose visual questions that cannot be answered without external knowledge. The essence
        of the task is acquiring that knowledge and integrating it with the visual content of the image.
        </div>""",
                 unsafe_allow_html=True)

    st.markdown("<br>", unsafe_allow_html=True)
    st.write("""
    <div style="text-align: justify;">
    This application showcases the capabilities of the KB-VQA model, letting users
    upload images, pose questions, and obtain answers derived from both visual and textual data.
    By leveraging Multimodal Learning techniques, this project bridges the gap between visual
    perception and linguistic interpretation, merging these modalities to provide coherent and
    contextually relevant responses. This research demonstrates recent progress in artificial
    intelligence and pushes AI systems towards passing the **Visual Turing Test**, where
    machines exhibit **human-like** understanding and reasoning in processing and responding to visual
    information.
    <br>
    <br>
    ### Tools:
    - **Dataset Analysis**: Provides an overview of the KB-VQA datasets and displays several analyses of the
    OK-VQA dataset.
    - **Model Architecture**: Displays the model architecture with the accompanying abstract and design details
    for the Knowledge-Based Visual Question Answering (KB-VQA) model.
    - **Results**: Provides an interactive view of the model evaluation results and analysis, letting users
    explore different aspects of model performance and the evaluation
    samples.
    - **Run Inference**: Lets users run and test the fine-tuned KB-VQA model
    under various configurations.
    </div>""",
             unsafe_allow_html=True)

    st.markdown("<br>" * 3, unsafe_allow_html=True)
    st.write("###### Developed by: [Mohammed Bin Ali AlHaj](https://www.linkedin.com/in/m7mdal7aj)")
    st.write("""
    **Credit:**
    * The project uses [LLaMA-2](https://ai.meta.com/llama/) for its reasoning capabilities and implicit knowledge
      to derive answers from the supplied visual context. It is made available under the
      [Meta LLaMA license](https://ai.meta.com/llama/license/).
    * This application is built on [Streamlit](https://streamlit.io), which provides an interactive and
      user-friendly interface.
    """)