import streamlit as st
import streamlit.components.v1 as components


def run_home() -> None:
    """
    Displays the home page for the Knowledge-Based Visual Question Answering (KB-VQA) project using Streamlit.

    This function sets up the main home page for demonstrating the project.

    Returns:
        None
    """

    st.markdown("""
\n\n\n**Welcome to the interactive application for the Knowledge-Based Visual Question Answering (KB-VQA) project. This application is an integral part of a [Master’s dissertation in Artificial Intelligence](https://info.online.bath.ac.uk/msai/) at the [University of Bath](https://www.bath.ac.uk/). As we delve into the fascinating world of VQA, I invite you to explore the intersection of visual perception, language understanding, and cutting-edge AI research.**
""", unsafe_allow_html=True)

    st.markdown("### Background")
    with st.expander("Read Background"):
        st.write("""
Since its inception by **Alan Turing** in 1950, the **Turing Test** has been a fundamental benchmark for evaluating machine intelligence against human standards. As technology evolves, so too must the criteria for assessing AI. The **Visual Turing Test** is a modern extension that brings visual cognition within the scope of AI evaluation. At the forefront of this advancement is **Visual Question Answering (VQA)**, a field that challenges AI systems to perceive, comprehend, and articulate insights about visual inputs in natural language. This progression reflects the complex interplay between perception and cognition that characterizes human intelligence, positioning VQA as a crucial metric for gauging AI’s ability to emulate human-like understanding.

Mature VQA systems hold transformative potential across various domains. In robotics, VQA systems can enhance autonomous decision-making by enabling robots to interpret and respond to visual cues. In medical imaging and diagnosis, VQA systems can assist healthcare professionals by accurately interpreting complex medical images and answering diagnostic questions, thereby improving both the speed and accuracy of medical assessments. In manufacturing, VQA systems can optimize quality control by enabling automated systems to identify defects and ensure product consistency with minimal human intervention. These advancements underscore the importance of developing robust VQA capabilities, as they push the boundaries of the Visual Turing Test and bring us closer to achieving true human-like AI cognition.

Unlike other vision-language tasks, VQA requires many Computer Vision sub-tasks to be solved in the process, including: **Object recognition**, **Object detection**, **Attribute classification**, **Scene classification**, **Counting**, **Activity recognition**, **Spatial relationships among objects**, and **Commonsense reasoning**.
These VQA tasks often do not require external factual knowledge and only in rare cases require common-sense reasoning. Furthermore, VQA models cannot derive additional knowledge from existing VQA datasets should a question require it; therefore, **Knowledge-Based Visual Question Answering (KB-VQA)** was introduced. KB-VQA is a relatively new extension of VQA whose datasets pose visual questions that cannot be answered without external knowledge. The essence of the task is centred on acquiring that knowledge and integrating it with the visual contents of the image.
""", unsafe_allow_html=True)

    st.write("""
This application showcases the capabilities of the KB-VQA model, allowing users to upload images, pose questions, and obtain answers derived from both visual and textual data. By leveraging Multimodal Learning techniques, this project bridges the gap between visual perception and linguistic interpretation, merging these modalities to provide coherent and contextually relevant responses. This research not only demonstrates recent progress in artificial intelligence but also pushes AI systems towards passing the **Visual Turing Test**, where machines exhibit **human-like** understanding and reasoning in processing and responding to visual information.

### Tools:
- **Dataset Analysis**: Provides an overview of the KB-VQA datasets and displays various analyses of the OK-VQA dataset.
- **Model Architecture**: Displays the model architecture and the accompanying abstract and design details for the Knowledge-Based Visual Question Answering (KB-VQA) model.
- **Results**: Manages the interactive Streamlit demo for visualizing model evaluation results and analysis. It provides an interface for users to explore different aspects of the model performance and evaluation samples.
- **Run Inference**: Allows users to run inference to test and use the fine-tuned KB-VQA model under various configurations.
""", unsafe_allow_html=True)

    st.markdown("<br>" * 1, unsafe_allow_html=True)
    st.write(" ##### Developed by: [Mohammed Bin Ali AlHaj](https://www.linkedin.com/in/m7mdal7aj)")
    st.markdown("<br>" * 2, unsafe_allow_html=True)
    st.write("""
**Credit:**
* The project uses [LLaMA-2](https://ai.meta.com/llama/) for its reasoning capabilities and implicit knowledge to derive answers from the supplied visual context. It is made available under the [Meta LLaMA license](https://ai.meta.com/llama/license/).
* This application is built on [Streamlit](https://streamlit.io), providing an interactive and user-friendly interface.
""")