import streamlit as st
import streamlit.components.v1 as components


def run_home() -> None:
    """
    Displays the home page for the Knowledge-Based Visual Question Answering (KB-VQA) project using Streamlit.
    This function sets up the main home page for demonstrating the project.

    Returns:
        None
    """

    st.markdown("<br>" * 2, unsafe_allow_html=True)
    st.markdown("""
    <div style="text-align: justify;">

    Welcome to the interactive application for the Knowledge-Based Visual Question Answering (KB-VQA)
    project. This application is an integral part of a
    [Master’s dissertation in Artificial Intelligence](https://info.online.bath.ac.uk/msai/) at the
    [University of Bath](https://www.bath.ac.uk/). As we delve into the fascinating world of VQA, I invite you
    to explore the intersection of visual perception, language understanding, and cutting-edge AI research.
    </div>""",
        unsafe_allow_html=True)

    st.markdown("<br>" * 1, unsafe_allow_html=True)

    st.markdown("### Background")
    with st.expander("Read Background"):
        st.write("""
        <div style="text-align: justify;">

        Since its inception by Alan Turing in 1950, the Turing Test has been a fundamental benchmark for
        evaluating machine intelligence against human standards. As technology evolves, so too must the criteria
        for assessing AI. The Visual Turing Test represents a modern extension that includes visual cognition
        within the scope of AI evaluation. At the forefront of this advancement is **Visual Question Answering
        (VQA)**, a field that challenges AI systems to perceive, comprehend, and articulate insights about
        visual inputs in natural language. This progression reflects the complex interplay between perception
        and cognition that characterizes human intelligence, positioning VQA as a crucial metric for gauging
        AI’s ability to emulate human-like understanding.

        Mature VQA systems hold transformative potential across various domains. In robotics, VQA systems can
        enhance autonomous decision-making by enabling robots to interpret and respond to visual cues. In
        medical imaging and diagnosis, VQA systems can assist healthcare professionals by accurately
        interpreting complex medical images and providing insightful answers to diagnostic questions, thereby
        enhancing both the speed and accuracy of medical assessments. In manufacturing, VQA systems can optimize
        quality control processes by enabling automated systems to identify defects and ensure product
        consistency with minimal human intervention. These advancements underscore the importance of developing
        robust VQA capabilities, as they push the boundaries of the Visual Turing Test and bring us closer to
        achieving true human-like AI cognition.

        Unlike other vision-language tasks, VQA requires many Computer Vision sub-tasks to be solved in the
        process, including object recognition, object detection, attribute classification, scene
        classification, counting, activity recognition, reasoning about spatial relationships among objects,
        and common-sense reasoning. These VQA tasks often do not require external factual knowledge and only
        in rare cases require common-sense reasoning. Furthermore, VQA models cannot derive additional knowledge
        from existing VQA datasets should a question require it. **Knowledge-Based Visual Question
        Answering (KB-VQA)** was introduced to address this gap. KB-VQA is a relatively new extension of VQA
        whose datasets pose visual questions that cannot be answered without external knowledge; the essence
        of the task is centred on knowledge acquisition and its integration with the visual contents of
        the image.
        </div>""",
            unsafe_allow_html=True)

    st.markdown("<br>" * 1, unsafe_allow_html=True)

    st.write("""
    <div style="text-align: justify;">

    This application demonstrates the KB-VQA model: users can upload images, pose questions, and obtain
    answers derived from both visual and textual data.
    By leveraging Multimodal Learning techniques, this project bridges the gap between visual
    perception and linguistic interpretation, merging these modalities to provide coherent and
    contextually relevant responses. This research both showcases recent progress in artificial
    intelligence and pushes AI systems towards passing the Visual Turing Test, where
    machines exhibit human-like understanding and reasoning in processing and responding to visual
    information.
    <br>
    <br>
    ### Tools:

    - Dataset Analysis: Provides an overview of the KB-VQA datasets and displays various analyses of the
    OK-VQA dataset.
    - Model Architecture: Displays the model architecture, with the accompanying abstract and design details
    for the Knowledge-Based Visual Question Answering (KB-VQA) model.
    - Results: Provides an interactive demo for visualizing model evaluation results and analysis, letting
    users explore different aspects of the model performance and evaluation samples.
    - Run Inference: Allows users to run inference to test and use the fine-tuned KB-VQA model
    under various configurations.
    </div>""",
        unsafe_allow_html=True)
    st.markdown("<br>" * 3, unsafe_allow_html=True)
    st.write(" ###### Developed by: [Mohammed Bin Ali AlHaj](https://www.linkedin.com/in/m7mdal7aj)")
    st.write("""
    Credit:
    * The project uses [LLaMA-2](https://ai.meta.com/llama/) for its reasoning capabilities and implicit knowledge
    to derive answers from the supplied visual context. It is made available under the
    [Meta LLaMA license](https://ai.meta.com/llama/license/).
    * This application is built on [Streamlit](https://streamlit.io), which provides an interactive and
    user-friendly interface.
    """)