Update my_model/utilities/ui_manager.py
my_model/utilities/ui_manager.py  CHANGED
@@ -14,11 +14,10 @@ class UIManager():
         self.tabs = {
             "Home": self.display_home,
             "Dataset Analysis": self.display_dataset_analysis,
-            "
+            "Results": self.display_results,
             "Run Inference": self.display_run_inference,
             "Dissertation Report": self.display_dissertation_report,
-            "Code": self.display_code
-            "More Pages will follow .. ": self.display_placeholder
+            "Code": self.display_code
         }
 
         state_manager = StateManager()
@@ -67,7 +66,7 @@ class UIManager():
         with col2:
             #st.image("Files/mm.jpeg")
             st.header("Abstract")
-            st.write("""
+            st.write("""\n\nNavigating the frontier of the Visual Turing Test, this research delves into multimodal learning to bridge the gap between visual perception and linguistic interpretation, a foundational challenge in artificial intelligence. It scrutinizes the integration of visual cognition and external knowledge, emphasizing the pivotal role of the Transformer model in enhancing language processing and supporting complex multimodal tasks.
             This research explores the task of Knowledge-Based Visual Question Answering (KB-VQA), it examines the influence of Pre-Trained Large Language Models (PT-LLMs) and Pre-Trained Multimodal Models (PT-LMMs), which have transformed the machine learning landscape by utilizing expansive, pre-trained knowledge repositories to tackle complex tasks, thereby enhancing KB-VQA systems.
             \nAn examination of existing Knowledge-Based Visual Question Answering (KB-VQA) methodologies led to a refined approach that converts visual content into the linguistic domain, creating detailed captions and object enumerations. This process leverages the implicit knowledge and inferential capabilities of PT-LLMs. The research refines the fine-tuning of PT-LLMs by integrating specialized tokens, enhancing the models’ ability to interpret visual contexts. The research also reviews current image representation techniques and knowledge sources, advocating for the utilization of implicit knowledge in PT-LLMs, especially for tasks that do not require specialized expertise.
             \nRigorous ablation experiments conducted to assess the impact of various visual context elements on model performance, with a particular focus on the importance of image descriptions generated during the captioning phase. The study includes a comprehensive analysis of major KB-VQA datasets, specifically the OK-VQA corpus, and critically evaluates the metrics used, incorporating semantic evaluation with GPT-4 to align the assessment with practical application needs.
@@ -91,10 +90,10 @@ class UIManager():
         st.write("This is a Place Holder until the contents are uploaded.")
 
 
-    def 
+    def display_results(self):
         """Displays the Finetuning and Evaluation Results page."""
 
-        st.title("
+        st.title("Results")
         st.write("This page demonstrates the fine-tuning and model evaluation results")
         st.write("\n")
         evaluator = KBVQAEvaluator()
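For context, the self.tabs dictionary edited in this commit maps each tab label to the method that renders that page. The following is a minimal, hypothetical sketch of how such a label-to-method mapping is typically dispatched from a Streamlit sidebar; the class body and page contents here are illustrative only and are not taken from this repository.

import streamlit as st

class UIManager:
    """Illustrative tab-dispatch pattern (hypothetical, simplified)."""

    def __init__(self):
        # Map each sidebar label to the method that renders that page.
        self.tabs = {
            "Home": self.display_home,
            "Results": self.display_results,
        }

    def display_home(self):
        st.title("Home")

    def display_results(self):
        st.title("Results")
        st.write("This page demonstrates the fine-tuning and model evaluation results")

    def render(self):
        # Let the user pick a tab, then call the matching display method.
        choice = st.sidebar.radio("Navigation", list(self.tabs.keys()))
        self.tabs[choice]()

if __name__ == "__main__":
    UIManager().render()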