Spaces: Running on Zero
fancyfeast committed
Commit • 3b7cfa9
1 parent: 272707e
Please, god, please work this time. No more commits.
app.py CHANGED
@@ -13,24 +13,21 @@ MODEL_PATH = "fancyfeast/llama-joycaption-alpha-two-vqa-test-1"
 TITLE = "<h1><center>JoyCaption Alpha Two - VQA Test - (2024-11-25a)</center></h1>"
 DESCRIPTION = """
 <div>
-<p>🚨🚨🚨 BY USING THIS SPACE YOU AGREE THAT YOUR QUERIES (but not images) <i>MAY</i> BE LOGGED AND COLLECTED ANONYMOUSLY 🚨🚨🚨</p>
 <p>🧪🧪🧪 This an experiment to see how well JoyCaption Alpha Two can learn to answer questions about images and follow instructions.
-I've only finetuned it on 600 examples, so it is highly experimental, very weak, broken, and volatile
-I thought it was performing surprisingly well and wanted to share
-<p
-
+I've only finetuned it on 600 examples, so it is **highly experimental, very weak, broken, and volatile**. But for only training 600 examples,
+I thought it was performing surprisingly well and wanted to share.</p>
+<p>**This model cannot see any chat history.**</p>
+<p>🧐💬📸 Unlike JoyCaption Alpha Two, you can ask this finetune questions about the image, like "What is he holding in his hand?", "Where might this be?",
+and "What are they wearing?". It can also follow instructions, like "Write me a poem about this image",
 "Write a caption but don't use any ambigious language, and make sure you mention that the image is from Instagram.", and
 "Output JSON with the following properties: 'skin_tone', 'hair_style', 'hair_length', 'clothing', 'background'." Remember that this was only finetuned on
 600 VQA/instruction examples, so it is _very_ limited right now. Expect it to frequently fallback to its base behavior of just writing image descriptions.
 Expect accuracy to be lower. Expect glitches. Despite that, I've found that it will follow most queries I've tested it with, even outside its training,
 with enough coaxing and re-rolling.</p>
-<p
-I cannot see what images you send, and frankly, I don't want to. But knowing what kinds of instructions
-help guide me in building JoyCaption's VQA dataset.
-
-direct how JoyCaption writes descriptions and captions. So I'm building my own dataset, that will be made public. So, with peace and love, this space logs the text
-queries. As always, the model itself is completely public and free to use outside of this space. And, of course, I have no control nor access to what HuggingFace,
-which are graciously hosting this space, log.</p>
+<p>🚨🚨🚨 If the "Help improve JoyCaption" box is checked, the _text_ query you write will be logged and I _might_ use it to help improve JoyCaption.
+It does not log images, user data, etc; only the text query. I cannot see what images you send, and frankly, I don't want to. But knowing what kinds of instructions
+and queries users want JoyCaption to handle will help guide me in building JoyCaption's VQA dataset. This dataset will be made public. As always, the model itself is completely
+public and free to use outside of this space. And, of course, I have no control nor access to what HuggingFace, which are graciously hosting this space, collects.</p>
 </div>
 """
 
@@ -170,7 +167,6 @@ textbox = gr.MultimodalTextbox(file_types=["image"], file_count="single")
 
 with gr.Blocks() as demo:
     gr.HTML(TITLE)
-    gr.Markdown(DESCRIPTION)
     chat_interface = gr.ChatInterface(
         fn=chat_joycaption,
         chatbot=chatbot,
@@ -201,11 +197,7 @@ with gr.Blocks() as demo:
             gr.Checkbox(label="Help improve JoyCaption by logging your text query", value=True, render=False),
         ],
     )
-
-    def new_trim_history(self, message, history_with_input):
-        return message, []
-
-    chat_interface._process_msg_and_trim_history = new_trim_history.__get__(chat_interface, chat_interface.__class__)
 
 
 if __name__ == "__main__":
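
The lines removed in the last hunk had patched chat_interface at runtime: a plain function was bound to that one instance with __get__ so that _process_msg_and_trim_history always returned an empty history, which matches the new "This model cannot see any chat history" note in the description. Below is a minimal, self-contained sketch of that instance-binding pattern only; Greeter and always_polite are invented names for illustration and are not part of app.py or Gradio.

# Minimal sketch of the instance-binding pattern used by the removed lines.
# func.__get__(obj, type(obj)) turns a plain function into a method bound to
# that single object, so only that one instance is overridden.
# NOTE: Greeter and always_polite are invented names for illustration only.

class Greeter:
    def greet(self, name: str) -> str:
        return f"hi {name}"

def always_polite(self, name: str) -> str:
    # Replacement behaviour for one specific instance only.
    return f"good day, {name}"

g = Greeter()
g.greet = always_polite.__get__(g, Greeter)  # bind the function to this instance

print(g.greet("sam"))          # "good day, sam"  (patched instance)
print(Greeter().greet("sam"))  # "hi sam"         (other instances unaffected)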
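
Taken together, the last two hunks also move the description below the chat area: gr.Markdown(DESCRIPTION) is no longer rendered between gr.HTML(TITLE) and the gr.ChatInterface, but after it. The following is a rough sketch of the resulting layout, assuming Gradio 4.x; chat_fn, the placeholder TITLE/DESCRIPTION strings, and the echo behaviour are stand-ins for app.py's real chat_joycaption and constants, and only the logging checkbox shown in the diff appears as an additional input.

# Rough sketch of the post-commit layout (assumptions: Gradio 4.x; chat_fn and
# the placeholder strings stand in for app.py's chat_joycaption, TITLE, DESCRIPTION).
import gradio as gr

TITLE = "<h1><center>JoyCaption Alpha Two - VQA Test</center></h1>"  # placeholder
DESCRIPTION = "Demo notes go here."                                  # placeholder

def chat_fn(message, history, log_prompt):
    # Placeholder chat handler: echo the text part of the multimodal message.
    return f"(echo) {message['text']}"

chatbot = gr.Chatbot(height=600)
textbox = gr.MultimodalTextbox(file_types=["image"], file_count="single")

with gr.Blocks() as demo:
    gr.HTML(TITLE)
    chat_interface = gr.ChatInterface(
        fn=chat_fn,
        multimodal=True,
        chatbot=chatbot,
        textbox=textbox,
        additional_inputs=[
            gr.Checkbox(label="Help improve JoyCaption by logging your text query", value=True, render=False),
        ],
    )
    gr.Markdown(DESCRIPTION)  # now rendered below the chat interface

if __name__ == "__main__":
    demo.launch()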