clementperrot (Clement Perrot)

New activity in aws-neuron/optimum-neuron-cache about 1 month ago

[Cache Request] NousResearch/Hermes-3-Llama-3.1-8B

#235 opened about 1 month ago by

clementperrot

Reacted to clem's post with 👍 11 months ago

Post

Is synthetic data the future of AI? 🔥🔥🔥

@HugoLaurencon @Leyo & @VictorSanh are introducing HuggingFaceM4/WebSight, a multimodal dataset of 823,000 pairs of synthetically generated HTML/CSS code and screenshots of the corresponding rendered websites, meant for training GPT4-V-like models 🌐💻

While crafting their upcoming foundation vision language model, they faced the challenge of converting website screenshots into usable HTML/CSS code. Most VLMs suck at this, and there was no public dataset available for this specific task, so they decided to create their own.

They prompted existing LLMs to generate 823k HTML/CSS code samples of very simple websites. After supervised fine-tuning on WebSight, a vision language model was able to generate the code to reproduce a website component from a screenshot.

You can explore the dataset here: HuggingFaceM4/WebSight

What do you think?

12 replies

updated a model 11 months ago

clementperrot/test

Updated Jan 12
