Vik Korrapati PRO

vikhyatk

AI & ML interests

None yet

Recent Activity

updated a collection about 18 hours ago
pixmo
updated a model 1 day ago
vikhyatk/moondream-next

Organizations

Blog-explorers, ZeroGPU Explorers, moondream, Social Post Explorers

vikhyatk's activity

Reacted to Xenova's post with ❤️🔥 12 days ago
Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔢 New quantization formats (dtypes)
🛠️ 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
🏡 A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3
posted an update about 1 month ago
Just released a dataset with 7000+ hours of synthetically generated lo-fi music. vikhyatk/lofi
posted an update 3 months ago
Pushed a new update to vikhyatk/moondream2 today. TextVQA up from 60.2 to 65.2, DocVQA up from 61.9 to 70.5.

Space has been updated to the new model if you want to try it out! vikhyatk/moondream2
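For reference, the relative improvement implied by these scores is (new - old) / old. A minimal Python sketch; the helper name relative_gain is our own illustration, not part of Moondream:

```python
def relative_gain(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# Benchmark scores quoted in the update above: (before, after).
updates = {
    "TextVQA": (60.2, 65.2),
    "DocVQA": (61.9, 70.5),
}

for benchmark, (old, new) in updates.items():
    print(f"{benchmark}: {old} -> {new} ({relative_gain(old, new):+.1f}%)")
```

This works out to roughly +8.3% on TextVQA and +13.9% on DocVQA.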
Reacted to Csplk's post with 🔥 3 months ago
# Offensive Security Reconnaissance Continued with Public Facing Industrial Control System HMIs using Moondream

Building on my previous experiments with Moondream for physical security reconnaissance planning automation (https://huggingface.co/posts/Csplk/926337297827024), I've now turned my attention to the potential of this powerful image-text-to-text model for offensive security reconnaissance in the realm of Industrial Control Systems (ICS).
ICS HMIs (Human-Machine Interfaces) are increasingly exposed to the public internet, often without adequate security measures in place. This presents a tantalizing opportunity for malicious actors to exploit vulnerabilities and gain unauthorized access to critical infrastructure.

Using Moondream with batch processing (Csplk/moondream2-batch-processing), I've been experimenting with analyzing screenshots of public-facing ICS (Csplk/ICS_UIs) HMIs (Csplk/HMI) from Shodan to identify which types of ICS HMIs are exposed, how they are operated, and how malicious actors with access to these systems could damage physical infrastructure. Feeding HMI images and predefined text prompts to Moondream in batch successfully extracted information about the underlying systems (accuracy not yet confirmed), including:

1. **System type**
2. **Possible Operation Details**
3. **Malicious Actor Outcomes**

Next steps:
* I have a longer, more in-depth blog write-up in the works, covering the approaches from both the previous post and this one, which I'll share as an HF community blog post soon.
* I plan to continue refining my Moondream-based tool to improve its accuracy and effectiveness in processing public-facing ICS HMIs.
* As mentioned before, I plan to release an HF Space focused on offensive security with Moondream once it's fleshed out.

Thanks again to @vikhyatk for the incredible Moondream model. vikhyatk/moondream2
replied to their post 4 months ago

It's in the same repo, uploaded with the tag "2024-07-23", which you can pass in as the revision when instantiating the model.
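A minimal sketch of pinning that tag, assuming the Hugging Face transformers `from_pretrained` API (whose `revision` argument accepts a branch name, tag name, or commit id) and assuming the repo's custom model code needs `trust_remote_code=True`. The helper name `load_pinned_moondream` is hypothetical:

```python
# Pin the Moondream2 checkpoint to the "2024-07-23" tag mentioned above.
MODEL_ID = "vikhyatk/moondream2"
REVISION = "2024-07-23"


def load_pinned_moondream(model_id: str = MODEL_ID, revision: str = REVISION):
    """Load model and tokenizer at one fixed revision (needs network access)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repo ships custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, revision=revision
    )
    # Pin the tokenizer to the same revision so both stay in sync.
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    return model, tokenizer


# Usage (downloads the weights): model, tokenizer = load_pinned_moondream()
```

Passing the same `revision` to both calls keeps the model and tokenizer from drifting apart when the repo's main branch is updated.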

posted an update 4 months ago
🚀 Exciting news! We've just launched "Thundermoon" - the latest version of Moondream, our open-source vision language model! 🌙

Key improvements in this release:
1. Massive leap in OCR capabilities
2. Enhanced document understanding
3. Significant boosts across key metrics:
* DocVQA: 61.9 (↑103%)
* TextVQA: 60.2 (↑5.2%)
* GQA: 64.9 (↑2.9%)

What does this mean? Moondream can now tackle complex document analysis tasks with unprecedented accuracy for a model of its size. From deciphering handwritten notes to interpreting data tables, the applications are vast.

Check out the image for a glimpse of Moondream in action, effortlessly extracting insights from a 1944 sugar industry document!

Why it matters:
* Democratizing AI: As an open-source project, we're making advanced vision AI accessible to all developers.
* Efficiency: Proving that smaller models can deliver big results.
* Real-world impact: From historical document analysis to modern business intelligence, the potential use cases are exciting.

Curious? Try the live demo here! https://moondream.ai/playground
Reacted to Csplk's post with 🤯 5 months ago
# Offensive Physical Security Reconnaissance Planning Automation with public facing RTSP streams and Moondream


After some late-night casual hacking on VLMs for criminal attack-vector reconnaissance automation experiments, using Moondream (as usual) for image-text-to-text with predefined text prompts tuned to extract weaknesses, customer identity, and monetary-theft vectors for physical red-team engagement reconnaissance, I'm now working on a Space. Thanks again, @vikhyatk, for such a wonderful, super-powered image-text-to-text model that needs minimal computational power.

I've started implementing a small custom tool, with both a static HTML Space and a Python Gradio Space in progress, which I'll share as HF Spaces once they're done.

---

vikhyatk/moondream2

posted an update 6 months ago
Just released a new version of vikhyatk/moondream2 - now supporting higher resolution images (up to 756x756)!

TextVQA score (which measures the model's ability to read and reason about text in images) is up from 53.1 to 57.2 (+7.7%). Other visual question answering and counting benchmark results are up ~0.5%.
posted an update 7 months ago
Cool new dataset from @isidentical - isidentical/moondream2-coyo-5M-captions

The VeCLIP paper showed a +3% gain while using only 14% of the data, by synthetically captioning like this. You get diversity from the alt text (middle column) without having to deal with all of the noise.
posted an update 7 months ago
Updated the vikhyatk/lnqa dataset to include images, so you no longer need to separately download them from OpenImages!
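A minimal sketch of peeking at the dataset without downloading it in full, assuming the Hugging Face `datasets` library's `load_dataset(..., streaming=True)` API. The split name "train" and the helper name `preview_lnqa` are assumptions; inspect `rows[0].keys()` to see the actual image and question/answer columns:

```python
LNQA_REPO = "vikhyatk/lnqa"


def preview_lnqa(n: int = 3, split: str = "train"):
    """Stream the first n rows instead of downloading the whole dataset."""
    # Lazy imports: requires the `datasets` package and network access.
    from itertools import islice

    from datasets import load_dataset

    # Assumption: the dataset exposes a "train" split.
    ds = load_dataset(LNQA_REPO, split=split, streaming=True)
    return list(islice(ds, n))


# Usage (needs network): rows = preview_lnqa(); print(rows[0].keys())
```

Streaming keeps the first look cheap for a dataset of this size, since rows are fetched on demand rather than downloaded up front.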
Reacted to radames's post with ❤️🔥 8 months ago
Following up on @vikhyatk's Moondream2 update and @santiagomed's implementation in Candle, I quickly put together the WASM module so you can try running the ~1.5GB quantized model in the browser. Perhaps the next step is to rewrite it using https://github.com/huggingface/ratchet and run it even faster with WebGPU, @FL33TW00D-HF.

radames/Candle-Moondream-2

PS: I have a collection of all Candle WASM demos here: radames/candle-wasm-examples-650898dee13ff96230ce3e1f
posted an update 8 months ago
Released a new version of vikhyatk/moondream2 today! Primarily focused on improving OCR and captioning (e.g. "Describe this image", "Describe this image in one sentence"), but also seeing general improvement across all benchmarks.
posted an update 8 months ago
Just released a dataset with 1.5M image question/answers! vikhyatk/lnqa
replied to their post 9 months ago

Definitely, I'm planning to set up a blog some time soon.

posted an update 9 months ago
New moondream update out with significantly improved OCR performance (among other benchmarks)!
vikhyatk/moondream2