Caleb Fahlgren PRO

cfahlgren1

AI & ML interests

None yet

Recent Activity

Articles

Organizations

Hugging Face's profile picture Datasets Maintainers's profile picture Hugging Face OSS Metrics's profile picture Hugging Face TB Research's profile picture ChatDB's profile picture Cognitive Computations's profile picture nltpt-q's profile picture DuckDB Text-2-SQL Bench's profile picture open/ acc's profile picture Bluesky Community's profile picture

cfahlgren1's activity

posted an update 8 days ago
view post
Post
859
observers ๐Ÿ”ญ - automatically log all OpenAI compatible requests to a dataset๐Ÿ’ฝ

โ€ข supports any OpenAI compatible endpoint ๐Ÿ’ช
โ€ข supports DuckDB, Hugging Face Datasets, and Argilla as stores

> pip install observers

No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!

Here's an example dataset that was logged to Hugging Face from Ollama: cfahlgren1/llama-3.1-awesome-chatgpt-prompts
replied to their post 11 days ago
posted an update 11 days ago
view post
Post
856
You can create charts, leaderboards, and filters on top of any Hugging Face dataset in less than a minute

โ€ข ASCII Bar Charts ๐Ÿ“Š
โ€ข Powered by DuckDB WASM โšก
โ€ข Download results to Parquet ๐Ÿ’ฝ
โ€ข Embed and Share results with friends ๐Ÿ“ฌ

Do you have any interesting queries?
Reacted to davanstrien's post with โค๏ธ 11 days ago
replied to their post 11 days ago
view reply

Heavy is the head that wears the crown

Reacted to fracapuano's post with โค๏ธ 11 days ago
view post
Post
995
Sharing what we have built over the course of the weekend at the @llamameta hackathon, by Cerebral Valley in London ๐Ÿ‡ฌ๐Ÿ‡ง ๐Ÿ‘‡

@gabrycina @calebgcc and I competed with 200+ participants and 50+ teams for a 24-hrs sprint centered around hacking for impact! We focused on applications of robotics to those in need of assisted living, moving our focus to enable greater autonomy and accessibility of robotics in everyday life.

complete list of assets ๐Ÿ‘‡
๐Ÿค— trained robotics policies
v1:
- fracapuano/moss-pills
- fracapuano/moss-cup
v2:
- fracapuano/meta-grasp

๐Ÿค— datasets
v1:
- fracapuano/pills
- fracapuano/cup
v2:
- fracapuano/cupim


You can find a live demo of our submission at: https://x.com/_fracapuano/status/1858102728691458554

If you want to know more about how we collected 100GB+ of data, trained multiple RL-policies using @lerobot and used Llama-3.2 models to handle user interactions and switch between tasks, go ahead and have a look! Also, don't be a stranger, and reach out ๐Ÿฆพ

Our project is fully open-source, for the community (and ourselves, ๐Ÿ‘จโ€๐Ÿณ) to build! A huge thank you to @cadene for the help (and the robot ๐Ÿคญ) - truly feeling these hugs-vibes ๐Ÿค— , and to @thomwolf and @clem for sharing our work across

Little extra:
โžก๏ธ Our ๐Ÿง EEG waves๐Ÿง -based control of the ๐Ÿฆพrobotic arm๐Ÿฆพ
Reacted to LukeNeumann's post with ๐Ÿคฏ 11 days ago
view post
Post
1202
Nine years ago, I uploaded the first 8K resolution video to YouTube and I've been stockpiling 8K footage ever since: https://www.youtube.com/watch?v=sLprVF6d7Ug&t

Should @Overlaiapp release the first open-source 8K video dataset?

Could anyone even fine tune a model with this?๐Ÿ˜…
ยท
replied to LukeNeumann's post 11 days ago
view reply

Would be massive! Let us know if you need any help ๐Ÿค—

Reacted to dvilasuero's post with ๐Ÿš€๐Ÿค— 11 days ago
posted an update 11 days ago
Reacted to nyuuzyou's post with ๐Ÿ”ฅ 12 days ago
view post
Post
946
๐Ÿ–ผ๏ธ Introducing Public Domain Pictures Dataset - nyuuzyou/publicdomainpictures

Dataset highlights:
- 644,412 public domain images with comprehensive metadata from publicdomainpictures.net
- English language metadata including titles, descriptions, and keywords
- Each entry contains rich metadata including:
- Unique image ID and full-size image URLs
- Detailed titles and descriptions
- Keyword/tag collections
- Creator attribution
- Released to the public domain under Creative Commons Zero (CC0) license
  • 2 replies
ยท
posted an update 12 days ago
view post
Post
2939
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together / add split column
- Converting string messages into list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
  • 1 reply
ยท
replied to victor's post 12 days ago
view reply

There is no harm, but if I turn the notification on or off for an unapproved gated repo, an error message will appear.

Hey @John6666 , this should be fixed very soon.

Reacted to erikkaum's post with ๐Ÿ”ฅ 12 days ago
view post
Post
1675
A while ago I started experimenting with compiling the Python interpreter to WASM.

To build a secure, fast, and lightweight sandbox for code execution โ€” ideal for running LLM-generated Python code.

- Send code simply as a POST request
- 1-2ms startup times

Hack away:
https://github.com/ErikKaum/runner
posted an update 15 days ago
view post
Post
2212
Why use Google Drive when you can have:

โ€ข Free storage with generous limits๐Ÿ†“
โ€ข Dataset Viewer (Sorting, Filtering, FTS) ๐Ÿ”
โ€ข Third Party Library Support
โ€ข SQL Console ๐ŸŸง
โ€ข Security ๐Ÿ”’
โ€ข Community, Reach, and Visibility ๐Ÿ“ˆ

It's a no brainer!

Check out our post on what you get instantly out of the box when you create a dataset.
https://huggingface.co/blog/researcher-dataset-sharing
  • 1 reply
ยท
replied to maxiw's post 16 days ago
view reply

Yeah for sure! Would be cool to see links to the leaderboards of these to see more than top 5 and see where most of the community is ๐Ÿ‘€ @maxiw

Maybe like top 100 or top 500 with sql console saved link

replied to m-ric's post 16 days ago
replied to m-ric's post 16 days ago
Reacted to m-ric's post with ๐Ÿ‘€ 16 days ago
view post
Post
3694
๐—ง๐—ต๐—ฒ ๐—ป๐—ฒ๐˜…๐˜ ๐—ฏ๐—ถ๐—ด ๐˜€๐—ผ๐—ฐ๐—ถ๐—ฎ๐—น ๐—ป๐—ฒ๐˜๐˜„๐—ผ๐—ฟ๐—ธ ๐—ถ๐˜€ ๐—ป๐—ผ๐˜ ๐Ÿฆ‹, ๐—ถ๐˜'๐˜€ ๐—›๐˜‚๐—ฏ ๐—ฃ๐—ผ๐˜€๐˜๐˜€! [INSERT STONKS MEME WITH LASER EYES]

See below: I got 105k impressions since regularly posting Hub Posts, coming close to my 275k on Twitter!

โš™๏ธ Computed with the great dataset maxiw/hf-posts
โš™๏ธ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL request!

cc @merve who's far in front of me
ยท