its5Q

https://t.me/dno5iq

AI & ML interests

None yet

Recent Activity

liked a dataset 5 days ago

alpindale/two-million-bluesky-posts

liked a model 21 days ago

KimberleyJSN/melbandroformer

New activity about 2 months ago

Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24: Problem with quotes in JSON

Organizations

Posts 2

Post

Continuing my streak by releasing the Wikireading dataset: a large collection of scraped non-fiction books, predominantly in Russian.
its5Q/wikireading

Here are the highlights:
- ~7B tokens, or ~28B characters, making it a great candidate for use in pretraining
- Contains non-fiction works from many knowledge domains
- Includes both the original HTML and extracted text of book chapters

Post

Released a public dataset of scraped Teletype articles.

Here's the overview:
- 3.3 million articles, predominantly in Russian and English
- Includes original HTML, extracted text and metadata
- All articles were run through language identification
- Includes all public articles up until April 2024

its5Q/teletype

Collections 1

Models 1

its5Q/rugpt3large_mailqa

Text Generation • Updated Jun 5, 2023 • 30 • 4

Datasets 7