OpenCoder Collection: OpenCoder is an open and reproducible code LLM family that matches the performance of top-tier code LLMs. • 8 items • Updated 5 days ago • 74
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages Paper • 2410.23825 • Published 28 days ago • 3
CommonCrawl Collection: Large web-mined general corpus based on CommonCrawl. • 6 items • Updated 27 days ago