Separate training data by country
Discussion #117, opened by wponhf
Greetings, I sent a question to bigscience-contact@googlegroups.com but have not received a response. If this is the wrong forum for the question, I apologize. Are there any resources that explain how to isolate or categorize the English-sourced training data by country of origin? Thanks.
You can find this information, when available, in the data card deck linked below, under Speaker Locations:
Data Cards per Source
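Once the Speaker Locations have been read off the data cards, grouping the English sources by country is a small bookkeeping step. The sketch below assumes you have transcribed each card by hand into a list of records. The field names (`source`, `language`, `speaker_locations`) and the placeholder values are hypothetical and are not actual data-card contents.

```python
from collections import defaultdict

# Hypothetical records, one per data source, transcribed by hand from the
# data cards. Source names and locations are placeholders, NOT real values.
SOURCES = [
    {"source": "source_a", "language": "English", "speaker_locations": ["United States"]},
    {"source": "source_b", "language": "English", "speaker_locations": ["United Kingdom", "United States"]},
    {"source": "source_c", "language": "English", "speaker_locations": []},  # location not documented
    {"source": "source_d", "language": "French", "speaker_locations": ["France"]},
]


def group_by_country(records, language="English"):
    """Map each listed speaker location to the sources that mention it.

    Only records in the requested language are kept. Sources with no
    recorded location go under "unknown", because the data cards provide
    this information only when it is available.
    """
    groups = defaultdict(list)
    for rec in records:
        if rec["language"] != language:
            continue
        locations = rec["speaker_locations"] or ["unknown"]
        for country in locations:
            groups[country].append(rec["source"])
    return dict(groups)


if __name__ == "__main__":
    for country, names in sorted(group_by_country(SOURCES).items()):
        print(f"{country}: {', '.join(names)}")
```

A source that lists several speaker locations appears under each of them, so the groups can overlap. Sources without a documented location are kept in a separate "unknown" bucket rather than silently dropped.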