tokenizer-arena/stats/compression_rate/CohereForAI.aya-101 @ cc100.en.diff.json
[
{
"text": "and yeah im a boy,and no, im not g*y, im a nice guy. i dont love his songs or anything , but he's not that bad tbh.",
"decoded_text": "and yeah im a boy,and no, im not g*y, im a nice guy. i dont love his songs or anything, but he's not that bad tbh.",
"diff": [
"delete text[86:87] --> decoded_text[86:86] ' ' --> ''"
],
"n_oov_chars": 0,
"oov_ratio": 0.0,
"oov_charset": "[]"
},
{
"text": "Justin serenaded wonderful or better than a great I like popular songs, particularly as it is talented. all those who hate Justin are g**s because they feel jealous of him because he is handsome at the same time a rising singer and a small age. I myself appreciate the wonderful artist with this beautiful and talented .",
"decoded_text": "Justin serenaded wonderful or better than a great I like popular songs, particularly as it is talented. all those who hate Justin are g**s because they feel jealous of him because he is handsome at the same time a rising singer and a small age. I myself appreciate the wonderful artist with this beautiful and talented.",
"diff": [
"delete text[318:319] --> decoded_text[318:318] ' ' --> ''"
],
"n_oov_chars": 0,
"oov_ratio": 0.0,
"oov_charset": "[]"
},
{
"text": "Soften the landing zones with a pair of Rubber Mats , made from dyed rubber chips, heat compressed and available in dark green or brick red.",
"decoded_text": "Soften the landing zones with a pair of Rubber Mats, made from dyed rubber chips, heat compressed and available in dark green or brick red.",
"diff": [
"delete text[51:52] --> decoded_text[51:51] ' ' --> ''"
],
"n_oov_chars": 0,
"oov_ratio": 0.0,
"oov_charset": "[]"
},
{
"text": "​EEI Members have access to a wide range of reports, publications, communications, and other resources. In order to access the resources below, a member log in is required.",
"decoded_text": "EEI Members have access to a wide range of reports, publications, communications, and other resources. In order to access the resources below, a member log in is required.",
"diff": [
"delete text[0:1] --> decoded_text[0:0] '\\u200b' --> ''"
],
"n_oov_chars": 1,
"oov_ratio": 0.005813953488372093,
"oov_charset": "[\"​\"]"
},
{
"text": "​Launched in 2017, AUPSE is a senior executive knowledge exchange and peer-to-peer networking platform created to accelerate operational excellence in the African electric power sector.",
"decoded_text": "Launched in 2017, AUPSE is a senior executive knowledge exchange and peer-to-peer networking platform created to accelerate operational excellence in the African electric power sector.",
"diff": [
"delete text[0:1] --> decoded_text[0:0] '\\u200b' --> ''"
],
"n_oov_chars": 1,
"oov_ratio": 0.005405405405405406,
"oov_charset": "[\"​\"]"
},
{
"text": "We're not so rough and over the top these days, so they miiiiight survive ._.",
"decoded_text": "We're not so rough and over the top these days, so they miiiiight survive._.",
"diff": [
"delete text[73:74] --> decoded_text[73:73] ' ' --> ''"
],
"n_oov_chars": 0,
"oov_ratio": 0.0,
"oov_charset": "[]"
},
{
"text": "Just finished Hulse's \"Black River\" and simply adored the book. So pretty, overall, and much like the Kent Haruf novels, such as \"Plainsong\" that I've enjoyed over the years. \"Black River\" is surely one of the best five I've read this year. Solid Pulitzer choice, in my opinion. Side note: As I've mentioned before, I surely don't understand all of the hoopla surrounding \"The Sellout,\" with so many other worthy contenders. But, what do I know? I'm only a reader. :-) Read on ...",
"decoded_text": "Just finished Hulse's \"Black River\" and simply adored the book. So pretty, overall, and much like the Kent Haruf novels, such as \"Plainsong\" that I've enjoyed over the years. \"Black River\" is surely one of the best five I've read this year. Solid Pulitzer choice, in my opinion. Side note: As I've mentioned before, I surely don't understand all of the hoopla surrounding \"The Sellout,\" with so many other worthy contenders. But, what do I know? I'm only a reader. :-) Read on...",
"diff": [
"replace text[476:480] --> decoded_text[476:479] ' ...' --> '...'"
],
"n_oov_chars": 0,
"oov_ratio": 0.0,
"oov_charset": "[]"
},
{
"text": "I really don't understand all of the hoopla over THE SELLOUT. Just a so-so book, in my opinion. Minor work. I struggled through it, and can never get back the time spent on that tome. EILEEN and HONEYDEW are sooooooo much better, not to mention THE TURNER HOUSE, TSAR, DID YOU EVER, and others. I'm reading DELICIOUS FOODS right now, and think it's a major-serious contender as well. BLACK RIVER is next on my list, and I can't wait. But, what do I know? :-) Read on ...",
"decoded_text": "I really don't understand all of the hoopla over THE SELLOUT. Just a so-so book, in my opinion. Minor work. I struggled through it, and can never get back the time spent on that tome. EILEEN and HONEYDEW are sooooooo much better, not to mention THE TURNER HOUSE, TSAR, DID YOU EVER, and others. I'm reading DELICIOUS FOODS right now, and think it's a major-serious contender as well. BLACK RIVER is next on my list, and I can't wait. But, what do I know? :-) Read on...",
"diff": [
"replace text[466:470] --> decoded_text[466:469] ' ...' --> '...'"
],
"n_oov_chars": 0,
"oov_ratio": 0.0,
"oov_charset": "[]"
},
{
"text": "I have also read The Shore ,Alex, yes I agree its very good, maybe a chance. The last years I have just waited to last in the year to see who the genral public have been siding and gone for that, from a collectors point of view, it would be nice if something won which did not have a 100,000 in the first print run.",
"decoded_text": "I have also read The Shore,Alex, yes I agree its very good, maybe a chance. The last years I have just waited to last in the year to see who the genral public have been siding and gone for that, from a collectors point of view, it would be nice if something won which did not have a 100,000 in the first print run.",
"diff": [
"delete text[26:27] --> decoded_text[26:26] ' ' --> ''"
],
"n_oov_chars": 0,
"oov_ratio": 0.0,
"oov_charset": "[]"
},
{
"text": "Moving to K-W can be confusing for anybody: how can you explain King Street, that runs north, south, east and west ?! Or streets like King and Weber, that are sometimes parallel, and yet cross each other in two places ? For someone new to the country, adjusting to life here can be even much more confusing.",
"decoded_text": "Moving to K-W can be confusing for anybody: how can you explain King Street, that runs north, south, east and west?! Or streets like King and Weber, that are sometimes parallel, and yet cross each other in two places? For someone new to the country, adjusting to life here can be even much more confusing.",
"diff": [
"delete text[114:115] --> decoded_text[114:114] ' ' --> ''",
"delete text[217:218] --> decoded_text[216:216] ' ' --> ''"
],
"n_oov_chars": 0,
"oov_ratio": 0.0,
"oov_charset": "[]"
}
]
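
The "diff" entries above follow the opcode layout of Python's difflib (tag, source span, target span, then the two substrings shown with repr). The sketch below is one way such entries could be produced. It is an assumption, not the project's confirmed implementation, and the function name `char_diff` is hypothetical.

```python
from difflib import SequenceMatcher


def char_diff(text: str, decoded_text: str) -> list[str]:
    """Describe every non-equal character span between text and decoded_text.

    Hypothetical helper: the output format mirrors the "diff" strings in the
    records above, but the actual generator is not shown in this file.
    """
    # autojunk is left at its default (True), as in a plain SequenceMatcher call.
    matcher = SequenceMatcher(None, text, decoded_text)
    entries = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # only report spans that changed in the round trip
        entries.append(
            f"{tag} text[{i1}:{i2}] --> decoded_text[{j1}:{j2}] "
            f"{text[i1:i2]!r} --> {decoded_text[j1:j2]!r}"
        )
    return entries


# Shortened versions of two records above (offsets differ from the full texts).
print(char_diff("Rubber Mats , made", "Rubber Mats, made"))
print(char_diff("\u200bEEI Members", "EEI Members"))
```

For sequences of 200 or more items, difflib's default autojunk heuristic stops using very frequent characters as match anchors. That may be why the two long "Read on ..." records report a replace of ' ...' with '...' rather than a one-character delete. The two nonzero "oov_ratio" values shown are consistent with n_oov_chars divided by the character length of "text".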