---
license: unknown
---

# enwiki_to_arwiki_categories Dataset

This dataset contains mappings between English Wikipedia categories and their corresponding Arabic Wikipedia categories.

## Files

### langlinks.json

This file contains the original mappings as downloaded from the Hugging Face Hub. It contains 818,354 mappings.

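
As a quick sanity check, the stated count can be compared against the file itself. The following is a minimal sketch that assumes the file is a single JSON document whose top-level value is an object or array with one entry per mapping; the actual layout is not documented here, and `count_mappings` is a hypothetical helper name.

```python
import json


def count_mappings(path: str) -> int:
    """Return the number of top-level entries in a JSON file.

    Assumption: the top-level value is a dict or a list with one
    entry per mapping. Adjust if the file is structured differently.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return len(data)
```

If the assumption holds, `count_mappings("langlinks.json")` should return the figure quoted above.
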
### filtered_data.json

This file keeps only the mappings that contain a 4-digit year; all other mappings are dropped. It contains 231,349 mappings.

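
The year filter could look roughly like the sketch below. It is not the script used to build the file. It assumes the mappings are held as a dict from English title to Arabic title, that the year is looked for in the English title, and that a "4-digit year" means a standalone run of exactly four digits. The sample entries and the name `keep_year_mappings` are illustrative only.

```python
import re

# Assumption: a year is exactly four digits, not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def keep_year_mappings(mappings: dict) -> dict:
    """Keep only the pairs whose English title contains a 4-digit year."""
    return {en: ar for en, ar in mappings.items() if YEAR_RE.search(en)}


# Illustrative sample, not taken from the dataset files.
sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:Living people": "تصنيف:أشخاص على قيد الحياة",
}
filtered = keep_year_mappings(sample)
```

Here `filtered` retains only the `Category:1990 births` pair, since the other title has no year in it.
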
### cats_2000.json

This file contains the mappings after replacing all 4-digit years with the year 2000. It contains 20,913 mappings.

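
A minimal sketch of the year replacement follows, under the same assumptions as before: mappings are a dict from English title to Arabic title, and a year is a standalone run of four digits. It also assumes the replacement is applied to both titles. One plausible reason for the smaller count is that pairs differing only in the year collapse into a single entry, which this sketch reproduces; the README does not confirm that this is how the file was built.

```python
import re

# Assumption: a year is exactly four digits, not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def normalize_years(mappings: dict) -> dict:
    """Replace every 4-digit year in both titles with '2000'.

    Pairs that become identical after replacement share one dict key,
    so they end up as a single entry.
    """
    return {
        YEAR_RE.sub("2000", en): YEAR_RE.sub("2000", ar)
        for en, ar in mappings.items()
    }


# Illustrative sample, not taken from the dataset files.
sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:1991 births": "تصنيف:مواليد 1991",
}
normalized = normalize_years(sample)
```

Both sample entries map to the same `Category:2000 births` key, so `normalized` holds one pair instead of two.
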
### cats_2000_contry.json

This file contains the mappings after replacing all 4-digit years with the year 2000 and replacing country names with the word `country`. It contains 538 mappings.
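
The country replacement might be sketched as below. The lookup table `COUNTRIES` is hypothetical, since the list of country names used for this file is not documented here, and replacing the name on the Arabic side with the same `country` placeholder is also an assumption.

```python
# Hypothetical lookup table of English and Arabic country names.
COUNTRIES = {
    "Egypt": "مصر",
    "France": "فرنسا",
}


def replace_countries(en: str, ar: str) -> tuple:
    """Replace the first country found in both titles with 'country'.

    A pair is only rewritten when the English name and its Arabic
    counterpart both appear; otherwise it is returned unchanged.
    """
    for en_name, ar_name in COUNTRIES.items():
        if en_name in en and ar_name in ar:
            return en.replace(en_name, "country"), ar.replace(ar_name, "country")
    return en, ar


# Illustrative pair, not taken from the dataset files.
en_title, ar_title = replace_countries("Category:2000 in Egypt", "تصنيف:مصر في 2000")
```

With this table, `en_title` becomes `Category:2000 in country`. Plain substring matching is the simplest choice for a sketch; a real pipeline would likely need word-boundary handling and a complete country list.
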