enwiki_to_arwiki_categories Dataset
This dataset contains mappings between English Wikipedia categories and their corresponding Arabic Wikipedia categories.
Files
langlinks.json
This file contains the original mappings as downloaded from the Hugging Face Hub. It contains 818,354 mappings.
filtered_data.json
This file keeps only the mappings whose category names contain a 4-digit year; all other mappings are dropped. It contains 231,349 mappings.
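The year filter described above can be sketched as follows. This is a minimal illustration, not the script that produced the file: it assumes the mappings are held as a dict from English category name to Arabic category name, that the year is tested on the English name, and that a "4-digit year" means a standalone run of exactly four digits. The function name `filter_year_mappings` and the sample entries are hypothetical.

```python
import re

# Assumption: a "4-digit year" is a run of exactly four digits
# that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def filter_year_mappings(mappings):
    """Keep only the pairs whose English category name contains a 4-digit year."""
    return {en: ar for en, ar in mappings.items() if YEAR_RE.search(en)}


# Hypothetical sample entries, for illustration only.
sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:Physics": "تصنيف:فيزياء",
}
filtered = filter_year_mappings(sample)
```

Here `filtered` keeps the "1990 births" entry and drops "Physics".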
cats_2000.json
This file contains the mappings after replacing all 4-digit years with the year 2000. It contains 20,913 mappings.
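A rough sketch of the year normalization, under the same assumptions as before (a dict from English to Arabic category names, years written as standalone four-digit runs). A plausible reason the count is so much smaller than in filtered_data.json is that categories differing only by year collapse into a single entry, though that is an inference rather than something the dataset states. The name `normalize_years` is hypothetical.

```python
import re

# Assumption: same definition of a 4-digit year as in the filtering step.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def normalize_years(mappings, placeholder="2000"):
    """Replace every 4-digit year on both sides of each mapping with `placeholder`."""
    normalized = {}
    for en, ar in mappings.items():
        # Entries that differ only by year end up with the same key,
        # so later entries overwrite earlier ones.
        normalized[YEAR_RE.sub(placeholder, en)] = YEAR_RE.sub(placeholder, ar)
    return normalized


# Hypothetical sample entries, for illustration only.
sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:1991 births": "تصنيف:مواليد 1991",
}
templates = normalize_years(sample)
```

Both sample entries reduce to one "Category:2000 births" template.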
cats_2000_contry.json
This file contains the mappings after replacing all 4-digit years with the year 2000 and replacing each country name with a generic country word. It contains 538 mappings.
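The country replacement might look something like the sketch below. Everything specific here is an assumption made for illustration: the two-entry lookup table `COUNTRIES`, the placeholder words `EN_PLACEHOLDER` and `AR_PLACEHOLDER`, and the function name `replace_countries` do not come from the dataset, and the real list of country names and the real placeholder may differ.

```python
import re

# Hypothetical lookup table: English country name -> Arabic country name.
COUNTRIES = {"Egypt": "مصر", "France": "فرنسا"}

# Assumed placeholder words; the dataset may use different ones.
EN_PLACEHOLDER = "country"
AR_PLACEHOLDER = "بلد"


def replace_countries(mappings):
    """Swap known country names for a generic placeholder on both sides."""
    result = {}
    for en, ar in mappings.items():
        for en_name, ar_name in COUNTRIES.items():
            # Whole-word match on the English side, plain substring on the Arabic side.
            en = re.sub(rf"\b{re.escape(en_name)}\b", EN_PLACEHOLDER, en)
            ar = ar.replace(ar_name, AR_PLACEHOLDER)
        result[en] = ar
    return result


# Hypothetical sample entries, already year-normalized.
sample = {
    "Category:2000 in Egypt": "تصنيف:مصر في 2000",
    "Category:2000 in France": "تصنيف:فرنسا في 2000",
}
collapsed = replace_countries(sample)
```

With this sample, the two entries collapse into a single "Category:2000 in country" template.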