metadata
license: unknown

enwiki_to_arwiki_categories Dataset

This dataset contains mappings between English Wikipedia categories and their corresponding Arabic Wikipedia categories.

Files

langlinks.json

This file contains the original mappings as downloaded from the Hugging Face Hub. It contains 818,354 mappings.
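A minimal sketch of loading the file and counting its mappings. It assumes the JSON holds a single object whose keys are English category titles and whose values are the matching Arabic titles; the README does not state the layout, so adjust if the real file differs. `load_mappings` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_mappings(path):
    """Load a mapping file, assumed to be one JSON object of en -> ar titles."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the top-level value is an object (dict), not a list.
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object mapping English to Arabic titles")
    return data


# Usage, assuming langlinks.json is in the working directory:
# mappings = load_mappings("langlinks.json")
# print(len(mappings))
```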

filtered_data.json

This file keeps only the mappings that contain a 4-digit year; all other mappings are filtered out. It contains 231,349 mappings.
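The year filter could be sketched as follows. This is not necessarily the script used to build the file: the regular expression, the choice to test only the English title, and the name `keep_year_mappings` are all assumptions made for illustration.

```python
import re

# Assumption: a "4-digit year" is a run of exactly four digits
# that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def keep_year_mappings(mappings):
    """Return only the mappings whose English title contains a 4-digit year."""
    return {en: ar for en, ar in mappings.items() if YEAR_RE.search(en)}


sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:Physics": "تصنيف:فيزياء",
}
filtered = keep_year_mappings(sample)  # only the "1990 births" entry remains
```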

cats_2000.json

This file contains the mappings after replacing all 4-digit years with the year 2000. It contains 20,913 mappings.
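A hedged sketch of the year normalization. The drop in the mapping count suggests that entries which become identical after the replacement are merged; that is an inference, and the dictionary-based merge below is one way to get that effect. The regular expression and the name `normalize_years` are assumptions.

```python
import re

# Assumption: a year is a standalone run of exactly four digits.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def normalize_years(mappings, placeholder="2000"):
    """Replace every 4-digit year in both titles with the placeholder year."""
    normalized = {}
    for en, ar in mappings.items():
        new_en = YEAR_RE.sub(placeholder, en)
        new_ar = YEAR_RE.sub(placeholder, ar)
        # Entries that collapse to the same English title overwrite each other,
        # so the result holds one mapping per normalized pattern.
        normalized[new_en] = new_ar
    return normalized


sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:1991 births": "تصنيف:مواليد 1991",
}
patterns = normalize_years(sample)  # both entries collapse into one pattern
```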

cats_2000_contry.json

This file contains the mappings after replacing all 4-digit years with the year 2000 and replacing each country name with a generic country word. It contains 538 mappings.
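The country substitution might look like the sketch below. The README does not give the country list or the exact replacement word, so the two-entry `COUNTRIES` table, the `COUNTRY` placeholder, and the name `replace_countries` are hypothetical stand-ins.

```python
import re

# Hypothetical sample of English -> Arabic country names; a real run
# would need a complete list.
COUNTRIES = {"Egypt": "مصر", "France": "فرنسا"}

# Hypothetical placeholder; the actual "country word" is not specified.
PLACEHOLDER = "COUNTRY"


def replace_countries(mappings):
    """Replace known country names in both titles with the placeholder."""
    result = {}
    for en, ar in mappings.items():
        for en_name, ar_name in COUNTRIES.items():
            # Whole-word match on the English side.
            en = re.sub(rf"\b{re.escape(en_name)}\b", PLACEHOLDER, en)
            # Plain substring replace on the Arabic side; this can also hit
            # a country name embedded in a longer word.
            ar = ar.replace(ar_name, PLACEHOLDER)
        result[en] = ar
    return result


sample = {
    "Category:2000 in Egypt": "تصنيف:2000 في مصر",
    "Category:2000 in France": "تصنيف:2000 في فرنسا",
}
generic = replace_countries(sample)  # both entries collapse into one pattern
```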