---
license: unknown
---
# enwiki_to_arwiki_categories Dataset

This dataset contains mappings between English Wikipedia categories and their corresponding Arabic Wikipedia categories.

## Files

### langlinks.json

This file contains the original mappings as downloaded from the Hugging Face Hub. It contains 818,354 mappings.

### filtered_data.json

This file keeps only the mappings that contain a 4-digit year; all other mappings are discarded. It contains 231,349 mappings.
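
The year filter could look roughly like the sketch below. It is not the script that produced the file: the regular expression, the helper names `has_year` and `filter_by_year`, the choice to test only the English title, and the assumption that the mappings form a flat dictionary from English title to Arabic title are all illustrative guesses.

```python
import re

# Assumption: a "4-digit year" is a run of exactly four digits
# that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def has_year(title: str) -> bool:
    """Return True if the title contains a standalone 4-digit number."""
    return YEAR_RE.search(title) is not None


def filter_by_year(mappings: dict[str, str]) -> dict[str, str]:
    """Keep only the pairs whose English title contains a 4-digit year."""
    return {en: ar for en, ar in mappings.items() if has_year(en)}


# Tiny hypothetical sample; the real data lives in langlinks.json.
sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:Physics": "تصنيف:فيزياء",
}
filtered = filter_by_year(sample)
```

Here `filtered` retains only the `Category:1990 births` entry, since `Category:Physics` carries no year.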

### cats_2000.json

This file contains the mappings after replacing all 4-digit years with the year 2000. It contains 20,913 mappings.
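
As a rough illustration of the year replacement, the sketch below rewrites every 4-digit year to `2000` on both sides of a mapping. The helper name `normalize_years`, the regular expression, and the flat dictionary layout are assumptions rather than the original code. Storing the results in a dictionary collapses pairs that become identical, which would be consistent with the drop in the number of mappings, although the source does not state how duplicates were handled.

```python
import re

# Assumption: a "4-digit year" is a run of exactly four digits
# that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def normalize_years(mappings: dict[str, str]) -> dict[str, str]:
    """Replace each 4-digit year with 2000 in both titles.

    Pairs that become identical after the replacement collapse
    into a single dictionary entry.
    """
    normalized = {}
    for en, ar in mappings.items():
        normalized[YEAR_RE.sub("2000", en)] = YEAR_RE.sub("2000", ar)
    return normalized


# Tiny hypothetical sample; the real input is filtered_data.json.
sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:1991 births": "تصنيف:مواليد 1991",
}
templates = normalize_years(sample)
```

Both sample entries reduce to the same `Category:2000 births` template, so `templates` holds a single pair.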

### cats_2000_contry.json

This file contains the mappings after replacing all 4-digit years with the year 2000 and replacing country names with the word `country`. It contains 538 mappings.
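
The country replacement might be sketched as follows. The `COUNTRIES` lookup table, the `replace_countries` helper, and the decision to substitute the placeholder on the Arabic side as well are hypothetical; the source does not say which country list was used or how the Arabic titles were handled.

```python
import re

# Hypothetical lookup: English country name -> Arabic country name.
# The list actually used to build the file is not documented.
COUNTRIES = {
    "France": "فرنسا",
    "Egypt": "مصر",
}


def replace_countries(en: str, ar: str, placeholder: str = "country") -> tuple[str, str]:
    """Swap known country names for a placeholder in both titles."""
    for en_name, ar_name in COUNTRIES.items():
        # Whole-word match on the English side to avoid partial hits.
        en = re.sub(rf"\b{re.escape(en_name)}\b", placeholder, en)
        # Plain substring replacement on the Arabic side (an assumption).
        ar = ar.replace(ar_name, placeholder)
    return en, ar


en_out, ar_out = replace_countries(
    "Category:2000 in France",
    "تصنيف:2000 في فرنسا",
)
```

With this sample, `en_out` becomes `Category:2000 in country`, and the Arabic country name in `ar_out` is replaced by the same placeholder.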