---
license: cc-by-4.0
language:
- en
pipeline_tag: text-classification
tags:
- glam
- lam
- subject indexing
- annif
- hogwarts
---
# Hogwarts Sorting Hat using Annif and its fastText backend

This model is the output of [this Annif tutorial exercise](https://github.com/NatLibFi/Annif-tutorial/blob/master/exercises/OPT_hogwarts.md).

> The original Sorting Hat reads the thoughts of the student, but Annif generally does not have access to that kind of information, so we will simply use the name of the student as input. We will train a fastText model on the names of characters from the Harry Potter novels whose house is known. To make it possible to generalize the model to new, unseen names, we will use character n-grams to split all names into chunks of 1 to 4 characters - for example harry becomes [h, ha, har, harr, a, ar, arr, arry ...]. fastText can do this when given the minn and maxn parameters, which set the minimum and maximum length of character n-grams to generate from input text.
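
To make the n-gram splitting concrete, here is a minimal Python sketch of the idea, assuming plain substrings of 1 to 4 characters. The function name `char_ngrams` is hypothetical and is not part of Annif or fastText; the `minn` and `maxn` arguments only mirror the parameter names mentioned above. fastText does this internally and also wraps each word in `<` and `>` boundary markers, which this sketch leaves out.

```python
def char_ngrams(name, minn=1, maxn=4):
    """Return all character n-grams of `name` with lengths minn..maxn.

    Illustrative helper only (hypothetical name); fastText generates its
    n-grams internally and adds '<' and '>' word boundary markers.
    """
    name = name.lower()
    ngrams = []
    for start in range(len(name)):
        for n in range(minn, maxn + 1):
            if start + n > len(name):
                break  # longer n-grams would run past the end of the name
            ngrams.append(name[start:start + n])
    return ngrams


print(char_ngrams("harry"))
```

Because names that were never seen in training still share short chunks such as `ar` or `rry` with known names, the model can assign a house to them.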