---
library_name: fastai
---
# Details

## Background
In July and August 2022, I did research on basic natural language processing with a professor in the Department of Computer Science at UMBC.
I learned from the fastai fastbook, and our task was to create a resume classifier. The professor found a dataset of resumes online
and asked me to manually label each text file as a resume or not (2 = resume, 1 = kind of, 0 = not a resume). After that, I learned
how to train the model with fastai under the professor's guidance. I trained it in many separate sessions rather than in one continuous run, so I
needed to learn how to freeze and unfreeze the model. I also trained overnight for a couple of days and reached an accuracy of 90%.
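
The freeze and unfreeze workflow described above can be sketched roughly as follows. This is a minimal sketch and not the actual training code from this project: it assumes `learn` is a fastai text classifier learner, and the function name `train_in_stages`, the checkpoint name, the epoch counts, and the learning rates are all hypothetical placeholders. It relies only on the `freeze`, `freeze_to`, `unfreeze`, `fit_one_cycle`, `save`, and `load` methods of fastai's `Learner`.

```python
def train_in_stages(learn, checkpoint="resume_learner", resume=False):
    """Gradually unfreeze a fastai learner, saving after each stage.

    `learn` is assumed to be a fastai text classifier learner. The epoch
    counts and learning rates below are placeholders, not tuned values.
    """
    if resume:
        # Pick up the weights saved by an earlier training session.
        learn.load(checkpoint)

    # Stage 1: train only the last layer group (the classifier head).
    learn.freeze()
    learn.fit_one_cycle(1, 2e-2)
    learn.save(checkpoint)

    # Stage 2: also train the second-to-last layer group.
    learn.freeze_to(-2)
    learn.fit_one_cycle(1, 1e-2)
    learn.save(checkpoint)

    # Stage 3: unfreeze everything and fine-tune the whole model.
    learn.unfreeze()
    learn.fit_one_cycle(2, 1e-3)
    learn.save(checkpoint)
    return learn
```

Saving after every stage is what makes non-continuous training workable: a later session can call the function with `resume=True` and continue from the last saved weights instead of starting over.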

Recently, I looked back on this project and wanted to make it a little more official by creating a small testing interface and
uploading it to GitHub and Hugging Face.

## Files
Here are the files you'll find in this repository:

### resume_learner.pth
This file contains the trained model.

### main.ipynb
This Jupyter notebook loads the model and runs specific tests on it.

### test.txt
This is a text file you can feed to the model in main.ipynb if you want to copy and paste a large chunk of text.
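
Feeding the contents of a text file to the model could look something like the sketch below. It is an illustration, not the code in main.ipynb: the helper name `predict_file` is hypothetical, and it assumes `learn` is an already loaded fastai learner whose `predict` method accepts a raw string.

```python
from pathlib import Path


def predict_file(learn, path="test.txt"):
    """Read a text file and return the learner's prediction for its contents."""
    text = Path(path).read_text(encoding="utf-8")
    # fastai's Learner.predict returns a tuple of
    # (decoded label, label index, per-class probabilities).
    label, _, probs = learn.predict(text)
    return label, probs
```

Reading the file in one call keeps a long pasted document intact, so it reaches the model as a single input rather than line by line.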

## Observations
In all honesty, this is not a very good model, but it taught me the basics of how to create a language model.
It predicts resumes pretty well, but there are some odd cases it gets wrong, such as texts like

- "hi"
- "this is not a resume"

It gets inputs like these wrong because they are very short.

However, I believe this is because the training data was mainly resumes, so the model learned what a text file that **is** a resume looks like.
There wasn't much data showing what a text file that is **not** a resume looks like, so the model could not recognize those very well.
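
One way to check for this kind of imbalance before training is to count how often each label appears. The sketch below is only an illustration under stated assumptions: it reuses the 2/1/0 labeling scheme from the Background section, while the names `LABEL_NAMES` and `label_distribution` are hypothetical and not part of this repository.

```python
from collections import Counter

# Labeling scheme from the Background section.
LABEL_NAMES = {2: "resume", 1: "kind of", 0: "not a resume"}


def label_distribution(labels):
    """Return the fraction of examples that carry each label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {name: counts.get(value, 0) / total
            for value, name in LABEL_NAMES.items()}
```

For a made-up list such as `[2, 2, 2, 2, 2, 2, 2, 1, 1, 0]`, this returns `{"resume": 0.7, "kind of": 0.2, "not a resume": 0.1}`. A very small fraction for "not a resume" would suggest that the model sees too few negative examples to learn from.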