Shushant / ApplicantTrackingSystemBERT


Shushant committed on Mar 15, 2023

Commit 4f62b6c • 1 Parent(s): 57ccd9f

description added

Files changed (1)

README.md +14 -7

README.md CHANGED

@@ -4,6 +4,11 @@ tags:
 model-index:
 - name: training_bert
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -11,24 +16,26 @@ should probably proofread and complete it, then remove this comment. -->
 # training_bert
-This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 4.0495
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
@@ -96,4 +103,4 @@ The following hyperparameters were used during training:
 - Transformers 4.25.1
 - Pytorch 1.8.0+cu111
 - Datasets 2.7.1
-- Tokenizers 0.13.2

 model-index:
 - name: training_bert
   results: []
+license: mit
+language:
+- en
+metrics:
+- perplexity
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # training_bert
+This model is a fine-tuned version of [Bert Base Uncased](https://huggingface.co/) on a dataset composed of job postings from several job platforms and thousands of resumes.
 It achieves the following results on the evaluation set:
 - Loss: 4.0495
 ## Model description
+Pretraining was done on the BERT base architecture.
 ## Intended uses & limitations
+This model can be used to generate contextual embeddings for textual data used in Applicant Tracking Systems such as resumes, jobs and cover letters.
+The embeddings can be further used to perform other NLP downstream tasks such as classification, Named Entity Recognition and so on.
+## Training and evaluation data
+The training corpus was built from about 40,000 resumes and 2,000 job postings scraped from different job portals. This is a preliminary dataset
+for experimentation. The corpus size is about 2.35 GB of textual data. Similarly, the evaluation data contains a few resumes and jobs, making up about 12 MB of textual data.
 ## Training procedure
+For pretraining the masked language model, the Trainer API from Hugging Face was used. Pretraining took about 6 hrs 40 mins.
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - Transformers 4.25.1
 - Pytorch 1.8.0+cu111
 - Datasets 2.7.1
+- Tokenizers 0.13.2
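
The updated card metadata lists perplexity as a metric, while the card body reports only the evaluation loss (4.0495). A minimal sketch of how the two relate, assuming the reported loss is the mean cross-entropy in nats over the evaluated tokens (the usual convention for language-model evaluation, but not confirmed by the card). The function name `perplexity_from_loss` is hypothetical and used only for illustration.

```python
import math


def perplexity_from_loss(eval_loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity.

    Assumption: eval_loss is the natural-log cross-entropy averaged
    over the evaluated tokens. The card does not state this explicitly.
    """
    return math.exp(eval_loss)


if __name__ == "__main__":
    # Evaluation loss taken from the model card above.
    loss = 4.0495
    print(f"perplexity ~ {perplexity_from_loss(loss):.2f}")  # roughly 57.4
```

Under that assumption, the reported loss corresponds to a perplexity of roughly 57 on the evaluation set.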