README.md · pascalrai/large-BERT-NER-email at 3bd867a2e8dda4f460a0d63ded4041f29dd6985c

library_name: transformers
tags: []
widget:
  - text: >-
      Thank you for approaching me about the collaboration. You can talk to my
      manager, Kritik at 9874512563 or kritik.jun@asdf.com
    example_title: Email 1
  - text: Call me on 9874569874
    example_title: Email 2
  - text: >-
      You can email me at adsf@gmail.com or call directly on 9999988888. The
      point of contact would be my manager Manish Neupane
    example_title: Email 3

Overview:

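The model is fine-tuned for token classification with 3 entity classes plus the "0" class.
The dataset is custom-annotated and contains 400 texts. It was split in proportions of 0.76, 0.12, and 0.12 (presumably train, validation, and test, in that order), and the model was trained on that split.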
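
To make the split sizes concrete, here is a minimal sketch of how 400 texts divide under those proportions. The helper `split_dataset`, the shuffle, and the seed are illustrative assumptions; the original split procedure is not documented here.

```python
import random


def split_dataset(texts, fractions=(0.76, 0.12, 0.12), seed=0):
    """Shuffle and split into three parts (hypothetical helper, not the author's code)."""
    items = list(texts)
    random.Random(seed).shuffle(items)  # assumed: a seeded random shuffle
    n = len(items)
    n_first = round(n * fractions[0])
    n_second = round(n * fractions[1])
    first = items[:n_first]
    second = items[n_first:n_first + n_second]
    third = items[n_first + n_second:]  # remainder, so no text is dropped
    return first, second, third


train, val, test = split_dataset(range(400))
print(len(train), len(val), len(test))
```

With 400 texts this yields 304, 48, and 48 items, so each per-class score in the reports below is computed on a fairly small sample.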

The validation classification report is as follows:

| Class | Precision | Recall | F1 |
|---|---|---|---|
| 0 | 1.00 | 1.00 | 1.00 |
| 1 | 0.98 | 1.00 | 0.99 |
| 2 | 0.95 | 0.89 | 0.92 |
| 3 | 0.80 | 0.88 | 0.84 |
| macro-avg | 0.93 | 0.94 | 0.94 |

The test classification report is as follows:

| Class | Precision | Recall | F1 |
|---|---|---|---|
| 0 | 1.00 | 1.00 | 1.00 |
| 1 | 0.98 | 1.00 | 0.99 |
| 2 | 0.66 | 0.97 | 0.79 |
| 3 | 0.84 | 0.78 | 0.81 |
| macro-avg | 0.87 | 0.94 | 0.90 |
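
As a sanity check on the test report above, the F1 column can be recomputed from precision and recall. F1 is their harmonic mean, and the macro average is the unweighted mean over classes. This is a small sketch using the reported (rounded) values, so it only reproduces the figures to two decimals. The names `f1_score` and `test_report` are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, copied from the test report above.
test_report = {
    "0": (1.00, 1.00),
    "1": (0.98, 1.00),
    "2": (0.66, 0.97),
    "3": (0.84, 0.78),
}

per_class_f1 = {label: f1_score(p, r) for label, (p, r) in test_report.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for label, f1 in per_class_f1.items():
    print(f"class {label}: f1 = {f1:.2f}")
print(f"macro-avg f1 = {macro_f1:.2f}")
```

The low precision with high recall for class 2 (0.66 vs. 0.97) means the model over-predicts that class on the test split: it finds almost every true instance but also labels many other tokens as class 2.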

Possible future directions:

Clean the data into as consistent a format as possible.
Collect more data, making sure it resembles the text seen in real use cases.
Ponder: Is it possible to use something like Grammarly to clean the sentences before tokenization, so that proper nouns are capitalized and the grammar is correct, giving the model a more regular pattern to learn?
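
The pre-cleaning idea can be illustrated with a much simpler stand-in than a grammar checker. The sketch below is rule-based and only collapses whitespace and capitalizes sentence starts. It does not correct grammar or detect proper nouns, and the function name `normalize_text` is hypothetical.

```python
import re


def normalize_text(text):
    """Light rule-based cleanup before tokenization (illustrative stand-in only)."""
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Capitalize the first letter of the text and of each sentence that
    # follows '.', '!' or '?' plus whitespace. A dot inside an email
    # address is not followed by whitespace, so addresses are left untouched.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


cleaned = normalize_text("call me  on 9874569874.  you can email adsf@gmail.com")
print(cleaned)
```

Any such step would need to be applied identically at training and inference time. Because it changes whitespace, character offsets in existing annotations would have to be realigned after cleaning.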