Potential use for vectorizing skill labels

#3
by Yansu - opened

Hello,
I am a student looking to build a semantic search engine over parsed resume data. Each data point (i.e., a resume) comes with a list of skill labels that I would like to map into a dense vector space. Would this model be suitable for generating embeddings that capture the semantic similarity between skills, irrespective of context?
Thank you for the work that you do.

Owner

Hi @Yansu, thanks for your interest!

I believe the real strength of dense embeddings lies in using the context surrounding the skills, since some skills can mean different things without it, e.g., Python programming versus handling a python (i.e., the snake).

As a first step, you could detect the skills with this model and then build sparse vectors from them (e.g., counts or other statistics).
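A minimal sketch of the sparse-vector idea, assuming the skills have already been detected and are available as plain strings per resume (the detection step itself is not shown). Each resume becomes a bag-of-skills count vector, and resumes are ranked against a query by cosine similarity. The names `skill_count_vector`, `cosine`, and `rank_resumes` are hypothetical helpers for illustration, not part of this model or any library.

```python
import math
from collections import Counter
from typing import Dict, List


def skill_count_vector(detected_skills: List[str]) -> Counter:
    # Light normalization so "Python" and " python" count as the same skill.
    return Counter(s.strip().lower() for s in detected_skills)


def cosine(a: Counter, b: Counter) -> float:
    # Only skills present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_resumes(query_skills: List[str], resumes: Dict[str, List[str]]) -> List[str]:
    # Return resume IDs sorted from most to least similar to the query skills.
    q = skill_count_vector(query_skills)
    scores = {rid: cosine(q, skill_count_vector(skills)) for rid, skills in resumes.items()}
    return sorted(scores, key=scores.get, reverse=True)


resumes = {"r1": ["Python", "SQL", "python"], "r2": ["Docker", "Kubernetes"]}
print(rank_resumes(["python"], resumes))  # ['r1', 'r2']
```

Count vectors only match identical skill strings, so related but differently named skills (e.g., "Django" vs. "Flask") score zero against each other; that gap is what the dense approaches are meant to close.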

Alternatively, if your resume dataset is large enough, you could train your own skip-gram/word2vec models.
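A sketch of how resume skill lists could be turned into skip-gram training data, assuming each resume's skill list is treated as one "sentence". It only builds the (center, context) pairs that a skip-gram model learns from; the training itself would normally be done by a library such as gensim (`gensim.models.Word2Vec` with `sg=1`, passing the skill lists as `sentences`). The function name `skipgram_pairs` is hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(skill_lists: List[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) skip-gram pairs, one skill list per resume."""
    pairs = []
    for skills in skill_lists:
        for i, center in enumerate(skills):
            # Context = every skill within `window` positions of the center skill.
            lo = max(0, i - window)
            hi = min(len(skills), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, skills[j]))
    return pairs


print(skipgram_pairs([["python", "sql", "docker"]], window=1))
# [('python', 'sql'), ('sql', 'python'), ('sql', 'docker'), ('docker', 'sql')]
```

Because a resume's skill list usually has no meaningful order, one reasonable choice is a window at least as large as the longest list, so that every co-listed skill counts as context for every other.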

Lastly, you could use this model to embed the full context in which a skill appears and then extract only the embeddings of the skill's subword tokens.
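A sketch of the contextual approach using the Hugging Face `transformers` library. Assumptions: the model loads through `AutoModel`/`AutoTokenizer` with a *fast* tokenizer (needed for `return_offsets_mapping`), the character span of the skill in the text is already known (e.g., from the detection step), and the skill vector is taken as the mean of its subword vectors. Mean pooling is one common choice, not necessarily what the model authors use. Both helper names are hypothetical, and `model_name` stands in for this model's Hub ID.

```python
from typing import List, Sequence


def skill_token_indices(offsets: Sequence[Sequence[int]], start: int, end: int) -> List[int]:
    """Return indices of tokens whose character span overlaps [start, end)."""
    indices = []
    for i, (tok_start, tok_end) in enumerate(offsets):
        # Special tokens such as [CLS]/[SEP] get the empty span (0, 0); skip them.
        if tok_end <= tok_start:
            continue
        if tok_start < end and tok_end > start:
            indices.append(i)
    return indices


def embed_skill_in_context(model_name: str, text: str, start: int, end: int):
    """Mean-pool the contextual vectors of the subword tokens covering text[start:end]."""
    # Imported lazily so skill_token_indices works without these dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)  # must be a fast tokenizer
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    enc = tokenizer(text, return_offsets_mapping=True, return_tensors="pt")
    offsets = enc.pop("offset_mapping")[0].tolist()  # the model's forward() does not accept offsets
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # shape: (seq_len, hidden_size)

    idx = skill_token_indices(offsets, start, end)
    if not idx:
        raise ValueError("no tokens overlap the given skill span")
    return hidden[idx].mean(dim=0)  # shape: (hidden_size,)
```

For example, in "Skilled in Python programming" the skill "Python" spans characters 11 to 17, so `embed_skill_in_context(model_name, "Skilled in Python programming", 11, 17)` would return one vector for "Python" as used in that sentence. The same word in a sentence about snakes would yield a different vector.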

Hope this gives you some ideas!

jjzha changed discussion status to closed
