LSTM and CRF models for entity extraction from job descriptions

Dijkman, Bjorn

Named Entity Recognition (NER) is a sub-task of information extraction in which named entities in unstructured text are identified and classified, typically into categories such as the names of people, organizations, locations and quantities. In this research, NER is applied to data science job descriptions collected from indeed.com, with the goal of extracting the programming skills an applicant needs, the spoken languages required, the amount of experience asked for and whether any educational background is preferred. Several Long Short-Term Memory (LSTM) methods and a Conditional Random Field (CRF) are compared. Although LSTM models have theoretical advantages over CRF models due to their ability to capture long-term dependencies within a sentence, the CRF model obtains the highest overall performance, with an F1-score of 0.86. The high F1-score of the CRF can partly be attributed to its ability to classify multi-token chunks well, that is, entities that consist of more than one word. The methods are compared on different subsets of the data, and this research shows that LSTM-based methods need more data to perform well.
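
To illustrate why multi-token chunks matter for the F1-score, the following is a minimal sketch of chunk-level scoring. It assumes a BIO tagging scheme and a hypothetical SKILL label; the abstract does not state which tagging scheme, label set or evaluation script the thesis uses, and the example sentence is invented for illustration. A chunk counts as correct only if its label and both boundaries match exactly.

```python
from typing import List, Set, Tuple

# (label, start index, end index exclusive); hypothetical representation
Span = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    chunks: Set[Span] = set()
    label, start = None, 0
    # A sentinel "O" closes a chunk that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            chunks.add((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return chunks


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Chunk-level F1: a chunk is correct only on an exact span and label match."""
    gold_chunks, pred_chunks = extract_chunks(gold), extract_chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_chunks)
    recall = correct / len(gold_chunks)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence; "machine learning" is a two-token chunk.
tokens = ["Experience", "with", "machine", "learning", "and", "Python", "required"]
gold = ["O", "O", "B-SKILL", "I-SKILL", "O", "B-SKILL", "O"]
# The prediction misses the second token of "machine learning".
pred = ["O", "O", "B-SKILL", "O", "O", "B-SKILL", "O"]

print(chunk_f1(gold, pred))
```

In this sketch the prediction gets six of seven tokens right, yet only one of the two gold chunks is matched exactly, so a model that handles chunk boundaries well gains more at the chunk level than token-level accuracy would suggest.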

Additional Metadata
Keywords	entity extraction, LSTM model, CRF model, Named Entity Recognition (NER)
Thesis Advisor	Groenen, P.J.F.
Persistent URL	hdl.handle.net/2105/51126
Series	Business Economics
Organisation	Erasmus School of Economics
Citation	Dijkman, B.N. van (Bjorn). (2019, August 22). LSTM and CRF models for entity extraction from job descriptions. Business Economics. Retrieved from http://hdl.handle.net/2105/51126
