A common problem in geocoding is that the description of a location entered by a user differs from the description of that location in the database, due to misspellings, alternative spellings, abbreviations, different word orders, or added or omitted words. This thesis develops an algorithm to determine which location in the database is the intended one. It is motivated by a request from CQM to improve the algorithm currently used in their route planner. The thesis consists of two parts. In the first part, we compare several similarity measures on how well they match town and street names. Tests on four datasets showed that the trigram-based Jaccard and Dice similarity measures perform best for both town and street names. This part also proposes a zip code matching algorithm that determines the best zip code in the database, given an erroneous input zip code. In the second part, we propose a matching algorithm that finds the targeted location in the database, given an input location. The algorithm uses the findings from the first part and focuses on strings in the database that exactly match an input string. An important assumption is that input strings of different types are supplied in separate fields and that the country is given as a country code. The scope of the algorithm is Europe. Comparisons with CQM's current matching algorithm on three datasets showed a significant improvement. The percentage of correctly returned locations is above 97% for all three sets, yielding improvements of 16.76%, 14.86% and 4.52%.
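
To make the two trigram measures named above concrete, the following is a minimal Python sketch and not the implementation used in the thesis. The lower-casing, the padding scheme (two leading spaces, one trailing space), and the function names `trigrams`, `jaccard`, `dice` and `best_match` are assumptions made purely for illustration. For trigram sets A and B, the Jaccard similarity is |A ∩ B| / |A ∪ B| and the Dice similarity is 2|A ∩ B| / (|A| + |B|).

```python
def trigrams(text):
    """Return the set of character trigrams of a string.

    Assumption for illustration: the string is lower-cased and padded
    with two leading spaces and one trailing space, so that even very
    short names yield at least one trigram.
    """
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a, b):
    """Trigram Jaccard similarity: |A & B| / |A | B|."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def dice(a, b):
    """Trigram Dice similarity: 2 * |A & B| / (|A| + |B|)."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def best_match(query, candidates, measure=jaccard):
    """Return the candidate name most similar to the query (hypothetical helper)."""
    return max(candidates, key=lambda name: measure(query, name))


if __name__ == "__main__":
    towns = ["Amsterdam", "Rotterdam", "Utrecht"]
    print(best_match("Roterdam", towns))
    print(jaccard("Roterdam", "Rotterdam"), dice("Roterdam", "Rotterdam"))
```

Both measures score identical strings as 1 and strings without shared trigrams as 0. Dice is never lower than Jaccard for the same pair, so the two rank candidates for a given query in the same order.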

Milea, D.V.
hdl.handle.net/2105/14891
Econometrie
Erasmus School of Economics

Ranzijn, B.A. (2013, October 30). A Geocoding Algorithm Based On A Comparative Study Of Address Matching Techniques. Econometrie. Retrieved from http://hdl.handle.net/2105/14891