With the increasing importance of Customer Due Diligence, financial institutions are forced to digitize and categorize their paper archives. Currently, document classification is primarily realized using textual information. This paper proposes to complement textual features with visual features, using a convolutional neural network and transfer learning. The proposed approach is tested on both a small real-world data set from a large Dutch bank and the larger academic RVL-CDIP data set. It is found that the combined approach yields better classification performance than using only textual or only visual features. For the RVL-CDIP data set, the proposed method achieves state-of-the-art accuracy of 93.51%, exceeding previous results based solely on visual features. For the smaller real-world data set, the combined method scores marginally better than the benchmark set using only textual features, while being computationally much more expensive. Therefore, it is concluded that adding visual features using deep learning is a favorable approach to increase document classification performance, provided that the data and computational resources are available.
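As an illustration of the feature combination described above, the following is a minimal sketch and not the thesis implementation. It assumes that the textual and visual features are combined by concatenating L2-normalized vectors, which the abstract does not specify. The function names `tfidf_vectors`, `l2_normalize` and `combine_features` are hypothetical, and the visual vectors are hand-written stand-ins for the embeddings a pretrained convolutional neural network would produce.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) weighting."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_features(text_vec, visual_vec):
    """Assumed combination: concatenate the two normalized feature vectors."""
    return l2_normalize(text_vec) + l2_normalize(visual_vec)


# Toy documents (illustrative only, not taken from either data set).
docs = [
    "invoice total amount due",
    "passport identity document",
    "invoice amount paid",
]
# Hypothetical stand-ins for CNN embeddings of the document images.
visual = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3], [0.7, 0.2, 0.1]]

vocab, text_vecs = tfidf_vectors(docs)
combined = [combine_features(t, v) for t, v in zip(text_vecs, visual)]
```

Each combined vector can then be passed to any standard classifier. Normalizing both halves before concatenation keeps one modality from dominating purely because of its scale; other combination schemes are equally plausible.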

Additional Metadata
Keywords: text classification, document image classification, feature combination, tf-idf, convolutional neural network
Thesis Advisor: Birbil, S.I.
Persistent URL: hdl.handle.net/2105/50604
Series: Econometrie
Citation
Zijlstra, A.W. (2019, October 24). Classifying Documents using both Textual and Visual Features. Econometrie. Retrieved from http://hdl.handle.net/2105/50604