Knowledge-Enhanced Text Classification: Descriptive Modelling and New Approaches

Martinez-Alvarez, Miguel

dc.contributor.author	Martinez-Alvarez, Miguel
dc.date.accessioned	2017-10-09T13:21:10Z
dc.date.available	2017-10-09T13:21:10Z
dc.date.issued	2014-07-09
dc.date.submitted	2017-10-09T11:20:45.768Z
dc.identifier.citation	Martinez-Alvarez, M. 2014. Knowledge-Enhanced Text Classification: Descriptive Modelling and New Approaches. Queen Mary University of London	en_US
dc.identifier.uri	http://qmro.qmul.ac.uk/xmlui/handle/123456789/27205
dc.description	PhD	en_US
dc.description.abstract	The knowledge available to be exploited by text classification and information retrieval systems has changed significantly, both in nature and quantity, in recent years. There are now several sources of information that can potentially improve the classification process, and systems should be able to adapt to incorporate multiple sources of available data in different formats. This is especially important in environments where the required information changes rapidly, and its utility may be contingent on timely implementation. For these reasons, the importance of adaptability and flexibility in information systems is growing rapidly. Current systems are usually developed for specific scenarios. As a result, significant engineering effort is needed to adapt them when new knowledge appears or the information needs change. This research investigates the usage of knowledge within text classification from two different perspectives. On the one hand, it applies descriptive approaches to the seamless modelling of text classification, focusing on knowledge integration and complex data representation. The main goal is to achieve a scalable and efficient approach for rapid prototyping of text classification that can incorporate different sources and types of knowledge, and to minimise the gap between the mathematical definition and the modelling of a solution. On the other hand, it improves steps of the classification process where knowledge exploitation has traditionally not been applied. In particular, this thesis introduces two classification sub-tasks, namely Semi-Automatic Text Classification (SATC) and Document Performance Prediction (DPP), and several methods to address them. SATC focuses on selecting, for manual classification, the documents that the system is more likely to assign wrongly, while automatically labelling the rest. DPP estimates the classification quality that will be achieved for a document, given a classifier. In addition, we propose a family of evaluation metrics to measure degrees of misclassification, and an adaptive variation of k-NN.	en_US
dc.language.iso	en	en_US
dc.publisher	Queen Mary University of London	en_US
dc.rights	The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without the prior written consent of the author
dc.subject	Electronic Engineering and Computer Science	en_US
dc.subject	Information retrieval	en_US
dc.subject	text classification	en_US
dc.subject	Semi-Automatic Text Classification	en_US
dc.subject	Document Performance Prediction	en_US
dc.title	Knowledge-Enhanced Text Classification: Descriptive Modelling and New Approaches	en_US
dc.type	Thesis	en_US

Files in this item

Name: Martinez_Alvarez_M_PhD_final_0 ...
Size: 4.680Mb
Format: application/

This item appears in the following Collection(s)

Theses [4235]
Theses Awarded by Queen Mary University of London
