Knowledge-Enhanced Text Classification: Descriptive Modelling and New Approaches
Abstract
The knowledge available to text classification and information retrieval systems
has changed significantly in recent years, in both nature and quantity. Several
sources of information can now potentially improve the classification process, and systems
should be able to adapt in order to incorporate multiple sources of available data in different formats.
This is especially important in environments where the required information changes rapidly
and its utility may be contingent on timely implementation. For these reasons, adaptability
and flexibility are becoming increasingly important in information systems. Current systems are
usually developed for specific scenarios. As a result, significant engineering effort is needed to
adapt them when new knowledge appears or when information needs change.
This research investigates the use of knowledge within text classification from two different
perspectives. On the one hand, it applies descriptive approaches to the seamless modelling
of text classification, focusing on knowledge integration and complex data representation. The
main goal is a scalable and efficient approach to rapid prototyping for text classification
that can incorporate different sources and types of knowledge, and that minimises the gap
between the mathematical definition and the modelling of a solution.
On the other hand, this research seeks to improve steps of the classification process where knowledge
exploitation has traditionally not been applied. In particular, this thesis introduces two
classification sub-tasks, namely Semi-Automatic Text Classification (SATC) and Document Performance
Prediction (DPP), and several methods to address them. SATC focuses on selecting
the documents that the system is most likely to misclassify, so that they can be classified manually,
while automatically labelling the rest. DPP estimates the
classification quality that will be achieved for a document, given a classifier. In addition, we also
propose a family of evaluation metrics that measure degrees of misclassification, and an adaptive
variation of k-NN.
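The SATC split described above can be illustrated with a minimal sketch. It is not one of the methods proposed in the thesis: it assumes that each document already has per-class scores from some classifier, and it uses the margin between the two highest scores as a stand-in for the likelihood of misclassification. The function name satc_split, the manual_budget parameter and the score format are all hypothetical.

```python
from typing import Dict, List, Tuple


def satc_split(
    scores: Dict[str, Dict[str, float]], manual_budget: int
) -> Tuple[List[str], Dict[str, str]]:
    """Split documents into a manual-review queue and automatic labels.

    ASSUMPTION: a small margin between the two best class scores is used
    as a proxy for "likely to be misclassified". This is an illustrative
    heuristic, not the method from the thesis.
    """

    def margin(class_scores: Dict[str, float]) -> float:
        ranked = sorted(class_scores.values(), reverse=True)
        # With a single class there is no runner-up to compare against.
        return ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]

    # Least certain documents (smallest margin) come first.
    by_uncertainty = sorted(scores, key=lambda doc_id: margin(scores[doc_id]))

    # The first manual_budget documents go to a human annotator.
    manual = by_uncertainty[:manual_budget]

    # Every other document receives its highest-scoring class automatically.
    automatic = {
        doc_id: max(scores[doc_id], key=scores[doc_id].get)
        for doc_id in by_uncertainty[manual_budget:]
    }
    return manual, automatic
```

Choosing manual_budget trades annotation effort against the expected quality of the automatically labelled remainder, which is the trade-off SATC is concerned with.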
Authors
Martinez-Alvarez, Miguel