POLIS: a probabilistic summarisation logic for structured documents

Forst, Jan Frederik

dc.contributor.author	Forst, Jan Frederik
dc.date.accessioned	2011-02-07T15:40:07Z
dc.date.available	2011-02-07T15:40:07Z
dc.date.issued	2009
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/467
dc.description	PhD	en_US
dc.description.abstract	As the availability of structured documents, formatted in markup languages such as SGML, RDF, or XML, increases, retrieval systems increasingly focus on the retrieval of document-elements, rather than entire documents. Additionally, abstraction layers in the form of formalised retrieval logics have allowed developers to include search facilities in numerous applications, without needing detailed knowledge of retrieval models. Although automatic document summarisation has been recognised as a useful tool for reducing the workload of information system users, very few such abstraction layers have been developed for the task of automatic document summarisation. This thesis describes the development of an abstraction logic for summarisation, called POLIS, which provides users (such as developers or knowledge engineers) with high-level access to summarisation facilities. Furthermore, POLIS allows users to exploit the hierarchical information provided by structured documents. The development of POLIS is carried out in a step-by-step way. We start by defining a series of probabilistic summarisation models, which provide weights to document-elements at a user-selected level. These summarisation models are those accessible through POLIS. The formal definition of POLIS is performed in three steps. We start by providing a syntax for POLIS, through which users/knowledge engineers interact with the logic. This is followed by a definition of the logic's semantics. Finally, we provide details of an implementation of POLIS. The final chapters of this dissertation are concerned with the evaluation of POLIS, which is conducted in two stages. Firstly, we evaluate the performance of the summarisation models by applying POLIS to two test collections, the DUC AQUAINT corpus, and the INEX IEEE corpus. This is followed by application scenarios for POLIS, in which we discuss how POLIS can be used in specific IR tasks.	en_US
dc.language.iso	en	en_US
dc.subject	Computer Science	en_US
dc.title	POLIS: a probabilistic summarisation logic for structured documents	en_US
dc.type	Thesis	en_US
dc.rights.holder	The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without the prior written consent of the author

Files in this item

Name: FORSTPOLIS2009.pdf
Size: 1.687Mb
Format: application/

This item appears in the following Collection(s)

Theses [4125]
Theses Awarded by Queen Mary University of London