• Login
    JavaScript is disabled for your browser. Some features of this site may not work without it.
    Techniques for improving efficiency and scalability for the integration of information retrieval and databases 
    •   QMRO Home
    • Queen Mary University of London Theses
    • Theses
    • Techniques for improving efficiency and scalability for the integration of information retrieval and databases
    •   QMRO Home
    • Queen Mary University of London Theses
    • Theses
    • Techniques for improving efficiency and scalability for the integration of information retrieval and databases
    ‌
    ‌

    Browse

    All of QMROCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects
    ‌
    ‌

    Administrators only

    Login
    ‌
    ‌

    Statistics

    Most Popular ItemsStatistics by CountryMost Popular Authors

    Techniques for improving efficiency and scalability for the integration of information retrieval and databases

    View/Open
    WUTechniquesFor2010.pdf (2.278Mb)
    Publisher
    Queen Mary University of London
    Metadata
    Show full item record
    Abstract
    This thesis is on the topic of integration of Information Retrieval (IR) and Databases (DB), with particular focuses on improving efficiency and scalability of integrated IR and DB technology (IR+DB). The main purpose of this study is to develop efficient and scalable techniques for supporting integrated IR and DB technology, which is a popular approach today for handling complex queries over text and structured data. Our specific interest in this thesis is how to efficiently handle queries over large-scale text and structured data. The work is based on a technology that integrates probability theory and relational algebra, where retrievals for text and data are to be expressed in probabilistic logical programs such as probabilistic relational algebra or probabilistic Datalog. To support efficient processing of probabilistic logical programs, we proposed three optimization techniques that focus on aspects covered logical and physical layers, which include: scoring-driven query optimization using scoring expression, query processing with top-k incorporated pipeline, and indexing with relational inverted index. Specifically, scoring expressions are proposed for expressing the scoring or probabilistic semantics of implied scoring functions of PRA expressions, so that efficient query execution plan can be generated by rule-based scoring-driven optimizer. Secondly, to balance efficiency and effectiveness so that to improve query response time, we studied methods for incorporating topk algorithms into pipelined query execution engine for IR+DB systems. Thirdly, the proposed relational inverted index integrates IR-style inverted index and DB-style tuple-based index, which can be used to support efficient probability estimation and aggregation as well as conventional relational operations. Experiments were carried out to investigate the performances of proposed techniques. Experimental results showed that the efficiency and scalability of an IR+DB prototype have been improved, while the system can handle queries efficiently on considerable large data sets for a number of IR tasks.
    Authors
    Wu, Hengzhi
    URI
    https://qmro.qmul.ac.uk/xmlui/handle/123456789/374
    Collections
    • Theses [3321]
    Copyright statements
    The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without the prior written consent of the author
    Twitter iconFollow QMUL on Twitter
    Twitter iconFollow QM Research
    Online on twitter
    Facebook iconLike us on Facebook
    • Site Map
    • Privacy and cookies
    • Disclaimer
    • Accessibility
    • Contacts
    • Intranet
    • Current students

    Modern Slavery Statement

    Queen Mary University of London
    Mile End Road
    London E1 4NS
    Tel: +44 (0)20 7882 5555

    © Queen Mary University of London.