dc.contributor.author: Markatopoulou, Foteini
dc.date.accessioned: 2018-09-18T15:42:37Z
dc.date.available: 2018-09-18T15:42:37Z
dc.date.issued: 2018-09-06
dc.date.submitted: 2018-09-18T15:23:36.743Z
dc.identifier.citation: Markatopoulou, F. 2018. Machine Learning Architectures for Video Annotation and Retrieval. Queen Mary University of London
dc.identifier.uri: http://qmro.qmul.ac.uk/xmlui/handle/123456789/44693
dc.description: PhD
dc.description.abstract: In this thesis we design machine learning methodologies for video annotation and retrieval using either pre-defined semantic concepts or ad-hoc queries. Concept-based video annotation refers to the annotation of video fragments with one or more semantic concepts (e.g. hand, sky, running) chosen from a predefined concept list. Ad-hoc queries are textual descriptions that may contain objects, activities, locations etc., and combinations thereof. Our contributions are: i) A thorough analysis of extending and using different local descriptors towards improved concept-based video annotation, together with a stacking architecture whose first layer uses concept classifiers trained on local descriptors and whose last layer improves their prediction accuracy by implicitly capturing concept relations. ii) A cascade architecture that orders and combines many classifiers, trained on different visual descriptors, for the same concept. iii) A deep learning architecture that exploits concept relations at two levels. At the first level, we build on ideas from multi-task learning and propose an approach that learns concept-specific representations as sparse linear combinations of the representations of latent concepts. At the second level, we build on ideas from structured output learning and introduce, at training time, a new cost term that explicitly models the correlations between concepts; by doing so, we explicitly model the structure of the output space, i.e., the concept labels (see the illustrative sketch below). iv) A fully automatic ad-hoc video search architecture that combines concept-based video annotation with textual query analysis and transforms concept-based keyframe and query representations into a common semantic embedding space. Our architectures have been extensively evaluated on the TRECVID SIN 2013, TRECVID AVS 2016, and other large-scale datasets, demonstrating their effectiveness in comparison with similar approaches.
dc.language.iso: en
dc.publisher: Queen Mary University of London
dc.rights: The copyright of this thesis rests with the author, and no quotation from it or information derived from it may be published without the prior written consent of the author.
dc.subject: Electronic Engineering and Computer Science
dc.subject: Machine Learning Architectures
dc.subject: video annotation
dc.title: Machine Learning Architectures for Video Annotation and Retrieval
dc.type: Thesis
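
The two-level formulation in contribution iii) of the abstract can be sketched as follows. This is an illustrative reconstruction, not the thesis's actual equations: the symbols h_k (latent-concept representations), a_{ck} (sparse mixing weights), f_c (the classifier for concept c), S_{cc'} (an empirical concept-correlation weight), and the trade-off parameters lambda_1, lambda_2 are all assumed notation.

% Illustrative sketch only -- all notation assumed, not taken from the thesis.
% Level 1 (multi-task learning): each concept-specific representation z_c is a
% sparse linear combination of K latent-concept representations h_k(x).
\[
  \mathbf{z}_c \;=\; \sum_{k=1}^{K} a_{ck}\,\mathbf{h}_k(\mathbf{x}),
  \qquad
  \Omega_{\mathrm{sparse}} \;=\; \lambda_1 \sum_{c=1}^{C} \lVert \mathbf{a}_c \rVert_1 .
\]
% Level 2 (structured output learning): a cost term penalises dissimilar scores
% for correlated concepts, so the structure of the concept labels is modelled
% explicitly at training time.
\[
  \mathcal{L} \;=\; \sum_{c=1}^{C} \ell\bigl(y_c,\, f_c(\mathbf{z}_c)\bigr)
  \;+\; \lambda_2 \sum_{c,c'=1}^{C} S_{cc'}\,\bigl(f_c(\mathbf{z}_c) - f_{c'}(\mathbf{z}_{c'})\bigr)^2
  \;+\; \Omega_{\mathrm{sparse}} .
\]

Under this reading, minimising L jointly fits the per-concept classifiers while the S_{cc'}-weighted term pulls the predictions of frequently co-occurring concepts towards each other.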


This item appears in the following Collection(s)

  • Theses [4223]
    Theses Awarded by Queen Mary University of London
