Show simple item record

dc.contributor.author: Foteinopoulou, N
dc.date.accessioned: 2024-08-09T09:39:53Z
dc.date.available: 2024-08-09T09:39:53Z
dc.identifier.uri: https://qmro.qmul.ac.uk/xmlui/handle/123456789/98711
dc.description.abstract: Human communication encompasses a rich mixture of verbal and non-verbal cues that together convey emotion and mental state. While humans recognise such cues instinctively, explaining and defining them in natural language is challenging, which makes tasks in affective computing more complex than other discriminative tasks, e.g. object detection, because the ground truths are ambiguous in classic supervised settings. The primary focus of this thesis is to develop automated methods that predict human emotion and negative symptoms of schizophrenia, primarily from non-verbal facial cues. Given the substantial methodological and contextual overlap between predicting human emotion and assessing negative symptoms of schizophrenia, which are closely linked to affect and emotion, we adopt a concurrent approach that focuses on three key challenges tied to the nature of ground-truth labels. Specifically, we address: a) label uncertainty in human affect, which stems from the inherently noisy nature of human emotion and shows up in practice as annotator disagreement; b) labels that describe a broader behaviour, resulting in low label resolution; and c) the vast variability of human affect, which produces subjective emotional descriptions when expressed in natural language.

Firstly, we propose a method that addresses label uncertainty in continuous affect. We model each ground-truth label as a univariate Gaussian distribution whose mean equals the annotated value and whose variance is unknown and predicted by the network. A Kullback-Leibler-based loss minimises the distance between the Gaussian ground truth and the Dirac delta prediction. We show that the proposed loss improves convergence and that the estimated variance correlates with noisy samples.

Secondly, we propose a deep learning approach for continuous affect and symptom estimation in long video samples that learns from both the clip and the batch context. In contrast to previous works, which either used statistical representations or trimmed videos into shorter clips, we use features from the wider video when making a clip-level prediction. We also introduce a novel auxiliary loss, the relational regression loss, which aligns the distances between continuous label vectors in the mini-batch with those between the corresponding latent features. Ablation studies show that both components offer significant performance improvements on both tasks.

Finally, we develop a novel vision-language model that uses sample-level text descriptions as natural language supervision to learn semantically rich representations for each sample, addressing the intra-class variability of emotional expression. During inference, we use category-level descriptions for each emotion in a zero-shot approach rather than the class prototypes previously used in zero-shot Facial Expression Recognition. We also use the vision modality as a backbone for the downstream task of schizophrenia symptom estimation. The method improves significantly over baseline methods and outperforms previous works on both tasks, demonstrating the benefit of more fine-grained approaches. (en_US)
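The first contribution, as the abstract describes it, treats each ground-truth label as a Gaussian with the annotated mean and a network-predicted variance. A minimal sketch of how such an objective might look, assuming the KL-style formulation reduces (up to constants) to a Gaussian negative log-likelihood on the residual; `gaussian_nll_loss` and the `log_var` parameterisation are illustrative names, not the thesis's actual implementation:

```python
import math

def gaussian_nll_loss(y_pred, y_true, log_var):
    """Per-sample loss: squared error scaled by a predicted variance.

    Modelling the ground truth as a Gaussian with mean y_true and the
    network-predicted variance exp(log_var), the objective lets noisy
    samples reduce their residual weight by predicting a larger
    variance, at the cost of the log-variance penalty term.
    """
    var = math.exp(log_var)
    return (y_pred - y_true) ** 2 / (2.0 * var) + 0.5 * log_var
```

With `log_var = 0` (unit variance) this is plain half squared error; for a large residual, predicting a higher variance lowers the loss, which is one way the estimated variance can come to correlate with noisy samples.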
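The relational regression loss described for the second contribution aligns distances between continuous label vectors in the mini-batch with distances between the corresponding latent features. A sketch of one plausible form, assuming Euclidean distances and max-normalisation to make the two scales comparable (the thesis may use a different distance or normalisation):

```python
import numpy as np

def relational_regression_loss(features, labels):
    # Pairwise Euclidean distance matrices over the mini-batch:
    # one in latent-feature space, one in label space.
    f_d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    l_d = np.linalg.norm(labels[:, None, :] - labels[None, :, :], axis=-1)
    # Normalise each matrix to [0, 1] so the two scales are comparable
    # (a simplifying assumption for this sketch).
    f_d = f_d / (f_d.max() + 1e-8)
    l_d = l_d / (l_d.max() + 1e-8)
    # Penalise mismatch between the two relational structures.
    return float(np.mean((f_d - l_d) ** 2))
```

When the latent geometry already mirrors the label geometry (e.g. features that are a scaled copy of the labels), the loss is near zero; it grows as the two relational structures diverge.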
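For the third contribution, inference uses category-level descriptions for each emotion in a zero-shot manner. A minimal sketch of this kind of matching, assuming CLIP-style cosine similarity between an image embedding and one embedded text description per emotion category; the function name and embedding shapes are illustrative:

```python
import numpy as np

def zero_shot_predict(image_emb, class_text_embs):
    """Pick the emotion category whose embedded text description is
    most cosine-similar to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    # Row i holds the similarity to category i's description.
    return int(np.argmax(txt @ img))
```

Replacing the usual class-name prototypes with richer category-level descriptions is what the abstract credits for handling intra-class variability of emotional expression.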
dc.language.iso: en (en_US)
dc.publisher: Queen Mary University of London (en_US)
dc.title: Deep Learning Methods for Affect and Schizophrenia Symptom Estimation (en_US)
dc.type: Thesis (en_US)
pubs.notes: Not known (en_US)
rioxxterms.funder: Default funder (en_US)
rioxxterms.identifier.project: Default project (en_US)
qmul.funder: EPSRC DTP Studentship::Engineering and Physical Sciences Research Council (en_US)



This item appears in the following Collection(s)

  • Theses [4248]
    Theses Awarded by Queen Mary University of London
