Show simple item record

dc.contributor.author  Luo, J
dc.contributor.author  Phan, H
dc.contributor.author  Reiss, J
dc.date.accessioned  2024-07-11T10:38:45Z
dc.date.available  2024-07-11T10:38:45Z
dc.date.issued  2023-01-01
dc.identifier.issn  1520-6149
dc.identifier.uri  https://qmro.qmul.ac.uk/xmlui/handle/123456789/97996
dc.description.abstract  Multimodal emotion recognition (MER) is a fundamentally complex research problem due to the uncertainty of human emotional expression and the heterogeneity gap between different modalities. Audio and text modalities are particularly important for a human participant in understanding emotions. Although many successful attempts have been made to design multimodal representations for MER, multiple challenges remain to be addressed: 1) bridging the heterogeneity gap between multimodal features and modeling inter- and intra-modal interactions across multiple modalities; 2) effectively and efficiently modeling the contextual dynamics in the conversation sequence. In this paper, we propose the Cross-Modal RoBERTa (CM-RoBERTa) model for emotion detection from spoken audio and the corresponding transcripts. As the core unit of CM-RoBERTa, parallel self- and cross-attention is designed to dynamically capture inter- and intra-modal interactions between audio and text. Specifically, mid-level fusion and a residual module are employed to model long-term contextual dependencies and learn modality-specific patterns. We evaluate the approach on the MELD dataset, and the experimental results show that the proposed approach achieves state-of-the-art performance on the dataset.  en_US
dc.title  Cross-Modal Fusion Techniques for Utterance-Level Emotion Recognition from Text and Speech  en_US
dc.type  Conference Proceeding  en_US
dc.identifier.doi  10.1109/ICASSP49357.2023.10096885
pubs.notes  Not known  en_US
pubs.publication-status  Published  en_US
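
The parallel self- and cross-attention fusion described in the abstract can be illustrated with a short sketch. This is a minimal, hedged example and not the authors' released CM-RoBERTa code: the module name ParallelSelfCrossAttention, the feature dimension of 768, the mean pooling, and the concatenation-plus-linear fusion step are all assumptions made for illustration.

# Minimal sketch (assumption, not the paper's implementation) of parallel
# self- and cross-attention fusion between text and audio utterance features.
import torch
import torch.nn as nn


class ParallelSelfCrossAttention(nn.Module):
    """Runs self-attention within each modality and cross-attention between
    modalities in parallel, then fuses the two streams with residuals."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.text_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_text = nn.LayerNorm(dim)
        self.norm_audio = nn.LayerNorm(dim)
        self.fuse = nn.Linear(2 * dim, dim)  # mid-level fusion of both streams

    def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # text, audio: (batch, seq_len, dim) utterance-level feature sequences
        t_self, _ = self.text_self(text, text, text)      # intra-modal (text)
        a_self, _ = self.audio_self(audio, audio, audio)  # intra-modal (audio)
        t_cross, _ = self.text_cross(text, audio, audio)  # text attends to audio
        a_cross, _ = self.audio_cross(audio, text, text)  # audio attends to text

        # Residual connections keep modality-specific patterns from the inputs.
        t = self.norm_text(text + t_self + t_cross)
        a = self.norm_audio(audio + a_self + a_cross)

        # Mid-level fusion: pool each stream and project the concatenation.
        fused = torch.cat([t.mean(dim=1), a.mean(dim=1)], dim=-1)
        return self.fuse(fused)  # (batch, dim) joint utterance representation


if __name__ == "__main__":
    model = ParallelSelfCrossAttention()
    text_feats = torch.randn(4, 20, 768)   # e.g. RoBERTa token features
    audio_feats = torch.randn(4, 50, 768)  # e.g. projected acoustic frames
    print(model(text_feats, audio_feats).shape)  # torch.Size([4, 768])

In this sketch each modality attends to itself (intra-modal) and to the other modality (inter-modal) in parallel, residual connections preserve modality-specific patterns, and the two streams are combined by a simple mid-level fusion layer; the published model may differ in any of these details.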

