Show simple item record

dc.contributor.author: Lakhal, M [en_US]
dc.date.accessioned: 2022-11-02T15:25:49Z
dc.date.issued: 2022
dc.identifier.uri: https://qmro.qmul.ac.uk/xmlui/handle/123456789/82213
dc.description.abstract: This thesis aims to reproduce the video of an indoor scene as seen from another, targeted view, using modalities such as depth and the body skeleton as guidance. Synthesizing a video that contains a moving person is challenging, however, because the camera placement in the scene causes scale differences and self-occlusion. The other key challenge is maintaining temporal consistency across the synthesized frames. Current state-of-the-art methods synthesize each frame separately, which can lose the motion information contained in the input view; temporal consistency must therefore be modeled explicitly to obtain smooth transitions between the synthesized frames. We take a neural network-based approach, using the body skeleton as a driving cue, visible-texture transfer to handle self-occlusion, and a recurrent neural network to maintain temporal consistency in the feature space. We propose a 2D-based synthesis network that explicitly disentangles the encoding of the input image from that of the target pose, which allows the network to learn better features and produce better image synthesis. We also propose a training strategy based on a pixel-wise loss function that improves high-frequency details and enhances the visual quality of the synthesized images. Moreover, we propose a novel masking scheme that accounts for the scale difference and for the spatial shift and deformation between the input and output skeletons. We then propose a new formulation of the 2D-based synthesis network that addresses the temporal consistency constraint on the synthesized multi-view frames; in particular, we extend recurrent neural networks to learn a spatiotemporal feature space that preserves the texture and approximates the targeted view. In addition, we propose a hybrid approach that combines direct transfer of the visible pixels from the input to the targeted view with a 3D-based synthesis network for refinement. Experimental results on standard image and multi-view video benchmarks show improvements over existing alternatives in visual quality and in the smoothness of the synthesized frames. [en_US]
dc.language.iso: en [en_US]
dc.title: Modality-Based Multi-View Indoor Video Synthesis [en_US]
pubs.notes: Not known [en_US]
rioxxterms.funder: Default funder [en_US]
rioxxterms.identifier.project: Default project [en_US]
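
The abstract above outlines a 2D synthesis network with disentangled encoders for the input image and the target skeleton, a recurrent network that maintains temporal consistency in the feature space, and training with a pixel-wise loss. As a rough illustration only, here is a minimal PyTorch sketch of those ideas; the architecture, layer sizes, and names (VideoSynthesisSketch, ConvGRUCell, the 18-channel pose heatmaps) are assumptions made for this example and are not taken from the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # Strided conv block: halves the spatial resolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ConvGRUCell(nn.Module):
    # Convolutional GRU: recurrence over feature maps rather than vectors,
    # so the hidden state is itself a spatiotemporal feature map.
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, kernel_size=3, padding=1)
        self.cand = nn.Conv2d(2 * ch, ch, kernel_size=3, padding=1)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_new

class VideoSynthesisSketch(nn.Module):
    def __init__(self, pose_channels=18):
        super().__init__()
        # Disentangled encoders: appearance from the input image,
        # geometry from the target skeleton (e.g. joint heatmaps).
        self.image_enc = nn.Sequential(conv_block(3, 64), conv_block(64, 128))
        self.pose_enc = nn.Sequential(conv_block(pose_channels, 64), conv_block(64, 128))
        self.temporal = ConvGRUCell(256)  # carries features across frames
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, image, pose_seq):
        # image: (B, 3, H, W) input view; pose_seq: (B, T, P, H, W) target skeletons.
        app = self.image_enc(image)
        h = app.new_zeros(app.size(0), 256, app.size(2), app.size(3))
        frames = []
        for t in range(pose_seq.size(1)):
            feat = torch.cat([app, self.pose_enc(pose_seq[:, t])], dim=1)
            h = self.temporal(feat, h)      # recurrent update in feature space
            frames.append(self.decoder(h))
        return torch.stack(frames, dim=1)   # (B, T, 3, H, W)

# Pixel-wise (L1) reconstruction loss, standing in for the training
# strategy mentioned in the abstract.
net = VideoSynthesisSketch()
out = net(torch.randn(1, 3, 256, 256), torch.randn(1, 4, 18, 256, 256))
loss = F.l1_loss(out, torch.randn(1, 4, 3, 256, 256))

Keeping the appearance and pose encoders separate lets the decoder reuse the same appearance features for every target pose, while the convolutional GRU carries texture information forward from frame to frame instead of synthesizing each frame independently.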


This item appears in the following Collection(s)

  • Theses [4235]
    Theses Awarded by Queen Mary University of London
