Modality-Based Multi-View Indoor Video Synthesis

Lakhal, M

dc.contributor.author	Lakhal, M	en_US
dc.date.accessioned	2022-11-02T15:25:49Z
dc.date.issued	2022
dc.identifier.uri	https://qmro.qmul.ac.uk/xmlui/handle/123456789/82213
dc.description.abstract	This thesis aims to reproduce the video of an indoor scene as seen from another, targeted, view using modalities such as depth and skeleton as guidance. Synthesizing a video that contains a moving person is challenging because the camera placement in the scene causes scale differences and self-occlusion. The other key challenge is maintaining temporal consistency across the synthesized frames. Current state-of-the-art methods synthesize each frame separately, which can lose the motion information contained in the input view. We therefore need to model temporal consistency to obtain smooth transitions between the synthesized frames. We consider a neural network-based approach that uses the body skeleton as a driving cue, visible texture transfer to handle self-occlusion, and a recurrent neural network to maintain temporal consistency in the feature space. We propose a 2D-based synthesis network that specifically disentangles the encoding of the input image from that of the target pose, which allows the network to learn better features and leads to better image synthesis. We also propose a training strategy based on a pixel-wise loss function that improves high-frequency details to enhance the visual quality of the synthesized images. Moreover, we propose a novel masking scheme to account for the scale difference and the spatial shift and deformation between the input and output skeletons. We then propose a new formulation of the 2D-based synthesis network to address the temporal consistency constraint on the synthesized multi-view frames. In particular, we extend recurrent neural networks to learn a spatiotemporal feature space that preserves the texture and approximates the targeted view. In addition, we propose a hybrid approach that combines a direct texture transfer of the visible pixels from the input to the targeted view with a 3D-based synthesis network for refinement.
Experimental results on standard image and multi-view video benchmarks show that the proposed methods improve on existing alternatives in terms of visual quality and the smoothness of the synthesized frames.	en_US
dc.language.iso	en	en_US
dc.title	Modality-Based Multi-View Indoor Video Synthesis	en_US
pubs.notes	Not known	en_US
rioxxterms.funder	Default funder	en_US
rioxxterms.identifier.project	Default project	en_US

Files in this item

Name: MIL_PhD_Thesis.pdf
Size: 42.25Mb
Format: application/
Description: PhD Thesis

This item appears in the following Collection(s)

Theses [4235]
Theses Awarded by Queen Mary University of London
