We propose a pipeline to realize multimodal 6-DoF immersive VR experiences. We use a carefully designed rig to (a) simultaneously capture multi-view video and audio. (b1) presents our reconstruction of the dynamic light field based on STG [2], while (b2) demonstrates the construction process of the sound field. We achieve better results than the original algorithm in long-term dynamic scenes by incorporating an affine color transformation and t-dimensional density control. Ultimately, we achieve a 6-DoF immersive experience in both the light and sound fields, and also benchmark recent representative methods such as 4DGS [3] and 4Drotor [4] to demonstrate the effectiveness of both our dataset and our baseline method.
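The text does not specify the exact form of the affine color transformation used to stabilize long-term dynamic scenes. A common variant is a per-frame (or per-camera) 3x4 affine map fitted by least squares, compensating for exposure and white-balance drift over a long capture. The function names below are illustrative, not from the paper; this is a minimal sketch under that assumption:

```python
import numpy as np

def fit_affine_color_transform(rendered, target):
    """Fit an affine RGB map (3x3 matrix plus offset) that best aligns
    rendered colors to target colors in the least-squares sense:
    target ~ rendered @ A.T + b. Both inputs are (N, 3) arrays."""
    # Append a column of ones so the offset b is solved jointly with A.
    X = np.hstack([rendered, np.ones((rendered.shape[0], 1))])  # (N, 4)
    # Solve X @ M ~ target for M, a (4, 3) matrix stacking A.T over b.
    M, *_ = np.linalg.lstsq(X, target, rcond=None)
    return M

def apply_affine_color_transform(colors, M):
    """Apply a fitted (4, 3) affine color map to (N, 3) RGB colors."""
    X = np.hstack([colors, np.ones((colors.shape[0], 1))])
    return X @ M
```

In a reconstruction loop, such a map would typically be refit per frame against the captured views, so slow appearance drift is absorbed by the transform rather than by the scene representation itself.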
A handheld monocular camera is easy to move, providing sparse perspectives from a wide variety of locations. In contrast, a fixed camera array, while stationary, offers dense perspectives within a limited range. We aim to combine the advantages of both in designing an effective data collection system and strategy for a fully immersive VR experience -- a feature not found in existing datasets.