Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera

NeurIPS 2022 (Spotlight)

Hongrui Cai¹     Wanquan Feng¹     Xuetao Feng²     Yan Wang²     Juyong Zhang¹
¹University of Science and Technology of China     ²Alibaba Group

TL;DR: We propose Neural-DynamicReconstruction (NDR), a template-free method to recover the high-fidelity geometry (shown on the right), motions, and appearance (shown on the left) of a dynamic scene from a monocular RGB-D camera.

Abstract

We propose Neural-DynamicReconstruction (NDR), a template-free method to recover high-fidelity geometry and motions of a dynamic scene from a monocular RGB-D camera. In NDR, we adopt a neural implicit function for surface representation and rendering such that the captured color and depth can be fully utilized to jointly optimize the surface and deformations. To represent and constrain the non-rigid deformations, we propose a novel neural invertible deforming network such that the cycle consistency between any two frames is automatically satisfied. Considering that the surface topology of a dynamic scene may change over time, we employ a topology-aware strategy to construct topology-variant correspondences for the fused frames. NDR further refines the camera poses via global optimization. Experiments on public datasets and our collected dataset demonstrate that NDR outperforms existing monocular dynamic reconstruction methods.
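To make the joint color-depth supervision concrete, the following is a minimal PyTorch sketch of rendering a single ray and comparing it against an RGB-D observation. It is an illustration only, not the released NDR code: sdf_net, color_net, the NeuS-style opacity, and the weight lambda_d are assumptions made for this sketch.

import torch

def render_ray(sdf_net, color_net, pts, dirs, dists, inv_s=64.0):
    """Render color and depth along one ray from canonical-space samples.

    pts:   (N, 3) sample points, already deformed into the canonical space
    dirs:  (N, 3) view directions
    dists: (N,)   distances of the samples from the camera origin
    """
    sdf = sdf_net(pts).squeeze(-1)                    # (N,) signed distances
    # NeuS-style opacity from the decrease of a sigmoid applied to the SDF
    cdf = torch.sigmoid(sdf * inv_s)
    alpha = ((cdf[:-1] - cdf[1:]) / (cdf[:-1] + 1e-5)).clamp(0.0, 1.0)
    # Transmittance-weighted contribution of each sample along the ray
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha + 1e-7]), dim=0)[:-1]
    weights = alpha * trans                           # (N-1,)
    rgb = color_net(pts[:-1], dirs[:-1])              # (N-1, 3)
    color = (weights[:, None] * rgb).sum(dim=0)       # rendered pixel color
    depth = (weights * dists[:-1]).sum(dim=0)         # rendered pixel depth
    return color, depth

def rgbd_loss(color, depth, gt_color, gt_depth, lambda_d=0.1):
    # Color and depth residuals jointly constrain the surface, its appearance,
    # and (through the deformed sample points) the deformation field.
    return (color - gt_color).abs().mean() + lambda_d * (depth - gt_depth).abs()

Because both residuals are differentiable in the sample points, gradients flow back through the deformation into the canonical SDF and radiance field, which is what allows color and depth to be optimized jointly.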

Method

We adopt a neural SDF and a radiance field to represent the high-fidelity geometry and appearance, respectively, in the canonical space (shown at the top). In our framework, each RGB-D frame can be integrated into the canonical representation. We propose a novel neural deformation representation (shown at the bottom) that defines a continuous bijective map between the observation and canonical spaces. The designed invertible module enforces cycle consistency across the whole RGB-D video while naturally fitting the properties of non-rigid motion.
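The role of invertibility can be illustrated with a small PyTorch sketch, assuming a RealNVP-style coupling block on point coordinates conditioned on a per-frame deformation code. This is a hypothetical stand-in for exposition, not the exact NDR deformation network:

import torch
import torch.nn as nn

class CouplingDeform(nn.Module):
    """Shift one coordinate by an MLP of the other two plus a frame code.

    Because the predicted shift only reads coordinates the block does not
    modify, the map is exactly invertible: subtract the same shift to undo it.
    """
    def __init__(self, axis: int, code_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.axis = axis
        self.mlp = nn.Sequential(
            nn.Linear(2 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def _shift(self, x, code):
        rest = [i for i in range(3) if i != self.axis]
        return self.mlp(torch.cat([x[:, rest], code], dim=-1)).squeeze(-1)

    def forward(self, x, code):       # observation space -> canonical space
        out = x.clone()
        out[:, self.axis] = x[:, self.axis] + self._shift(x, code)
        return out

    def inverse(self, y, code):       # canonical space -> observation space
        out = y.clone()
        out[:, self.axis] = y[:, self.axis] - self._shift(y, code)
        return out

# A point seen in frame i maps to frame j through the canonical space:
#   x_j = block.inverse(block.forward(x_i, code_i), code_j)

Stacking such blocks over alternating axes yields an expressive deformation that remains exactly invertible, so the composition above gives a direct correspondence between any two frames and cycle consistency holds by construction.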

Results

We test our NDR on multiple datasets (the KillingFusion dataset [Slavcheva et al. 2017], the DeepDeform dataset [Bozic et al. 2020], and our collected dataset), which cover various object classes and challenging cases.

Human Body and Head

Plant, Toy and Cloth

Reconstruction Showcase

Comparisons

We compare our NDR with RGB-D based methods and a recent RGB based method.

Comparisons with RGB-D Based Methods

Comparison with an RGB Based Method

Ablation Studies

We also evaluate how several key components of NDR affect the final reconstruction results.

BibTeX

If you find NDR useful for your work, please cite:

@inproceedings{Cai2022NDR,
  author    = {Hongrui Cai and Wanquan Feng and Xuetao Feng and Yan Wang and Juyong Zhang},
  title     = {Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera},
  booktitle = {Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2022}
}

Acknowledgements

This research was partially supported by the National Natural Science Foundation of China (No. 62122071, No. 62272433), the Fundamental Research Funds for the Central Universities (No. WK3470000021), and Alibaba Group through the Alibaba Innovation Research (AIR) Program. The opinions, findings, conclusions, and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies or the government. We thank the authors of OcclusionFusion for sharing the fusion results of several RGB-D sequences. We also thank the authors of BANMo for their suggestions on experimental parameter settings. Special thanks to Prof. Weiwei Xu for his help.