D3-Human: Dynamic Disentangled Digital Human from Monocular Video

Honghu Chen     Bo Peng     Yunfan Tao          Juyong Zhang
University of Science and Technology of China    

Abstract

We introduce D3-Human, a method for reconstructing Dynamic Disentangled Digital Human geometry from monocular videos. Prior work on monocular human reconstruction primarily focuses on reconstructing undecoupled clothed bodies, or on reconstructing clothing alone, which makes the results difficult to apply directly in applications such as animation production. The challenge in reconstructing decoupled clothing and body lies in the occlusion of the body by clothing: the reconstruction must preserve detail in visible regions while remaining plausible in occluded ones. Our method combines explicit and implicit representations to model the decoupled clothed human body, leveraging the robustness of explicit representations and the flexibility of implicit ones. Specifically, we reconstruct the visible region as an SDF and propose a novel human manifold signed distance field (hmSDF) to segment the visible clothing from the visible body, and then merge the visible and invisible body regions. Extensive experiments demonstrate that, compared with existing reconstruction schemes, D3-Human achieves high-quality decoupled reconstruction of human bodies wearing diverse clothing, and can be directly applied to clothing transfer and animation production.

Method

The optimization process is divided into two steps: template generation and detailed deformation. The subject is initialized as a DMTet representation and optimized to form a complete clothed human. An optimizable hmSDF function then separates the clothing and body regions, with occluded body parts filled in by SMPL. After generating the disentangled template, we use two MLPs to model per-frame detailed deformations of the body and clothing meshes separately. Finally, the meshes are transformed into the observation space via forward LBS deformation and supervised with images, normal maps, and parsing masks through a differentiable renderer.
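The forward LBS step above can be sketched as follows. This is a minimal illustrative implementation of standard linear blend skinning, not the authors' code; the function name and array layouts are assumptions for clarity. Each canonical vertex is deformed by a weighted blend of rigid bone transforms before rendering-based supervision.

```python
import numpy as np

def forward_lbs(verts, skin_weights, joint_transforms):
    """Deform canonical-space vertices into observation space via LBS.

    verts:            (V, 3) canonical vertex positions
    skin_weights:     (V, J) per-vertex skinning weights (each row sums to 1)
    joint_transforms: (J, 4, 4) rigid transform of each joint/bone
    """
    num_verts = verts.shape[0]
    # Lift vertices to homogeneous coordinates: (V, 4)
    verts_h = np.concatenate([verts, np.ones((num_verts, 1))], axis=1)
    # Blend per-joint transforms with the skinning weights: (V, 4, 4)
    blended = np.einsum("vj,jab->vab", skin_weights, joint_transforms)
    # Apply each vertex's blended transform: (V, 4)
    deformed_h = np.einsum("vab,vb->va", blended, verts_h)
    return deformed_h[:, :3]
```

Because both the body and clothing meshes share the same skeleton, applying this deformation to each mesh separately keeps the two layers consistently posed in the observation space.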

Reconstruction Results

We present reconstruction results on monocular video sequences and compare them with related methods.

UDF or hmSDF?

We present ablation studies comparing template generation with a UDF against our hmSDF.

Applications

Reconstructing disentangled clothing and body makes downstream applications more convenient: it allows direct animation with physics-based methods, enables clothing exchange for clothing transfer, and supports animating the disentangled clothed human after clothing transfer.