Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction


¹Purdue University, ²Korea University, ³Honda Research Institute USA
ACM MM 2025
Teaser
(left) In human–object interaction (HOI) scenarios, occlusions frequently affect both the human and the object. (middle) We inpaint the occluded regions while preserving temporal consistency for both entities across frames. (right) Leveraging the temporally consistent image sequences, we reconstruct the human and object using a 3D Gaussian splatting representation, enabling animatable 3D HOI applications.

Abstract


We introduce a novel framework for reconstructing dynamic human-object interactions from monocular video that overcomes challenges associated with occlusions and temporal inconsistencies. Traditional 3D reconstruction methods typically assume static objects or full visibility of dynamic subjects, leading to degraded performance when these assumptions are violated, particularly in scenarios where mutual occlusions occur. To address this, our framework leverages amodal completion to infer the complete structure of partially obscured regions. Unlike conventional approaches that operate on individual frames, our method integrates temporal context, enforcing coherence across video sequences to incrementally refine and stabilize reconstructions. This template-free strategy adapts to varying conditions without relying on predefined models, significantly enhancing the recovery of intricate details in dynamic scenes. We validate our approach using 3D Gaussian Splatting on challenging monocular videos, demonstrating superior precision in handling occlusions and maintaining temporal stability compared to existing techniques.



Overview


Overview of Our Framework: Given a monocular video of human-object interaction (HOI), our framework performs amodal completion through (1) Bidirectional Temporal Feature Warping via optical flow, (2) Temporal Fusion Attention for aggregating multi-frame context, (3) Template-free Occlusion Identification using 2D and 3D cues, and (4) temporally aware amodal completion. This design enables temporally consistent and accurate amodal completion in complex HOI scenarios.
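As a rough illustration of steps (1) and (2), the sketch below warps the features of the previous and next frames into the current frame using optical flow, then fuses them with a per-pixel softmax attention. It is a minimal NumPy sketch under stated assumptions, not the implementation used in the paper: the function names (`warp_features`, `fuse_temporal`, `bidirectional_fuse`), the nearest-neighbour sampling, and the scaled dot-product attention with the current frame as the query are all hypothetical choices made for illustration.

```python
import numpy as np

def warp_features(feat, flow):
    """Backward-warp a feature map (H, W, C) with a flow field (H, W, 2).

    Assumption: flow[y, x] = (dx, dy) points from a pixel in the current
    frame to its source location in `feat`. Nearest-neighbour sampling is
    used for brevity; a real system would likely use bilinear sampling.
    """
    H, W, _ = feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)
    return feat[src_y, src_x]

def fuse_temporal(center, neighbors):
    """Fuse aligned feature maps with per-pixel softmax attention.

    The current frame acts as the query; the current frame and every
    warped neighbour act as keys and values (an illustrative assumption).
    Returns the fused map (H, W, C) and the weights (T, H, W).
    """
    stack = np.stack([center] + list(neighbors), axis=0)      # (T, H, W, C)
    scale = np.sqrt(center.shape[-1])
    scores = (stack * center[None]).sum(axis=-1) / scale      # (T, H, W)
    scores = scores - scores.max(axis=0, keepdims=True)       # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    fused = (weights[..., None] * stack).sum(axis=0)          # (H, W, C)
    return fused, weights

def bidirectional_fuse(prev_feat, cur_feat, next_feat, flow_to_prev, flow_to_next):
    """Align both temporal neighbours to the current frame, then fuse."""
    from_prev = warp_features(prev_feat, flow_to_prev)  # backward direction
    from_next = warp_features(next_feat, flow_to_next)  # forward direction
    return fuse_temporal(cur_feat, [from_prev, from_next])
```

Warping in both directions means a region hidden in the current frame can borrow evidence from whichever neighbour sees it, and the attention weights let each pixel down-weight neighbours whose warped features disagree with the current frame.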

Application in 3D Reconstruction


Our occlusion-aware, temporally consistent amodal completion framework enables photo-realistic and animatable 3D human–object interaction reconstructions from monocular video using 3D Gaussian Splatting.

Conditioned on motion trajectories from the input video, our method enables realistic animation of novel human–object pairs while preserving geometry, appearance, and temporal coherence.


Qualitative Comparison


Qualitative comparison on BEHAVE (Square Table, Small Table) and InterCap (Skateboard). Our method produces accurate and temporally consistent completions of occluded regions.

Quantitative Comparison


Quantitative Comparison on BEHAVE and InterCap for Amodal Completion and Temporal Consistency: Our method achieves consistently strong performance, highlighting its robustness to occlusion and temporal challenges. Bold and underline denote the best and second-best scores.

BibTeX

@article{doh2025occlusion,
  title={Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction},
  author={Doh, Hyungjun and Lee, Dong In and Chi, Seunggeun and Huang, Pin-Hao and Lee, Kwonjoon and Kim, Sangpil and Ramani, Karthik},
  journal={arXiv preprint arXiv:2507.08137},
  year={2025}
}