Yukai Shi1,3, Weiyu Li2,4, Zihao Wang4, Hongyang Li3, Xingyu Chen3, Ping Tan2,4, Lei Zhang3
1 Tsinghua University 2 HKUST 3 IDEA Research 4 LightIllusions
We propose SceneMaker, a decoupled 3D scene generation framework. Due to the lack of sufficient open-set de-occlusion and pose estimation priors, existing methods struggle to simultaneously produce high-quality geometry and accurate poses under severe occlusion in open-set settings. To address these issues, we first decouple the de-occlusion model from 3D object generation and enhance it by leveraging image datasets and collected de-occlusion datasets covering much more diverse open-set occlusion patterns. We then propose a unified pose estimation model that integrates global and local mechanisms into both self-attention and cross-attention to improve accuracy. In addition, we construct an open-set 3D scene dataset to further extend the generalization of the pose estimation model. Comprehensive experiments demonstrate the superiority of our decoupled framework on both indoor and open-set scenes. Our code and datasets will be released.
Our framework consists of three main components:
- Scene Perception: Understanding the input scene structure
- 3D Object Generation under Occlusion: Decoupled de-occlusion model for robust object generation
- Pose Estimation: Unified pose estimation model with global and local attention mechanisms
- ✅ Dataset: Available
- ✅ Inference Code: Released
- ✅ Training Code: Released
Note: The open-source release uses FLUX Kontext as the de-occlusion model and Step1X-3D as the 3D generation model, which differs slightly from the exact implementation described in the paper.
- Install Python dependencies for Python 3.10:

```shell
pip install -r requirements.txt
```

- Install MoGe for depth estimation:
- MoGe repo: https://github.com/microsoft/MoGe
- Please follow the official MoGe repository instructions for installation.
- Install Step1X-3D for 3D object generation:

```shell
git clone --depth 1 --branch main https://github.com/stepfun-ai/Step1X-3D.git
```

- Download checkpoints from Hugging Face, and place them in the corresponding folders (`ckpts/`):
- SceneMaker checkpoints: https://huggingface.co/horizon171852/SceneMaker
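Before launching the demo, it can help to sanity-check that the checkpoints are in place. A minimal sketch follows; the subfolder names `indoor` and `open-set` are assumptions based on the checkpoint options mentioned below, so adjust them to match the actual layout of the downloaded files:

```shell
# Hedged sketch: verify that the expected checkpoint folders exist under ckpts/.
# The subfolder names "indoor" and "open-set" are assumptions; adjust them to
# match the layout of the downloaded SceneMaker checkpoints.
CKPT_DIR="${CKPT_DIR:-ckpts}"
missing=0
for sub in indoor open-set; do
  if [ -d "$CKPT_DIR/$sub" ]; then
    echo "found: $CKPT_DIR/$sub"
  else
    echo "missing: $CKPT_DIR/$sub"
    missing=$((missing + 1))
  fi
done
echo "missing checkpoint folders: $missing"
```

If any folder is reported missing, re-download the corresponding checkpoint before running the scripts below.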
Launch the Gradio demo:

```shell
bash run_gradio.sh
```

Select the corresponding checkpoints (indoor / open-set), and run:

```shell
bash run_generation.sh
```

Download the required datasets:
- InstPIFu: https://github.com/GAP-LAB-CUHK-SZ/InstPIFu
- MIDI-3D: https://github.com/VAST-AI-Research/MIDI-3D
- SceneMaker OpenSet Dataset: https://huggingface.co/datasets/LightillusionsLab/
Select a config in `configs/image-to-scene-diffusion` and run:

```shell
bash run_train.sh
```

If you find our work useful in your research, please consider citing:
```bibtex
@article{shi2025scenemaker,
  title={SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model},
  author={Shi, Yukai and Li, Weiyu and Wang, Zihao and Li, Hongyang and Chen, Xingyu and Tan, Ping and Zhang, Lei},
  journal={arXiv preprint arXiv:2512.10957},
  year={2025}
}
```

We would like to thank the authors of the following projects for their excellent work and open-source contributions:
- MoGe - Monocular depth estimation
- SAM - Segment Anything Model for image segmentation
- DINO-X - Grounded detection and segmentation
- CraftsMan - 3D object generation
- Step1x-3D - 3D object generation
- Hunyuan3D - 3D object generation
- MIDI-3D - Multi-instance 3D scene generation
- InstPIFu - Indoor 3D scene generation
Their contributions have been invaluable to the development of SceneMaker.
See LICENSE file for details.