Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen1,2,   Jianfeng Zhang2*,   Yixun Liang1,3,   Guan Luo2,4,   Weiyu Li1,3,   Jiarui Liu1,3,   Xiu Li2,   Xiaoxiao Long1,3,   Jiashi Feng2,   Ping Tan1,3*

*Corresponding authors

1 The Hong Kong University of Science and Technology

2 Bytedance Seed

3 LightIllusions

4 Tsinghua University

Abstract

Recent 3D content generation pipelines commonly employ Variational Autoencoders (VAEs) to encode shapes into compact latent representations for diffusion-based generation. However, the widely adopted uniform point sampling strategy in Shape VAE training often loses significant geometric detail, limiting the quality of both shape reconstruction and downstream generation. We present Dora-VAE, a novel approach that improves VAE reconstruction through our proposed sharp edge sampling strategy and a dual cross-attention mechanism. By identifying and prioritizing regions of high geometric complexity during training, our method significantly improves the preservation of fine-grained shape features. Together, the sampling strategy and the dual cross-attention mechanism let the VAE focus on crucial geometric details that uniform sampling typically misses. To systematically evaluate VAE reconstruction quality, we also propose Dora-bench, a benchmark that quantifies shape complexity by the density of sharp edges and introduces a new metric focused on reconstruction accuracy at these salient geometric features. Extensive experiments on Dora-bench demonstrate that Dora-VAE achieves reconstruction quality comparable to the state-of-the-art dense XCube-VAE while requiring a latent space at least 8x smaller (1,280 vs. > 10,000 codes).

Figure: Reconstruction results of Dora-VAE on Dora-bench, showing the GT geometry, our reconstructed geometry, and the input point clouds from sharp edge sampling (ours) and uniform sampling.

Image to 3D

Character Control

The 3D assets generated by our model are ready for real-time, diffusion-based character control in modern 3D engines such as Unity 3D.

Method

We present Dora-VAE for high-quality 3D shape reconstruction and Dora-bench for evaluating 3D VAEs. Because a latent diffusion model can only produce shapes as detailed as its VAE can decode, the improved reconstruction quality of Dora-VAE directly raises the performance ceiling of downstream diffusion models, enabling higher-quality generation under the same training conditions.

Dora-VAE

For each input mesh, we augment the uniformly sampled point cloud P_u with additional points P_a drawn from geometrically salient regions by our proposed sharp edge sampling strategy; together they form the dense point cloud P_d. During encoding, a simple yet effective dual cross-attention mechanism attends to P_u and P_a separately; the two outputs are summed and passed through self-attention layers to produce the latent code z.
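The NumPy sketch below illustrates how P_u, P_a and the dual cross-attention step could fit together. It is not the Dora-VAE implementation: the dihedral-angle test with a 30-degree threshold (`angle_deg`), length-weighted sampling along sharp edges, raw xyz inputs without positional encoding, single-head attention with projection matrices shared by both branches, and all function names are assumptions made for illustration.

```python
import numpy as np


def face_normals(vertices, faces):
    """Unit normal of every triangle (assumes consistent winding)."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n, axis=1, keepdims=True)


def sharp_edges(vertices, faces, angle_deg=30.0):
    """Edges whose two adjacent face normals differ by more than angle_deg.

    The 30-degree default is a hypothetical threshold, not Dora's setting.
    Boundary edges (only one adjacent face) are ignored.
    """
    normals = face_normals(vertices, faces)
    edge_faces = {}
    for f_idx, (i, j, k) in enumerate(faces.tolist()):
        for a, b in ((i, j), (j, k), (k, i)):
            edge_faces.setdefault((min(a, b), max(a, b)), []).append(f_idx)
    cos_thresh = np.cos(np.deg2rad(angle_deg))
    sharp = [edge for edge, fs in edge_faces.items()
             if len(fs) == 2 and normals[fs[0]] @ normals[fs[1]] < cos_thresh]
    return np.array(sharp, dtype=int).reshape(-1, 2)


def sample_sharp_edge_points(vertices, edges, n_points, rng):
    """P_a: points on sharp edges, edges picked proportionally to length."""
    if len(edges) == 0:
        return np.empty((0, 3))
    p0, p1 = vertices[edges[:, 0]], vertices[edges[:, 1]]
    lengths = np.linalg.norm(p1 - p0, axis=1)
    idx = rng.choice(len(edges), size=n_points, p=lengths / lengths.sum())
    t = rng.random((n_points, 1))
    return p0[idx] + t * (p1[idx] - p0[idx])


def sample_surface_uniform(vertices, faces, n_points, rng):
    """P_u: area-weighted uniform samples on the mesh surface."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    u, v = rng.random((2, n_points, 1))
    flip = (u + v) > 1  # reflect samples that fall outside the triangle
    u, v = np.where(flip, 1 - u, u), np.where(flip, 1 - v, v)
    return v0[idx] + u * (v1[idx] - v0[idx]) + v * (v2[idx] - v0[idx])


def cross_attention(queries, points, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from queries to points."""
    q, k, v = queries @ w_q, points @ w_k, points @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def dual_cross_attention(queries, p_u, p_a, w_q, w_k, w_v):
    """Attend to P_u and P_a separately, then sum the two outputs."""
    return (cross_attention(queries, p_u, w_q, w_k, w_v)
            + cross_attention(queries, p_a, w_q, w_k, w_v))
```

In the full model, the summed output would feed the self-attention layers that produce z; that stage and the choice of latent queries are omitted here. One plausible reading of why the two branches are kept separate: a single softmax over all of P_d normalizes across both sets, so the far more numerous uniform points can dominate, whereas a separate branch gives the edge samples their own normalized contribution.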

Dora-bench

To enable more rigorous evaluation of VAE performance, we propose Dora-bench, a benchmark that systematically categorizes test shapes by geometric complexity. Unlike previous evaluations that rely on randomly selected test sets, we measure shape complexity by the number of salient edges and classify shapes into four levels. We curate test shapes from multiple public datasets, including the GSO, ABO, Meta, and Objaverse test sets, to cover a diverse range of geometric complexity.
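To make the four-level grouping concrete, here is a small Python sketch that buckets shapes by salient-edge count and averages a per-shape reconstruction metric within each level. The boundaries in `LEVEL_BOUNDS` are hypothetical placeholders, not Dora-bench's actual thresholds, and the function names are illustrative only; the salient-edge count itself would come from an edge detector such as a dihedral-angle test.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical level boundaries on the number of salient edges;
# Dora-bench's real thresholds are not stated here.
LEVEL_BOUNDS = (100, 1000, 10000)


def complexity_level(num_salient_edges, bounds=LEVEL_BOUNDS):
    """Map a salient-edge count to a level in 1..len(bounds) + 1."""
    # A count equal to a boundary falls into the higher level.
    return bisect_right(bounds, num_salient_edges) + 1


def mean_metric_per_level(edge_counts, metric_values, bounds=LEVEL_BOUNDS):
    """Average a per-shape reconstruction metric within each complexity level."""
    buckets = defaultdict(list)
    for count, value in zip(edge_counts, metric_values):
        buckets[complexity_level(count, bounds)].append(value)
    return {level: sum(vals) / len(vals) for level, vals in sorted(buckets.items())}
```

Reporting one average per level, rather than a single average over the whole test set, keeps a model's good scores on simple shapes from hiding failures on the most complex ones.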