Recent 3D content generation pipelines commonly employ Variational Autoencoders (VAEs) to encode shapes into compact latent representations for diffusion-based generation. However, the widely adopted uniform point sampling strategy in Shape VAE training often leads to a significant loss of geometric details, limiting the quality of shape reconstruction and downstream generation tasks.
We present Dora-VAE, a novel approach that enhances VAE reconstruction through our proposed sharp edge sampling strategy and a dual cross-attention mechanism. By identifying and prioritizing regions with high geometric complexity during training, our method significantly improves the preservation of fine-grained shape features. Together, the sampling strategy and the dual cross-attention mechanism enable the VAE to focus on crucial geometric details that uniform sampling approaches typically miss.
To systematically evaluate VAE reconstruction quality, we additionally propose Dora-bench, a benchmark that quantifies shape complexity through the density of sharp edges and introduces a new metric focused on reconstruction accuracy at these salient geometric features. Extensive experiments on Dora-bench demonstrate that Dora-VAE achieves reconstruction quality comparable to the state-of-the-art dense XCube-VAE while requiring a latent space at least 8x smaller (1,280 vs. > 10,000 codes).
[Figure panels: GT geometry; Reconstructed geometry (ours); Sharp edge sampling (ours); Uniform sampling]
The 3D asset generated by our model is ready for diffusion-based character control in modern 3D engines, such as Unity 3D, in real-time.
We present Dora-VAE for high-quality 3D reconstruction, and Dora-bench for 3D VAE evaluation.
The improved reconstruction quality offered by Dora-VAE can directly raise the performance ceiling
of diffusion models, enabling higher-quality generation under the same training conditions.
For each input mesh, we augment the uniformly sampled point cloud P_u
with additional salient points P_a drawn by our proposed sharp edge sampling strategy;
together they form the dense point cloud P_d. During encoding, we compute cross-attention
for P_u and P_a separately via a simple yet effective dual cross-attention mechanism,
sum the two results, and pass the sum through self-attention to compute the latent code z.
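The "attend separately, then sum" step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the learned projections, multiple heads, and the subsequent self-attention layers are omitted, and the query set, feature sizes, and function names are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum so the exponentials stay numerically stable.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Single-head scaled dot-product cross-attention without learned projections."""
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (num_queries, num_points)
    return softmax(scores, axis=-1) @ context   # (num_queries, d)

def dual_cross_attention(queries, feats_uniform, feats_salient):
    """Attend to the two point sets separately, then sum the two results."""
    return (cross_attention(queries, feats_uniform)
            + cross_attention(queries, feats_salient))

rng = np.random.default_rng(0)
feats_uniform = rng.normal(size=(512, 64))  # features of uniformly sampled points
feats_salient = rng.normal(size=(256, 64))  # features of sharp-edge points
queries = rng.normal(size=(32, 64))         # hypothetical latent queries

summed = dual_cross_attention(queries, feats_uniform, feats_salient)
print(summed.shape)  # (32, 64)
```

Because each point set gets its own softmax, the typically smaller set of sharp-edge points is not outvoted by the larger uniform set, which may be one reason to prefer this over a single attention over the concatenated cloud.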
To enable more rigorous evaluation of VAE performance, we propose Dora-bench,
a benchmark that systematically categorizes test shapes based on their geometric complexity.
Unlike previous methods that use randomly selected test sets,
we measure shape complexity using the number of salient edges
and classify shapes into four levels. We curate test shapes from multiple public datasets
including GSO, ABO, Meta, and Objaverse test sets to ensure diverse geometric complexities.
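As a rough illustration of complexity binning by salient edge count, the sketch below counts edges whose two adjacent faces meet at a large angle and maps the count to one of four levels. The text above does not specify how salient edges are detected or where the level boundaries lie, so the dihedral-angle criterion, the 30 degree threshold, and the bounds (10, 100, 1000) are all assumptions chosen only for illustration.

```python
import math
from collections import defaultdict

def face_normal(vertices, face):
    # Unit normal of a triangle via the cross product of two of its edges.
    a, b, c = (vertices[i] for i in face)
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(x * x for x in n))
    return [x / length for x in n]

def count_salient_edges(vertices, faces, angle_threshold_deg=30.0):
    """Count edges whose adjacent face normals differ by more than the threshold.

    The threshold is a hypothetical value, not one taken from the source.
    """
    edge_faces = defaultdict(list)
    for fi, face in enumerate(faces):
        for k in range(3):
            edge = tuple(sorted((face[k], face[(k + 1) % 3])))
            edge_faces[edge].append(fi)
    normals = [face_normal(vertices, f) for f in faces]
    count = 0
    for adjacent in edge_faces.values():
        if len(adjacent) != 2:
            continue  # skip boundary and non-manifold edges
        n1, n2 = normals[adjacent[0]], normals[adjacent[1]]
        cos_angle = max(-1.0, min(1.0, sum(p * q for p, q in zip(n1, n2))))
        if math.degrees(math.acos(cos_angle)) > angle_threshold_deg:
            count += 1
    return count

def complexity_level(num_salient_edges, bounds=(10, 100, 1000)):
    """Map an edge count to a level from 1 to 4; the bounds are placeholders."""
    return 1 + sum(num_salient_edges >= b for b in bounds)

# Unit cube, two outward-facing triangles per side.
cube_vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                 (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
cube_faces = [(0, 2, 1), (0, 3, 2), (4, 5, 6), (4, 6, 7),
              (0, 1, 5), (0, 5, 4), (3, 7, 6), (3, 6, 2),
              (0, 4, 7), (0, 7, 3), (1, 2, 6), (1, 6, 5)]

n_edges = count_salient_edges(cube_vertices, cube_faces)
print(n_edges, complexity_level(n_edges))  # 12 2
```

The cube has 12 geometric edges at 90 degrees and 6 flat diagonal edges introduced by triangulation; only the former are counted, which is why a raw edge count would be a poor complexity measure.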