Recent advances in text-to-3D generation have shown remarkable results by combining 3D priors with 2D diffusion. However, the 3D priors adopted by previous methods lack detail and complex structural information, which restricts these methods to objects with simple structures and makes structurally complex objects, such as bonsai, difficult to generate. In this paper, we propose 3DBonsai, a novel text-to-3D framework for generating 3D bonsai with complex structures. Technically, we first design a trainable 3D space colonization algorithm to produce bonsai structures, which are then enhanced through random sampling and point cloud augmentation to serve as the 3D Gaussian priors. We then propose two pipelines for bonsai generation at different structure levels. The first is fine structure conditioned generation, in which 3DBonsai directly initializes the 3D Gaussians with the 3D structure prior and renders bonsai with detailed and complex structures. The second is coarse structure conditioned generation, in which 3DBonsai uses a multi-view structure consistency learning module to align the 2D and 3D structures. Moreover, we have compiled a unified 2D and 3D Chinese-style bonsai dataset. Our experimental results demonstrate that 3DBonsai significantly outperforms existing methods, providing a new benchmark for structure-aware 3D bonsai generation.
Due to the mesh size, the mesh displayed on the page has been compressed by a factor of 10, which may result in reduced clarity.
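To make the structure prior described in the abstract more concrete, the following is a minimal sketch of the classic, non-trainable space colonization algorithm, followed by a simple way to turn the resulting skeleton into a point cloud. It is not the trainable variant proposed in 3DBonsai. The function names, the parameters (`step`, `influence_radius`, `kill_radius`, `per_edge`, `jitter`), and the sampling scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def space_colonization(attractors, root, step=0.1, influence_radius=0.6,
                       kill_radius=0.15, max_iters=100):
    """Classic space colonization (illustrative, NOT the paper's trainable version).

    Returns node positions of shape (N, 3) and a parent index per node
    (the root has parent -1).
    """
    attractors = np.asarray(attractors, dtype=float)
    nodes = [np.asarray(root, dtype=float)]
    parents = [-1]

    for _ in range(max_iters):
        if len(attractors) == 0:
            break
        node_arr = np.stack(nodes)
        # Vector and distance from every node to every attractor.
        diff = attractors[:, None, :] - node_arr[None, :, :]  # (A, N, 3)
        dist = np.linalg.norm(diff, axis=2)                   # (A, N)
        nearest = dist.argmin(axis=1)
        nearest_dist = dist[np.arange(len(attractors)), nearest]
        # An attractor only influences its nearest node, and only within range.
        active = nearest_dist < influence_radius
        if not active.any():
            break

        grew = False
        for n in np.unique(nearest[active]):
            mask = active & (nearest == n)
            # Average of unit directions toward the attractors pulling on node n.
            dirs = diff[mask, n, :] / np.maximum(nearest_dist[mask, None], 1e-12)
            mean_dir = dirs.sum(axis=0)
            norm = np.linalg.norm(mean_dir)
            if norm < 1e-8:
                continue  # opposing pulls cancel out; skip this node
            nodes.append(node_arr[n] + step * mean_dir / norm)
            parents.append(int(n))
            grew = True
        if not grew:
            break

        # Remove attractors that a branch has reached.
        node_arr = np.stack(nodes)
        dist = np.linalg.norm(attractors[:, None, :] - node_arr[None, :, :], axis=2)
        attractors = attractors[dist.min(axis=1) > kill_radius]

    return np.stack(nodes), np.array(parents)


def sample_branch_points(nodes, parents, per_edge=8, jitter=0.01, seed=0):
    """Hypothetical augmentation: random points along each branch, with noise."""
    rng = np.random.default_rng(seed)
    child = np.arange(1, len(nodes))
    start = nodes[parents[child]]
    end = nodes[child]
    t = rng.random((len(child), per_edge, 1))
    pts = start[:, None, :] + t * (end - start)[:, None, :]
    pts += rng.normal(scale=jitter, size=pts.shape)
    return pts.reshape(-1, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    crown = rng.random((150, 3))  # attractors filling a unit cube
    nodes, parents = space_colonization(crown, root=[0.5, 0.5, 0.0])
    points = sample_branch_points(nodes, parents)
    print(nodes.shape, points.shape)
```

In a pipeline of the kind the abstract describes, a point cloud like `points` could plausibly serve as the initial positions of the 3D Gaussians. How 3DBonsai actually performs its sampling and augmentation is not specified here, so this step should be read as an assumption.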