Diffusion models have revolutionized text-to-image generation, delivering remarkable quality and creativity. Their multi-step sampling procedure, however, is notoriously slow, often demanding numerous inference steps to achieve desirable results. In this paper, the authors introduce a one-step generative model derived from the open-source Stable Diffusion (SD) model.
They found that a straightforward attempt to distill SD fails completely because of a significant issue: the suboptimal coupling of noise and images, which severely hinders the distillation process. To overcome this challenge, the researchers turned to Rectified Flow, a recent family of generative models based on probability flows. Its key technique, called reflow, gradually straightens the trajectories of the probability flow.
This, in turn, reduces the transport cost between the noise distribution and the image distribution, and the improved coupling greatly facilitates distillation, addressing the initial problem. The image above illustrates how InstaFlow works.
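The mechanics behind this can be sketched concretely. Rectified Flow trains a velocity field along the straight interpolation x_t = (1 − t)·x0 + t·x1 between a noise sample x0 and an image x1; reflow then re-pairs each noise sample with the image the current model produces from it, so successive flows become straighter. A minimal, framework-free sketch (function names are illustrative, not taken from the paper's code):

```python
def interpolate(x0, x1, t):
    """Point on the straight path between noise x0 and image x1 at time t."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def velocity_target(x0, x1):
    """Rectified Flow regresses the model's velocity onto x1 - x0,
    the constant slope of the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def reflow_pair(x0, sample_fn):
    """Reflow: re-couple noise with the image the *current* model generates
    from it, so the next flow trained on these pairs is straighter."""
    return x0, sample_fn(x0)

def one_step_sample(x0, velocity_fn):
    """Once trajectories are (nearly) straight, generation collapses to a
    single Euler step: x1 is approximately x0 + v(x0)."""
    return [a + v for a, v in zip(x0, velocity_fn(x0))]
```

On a fully straightened flow the velocity is constant along each path, which is what makes distilling the whole sampler into a single step feasible.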
The effectiveness of this one-step diffusion-based text-to-image generator is evidenced by an FID (Fréchet Inception Distance) of 23.3 on the MS COCO 2017-5k dataset, a substantial improvement over the previous state-of-the-art technique, progressive distillation (37.2 → 23.3 in FID). Furthermore, by employing an expanded network with 1.7 billion parameters, the researchers improve the FID further to 22.4. They refer to this one-step model as “InstaFlow.”
On the MS COCO 2014-30k dataset, InstaFlow achieves an FID of 13.1 in just 0.09 seconds, making it the best performer in the ≤ 0.1-second category and outperforming the recent StyleGAN-T model (13.9 in 0.1 seconds). Notably, InstaFlow was trained at a relatively low computational cost of only 199 A100 GPU days.
Based on these results, the researchers outline the following future directions:
Improving One-Step SD: The 2-Rectified Flow model has not fully converged after 75.2 A100 GPU days of training, only a fraction of the training cost of the original SD (6,250 A100 GPU days). By scaling up the dataset, model size, and training duration, the researchers believe the performance of one-step SD will improve significantly.
One-Step ControlNet: By applying the same pipeline to ControlNet models, it is possible to obtain one-step ControlNets capable of generating controllable content within milliseconds.
Personalization for One-Step Models: By fine-tuning SD with the training objective of diffusion models and LoRA, users can customize the pre-trained SD to generate specific content and styles.
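The LoRA update that makes such personalization cheap is easy to sketch: the frozen weight matrix W is augmented with a trainable low-rank product A·B, so only a small number of parameters are updated (a generic illustration of LoRA, not the authors' training code):

```python
def matmul(X, Y):
    """Plain matrix product for the sketch (matrices as lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, scale=1.0):
    """y = x*W + scale * x*A*B.  W (d x k) stays frozen; only the rank-r
    factors A (d x r) and B (r x k) are trained, with r << min(d, k)."""
    base = matmul(x, W)                # frozen pre-trained path
    delta = matmul(matmul(x, A), B)    # cheap rank-r detour
    return [[p + scale * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(base, delta)]
```

Because A·B has rank at most r, a personalization checkpoint stores only the two small factors rather than a full copy of the model's weights.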
Neural Network Structure for One-Step Generation: With one-step SD models now achievable through text-conditioned reflow and distillation, several intriguing directions arise:
(1) exploring alternative one-step structures, such as successful architectures used in GANs, that could potentially surpass the U-Net in terms of quality and efficiency;
(2) leveraging techniques like pruning, quantization, and other approaches for building efficient neural networks to make one-step generation more computationally affordable while minimizing potential degradation in quality.
Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.
The post Meet InstaFlow: A Novel One-Step Generative AI Model Derived from the Open-Source StableDiffusion (SD) appeared first on MarkTechPost.