Diffusion models explained. How does OpenAI's GLIDE work?

Published 2022-03-23

Diffusion models beat GANs at image synthesis, and GLIDE generates images from text descriptions, surpassing even DALL-E in photorealism! Check out this video to learn how diffusion models work. Enjoy the visuals!
SPONSOR: Weights & Biases 👉 wandb.me/ai-coffee-break

❓ Check out our daily #MachineLearning Quiz Questions: youtube.com/c/AICoffeeBreak/community
➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....

Recommended videos:
📺 DALL-E video:    • OpenAI's DALL-E explained. How GPT-3 ...
📺 GAN explained video:    • GANs explained | Generative Adversari...
📺 CLIP video:    • OpenAI’s CLIP explained! | Examples, ...

Papers:
📜 GLIDE paper: Nichol, Alex, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021). arxiv.org/abs/2112.10741
🔗 GLIDE mini, demo: huggingface.co/spaces/valhalla/glide-text2im
📜 Diffusion models for image generation: Dhariwal, Prafulla, and Alexander Nichol. "Diffusion models beat GANs on image synthesis." Advances in Neural Information Processing Systems 34 (2021). arxiv.org/abs/2105.05233
📜 Original diffusion models paper: Sohl-Dickstein, Jascha, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. "Deep unsupervised learning using nonequilibrium thermodynamics." In International Conference on Machine Learning, pp. 2256-2265. PMLR, 2015. arxiv.org/abs/1503.03585
🔗 Check out this awesome blogpost by Lilian Weng: lilianweng.github.io/lil-log/2021/07/11/diffusion-…
🔗 Flow-based models: lilianweng.github.io/lil-log/2018/10/13/flow-based…
🔗 DALL-E blog post: openai.com/blog/dall-e/
💻 If interested in the basic code of diffusion models, here is a wonderful annotated diffusion model from 🤗: huggingface.co/blog/annotated-diffusion
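
As a rough illustration of what such basic code involves, here is a minimal sketch of the forward (noising) process in plain Python. It assumes the linear noise schedule and closed-form sampling of the DDPM formulation (Ho et al., 2020), which this description does not spell out. The function names, the schedule endpoints, and the toy 1-D "image" are hypothetical choices for illustration, not code from the video or from GLIDE.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (assumed endpoints)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    a_bar = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * n
          for x, n in zip(x0, noise)]
    return xt, noise


# Toy usage: noise a 3-"pixel" image all the way to the last step.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, -0.25, 1.0]
xt, noise = forward_diffuse(x0, 999, alpha_bars, rng)
```

At the last step `alpha_bars[t]` is close to zero, so `xt` is almost pure Gaussian noise. In training, a denoising network would be asked to predict `noise` from `xt` and `t`; that network is omitted here.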

Outline:
00:00 Diffusion models are cool
00:33 Weights & Biases (Sponsor)
01:51 4 types of generative models (in 2022)
05:13 Diffusion models explained
08:27 Why are diffusion models good at photorealism? – Diffusion models beat GANs
10:36 GLIDE explained
12:16 Classifier-guided diffusion, CLIP-guided diffusion
13:56 Classifier-free guidance
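
The classifier-free guidance covered in the last chapter can be sketched in a few lines. The combination rule below follows the GLIDE paper: start from the unconditional noise prediction and extrapolate toward the text-conditional one. The function name, the list-based inputs, and the example numbers are hypothetical; a real model would produce these predictions as image-shaped tensors.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Return eps_uncond + s * (eps_cond - eps_uncond), element by element.

    eps_uncond: noise predicted with an empty caption
    eps_cond:   noise predicted with the actual caption
    guidance_scale (s): 1.0 reproduces eps_cond; larger values push
                        the sample harder toward the caption
    """
    return [u + guidance_scale * (c - u)
            for u, c in zip(eps_uncond, eps_cond)]


# Toy usage with made-up predictions for two "pixels".
guided = classifier_free_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=3.0)
```

Where the two predictions agree, the scale has no effect; where they differ, the difference is amplified. No separate classifier is needed, since both predictions come from the same diffusion model.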

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Don Rosenthal, Dres. Trost GbR, banana.dev -- Kyle Morris, Joel Ang, Julián Salazar, Edvard Grødem

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Patreon: www.patreon.com/AICoffeeBreak
Ko-fi: ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

------------------------------------
🔗 Links:
AICoffeeBreakQuiz: youtube.com/c/AICoffeeBreak/community
Twitter: twitter.com/AICoffeeBreak
Reddit: www.reddit.com/r/AICoffeeBreak/
YouTube: youtube.com/AICoffeeBreak

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Video contains the rock emoji designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0

Music 🎵 : Tell Me That I Can't (Instr

All Comments (21)

@Mrbits01 2 years ago

As I was about to go and generate the avocado armchair, I heard you say no avocado armchair. My disappointment is immeasurable and my day is ruined.
@MachineLearningStreetTalk 2 years ago

Amazing production quality! Here we go!!
@AICoffeeBreak 2 years ago

Sorry, the upload seems buggy. Re-uploading did not help. I'll wait to see if this gets better over time. Did you try turning it off and on again? 🤖
@LecrazyMaffe 1 year ago

This video offers one of the best explanations for classifier-free guidance.
@CristianGarcia 2 years ago

Something not stated in the video is that Diffusion Models are WAY easier to train than GANs. Although it requires you to code the forward and backward diffusion procedures, training is rather stable which is more gratifying. Might release a tutorial on training diffusion models on a toy-ish dataset in the near future :)
@alfcnz 2 years ago

Nice high-level summary. Thanks!
@ElieAtik 1 year ago

This is the only video that goes into how OpenAI used text/tokens in combination with the diffusion model in order to achieve such results. That was very helpful.
@tylerk3130 2 years ago

Thank you for the first effective high-level explanation of Diffusion I've found. Truly, I do not know how I went so long in this space not knowing about your channel.
@emiliomorales2843 2 years ago

I was waiting for this Leticia, love your channel, thank you
@OP-yw3ws 6 months ago

You explained the CFG so well. I was trying to wrap my head around it for a while!
@jonahturner2969 2 years ago

Love your channel! Cat videos get millions of views. Your videos might get in the thousands of views, but they have a huge impact by explaining high level concepts to people who can actually use them. Please keep up your exceptional work
@alexandrupapiu3310 1 year ago

This was soo informative. And the humour was spot on!
@undergrad4980 2 years ago

Great explanation. Thank you.
@ArjunKumar123111 2 years ago

I'm here to speculate Ms Coffee Bean knew the existence of DALLE 2... Convenient timing...
@phizc 9 months ago

Wow what a difference a few months make. Dall-E 2 in April, Midjourney in July, and Stable Diffusion in August. Hi from the future 😊.
@samanthaqiu3416 2 years ago

I love Yannic, but boy do I like your articulate presentation? I think I do
@amirarsalanrajabi5171 1 year ago

Just found your channel yesterday and I'm loving it! Way to go !
@r00t257 1 year ago

love your video so much! lots of helpful intuition 🌻🌻💮Thanks ms. coffee bean a lot
@balcaenpunch 2 years ago

At @3:55, in "227" the two "2s" are written differently - I have never seen anyone other than myself do this! Cheers, Letitia. Great video.
@tripzero0 2 years ago

I finally understand diffusion! (Not really, but more so than before)