Here we are, the main event! Finally, onto diffusion models!
This was, luckily, just as interesting as I was hoping, and very easy to move through.
The lesson has no book chapters, and at a quick glance it seems like none of the future lessons do either. They warn pretty strongly that much of the content will soon be outdated because of the pace of research, but that many of the concepts in the lessons are fairly foundational to how neural nets work.
My sense also is that I'm going to start to read a lot more papers. I read some ML research while in my school's lab, but most of this graphics stuff is new to me. This week, I read the U-Net paper, which I felt I mostly grokked, but will likely need to revisit further into the course. A personal note:
Anyways, Jeremy's lesson was pretty interesting even just from the POV of someone who has casually used diffusion models before. In particular, it was interesting to see how random seeds work, and to see the "insides" of the model. Basically: it was neat to see that diffusion is a gradual process, and that you can look at a model's results after different numbers of inference steps. (One of my instincts: there's probably some really interesting conceptual stuff you could do artistically with only partially diffused images, and that this is extremely under-explored!)
Other stuff that's generally available in these models was pretty interesting too, like guidance scale, negative prompts, "strength" in image-to-image generation, and general stuff around fine-tuning. A friend's startup places people's faces into diffusion-generated images; it was interesting to see that this was (likely) done with textual inversion, where an obscure new concept is trained into the model as its own token. One of the weirder things to think about was that you could fine-tune with an image captioning model; basically, that you could feed training data to the model using another model! Pretty cool.
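(For my own reference: the guidance scale part boils down to a single line. The model makes two noise predictions, one with the prompt and one without, and the scale just exaggerates the difference between them. A minimal sketch, with names that are mine rather than the notebook's:)

```python
import torch

def classifier_free_guidance(noise_uncond: torch.Tensor,
                             noise_text: torch.Tensor,
                             guidance_scale: float) -> torch.Tensor:
    """Blend the unconditional and prompt-conditioned noise predictions.

    A higher guidance_scale pushes the denoising step harder toward the
    prompt; a scale of 1 just uses the prompt-conditioned prediction.
    """
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)
```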
In working through this course, I'm most excited about learning how to read and implement papers and to try new things. Jeremy mentions that this stuff gets to postgrad level, and it's pretty neat that there are resources to get to the so-called 'frontier of knowledge' here!
I'm pretty compelled by the idea of doing more of this, but I sense I'm nearing the point where I could risk hitting "tutorial hell": I'm sure watching the Andrej Karpathy videos would help me understand backpropagation better (and maybe I'll watch some of them), but at a certain point, I think the best way to learn is through implementation and figuring out what I'm missing. There seem to be tons of stories online of people who keep doing Massive Open Online Courses without ever applying what they learn.
This said: I got a ton out of doing freeCodeCamp many years ago, but only because it gave me enough basics to reach escape velocity on my own, with projects and in work settings, and I think that's the same way to view all of the stuff here. Basically: 'just enough formal training' is the goal. And I tend to prefer self-paced, higher-quality online stuff to traditional instruction, so this sort of 'formal / informal' course, I think, will be perfect :)
I also need a use case in which to implement this stuff: I've been working on a tool for storyboarding with image models. Many people have asked for the ability to generate consistent characters; I think I could pretty easily implement some of the training from the course to build this.
For my own use, I'm going to explain the broad process of how diffusion models work, at a high level, as explained in the video:
I feel decent but not perfect here, which I think is exactly the level of understanding to go for at this point: working knowledge, such that I feel pretty confident I could dive into implementation.
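To pin that down a bit, here's a rough sketch of the text-to-image inference loop using Hugging Face's diffusers building blocks. This is my own reconstruction from memory rather than the notebook's code, and class names and arguments vary between diffusers versions, so treat it as a sketch:

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler
from transformers import CLIPTextModel, CLIPTokenizer

# Assumption: an SD 1.x checkpoint like the one used in the lesson.
model_id = "CompVis/stable-diffusion-v1-4"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
scheduler = LMSDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")

prompt = "a watercolor painting of an otter"   # placeholder prompt
guidance_scale = 7.5
num_steps = 50

# 1. Turn the prompt (and an empty "unconditional" prompt) into CLIP embeddings.
tokens = tokenizer([prompt, ""], padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(tokens.input_ids)[0]

# 2. Start from pure Gaussian noise in latent space (4 x 64 x 64 for a 512 x 512 image).
latents = torch.randn(1, unet.config.in_channels, 64, 64)
scheduler.set_timesteps(num_steps)
latents = latents * scheduler.init_noise_sigma

# 3. Gradually denoise: each step, the U-Net predicts the noise and the
#    scheduler removes a little of it, nudged toward the prompt by guidance.
for t in scheduler.timesteps:
    latent_input = scheduler.scale_model_input(torch.cat([latents] * 2), t)
    with torch.no_grad():
        noise_pred = unet(latent_input, t, encoder_hidden_states=text_embeddings).sample
    noise_text, noise_uncond = noise_pred.chunk(2)
    noise_pred = noise_uncond + guidance_scale * (noise_text - noise_uncond)
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# 4. Decode the final latents back to pixel space with the VAE
#    (0.18215 is the usual SD latent scaling factor).
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample  # tensor in [-1, 1]; convert to PIL to view
```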
For notebooks: it seems like half of the course notebook is used for Lesson 9 and the other half for Lesson 10. The Lesson 9 parts I recreated in my own notebook here.
It was neat to have Stable Diffusion working in code, versus in Automatic1111 or something similar. For example, here's the Colab diffusion result for "a 23 year old software engineer sitting at a desk in front of a computer screen":
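For the record, the high-level pipeline version of that is only a few lines (the model id and arguments here are the usual defaults, not necessarily exactly what the notebook uses):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a 23 year old software engineer sitting at a desk in front of a computer screen"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("engineer.png")
```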
As mentioned above, I was pretty interested in looking at the middle inference steps. Here's what it looks like at steps = [8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128] (very very cool: I want to look more at this in the future!)
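One way to get those intermediate images, building on the manual loop sketched earlier: keep a copy of the latents at the steps you care about, then decode each snapshot with the VAE afterwards. (Again a sketch of my own, not the notebook's code.)

```python
# Re-run the denoising loop, snapshotting the latents every 8 steps.
snapshot_steps = set(range(8, 129, 8))   # 8, 16, ..., 128
snapshots = {}

latents = torch.randn(1, unet.config.in_channels, 64, 64)
scheduler.set_timesteps(128)
latents = latents * scheduler.init_noise_sigma

for i, t in enumerate(scheduler.timesteps):
    latent_input = scheduler.scale_model_input(torch.cat([latents] * 2), t)
    with torch.no_grad():
        noise_pred = unet(latent_input, t, encoder_hidden_states=text_embeddings).sample
    noise_text, noise_uncond = noise_pred.chunk(2)
    noise_pred = noise_uncond + guidance_scale * (noise_text - noise_uncond)
    latents = scheduler.step(noise_pred, t, latents).prev_sample
    if i + 1 in snapshot_steps:
        snapshots[i + 1] = latents.clone()

# Decode each partially-denoised snapshot into an image.
with torch.no_grad():
    partial_images = {step: vae.decode(lat / 0.18215).sample
                      for step, lat in snapshots.items()}
```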
There's also stuff in the notebook for image-to-image. Here's an example with the photo I use on the cover of my site. Some of the artifacting in the black square zone is interesting, but generally quite cool!
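Roughly, the image-to-image version looks like this with diffusers (the keyword for the starting image has changed across versions, `image` vs `init_image`, and the file path and prompt here are just placeholders):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("site_cover_photo.jpg").convert("RGB").resize((512, 512))
result = img2img(
    prompt="a portrait of a software engineer, digital art",  # placeholder prompt
    image=init_image,
    strength=0.6,        # 0 keeps the original image, 1 ignores it entirely
    guidance_scale=7.5,
).images[0]
result.save("img2img_result.png")
```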
I tried this again using different guidance scales, g = [1.1, 3, 7, 14], which was quite interesting:
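The sweep itself is just a loop over the same call with a fixed seed, so the only thing changing is the guidance scale (the seed value here is arbitrary):

```python
generator = torch.Generator("cuda")
sweep = {}
for g in [1.1, 3, 7, 14]:
    generator.manual_seed(42)   # reset the seed so every run starts from the same noise
    sweep[g] = img2img(
        prompt="a portrait of a software engineer, digital art",
        image=init_image,
        strength=0.6,
        guidance_scale=g,
        generator=generator,
    ).images[0]
```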
Perhaps my favorite part of the whole lesson was the Dreambooth part of the notebook, where I fine-tuned a model on the concept <lucas-gelfond>. You can see the version of the Dreambooth notebook I used here.
I fed it some images, and it actually really does look like me!
I uploaded this to the HuggingFace "library of concepts" which I thought was pretty funny.
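Once the fine-tune finishes, using it is the same pipeline call as before; the repo id below is just a placeholder for wherever the Dreambooth weights get pushed on the Hub:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder repo id for the Dreambooth-fine-tuned checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "your-username/lucas-gelfond-dreambooth", torch_dtype=torch.float16
).to("cuda")

image = pipe("a renaissance oil painting of <lucas-gelfond>").images[0]
image.save("dreambooth_me.png")
```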
Anyways, I'm really, really loving this; it's some of the most interesting stuff I've ever worked on, hands down. Excited for more updates as I get further!
Stuff I want to follow up on:
Anyways: I'm really loving this. More soon!