
Stable Diffusion and the Democratization of AI Art

Ara · Sep 09, 2022 · 7 mins read

It was a lovely early August night back in 2015 when I took my first Computer Information Systems class at university and heard for the first time about the imminent threat of AI-assisted software development that could compromise our future careers as programmers. It seemed far-fetched then, and it stayed that way until not long ago, when advanced Artificial Intelligence went from something you would read about in books and academic studies but never actually get to interact with, to a multitude of intelligent chatbots, essays and scripts written with the help of machine learning, and, more recently, full-blown text-to-image generation. I remember it like it was yesterday: a set of 4, maybe 6, images of Seinfeld’s George Costanza with an AK-47 rifle in a Twitter post with a link to “DALL-E 2”, which was later renamed Craiyon because it was not the actual DALL-E 2. Suffice it to say, it was love at first sight. Or love at first output, should I say?

“DALL-E 2” was my gateway to AI art and kept me distracted for a good while. The thought of generating anything I could think of within a few seconds (when not fighting the error messages because the machines were always busy) was quite mind-blowing. I was aware this kind of technology existed before, but back then it was not open to everyone, and it seemed as if it would stay that way for a while. Much to my surprise, it didn’t take anywhere near as long as I imagined. Granted, it was far from perfect and miles away from the actual DALL-E 2 results: faces were blurry, the outputs were small, and you couldn’t use it for anything other than generating funny memes to post on social media. It simply was not powerful enough. It wasn’t long until self-hosted versions were made available, allowing you to create your own “render farm” of sorts, as long as you had plenty of VRAM and free time to back you up. Unfortunately, the “homemade” alternative wasn’t any better when it came down to overall output quality. It was still just a funny meme generator.

The real deal, DALL-E 2, was the ultimate and untouchable cream of the crop until not long ago: realistic results, high-resolution output, good for basically anything you could think of except real people, for legal reasons or whatever. Pretty limited for memes because of that one technicality, but amazing for anything else you could ever want to generate. But like almost everything in life, DALL-E 2 was just too good to be true: invite-only, artists-only, a huge waitlist, limited prompts, and expensive if you wanted to go berserk with the generations (if you are experienced with AI generation, you know it sometimes takes more than one try to get that perfect output). I didn’t even manage to get an invite. I signed up in advance, waited and waited, friends who signed up after me got theirs, and I was still waiting. To this day, I am still waiting for an invite. Knowing full well what it was capable of, yet unable to let my imagination roam free, there was nothing I could do but wait for an invite or an alternative.

I then heard of Midjourney, which could have been the alternative I was waiting for, except it wasn’t. The fact that it works within Discord and offers only a limited trial before you have to buy a plan made me steer away from it for several months, only touching it very recently with the sole purpose of comparing outputs. I still don’t fully dig the fact that it runs on Discord and people can see your inputs. What if you want to generate something questionable that you don’t want people to know about? Or maybe a magic prompt that yields fantastic results and that you are not willing to share publicly? Granted, art is all about sharing and that’s part of the spirit, but still.

Fast forward to September, and I got linked to the GitHub page of an easy-to-install self-hosted instance of the brand new Stable Diffusion AI, which was supposedly quite good according to the friend who sent me the URL. Frustratingly enough, I didn’t get it to work at first because Anaconda does not like to reside outside of your C: drive, which I couldn’t use because there was no free space left. I had to resort to the many web versions of Stable Diffusion, where I found myself getting good results rather consistently, and eventually decided to do something about my local installation: I spent a good couple of hours deleting temp files and went from less than 4 GB free to 25. More than enough for all the Python libraries and other dependencies required by the software.

Bliss. At long last, I was finally generating art and memes directly from my computer, at an acceptable resolution of 512x512 and an average of maybe 30 seconds per output. Not too shabby, eh? I was still using the same style of simple prompts that I would originally use on Craiyon back in the day, so the first results, although much better than Craiyon’s, were still not quite spectacular or mind-blowing, which was a little concerning given how hard the model was pushing my RTX 2070. The outputs were nothing like the stuff on Reddit, and it made me wonder time and again whether I had run out of creativity or was missing out on something.
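For the curious, this is roughly what that local workflow boils down to in code. The sketch below uses the Hugging Face diffusers library rather than the exact repository I installed, and the model ID, prompt, and settings are illustrative assumptions, not my real setup:

```python
# A minimal sketch of local 512x512 text-to-image generation with the Hugging
# Face diffusers library. Model ID, prompt, and settings are illustrative, not
# the exact setup described above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # the original Stable Diffusion checkpoint
    torch_dtype=torch.float16,        # half precision to fit into 8 GB of VRAM
)
pipe = pipe.to("cuda")

image = pipe(
    "a photo of an astronaut riding a horse on mars",
    height=512,
    width=512,
    num_inference_steps=50,
).images[0]
image.save("output.png")
```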

As an AI art newbie, I was missing out on something massive: prompts matter. I should have figured it out sooner, but the more you describe what you want, the better the result gets. The model tends to be quite literal as long as you hold its hand and guide it toward the result you want, whereas a broader prompt gives it a lot of “creative freedom” to fill in the many gaps left by the lack of detail. Once I figured out how to conjure better prompts, the results went from “ok, that kind of looks like it” to “wow!”. Granted, I generate a good 4 to 6 images per prompt, and maybe 2 or 3 of those come out great, while the rest are acceptable at best. This isn’t particularly a problem, but it illustrates that getting the perfect output generally requires descriptive prompts and patience. There are times when simpler prompts do the trick just as well; it is quite literal with certain things.
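To make the point concrete, here is a small, hedged sketch of the bare-versus-descriptive comparison. The prompts, seed, and batch size are made up for illustration, and `pipe` is the pipeline from the earlier sketch:

```python
# Illustrative only: the same subject with a bare prompt versus a descriptive
# one, generating a small batch per prompt so the best output can be picked.
import torch

prompts = {
    "bare": "a castle",
    "descriptive": (
        "a medieval stone castle on a cliff at sunset, dramatic clouds, "
        "highly detailed, cinematic lighting, digital painting"
    ),
}

for label, prompt in prompts.items():
    # Same seed for both prompts so the starting noise is identical.
    generator = torch.Generator("cuda").manual_seed(42)
    images = pipe(prompt, num_images_per_prompt=4, generator=generator).images
    for i, img in enumerate(images):
        img.save(f"{label}_{i}.png")
```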

It has been an amazing journey so far. From the early days of “DALL-E 2” to Stable Diffusion, I have learned a lot, and my output folder keeps growing bigger and bigger with art and memes. It still surprises me how little time passed between the low-resolution meme days and what it is now, and what it could be if I had more time and patience to invest: there is a modified version of Stable Diffusion that can generate up to 1024x1024 outputs with 8 GB of VRAM or less, which is impressive given that I currently can’t generate anything higher than 512x512 without running out of video memory. It won’t be long until there is a way to go as high as 1920x1080 with the same 8 GB of VRAM, and then I’ll be able to generate my own wallpapers!
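I don’t know exactly which tricks that fork uses, but one common way to squeeze larger resolutions into 8 GB of VRAM is to trade a little speed for memory. In diffusers that looks roughly like the hedged sketch below (the resolution and prompt are assumptions, and `pipe` is again the pipeline from the first sketch):

```python
# One memory-saving approach (not necessarily the one used by the 1024x1024
# fork mentioned above): compute attention in slices so peak VRAM usage drops,
# at the cost of somewhat slower generation.
pipe.enable_attention_slicing()

image = pipe(
    "a panoramic matte painting of a futuristic city at night",
    height=1024,
    width=1024,
).images[0]
image.save("big_output.png")
```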

The fact that Stable Diffusion is open source, very much unlike everything else, means it won’t be long until there is a version for basically every purpose and set of hardware requirements. There are quite a few low-VRAM variants already, and even a CPU-only modification that lets you run SD without a dedicated graphics card. Granted, with significantly slower generation times and lower resolutions, but still. The mere fact that you can do that is mind-blowing, even more so when you consider that, not long ago, AI image generation was a far-fetched dream even for academic folk, let alone consumers.
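As a rough idea of what the CPU route looks like (again a hedged sketch with assumed settings, not any specific project’s code), diffusers can simply be pointed at the CPU:

```python
# Running entirely on the CPU: no CUDA device required. Expect minutes per
# image instead of seconds, and keep full float32 precision, since half
# precision is poorly supported on most CPUs.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cpu")

image = pipe(
    "a watercolor painting of a lighthouse in a storm",
    num_inference_steps=25,  # fewer steps to keep CPU generation time bearable
).images[0]
image.save("cpu_output.png")
```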

Written by Ara
A 25-year-old Social Communicator who loves writing about games (mainly simulators), somewhat into music and IT, even more so if it’s hypervisor stuff or old x86 emulators, which explains the randomness of this blog. I also have a YouTube channel that is very much like this blog when it comes to how random it is: from your average game benchmark to tutorials on how to install UNIX System V.