
How the author of DALL-E Mini created the ultimate meme creator – and a new era for AI

In the future, we may remember 2022 as the year AI-generated art came of age.

Thanks in part to DALL-E Mini (now known as Craiyon), programmers and ordinary Joes alike were suddenly able to turn a brief text prompt into a detailed image from scratch. Social media buzzed with nine-panel collages of gamer toilets, Yoda robbing a liquor store, and mushroom clouds in the style of Monet.

This trendy tool was created by a machine learning engineer named Boris Dayma in July 2021 for a competition organized by Google and Hugging Face, a startup that hosts open-source machine learning tools on its website. DALL-E Mini has since become the internet’s favorite toy, largely due to its ease of access.

The concept was inspired by an art-generating model called DALL-E 1, which was unveiled in 2021 by a machine learning research organization called OpenAI. While OpenAI kept DALL-E 1 under wraps, Dayma’s DALL-E Mini was open to anyone with an internet connection.

Thanks to Boris Dayma, we now have this image of Yoda robbing a liquor store. Craiyon

OpenAI was founded in 2015 with an idealistic name and a promise to offer its work free of charge to AI researchers. The organization has since reneged on that promise, turning to profit-making and inking a $1 billion partnership with Microsoft. This year it released the more powerful, higher-budget DALL-E 2, which costs money to use. Dayma’s Craiyon remains free; in fact, it changed its name to avoid confusion with OpenAI’s models.

Just as OpenAI did with its controversial language model, GPT-3, the company plans to license DALL-E 2 for use by corporate customers.

But the future of AI art doesn’t necessarily look like walled gardens with quotas and entry fees. Shortly after the birth of DALL-E 2, a fledgling startup named Stability AI released an open-source model called Stable Diffusion, which is free to use. Anyone can download and run Stable Diffusion themselves; the only (admittedly steep) barrier is a sufficiently powerful computer. Along with Craiyon, internet users now have a few free options for making the bizarre images of their dreams come true.

To dive into the origins of this AI meme-making frenzy, we spoke with Boris Dayma, the machine learning engineer who helmed DALL-E Mini.

This interview has been edited and condensed for clarity.

Where did you get the idea?

Early last year, OpenAI published a blog post about DALL-E 1, which was this cool AI model that could draw images from any text prompt. There had already been other projects around that. But it was the first one that looked impressive.

The only problem was that the code was not published. Nobody could play with it.

So a bunch of people decided they wanted to try to replicate it, and that got me really interested, and I was like, “I want to try too. It is one of the coolest AI applications. I want to learn how it works and I want to try to do it myself.” So when I saw this, I immediately tweeted, “OK, I’m going to build this.”

I did nothing for six months.

What finally changed?

In July last year, Hugging Face and Google… held a community event, like a competition to develop AI models.

You could choose the subject you wanted, and in exchange, you would have access to their computers, which are much better than what people usually have at home. And you would have access to support from Hugging Face engineers and Google engineers. I thought it was a great opportunity to learn and play around.

I proposed the project: DALL-E Mini. Let’s try to replicate DALL-E — or, not necessarily replicate, but try to achieve the same results, even if we build it a little differently. Let’s see how it works, learn and experience it.

What was that first version like?

It wasn’t what we have now. Now the [current] model is much, much more powerful. But it was already impressive.

When it started, after a day or two, you would put in “view of the beach at night” and you would get something rather dark. With “beach view during the day,” you would get something bright. You couldn’t necessarily recognize the beach yet, but we were like, “Oh, my God, this is actually learning something.”

A majestic “snowy mountain” rendered by Craiyon. Craiyon

At the end of, like, days of training, the model was actually able to do landscapes quite well, which was very impressive. We put in “snowy mountain” and it worked. It was really exciting. Yeah, actually, we were even surprised that it worked!

But, you know, we did a lot of things real fast [during the competition] and there was still so much to optimize.

It wasn’t until several months later that it became popular. What do you think caused its popularity to explode?

I was surprised how popular it became. But I think it’s because when we made the model public, some people realized it could do things that were, like, funny pictures and memes and things like that. They realized that some famous personalities were actually recognizable, even if they weren’t necessarily perfectly drawn. You can recognize them and put them in funny situations, and the model is able to do that.

It reached a point where it was suddenly able to compose more complex prompts and also able to recognize more people. I think that’s when it went viral.

DALL-E Mini fans realized they could depict crudely drawn celebrities doing just about anything. Craiyon

What did you think of the funny images?

It was something I didn’t expect. Throughout the process when I developed the model, my testing prompts were very basic. My most creative prompt was “the Eiffel Tower on the moon.” Maybe I wouldn’t have noticed that it could do such creative things without a larger audience using it, I would say.

People have put the model in all sorts of situations. … Sometimes I’m surprised at what it can draw. Recently, for example, people have used “octopus assembling Ikea furniture.” Or, like, “a store robbed by teddy bears, seen from a CCTV camera.” It’s crazy that it works at all.

Craiyon’s interpretation of an “octopus assembling Ikea furniture.” Craiyon

Does Craiyon still have a place in a world with DALL-E 2?

I think there are a lot of advantages.

One of the first is that it democratizes, in a way, access to AI technology. I think the image generator is a really cool application, whether you use it for work, because it’s useful for you, or even just for entertainment. Letting people have fun, making funny memes: I think that has great value.

Giving access to everyone, rather than only to those who can afford it [DALL-E 2] or the restricted group of users who have access to it, lets people benefit from the same technology, I think. Being free is also very important to us.

Also, one of the problems you run into when few people can access a large model is that there’s a higher risk of wrongdoing, et cetera, because only a few people are able to create and control it.

This has been HORIZONS, a newsletter that explores today’s innovations shaping tomorrow’s world. Subscribe for free.