Machine learning systems with names like DALL-E, GPT and PaLM are making a splash with their incredible ability to generate creative work.
DALL·E 2 is here! It can generate images from text, like "teddy bears working on new AI research on the moon in the 1980s".
It's so fun, and sometimes beautiful. twitter.com/3zOu30IqCZ
— Sam Altman (@sama) April 6, 2022
These systems are known as “foundation models” and are not all hype and party tricks. So how does this new approach to AI work? And will it be the end of human creativity and the start of a deep-fake nightmare?
1. What are foundation models?
Foundation models work by training a single huge system on large amounts of general data, then adapting the system to new problems. Earlier models tended to start from scratch for each new problem.
DALL-E 2, for example, was trained to match pictures (such as a photo of a pet cat) with their captions ("Mr. Fuzzyboots the tabby cat is relaxing in the sun") by scanning hundreds of millions of examples. Once trained, this model knows what cats (and other things) look like in pictures.
But the model can also be used for many other interesting AI tasks, such as generating new images from a caption alone (“Show me a koala dunking a basketball”) or editing images based on written instructions (“Make it look like this monkey is paying taxes”).
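The matching idea described above can be sketched in a few lines of code. This is a toy illustration, not the real DALL-E 2: the text embeddings are random stand-ins, and a trained image encoder is simulated by placing a photo's embedding near its true caption's embedding. The point is that once images and text share one embedding space, the same model supports several tasks, such as picking the caption that best describes a photo.

```python
# Toy sketch of image-caption matching (hypothetical embeddings, not a real model).
import numpy as np

rng = np.random.default_rng(42)
DIM = 8  # size of the shared embedding space (tiny, for illustration only)

captions = [
    "Mr Fuzzyboots the tabby cat is relaxing in the sun",
    "a koala dunking a basketball",
    "a monkey paying taxes",
]

# Stand-in for a trained text encoder: one fixed random vector per caption.
text_embedding = {c: rng.normal(size=DIM) for c in captions}

def embed_image(true_caption):
    # Stand-in for a trained image encoder: after training, a photo lands
    # near its caption's embedding; we simulate that with small noise.
    return text_embedding[true_caption] + rng.normal(scale=0.1, size=DIM)

def cosine(a, b):
    # Similarity score between two embeddings (1.0 = identical direction).
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

photo = embed_image(captions[0])  # "a photo of Mr Fuzzyboots"
best = max(captions, key=lambda c: cosine(photo, text_embedding[c]))
print(best)  # the cat caption scores highest
```

Because matching pairs sit close together in the shared space, the same scoring trick can run in either direction: given a caption, search for images whose embeddings lie nearby.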
Our newest system DALL·E 2 can create realistic images and art from a description in natural language.
— OpenAI (@OpenAI) April 6, 2022
Foundation models run on “deep neural networks”, which are loosely inspired by how the brain works. These involve advanced mathematics and a huge amount of computing power, but they boil down to a very sophisticated type of pattern matching.

For example, by looking at millions of example images, a deep neural network can associate the word “cat” with patterns of pixels that often appear in images of cats – like soft, fuzzy, hairy blobs of texture. The more examples the model sees (the more data it is shown), and the bigger the model (the more “layers” or “depth” it has), the more complex these patterns and correlations can be.
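The layered pattern matching described above can be seen in miniature. The sketch below is a hypothetical two-layer network with random weights standing in for learned ones; a real model has billions of weights and many more layers, but the arithmetic is the same: each layer combines the previous layer's patterns into more complex ones.

```python
# Toy deep network: layers of weighted pattern detectors (random stand-in weights).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Non-linearity: lets layers build patterns more complex than plain weighted sums.
    return np.maximum(0, x)

# "Image": four pixel brightness values.
pixels = np.array([0.9, 0.8, 0.1, 0.2])

# Layer 1: three simple pattern detectors over the raw pixels.
w1 = rng.normal(size=(4, 3))
h1 = relu(pixels @ w1)

# Layer 2: combines the layer-1 patterns into a single "cat score".
w2 = rng.normal(size=(3, 1))
cat_score = h1 @ w2

print(cat_score.shape)  # one number: how "cat-like" the input looks
```

Training adjusts the weights so that, over millions of examples, the detectors in early layers come to respond to simple textures and the later layers to whole objects.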
Foundation models are, in one sense, just an extension of the “deep learning” paradigm that has dominated AI research for the past decade. However, they exhibit un-programmed or “emergent” behaviours that can be both surprising and novel.
For example, Google’s PaLM language model seems to be able to produce explanations for complicated metaphors and jokes. This goes beyond simply imitating the types of data it was originally trained to process.