Machine learning systems with names like DALL-E, GPT and PaLM are making a splash with their incredible ability to generate creative work.
DALL·E 2 is here! It can generate images from text, like "teddy bears working on new AI research on the moon in the 1980s".
It's so fun, and sometimes beautiful. twitter.com/3zOu30IqCZ
— Sam Altman (@sama) April 6, 2022
These systems are known as “foundation models” and are not all hype and party tricks. So how does this new approach to AI work? And will it be the end of human creativity and the start of a deep-fake nightmare?
1. What are foundation models?
Foundation models work by training a single huge system on large amounts of general data, then adapting the system to new problems. Earlier models tended to start from scratch for each new problem.
DALL-E 2, for example, was trained to match pictures (such as a photo of a pet cat) with their captions ("Mr. Fuzzyboots the tabby cat is relaxing in the sun") by scanning hundreds of millions of examples. Once trained, this model knows what cats (and other things) look like in pictures.
But the model can also be used for many other interesting AI tasks, such as generating new images from a caption alone (“Show me a koala dunking a basketball”) or editing images based on written instructions (“Make it look like this monkey is paying taxes”).
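The matching idea described above can be sketched in a few lines of code. This is a toy illustration, not the real DALL-E 2: the text embeddings are random stand-ins, and a trained image encoder is simulated by placing a photo's embedding near its true caption's embedding. The point is that once images and text share one embedding space, the same model supports several tasks, such as picking the caption that best describes a photo.

```python
# Toy sketch of image-caption matching (hypothetical embeddings, not a real model).
import numpy as np

rng = np.random.default_rng(42)
DIM = 8  # size of the shared embedding space (tiny, for illustration only)

captions = [
    "Mr Fuzzyboots the tabby cat is relaxing in the sun",
    "a koala dunking a basketball",
    "a monkey paying taxes",
]

# Stand-in for a trained text encoder: one fixed random vector per caption.
text_embedding = {c: rng.normal(size=DIM) for c in captions}

def embed_image(true_caption):
    # Stand-in for a trained image encoder: after training, a photo lands
    # near its caption's embedding; we simulate that with small noise.
    return text_embedding[true_caption] + rng.normal(scale=0.1, size=DIM)

def cosine(a, b):
    # Similarity score between two embeddings (1.0 = identical direction).
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

photo = embed_image(captions[0])  # "a photo of Mr Fuzzyboots"
best = max(captions, key=lambda c: cosine(photo, text_embedding[c]))
print(best)  # the cat caption scores highest
```

Because matching pairs sit close together in the shared space, the same scoring trick can run in either direction: given a caption, search for images whose embeddings lie nearby.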
Our newest system DALL·E 2 can create realistic images and art from a description in natural language.
— OpenAI (@OpenAI) April 6, 2022
Foundation models run on “deep neural networks”, which are loosely inspired by how the brain works. These involve advanced mathematics and a huge amount of computing power, but they boil down to a very sophisticated type of pattern matching.

For example, by looking at millions of example images, a deep neural network can associate the word “cat” with patterns of pixels that often appear in images of cats – like soft, fuzzy, hairy blobs of texture. The more examples the model sees (the more data it is shown), and the bigger the model (the more “layers” or “depth” it has), the more complex these patterns and correlations can be.
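The layered pattern matching described above can be seen in miniature. The sketch below is a hypothetical two-layer network with random weights standing in for learned ones; a real model has billions of weights and many more layers, but the arithmetic is the same: each layer combines the previous layer's patterns into more complex ones.

```python
# Toy deep network: layers of weighted pattern detectors (random stand-in weights).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Non-linearity: lets layers build patterns more complex than plain weighted sums.
    return np.maximum(0, x)

# "Image": four pixel brightness values.
pixels = np.array([0.9, 0.8, 0.1, 0.2])

# Layer 1: three simple pattern detectors over the raw pixels.
w1 = rng.normal(size=(4, 3))
h1 = relu(pixels @ w1)

# Layer 2: combines the layer-1 patterns into a single "cat score".
w2 = rng.normal(size=(3, 1))
cat_score = h1 @ w2

print(cat_score.shape)  # one number: how "cat-like" the input looks
```

Training adjusts the weights so that, over millions of examples, the detectors in early layers come to respond to simple textures and the later layers to whole objects.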
Foundation models are, in one sense, just an extension of the “deep learning” paradigm that has dominated AI research for the past decade. However, they exhibit un-programmed or “emergent” behaviours that can be both surprising and novel.
For example, Google’s PaLM language model seems to be able to produce explanations for complicated metaphors and jokes. This goes beyond simply imitating the types of data it was originally trained to process.