OpenAI Announces New AI Model, Code-Named Strawberry, That Solves Difficult Problems Step-by-Step

OpenAI made the last major breakthrough in artificial intelligence by scaling its models up to dizzying proportions when it introduced GPT-4 last year. Today the company announced a new advance that signals a shift in approach: a model that can logically “reason” through many difficult problems and is significantly smarter than existing AI without a major scale-up.

The new model, dubbed OpenAI o1, can solve problems that stump existing AI models, including OpenAI’s most powerful current model, GPT-4o. Rather than producing an answer in a single step, as a large language model normally does, it reasons through the problem, effectively thinking out loud as a person might, before arriving at the correct result.

“This is what we consider a new paradigm in these models,” Mira Murati, OpenAI’s chief technology officer, told WIRED. “It’s better at dealing with very complex reasoning tasks.”

The new model is named Strawberry within OpenAI, and is not a successor to GPT-4o but a complement to it, the company said.

Murati said OpenAI is currently working on its next flagship model, GPT-5, which will be larger than its predecessor. While the company still believes that scale will help unlock new capabilities from AI, GPT-5 will likely also include the reasoning technology introduced today. “There are two paradigms,” Murati said. “The scaling paradigm and this new paradigm. We hope to bring them together.”

LLMs typically derive their answers from huge neural networks fed massive amounts of training data. They can display impressive linguistic and logical abilities, but traditionally struggle with surprisingly simple problems, such as basic math questions, that require reasoning.

Murati said OpenAI o1 uses reinforcement learning, which involves giving a model positive feedback when it gets an answer right and negative feedback when it does not, in order to improve its reasoning process. “The model sharpens its thinking and refines the techniques it uses to get the answer,” she said. Reinforcement learning has enabled computers to play games with superhuman skill and to do useful tasks like designing computer chips. The technique is also a key ingredient in turning an LLM into a useful and well-behaved chatbot.
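For readers who want a concrete picture of that feedback loop, here is a minimal sketch in Python. It illustrates reinforcement-style reward updates in general, not OpenAI’s actual training setup; the strategies, success rates, and update factors are invented for the example.

```python
import random

# A minimal sketch of reinforcement-style feedback, not OpenAI's training code:
# a toy "model" picks among hypothetical answering strategies and strengthens
# whichever ones earn positive feedback for correct answers.
strategies = ["answer directly", "work step by step", "check the result"]
weights = {s: 1.0 for s in strategies}

# Invented success rates for the illustration: careful strategies pay off more often.
success_rate = {"answer directly": 0.3, "work step by step": 0.7, "check the result": 0.8}

def pick_strategy():
    # Sample a strategy in proportion to its current weight.
    return random.choices(strategies, weights=[weights[s] for s in strategies])[0]

def give_feedback(strategy, answer_was_correct):
    # Positive feedback strengthens a strategy; negative feedback weakens it.
    weights[strategy] *= 1.2 if answer_was_correct else 0.8

for _ in range(1000):
    s = pick_strategy()
    give_feedback(s, random.random() < success_rate[s])

# After many rounds, the step-by-step and self-checking strategies dominate.
print(sorted(weights.items(), key=lambda kv: -kv[1]))
```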

Mark Chen, vice president of research at OpenAI, demonstrated the new model to WIRED, using it to solve several problems that its predecessor, GPT-4o, could not. These included an advanced chemistry question and the following mind-bending mathematical puzzle: “A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present age. What is the age of the prince and princess?” (The correct answer is that the prince is 30 and the princess is 40.)
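For anyone who wants to check the arithmetic behind that answer (a quick verification, not part of OpenAI’s demo): half the sum of the pair’s current ages is (40 + 30) / 2 = 35, an age the princess reached five years ago, when the prince was 25. Twice 25 is 50, which the princess will reach in ten years, at which point the prince will be 40, exactly the princess’s current age, so the stated answer holds.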

“The [new] model is learning to think for itself, rather than trying to imitate the way people think,” as a conventional LLM would, Chen said.

OpenAI says its new model performs better on several problem sets, including those focused on coding, math, physics, biology, and chemistry. On the American Invitational Mathematics Examination (AIME), a test for math students, GPT-4o solved an average of 12 percent of the problems while o1 got 83 percent correct, according to the company.
