How far can scaling laws in AI go? History tells us we can’t be so sure.

Sam Altman, CEO of OpenAI and perhaps the most prominent face of the artificial intelligence (AI) boom that accelerated with the launch of ChatGPT in 2022, loves scaling laws.

These widely admired rules of thumb, linking the size of an AI model to its capabilities, are the basis of the AI industry’s dizzying race to acquire powerful computer chips, build unimaginably large data centers, and reopen decommissioned nuclear power plants.

As Altman argued in a blog post earlier this year, the idea is that the “intelligence” of an AI model “is approximately equal to the logarithm of the resources used to train and run it,” meaning that consistently better performance can be achieved by exponentially increasing the scale of data and the computing power involved.

First observed in 2020 and refined in 2022, scaling laws for large language models (LLMs) are based on drawing lines on graphs of experimental data. For engineers, they offer a simple formula that indicates the size of the next model and the expected performance increase. Will scaling laws continue to hold as AI models get larger and larger? AI companies are betting hundreds of billions of dollars that they will, but history suggests it’s not always that simple.
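To make the idea concrete, here is a minimal Python sketch of what such a fitted curve looks like. It uses the power-law form popularised by the 2022 “Chinchilla” work, but the constants below are illustrative placeholders, not any published fit.

```python
# Chinchilla-style scaling curve: predicted loss falls as a power law in
# model parameters (N) and training tokens (D). The constants here are
# illustrative placeholders, NOT the values published by Hoffmann et al. (2022).
E, A, B = 1.7, 400.0, 410.0   # irreducible loss and fit coefficients (assumed)
alpha, beta = 0.34, 0.28      # power-law exponents (assumed)

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters
    trained on n_tokens tokens, under the assumed power-law fit."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Each 10x increase in scale keeps lowering the predicted loss,
# though with diminishing absolute gains.
for n, d in [(1e9, 2e10), (1e10, 2e11), (1e11, 2e12)]:
    print(f"{n:.0e} params, {d:.0e} tokens -> loss ≈ {predicted_loss(n, d):.3f}")
```

This is the sense in which the curves are predictive: plug in the planned size of the next model and read off the expected improvement.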

Scaling laws aren’t just for AI

Scaling laws can be wonderful. Modern aerodynamics is based on them, for example.

Using an elegant mathematical result called Buckingham’s π theorem, engineers figured out how to compare small models in wind tunnels or test basins with full-scale airplanes and ships, by making sure certain key dimensionless numbers matched.

These scaling ideas inform the design of almost everything that flies or floats, as well as industrial fans and pumps.
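As a rough illustration of what that matching involves, the sketch below compares the Reynolds number, one of those key dimensionless groups, for a full-scale aircraft and a 1:20 wind-tunnel model. The speeds and sizes are invented for illustration, not taken from any real test programme.

```python
# Dimensional analysis in miniature: to compare a wind-tunnel model with the
# full-scale aircraft, engineers match dimensionless groups such as the
# Reynolds number Re = rho * v * L / mu. Values below are illustrative only.

def reynolds_number(rho: float, v: float, length: float, mu: float) -> float:
    """Reynolds number for fluid density rho (kg/m^3), speed v (m/s),
    characteristic length (m) and dynamic viscosity mu (Pa*s)."""
    return rho * v * length / mu

rho_air, mu_air = 1.225, 1.81e-5    # sea-level air, approximate

full_scale = reynolds_number(rho_air, 70.0, 30.0, mu_air)   # ~30 m aircraft at 70 m/s
# Re scales with speed * length, so a 1:20 model in the same air would naively
# need 20x the speed; in practice engineers often raise the air density
# (pressurised tunnels) instead of the speed.
model = reynolds_number(rho_air, 70.0 * 20, 30.0 / 20, mu_air)

print(f"Full scale Re ≈ {full_scale:.2e}, 1:20 model Re ≈ {model:.2e}")
```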

Another famous scaling idea underpinned the decades-long boom of the silicon chip revolution. Moore’s Law—the idea that the number of tiny switches called transistors on a microchip would double about every two years—helped designers create the powerful little computing technology we have today.
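The arithmetic behind that doubling is simple, which is part of what made the “law” such a useful planning tool. The sketch below projects the idealised trend forward from the 2,300-transistor Intel 4004 of 1971; real chip history only roughly follows it.

```python
# Moore's Law as arithmetic: transistor counts double roughly every two years.
# Starting point is the Intel 4004 (1971, ~2,300 transistors); the projection
# is the idealised trend, not a record of actual chips.
start_year, start_count = 1971, 2_300

def projected_transistors(year: int) -> float:
    """Idealised transistor count if doubling every two years held exactly."""
    return start_count * 2 ** ((year - start_year) / 2)

for year in (1971, 1991, 2011, 2021):
    print(f"{year}: ~{projected_transistors(year):,.0f} transistors")
```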

But there is a catch: not all “scaling laws” are laws of nature. Some are purely mathematical and can be maintained indefinitely. Others are simply data-fitted lines that work perfectly until you stray too far from the circumstances in which they were measured or designed.


When scaling laws fail

History is full of painful reminders of scaling laws gone wrong. A classic example is the collapse of the Tacoma Narrows Bridge in 1940.

The bridge was designed by extending what had worked for smaller bridges to a longer, thinner one. The engineers assumed that the same scaling arguments would be valid: if a certain relationship between stiffness and bridge length worked before, it should work again.

Instead, moderate winds triggered an unexpected instability called aeroelastic flutter. The bridge deck tore itself apart, collapsing just four months after it opened.

Likewise, even the “laws” of microchip manufacturing had an expiration date. For decades, Moore’s Law (the number of transistors doubles every two years) and Dennard scaling (smaller transistors can run faster without consuming more power per unit area) were surprisingly reliable guides for chip design and industry roadmaps.

However, as transistors became small enough to be measured in nanometers, these simple scaling rules began to collide with strict physical limits.

When the transistors’ gates were reduced to just a few atoms thick, they began to leak current and behave unpredictably. Operating voltages could no longer be reduced without signals being drowned out by background electrical noise.

Eventually, shrinking alone was no longer the solution. Chips have kept getting more powerful, but now through new designs rather than simple downscaling.

Laws of nature or rules of thumb?

The language model scaling curves that Altman celebrates are real and, so far, have been extraordinarily useful.

They told the researchers that the models would continue to improve if given enough data and computing power. They also demonstrated that previous systems were not fundamentally limited, but simply had not been allocated enough resources.

But these are, ultimately, curves fitted to data. They look less like the derived mathematical scaling laws used in aerodynamics and more like the useful rules of thumb used in microchip design, which means they probably won’t work forever.

Language model scaling rules don’t account for real-world constraints, such as limits on the availability of high-quality training data or the difficulty of getting AI to handle novel tasks, let alone safety constraints or the economic difficulties of building data centers and power grids. There is no law of nature or theorem that guarantees that intelligence scales forever.


Investing in the curves

So far, the AI scaling curves seem pretty smooth, but the financial curves are another story.

Deutsche Bank recently warned of a “funding gap” in AI, based on Bain Capital estimates indicating an $800 billion mismatch between projected AI revenues and the investment in chips, data centers and energy needed to sustain current growth.

JP Morgan, for its part, has estimated that the broader AI industry could need around $650 billion in annual revenue just to earn a modest 10% return on planned AI infrastructure development.

We are still discovering what kind of law governs frontier LLMs. Reality may continue to follow the current scaling rules, or new bottlenecks (data, energy, users’ willingness to pay) may bend the curve.

Altman is betting that LLM scaling laws will hold. If so, it may be worth building out huge amounts of computing power, since the gains are predictable. On the other hand, the banks’ growing unease is a reminder that some scaling stories can turn out like Tacoma Narrows: beautiful curves in one context, hiding a nasty surprise in the next.

*Nathan Garland is Professor of Applied Mathematics and Physics at Griffith University.

This text was originally published in The Conversation
