Applying the concept of scientific validity can help evaluate the true scope of technological innovations such as AI


Technological innovations can seem relentless. In computer science, some claim that “a year in machine learning is equivalent to a century in any other field.” But how do you know whether these advances are real or just hype?

Failures multiply quickly when there is an avalanche of new technologies, especially when these developments are not adequately tested or fully understood. Even technological innovations from trusted laboratories and organizations sometimes result in dismal failures.

Consider IBM Watson, an AI program that the company hailed as a revolutionary tool for cancer treatment in 2011. However, instead of evaluating the tool based on patient outcomes, IBM relied on less relevant, possibly even irrelevant, measures such as expert ratings. As a result, IBM Watson not only failed to provide doctors with reliable and innovative treatment recommendations, but also suggested harmful ones.

When ChatGPT launched in November 2022, interest in AI expanded rapidly across industry and science, along with increasingly exaggerated claims about its effectiveness. But as the vast majority of companies see their attempts to incorporate generative AI fail, questions are being raised about whether the technology lives up to what its developers promised.

In a world of rapid technological change, a pressing question arises: How can we determine whether a new technological marvel really works and is safe?

Borrowing scientific language, this question focuses on validity, that is, the soundness, reliability and consistency of a statement. Validity is the final verdict on whether a scientific claim accurately reflects reality. We can think of it as quality control for science: it helps researchers know if a drug really cures a disease, if a health-tracking app really improves fitness, or if a model of a black hole accurately describes its behavior in space.

How to assess the validity of new technologies and innovations has been unclear, in part because science has focused primarily on validating claims about the natural world.

In our work as researchers studying how to evaluate science across disciplines, we developed a framework for evaluating the validity of any design, whether a new technology or a policy. We believe that establishing clear and consistent standards of validity, and learning how to assess it, can empower people to make informed decisions about technology and determine whether a new technology will truly deliver on its promise.


Validity is the basis of knowledge

Historically, validity focused primarily on ensuring the precision of scientific measurements, such as the accuracy of measuring temperature by a thermometer or the accuracy of assessing anxiety by a psychological test. Over time, it became clear that there are various types of validity.

Different scientific fields have their own ways of assessing validity. Engineers test new designs against safety and performance standards. Medical researchers use controlled experiments to verify that treatments are more effective than existing options.

Researchers in different disciplines use different types of validity, depending on the type of claim they are making.

Internal validity asks whether the relationship between two variables is truly causal. A medical researcher, for example, might conduct a randomized controlled trial to make sure that a new drug was responsible for patients’ recovery and not some other factor, such as the placebo effect.

External validity refers to generalizability: whether results hold up outside the laboratory or in a larger or different population. A common example of low external validity is that treatments that work in early studies in mice often fail to translate to humans.

Construct validity, on the other hand, focuses on meaning. Psychologists and social scientists use it to determine whether a test or survey truly reflects the idea it is intended to measure. Does a perseverance scale reflect true perseverance or just stubbornness?

Finally, ecological validity asks whether something works in the real world, and not just under ideal laboratory conditions. A behavioral model or AI system can work great in a simulation, but fail when human behavior, noisy data, or institutional complexity come into play.

In all of these types of validity, the goal is the same: to ensure that scientific tools—from laboratory experiments to algorithms—faithfully correspond to the reality they seek to explain.


Evaluation of technological claims

We developed a method to help researchers from various disciplines clearly evaluate the reliability and effectiveness of their inventions and theories. The design science validity framework identifies three critical types of claims that researchers typically make about the usefulness of a technology, innovation, theory, model, or method.

First, a criterion claim holds that a discovery offers beneficial results, usually exceeding current standards. These statements justify the usefulness of the technology by showing clear advantages over existing alternatives.

For example, developers of generative AI models like ChatGPT may see greater engagement the more the model flatters and agrees with the user. As a result, they may program the technology to be more accommodating, a trait known as sycophancy, in order to increase user retention. Such AI models meet the criterion claim that users find them more flattering than talking to people. However, this does little to improve the technology’s effectiveness at tasks such as helping to resolve mental health or relationship problems.

Second, a causal claim addresses how specific components or characteristics of a technology directly contribute to its success or failure. In other words, it is a statement that demonstrates that researchers know what makes a technology effective and exactly why it works.

When analyzing AI models and excessive flattery, the researchers found that interacting with more flattering models reduced users’ willingness to resolve interpersonal conflicts and increased their belief that they were right. The causal claim here is that the flattery feature of AI reduces the user’s desire to resolve conflicts.

Third, a contextual claim specifies where and under what conditions a technology is expected to work effectively. These claims explore whether the benefits of a technology or system generalize beyond the laboratory to other populations and settings.

In the same study, researchers examined how excessive flattery affected users’ actions across other data sets, including the “Am I the Asshole?” forum on Reddit. They found that AI models affirmed users’ decisions more readily than people did, even when the user described manipulative or harmful behavior. This supports the contextual claim that an AI model’s sycophantic behavior carries over across different conversational contexts and populations.

Using validity as a consumer

Understanding the validity of scientific and technological innovations is essential for both scientists and the general public. For scientists, it is a guide to ensure their inventions are rigorously evaluated. For the public, it means knowing that the tools and systems they rely on, like health apps, medications and financial platforms, are truly safe, effective and beneficial.

Here’s how you can use validity to understand the scientific and technological innovations around you.

Since it is difficult to compare every feature of two technologies, focus on the features you value most in a technology or model. For example, do you prefer that a chatbot be accurate, or that it protect your privacy? Then examine the claims made about those features and check whether the technology is as good as advertised.

Consider not only the types of claims that are made about a technology, but also those that are not. For example, does a chatbot company address bias in its model? What goes unsaid can be the key to distinguishing baseless, potentially dangerous hype from genuine progress.

By understanding validity, organizations and consumers can move beyond the hype and get to the truth behind the latest technologies.

*Kai R. Larsen is a professor of Information Systems at the University of Colorado Boulder; Roman Lukyanenko is an associate professor of Commerce at the University of Virginia; and Thomas H. Davenport is a professor of Information Technology and Management at Babson College

This text was originally published in The Conversation


