OpenAI will show how models do on hallucination tests

Sam Altman, co-founder and CEO of OpenAI, speaks during Italian Tech Week 2024 at OGR Officine Grandi Riparazioni on September 25, 2024 in Turin, Italy.

Stefano Guidi | Getty Images News | Getty Images

OpenAI on Wednesday announced a new “safety evaluations hub,” a webpage where it will publicly display artificial intelligence models’ safety results and how they perform on tests for hallucinations, jailbreaks and harmful content, such as “hateful content or illicit advice.”

OpenAI said it used the safety evaluations “internally as one part of our decision making about model safety and deployment,” and that while system cards release safety test results when a model is launched, OpenAI will from now on “share metrics on an ongoing basis.”

“We will update the hub periodically as part of our ongoing company-wide effort to communicate more proactively about safety,” OpenAI wrote on the webpage, adding that the safety evaluations hub does not reflect the full safety efforts and metrics and instead shows a “snapshot.”

The news comes after CNBC reported earlier Wednesday that tech companies leading the way in artificial intelligence are prioritizing products over research, according to industry experts who are sounding the alarm about safety.

CNBC reached out to OpenAI and other AI labs mentioned in the story well before it was published.


OpenAI recently sparked some online controversy for not running certain safety evaluations on the final version of its o1 AI model.

In a recent interview with CNBC, Johannes Heidecke, OpenAI’s head of safety systems, said the company ran its preparedness evaluations on near-final versions of the o1 model, and that minor variations to the model that took place after those tests wouldn’t have contributed to significant jumps in its intelligence or reasoning and thus wouldn’t require additional evaluations.

Still, Heidecke acknowledged in the interview that OpenAI missed an opportunity to more clearly explain the difference.

Meta, which was also mentioned in CNBC’s reporting on AI safety and research, made an announcement of its own Wednesday.

The company’s Fundamental AI Research team released new joint research with the Rothschild Foundation Hospital and an open dataset for advancing molecular discovery.

“By making our research widely available, we aim to provide easy access for the AI community and help foster an open ecosystem that accelerates progress, drives innovation, and benefits society as a whole, including our national research labs,” Meta wrote in a blog post announcing the research advancements.

