Still, 1,000 tokens per second is modest by Cerebras standards. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI's own open-weight gpt-oss-120B model, suggesting that Codex-Spark's comparatively lower speed reflects the overhead of a larger or more complex model.
AI coding agents have had a breakout year, with tools like OpenAI's Codex and Anthropic's Claude Code reaching a new level of usefulness for quickly building prototypes, interfaces, and boilerplate code. OpenAI, Google, and Anthropic have all been racing to ship more capable coding agents, and latency has become what separates the winners; a model that codes faster lets a developer iterate faster.
With fierce competition from Anthropic, OpenAI has been iterating on its Codex line at a rapid pace, releasing GPT-5.2 in December after CEO Sam Altman issued an internal "code red" memo about competitive pressure from Google, then shipping GPT-5.3-Codex just days ago.
Diversifying away from Nvidia
Spark's deeper hardware story may be more consequential than its benchmark scores. The model runs on Cerebras' Wafer Scale Engine 3, a chip the size of a dinner plate that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to come out of it.
OpenAI has spent the past year systematically reducing its dependence on Nvidia. The company signed a massive multi-year deal with AMD in October 2025, struck a $38 billion cloud computing agreement with Amazon in November, and has been designing its own custom AI chip for eventual fabrication by TSMC.
Meanwhile, a planned $100 billion infrastructure deal with Nvidia has so far fizzled, though Nvidia has since committed to a $20 billion investment. Reuters reported that OpenAI grew unhappy with the speed of some Nvidia chips for inference tasks, which is exactly the kind of workload that OpenAI designed Codex-Spark for.
No matter which chip is under the hood, speed matters, though it may come at the cost of accuracy. For developers who spend their days inside a code editor waiting for AI suggestions, 1,000 tokens per second may feel less like carefully piloting a jigsaw and more like running a rip saw. Just watch what you're cutting.

