On the final day of OpenAI’s 12 days of ‘shipmas,’ the corporate unveiled its newest fashions, o3 and o3-mini, which excel at reasoning and even outperform o1 on a collection of benchmarks, together with math and science. At launch, OpenAI CEO Sam Altman stated o3 was slated to drop on the finish of January, and right this moment, the corporate made good on its promise.
o3-mini
On Friday, OpenAI released its o3-mini mannequin, essentially the most cost-efficient mannequin in OpenAI’s reasoning collection, to the general public. Till now, that collection has been comprised of o1 and o1-mini. Like its predecessor, the mannequin is especially sturdy in science, math, and coding, in keeping with the corporate.
When o3-mini is chosen, it would use medium reasoning effort, which balances velocity and accuracy. Whereas the unique o1 mannequin nonetheless has broader normal information than o3-mini, the brand new mannequin’s main benefit is its quicker velocity and better efficiency in comparison with o1-mini.
Benchmark efficiency
When evaluating the efficiency of o3-mini to o1-mini, knowledgeable testers discovered that o3-mini delivered extra correct, reasoned-through, and clearer responses than o1-mini. In line with the publish, they most well-liked o3-mini responses 56% of the time and noticed a 39% discount in main errors.
Past human desire evaluations, in a number of STEM benchmarks, together with the Competitors Math (AIME 2024), PhD-level Science Questions (GPQA Diamond), and Competitors Code (Codeforces), o3-mini with medium reasoning — which is what ChatGPT customers will get by default — outperformed o1-mini.
Additionally notable is that o3-mini, with excessive reasoning effort within the benchmarks, got here near o1 efficiency, generally even surpassing it, as seen within the AIME 2024 above and Software program Engineering (SWE-bench Verified) benchmarks. The o3-mini mannequin with medium reasoning effort matched o1’s efficiency within the Codeforces benchmark.
Security
OpenAI assessed o3-mini’s security by public launch by jailbreak and disallowed content material evaluations. The corporate discovered that the mannequin considerably surpasses GPT-4o on the evaluations. OpenAI posted the analysis outcomes beneath and likewise launched an o3-mini System Card, a 37-page PDF that features the detailed outcomes of the evaluations.
The way to entry
All subscribers to OpenAI’s paid tiers, together with ChatGPT Plus, Team, and Pro, can entry OpenAI o3-mini beginning right this moment. Plus and Group customers now have 3 times the speed restrict, going from 50 messages per day with o1-mini to 150 messages per day. ChatGPT Enterprise entry is coming in every week.
Additionally: Copilot’s powerful new ‘Think Deeper’ feature is free for all users – how it works
The o3-mini mannequin will substitute o1-mini within the mannequin picker, as it will be helpful for a similar duties, besides that have will now be improved with decrease latency and better fee limits. As a paid person, on the time of writing, I didn’t but have entry to the o3-mini, and am as an alternative nonetheless seeing the o1-mini possibility.
If you do not have a subscription, no worries: You may see if o3-mini is well worth the hype out of your free account. All free ChatGPT customers need to do is click on on “Purpose” within the message textbox or regenerate a response. OpenAI CEO Sam Altman confirmed free entry in a post on X. Till now, all of the reasoning fashions have been saved behind a paywall; OpenAI didn’t specify any limitations across the new mannequin for Free customers.