Large language models (LLMs) are incredibly good at stating things confidently, even if they aren't always correct. The OpenAI o1 model family from OpenAI is the latest attempt to fix that—by getting the AI to slow down and work through complex problems, rather than just running with the first idea it has.
It's a really interesting approach, and if o1 lives up to OpenAI's claims, it could be the start of another leap in just how useful LLMs can be.
What is OpenAI o1?
OpenAI o1 is a new series of models from OpenAI. While they're similar to other OpenAI models, like GPT-4o, in many respects—and still use the major underlying technologies like transformers and a neural network—the o1 models are significantly better at working through complex tasks and harder problems that require logical reasoning.
That's why Open AI is "resetting the counter back to 1" rather than releasing them as GPT-5. (And yes, the weird letter casing and hyphenation of all this drives me mad, too.)
Right now, there are three o1 models:
OpenAI o1: The most capable o1 model, it's currently unavailable, though OpenAI has released information about its performance.
OpenAI o1-preview: A preview version of the full o1 model, though it's not as powerful as the full model. It's available to ChatGPT Plus subscribers and through the OpenAI API.
OpenAI o1-mini: A version of o1 optimized for speed.
The o1 models aren't meant as a replacement for GPT-4o and GPT-4o mini: they offer a different price-to-performance tradeoff that makes sense for more advanced tasks. Let's dig into what that looks like.
How does OpenAI o1 work?
According to OpenAI, the o1 models were trained to "think" through problems before responding. In effect, this means they integrate a prompt engineering technique called Chain of Thought reasoning (CoT) directly into the model.
When you give an o1 model a complex prompt, rather than immediately trying to generate a response, it breaks down what you've asked it to do into multiple simpler steps. It then works through this chain of thought step by step before creating its output.
In the introduction post on OpenAI's blog, you can see a few examples of how the o1-preview model uses CoT reasoning to analyze complex problems like decoding a cipher text, solving a crossword, and correctly answering math, chemistry, and English questions. They're worth looking through—they'll give you a much better idea of how the o1 models work.
Unfortunately, OpenAI has decided not to show these chains of thought to users. Instead, you get an AI-generated summary of the key points. It's still useful for understanding how the model is tackling different problems, but it won't give you quite as much detail as to what it's trying to do.
While I'm always happy to argue that using an anthropomorphizing word like "think" to describe what AI is doing is a stretch, it does capture the fact that new models take time to process your prompt before responding directly to you. Research has shown that CoT reliably improves the accuracy of AI models, so it's no surprise that OpenAI o1 is significantly better at complex challenges than the GPT-4o models.
By using reinforcement learning (where the model is rewarded for getting things correct), OpenAI has trained the o1 models to try multiple approaches, recognize and correct mistakes, and take time to work through complex problems to find a good answer.
OpenAI has found that the performance of its o1 models increase with both training time and how long they're allowed to reason before providing an answer. This means that the more computing resources o1 has access to, the better it performs—which is why it's so expensive (we'll get to that in a bit).
Otherwise, OpenAI o1 appears to function much the same as other modern LLMs. OpenAI has released no meaningful details about its architecture, parameter count, or other changes, but that's now what we expect from major AI companies. Despite the name, OpenAI isn't actually producing open AI models.
GPT-4o vs. OpenAI o1
When it comes to tasks that require logical reasoning, OpenAI o1 and OpenAI o1-mini are significantly better than GPT-4o (and by extension, almost all other AI models). On typical AI benchmarks that require some logic where GPT-4o performs really well, like MMLU, OpenAI o1 still scores higher.
More interestingly, on tasks that require high levels of logical reasoning, GPT-4o tends to do pretty poorly. One example that OpenAI uses is the 2024 USA Math Olympiad (AIME) paper. Out of 15 hard math questions, GPT-4o was only able to answer two correctly. o1, however, was able to get 13 correct, which would place it among the top 500 students taking the paper in the U.S. The situation is similar on the competitive coding platform Codeforces. GPT-4o only scores in the 11th percentile, while the full o1 model scores in the 89th percentile.
What struck me most, though, were the situations where OpenAI o1 fell short. In human evaluations, the o1-preview model did slightly worse at personal writing and matched GPT-4o's performance at editing text. While not a big deal in and of itself, it is when you compare the cost of the different models (which we'll look at in a bit).
OpenAI o1-mini is a little more specialized and, according to the information released by OpenAI, excels at STEM questions that require logical reasoning and generating code—but not broad general knowledge. For its niche tasks, it's almost as good as the full o1 and better than GPT-4o—but for general tasks, it's worse than GPT-4o. (A major difference between o1 and o1-mini is likely to be the length of time that the AI is allowed to create its chain of thought before generating a response.)
To see this all in action, here's GPT-4o mini answering a question about how to get to Spain given different options.
While it understands that swimming would be challenging, it also seems to think four hours is longer than six hours. Plus, it says nothing about the fact that it would be ridiculous to swim as a travel option. (Calling it "adventurous" doesn't count in my book.)
When you give o1-preview the same prompt, it nails it.
It works through its chain of thought step by step before creating its output. That's why o1 (a) knows that six is more than four and (b) understands that it's "not practical or safe" to swim.
OpenAI o1 pricing
Through OpenAI's API, GPT-4o costs $5 per million input tokens and $15 per million output tokens. GPT-4o mini costs just $0.15 per million input tokens and $0.60 per million output tokens. On the other hand, o1-preview costs $15 per million input tokens and $60 per million output tokens. Even o1-mini costs $3 per million input tokens and $12 per million output tokens.
Model | Price per million input tokens | Price per million output tokens |
---|---|---|
GPT-4o mini | $0.15 | $0.60 |
o1-mini | $3 | $12 |
GPT-4o | $5 | $15 |
o1-preview | $15 | $60 |
All this is to say: the o1 models' increased logical performance comes at a cost. If you don't need the AI to work through complex problems, the OpenAI o1 models will cost far more for no major performance benefit.
Is OpenAI o1 worth it?
While I was super impressed with both the o1-preview and o1-mini models' ability to solve the kinds of problems that stump most AI models, they otherwise didn't stand out. They're even missing lots of useful features: they can't handle uploaded images or files or pull in content from the internet, for example.
So, at least for the time being, the o1 models are a super exciting development—but existing LLMs and large multimodal models will still have their uses. OpenAI says it's working on a system to automatically route your prompts to the most appropriate model, which would certainly make things work more seamlessly.
How to access OpenAI o1
Right now you can use the OpenAI o1-preview and OpenAI o1-mini models through ChatGPT and the API. If you're a ChatGPT Plus or Teams subscriber, you're limited to 50 weekly messages for o1-preview and 50 daily messages for o1-mini. Free users currently can't test them, but that might change in the near future.
The o1 models are also accessible through the API, but you don't have to be a developer to use them. With Zapier's ChatGPT integration, you can use both the o1-preview and o1-mini models, connecting them to thousands of other apps. Learn more about how to automate these new models, or get started with one of these pre-made workflows.
Create email copy with ChatGPT from new Gmail emails and save as drafts in Gmail
Start a conversation with ChatGPT when a prompt is posted in a particular Slack channel
Generate conversations in ChatGPT with new emails in Gmail
Create ChatGPT conversations from new tl;dv transcripts
Zapier is the leader in workflow automation—integrating with thousands of apps from partners like Google, Salesforce, and Microsoft. Use interfaces, data tables, and logic to build secure, automated systems for your business-critical workflows across your organization's technology stack. Learn more.
Related reading: