OpenAI has expanded its artificial intelligence reasoning technology to handle tasks involving images, unveiling new systems that can work with visual elements like sketches, posters, diagrams and graphs alongside text.
The company launched two enhanced versions of its reasoning technology on Wednesday – OpenAI o3 and OpenAI o4-mini – that can process both images and text, giving them the ability to “manipulate, crop and transform images in service of the task you want to do,” according to OpenAI’s head of research, Mark Chen.
Unlike earlier chatbots that responded immediately, these reasoning systems take time to “think” before answering, working through a problem in a sequence of steps that resembles human reasoning. The technology builds on large language models (LLMs) refined through reinforcement learning, in which a system improves by working through large numbers of problems by trial and error.
Industry experts note that these systems do not necessarily reason as humans do and can still make errors or fabricate information, a phenomenon known as hallucination. OpenAI joins companies including Google, Meta and the Chinese startup DeepSeek in developing similar reasoning capabilities.
Alongside these visual reasoning advances, OpenAI introduced Codex CLI, a tool that lets programmers use AI systems with existing code stored on their own machines. The company is open-sourcing the tool, allowing developers to freely modify and build on it.
The new systems are available immediately to ChatGPT Plus ($20/month) and ChatGPT Pro ($200/month) subscribers.