OpenAI has released new o3 and o4-mini AI models

OpenAI has announced the launch of its new o3 and o4-mini AI models. Both are focused on reasoning – they spend more time double-checking their work before answering.

o3 is positioned as the company's most advanced "thinking" model. According to internal tests, it outperforms previous iterations in mathematics, programming, reasoning, science, and visual perception.

o4-mini offers a competitive trade-off between price, speed, and performance.

Both models can browse web pages, analyze Python code, and process and generate images. They, along with the o4-mini-high variant, are available to Pro, Plus, and Team subscribers.

According to the company, o3 and o4-mini are the first models that not only recognize images but literally "think with" them. Users can upload pictures to ChatGPT – for example, diagrams on a whiteboard or charts from a PDF – and the models will analyze them as part of their chain of thought.

Thanks to this, the models can interpret blurry, low-quality images. They can also run and execute Python code directly in the browser using ChatGPT's Canvas feature, or search the web when asked about current events.

o3 scored 69.1% on the SWE-bench coding benchmark, and o4-mini scored 68.1%. For comparison, o3-mini scored 49.3% and Claude 3.7 Sonnet 62.3%.

o3 costs $10 per million input tokens and $40 per million output tokens. For o4-mini, the prices are $1.10 and $4.40, respectively.
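To put these per-million-token prices in perspective, here is a minimal sketch of a cost estimate for a single request, using only the rates quoted above (the function and model names are illustrative, not part of any official API):

```python
# USD per million tokens, as quoted in the article: (input, output)
PRICES = {
    "o3": (10.00, 40.00),
    "o4-mini": (1.10, 4.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the quoted per-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a request with 5,000 input tokens and 1,000 output tokens.
print(f"o3:      ${estimate_cost('o3', 5_000, 1_000):.4f}")      # $0.0900
print(f"o4-mini: ${estimate_cost('o4-mini', 5_000, 1_000):.4f}") # $0.0099
```

The same workload thus costs roughly nine times more on o3 than on o4-mini, which is the "price versus performance" trade-off the company highlights.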

In the coming weeks, OpenAI plans to launch o3-pro, a version of o3 that uses more computing resources to produce its answers. It will be available only to ChatGPT Pro subscribers.

New safety system

OpenAI has introduced a new monitoring system for the o3 and o4-mini models to identify prompts related to biological and chemical threats. It is designed to prevent the models from providing advice that could facilitate potentially dangerous attacks.

The company noted that the new models are significantly more capable than their predecessors and therefore carry increased risk in the hands of malicious users.

Because o3 is more adept at answering questions about creating certain types of biological threats, the company built the new monitoring system. It runs on top of o3 and o4-mini and is designed to detect biological and chemical risks.

OpenAI specialists spent about 1,000 hours flagging "unsafe" conversations. In subsequent testing, the models refused to respond to risky prompts in 98.7% of cases.

Despite these ongoing safety improvements, one of the company's partners has expressed concern.

OpenAI is in a hurry

METR, the organization that works with OpenAI to evaluate the capabilities and safety of its AI models, was given little time to test the new models.

It reported on its blog that one of its benchmark evaluations of o3 was conducted "in a relatively short time" compared with its analysis of OpenAI's previous flagship model, o1.

According to the Financial Times, the AI startup gave testers less than a week to check the safety of the new products.

METR states that, based on the information it could gather in this limited time, o3 shows a "high propensity" to "cheat" or "hack" tests in sophisticated ways to maximize its score. The model resorts to such tactics even when it clearly understands that this behavior does not match the intentions of the user and OpenAI.

The organization believes o3 may also exhibit other types of adversarial or "malicious" behavior.

"Although we do not consider this especially likely, it is important to note that [our] evaluation setup would not catch this type of risk. In general, we believe that pre-deployment capability testing is not a sufficient risk-management strategy on its own, and we are currently prototyping additional forms of evaluation," the organization emphasized.

Apollo Research also recorded deceptive behavior in the o3 and o4-mini models. In one test, the model was forbidden to use a certain tool – but it used it anyway, believing it would help it complete the task better.

"[Apollo's findings] show that o3 and o4-mini are capable of in-context scheming and strategic deception. Although relatively harmless, it is important for everyday users to be aware of the discrepancies between the models' statements and actions […] This can be further assessed by analyzing their internal reasoning traces," OpenAI noted.

An agent for programming

Alongside the new AI models, OpenAI introduced Codex CLI, a local coding agent that runs directly from the terminal.

The tool can write and edit code on the desktop and perform certain actions, such as moving files.

"You can get the benefits of multimodal reasoning from the command line by passing screenshots or low-resolution sketches to the model, combined with access to your code locally [via Codex CLI]," the company noted.

OpenAI wants to buy Windsurf

Meanwhile, OpenAI is in talks to acquire Windsurf, a popular AI assistant for programmers, Bloomberg reports.

The deal could become the largest acquisition yet for Sam Altman's startup. Its terms have not been finalized and may change, the agency emphasized.

As a reminder, in April OpenAI introduced a new family of AI models – GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. They reportedly "excel" at programming and instruction following.



Source: Cryptocurrency
