#publication

28 posts

29 Apr

29 Apr 2026 1 min read

Where the goblins came from

How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.

publication

5 Mar

5 Mar 2026

3 Mar

3 Mar 2026

5 Feb

5 Feb 2026 1 min read

GPT-5.3-Codex System Card

OpenAI Engineering

GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and professional knowledge capabilities of GPT‑5.2.

publication

18 Dec 2025

Addendum to GPT-5.2 System Card: GPT-5.2-Codex

OpenAI Engineering

publication

11 Dec 2025

11 Dec 2025 1 min read

Advancing science and math with GPT-5.2

OpenAI Engineering

GPT-5.2 is OpenAI’s strongest model yet for math and science, setting new state-of-the-art results on benchmarks like GPQA Diamond and FrontierMath. This post shows how those gains translate into real research progress, including solving an open theoretical problem and generating reliable mathematical proofs.

publication

11 Dec 2025 1 min read

Update to GPT-5 System Card: GPT-5.2

OpenAI Engineering

GPT-5.2 is the latest model family in the GPT-5 series. The comprehensive safety mitigation approach for these models is largely the same as that described in the GPT-5 System Card and GPT-5.1 System Card. Like OpenAI’s other models, the GPT-5.2 models were trained on diverse datasets, including information that is publicly available on the internet, information that we partner with…

publication

19 Nov 2025

19 Nov 2025 1 min read

GPT-5.1-Codex-Max System Card

OpenAI Engineering

This system card outlines the comprehensive safety measures implemented for GPT‑5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access.

publication

12 Nov 2025

12 Nov 2025 1 min read

GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum

OpenAI Engineering

This GPT-5 system card addendum provides updated safety metrics for GPT-5.1 Instant and Thinking, including new evaluations for mental health and emotional reliance.

publication

30 Sept 2025

30 Sept 2025 1 min read

Sora 2 System Card

OpenAI Engineering

Sora 2 is our new state-of-the-art video and audio generation model. Building on the foundation of Sora, this new model introduces capabilities that have been difficult for prior video models to achieve, such as more accurate physics, sharper realism, synchronized audio, enhanced steerability, and an expanded stylistic range.

publication

25 Sept 2025

25 Sept 2025 1 min read

Measuring the performance of our models on real-world tasks

OpenAI Engineering

OpenAI introduces GDPval, a new evaluation that measures model performance on real-world economically valuable tasks across 44 occupations.

publication

17 Sept 2025

17 Sept 2025 1 min read

Detecting and reducing scheming in AI models

OpenAI Engineering

Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.

publication

27 Aug 2025

27 Aug 2025 1 min read

Collective alignment: public input on our Model Spec

OpenAI Engineering

OpenAI surveyed over 1,000 people worldwide on how AI should behave and compared their views to our Model Spec. Learn how collective alignment is shaping AI defaults to better reflect diverse human values and perspectives.

publication

22 Aug 2025

22 Aug 2025 1 min read

Accelerating life sciences research

OpenAI Engineering

Discover how a specialized AI model, GPT-4b micro, helped OpenAI and Retro Bio engineer more effective proteins for stem cell therapy and longevity research.

publication

7 Aug 2025

7 Aug 2025 1 min read

GPT-5 System Card

OpenAI Engineering

This GPT-5 system card explains how a unified model routing system powers fast and smart responses using gpt-5-main, gpt-5-thinking, and lightweight versions like gpt-5-thinking-nano, optimized for different tasks and developer use.

publication

5 Aug 2025

5 Aug 2025 1 min read

gpt-oss-120b & gpt-oss-20b Model Card

OpenAI Engineering

We introduce gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models available under the Apache 2.0 license and our gpt-oss usage policy.

publication

22 Jul 2025

22 Jul 2025 1 min read

Pioneering an AI clinical copilot with Penda Health

OpenAI Engineering

OpenAI and Penda Health debut an AI clinical copilot that cuts diagnostic errors by 16% in real-world use—offering a new path for safe, effective AI in healthcare.

publication

17 Jul 2025

17 Jul 2025 1 min read

ChatGPT agent System Card

OpenAI Engineering

OpenAI’s agentic model unites research, browser automation, and code tools with safeguards under the Preparedness Framework.

publication

18 Jun 2025

18 Jun 2025 1 min read

Toward understanding and preventing misalignment generalization

OpenAI Engineering

We study how training on incorrect responses can cause broader misalignment in language models and identify an internal feature driving this behavior—one that can be reversed with minimal fine-tuning.

publication

12 May 2025

12 May 2025 1 min read

Introducing HealthBench

OpenAI Engineering

HealthBench is a new evaluation benchmark for AI in healthcare which evaluates models in realistic scenarios. Built with input from 250+ physicians, it aims to provide a shared standard for model performance and safety in health.

publication

16 Apr 2025

16 Apr 2025 1 min read

OpenAI o3 and o4-mini System Card

OpenAI Engineering

OpenAI o3 and OpenAI o4-mini combine state-of-the-art reasoning with full tool capabilities—web browsing, Python, image and file analysis, image generation, canvas, automations, file search, and memory.

publication

15 Apr 2025

15 Apr 2025 1 min read

Our updated Preparedness Framework

OpenAI Engineering

Sharing our updated framework for measuring and protecting against severe harm from frontier AI capabilities.

publication

10 Apr 2025

10 Apr 2025 1 min read

BrowseComp: a benchmark for browsing agents

OpenAI Engineering

publication

2 Apr 2025

2 Apr 2025 1 min read

PaperBench: Evaluating AI’s Ability to Replicate AI Research

OpenAI Engineering

We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.

publication

25 Mar 2025

25 Mar 2025 1 min read

Addendum to GPT-4o System Card: 4o image generation

OpenAI Engineering

4o image generation is a new, significantly more capable image generation approach than our earlier DALL·E 3 series of models. It can create photorealistic output. It can take images as inputs and transform them.

publication

10 Mar 2025

10 Mar 2025 1 min read

Detecting misbehavior in frontier reasoning models

OpenAI Engineering

Frontier reasoning models exploit loopholes when given the chance. We show we can detect exploits using an LLM to monitor their chains-of-thought. Penalizing their “bad thoughts” doesn’t stop the majority of misbehavior—it makes them hide their intent.

publication

27 Feb 2025

27 Feb 2025 1 min read

OpenAI GPT-4.5 System Card

OpenAI Engineering

We’re releasing a research preview of OpenAI GPT‑4.5, our largest and most knowledgeable model yet.

publication

18 Feb 2025

18 Feb 2025 1 min read

Introducing the SWE-Lancer benchmark

OpenAI Engineering

Can frontier LLMs earn $1 million from real-world freelance software engineering?

publication