#safety

48 posts

20 Jul 2026 1 min read

Safety and alignment in an era of long-horizon models

OpenAI shares lessons from deploying long-running AI models, highlighting new safety risks, observed failures, and improved safeguards through iterative deployment.

safety

16 Jul 2026 1 min read

Why teens deserve access to safe AI

OpenAI Engineering

Learn how OpenAI is making ChatGPT safer for teens with age-appropriate protections, learning tools, parental controls, and expert partnerships.

safety

15 Jul 2026 1 min read

GPT-Red: Unlocking Self-Improvement for Robustness

OpenAI Engineering

Explore GPT-Red, OpenAI’s automated red teaming system that uses self-play to improve AI safety, alignment, and prompt injection robustness.

safety

9 Jul 2026 1 min read

GPT-5.5 Bio Bug Bounty

OpenAI Engineering

Details about the OpenAI Bio Bounty program.

safety

29 May 2026 1 min read

A shared playbook for trustworthy third-party evaluations

OpenAI Engineering

OpenAI shares guidance on third-party AI evaluations, covering how to assess model capabilities, safeguards, and validity for frontier systems.

safety

28 May 2026 1 min read

OpenAI’s Frontier Governance Framework

OpenAI Engineering

Explore OpenAI’s Frontier Governance Framework and how our AI safety, security, and risk practices align with emerging EU and California regulations.

safety

19 May 2026 1 min read

Advancing content provenance for a safer, more transparent AI ecosystem

OpenAI Engineering

OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media.

safety

14 May 2026 1 min read

Helping ChatGPT better recognize context in sensitive conversations

OpenAI Engineering

Learn how new ChatGPT safety updates improve context awareness in sensitive conversations, helping detect risk over time and respond more safely.

safety

7 May 2026 1 min read

Introducing Trusted Contact in ChatGPT

OpenAI Engineering

Introducing Trusted Contact in ChatGPT, an optional safety feature that notifies someone you trust if serious self-harm concerns are detected.

safety

5 May 2026

GPT-5.5 Instant System Card

OpenAI Engineering

safety

5 May 2026 1 min read

Advancing youth safety and wellbeing in EMEA

OpenAI Engineering

Explore OpenAI’s European Youth Safety Blueprint and EMEA Youth & Wellbeing Grants, advancing safe, responsible AI for teens, families, and educators.

safety

28 Apr 2026 1 min read

Our commitment to community safety

OpenAI Engineering

Learn how OpenAI protects community safety in ChatGPT through model safeguards, misuse detection, policy enforcement, and collaboration with safety experts.

safety

23 Apr 2026

GPT-5.5 System Card

OpenAI Engineering

safety

23 Apr 2026 1 min read

GPT-5.5 Bio Bug Bounty

OpenAI Engineering

Explore the GPT-5.5 Bio Bug Bounty: a red-teaming challenge to find universal jailbreaks for bio safety risks, with rewards up to $25,000.

safety

8 Apr 2026 1 min read

Introducing the Child Safety Blueprint

OpenAI Engineering

Discover OpenAI’s Child Safety Blueprint—a roadmap for building AI responsibly with safeguards, age-appropriate design, and collaboration to protect and empower young people online.

safety

6 Apr 2026 1 min read

Announcing the OpenAI Safety Fellowship

OpenAI Engineering

A pilot program to support independent safety and alignment research and develop the next generation of talent.

safety

25 Mar 2026 1 min read

Introducing the OpenAI Safety Bug Bounty program

OpenAI Engineering

OpenAI launches a Safety Bug Bounty program to identify AI abuse and safety risks, including agentic vulnerabilities, prompt injection, and data exfiltration.

safety

24 Mar 2026 1 min read

Helping developers build safer AI experiences for teens

OpenAI Engineering

OpenAI releases prompt-based teen safety policies for developers using gpt-oss-safeguard, helping moderate age-specific risks in AI systems.

safety

23 Mar 2026 1 min read

To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we’ve built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.

safety

19 Mar 2026 1 min read

How we monitor internal coding agents for misalignment

OpenAI Engineering

How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and strengthen AI safety safeguards.

safety

17 Mar 2026 1 min read

OpenAI Japan announces Japan Teen Safety Blueprint to put teen safety first

OpenAI Engineering

OpenAI Japan announces the Japan Teen Safety Blueprint, introducing stronger age protections, parental controls, and well-being safeguards for teens using generative AI.

safety

27 Feb 2026 1 min read

An update on our mental health-related work

OpenAI Engineering

OpenAI shares updates on its mental health safety work, including parental controls, trusted contacts, improved distress detection, and recent litigation developments.

safety

13 Feb 2026 1 min read

Introducing Lockdown Mode and Elevated Risk labels in ChatGPT

OpenAI Engineering

Introducing Lockdown Mode and Elevated Risk labels in ChatGPT to help organizations defend against prompt injection and AI-driven data exfiltration.

safety

28 Jan 2026 1 min read

Keeping your data safe when an AI agent clicks a link

OpenAI Engineering

Learn how OpenAI protects user data when AI agents open links, preventing URL-based data exfiltration and prompt injection with built-in safeguards.

safety

20 Jan 2026 1 min read

Our approach to age prediction

OpenAI Engineering

ChatGPT is rolling out age prediction to estimate if accounts are under or over 18, applying safeguards for teens and refining accuracy over time.

safety

18 Dec 2025 1 min read

Updating our Model Spec with teen protections

OpenAI Engineering

OpenAI is updating its Model Spec with new Under-18 Principles that define how ChatGPT should support teens with safe, age-appropriate guidance grounded in developmental science. The update strengthens guardrails, clarifies expected model behavior in higher-risk situations, and builds on our broader work to improve teen safety across ChatGPT.

safety

18 Dec 2025 1 min read

AI literacy resources for teens and parents

OpenAI Engineering

OpenAI shares new AI literacy resources to help teens and parents use ChatGPT thoughtfully, safely, and with confidence. The guides include expert-vetted tips for responsible use, critical thinking, healthy boundaries, and supporting teens through emotional or sensitive topics.

safety

19 Nov 2025 1 min read

Strengthening our safety ecosystem with external testing

OpenAI Engineering

OpenAI works with independent experts to evaluate frontier AI systems. Third-party testing strengthens safety, validates safeguards, and increases transparency in how we assess model capabilities and risks.

safety

29 Oct 2025 1 min read

gpt-oss-safeguard technical report

OpenAI Engineering

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard’s capabilities and provide our baseline safety evaluations on the gpt-oss-safeguard models, using the underlying gpt-oss models as a baseline. For more information about the development…

safety

27 Oct 2025 1 min read

Addendum to GPT-5 System Card: Sensitive conversations

OpenAI Engineering

This system card details GPT-5’s improvements in handling sensitive conversations, including new benchmarks for emotional reliance, mental health, and jailbreak resistance.

safety

27 Oct 2025 1 min read

Strengthening ChatGPT’s responses in sensitive conversations

OpenAI Engineering

OpenAI collaborated with 170+ mental health experts to improve ChatGPT’s ability to recognize distress, respond empathetically, and guide users toward real-world support—reducing unsafe responses by up to 80%. Learn how we’re making ChatGPT safer and more supportive in sensitive moments.

safety

16 Sept 2025 1 min read

Building towards age prediction

OpenAI Engineering

Learn how OpenAI is building age prediction and parental controls in ChatGPT to create safer, age-appropriate experiences for teens while supporting families with new tools.

safety

16 Sept 2025 1 min read

Teen safety, freedom, and privacy

OpenAI Engineering

Explore OpenAI’s approach to balancing teen safety, freedom, and privacy in AI use.

safety

15 Sept 2025 1 min read

Addendum to GPT-5 system card: GPT-5-Codex

OpenAI Engineering

This addendum to the GPT-5 system card shares a new model: GPT-5-Codex, a version of GPT-5 further optimized for agentic coding in Codex. GPT-5-Codex adjusts its thinking effort more dynamically based on task complexity, responding quickly to simple conversational queries or small tasks, while independently working for longer on more complex tasks.

safety

2 Sept 2025 1 min read

Building more helpful ChatGPT experiences for everyone

OpenAI Engineering

We’re partnering with experts, strengthening protections for teens with parental controls, and routing sensitive conversations to reasoning models in ChatGPT.

safety

27 Aug 2025 1 min read

OpenAI and Anthropic share findings from a joint safety evaluation

OpenAI Engineering

OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.

safety

26 Aug 2025 1 min read

Helping people when they need it most

OpenAI Engineering

How we think about safety for users experiencing mental or emotional distress, the limits of today’s systems, and the work underway to refine them.

safety

7 Aug 2025 1 min read

From hard refusals to safe-completions: toward output-centric safety training

OpenAI Engineering

Discover how OpenAI's new safe-completions approach in GPT-5 improves both safety and helpfulness in AI responses—moving beyond hard refusals to nuanced, output-centric safety training for handling dual-use prompts.

safety

5 Aug 2025 1 min read

Estimating worst-case frontier risks of open-weight LLMs

OpenAI Engineering

In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.

safety

4 Aug 2025 1 min read

What we’re optimizing ChatGPT for

OpenAI Engineering

We build ChatGPT to help you thrive in all the ways you want. Learn how we're improving support for tough moments, how we've rolled out reminders to take breaks, and how we're working on better life advice, all guided by expert input.

safety

18 Jun 2025 1 min read

Preparing for future AI risks in biology

OpenAI Engineering

Advanced AI can transform biology and medicine—but also raises biosecurity risks. We’re proactively assessing capabilities and implementing safeguards to prevent misuse.

safety

23 May 2025 1 min read

Addendum to OpenAI o3 and o4-mini system card: OpenAI o3 Operator

OpenAI Engineering

We are replacing the existing GPT-4o-based model for Operator with a version based on OpenAI o3. The API version will remain based on 4o.

safety

16 May 2025 1 min read

Addendum to o3 and o4-mini system card: Codex

OpenAI Engineering

Codex is a cloud-based coding agent. Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering. codex-1 was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adheres precisely to instructions, and iteratively runs tests until passing results are achieved.

safety

25 Feb 2025 1 min read

Deep research System Card

OpenAI Engineering

This report outlines the safety work carried out prior to releasing deep research, including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas.

safety

12 Feb 2025

Sharing the latest Model Spec

OpenAI Engineering

safety

7 May 2024 1 min read

Our approach to data and AI

OpenAI Engineering

Just over a year after launching ChatGPT, AI is changing how we live, work and learn. It’s also raised important conversations about data in the age of AI. More on our approach, a new Media Manager for creators and content owners, and where we’re headed.

safety

14 Feb 2024

Disrupting malicious uses of AI by state-affiliated threat actors

OpenAI Engineering

safety

11 Apr 2023 1 min read

Announcing OpenAI’s Bug Bounty Program

OpenAI Engineering

This initiative is essential to our commitment to develop safe and advanced AI. As we create technology and services that are secure, reliable, and trustworthy, we need your help.

safety