While vector databases still have many valid use cases, organizations including OpenAI are leaning on PostgreSQL to get things done.
In a blog post on Thursday, OpenAI disclosed how it is using the open-source PostgreSQL database.
OpenAI runs ChatGPT and its API platform for 800 million users on a single-primary PostgreSQL instance — not a distributed database, not a sharded cluster. One Azure PostgreSQL Flexible Server handles all writes. Nearly 50 read replicas spread across multiple regions handle reads. The system processes millions of queries per second while maintaining low double-digit millisecond p99 latency and five-nines availability.
The setup challenges conventional scaling wisdom and offers enterprise architects insight into what actually works at massive scale.
The lesson here isn’t to copy OpenAI’s stack. It’s that architectural decisions should be driven by workload patterns and operational constraints — not by scale panic or fashionable infrastructure choices. OpenAI’s PostgreSQL setup shows how far proven systems can stretch when teams optimize deliberately instead of re-architecting prematurely.
“For years, PostgreSQL has been one of the most critical, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API,” OpenAI engineer Bohan Zhang wrote in a technical disclosure. “Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.”
The company achieved this scale through targeted optimizations, including connection pooling that cut connection time from 50 milliseconds to 5 milliseconds and cache locking to prevent "thundering herd" problems, where simultaneous cache misses overwhelm the database.
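The post doesn't specify OpenAI's exact pooling setup (a proxy such as PgBouncer is a common choice), but the idea translates directly to application code. A minimal sketch using Python's psycopg_pool, with placeholder connection details and pool sizes:

```python
# Minimal sketch of application-side connection pooling with psycopg_pool.
# The DSN, pool sizes, and query are illustrative placeholders, not OpenAI's
# actual configuration.
from psycopg_pool import ConnectionPool

# Reusing warm connections avoids repeating TLS and authentication setup on
# every request, the kind of per-request cost behind the 50ms -> 5ms figure
# described above.
pool = ConnectionPool(
    "postgresql://app:secret@pg-primary.example.internal:5432/app",
    min_size=4,    # keep a few connections warm at all times
    max_size=20,   # cap concurrent connections to protect the server
    timeout=5.0,   # fail fast if the pool is exhausted
)

def fetch_user(user_id: int):
    # Borrow a pooled connection; it is returned to the pool on exit.
    with pool.connection() as conn:
        row = conn.execute(
            "SELECT id, email FROM users WHERE id = %s", (user_id,)
        ).fetchone()
    return row
```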
Why PostgreSQL matters for enterprises
PostgreSQL handles operational data for ChatGPT and OpenAI’s API platform. The workload is heavily read-oriented, which makes PostgreSQL a good fit. However, PostgreSQL’s multiversion concurrency control (MVCC) creates challenges under heavy write loads.
When updating data, PostgreSQL copies entire rows to create new versions, causing write amplification and forcing queries to scan through multiple versions to find current data.
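That write amplification is easy to observe in practice: each UPDATE leaves behind a dead row version until vacuum reclaims it. A hedged sketch of checking that bloat through PostgreSQL's statistics views, assuming a hypothetical `users` table and connection string:

```python
# Sketch: inspecting MVCC bloat via PostgreSQL's statistics views.
# Table name and DSN are placeholders for illustration.
import psycopg

with psycopg.connect("postgresql://app@localhost/app") as conn:
    row = conn.execute(
        """
        SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        WHERE relname = %s
        """,
        ("users",),
    ).fetchone()
    # Every UPDATE writes a whole new row version; n_dead_tup counts the old
    # versions still waiting for vacuum, a direct measure of the write
    # amplification described above.
    print(row)
```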
Rather than fighting this limitation, OpenAI built its strategy around it. At OpenAI’s scale, these tradeoffs aren’t theoretical — they determine which workloads stay on PostgreSQL and which ones must move elsewhere.
How OpenAI is optimizing PostgreSQL
At large scale, conventional database wisdom points to one of two paths: shard PostgreSQL across multiple primary instances so writes can be distributed, or migrate to a distributed SQL database like CockroachDB or YugabyteDB designed to handle massive scale from the start. Most organizations would have taken one of these paths years ago, well before reaching 800 million users.
Sharding or moving to a distributed SQL database eliminates the single-writer bottleneck. With sharding, application code must route queries to the correct shard and manage distributed transactions itself; a distributed SQL database handles that coordination automatically. Both approaches, however, introduce significant complexity and substantially higher operational overhead.
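To make that complexity concrete, here is a rough sketch of the query-routing layer an application would have to own after sharding; the hash scheme and shard addresses are illustrative, not anything OpenAI has described:

```python
# Illustrative only: the routing layer an application must maintain once data
# is sharded across several PostgreSQL primaries.
import hashlib

SHARD_DSNS = [
    "postgresql://app@shard-0.example.internal/app",
    "postgresql://app@shard-1.example.internal/app",
    "postgresql://app@shard-2.example.internal/app",
]

def shard_for(user_id: int) -> str:
    # Stable hash so a given user always lands on the same shard.
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return SHARD_DSNS[int.from_bytes(digest[:8], "big") % len(SHARD_DSNS)]

# Cross-shard joins, transactions, and resharding are where the real
# operational cost appears; this routing function is only the easy part.
```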
Instead of sharding PostgreSQL, OpenAI established a hybrid strategy: no new tables in PostgreSQL. New workloads default to sharded systems like Azure Cosmos DB. Existing write-heavy workloads that can be horizontally partitioned get migrated out. Everything else stays in PostgreSQL with aggressive optimization.
This approach offers enterprises a practical alternative to wholesale re-architecture. Rather than spending years rewriting hundreds of endpoints, teams can identify specific bottlenecks and move only those workloads to purpose-built systems.
Why this matters
OpenAI’s experience scaling PostgreSQL reveals several practices that enterprises can adopt regardless of their scale.
Build operational defenses at multiple layers. OpenAI’s approach combines cache locking to prevent “thundering herd” problems, connection pooling (which dropped their connection time from 50ms to 5ms), and rate limiting at application, proxy and query levels. Workload isolation routes low-priority and high-priority traffic to separate instances, ensuring a poorly optimized new feature can’t degrade core services.
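OpenAI hasn't published its cache-locking implementation, but the standard pattern is single-flight: on a cache miss, one caller rebuilds the value while the rest wait instead of all stampeding the database. A minimal in-process sketch (a distributed lock in something like Redis plays the same role across servers):

```python
# Sketch of the "thundering herd" defense described above: only one caller
# recomputes a missing cache entry; everyone else waits for the result.
# The cache backend, key, and loader are illustrative placeholders.
import threading

_cache: dict[str, object] = {}
_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()

def get_or_load(key: str, load_from_db):
    if key in _cache:
        return _cache[key]
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check after acquiring the lock: another thread may have already
        # repopulated the entry while we were waiting.
        if key not in _cache:
            _cache[key] = load_from_db()  # single database hit per key
        return _cache[key]
```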
Review and monitor ORM-generated SQL in production. Object-Relational Mapping (ORM) frameworks like Django, SQLAlchemy, and Hibernate automatically generate database queries from application code, which is convenient for developers. However, OpenAI found one ORM-generated query joining 12 tables that caused multiple high-severity incidents when traffic spiked. The convenience of letting frameworks generate SQL creates hidden scaling risks that only surface under production load. Make reviewing these queries a standard practice.
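One practical way to make that review routine is to log and flag what the ORM actually emits. A sketch using SQLAlchemy's documented cursor-execute events; the engine URL and thresholds are placeholders, and the same idea applies to Django's query logging:

```python
# Sketch: surfacing ORM-generated SQL so wide joins are caught in review,
# not in an incident. Engine URL and thresholds are illustrative.
import logging
import time

from sqlalchemy import create_engine, event

engine = create_engine("postgresql+psycopg://app@localhost/app")
log = logging.getLogger("sql.audit")

@event.listens_for(engine, "before_cursor_execute")
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    conn.info["query_start"] = time.monotonic()

@event.listens_for(engine, "after_cursor_execute")
def _log_statement(conn, cursor, statement, parameters, context, executemany):
    elapsed = time.monotonic() - conn.info["query_start"]
    # Flag suspiciously slow statements or ones with many joins.
    if elapsed > 0.1 or statement.upper().count(" JOIN ") >= 4:
        log.warning("slow/wide ORM query (%.0f ms): %s", elapsed * 1000, statement)
```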
Enforce strict operational discipline. OpenAI permits only lightweight schema changes — anything triggering a full table rewrite is prohibited. Schema changes have a 5-second timeout. Long-running queries get automatically terminated to prevent blocking database maintenance operations. When backfilling data, they enforce rate limits so aggressive that operations can take over a week.
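In PostgreSQL terms, those guardrails map to ordinary session settings plus a throttled batch loop. A hedged sketch with placeholder table names, timeouts, and batch sizes rather than OpenAI's actual values:

```python
# Sketch of the guardrails described above: bounded schema-change waits,
# automatic termination of long queries, and a rate-limited backfill loop.
# Table/column names, batch size, and sleep interval are placeholders.
import time
import psycopg

with psycopg.connect("postgresql://app@localhost/app") as conn:
    # Abort DDL that cannot acquire its lock quickly, so a schema change
    # never sits blocking production traffic.
    conn.execute("SET lock_timeout = '5s'")
    # Kill runaway queries rather than letting them block maintenance.
    conn.execute("SET statement_timeout = '30s'")

    # Rate-limited backfill: small batches with pauses in between, trading
    # wall-clock time (possibly days) for steady production latency.
    while True:
        cur = conn.execute(
            "UPDATE events SET normalized = TRUE "
            "WHERE id IN (SELECT id FROM events WHERE normalized IS NULL LIMIT 500)"
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        time.sleep(1.0)  # throttle to protect the primary
```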
Read-heavy workloads with burst writes can run on single-primary PostgreSQL longer than commonly assumed. The decision to shard should depend on workload patterns rather than user counts.
This approach is particularly relevant for AI applications, which often have heavily read-oriented workloads with unpredictable traffic spikes. These characteristics align with the pattern where single-primary PostgreSQL scales effectively.
The lesson is straightforward: identify actual bottlenecks, optimize proven infrastructure where possible, and migrate selectively when necessary. Wholesale re-architecture isn’t always the answer to scaling challenges.
