TL;DR
Streambed is a new tool that streams PostgreSQL WAL changes directly to Iceberg tables on S3, allowing analytical queries without modifying existing applications. Its query server speaks the Postgres wire protocol, so standard Postgres clients can connect to it. The project is in early release, with detailed setup instructions available.
Streambed, an open-source project announced on Hacker News, streams PostgreSQL WAL changes in real time to Iceberg tables stored on S3. Its query server speaks the Postgres wire protocol, so the data can be queried without a traditional ETL pipeline or a Spark dependency.
Streambed connects to PostgreSQL as a logical replication subscriber and decodes the WAL messages for inserts, updates, and deletes. It buffers these changes, writes them as Parquet files to an S3 bucket, and commits the corresponding Iceberg metadata. Updates and deletes are applied through copy-on-write merging. A built-in query server exposes the Iceberg tables over the Postgres wire protocol, so users can query the data with psql or any Postgres-compatible client.
The project requires Go 1.22+ and CGO, and it can run locally using Docker or be deployed in production environments. Local setup involves starting Postgres and MinIO, building the Go binary, and running the sync and query server components; commands for resync and cleanup are also available.
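To make the copy-on-write idea concrete, here is a minimal sketch in Python. It is not Streambed's code (the project is written in Go), and the `Change` type, the `merge_copy_on_write` function, and the `id` key column are all hypothetical names chosen for illustration. It only shows the general technique: buffered changes are applied to a copy of the existing rows to produce a new file, and the old file is left untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One decoded change event (hypothetical shape, for illustration only)."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; None for deletes


def merge_copy_on_write(existing_rows, changes, key_column="id"):
    """Return a new list of rows with the changes applied.

    The input list is never modified: a real system would write the
    result as a new data file and point the table metadata at it.
    """
    merged = {row[key_column]: row for row in existing_rows}
    for change in changes:
        if change.op == "delete":
            merged.pop(change.key, None)
        elif change.op in ("insert", "update"):
            merged[change.key] = dict(change.row)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    # Sort by key so the output order is deterministic.
    return [merged[key] for key in sorted(merged)]


old_file = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
buffered = [
    Change("update", 2, {"id": 2, "name": "robert"}),
    Change("delete", 1),
    Change("insert", 3, {"id": 3, "name": "cy"}),
]
new_file = merge_copy_on_write(old_file, buffered)
```

After the merge, `new_file` holds rows 2 and 3 while `old_file` still holds the original rows, which is what lets readers of the previous snapshot keep working until the metadata is switched over.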
Why It Matters
Streambed offers a low-latency way to offload analytical workloads from a production Postgres database without changing existing applications. It removes the need for a traditional ETL pipeline, which reduces infrastructure complexity, and it allows real-time analytics with familiar tools. Because the query server speaks the Postgres wire protocol, the streamed data can be queried directly with standard Postgres clients.
Background
Traditional data warehousing often relies on batch ETL processes or complex Spark-based pipelines to move data from transactional systems to analytical stores. Recent efforts aim to simplify this with streaming approaches, but prior solutions have required significant setup or proprietary connectors. Streambed builds on logical replication, which has been built into PostgreSQL since version 10, to ingest data continuously into a data lake on S3 using Iceberg. This aligns with industry trends toward real-time analytics and simpler data architecture.
“Streambed streams WAL changes via logical replication, writes Parquet files to S3, and commits Iceberg metadata, supporting real-time analytics without ETL or Spark.”
— Viggy28 (Hacker News user)
“The query server speaks the Postgres wire protocol, so you can connect with psql directly to query your streamed data.”
— Viggy28 (Hacker News user)
What Remains Unclear
Details about performance at scale, stability in production environments, and long-term maintenance are still emerging. It is not yet clear how well Streambed handles very high throughput or complex schema changes, and user feedback is limited to initial releases.
What’s Next
Next steps include broader testing and adoption, potential feature enhancements such as support for more complex schema evolution, and integration with cloud-native orchestration tools. Developers may also explore deploying Streambed in production environments to evaluate performance and reliability.
Key Questions
How does Streambed compare to traditional ETL pipelines?
Streambed provides real-time streaming of Postgres changes directly into Iceberg on S3, eliminating the need for batch ETL jobs and reducing latency. It simplifies architecture by avoiding Spark or other heavy processing frameworks.
Can I query the streamed data with standard Postgres tools?
Yes, Streambed includes a query server that exposes Iceberg tables over the Postgres wire protocol, allowing connection with psql and other Postgres-compatible clients.
What are the system requirements to run Streambed?
Streambed requires Go 1.22+ and CGO. It can be run locally using Docker or deployed directly on servers. It also depends on a Postgres instance with logical replication enabled and an S3-compatible storage service like MinIO or AWS S3.
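As a rough illustration of what "logical replication enabled" means on the Postgres side, the sketch below checks three standard server settings. The setting names are real PostgreSQL parameters, but the function name and the idea of running this check are assumptions for illustration; it is not taken from Streambed's setup instructions, which may require more or different settings.

```python
def logical_replication_problems(settings):
    """Return a list of problems that would block logical replication.

    `settings` maps PostgreSQL setting names to their values as strings,
    for example as returned by running SHOW on each name.
    """
    problems = []
    # Logical decoding needs wal_level set to 'logical'.
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be 'logical'")
    # A subscriber needs at least one replication slot and one WAL sender.
    for name in ("max_replication_slots", "max_wal_senders"):
        if int(settings.get(name, 0)) < 1:
            problems.append(f"{name} must be at least 1")
    return problems


default_like = {
    "wal_level": "replica",
    "max_replication_slots": "10",
    "max_wal_senders": "10",
}
ready = dict(default_like, wal_level="logical")

found = logical_replication_problems(default_like)
none_found = logical_replication_problems(ready)
```

Here `found` contains a single message about `wal_level`, and `none_found` is empty. Changing `wal_level` requires a server restart, so it is worth checking before starting the sync component.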
Is Streambed suitable for high-volume production environments?
While initial release details are promising, performance at scale and stability in production are still under evaluation. Users should conduct testing before deploying in critical systems.
Source: Hacker News