From d49b8be4d93fbac2a4b58612464dc6cac232e7e8 Mon Sep 17 00:00:00 2001
From: Simon Petit
Date: Mon, 10 Nov 2025 15:06:15 +0100
Subject: [PATCH] enabling scrolling on overflow code

---
 css/poststyle.css      |  1 +
 drafts/postgres_cdc.md | 50 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)
 create mode 100644 drafts/postgres_cdc.md

diff --git a/css/poststyle.css b/css/poststyle.css
index 283e783..f639a39 100644
--- a/css/poststyle.css
+++ b/css/poststyle.css
@@ -37,6 +37,7 @@ pre {
     background-color: lightgrey;
     border-radius: 0.5em;
     display: flex;
+    overflow-x: auto;
 }
 
 pre > code {
diff --git a/drafts/postgres_cdc.md b/drafts/postgres_cdc.md
new file mode 100644
index 0000000..76ca41c
--- /dev/null
+++ b/drafts/postgres_cdc.md
@@ -0,0 +1,50 @@
+# Postgres CDC
+
+## What is CDC?
+
+CDC stands for Change Data Capture. It is a mechanism that enables the replication of a database.
+That is, we listen to changes on the tables of the database so that we can replicate them into
+another database.
+
+This is used in data engineering pipelines to extract data from sources and to replicate
+it into a data warehouse, data lake or lakehouse, for example.
+This way it is possible to run analyses over the data without impacting the transactional
+database, which is used by another application as its primary storage.
+
+The other advantage of replicating the database is that the copy can be stored in a different format.
+For example, it is possible to store the resulting mirrored database as Parquet files,
+or any columnar storage format, to speed up analytics queries.
+
+## Replication in Postgres
+
+The database needs some configuration to enable replication sufficient for a CDC data pipeline.
+
+First, the following three lines must be added to the `postgresql.conf` file:
+- `wal_level=logical`
+- `max_replication_slots=10`
+- `max_wal_senders=10`
+
+Here follows a quick explanation of what each of these parameters means:
+
+### wal_level
+
+WAL stands for Write-Ahead Log. This is the log written by Postgres to
+record all operations on the database.
+By default the level is `replica`, which is .... [TODO]
+but for CDC we need the highest level, `logical`. This level records every transaction
+happening in the database, to the point that we can literally reconstruct the database
+from the log, which is exactly what CDC is trying to achieve.
+
+### max_replication_slots
+
+Here comes another concept: replication slots.
+These are ... .[TODO]
+Naturally, WAL files are not kept forever, hence we need to configure replication slots
+so that unread WAL is not removed before our CDC pipeline has had the chance to read it.
+
+### max_wal_senders
+
+[TODO]
+
+## Publications
+
\ No newline at end of file