enabling scrolling on overflow code

Simon Petit 2025-11-10 15:06:15 +01:00
parent 2c4319afcb
commit d49b8be4d9
2 changed files with 51 additions and 0 deletions


@@ -37,6 +37,7 @@ pre {
background-color: lightgrey;
border-radius: 0.5em;
display: flex;
overflow-x: auto;
}
pre > code {

drafts/postgres_cdc.md Normal file

@@ -0,0 +1,50 @@
# Postgres CDC
## What is CDC?
CDC stands for Change Data Capture. It is a mechanism for replicating a database:
we listen to the changes made to its tables so that we can reproduce them in
another database.
It is used in data engineering pipelines to extract data from sources and replicate
it into a data warehouse, data lake or lakehouse, for example.
This makes it possible to analyse the data without impacting the transactional
database, which another application uses as its primary storage.
Another advantage of replicating the database is that the copy can be stored in a different layout.
For example, the mirrored database can be stored as Parquet files,
or in any other columnar storage format, to speed up analytical queries.
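To make the idea of "listening to changes and reproducing them" concrete, here is a toy sketch in Python. It is not how Postgres represents changes: the event shape (`op`, `key`, `row`) and the function names are hypothetical, chosen only to illustrate replaying a stream of changes onto a copy.

```python
# Toy illustration only: real CDC events are decoded from the database logs.
# The event shape used here (op / key / row) is hypothetical.

def apply_change(replica: dict, event: dict) -> None:
    """Apply one change event to an in-memory copy of a table, keyed by primary key."""
    op = event["op"]
    if op in ("insert", "update"):
        replica[event["key"]] = event["row"]
    elif op == "delete":
        replica.pop(event["key"], None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


def replay(events: list) -> dict:
    """Rebuild the copy by applying the events in the order they were committed."""
    replica = {}
    for event in events:
        apply_change(replica, event)
    return replica


events = [
    {"op": "insert", "key": 1, "row": {"name": "alice"}},
    {"op": "insert", "key": 2, "row": {"name": "bob"}},
    {"op": "update", "key": 1, "row": {"name": "alicia"}},
    {"op": "delete", "key": 2},
]
replica = replay(events)
```

The order of the events matters: replaying the same events in a different order could leave the copy in a different state than the source.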
## Replication in Postgres
The database needs some configuration to enable replication suitable for a CDC data pipeline.
First, the following three lines must be added to the `postgresql.conf` file:
- `wal_level=logical`
- `max_replication_slots=10`
- `max_wal_senders=10`
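As a quick sanity check, the three settings listed above can be compared against the content of a configuration file. The sketch below is a simplified parser written for illustration: the helper names are hypothetical, and it ignores quoting rules, units and include directives that a real `postgresql.conf` may use.

```python
# Values taken from the list above. A higher number of slots or senders would
# also work for CDC, so treat a reported difference as a hint, not an error.
EXPECTED = {
    "wal_level": "logical",
    "max_replication_slots": "10",
    "max_wal_senders": "10",
}


def parse_conf(text: str) -> dict:
    """Parse `name = value` lines, dropping `#` comments (simplified)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" in line:
            name, value = line.split("=", 1)
        else:
            # The equals sign is optional in postgresql.conf.
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue
            name, value = parts
        # A later occurrence of the same setting overrides an earlier one.
        settings[name.strip().lower()] = value.strip().strip("'")
    return settings


def settings_differing_from_doc(text: str) -> dict:
    """Return the expected settings that are absent or set to another value."""
    current = parse_conf(text)
    return {name: want for name, want in EXPECTED.items() if current.get(name) != want}


sample = """
# replication
wal_level = logical
max_wal_senders=10   # enough for a few consumers
"""
differences = settings_differing_from_doc(sample)
```

Note that all three parameters are read at server start, so Postgres has to be restarted after changing them.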
Here is a quick explanation of what each of these parameters means:
### wal_level
WAL stands for Write-Ahead Log. It is the log in which Postgres records
every change to the database's data before the change is applied to the data files.
By default the level is `replica`, which is .... [TODO]
but for CDC we need the highest level, `logical`. This level adds the information
needed for logical decoding, so that row-level changes (inserts, updates, deletes) can be
extracted from the logs and replayed elsewhere, which is exactly what CDC is trying to achieve.
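Before starting a pipeline it can be worth checking that the running server really uses `logical`. The sketch below assumes a DB-API 2.0 cursor, such as the one a Postgres driver provides. The function names are hypothetical, and `StubCursor` is only a stand-in so that the example runs without a server.

```python
def current_wal_level(cursor) -> str:
    """Return the server's wal_level using any DB-API 2.0 cursor."""
    cursor.execute("SHOW wal_level")
    return cursor.fetchone()[0]


def is_ready_for_cdc(cursor) -> bool:
    """True when the server writes enough WAL detail for logical decoding."""
    return current_wal_level(cursor) == "logical"


class StubCursor:
    """Stand-in for a real cursor so the sketch runs without a server."""

    def __init__(self, level: str):
        self._level = level
        self.last_query = None

    def execute(self, query: str) -> None:
        self.last_query = query

    def fetchone(self):
        return (self._level,)


default_ok = is_ready_for_cdc(StubCursor("replica"))   # default level
logical_ok = is_ready_for_cdc(StubCursor("logical"))   # level needed for CDC
```

With a real connection, the same functions would be called with the cursor returned by the driver instead of `StubCursor`.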
### max_replication_slots
Here comes another concept: replication slots.
These are ... .[TODO]
Naturally, WAL is not kept forever, so we need replication slots to ensure
that unread WAL is not removed before our CDC pipeline has had the chance to read it.
`max_replication_slots` sets the maximum number of slots the server can maintain.
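Because a slot makes the server retain WAL until its consumer has read it, a stalled pipeline can let WAL pile up on disk. The sketch below shows one way to watch for that, assuming a DB-API 2.0 cursor connected to the primary server. The query uses the `pg_replication_slots` view together with `pg_current_wal_lsn()` and `pg_wal_lsn_diff()`. The helper names, the slot names and the 1 GB limit are hypothetical, and `StubCursor` only stands in for a real cursor.

```python
# Bytes of WAL between the current write position and the oldest position
# each slot still needs.
RETAINED_WAL_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""


def retained_wal_bytes(cursor) -> dict:
    """Map each slot name to the WAL bytes retained for it (None if unknown)."""
    cursor.execute(RETAINED_WAL_SQL)
    return {
        name: (None if retained is None else int(retained))
        for name, retained in cursor.fetchall()
    }


def slots_over_limit(retained: dict, limit_bytes: int) -> list:
    """Names of the slots holding back more WAL than the given limit."""
    return sorted(
        name for name, size in retained.items()
        if size is not None and size > limit_bytes
    )


class StubCursor:
    """Stand-in for a real cursor so the sketch runs without a server."""

    def __init__(self, rows: list):
        self._rows = rows

    def execute(self, query: str) -> None:
        self.last_query = query

    def fetchall(self) -> list:
        return self._rows


rows = [("cdc_pipeline", 5_000_000_000), ("idle_slot", 1024), ("unused", None)]
retained = retained_wal_bytes(StubCursor(rows))
lagging = slots_over_limit(retained, limit_bytes=1_000_000_000)
```

A slot that is no longer consumed should be dropped, otherwise the WAL it retains keeps growing.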
### max_wal_senders
[TODO]
## Publications