Change Data Capture (CDC) is a technique that identifies and captures changes (inserts, updates, deletes) in a database and propagates them to other systems in real time or near real time.
CDC is critical for data integration, real-time analytics, data synchronization, and microservices architectures, where it keeps disparate data sources consistent.
Compared with traditional full-data ETL, CDC processes only the changed data, which significantly reduces I/O load on the source database, network bandwidth usage, and storage costs on the target. It supports real-time or near-real-time synchronization while retaining the history of data changes.
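
To make the contrast with a full reload concrete, here is a minimal sketch of how a consumer might apply a stream of change events to a target copy. The `ChangeEvent` shape and the `apply_changes` helper are hypothetical names chosen for illustration, not the format of any particular CDC tool; the target is modeled as a plain dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical change record; real CDC tools define their own formats."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(target: dict, events: list) -> dict:
    """Apply events in order; rows that did not change are never touched."""
    for ev in events:
        if ev.op in ("insert", "update"):
            target[ev.key] = ev.row
        elif ev.op == "delete":
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return target


# Target copy holding three rows before the sync.
target = {1: {"name": "alice"}, 2: {"name": "bob"}, 3: {"name": "carol"}}

# Only three events are shipped, however large the source table is.
events = [
    ChangeEvent("update", 2, {"name": "robert"}),
    ChangeEvent("delete", 3),
    ChangeEvent("insert", 4, {"name": "dave"}),
]
apply_changes(target, events)
```

Row 1 is never read or rewritten, which is where the I/O and bandwidth savings over a full reload come from. Applying events in order matters: replaying them out of sequence could resurrect a deleted row or overwrite a newer value.
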
| Mode | Core Principle | Advantages | Disadvantages | Applicable Scenarios |
|---|---|---|---|---|
| Log-based | Parse database transaction logs (MySQL binlog, PostgreSQL WAL, Oracle Redo Log) | High real-time performance (millisecond level), low intrusiveness, supports full + incremental data synchronization, no data loss | Requires enabling database logs, relies on log formats, some databases need specific permissions | High-concurrency real-time integration systems, core business systems |
| Query-based | Periodically poll incremental fields (e.g., update_time > last_sync) | Simple to implement, no need to modify the source database, no permission dependencies | High latency (minute/hour level), potential data omission risk, repeated scanning may occur | Non-real-time scenarios, small-scale systems, low-cost integration solutions |
| Trigger-based | Deploy database triggers to capture data changes at the transaction level | Direct capture of change events, high accuracy, compatible with most databases | High intrusiveness (occupies database resources), may affect transaction performance, trigger logic maintenance is complex | Legacy system transformation with limited log access, small-scale data synchronization |
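
The query-based row can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions rather than a production design: the `users` table, its integer `update_time` column, and the `poll_changes` helper are all hypothetical, and an in-memory database stands in for the source system.

```python
import sqlite3


def poll_changes(conn, last_sync):
    """Return rows with update_time > last_sync, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, update_time FROM users "
        "WHERE update_time > ? ORDER BY update_time",
        (last_sync,),
    ).fetchall()
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else last_sync
    return rows, new_watermark


# Hypothetical source table with an integer update_time column (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, update_time INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 110)],
)

# First poll starts from watermark 0 and picks up both rows.
first_batch, watermark = poll_changes(conn, 0)

# Between polls, one row is updated and another is deleted.
conn.execute("UPDATE users SET name = 'robert', update_time = 120 WHERE id = 2")
conn.execute("DELETE FROM users WHERE id = 1")

# Second poll sees the update but has no way to observe the delete.
second_batch, watermark = poll_changes(conn, watermark)
```

The second poll returns only the updated row. The deleted row simply disappears from the source, so the target never learns about it, which is one form of the data omission risk listed in the table. Rows written with a timestamp equal to the current watermark would also be skipped by the strict `>` comparison.
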
see also: