Change Data Capture in Embedded Databases
July 13, 2020
Change Data Capture (CDC) is broadly defined as tracking changes in a database. The purposes of tracking changes are many and varied. In embedded database systems, CDC can be implemented in several ways: some are invisible to applications, while others can be exploited by applications for data sharing, responding to events, and incremental backup.
The first, and possibly most obvious, implementation of CDC in embedded databases is part and parcel of implementing the ACID properties of transactions: Atomicity, Consistency, Isolation and Durability. The successful application of a transaction moves a database from one consistent state to a new consistent state. Conversely, if a transaction fails, the database must be returned to the consistent state that existed just before the transaction began. To meet these requirements, a database management system must keep track of changes. Implementation details vary from one database system to another, and even within a single database system family, depending on whether the database is a pure in-memory database, a persistent in-memory database, or a database that is partially or fully persistent (a hybrid database).
In a pure in-memory database, there is no transaction log in which to record changes. Instead, either the changes or the before-images of the changed records must be kept in a buffer while the transaction is active, so that the database can be restored to its pre-transaction state if the transaction is aborted.
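As a rough illustration, the following Python sketch keeps before-images in a buffer for the duration of a transaction. The class and method names (`InMemoryStore`, `begin`, `put`, `delete`, `commit`, `rollback`) are hypothetical and do not represent any particular product's API; a real engine would buffer records or pages rather than dictionary entries.

```python
class InMemoryStore:
    """Hypothetical pure in-memory key/value store with no transaction log."""

    _MISSING = object()  # marks keys that did not exist before the transaction

    def __init__(self):
        self.data = {}
        self._undo = None  # before-image buffer; None when no transaction is active

    def begin(self):
        self._undo = {}

    def _save_before_image(self, key):
        if self._undo is None:
            raise RuntimeError("no active transaction")
        # Only the first change to a key matters: that is its pre-transaction state.
        if key not in self._undo:
            self._undo[key] = self.data.get(key, self._MISSING)

    def put(self, key, value):
        self._save_before_image(key)
        self.data[key] = value

    def delete(self, key):
        self._save_before_image(key)
        self.data.pop(key, None)

    def commit(self):
        self._undo = None  # changes stand; the before-images are no longer needed

    def rollback(self):
        if self._undo is None:
            raise RuntimeError("no active transaction")
        # Restore every touched key to its pre-transaction state.
        for key, before in self._undo.items():
            if before is self._MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = before
        self._undo = None


store = InMemoryStore()
store.begin()
store.put("a", 1)
store.commit()
store.begin()
store.put("a", 2)
store.put("b", 3)
store.rollback()
print(store.data)  # {'a': 1}
```

Recording only the first before-image per key keeps the buffer proportional to the number of distinct records a transaction touches, not the number of writes.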
For in-memory databases with persistence, all changes are also appended to a transaction log stored on persistent media; after a crash, this log can be replayed to recover the database.
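A minimal sketch of the idea, assuming a hypothetical JSON-lines log format in which each committed transaction is written as its changes followed by a commit record. Recovery replays the log and applies only transactions whose commit record reached the media; a torn final line (a crash in mid-write) simply ends the replay. The function names are illustrative; `os.fsync` is the standard Python call for forcing file data to stable storage.

```python
import json
import os


def append_commit(log_path, txn_id, changes):
    """Append one transaction's changes, then its commit record, to the log."""
    with open(log_path, "a", encoding="utf-8") as log:
        for key, value in changes.items():
            log.write(json.dumps({"txn": txn_id, "op": "put", "key": key, "value": value}) + "\n")
        log.write(json.dumps({"txn": txn_id, "op": "commit"}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # the commit is durable only once the log is on disk


def replay(log_path):
    """Rebuild the in-memory image after a crash from committed transactions only."""
    data = {}
    pending = {}  # txn id -> [(key, value), ...] not yet known to be committed
    if not os.path.exists(log_path):
        return data
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break  # torn record left by a crash during the last write
            if rec["op"] == "put":
                pending.setdefault(rec["txn"], []).append((rec["key"], rec["value"]))
            elif rec["op"] == "commit":
                for key, value in pending.pop(rec["txn"], []):
                    data[key] = value
    return data
```

Writing the commit record last is what makes replay safe: a transaction whose commit record never reached the media is treated as if it never happened.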
For persistent (disk-based) databases, transaction logging is also used, both to optimize performance and to support crash recovery. Here, two forms of transaction logging can be offered: UNDO logging and write-ahead logging (WAL). WAL works as described in the previous paragraph: changes are appended to the log, and the log is replayed for recovery. UNDO logging writes the before-images of changed records to the transaction log file. In the event of a crash, the UNDO log is used to roll back any incomplete transaction (i.e., to return the database to its last consistent state).
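For contrast, the sketch below models UNDO logging, with a Python list standing in for the log file; all names are hypothetical. The before-image of each record is logged before the record is changed in place, and recovery walks the log backwards, restoring the before-images written by any transaction that has no commit record. The sketch assumes, as UNDO logging requires, that a transaction's changes reach the database before its commit record is written.

```python
def undo_write(db, undo_log, txn_id, key, value):
    # 1. Log the before-image first (a real engine forces this record to disk).
    undo_log.append({"txn": txn_id, "key": key, "existed": key in db, "before": db.get(key)})
    # 2. Only then update the record in place.
    db[key] = value


def undo_commit(undo_log, txn_id):
    # Written only after the transaction's changes are in the database itself.
    undo_log.append({"txn": txn_id, "commit": True})


def undo_recover(db, undo_log):
    committed = {rec["txn"] for rec in undo_log if rec.get("commit")}
    # Walk backwards so the oldest before-image of a key is restored last.
    for rec in reversed(undo_log):
        if rec.get("commit") or rec["txn"] in committed:
            continue
        if rec["existed"]:
            db[rec["key"]] = rec["before"]
        else:
            db.pop(rec["key"], None)
    return db


db, log = {"x": 1}, []
undo_write(db, log, 1, "x", 2)
undo_commit(log, 1)
undo_write(db, log, 2, "x", 3)
undo_write(db, log, 2, "y", 4)  # crash before transaction 2 commits
print(undo_recover(db, log))  # {'x': 2}
```

The two schemes mirror each other: WAL replays committed work forward, while UNDO logging erases uncommitted work backward.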
Another internal use of CDC in some database systems is the implementation of optimistic concurrency control via MVCC (Multi-Version Concurrency Control). Optimistic concurrency control means that applications do not have to acquire locks, which also means that an application never has to wait for a lock held by another application. This requires the database system to detect when two applications attempt to modify the same database object at the same time. It does so by tracking version numbers that are checked when a transaction is committed (hence the name, multi-version concurrency control). If an object’s version has changed between the time the application acquired a copy of the object and the time the application commits a change to that object, another application modified the object first, and this transaction must be aborted and retried. The premise of MVCC is that such conflicts are rare, so an occasional retry is more efficient, overall, than always acquiring locks and potentially blocking other applications with them.
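The commit-time version check can be sketched in a few lines of Python. This is a single-threaded illustration with hypothetical names (`VersionedStore`, `VersionConflict`, `increment_with_retry`); it shows only the validation step described above, not a full MVCC engine, which would also keep older versions for readers and make validation and apply a single atomic step.

```python
class VersionConflict(Exception):
    """Another transaction committed a newer version of the object first."""


class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # No lock is taken; the caller remembers the version it saw.
        return self._rows.get(key, (0, None))

    def commit(self, writes):
        """writes maps key -> (version_read, new_value)."""
        # Validate every object first, so a conflict leaves nothing half-applied.
        for key, (seen, _) in writes.items():
            if self._rows.get(key, (0, None))[0] != seen:
                raise VersionConflict(key)
        for key, (seen, value) in writes.items():
            self._rows[key] = (seen + 1, value)


def increment_with_retry(store, key, attempts=3):
    for _ in range(attempts):
        version, value = store.read(key)
        try:
            store.commit({key: (version, (value or 0) + 1)})
            return
        except VersionConflict:
            continue  # lost the race: re-read the new version and retry
    raise VersionConflict(key)


store = VersionedStore()
v1, _ = store.read("k")  # two applications read version 0
v2, _ = store.read("k")
store.commit({"k": (v1, "A")})  # first commit succeeds; "k" is now version 1
try:
    store.commit({"k": (v2, "B")})  # second application still holds version 0
except VersionConflict:
    print("conflict: retry")
```

Neither application ever blocked; the loser discovers the conflict only at commit time and pays for it with a retry.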
CDC is also used in High Availability (HA) implementations for systems that require “five 9s” availability (i.e., 99.999% uptime). Different database vendors implement HA in different ways: real-time transaction replication, SQL statement replication, log file forwarding, etc. There are so-called 1-safe (aka lazy, asynchronous) and 2-safe (aka eager, synchronous) implementations. In a 1-safe implementation, the primary commits without waiting for the replica, so the most recent transactions can be lost if the primary fails; in a 2-safe implementation, a commit completes only once the replica has received the transaction. For example, real-time transaction replication can be either 1-safe or 2-safe (in eXtremeDB, we refer to the latter as time-cognizant two-phase commit); log forwarding is inherently 1-safe.
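The difference between the two modes comes down to when the primary reports a commit. The Python sketch below uses hypothetical `Primary` and `Replica` classes and reduces the network to a direct method call; it is not how eXtremeDB or any other product implements replication, and a real 2-safe protocol uses a two-phase commit so that the primary and replica agree on the outcome.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, changes):
        self.data.update(changes)
        return True  # acknowledgement sent back to the primary


class Primary:
    def __init__(self, replica, two_safe):
        self.data = {}
        self.replica = replica
        self.two_safe = two_safe
        self.outbox = deque()  # 1-safe only: committed changes not yet shipped

    def commit(self, changes):
        if self.two_safe:
            # 2-safe: the commit succeeds only after the replica acknowledges it.
            if not self.replica.apply(changes):
                raise RuntimeError("replica did not acknowledge; not committed")
            self.data.update(changes)
        else:
            # 1-safe: commit locally and ship later; a crash now loses the outbox.
            self.data.update(changes)
            self.outbox.append(changes)

    def ship(self):
        # Background step in a real system: forward queued changes to the replica.
        while self.outbox:
            self.replica.apply(self.outbox.popleft())


replica = Replica()
primary = Primary(replica, two_safe=False)
primary.commit({"a": 1})
print(replica.data)  # {}
primary.ship()
print(replica.data)  # {'a': 1}
```

The window between `commit` and `ship` is exactly the data a 1-safe system can lose on failover; 2-safe closes that window at the cost of a round trip on every commit.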
CDC can also be exploited, within the embedded database itself and/or by third-party systems, for purposes that applications can use directly: open replication, triggers/event notifications, and incremental backup.
Some embedded database systems implement replication to, e.g., support High Availability and/or database clusters. Such systems make it easy to replicate data or transactions between two or more instances of the same database system, but they are not a solution when data must be replicated from an embedded database to some other destination. Third-party products such as Actian DataConnect and Oracle GoldenGate attempt to fill this gap. Alternatively, a custom extract-transform-load (ETL) solution can be built, although it can be sensitive to changes in the source and/or destination databases.
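A custom ETL path can be as simple as reading captured change records and applying them to the destination. In this sketch the change-record format (`op`, `id`, `temp_tenths`) and the tenths-of-a-degree transform are hypothetical, and Python's built-in `sqlite3` module stands in for "some other destination." The loader's coupling to both schemas is exactly the sensitivity noted above: renaming a field on either side breaks it.

```python
import sqlite3


def transform(change):
    # Hypothetical transform: source stores tenths of a degree, destination wants Celsius.
    return change["id"], change["temp_tenths"] / 10


def load(conn, changes):
    with conn:  # one destination transaction per batch; committed on success
        for change in changes:
            if change["op"] == "delete":
                conn.execute("DELETE FROM sensor WHERE id = ?", (change["id"],))
            else:  # inserts and updates both become an upsert
                conn.execute(
                    "INSERT OR REPLACE INTO sensor (id, celsius) VALUES (?, ?)",
                    transform(change),
                )


dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, celsius REAL)")
load(dest, [
    {"op": "insert", "id": 1, "temp_tenths": 215},
    {"op": "update", "id": 1, "temp_tenths": 220},
    {"op": "insert", "id": 2, "temp_tenths": 100},
    {"op": "delete", "id": 2},
])
print(dest.execute("SELECT id, celsius FROM sensor").fetchall())  # [(1, 22.0)]
```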
Triggers and event notification schemes are a classic use case of change data capture. After all, triggers fire when INSERT, UPDATE or DELETE statements execute against specified tables. In other words, data has changed, and the triggers capture that change so that the surrounding system can act on it.
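To make this concrete, the sketch below uses SQLite (through Python's standard `sqlite3` module) as a stand-in embedded database; the table, trigger, and callback names are hypothetical. The trigger captures each balance change twice: as a row in a change-log table, and as a call into an application function registered with `create_function`, which plays the role of an event notification.

```python
import sqlite3

events = []  # stands in for the "surrounding system" that reacts to changes


def on_balance_change(account_id, old, new):
    events.append((account_id, old, new))


conn = sqlite3.connect(":memory:")
conn.create_function("on_balance_change", 3, on_balance_change)
conn.executescript("""
    -- Some SQLite builds block app functions in triggers unless the schema is trusted.
    PRAGMA trusted_schema = ON;
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE change_log (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        account_id INTEGER, old_balance INTEGER, new_balance INTEGER);
    -- Fires once per updated row; OLD and NEW hold the before- and after-images.
    CREATE TRIGGER account_audit AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO change_log (account_id, old_balance, new_balance)
        VALUES (OLD.id, OLD.balance, NEW.balance);
        SELECT on_balance_change(OLD.id, OLD.balance, NEW.balance);
    END;
""")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.commit()
print(events)  # [(1, 100, 70)]
```

The change-log table gives durable, queryable history, while the callback gives an immediate notification; many systems offer one or both.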
The last significant purpose of Change Data Capture in embedded database systems is the implementation of incremental backup facilities. By necessity, an incremental backup scheme has to know which changes occurred in the database since either the last full snapshot or the last incremental backup.
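One way to give a backup scheme that knowledge is to record which keys change between backups. The sketch below is hypothetical (`TrackedStore`, `incremental_backup`, and `restore` are illustrative names); real systems often track changed pages or log positions rather than individual keys, but the principle is the same: each increment holds only what changed since the previous backup, and a restore applies the full snapshot followed by every increment in order.

```python
class TrackedStore:
    def __init__(self):
        self.data = {}
        self._changed = set()  # keys modified since the last backup of either kind

    def put(self, key, value):
        self.data[key] = value
        self._changed.add(key)

    def delete(self, key):
        self.data.pop(key, None)
        self._changed.add(key)

    def full_backup(self):
        self._changed.clear()
        return dict(self.data)

    def incremental_backup(self):
        # Changed keys that still exist are saved; the rest are recorded as deletions.
        delta = {
            "puts": {k: self.data[k] for k in self._changed if k in self.data},
            "deletes": [k for k in self._changed if k not in self.data],
        }
        self._changed.clear()
        return delta


def restore(full, increments):
    data = dict(full)
    for delta in increments:  # increments must be applied in the order they were taken
        data.update(delta["puts"])
        for key in delta["deletes"]:
            data.pop(key, None)
    return data


store = TrackedStore()
store.put("a", 1)
store.put("b", 2)
full = store.full_backup()
store.put("a", 10)
store.delete("b")
inc1 = store.incremental_backup()
store.put("c", 3)
inc2 = store.incremental_backup()
print(restore(full, [inc1, inc2]))  # {'a': 10, 'c': 3}
```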
Change Data Capture is central to any database management system. It is instrumental in enforcing the ACID properties of transactions (in particular atomicity, isolation through concurrency control, and durability), as well as in replication, triggers/event notifications, and backup and restore.
About the Author
Steve Graves co-founded McObject in 2001. As the company’s president and CEO, he has both spearheaded McObject’s growth and helped the company attain its goal of providing embedded database technology that makes embedded systems smarter, more reliable and more cost-effective to develop and maintain. Prior to McObject, Graves was president and chairman of Centura Solutions Corporation, and vice president of worldwide consulting for Centura Software Corporation (NASDAQ: CNTR); he also served as president and chief operating officer of Raima Corporation. Graves is a member of the advisory board for the University of Washington’s certificate program in Embedded and Real Time Systems Programming.