What Could Go Wrong with a Kafka JDBC Connector?


Manage episode 336673215 series 2510642
By Confluent, founded by the original creators of Apache Kafka® and Founded by the original creators of Apache Kafka®. Discovered by Player FM and our community — copyright is owned by the publisher, not Player FM, and audio is streamed directly from their servers. Hit the Subscribe button to track updates in Player FM, or paste the feed URL into other podcast apps.

Java Database Connectivity (JDBC) is the Java API used to connect to a database. As one of the most popular Kafka connectors, it's important to prevent issues with your integrations.
In this episode, we'll cover how a JDBC connection works, and common issues with your database connection.
Why the Kafka JDBC Connector?

When it comes to streaming database events into Apache Kafka®, the JDBC connector usually represents the first choice for its flexibility and the ability to support a wide variety of databases without requiring custom code. As an experienced data analyst, Francesco Tisiot (Senior Developer Advocate, Aiven) delves into his experience of streaming Kafka data pipeline with JDBC source connector and explains what could go wrong. He discusses alternative options available to avoid these problems, including the Debezium source connector for real-time change data capture.

The JDBC connector is a Java API for Kafka Connect, which streams data between databases and Kafka. If you want to stream data from a rational database into Kafka, once per day or every two hours, the JDBC connector is a simple, batch processing connector to use. You can tell the JDBC connector which query you’d like to execute against the database, and then the connector will take the data into Kafka.

The connector works well with out-of-the-box basic data types, however, when it comes to a database-specific data type, such as geometrical columns and array columns in PostgresSQL, these don’t represent well with the JDBC connector. Perhaps, you might not have any results in Kafka because the column is not within the connector’s supporting capability. Francesco shares other cases that would cause the JDBC connector to go wrong, such as:

  • Infrequent snapshot times
  • Out-of-order events
  • Non-incremental sequences
  • Hard deletes

To help avoid these problems and set up a reliable source of events for your real-time streaming pipeline, Francesco suggests other approaches, such as the Debezium source connector for real-time change data capture. The Debezium connector has enhanced metadata, timestamps of the operation, access to all logs, and provides sequence numbers for you to speak the language of a DBA.

They also talk about the governance tool, which Francesco has been building, and how streaming Game of Thrones sentiment analysis with Kafka started his current role as a developer advocate.

255 episodes