A Kafka Introduction

  • Comments posted to this topic are about the item A Kafka Introduction

  • I've used AWS SQS (Simple Queue Service) and GCP Pub-Sub.  My company places an emphasis on consuming services rather than administering them.

    If we were to use Kafka then we would probably use AWS MSK.  In a previous company I used Kafka and experimented with KSQL to query the streamed data.  The danger was that people with technical decision-making authority perceived it to be an event database rather than an event stream.

    You can retain data in a Kafka topic for a configurable amount of time.  The problem with holding data for an extended period is that, in a disaster recovery scenario, it can take time for the data to become fully available because it is distributed across the Kafka nodes.  Kafka really MUST NOT be perceived as a database.
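
    To make the retention point concrete, here is a minimal sketch of time-based retention using a toy in-memory log.  It is not Kafka code: the `RetainedLog` class and its methods are hypothetical names made up for illustration.  In Kafka itself the retention window is set per topic with `retention.ms`, and the broker deletes whole log segments rather than individual records.

```python
from collections import deque


class RetainedLog:
    """Toy append-only log with time-based retention (illustration only, not Kafka)."""

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self.records = deque()  # (timestamp_ms, value) pairs, oldest first

    def append(self, timestamp_ms, value):
        self.records.append((timestamp_ms, value))

    def expire(self, now_ms):
        """Drop records older than the retention window; return how many were dropped."""
        dropped = 0
        while self.records and now_ms - self.records[0][0] > self.retention_ms:
            self.records.popleft()
            dropped += 1
        return dropped

    def values(self):
        return [value for _, value in self.records]


log = RetainedLog(retention_ms=1000)
log.append(0, "order-created")
log.append(600, "order-paid")
log.append(1500, "order-shipped")

dropped = log.expire(now_ms=1700)  # the first two records are now older than 1000 ms
remaining = log.values()
```

    Once a record ages out it is gone for every reader, which is one reason a stream with retention is a poor substitute for a database.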

    In terms of using it, it was fine: quite stable and reliable.  I'm not convinced that we needed the capabilities it offered.  The use case was more at the SQS/Pub-Sub level.

    To be honest, there are a lot of distributed systems (Airflow, Kafka, Kubernetes) that I believe are massive overkill for what they are actually used for, as opposed to the use case they were designed to cover.  That isn't to denigrate them in any way; they are popular for a reason.

  • Steve, I was just thinking yesterday that I want to know more about Kafka. It's leveraged by one of the application teams to replicate a subset of tables, columns, and most recent data from our 10 TB data warehouse to a much smaller 10 GB database optimized specifically for use by a mobile app they are developing.

    Is this introduction the beginning of a stairway or series? I'd like to hear Kafka explained from your DBA perspective.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I don't know if I'll write a lot on Kafka. The deck from the talk (https://pitch.com/v/apache-kafka-demystified-breaking-down-the-basics-that-conference-naq5ab/682c5ba5-2406-421b-94de-8105e22313ea) was interesting, and I was thinking of downloading a Kafka container and setting up a producer with a few consumers to see how well it can move data to multiple locations and how it works.
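
    As a rough picture of the "one producer, a few consumers" experiment, here is a minimal sketch using a toy in-memory topic.  It is not a Kafka client: `Topic`, `Consumer`, and the consumer names are hypothetical, and a real test would need a broker and a client library.  The idea it tries to show is that each consumer tracks its own offset into the same log, so reading never removes data for anyone else (in Kafka, independent readers like these would sit in separate consumer groups).

```python
class Topic:
    """Toy single-partition topic: an append-only list of events."""

    def __init__(self):
        self.log = []

    def produce(self, value):
        self.log.append(value)
        return len(self.log) - 1  # offset of the record just written


class Consumer:
    """Toy consumer that remembers its own position in the topic."""

    def __init__(self, name, topic):
        self.name = name
        self.topic = topic
        self.offset = 0  # next offset this consumer will read

    def poll(self, max_records=10):
        batch = self.topic.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


topic = Topic()
for i in range(5):
    topic.produce(f"event-{i}")

warehouse = Consumer("warehouse", topic)
mobile = Consumer("mobile", topic)

first = warehouse.poll(max_records=3)  # warehouse reads the first three events
everything = mobile.poll()             # mobile still sees all five
rest = warehouse.poll()                # warehouse resumes where it left off
```

    Because the consumers only move their own offsets, a slow or offline consumer does not hold up the others; it just has more to catch up on, provided the data is still within retention.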

    From a DBA perspective, I don't know how much I can (or want to) dive into how the retention and various sizing parameters would impact your architecture. I know that this seems more like a multi-stream Service Broker, which starts to remind me of AGs or multiple repl subscribers, where I need to manage disk as systems go online and offline.

    If you want to experiment and write, Eric, would love to see some content in this area.
