Setting Up a Confluent Kafka Docker Cluster Tutorial

1. Install Docker and Docker Compose

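This tutorial assumes Docker and Docker Compose are already installed. A quick way to confirm the prerequisites before going any further:

# Both commands should print a version; if either fails, install that tool first
docker --version
docker-compose --version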

With the prerequisites in place, download Confluent's all-in-one (KRaft) docker-compose file for Confluent Platform 7.5.3 and bring the cluster up in detached mode:

wget https://raw.githubusercontent.com/confluentinc/cp-all-in-one/7.5.3-post/cp-all-in-one-kraft/docker-compose.yml
docker-compose up -d

2. Verify Kafka Cluster

It takes a few minutes for all the components to spin up. The docker-compose output should look like this once everything has been created:

Creating broker ... done
Creating schema-registry ... done
Creating rest-proxy ... done
Creating connect ... done
Creating ksqldb-server ... done
Creating control-center ... done
Creating ksql-datagen ... done
Creating ksqldb-cli ... done
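
To confirm the containers are actually running (and not just created), you can also check their status from the directory containing the compose file:

# Every service should report an "Up" (or healthy) state
docker-compose ps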

Next, open your web browser and navigate to the Confluent Control Center UI at http://localhost:9021/.

From there you can use the UI to create topics, connectors, and more.
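
If you prefer the command line, you can create and list topics directly against the broker container. This sketch assumes the default container name "broker" and the localhost:9092 listener from the downloaded compose file:

# Create a test topic, then list all topics on the broker
docker exec broker kafka-topics --create --topic demo-topic --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092
docker exec broker kafka-topics --list --bootstrap-server localhost:9092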

3. Additional Configuration

Typically you won't need all of the cluster services for local development, so it is worth disabling some of them in the yaml file. Keep an eye on each service's "depends_on" section so that you don't turn off something another service still depends on.

If all you need is topics and the control UI, the only required services are "broker" and "control-center". The "control-center" service lists several dependencies, but if you only run "broker" the UI will still work; the pages for the missing services simply report that Connect and the other services are unavailable. No need to worry about that.
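
Instead of editing the yaml file, you can also start just the services you want. As a sketch, assuming the service names from the all-in-one file, the --no-deps flag tells docker-compose not to start the services listed under depends_on:

# Start only the broker and Control Center; the remaining dependencies stay down
docker-compose up -d --no-deps broker control-center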

4. Confluent Services

Besides the broker, the all-in-one compose file starts Schema Registry (schema storage and validation for topics), REST Proxy (produce and consume over HTTP), Kafka Connect (source and sink connectors), the ksqlDB server and CLI (SQL-style stream processing), the ksql-datagen sample-data generator, and Control Center (the web UI at http://localhost:9021/). Each runs as its own container, which is why the startup output above lists them individually.

5. Why Not Use Open Source Kafka?

Simple: if you are an enterprise, you use Confluent, period. You get enterprise support for every service in the stack and the peace of mind that comes with it, with experts on call to help you with anything. I've used it, and their enterprise support is outstanding.

The Confluent Control Center allows non-developers to easily create and modify topics and connectors. Confluent connectors like the JDBC Source connector run as long-lived services that keep a watermark of the last processed timestamp and ensure you receive every new record immediately; we tested this and saw end-to-end records (SQL Server to DynamoDB) arrive in milliseconds. You can also provide a regex for the schema and tables, so one connector can easily ingest updates from multiple tables. Finally, you can view messages and schemas in the Control Center UI, which simplifies and speeds up integration testing.
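
As a rough sketch, this is what registering such a connector can look like through the Kafka Connect REST API. It assumes the connect service from the all-in-one file is listening on localhost:8083 and that the JDBC Source connector plugin is installed on the worker (for example via confluent-hub); the connection details, table, and column names are placeholders:

# Register a JDBC Source connector that watermarks on a timestamp column
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "jdbc-source-example",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:sqlserver://<host>:1433;databaseName=<db>",
      "connection.user": "<user>",
      "connection.password": "<password>",
      "mode": "timestamp",
      "timestamp.column.name": "updated_at",
      "table.whitelist": "dbo.orders",
      "topic.prefix": "sqlserver-",
      "poll.interval.ms": "5000"
    }
  }'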

Related Articles

Streaming from Kafka with PySpark