Streaming Kafka Messages to Google Cloud Pub/Sub with StreamSets

Rishi Jain
Oct 20, 2020 · 3 min read


Whether your Kafka cluster is provisioned in the cloud or on-premises, you might want to push messages to a subset of Pub/Sub topics. Why? For the flexibility of using Pub/Sub as your GCP event notifier, to use topics to trigger Cloud Functions, or because your organization plans to migrate from Apache Kafka to the managed Google Cloud Pub/Sub.

So how do you exchange messages between Kafka and Pub/Sub? Are you exploring Kafka connector options?

This is where StreamSets Data Collector comes in handy. In this post, you will learn the basic steps to connect Data Collector to your Kafka instance and to Cloud Pub/Sub so you can exchange event messages between the two services.

What is StreamSets Data Collector?

StreamSets Data Collector is an easy-to-use, modern execution engine for fast data ingestion and light transformations that can be used by anyone. You can design pipelines for streaming, batch, and change data capture (CDC) in minutes. Download it and read through the Quick Start Guide to get started.

In this lab, you learn how to:

  • Start an Apache Kafka instance locally
  • Configure StreamSets Data Collector to integrate with Pub/Sub and Kafka
  • Set up topics and subscriptions for message communication
  • Perform basic testing of both the Kafka and Cloud Pub/Sub services

I have spun up Apache Kafka on my local MacBook and created a topic named test.

Step 1: Download the latest Kafka release and extract it:

# Start the Kafka environment (run ZooKeeper and the broker in separate terminals)
$ tar -xzf kafka_2.13-2.6.0.tgz
$ cd kafka_2.13-2.6.0
$ bin/zookeeper-server-start.sh config/zookeeper.properties
$ bin/kafka-server-start.sh config/server.properties

# Create the topic
$ bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092
Created topic test.

# Publish some messages to Kafka
$ bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
This is my first event
This is my second event
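
To double-check that the events landed in the topic, you can read them back with the standard console consumer (press Ctrl-C to stop it) before wiring up the pipeline:

# Verify the messages are in the topic
$ bin/kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
This is my first event
This is my second event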

Step 2: Access GCP and create your topic

  1. Go to the Pub/Sub Topics page in the Cloud Console.
  2. Click Create a topic.
  3. Provide a unique topic name, for example, rishi, and save.
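
If you prefer the command line, you can create the same resources with the gcloud CLI. A minimal sketch, assuming the topic name rishi from above and a subscription name rishi-sub of my own choosing:

# Create the topic and a subscription attached to it
$ gcloud pubsub topics create rishi
$ gcloud pubsub subscriptions create rishi-sub --topic=rishi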

Step 3: Now that both endpoints are up and running, let’s create a StreamSets pipeline.

Check out the Kafka origin config; you can tweak this configuration to match your environment.

You can get this example pipeline from here:

https://github.com/rishi871/pipeline.git
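
If you want to inspect the pipeline JSON before importing it through the Data Collector UI, clone the repo first:

$ git clone https://github.com/rishi871/pipeline.git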

Note: you need to provide your GCP credentials in the Pub/Sub destination stage.
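
If you don’t have a service account key yet, one way to create one is with the gcloud CLI. This is only a sketch; sdc-pubsub, MY_PROJECT, and sdc-key.json are placeholder names of my choosing, not values from the example pipeline:

# Create a service account, grant it publish rights, and download a key file
$ gcloud iam service-accounts create sdc-pubsub
$ gcloud projects add-iam-policy-binding MY_PROJECT \
    --member="serviceAccount:sdc-pubsub@MY_PROJECT.iam.gserviceaccount.com" \
    --role="roles/pubsub.publisher"
$ gcloud iam service-accounts keys create sdc-key.json \
    --iam-account=sdc-pubsub@MY_PROJECT.iam.gserviceaccount.com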

Once you run this pipeline, it will consume the two messages from Apache Kafka and push them to Google Cloud Pub/Sub.

To verify, go to the Pub/Sub section of the GCP console and click View Messages.
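
You can also pull the messages from the command line, assuming the rishi-sub subscription sketched earlier:

# Pull and acknowledge the two test messages
$ gcloud pubsub subscriptions pull rishi-sub --auto-ack --limit=2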

Conclusion: In this example, we sent a stream of messages between the two services. However, you can stream messages to Google Cloud Pub/Sub from a wide range of other services as well.


Written by Rishi Jain

Software Support Engineer @StreamSets | Hadoop | DataOps | RHCA | Ex-RedHatter | Ex-Cloudera
