
Spark Streaming with Kafka

Suraj Ghimire

Let's use a COVID-19 dataset to build a small proof of concept (PoC) of Spark Streaming with Kafka.

 

Steps:

  1. Download the latest Kafka release from https://www.apache.org/dyn/closer.cgi?path=/kafka/2.5.0/kafka_2.13-2.5.0.tgz
  2. Extract to a folder say D:\kafka_2.13-2.5.0
  3. Navigate to D:\kafka_2.13-2.5.0\kafka_2.13-2.5.0\bin\windows
  4. Copy D:\kafka_2.13-2.5.0\kafka_2.13-2.5.0\config\zookeeper.properties to bin\windows
  5. Copy D:\kafka_2.13-2.5.0\kafka_2.13-2.5.0\config\server.properties to bin\windows
  6. Start ZooKeeper
    $ zookeeper-server-start.bat zookeeper.properties

  7. Start the Kafka server
    $ kafka-server-start.bat server.properties

  8. Create a Kafka topic
    $ kafka-topics.bat --zookeeper localhost:2181 --create --topic covid19india --partitions 2 --replication-factor 1
  9. Create a database table to store the data.

    CREATE TABLE `covid19india` (
      `sno` varchar(100) DEFAULT NULL,
      `date_of_identification` varchar(100) DEFAULT NULL,
      `current_status` varchar(100) DEFAULT NULL,
      `state` varchar(100) DEFAULT NULL,
      `num_of_cases` int(11) DEFAULT NULL
    )
  10. Run the Kafka consumer code

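The post does not include the consumer code itself, so here is a minimal sketch of what it could look like, using Spark Structured Streaming's Kafka source and a `foreachBatch` JDBC sink. The JDBC URL, user, and password are placeholders, and `parse_message` is a hypothetical helper for decoding one message; adjust everything to your environment (you must also pass the matching `spark-sql-kafka` package to `spark-submit --packages`):

```python
import json

# Columns matching the MySQL table created in step 9
COLUMNS = ["sno", "date_of_identification", "current_status", "state", "num_of_cases"]

def parse_message(value):
    """Decode one JSON-encoded Kafka message value into an ordered row tuple."""
    record = json.loads(value)
    return tuple(record.get(c) for c in COLUMNS)

if __name__ == "__main__":
    # PySpark imports kept inside the main guard so the helper above
    # can be used without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StringType, IntegerType

    spark = SparkSession.builder.appName("covid19-kafka-consumer").getOrCreate()

    # Same layout as the MySQL table: four strings and one int
    schema = StructType()
    for c in COLUMNS[:-1]:
        schema = schema.add(c, StringType())
    schema = schema.add("num_of_cases", IntegerType())

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "covid19india")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("r"))
          .select("r.*"))

    def write_batch(batch_df, batch_id):
        # Placeholder JDBC settings; change URL/credentials for your MySQL setup
        (batch_df.write
         .format("jdbc")
         .option("url", "jdbc:mysql://localhost:3306/covid")
         .option("dbtable", "covid19india")
         .option("user", "root")
         .option("password", "password")
         .mode("append")
         .save())

    df.writeStream.foreachBatch(write_batch).start().awaitTermination()
```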
  11. Run the Kafka producer code
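Likewise, the producer side could look roughly like the sketch below, assuming the `kafka-python` client and a local CSV file named `covid19india.csv` (a hypothetical file name) whose columns match the table above:

```python
import csv
import json

def row_to_message(row):
    """Serialize one CSV row (a dict) into a JSON-encoded Kafka message value."""
    return json.dumps(row).encode("utf-8")

if __name__ == "__main__":
    # kafka-python client; imported inside the main guard so the helper
    # above works without Kafka installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    # Hypothetical file name: any COVID-19 CSV with the table's columns works
    with open("covid19india.csv", newline="") as f:
        for row in csv.DictReader(f):
            producer.send("covid19india", row_to_message(row))
    producer.flush()
```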
  12. Check that your MySQL database table is getting populated.
  13. Visualize it using Tableau.

Example use case

https://www.dropbox.com/sh/c8l4uy57wisahub/AAAr9GQGtUb0pXzy5qykr9LZa?dl=0

Please write comments if you find anything incorrect, or if you want to share more information about the topic discussed above.
