New Course by Dave Wells and Kevin Petrie

Streaming data is data that flows continuously from sources such as IoT devices, sensors, GPS devices, server and security logs, and clickstreams from mobile apps and websites; it is typically high-volume data moving at high speed. The analytics opportunities with IoT and application data streams are abundant, but the value of streaming technology is not limited to native data streams. In today's fast-paced business world, the need for fast data is pervasive, and tacit acceptance of high-latency data is rapidly diminishing.

Streaming as an alternative to batch ETL is a practical way to meet the demand for fast data. Change Data Capture (CDC) is a category of technology that captures data about changes made to a database – inserts, updates, and deletes – and makes that data available to downstream processing such as data pipelines that flow to data warehouses and data lakes. CDC can be combined with streaming to accelerate data flow and reduce data latency.
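To make the idea concrete, here is a minimal sketch of the CDC pattern described above: captured change events (inserts, updates, and deletes) are replayed in order against a downstream copy of a table. The event shape and function names are illustrative assumptions, not the API of any particular CDC tool.

```python
# Toy sketch of change data capture: hypothetical change events
# captured from a source database are applied, in order, to a
# downstream replica (here, a dict keyed by primary key).
# Event format is an assumption for illustration only.

def apply_change(replica, event):
    """Apply one captured change event to the downstream replica."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)
    return replica

# A captured stream of changes from a source table:
changes = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "city": "London"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "city": "Paris"}},
    {"op": "insert", "key": 2, "row": {"name": "Grace", "city": "NYC"}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in changes:
    apply_change(replica, event)

print(replica)  # {1: {'name': 'Ada', 'city': 'Paris'}}
```

Because only the changes flow downstream rather than full table reloads, the replica stays current with far less data movement and latency than periodic batch ETL.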

Apache Kafka is a widely adopted open-source, distributed streaming platform used to move high volumes of data in real time. Building data pipelines with Kafka requires knowledge of Kafka architecture, components, and processes. You'll need to know the actions and responsibilities of data producers and data consumers, as well as the capabilities for cluster management, data connections, and APIs. Integrating Kafka or other streaming technologies into your data ecosystem is an important consideration.
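The core producer/consumer model mentioned above can be sketched with a toy in-memory version: a topic is an append-only log, producers append records, and each consumer tracks its own read offset. This is a conceptual model only; real Kafka clients (such as the kafka-python or confluent-kafka libraries) connect to a broker cluster and add partitioning, replication, and consumer groups on top of this idea.

```python
# Toy in-memory model of Kafka's core abstractions, for illustration.
# Class and method names are assumptions, not Kafka's actual client API.

class Topic:
    """A topic modeled as an append-only record log."""
    def __init__(self):
        self.log = []

    def produce(self, record):
        self.log.append(record)
        return len(self.log) - 1  # offset assigned to the new record


class Consumer:
    """Each consumer tracks its own position (offset) in the log."""
    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        records = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return records


events = Topic()
events.produce({"sensor": "s1", "temp": 21.5})
events.produce({"sensor": "s2", "temp": 19.0})

consumer = Consumer(events)
print(consumer.poll())  # both records
print(consumer.poll())  # [] - nothing new since the last poll
```

Because the log is durable and consumers manage their own offsets, many independent consumers can read the same stream at their own pace, which is what lets one Kafka topic feed a data warehouse, a data lake, and real-time applications at once.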

You will learn:

  • The business and technical drivers for streaming data adoption
  • Data pipeline processing patterns and the advanced patterns that are possible with streaming
  • How microservices and edge computing work with IoT data streams
  • Use case patterns and a variety of use cases for streaming data
  • Five kinds of CDC and the strengths and weaknesses of each
  • The concept and applications of streaming first architecture
  • Kafka architecture and essential components
  • Kafka data and process flow
  • The roles and functions of Kafka broker, data producers, and data consumers
  • Cluster management, data connections, and APIs with Kafka
  • Integrating streaming into the data ecosystem
