Streaming data is data that continuously flows from sources such as IoT devices, sensors, GPS devices, server and security logs, and clickstreams from mobile apps and websites—typically high-volume data moving at high speed. The analytics opportunities with IoT and application data streams are abundant, but the value of streaming technology is not limited to native data streams. In today’s fast-paced business world, the need for fast data is pervasive, and tolerance for high-latency data is rapidly diminishing. Streaming as an alternative to batch ETL is a practical way to meet the demand for fast data.
Change Data Capture (CDC) is a category of technology that captures data about changes made to a database – inserts, updates, and deletes – and makes that data available to downstream processing such as data pipelines that flow to data warehouses and data lakes. CDC can be combined with streaming to accelerate data flow and reduce data latency.
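To make the idea concrete, here is a minimal sketch of applying CDC-style change events to an in-memory replica of a target table. The event shape is loosely modeled on common CDC formats (an operation code plus "before" and "after" row images), but the field names and structure here are illustrative assumptions, not a specific tool's format.

```python
# Sketch: apply CDC change events (insert/update/delete) to a replica table.
# The event structure is an illustrative assumption, not a real CDC wire format.

def apply_change_event(table, event):
    """Apply one change event to a dict keyed by primary key."""
    op = event["op"]  # "c" = create/insert, "u" = update, "d" = delete
    if op in ("c", "u"):
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":
        table.pop(event["before"]["id"], None)
    return table

events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "Ada"}},
    {"op": "u", "before": {"id": 1, "name": "Ada"},
                "after": {"id": 1, "name": "Ada Lovelace"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "Grace"}},
    {"op": "d", "before": {"id": 2, "name": "Grace"}, "after": None},
]

replica = {}
for e in events:
    apply_change_event(replica, e)

print(replica)  # {1: {'id': 1, 'name': 'Ada Lovelace'}}
```

Replaying events in order like this is what lets a downstream warehouse or lake stay in sync with the source database without repeated full-table extracts.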
Apache Kafka is a widely adopted open source, distributed streaming platform used to move high volumes of data in real time. Building data pipelines with Kafka requires knowledge of Kafka architecture, components, and processes. You’ll need to know the actions and responsibilities of data producers and of data consumers, as well as the capabilities for cluster management, data connections, and APIs. Integrating Kafka or other streaming technologies into your data ecosystem is an important consideration.
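The producer and consumer roles mentioned above can be illustrated with a toy in-memory model of Kafka's core abstractions: a partitioned, append-only log, a producer that routes records by key, and a consumer that tracks its own read offsets. This is a conceptual sketch only, not the Kafka client API; the class and method names are invented for illustration.

```python
# Toy model of Kafka concepts: partitioned append-only log, keyed producing,
# and offset-tracking consumers. Not the real Kafka API.
from collections import defaultdict

class TopicLog:
    def __init__(self, partitions=3):
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Records with the same key land in the same partition,
        # which is how per-key ordering is preserved.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

class Consumer:
    def __init__(self, log):
        self.log = log
        self.offsets = defaultdict(int)  # next offset to read per partition

    def poll(self):
        # Consumers advance their own offsets; the log itself is never
        # mutated by reads, so many consumers can share one topic.
        records = []
        for p, partition in enumerate(self.log.partitions):
            while self.offsets[p] < len(partition):
                records.append(partition[self.offsets[p]])
                self.offsets[p] += 1
        return records

log = TopicLog()
consumer = Consumer(log)
log.produce("sensor-1", {"temp": 21.5})
log.produce("sensor-2", {"temp": 19.0})
print(consumer.poll())  # both records, each delivered once
print(consumer.poll())  # [] -- nothing new since the last poll
```

The separation of responsibilities shown here—producers append, the log retains, consumers read at their own pace—is the architectural idea that lets Kafka decouple fast data sources from slower downstream systems.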