International Journal of Leading Research Publication

E-ISSN: 2582-8010     Impact Factor: 9.56

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Stream Processing Internals and Usecases

Author(s) Arjun Reddy Lingala
Country United States
Abstract Batch processing is a widely used concept in data warehousing, where many companies build analytical solutions that derive insights into their systems and use that analysis of various aspects of the systems to build new products. The exponential growth of real-time data sources such as IoT sensors and social media has necessitated systems capable of processing unbounded data streams with low latency, high throughput, and guaranteed correctness. Unlike batch processing, stream processing engines must handle continuous data flows with dynamic arrival patterns, out-of-order events, and variable workloads. The problem with daily batch processes is that changes in the input are reflected in the output only a day later, which is too slow for some use cases. To reduce the delay, the processing can be run more frequently. In the batch processing world, the inputs and outputs of a job are files, which may reside on a distributed file system such as HDFS [1] or Amazon S3 [2]. Stream processing has emerged as a critical computational model that enables real-time ingestion, transformation, and analysis of continuous data streams. This paper presents a comprehensive exploration of the need for stream processing, identifying the use cases where it is preferable to batch processing and its suitability for latency-sensitive applications such as financial trading, fraud detection, and Internet of Things (IoT) systems. We begin by establishing the fundamental motivation behind stream processing and outlining the key challenges of real-time data analytics, including data velocity, system scalability, and fault tolerance. The discussion highlights the limitations of traditional batch processing frameworks such as Apache Hadoop and their inability to handle continuous data flows efficiently.
In contrast, we analyze how stream processing frameworks such as Apache Kafka [3], Apache Flink [4], Apache Storm [5], and Spark Streaming [6] address these challenges by enabling near real-time event-driven computations.
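The abstract names out-of-order events as a core difficulty for stream processing engines, and windowing appears among the keywords. The Python below is a minimal sketch, not the author's method and not the API of Kafka, Flink, Storm, or Spark Streaming. It counts events per key in event-time tumbling windows and uses a watermark with a fixed allowed lateness to decide when a window is complete. Results are emitted as soon as the watermark passes a window's end, rather than when a batch job finishes. The event shape, the function name `windowed_counts`, and the parameters `window_size` and `allowed_lateness` are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for this sketch: (event_time_in_seconds, key)
Event = Tuple[int, str]
# Emitted result: (window_start, key, count)
Result = Tuple[int, str, int]


def windowed_counts(
    events: Iterable[Event],
    window_size: int = 10,
    allowed_lateness: int = 5,
) -> Iterator[Result]:
    """Count events per key in event-time tumbling windows.

    The watermark is the largest event time seen so far minus
    allowed_lateness. A window [start, start + window_size) is emitted
    once the watermark reaches its end; events that arrive for a window
    whose end is already at or behind the watermark are dropped as late.
    """
    # window_start -> key -> running count
    open_windows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    max_event_time = float("-inf")

    def flush(watermark: float) -> Iterator[Result]:
        # Emit every window whose end is at or before the watermark, oldest first.
        for start in sorted(open_windows):
            if start + window_size > watermark:
                break
            counts = open_windows.pop(start)
            for key in sorted(counts):
                yield (start, key, counts[key])

    for event_time, key in events:
        watermark = max_event_time - allowed_lateness
        start = event_time - event_time % window_size
        if start + window_size <= watermark:
            continue  # late event: its window has already been emitted
        open_windows[start][key] += 1
        max_event_time = max(max_event_time, event_time)
        yield from flush(max_event_time - allowed_lateness)

    # End of input: a bounded stream can close all remaining windows.
    yield from flush(float("inf"))


events = [(1, "a"), (4, "b"), (12, "a"), (3, "a"), (16, "a"), (2, "b"), (25, "a")]
for result in windowed_counts(events):
    print(result)
```

With the sample input, the out-of-order event at time 3 still counts toward the window starting at 0, because it arrives while the watermark (7) is behind that window's end (10). The event at time 2 arrives after the watermark has reached 11, so window 0 has already been emitted and the event is dropped. Dropping late data is only one policy. Other policies, such as emitting updated results or routing late records elsewhere, trade completeness against latency differently.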
Keywords Stream Processing, Data warehouse, Windowing, Change Data Capture, Message Queues, Message Brokers, Kafka, Flink, Storm, Real-time
Field Engineering
Published In Volume 5, Issue 4, April 2024
Published On 2024-04-10
Cite This Stream Processing Internals and Usecases - Arjun Reddy Lingala - IJLRP Volume 5, Issue 4, April 2024. DOI 10.5281/zenodo.14945841
DOI https://doi.org/10.5281/zenodo.14945841
Short DOI https://doi.org/g86pfm