Stream Processing Internals and Usecases

Arjun Reddy Lingala

doi:10.5281/zenodo.14945841

Author(s)	Arjun Reddy Lingala
Country	United States
Abstract	Batch processing is a widely used concept in data warehousing: many companies build analytical solutions on it to derive insights into their systems and to build new products from that analysis. The exponential growth of real-time data sources such as IoT sensors and social media has created a need for systems capable of processing unbounded data streams with low latency, high throughput, and guaranteed correctness. Unlike batch processing, stream processing engines must handle continuous data flows with dynamic arrival patterns, out-of-order events, and variable workloads. The problem with daily batch processes is that changes in the input are reflected in the output only a day later, which is too slow for some use cases. To reduce the delay, we can run the processing more frequently. In the batch processing world, the inputs and outputs of a job are files, which may reside on distributed storage such as HDFS [1] or Amazon S3 [2]. Stream processing has emerged as a critical computational model that enables real-time ingestion, transformation, and analysis of continuous data streams. This paper presents a comprehensive exploration of the need for stream processing, identifying its use cases over batch processing and its suitability for latency-sensitive applications such as financial trading, fraud detection, and Internet of Things (IoT) systems. We begin by establishing the fundamental motivation behind stream processing, outlining key challenges associated with real-time data analytics, including data velocity, system scalability, and fault tolerance. The discussion highlights the limitations of traditional batch processing frameworks like Apache Hadoop and their inability to efficiently handle continuous data flows. We then analyze how, in contrast, stream processing frameworks such as Apache Kafka [3], Apache Flink [4], Apache Storm [5], and Spark Streaming [6] address these challenges by enabling near real-time event-driven computations.
Keywords	Stream Processing, Data warehouse, Windowing, Change Data Capture, Message Queues, Message Brokers, Kafka, Flink, Storm, Real-time
Field	Engineering
Published In	Volume 5, Issue 4, April 2024
Published On	2024-04-10
Cite This	Stream Processing Internals and Usecases - Arjun Reddy Lingala - IJLRP Volume 5, Issue 4, April 2024. DOI 10.5281/zenodo.14945841
DOI	https://doi.org/10.5281/zenodo.14945841
Short DOI	https://doi.org/g86pfm


All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.


International Journal of Leading Research Publication

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
