Elastic Scaling for Data Stream Processing

B. Gedik, S. Schneider, Martin Hirzel, Kun-Lung Wu

Published 2014 in IEEE Transactions on Parallel and Distributed Systems

ABSTRACT

This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitioning, in order to achieve scale. In order to make auto-parallelization effective in practice, the profitability question needs to be answered: How many parallel channels provide the best throughput? The answer to this question changes depending on the workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that can dynamically adjust the number of channels used to achieve high throughput without unnecessarily wasting resources. Most importantly, our solution can handle partitioned stateful operators via run-time state migration, which is fully transparent to the application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution.
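The two ideas named in the abstract, choosing a channel count by observed throughput and migrating partitioned state when that count changes, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the `tolerance` parameter, the greedy hill-climbing rule, and the CRC32 modulo partitioner are all assumptions chosen for illustration only.

```python
import zlib
from typing import Callable, Dict, Iterable, Tuple


def find_channel_count(measure_throughput: Callable[[int], float],
                       max_channels: int,
                       tolerance: float = 0.05) -> int:
    """Hypothetical greedy search: add channels while throughput keeps improving.

    measure_throughput(n) is an assumed callback that returns the observed
    throughput when n parallel channels are used.
    """
    n = 1
    best = measure_throughput(n)
    while n < max_channels:
        candidate = measure_throughput(n + 1)
        # Stop when one more channel does not pay off by more than `tolerance`.
        if candidate <= best * (1 + tolerance):
            break
        n, best = n + 1, candidate
    return n


def channel_of(key: str, channels: int) -> int:
    """Assumed partitioner: deterministic hash of the key modulo channel count."""
    return zlib.crc32(key.encode("utf-8")) % channels


def keys_to_migrate(keys: Iterable[str],
                    old_channels: int,
                    new_channels: int) -> Dict[str, Tuple[int, int]]:
    """Return {key: (old_channel, new_channel)} for keys whose owner changes.

    State for exactly these keys would have to move when the channel
    count is changed from old_channels to new_channels.
    """
    moves = {}
    for key in keys:
        old, new = channel_of(key, old_channels), channel_of(key, new_channels)
        if old != new:
            moves[key] = (old, new)
    return moves


if __name__ == "__main__":
    # Synthetic workload whose throughput saturates at four channels.
    chosen = find_channel_count(lambda n: min(n, 4) * 100.0, max_channels=8)
    plan = keys_to_migrate([f"key{i}" for i in range(20)], 2, chosen)
    print(chosen, len(plan))
```

A plain modulo partitioner tends to reassign many keys whenever the channel count changes, which is one reason a scheme such as consistent hashing (listed among the references below) is attractive for limiting the amount of migrated state.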

PUBLICATION RECORD

Publication year
2014
Venue
IEEE Transactions on Parallel and Distributed Systems
Publication date
2014-06-01
Fields of study
Computer Science
Identifiers
DOI 10.1109/TPDS.2013.295
Source metadata
Semantic Scholar

REFERENCES

A catalog of stream processing optimizations
2014, cited by this paper
IBM Streams Processing Language: Analyzing Big Data in motion
2013, influential reference
Testing properties of dataflow program operators
2013, cited by this paper
Auto-parallelizing stateful distributed streaming applications
2012, cited by this paper
Parallelizing stateful operators in a distributed stream processing system: how, should you and how much?
2012, cited by this paper
Processing high data rate streams in System S
2011, cited by this paper
Active Replication at (Almost) No Cost
2011, cited by this paper
Hyracks: A flexible and extensible foundation for data-intensive computing
2011, cited by this paper
Flood: elastic streaming MapReduce
2010, cited by this paper
Massively parallel data analysis with PACTs on Nephele
2010, cited by this paper
MapReduce: a flexible data processing tool
2010, cited by this paper
Feedback-directed pipeline parallelism
2010, cited by this paper
Elastic scaling of data parallel operators in stream processing
2009, cited by this paper
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures
2009, cited by this paper
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
2006, cited by this paper
Design, implementation, and evaluation of the linear road benchmark on the stream processing core
2006, cited by this paper
Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options
2006, cited by this paper
The Design of the Borealis Stream Processing Engine
2005, cited by this paper
The 8 requirements of real-time stream processing
2005, cited by this paper
Feedback Control of Computing Systems
2004, cited by this paper
STREAM: The Stanford Stream Data Manager
2003, cited by this paper
Flux: an adaptive partitioning operator for continuous query systems
2003, cited by this paper
Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources
2003, cited by this paper
SEDA: an architecture for well-conditioned, scalable internet services
2001, cited by this paper
Eddies: continuously adaptive query processing
2000, cited by this paper
Process migration
1999, cited by this paper
The Anatomy of a Large-Scale Hypertextual Web Search Engine
1998, cited by this paper
Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web
1997, influential reference

CITED BY

Trident: Adaptive Scheduling for Heterogeneous Multimodal Data Pipelines
2026, cites this paper
Towards a Proactive Autoscaling Framework for Data Stream Processing at the Edge using GRU and Transfer Learning
2025, cites this paper
An elastic reconfiguration strategy for operators in distributed stream computing systems
2025, cites this paper
ScalaSSC: Scalable Stateful Serverless Computing for Stream Processing Applications
2025, cites this paper
Towards Fine-Grained Scalability for Stateful Stream Processing Systems
2025, cites this paper
DesFaaS: Cross-Layer Joint Dynamic Deployment System for Serverless Stateful Functions
2025, cites this paper
Justin: Hybrid CPU/Memory Elastic Scaling for Distributed Stream Processing
2025, cites this paper
Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets
2024, cites this paper
Enhancing self-adaptation for efficient decision-making at run-time in streaming applications on multicores
2024, cites this paper
Lc‐Stream: An elastic scheduling strategy with latency constraints in geo‐distributed stream computing environments
2024, cites this paper
A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
2024, cites this paper
VideoJam: Self-Balancing Architecture for Live Video Analytics
2024, cites this paper
Online Nonstop Task Management for Storm-Based Distributed Stream Processing Engines
2024, cites this paper
Evaluating Stream Processing Autoscalers
2024, cites this paper
Online Analytics with Local Operator Rebinding for Simulation Data Stream Processing
2024, cites this paper
Adaptive key partitioning in distributed stream processing
2024, cites this paper
Hierarchical Auto-scaling Policies for Data Stream Processing on Heterogeneous Resources
2023, cites this paper
An empirical analysis of stateful operator migration for online scheduling in distributed stream processing systems
2023, cites this paper
On Improving Streaming System Autoscaler Behaviour using Windowing and Weighting Methods
2023, cites this paper
Revisiting self-adaptation for efficient decision-making at run-time in parallel executions
2023, cites this paper
Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems
2023, cites this paper
Stateful Adaptive Streams with Approximate Computing and Elastic Scaling
2023, cites this paper
Micro-batch and data frequency for stream processing on multi-cores
2023, cites this paper
Practical Storage-Compute Elasticity for Stream Data Processing
2023, cites this paper
ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems
2023, influential citation
Dirigo: Self-scaling Stateful Actors For Serverless Real-time Data Processing
2023, cites this paper
ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling
2023, cites this paper
A predictive model for Stream Processing System that dynamically calibrates the number of operator replicas
2022, cites this paper
A Low-Load Distributed Stream Processing System for Continuous Conjunctive Normal Form Queries
2022, cites this paper
Achieving multilevel elasticity for distributed stream processing systems in the cloud environment: A review and conceptual framework
2022, influential citation
A Preliminary Fuzzy Markup Language based Approach for the Queue Buffer Size Optimization in Fog Nodes for Stream Processing
2022, cites this paper
Hyperion: A Generic and Distributed Mobile Offloading Framework on OpenCL
2022, cites this paper
Toward optimal operator parallelism for stream processing topology with limited buffers
2022, cites this paper
Phoebe: QoS-Aware Distributed Stream Processing through Anticipating Dynamic Workloads
2022, cites this paper
Runtime Adaptation of Data Stream Processing Systems: The State of the Art
2022, influential citation
A state lossless scheduling strategy in distributed stream computing systems
2022, cites this paper
Jarvis: Large-scale Server Monitoring with Adaptive Near-data Processing
2022, cites this paper
To Migrate or Not to Migrate: An Analysis of Operator Migration in Distributed Stream Processing
2022, cites this paper
Elastic Pulsar Functions for Distributed Stream Processing
2021, cites this paper
SpecK: Composition of Stream Processing Applications over Fog Environments
2021, cites this paper
Fine-Grained Multi-Query Stream Processing on Integrated Architectures
2021, cites this paper
Providing high‐level self‐adaptive abstractions for stream parallelism on multicores
2021, cites this paper
Heterogeneity-aware elastic scaling of streaming applications on cloud platforms
2021, cites this paper
Towards On-the-fly Self-Adaptation of Stream Parallel Patterns
2021, cites this paper
AuTraScale: An Automated and Transfer Learning Solution for Streaming System Auto-Scaling
2021, cites this paper
Approxate: Stateful Functions for Approximate Stream Processing - Extended Abstract
2021, cites this paper
Throughput prediction based on ExtraTree for stream processing tasks
2021, cites this paper
STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing
2021, cites this paper
Self‐adaptation on parallel stream processing: A systematic review
2021, cites this paper
MEAD: Model-Based Vertical Auto-Scaling for Data Stream Processing
2021, cites this paper
Evaluation of Load Prediction Techniques for Distributed Stream Processing
2021, cites this paper
Run-time adaptation of stream processing spanning the cloud and the edge
2021, cites this paper
AI‐Driven Performance Management in Data‐Intensive Applications
2021, cites this paper
Klink: Progress-Aware Scheduling for Streaming Data Systems
2021, cites this paper
Scaleplus: Towards Fast Scaling of Distributed Streaming Dataflows
2020, cites this paper
A Review of Dynamic Scalability and Dynamic Scheduling in Cloud-Native Distributed Stream Processing Systems
2020, cites this paper
Scheduling Solutions for Data Stream Processing Applications on Cloud-Edge Infrastructure. (Solutions de planification pour les applications de traitement de flux de données sur une infrastructure Cloud-Edge)
2020, cites this paper
Q-Flink: A QoS-Aware Controller for Apache Flink
2020, cites this paper
Group Mutual Exclusion to Scale Distributed Stream Processing Pipelines
2020, cites this paper
Graceful Performance Degradation in Apache Storm
2020, cites this paper
Predictive topology refinements in distributed stream processing system
2020, cites this paper
VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
2020, cites this paper
KungFu: Making Training in Distributed Machine Learning Adaptive
2020, cites this paper
Scalable Joint Optimization of Placement and Parallelism of Data Stream Processing Applications on Cloud-Edge Infrastructure
2020, cites this paper
An Optimal Model for Optimizing the Placement and Parallelism of Data Stream Processing Applications on Cloud-Edge Computing
2020, cites this paper
Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo
2020, cites this paper
FineStream: Fine-Grained Window-Based Stream Processing on CPU-GPU Integrated Architectures
2020, cites this paper
A survey on the evolution of stream processing systems
2020, influential citation
Scalable Decentralized Indexing and Querying of Multi-Streams in the Fog
2020, cites this paper
EdgeScaler: effective elastic scaling for graph stream processing systems
2020, influential citation
Resource Management and Scheduling in Distributed Stream Processing Systems
2020, cites this paper
VirtualFlow: Decoupling Deep Learning Model Execution from Underlying Hardware
2020, cites this paper
Performance Modeling and Vertical Autoscaling of Stream Joins
2020, influential citation
Ad-Hoc stream query processing
2020, cites this paper
A scheduling algorithm to maximize storm throughput in heterogeneous cluster
2020, cites this paper
Dynamic redirection of real-time data streams for elastic stream computing
2020, cites this paper
Joker: Elastic stream processing with organic adaptation
2020, influential citation
Model-based auto-scaling of distributed data stream processing applications
2020, cites this paper
A Fully Decentralized Autoscaling Algorithm for Stream Processing Applications
2019, cites this paper
A Network-aware and Partition-based Resource Management Scheme for Data Stream Processing
2019, cites this paper
Adaptive Partitioning and Order-Preserved Merging of Data Streams
2019, cites this paper
Decentralized Scaling for Stream Processing Engines
2019, cites this paper
Self-Adaptive Data Stream Processing in Geo-Distributed Computing Environments
2019, cites this paper
Reinforcement Learning Based Policies for Elastic Stream Processing on Heterogeneous Resources
2019, cites this paper
Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems
2019, cites this paper
Reliable stream data processing for elastic distributed stream processing systems
2019, cites this paper
A Comprehensive Survey on Parallelization and Elasticity in Stream Processing
2019, cites this paper
Multi-Level Elasticity for Data Stream Processing
2019, cites this paper
An Adaptive Online Scheme for Scheduling and Resource Enforcement in Storm
2019, cites this paper
The Elastic Processing of Data Streams in Cloud Environments: A Systematic Mapping Study
2019, cites this paper
Elasticity
2019, cites this paper
Efficient Operator Placement for Distributed Data Stream Processing Applications
2019, cites this paper
High availability of data using Automatic Selection Algorithm (ASA) in distributed stream processing systems
2019, cites this paper
BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures
2019, cites this paper
Transactions on Large-Scale Data- and Knowledge-Centered Systems XL
2019, influential citation
State and runtime-aware scheduling in elastic stream computing systems
2019, cites this paper
SLA-Based Adaptation Schemes in Distributed Stream Processing Engines
2019, cites this paper
DABS-Storm: A Data-Aware Approach for Elastic Stream Processing
2019, cites this paper
A Holistic Abstraction to Ensure Trusted Scaling and Memory Speed Trusted Analytics
2019, cites this paper