Trill: A High-Performance Incremental Query Processor for Diverse Analytics

Badrish Chandramouli, J. Goldstein, Mike Barnett, R. Deline, John C. Platt, James F. Terwilliger, J. Wernsing

Published 2014 in Proceedings of the VLDB Endowment

ABSTRACT

This paper introduces Trill, a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill's throughput is high across the latency spectrum. For streaming data, Trill's throughput is 2-4 orders of magnitude higher than comparable streaming engines. For offline relational queries, Trill's throughput is comparable to a major modern commercial columnar DBMS. Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this paper, we describe Trill's new design and architecture, and report experimental results that demonstrate Trill's high performance across diverse analytics scenarios. We also describe how Trill's ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.
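
The abstract names two ideas, a batched-columnar data representation and incremental queries that produce early results, without showing what they look like. The following is a minimal illustrative sketch of those two ideas only. It is not Trill's API or implementation: Python is used purely for illustration, and `ColumnBatch`, `to_batches`, and `running_sum` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class ColumnBatch:
    """Hypothetical columnar batch: one list per field, all of equal length."""
    sync_time: List[int]   # event timestamps
    key: List[str]         # grouping key
    value: List[float]     # payload column


def to_batches(events: Iterable[Tuple[int, str, float]],
               batch_size: int) -> Iterator[ColumnBatch]:
    """Group (time, key, value) rows into column-oriented batches."""
    times, keys, values = [], [], []
    for t, k, v in events:
        times.append(t)
        keys.append(k)
        values.append(v)
        if len(times) == batch_size:
            yield ColumnBatch(times, keys, values)
            times, keys, values = [], [], []
    if times:  # flush the final, possibly partial, batch
        yield ColumnBatch(times, keys, values)


def running_sum(batches: Iterable[ColumnBatch]) -> Iterator[float]:
    """Incremental aggregate: emit an updated total after every batch."""
    total = 0.0
    for batch in batches:
        total += sum(batch.value)  # touches only the value column
        yield total


events = [(1, "a", 2.0), (2, "b", 3.0), (3, "a", 5.0)]
print(list(running_sum(to_batches(events, batch_size=2))))  # [5.0, 10.0]
```

In a sketch like this, `batch_size` is the natural knob between latency and throughput: a batch of one event yields a result per event, while a single large batch behaves like an offline query over the whole input.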

PUBLICATION RECORD

Publication year: 2014
Venue: Proceedings of the VLDB Endowment
Publication date: 2014-12-01
Fields of study: Computer Science
DOI: 10.14778/2735496.2735503
Source metadata: Semantic Scholar


