Storyboard: Optimizing Precomputed Summaries for Aggregation

Edward Gan, Peter Bailis, M. Charikar

Published 2020 in arXiv.org

ABSTRACT

An emerging class of data systems partition their data and precompute approximate summaries (i.e., sketches and samples) for each segment to reduce query costs. They can then aggregate and combine the segment summaries to estimate results without scanning the raw data. However, given limited storage space, each summary introduces approximation errors that affect query accuracy. For instance, systems that use existing mergeable summaries cannot reduce query error below the error of an individual precomputed summary. We introduce Storyboard, a query system that optimizes item frequency and quantile summaries for accuracy when aggregating over multiple segments. Compared to conventional mergeable summaries, Storyboard leverages additional memory available for summary construction and aggregation to derive a more precise combined result. This reduces error by up to 25x over interval aggregations and 4.4x over data cube aggregations on industrial datasets compared to standard summarization methods, with provable worst-case error guarantees.
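
The setting the abstract describes, per-segment summaries combined at query time, can be illustrated with a minimal sketch. This is not Storyboard's algorithm: it assumes a deliberately naive summary that keeps only the k most frequent items of each segment, and the names `summarize` and `estimate_frequency` are hypothetical. It only shows how each space-limited summary loses information, so the combined estimate can differ from the true count.

```python
from collections import Counter

def summarize(segment, k):
    """Hypothetical lossy summary: keep only the k most frequent items of a segment."""
    return dict(Counter(segment).most_common(k))

def estimate_frequency(summaries, item):
    """Estimate an item's total count by summing its count across segment summaries."""
    return sum(s.get(item, 0) for s in summaries)

# Three data segments, each summarized independently with room for k=2 items.
segments = [
    ["a", "a", "a", "b", "b", "c"],
    ["a", "a", "b", "b", "b", "c"],
    ["c", "c", "c", "b", "b", "a"],
]
summaries = [summarize(seg, k=2) for seg in segments]

# Aggregate over all segments without rescanning the raw data.
estimate_a = estimate_frequency(summaries, "a")
true_a = sum(seg.count("a") for seg in segments)
print(estimate_a, true_a)  # the estimate undercounts "a" because one summary dropped it
```

In this toy example the third segment's summary drops "a", so the aggregated estimate falls below the true count. The abstract's claim concerns how such per-summary errors behave when many segments are aggregated.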

PUBLICATION RECORD

  • Publication year

    2020

  • Venue

    arXiv.org

  • Publication date

    2020-02-08

  • Fields of study

    Computer Science

  • Source metadata

    Semantic Scholar
