Data-Trace Types for Distributed Stream Processing Systems
Distributed architectures for efficient processing of streaming data are increasingly critical to modern information processing systems. The goal of this paper is to develop type-based programming abstractions that facilitate correct and efficient deployment of a logical specification of the desired computation on such architectures. In the proposed model, each communication link has an associated type specifying tagged data items along with a dependency relation over tags that captures the logical partial ordering constraints over data items. The semantics of a (distributed) stream processing system is then a function from input data traces to output data traces, where a data trace is an equivalence class of sequences of data items induced by the dependency relation. This data-trace transduction model generalizes both acyclic synchronous data-flow and relational query processors, and can specify computations over data streams with a rich variety of partial ordering and synchronization characteristics. We then describe a set of programming templates for data-trace transductions: abstractions corresponding to common stream processing tasks. Our system automatically maps these high-level programs to a given topology on the distributed implementation platform Apache Storm while preserving the semantics. Our experimental evaluation shows that (1) while automatic parallelization deployed by existing systems may not preserve semantics, particularly when the computation is sensitive to the ordering of data items, our programming abstractions allow a natural specification of the query that contains a mix of ordering constraints while guaranteeing correct deployment, and (2) the throughput of the automatically compiled distributed code is comparable to that of hand-crafted distributed implementations.
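To make the notion of a data trace concrete, here is a minimal Python sketch of the equivalence that a dependency relation induces on sequences of tagged items. It is not the paper's implementation or API. The function names (`make_dependence`, `canonical_form`, `equivalent`) and the example tags `"M"` (measurement) and `"#"` (marker) are assumptions chosen for illustration. Two sequences are treated as the same trace when one can be turned into the other by repeatedly swapping adjacent items whose tags are not dependent; the sketch decides this by reducing each sequence to its lexicographically smallest reordering.

```python
from typing import Hashable, Iterable, Sequence, Tuple

Item = Tuple[str, int]  # (tag, value); this item shape is an assumption for the sketch


def make_dependence(pairs: Iterable[Tuple[str, str]]) -> set:
    """Build a symmetric dependency relation over tags from a list of pairs."""
    return {frozenset(p) for p in pairs}


def dependent(dep: set, s: Hashable, t: Hashable) -> bool:
    """True if tags s and t must keep their relative order."""
    return frozenset((s, t)) in dep


def canonical_form(seq: Sequence[Item], dep: set) -> Tuple[Item, ...]:
    """Return the lexicographically smallest sequence equivalent to seq.

    At each step, an item may move to the front only if no earlier
    remaining item has a tag that depends on its tag. Among those
    movable items, the smallest one is emitted next.
    """
    remaining = list(seq)
    out = []
    while remaining:
        best = None
        for i, item in enumerate(remaining):
            blocked = any(
                dependent(dep, remaining[j][0], item[0]) for j in range(i)
            )
            if not blocked and (best is None or item < remaining[best]):
                best = i
        out.append(remaining.pop(best))
    return tuple(out)


def equivalent(a: Sequence[Item], b: Sequence[Item], dep: set) -> bool:
    """Two sequences denote the same data trace iff their canonical forms match."""
    return canonical_form(a, dep) == canonical_form(b, dep)


# Hypothetical channel type: markers "#" are totally ordered and act as
# barriers for measurements "M"; measurements between markers are unordered.
dep = make_dependence([("#", "#"), ("M", "#")])

s1 = [("M", 5), ("M", 7), ("#", 1), ("M", 9)]
s2 = [("M", 7), ("M", 5), ("#", 1), ("M", 9)]  # measurements swapped before the marker
s3 = [("M", 5), ("#", 1), ("M", 7), ("M", 9)]  # a measurement moved across the marker

same_trace = equivalent(s1, s2, dep)
different_trace = equivalent(s1, s3, dep)
```

Under this assumed dependency relation, `s1` and `s2` represent the same data trace, while `s3` does not, because a measurement has crossed a marker. A transduction in the abstract's sense would have to produce equivalent outputs for `s1` and `s2`. The quadratic scan per emitted item keeps the sketch short; it is not meant to be efficient.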
Tue 25 Jun (displayed time zone: Tijuana, Baja California)
10:00 - 11:00
10:00 | 20m Talk | ILC: A Calculus for Composable, Computational Cryptography | PLDI Research Papers
10:20 | 20m Talk | Proving Differential Privacy with Shadow Execution | PLDI Research Papers | Yuxin Wang; Zeyu Ding (Pennsylvania State University, USA); Guanhong Wang (Pennsylvania State University, USA); Daniel Kifer (Dept. of Computer Science and Engineering, Penn State University); Danfeng Zhang (Pennsylvania State University)
10:40 | 20m Talk | Data-Trace Types for Distributed Stream Processing Systems | PLDI Research Papers | Konstantinos Mamouras (University of Pennsylvania); Caleb Stanford (University of Pennsylvania); Rajeev Alur (University of Pennsylvania); Zachary G. Ives (University of Pennsylvania); Val Tannen (University of Pennsylvania, USA)