Data-Trace Types for Distributed Stream Processing Systems
Distributed architectures for efficient processing of streaming data are increasingly critical to modern information processing systems. The goal of this paper is to develop type-based programming abstractions that facilitate correct and efficient deployment of a logical specification of the desired computation on such architectures. In the proposed model, each communication link has an associated type specifying tagged data items along with a dependency relation over tags that captures the logical partial ordering constraints over data items. The semantics of a (distributed) stream processing system is then a function from input data traces to output data traces, where a data trace is an equivalence class of sequences of data items induced by the dependency relation. This data-trace transduction model generalizes both acyclic synchronous data-flow and relational query processors, and can specify computations over data streams with a rich variety of partial ordering and synchronization characteristics. We then describe a set of programming templates for data-trace transductions: abstractions corresponding to common stream processing tasks. Our system automatically maps these high-level programs to a given topology on the distributed implementation platform Apache Storm while preserving the semantics. Our experimental evaluation shows that (1) while automatic parallelization deployed by existing systems may not preserve semantics, particularly when the computation is sensitive to the ordering of data items, our programming abstractions allow a natural specification of the query that contains a mix of ordering constraints while guaranteeing correct deployment, and (2) the throughput of the automatically compiled distributed code is comparable to that of hand-crafted distributed implementations.
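The notion of a data trace as an equivalence class of sequences can be made concrete with a small sketch. The Python below is an illustration only, not the paper's implementation: the function names `dependent` and `same_trace`, the tags `"M"` (measurement) and `"#"` (marker), and the sample values are all assumptions chosen for the example. It treats two sequences of tagged items as the same trace when one can be turned into the other by repeatedly swapping adjacent items whose tags are not related by the dependency relation.

```python
def dependent(dep, tag_a, tag_b):
    """True if the two tags are related by the (symmetric) dependency relation."""
    return (tag_a, tag_b) in dep or (tag_b, tag_a) in dep


def same_trace(u, v, dep):
    """Decide whether sequences u and v of (tag, value) items denote the same trace.

    For each item of u in order, locate its first occurrence in what is left
    of v. The item may be moved to the front of v only if every item before it
    is independent of it; otherwise the two sequences differ as traces.
    """
    rest = list(v)
    for item in u:
        if item not in rest:
            return False
        i = rest.index(item)
        if any(dependent(dep, item[0], other[0]) for other in rest[:i]):
            return False
        del rest[i]
    return not rest  # v must not contain extra items


# Hypothetical type: measurements "M" are unordered among themselves,
# markers "#" are ordered with respect to measurements and to each other.
dep = {("M", "#"), ("#", "#")}

u = [("M", 5), ("M", 7), ("#", 0), ("M", 9)]
v = [("M", 7), ("M", 5), ("#", 0), ("M", 9)]  # measurements reordered within a block
w = [("M", 5), ("#", 0), ("M", 7), ("M", 9)]  # a measurement moved across the marker

print(same_trace(u, v, dep))
print(same_trace(u, w, dep))
```

Under these assumptions, `u` and `v` represent the same data trace because only mutually independent measurements were reordered, whereas `w` is a different trace because a measurement crossed a marker it depends on. A deployment that preserves the semantics may reorder items in the first way but not in the second.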
Tue 25 Jun
|10:20 - 10:40|
Yuxin Wang, Zeyu Ding (Pennsylvania State University, USA), Guanhong Wang (Pennsylvania State University, USA), Daniel Kifer (Dept. of Computer Science and Engineering, Penn State University), Danfeng Zhang (Pennsylvania State University)
|10:40 - 11:00|
Konstantinos Mamouras (University of Pennsylvania), Caleb Stanford (University of Pennsylvania), Rajeev Alur (University of Pennsylvania), Zachary G. Ives (University of Pennsylvania), Val Tannen (University of Pennsylvania, USA)