Mon 24 Jun 2019 15:00 - 15:20 at 229AB - Synthesis Chair(s): Nuno P. Lopes

We present a way to combine techniques from the program synthesis and machine learning communities to extract structured information from heterogeneous data. Such problems arise in several situations, such as extracting attributes from web pages, from machine-generated emails, or from data obtained from multiple sources. Our goal is to extract a set of structured attributes from such data.

We use machine learning models ("ML models") such as conditional random fields to get an initial labeling of potential attribute values. However, such models are typically not interpretable, and the noise produced by such models is hard to manage or debug. We use (noisy) labels produced by such ML models as inputs to program synthesis, and generate interpretable programs that cover the input space. We also employ type specifications (called "field constraints") to certify well-formedness of extracted values. Using synthesized programs and field constraints, we re-train the ML models with improved confidence on the labels. We then use these improved labels to re-synthesize a better set of programs. We iterate the process of re-synthesizing the programs and re-training the ML models, and find that such an iterative process improves the quality of the extraction process. This iterative approach, called HDEF, is novel, not only in the way it combines the ML models with program synthesis, but also in the way it adapts program synthesis to deal with noise and heterogeneity.
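The iteration described above can be illustrated with a toy sketch. This is not the authors' HDEF implementation: the "synthesizer" here only picks the best of a few hand-written regular expressions, and the "re-training" step is replaced by simply adopting program outputs that satisfy the field constraint. All names (`CANDIDATE_PATTERNS`, `field_constraint`, `synthesize`, `relabel`, `hdef_like_loop`) and the sample documents are hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, List, Optional, Tuple

Program = Callable[[str], Optional[str]]

# Hypothetical search space: each candidate "program" is a regex with one group.
# A real synthesizer would search a much richer space of extraction programs.
CANDIDATE_PATTERNS = [
    r"Flight:\s*(\w+)",
    r"Departs:\s*(\d{2}:\d{2})",
    r"Gate\s+(\w+)",
]


def make_program(pattern: str) -> Program:
    def run(doc: str) -> Optional[str]:
        match = re.search(pattern, doc)
        return match.group(1) if match else None
    return run


def field_constraint(value: Optional[str]) -> bool:
    """Assumed type specification for a departure-time field: HH:MM."""
    return value is not None and re.fullmatch(r"\d{2}:\d{2}", value) is not None


def synthesize(docs: List[str], labels: List[Optional[str]]) -> Program:
    """Pick the candidate agreeing with the most well-formed labels (noise-tolerant)."""
    def score(pattern: str) -> int:
        program = make_program(pattern)
        return sum(
            1 for doc, label in zip(docs, labels)
            if field_constraint(label) and program(doc) == label
        )
    return make_program(max(CANDIDATE_PATTERNS, key=score))


def relabel(docs: List[str], labels: List[Optional[str]], program: Program) -> List[Optional[str]]:
    """Stand-in for re-training: trust program output only when it is well-formed."""
    updated = []
    for doc, label in zip(docs, labels):
        output = program(doc)
        updated.append(output if field_constraint(output) else label)
    return updated


def hdef_like_loop(
    docs: List[str], noisy_labels: List[Optional[str]], max_rounds: int = 5
) -> Tuple[Program, List[Optional[str]]]:
    labels = list(noisy_labels)
    program = synthesize(docs, labels)
    for _ in range(max_rounds):
        new_labels = relabel(docs, labels, program)
        if new_labels == labels:  # labels are stable, stop iterating
            break
        labels = new_labels
        program = synthesize(docs, labels)
    return program, labels


docs = [
    "Flight: AB123 Departs: 09:15 Gate 4",
    "Flight: CD456 Departs: 13:40 Gate 7",
    "Flight: EF789 Departs: 21:05 Gate 2",
]
# Simulated noisy ML output: the second label is a flight number, not a time.
noisy_labels = ["09:15", "CD456", "21:05"]
program, labels = hdef_like_loop(docs, noisy_labels)
```

In this sketch the ill-formed label is ignored during synthesis because it fails the field constraint, and it is then corrected by the selected program in the relabeling step; a second round confirms that the labels no longer change.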

More broadly, our approach points to ways by which machine learning and programming language techniques can be combined to get the best of both worlds: handling noise, transferring signals from one context to another using ML, producing interpretable programs using PL, and minimizing user intervention.

Mon 24 Jun

Displayed time zone: Tijuana, Baja California

14:00 - 15:30
Synthesis - PLDI Research Papers at 229AB
Chair(s): Nuno P. Lopes (Microsoft Research)
14:00
20m
Talk
Resource-Guided Program Synthesis
PLDI Research Papers
Tristan Knoth (University of California at San Diego, USA), Di Wang (Carnegie Mellon University), Nadia Polikarpova (University of California, San Diego), Jan Hoffmann (Carnegie Mellon University)
14:20
20m
Talk
Using Active Learning to Synthesize Models of Applications That Access Databases
PLDI Research Papers
Jiasi Shen (Massachusetts Institute of Technology), Martin C. Rinard (Massachusetts Institute of Technology)
14:40
20m
Talk
Synthesizing Database Programs for Schema Refactoring
PLDI Research Papers
Yuepeng Wang (University of Texas at Austin), James Dong (University of Texas at Austin, USA), Rushi Shah (UT Austin), Işıl Dillig (UT Austin)
15:00
20m
Talk
Synthesis and Machine Learning for Heterogeneous Extraction
PLDI Research Papers
Arun Iyer (Microsoft Research, India), Manohar Jonnalagedda (Inpher Inc., Switzerland), Suresh Parthasarathy (Microsoft Research, India), Arjun Radhakrishna (Microsoft), Sriram Rajamani (Microsoft Research)