Building Training Sets with Snorkel: Three Key Operators
One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today's models learn from. In this talk, I will describe our work on Snorkel (snorkel.stanford.edu), an open-source framework for building and managing training datasets, and its three key operators for creating and manipulating training data: labeling functions, for labeling unlabeled data; transformation functions, for expressing data augmentation strategies; and slicing functions, for partitioning and structuring training datasets. These operators let domain experts specify machine learning (ML) models via noisy operations over training data, so that applications can be built in hours or days rather than months or years. I will also describe recent work on modeling the noise and imprecision inherent in these operators, and on using these approaches to train ML models that solve real-world problems, including a recent state-of-the-art result on the SuperGLUE natural language processing benchmark.
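To make the three operators concrete, the following is a minimal sketch in the style of the open-source Snorkel 0.9 Python API (snorkel.labeling, snorkel.augmentation, snorkel.slicing). The toy spam-detection data and the specific heuristics are hypothetical illustrations, not taken from the talk, and the exact import paths and signatures may differ across Snorkel versions.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel
from snorkel.augmentation import transformation_function
from snorkel.slicing import slicing_function

ABSTAIN, HAM, SPAM = -1, 0, 1

# Labeling functions: noisy heuristics that vote on unlabeled examples.
@labeling_function()
def lf_contains_link(x):
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_check_out(x):
    return SPAM if "check out" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_polite(x):
    return HAM if len(x.text.split()) < 5 and "thanks" in x.text.lower() else ABSTAIN

# Transformation function: a data augmentation strategy that perturbs an
# example while (ideally) preserving its label.
@transformation_function()
def tf_informal_spelling(x):
    x.text = x.text.replace("please", "pls")
    return x

# Slicing function: marks a subset of the data to monitor or up-weight
# (here, very short comments).
@slicing_function()
def sf_short_comment(x):
    return len(x.text.split()) < 5

# Apply the labeling functions to unlabeled data, then denoise their
# conflicting, overlapping votes with a label model to produce
# probabilistic training labels.
df_train = pd.DataFrame({"text": [
    "check out http://spam.example for free stuff",
    "nice video, thanks",
    "please check out my channel",
]})
applier = PandasLFApplier(lfs=[lf_contains_link, lf_check_out, lf_short_polite])
L_train = applier.apply(df=df_train)

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=100, seed=0)
print(label_model.predict_proba(L_train))
```

The resulting probabilistic labels would then train a discriminative end model; transformation and slicing functions plug into the same pipeline to augment the training set and to report or improve performance on the declared slices.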
Sat 22 Jun (displayed time zone: Tijuana, Baja California)

09:00 - 11:00

09:00 (40m, Talk) Building Training Sets with Snorkel: Three Key Operators. MAPL
09:40 (40m, Talk) Machine Learning in Python with No Strings Attached. MAPL. Guillaume Baudart (IBM Research), Martin Hirzel (IBM Research), Kiran Kate, Louis Mandel (IBM Research), Avraham Shinnar (IBM Research)
10:20 (40m, Talk) Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations. MAPL