Sat 22 Jun 2019 09:00 - 09:40 at 105A - Session 1

One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today's models learn from. In this talk, I will describe our work on Snorkel (snorkel.stanford.edu), an open-source framework for building and managing training datasets, and introduce three key operators that let users build and manipulate them: labeling functions, for labeling unlabeled data; transformation functions, for expressing data augmentation strategies; and slicing functions, for partitioning and structuring training datasets. These operators allow domain experts to specify machine learning (ML) models by writing noisy, programmatic operations over training data, so applications can be built in hours or days rather than months or years. I will also describe recent work on modeling the noise and imprecision inherent in these operators, and on using these approaches to train ML models that solve real-world problems, including a recent state-of-the-art result on the SuperGLUE natural language processing benchmark.
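
To make the labeling-function idea concrete, here is a minimal plain-Python sketch. The function names, heuristics, and label constants below are hypothetical illustrations, not the actual Snorkel API, and Snorkel itself learns a generative model over the noisy votes rather than taking a simple majority vote as done here:

```python
# Conceptual sketch of labeling functions (hypothetical example, not the
# Snorkel API). Each labeling function is a noisy heuristic that votes
# SPAM (1), NOT_SPAM (0), or abstains (-1) on an unlabeled example; the
# votes are then aggregated -- here with a simple majority vote.
from collections import Counter

ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

def lf_contains_link(text):
    # Heuristic: messages containing URLs are often spam.
    return SPAM if "http" in text else ABSTAIN

def lf_short_message(text):
    # Heuristic: very short messages are usually not spam.
    return NOT_SPAM if len(text.split()) < 5 else ABSTAIN

def lf_free_offer(text):
    # Heuristic: "free" offers are a common spam signal.
    return SPAM if "free" in text.lower() else ABSTAIN

LFS = [lf_contains_link, lf_short_message, lf_free_offer]

def majority_label(text):
    """Aggregate the labeling functions' votes, ignoring abstentions."""
    counts = Counter(lf(text) for lf in LFS)
    counts.pop(ABSTAIN, None)
    if not counts:
        return ABSTAIN
    return counts.most_common(1)[0][0]

examples = [
    "Click here for a free prize: http://spam.example",
    "See you at lunch",
]
labels = [majority_label(t) for t in examples]  # -> [SPAM, NOT_SPAM]
```

Majority voting is only a stand-in here: the point of Snorkel's modeling work is to estimate each labeling function's accuracy and correlations, then combine the votes into probabilistic training labels.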