SemCluster: Clustering of Imperative Programming Assignments Based on Quantitative Semantic Features (PLDI 2019 - PLDI Research Papers)

Who

David Mitchel Perry, Dohyeong Kim, Roopsha Samanta, Xiangyu Zhang

Track

PLDI 2019 PLDI Research Papers

Time Zone

The program is currently displayed in (GMT-07:00) Tijuana, Baja California.

Use conference time zone: (GMT-07:00) Tijuana, Baja CaliforniaSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Tue 25 Jun 2019 16:00 - 16:20 at 229AB - Dynamics: Analysis and Compilation Chair(s): Nadia Polikarpova

Abstract

A fundamental challenge in automated reasoning about programming assignments at scale is clustering student submissions based on their underlying algorithms. State-of-the-art clustering techniques are sensitive to control structure variations, cannot cluster buggy solutions with similar correct solutions, and either require expensive pair-wise program analyses or training efforts. We propose a novel technique that can cluster small imperative programs based on their algorithmic essence: (A) how the input space is partitioned into equivalence classes and (B) how the problem is uniquely addressed within individual equivalence classes. We capture these algorithmic aspects as two quantitative semantic program features that are merged into a program's vector representation. Programs are then clustered using their vector representations. The computation of our first semantic feature leverages model counting to identify the number of inputs belonging to an input equivalence class. The computation of our second semantic feature abstracts the program's data flow by tracking the number of occurrences of a unique pair of consecutive values of a variable during its lifetime. The comprehensive evaluation of our tool SemCluster on benchmarks drawn from solutions to small programming assignments shows that SemCluster (1) generates far fewer clusters than other clustering techniques, (2) precisely identifies distinct solution strategies, and (3) boosts the performance of clustering-based program repair, all within a reasonable amount of time.

Link to Preprint

https://www.cs.purdue.edu/homes/roopsha/papers/semcluster_pldi2019.pdf

David Mitchel Perry

Purdue University

Dohyeong Kim

Purdue University

Roopsha Samanta

Purdue University

Xiangyu Zhang

Purdue University

Video Abstract

Time Zone

The program is currently displayed in (GMT-07:00) Tijuana, Baja California.

Use conference time zone: (GMT-07:00) Tijuana, Baja CaliforniaSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Tue 25 Jun
Displayed time zone: Tijuana, Baja California change

16:00 - 17:00	Dynamics: Analysis and CompilationPLDI Research Papers at 229AB Chair(s): Nadia Polikarpova University of California, San Diego

16:00 20m Talk		SemCluster: Clustering of Imperative Programming Assignments Based on Quantitative Semantic Features PLDI Research Papers David Mitchel Perry Purdue University, Dohyeong Kim Purdue University, Roopsha Samanta Purdue University, Xiangyu Zhang Purdue University Pre-print Media Attached
16:20 20m Talk		Computing Summaries of String Loops in C for Better Testing and Refactoring PLDI Research Papers Timotej Kapus Imperial College London, Oren Ish-Shalom Tel Aviv University, Israel, Shachar Itzhaky Technion, Israel, Noam Rinetzky Tel Aviv University, Cristian Cadar Imperial College London Link to publication Pre-print Media Attached
16:40 20m Talk		Reusable Inline Caching for JavaScript Performance PLDI Research Papers Jiho Choi University of Illinois at Urbana-Champaign, Thomas Shull University of Illinois at Urbana-Champaign, Josep Torrellas University of Illinois at Urbana-Champaign