SemCluster: Clustering of Imperative Programming Assignments Based on Quantitative Semantic Features
A fundamental challenge in automated reasoning about programming assignments at scale is clustering student submissions based on their underlying algorithms. State-of-the-art clustering techniques are sensitive to control structure variations, cannot cluster buggy solutions with similar correct solutions, and either require expensive pair-wise program analyses or training efforts. We propose a novel technique that can cluster small imperative programs based on their algorithmic essence: (A) how the input space is partitioned into equivalence classes and (B) how the problem is uniquely addressed within individual equivalence classes. We capture these algorithmic aspects as two quantitative semantic program features that are merged into a program's vector representation. Programs are then clustered using their vector representations. The computation of our first semantic feature leverages model counting to identify the number of inputs belonging to an input equivalence class. The computation of our second semantic feature abstracts the program's data flow by tracking the number of occurrences of a unique pair of consecutive values of a variable during its lifetime. The comprehensive evaluation of our tool SemCluster on benchmarks drawn from solutions to small programming assignments shows that SemCluster (1) generates far fewer clusters than other clustering techniques, (2) precisely identifies distinct solution strategies, and (3) boosts the performance of clustering-based program repair, all within a reasonable amount of time.
Tue 25 JunDisplayed time zone: Tijuana, Baja California change
16:00 - 17:00 | Dynamics: Analysis and CompilationPLDI Research Papers at 229AB Chair(s): Nadia Polikarpova University of California, San Diego | ||
16:00 20mTalk | SemCluster: Clustering of Imperative Programming Assignments Based on Quantitative Semantic Features PLDI Research Papers David Mitchel Perry Purdue University, Dohyeong Kim Purdue University, Roopsha Samanta Purdue University, Xiangyu Zhang Purdue University Pre-print Media Attached | ||
16:20 20mTalk | Computing Summaries of String Loops in C for Better Testing and Refactoring PLDI Research Papers Timotej Kapus Imperial College London, Oren Ish-Shalom Tel Aviv University, Israel, Shachar Itzhaky Technion, Israel, Noam Rinetzky Tel Aviv University, Cristian Cadar Imperial College London Link to publication Pre-print Media Attached | ||
16:40 20mTalk | Reusable Inline Caching for JavaScript Performance PLDI Research Papers Jiho Choi University of Illinois at Urbana-Champaign, Thomas Shull University of Illinois at Urbana-Champaign, Josep Torrellas University of Illinois at Urbana-Champaign |