SemCluster: Clustering of Imperative Programming Assignments Based on Quantitative Semantic Features
A fundamental challenge in automated reasoning about programming assignments at scale is clustering student submissions based on their underlying algorithms. State-of-the-art clustering techniques are sensitive to control structure variations, cannot cluster buggy solutions with similar correct solutions, and either require expensive pair-wise program analyses or training efforts. We propose a novel technique that can cluster small imperative programs based on their algorithmic essence: (A) how the input space is partitioned into equivalence classes and (B) how the problem is uniquely addressed within individual equivalence classes. We capture these algorithmic aspects as two quantitative semantic program features that are merged into a program's vector representation. Programs are then clustered using their vector representations. The computation of our first semantic feature leverages model counting to identify the number of inputs belonging to an input equivalence class. The computation of our second semantic feature abstracts the program's data flow by tracking the number of occurrences of a unique pair of consecutive values of a variable during its lifetime. The comprehensive evaluation of our tool SemCluster on benchmarks drawn from solutions to small programming assignments shows that SemCluster (1) generates far fewer clusters than other clustering techniques, (2) precisely identifies distinct solution strategies, and (3) boosts the performance of clustering-based program repair, all within a reasonable amount of time.
Tue 25 Jun Times are displayed in time zone: Tijuana, Baja California change
16:00 - 17:00: Dynamics: Analysis and CompilationPLDI Research Papers at 229AB Chair(s): Nadia PolikarpovaUniversity of California, San Diego | |||
16:00 - 16:20 Talk | SemCluster: Clustering of Imperative Programming Assignments Based on Quantitative Semantic Features PLDI Research Papers David Mitchel PerryPurdue University, Dohyeong KimPurdue University, Roopsha SamantaPurdue University, Xiangyu ZhangPurdue University Pre-print Media Attached | ||
16:20 - 16:40 Talk | Computing Summaries of String Loops in C for Better Testing and Refactoring PLDI Research Papers Timotej KapusImperial College London, Oren Ish-ShalomTel Aviv University, Israel, Shachar ItzhakyTechnion, Israel, Noam RinetzkyTel Aviv University, Cristian CadarImperial College London Link to publication Pre-print Media Attached | ||
16:40 - 17:00 Talk | Reusable Inline Caching for JavaScript Performance PLDI Research Papers Jiho ChoiUniversity of Illinois at Urbana-Champaign, Thomas ShullUniversity of Illinois at Urbana-Champaign, Josep TorrellasUniversity of Illinois at Urbana-Champaign |