Variant analysis with QL
Reports about major security flaws and data breaches in both open-source and proprietary software have become almost a daily fixture. Last year alone, more than 16,000 CVEs were assigned, or almost two CVEs per hour. Unfortunately, security vulnerabilities are notoriously hard to test for and tricky to find with general-purpose static analysis, and even the most attentive code reviewer is bound to miss many of them.
Based on the observation that many newly discovered security flaws are similar to known vulnerabilities, variant analysis has been proposed as one way out of this dilemma. Using known vulnerabilities as “seeds”, security researchers can systematically search for variants that represent potential vulnerabilities and ensure these threats are fixed properly across multiple code bases.
Doing variant analysis by hand or by textual search, however, is time-consuming, tedious and error-prone. Instead, we propose using Semmle's QL language, an object-oriented dialect of Datalog with powerful support for program analysis, to codify the problematic patterns underlying a known vulnerability as a query, and then to run that query to identify variants. I will give a general overview of this approach and walk through a concrete example of variant analysis with QL on an open-source code base.
Previously, I was an assistant professor at the School of Computer Engineering of Nanyang Technological University, Singapore; a post-doctoral researcher at IBM T.J. Watson Research Center, New York; and a PhD student at the Department of Computer Science at Oxford University.