Fast and Precise Disassembly using Datalog
Disassembly is fundamental to binary analysis and rewriting. We have developed a novel disassembly technique that takes a stripped binary and produces reassembleable assembly code. The resulting assembly code carries accurate symbolic information, which provides cross-references for analysis and allows code and data pointers to be adjusted to accommodate rewriting. This symbolic information is not present in the binary and has to be inferred; the inference process is called symbolization. Our symbolization technique relies on multiple static analyses and heuristics implemented in Datalog (Souffle). The analyses implemented in Datalog include code discovery, def-use chains, register value analysis, and data access pattern inference. Our disassembler makes extensive use of Souffle’s aggregates to combine the results of heuristics and static analyses and to make symbolization decisions. We have implemented our approach in an open-source tool called Ddisasm. In extensive experiments in which we rewrite thousands of x64 binaries, we find that Ddisasm is both faster and more accurate than the current state-of-the-art binary reassembling tool, Ramblr.
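To give a feel for how aggregates can turn several pieces of evidence into one symbolization decision, here is a minimal Python sketch. It is not Ddisasm's actual rule set: the heuristic names, point values, addresses, and threshold are all hypothetical, and the summation only mimics what a sum aggregate over a relation of facts would compute in Datalog.

```python
from collections import defaultdict

# Hypothetical evidence facts: (candidate address, heuristic name, points).
# Positive points argue that the value at this address is a symbolic
# reference; negative points argue that it is plain data.
EVIDENCE = [
    (0x601040, "points_to_code_section", 2),
    (0x601040, "aligned_pointer_width", 1),
    (0x601040, "accessed_as_pointer", 3),
    (0x601048, "points_to_code_section", 2),
    (0x601048, "looks_like_ascii_string", -4),
]

# Hypothetical cutoff: symbolize only when the summed evidence reaches it.
THRESHOLD = 3


def symbolize(evidence, threshold=THRESHOLD):
    """Sum the points per candidate address and compare with the threshold."""
    totals = defaultdict(int)
    for addr, _heuristic, points in evidence:
        totals[addr] += points
    return {addr: total >= threshold for addr, total in totals.items()}


decisions = symbolize(EVIDENCE)
for addr, is_symbolic in sorted(decisions.items()):
    print(f"{addr:#x}: {'symbol' if is_symbolic else 'literal'}")
```

Under these assumed weights, the first candidate accumulates enough supporting evidence to be treated as a symbol, while the second is outweighed by the string heuristic and stays a literal value.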
Senior Scientist at GrammaTech, Inc.