Program reasoning has important and wide-ranging applications in programming languages, such as type inference, static analysis, testing, verification, and synthesis. This tutorial will cover how to build program reasoning tools using two versatile, modern, and open-source systems: the LLVM compiler infrastructure and the Z3 constraint solver. These systems underlie state-of-the-art tools such as Clang Static Analyzer, the KLEE symbolic execution engine, and the SeaHorn verification framework.
This tutorial will introduce LLVM and Z3’s architecture and conduct three hands-on exercises. The exercises will target common applications in program reasoning that embody three distinctive use-cases: static dataflow analysis, dynamic symbolic execution, and assertion verification. The target audience for this tutorial is those who want hands-on experience with state-of-the-art tools and techniques employed in program reasoning. The tutorial presumes background in C++ programming and basic logical reasoning.
We will teach how to build program reasoning tools via three hands-on exercises. The tutorial will be structured into two parts.
We will first introduce the systems which form the backbone of our applications. In particular, we will cover:
We will then conduct hands-on exercises on building three applications. Each exercise will cover a distinctive use-case: dataflow analysis, symbolic execution, and verification. We discuss each in turn.
1. Static Dataflow Analysis. We will build a constraint-based static analyzer for C programs. This will serve as an exercise in writing constraints in Datalog, and solving them using the theory of fixed points in Z3. In doing so, we will become familiar with Z3’s API for C program constructs.
2. Dynamic Symbolic Execution. We will implement a dynamic symbolic execution engine that generates inputs to efficiently explore program paths in order to find inputs that crash a C program. In this exercise, we will learn to implement an LLVM pass to encode C program path constraints symbolically.
3. Assertion Verification. We will create an application similar to a small-scale version of SeaHorn, a program verifier that automatically checks safety properties of C programs. This will serve as an exercise in deriving Verification Conditions (VC) from the LLVM IR in the form of Constrained Horn Clauses (CHC) and checking their satisfiability using Z3.
We provide a docker image for the tutorial that contains skeleton code for each of the three hands-on exercises. Follow the step-by-step guide that details how to set up and run the tutorial docker which is hosted on dockerhub. Please also find our slides which contain the overview and concepts of our talk. The following is a quick-start guide to get you up and running:
To download the docker image:
docker pull petablox/popl2020-tutorial
To start a docker container:
docker run -it --name popl2020 petablox/popl2020-tutorial
git pull origin master
Please find additional starting code and solutions to the exercises at this link.
Mayur Naik is a professor of computer science at the University of Pennsylvania. His research interests span the areas of program analysis, constraint solving, and machine learning for programming. He holds a Ph.D. in Computer Science from Stanford University. Earlier, he was a researcher at Intel Labs and a professor in the College of Computing at Georgia Tech.
Elizabeth Dinella is a 2nd year PhD student at the University of Pennsylvania advised by Professor Mayur Naik. She received her B.S. in computer science from Rensselaer Polytechnic Institute in 2018. Her research interests lie in the fields of software engineering and programming languages. In particular, she is excited about applying machine learning techniques to solve traditional program analysis problems.
Pardis Pashakhanloo is a 3rd year PhD student at the University of Pennsylvania co-advised by Professor Mayur Naik and Professor Boon Thau Loo. She received her B.Sc. in Software Engineering from Sharif University of Technology in 2017. At the intersection of program analysis, machine learning, and software security, Pardis seeks to develop methodologies and frameworks to eliminate security vulnerabilities by reducing system-wide complexity.
Anthony Canino is a Postdoctoral Researcher at the University of Pennsylvania working with Professor Mayur Naik. His current work focuses on debloating large software systems for increased performance and security. He previously focused on on using programming languages to help programmers build energy efficient software. Moving forward, he aims to strengthen the support that programming languages and models can provide in performance-critical systems, e.g., unmanned aerial vehicles and mobile devices.
|[CGO '04]||Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation.|
|[OSDI '08]||Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs.|
|[TACAS '08]||Leonardo de Moura and Nikolaj Bjørner Z3: An Efficient SMT Solver.|
|[CAV '15]||Arie Gurfinkel, Temesghen Kahsai, Anvesh Komuravelli, and Jorge A. Navas. The SeaHorn Verification Framework.|