Building Program Reasoning Tools using LLVM and Z3

POPL 2020 Tutorial

Monday, January 20, 2020 at 9:00-12:30 in Room TutorialFest-A
[ Introduction | Structure of Tutorial | Getting Started | Learn More ]

Introduction

Program reasoning has important and wide-ranging applications in programming languages, such as type inference, static analysis, testing, verification, and synthesis. This tutorial will cover how to build program reasoning tools using two versatile, modern, and open-source systems: the LLVM compiler infrastructure and the Z3 constraint solver. These systems underlie state-of-the-art tools such as Clang Static Analyzer, the KLEE symbolic execution engine, and the SeaHorn verification framework.

This tutorial will introduce LLVM and Z3’s architecture and conduct three hands-on exercises. The exercises will target common applications in program reasoning that embody three distinctive use-cases: static dataflow analysis, dynamic symbolic execution, and assertion verification. The target audience for this tutorial is those who want hands-on experience with state-of-the-art tools and techniques employed in program reasoning. The tutorial presumes background in C++ programming and basic logical reasoning.

Structure of Tutorial

We will teach how to build program reasoning tools via three hands-on exercises. The tutorial will be structured into two parts.

Part I: Introduction to LLVM and Z3.

We will first introduce the systems which form the backbone of our applications. In particular, we will cover:

  1. LLVM, a compiler infrastructure primarily targeting the C family of languages. It is a versatile framework that is widely used in both industry and academia. We will introduce the architecture of LLVM as well as its language-independent intermediate representation (IR).
  2. Z3, an efficient SMT solver developed by Microsoft Research that is used for a variety of computational problems in programming languages and beyond. We will introduce the architecture of Z3, its capabilities, and its API. However, we will not delve into the internals of Z3.
Part II: Building Three Applications.

We will then conduct hands-on exercises on building three applications. Each exercise will cover a distinctive use-case: dataflow analysis, symbolic execution, and verification. We discuss each in turn.

1. Static Dataflow Analysis. We will build a constraint-based static analyzer for C programs. This will serve as an exercise in writing constraints in Datalog, and solving them using the theory of fixed points in Z3. In doing so, we will become familiar with Z3’s API for C program constructs.

2. Dynamic Symbolic Execution. We will implement a dynamic symbolic execution engine that generates inputs to efficiently explore program paths in order to find inputs that crash a C program. In this exercise, we will learn to implement an LLVM pass to encode C program path constraints symbolically.

3. Assertion Verification. We will create an application similar to a small-scale version of SeaHorn, a program verifier that automatically checks safety properties of C programs. This will serve as an exercise in deriving Verification Conditions (VC) from the LLVM IR in the form of Constrained Horn Clauses (CHC) and checking their satisfiability using Z3.

Getting Started

We provide a docker image for the tutorial that contains skeleton code for each of the three hands-on exercises. Follow the step-by-step guide that details how to set up and run the tutorial docker which is hosted on dockerhub. Please also find our slides which contain the overview and concepts of our talk. The following is a quick-start guide to get you up and running:

To download the docker image:

docker pull petablox/popl2020-tutorial

To start a docker container:

docker run -it --name popl2020 petablox/popl2020-tutorial

git pull origin master

wget http://rightingcode.org/tutorials/popl20/popl2020-soln.zip

unzip popl2020-soln.zip

Please find additional starting code and solutions to the exercises at this link.

Learn More

About the Presenters

Mayur Naik is a professor of computer science at the University of Pennsylvania. His research interests span the areas of program analysis, constraint solving, and machine learning for programming. He holds a Ph.D. in Computer Science from Stanford University. Earlier, he was a researcher at Intel Labs and a professor in the College of Computing at Georgia Tech.

Elizabeth Dinella is a 2nd year PhD student at the University of Pennsylvania advised by Professor Mayur Naik. She received her B.S. in computer science from Rensselaer Polytechnic Institute in 2018. Her research interests lie in the fields of software engineering and programming languages. In particular, she is excited about applying machine learning techniques to solve traditional program analysis problems.

Pardis Pashakhanloo is a 3rd year PhD student at the University of Pennsylvania co-advised by Professor Mayur Naik and Professor Boon Thau Loo. She received her B.Sc. in Software Engineering from Sharif University of Technology in 2017. At the intersection of program analysis, machine learning, and software security, Pardis seeks to develop methodologies and frameworks to eliminate security vulnerabilities by reducing system-wide complexity.

Anthony Canino is a Postdoctoral Researcher at the University of Pennsylvania working with Professor Mayur Naik. His current work focuses on debloating large software systems for increased performance and security. He previously focused on on using programming languages to help programmers build energy efficient software. Moving forward, he aims to strengthen the support that programming languages and models can provide in performance-critical systems, e.g., unmanned aerial vehicles and mobile devices.

Resources

References

The below references describe tools, systems, and frameworks that we use in or have influenced our tutorial.
[CGO '04]Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation.
[OSDI '08]Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs.
[TACAS '08]Leonardo de Moura and Nikolaj Bjørner Z3: An Efficient SMT Solver.
[CAV '15]Arie Gurfinkel, Temesghen Kahsai, Anvesh Komuravelli, and Jorge A. Navas. The SeaHorn Verification Framework.