Analyzing Programs with SMT Solvers
Tikhon Jelvis (tikhon@jelv.is)
Press → key to advance.
Zoom in/out: Ctrl or Command + +/-
1 SAT, SMT, SBV… oh my
- acronyms are fun
- nested ones doubly so
- SAT: Boolean satisfiability problem
- classic NP-complete problem
- SMT: SAT modulo theories
- SBV: SMT-based verification
- Haskell library by Levent Erkok
2 I can't get no…
- SAT problem: find satisfying assignment for Boolean formula
- Boolean formula: a bunch of true/false variables with \(\land\), \(\lor\) and \(\lnot\): $$ (a \lor \lnot b) \land (\lnot a \lor c \lor d \lor \lnot e) \land (b \lor \lnot d) $$
- satisfying assignment: values for variables such that the whole formula is true
3 It's not easy
- NP-complete
- naïve solution: backtracking
- best solution: backtracking
- huh?
- solvers are fast for real world instances
- recent improvements \(\Rightarrow\) fast enough for some practical purposes
4 A theory of everything
- SMT: extend SAT with theories
- theory: new type of variable—not Boolean
- bitvectors (0001010101)
- unbounded integers (ℤ)
- unbounded arrays
- algebraic numbers (almost like ℝ)
- floating-point numbers (not entirely unlike ℝ)
5 What can I do with…?
- formula still has to be a Boolean
- new types \(\Rightarrow\) new constraints, operations
- pretty much what you'd expect: $$ (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
- solves to: $$x = 4, y = -3$$
6 Bit by bit
- not just arithmetic
- “flow” control:
- ite: if-then-else
- different theories have:
- bitwise operations
- signed and unsigned operations
- array indexing
7 Still not satisfied
- sometimes, we cannot solve a formula
$$ (x^2 + y^2 = 42) \land (3x + 4y = 0) $$
- unsatifiable
- this works even over unbounded integers!
- however, the solver can also
- take too long
- return
Unknown
8 Quantification
- more explicit equation from before: $$ \exists x. \exists y. (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
- compare to: $$ \forall x. \forall y. (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
- Unsatisfiable; can find counterexample: $$x = 23, y = 0$$
9 Again but in English Haskell
- SBV: Haskell DSL for specifying formulas
- mostly ASCII-ifies what we've seen before:
x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0
- reuses
Num
andBits
but notEq
orOrd
- has own class for Boolean-like things
10 Everything is better with types
- symbolic versions of normal Haskell types
SInteger
forInteger
SWord32
forWord32
; unsigned bitvectorsSInt32
forInt32
; signed bitvectorsSBool
forBool
SArray
for unbounded arrays
11 Running
- formulas as Haskell functions:
formula :: SInteger -> SInteger -> SBool formula x y = x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0
- run with quantified variables:
λ> sat . forSome ["x", "y"] formula
12 Everything is better with types monads
- handy monad for composing formulas
- better way to manage variables
formula = do x :: SInteger <- exists "x" y :: SInteger <- exists "y" constrain $ x^2 + y^2 .== 25 return $ 3 * x + 4 * y .== 0
- note:
ScopedTypeVariables
13 Modelling programs
- extend to three sets of variables: $$ \phi(input, program, output) $$
- fix \(input, program\): interpreter
- fix \(program, output\): reverse interpreter
- fix \(input, output\): synthesizer
- also: check arbitrary invariants
14 Step by step
- encode program state with single static assignment SSA
- transform
x = x + y
to:constrain $ x_2 .== x_1 + y_1
x_1
,y_1
…etc: existentially quantified
15 Operational Semantics
- operational semantics \(\Rightarrow\) constraints
- interpreter \(\Rightarrow\) formula compiler
- let's consider very simple language: IMP
- all variables 32 bit ints
- small number of imperative constructs
16 Expressions
- arithmetic just gets encoded as arithmetic!
x + 1
becomes…x_n + 1
- keep track of the “current” step #
n
- keep track of the “current” step #
- operations are signed, so we use
SInt32
- remember: this is the theory of bitvectors
17 Assignment
- for assignment (
x = x + y
), we just use SSAconstrain $ x_<n + 1> .== x_n + y_n
- note how expression
x + y
got compiled
18 Conditions
- IMP just has if-then-else
- gets compiled to
ite
if x > 5 then y = 1 else y = 2
:constrain $ ite (x_n .> 5) (y_<n + 1> .== 1) (y_<n + 1> .== 2)
19 Sequencing
- one statement follows another
- only trick: remember to increment
n
x = x + y; x = x * 2
constrain $ x_2 .== x_1 + y_1 constrain $ x_3 .== x_2 * 2
- basically works out to
>>
- but remember
n
!
- but remember
20 Loops
- this is the trickiest part
- SMT solver cannot handle recursion
- must finitize loops
- unroll some arbitrary number of times
- fail if that is not enough
- later: rerun formula multiple times, unrolling more and more
21 Loops
while x < 5 do x = x + 1
unrolls to:if x < 5 then x = x + 1; <while> else skip
- after we reach our max depth,
<while>
becomesfalse
- convert nested if-statements as before
- result: long, ugly formula full of
ite
's
22 Now what?
- we can run our code (slowly)
- we can run our code backwards
- we can check invariants over our program
- we can verify two functions are equivalent
- we can synthesize programs
- whatever else you can imagine
23 Running
- forwards; solve for outputs:
constrain $ x_0 .== 10 constrain $ y_0 .== 20
- backwards; solve for inputs:
constrain $ x_20 .== 10 constrain $ y_20 .== 20
24 Invariants
- easy to encode invariants
- for example "x is always positive":
constrain $ x_1 .> 0 &&& x_2 .> 0 &&& x_3 .> 0 ...
- universally quantify inputs:
x :: SWord32 <- forall "x"
25 Verification
- search for inputs where programs disagree
- assert outputs are not equal
constrain $ p1_x_20 ./= p2_x_20 constrain $ p1_y_20 ./= p2_y_20
- existentially quantify inputs
- unsatisfiable = verified
- satisfiable = counterexample
26 Synthesis
- add variables for each part of the program
- select what to do with
ite
- fix inputs and outputs
- also solve for literals (
1
,2
,3
…) - this gives you a program correct only over your fixed inputs and outputs
27 CEGIS
- start with random input/output pair
- synthesize
- verify result
- if correct: we're done
- if wrong: verifier returns new input
- repeat
- each new input is a new corner case!
28 Sketching
- some things are easy for synthesizer
- some things are easy for humans
- programmer specifies code with holes:
while ?? do x = ?? y = ?? z = x + y return z
29 Fun ideas
- good fit for low-level code (ie assembly)
- use techniques to optimize Haskell
- maybe take advantage of algebraic laws?
- implement simple verified language
- check out ImProve on Hackage
- design interactive tools
- intelligent debugging
- maybe use for education?
30 Credit
- sbv package by Levent Erkok
- synthesis ideas mostly from:
- z3 SMT Solver from Microsoft Research
- most popular SMT solver
- very versatile
- commercially unfriendly license (not open source)