Analyzing Programs with SMT Solvers

Tikhon Jelvis (tikhon@jelv.is)

Press key to advance.
Zoom in/out: Ctrl or Command + +/-

1 SAT, SMT, SBV… oh my

  • acronyms are fun
    • nested ones doubly so
  • SAT: Boolean satisfiability problem
    • classic NP-complete problem
  • SMT: SAT modulo theories
  • SBV: SMT-based verification
    • Haskell library by Levent Erkok

2 I can't get no…

  • SAT problem: find satisfying assignment for Boolean formula
  • Boolean formula: a bunch of true/false variables with \(\land\), \(\lor\) and \(\lnot\): $$ (a \lor \lnot b) \land (\lnot a \lor c \lor d \lor \lnot e) \land (b \lor \lnot d) $$
  • satisfying assignment: values for variables such that the whole formula is true

3 It's not easy

  • NP-complete
  • naïve solution: backtracking
  • best solution: backtracking
  • huh?
  • solvers are fast for real world instances
    • recent improvements \(\Rightarrow\) fast enough for some practical purposes

4 A theory of everything

  • SMT: extend SAT with theories
  • theory: new type of variable—not Boolean
    • bitvectors (0001010101)
    • unbounded integers (ℤ)
    • unbounded arrays
    • algebraic numbers (almost like ℝ)
    • floating-point numbers (not entirely unlike ℝ)

5 What can I do with…?

  • formula still has to be a Boolean
  • new types \(\Rightarrow\) new constraints, operations
  • pretty much what you'd expect: $$ (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
  • solves to: $$x = 4, y = -3$$

6 Bit by bit

  • not just arithmetic
  • “flow” control:
    • ite: if-then-else
  • different theories have:
    • bitwise operations
    • signed and unsigned operations
    • array indexing

7 Still not satisfied

  • sometimes, we cannot solve a formula

$$ (x^2 + y^2 = 42) \land (3x + 4y = 0) $$

  • unsatifiable
  • this works even over unbounded integers!
  • however, the solver can also
    • take too long
    • return Unknown

8 Quantification

  • more explicit equation from before: $$ \exists x. \exists y. (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
  • compare to: $$ \forall x. \forall y. (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
  • Unsatisfiable; can find counterexample: $$x = 23, y = 0$$

9 Again but in English Haskell

  • SBV: Haskell DSL for specifying formulas
  • mostly ASCII-ifies what we've seen before:
x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0
  • reuses Num and Bits but not Eq or Ord
  • has own class for Boolean-like things

10 Everything is better with types

  • symbolic versions of normal Haskell types
    • SInteger for Integer
    • SWord32 for Word32; unsigned bitvectors
    • SInt32 for Int32; signed bitvectors
    • SBool for Bool
    • SArray for unbounded arrays

11 Running

  • formulas as Haskell functions:
    formula :: SInteger -> SInteger -> SBool
    formula x y = 
      x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0
    
  • run with quantified variables:
    λ> sat . forSome ["x", "y"] formula
    

12 Everything is better with types monads

  • handy monad for composing formulas
  • better way to manage variables
    formula = 
      do x :: SInteger <- exists "x"
         y :: SInteger <- exists "y"
         constrain $ x^2 + y^2 .== 25
         return $ 3 * x + 4 * y .== 0
    
  • note: ScopedTypeVariables

13 Modelling programs

  • extend to three sets of variables: $$ \phi(input, program, output) $$
  • fix \(input, program\): interpreter
  • fix \(program, output\): reverse interpreter
  • fix \(input, output\): synthesizer
  • also: check arbitrary invariants

14 Step by step

  • encode program state with single static assignment SSA
  • transform x = x + y to:
    constrain $ x_2 .== x_1 + y_1
    
  • x_1, y_1…etc: existentially quantified

15 Operational Semantics

  • operational semantics \(\Rightarrow\) constraints
  • interpreter \(\Rightarrow\) formula compiler
  • let's consider very simple language: IMP
    • all variables 32 bit ints
    • small number of imperative constructs

16 Expressions

  • arithmetic just gets encoded as arithmetic!
  • x + 1 becomes… x_n + 1
    • keep track of the “current” step # n
  • operations are signed, so we use SInt32
  • remember: this is the theory of bitvectors

17 Assignment

  • for assignment (x = x + y), we just use SSA
    constrain $ x_<n + 1> .== x_n + y_n
    
  • note how expression x + y got compiled

18 Conditions

  • IMP just has if-then-else
  • gets compiled to ite
  • if x > 5 then y = 1 else y = 2:
    constrain $ ite (x_n .> 5) 
                    (y_<n + 1> .== 1)
                    (y_<n + 1> .== 2)
    

19 Sequencing

  • one statement follows another
  • only trick: remember to increment n
  • x = x + y; x = x * 2
    constrain $ x_2 .== x_1 + y_1
    constrain $ x_3 .== x_2 * 2
    
  • basically works out to >>
    • but remember n!

20 Loops

  • this is the trickiest part
  • SMT solver cannot handle recursion
  • must finitize loops
    • unroll some arbitrary number of times
    • fail if that is not enough
  • later: rerun formula multiple times, unrolling more and more

21 Loops

  • while x < 5 do x = x + 1 unrolls to:
  • if x < 5 then x = x + 1; <while> else skip
  • after we reach our max depth, <while> becomes false
  • convert nested if-statements as before
  • result: long, ugly formula full of ite's

22 Now what?

  • we can run our code (slowly)
  • we can run our code backwards
  • we can check invariants over our program
  • we can verify two functions are equivalent
  • we can synthesize programs
  • whatever else you can imagine

23 Running

  • forwards; solve for outputs:
    constrain $ x_0 .== 10
    constrain $ y_0 .== 20
    
  • backwards; solve for inputs:
    constrain $ x_20 .== 10
    constrain $ y_20 .== 20
    

24 Invariants

  • easy to encode invariants
  • for example "x is always positive":
    constrain $ x_1 .> 0 
            &&& x_2 .> 0 
            &&& x_3 .> 0 ...
    
  • universally quantify inputs:
    x :: SWord32 <- forall "x"
    

25 Verification

  • search for inputs where programs disagree
  • assert outputs are not equal
    constrain $ p1_x_20 ./= p2_x_20
    constrain $ p1_y_20 ./= p2_y_20
    
  • existentially quantify inputs
  • unsatisfiable = verified
  • satisfiable = counterexample

26 Synthesis

  • add variables for each part of the program
  • select what to do with ite
  • fix inputs and outputs
  • also solve for literals (1, 2, 3…)
  • this gives you a program correct only over your fixed inputs and outputs

27 CEGIS

  • start with random input/output pair
  • synthesize
  • verify result
    • if correct: we're done
    • if wrong: verifier returns new input
      • repeat
  • each new input is a new corner case!

28 Sketching

  • some things are easy for synthesizer
  • some things are easy for humans
  • programmer specifies code with holes:
    while ?? do
      x = ??
      y = ??
      z = x + y
    return z
    

29 Fun ideas

  • good fit for low-level code (ie assembly)
  • use techniques to optimize Haskell
    • maybe take advantage of algebraic laws?
  • implement simple verified language
    • check out ImProve on Hackage
  • design interactive tools
    • intelligent debugging
    • maybe use for education?

30 Credit

  • sbv package by Levent Erkok
  • synthesis ideas mostly from:
    • Ras Bodik at the Berkeley ParLab
    • especially slides from his class on synthesis
  • z3 SMT Solver from Microsoft Research
    • most popular SMT solver
    • very versatile
    • commercially unfriendly license (not open source)