Analyzing Programs with SMT Solvers

Tikhon Jelvis (tikhon@jelv.is)

Zoom in/out: Ctrl or Command + +/-

1 SAT, SMT, SBV… oh my

• acronyms are fun
• nested ones doubly so
• SAT: Boolean satisfiability problem
• classic NP-complete problem
• SMT: SAT modulo theories
• SBV: SMT-based verification
• Haskell library by Levent Erkok

2 I can't get no…

• SAT problem: find satisfying assignment for Boolean formula
• Boolean formula: a bunch of true/false variables with $$\land$$, $$\lor$$ and $$\lnot$$: $$(a \lor \lnot b) \land (\lnot a \lor c \lor d \lor \lnot e) \land (b \lor \lnot d)$$
• satisfying assignment: values for variables such that the whole formula is true

3 It's not easy

• NP-complete
• naïve solution: backtracking
• best solution: backtracking
• huh?
• solvers are fast for real world instances
• recent improvements $$\Rightarrow$$ fast enough for some practical purposes

4 A theory of everything

• SMT: extend SAT with theories
• theory: new type of variable—not Boolean
• bitvectors (0001010101)
• unbounded integers (ℤ)
• unbounded arrays
• algebraic numbers (almost like ℝ)
• floating-point numbers (not entirely unlike ℝ)

5 What can I do with…?

• formula still has to be a Boolean
• new types $$\Rightarrow$$ new constraints, operations
• pretty much what you'd expect: $$(x^2 + y^2 = 25) \land (3x + 4y = 0)$$
• solves to: $$x = 4, y = -3$$

6 Bit by bit

• not just arithmetic
• “flow” control:
• ite: if-then-else
• different theories have:
• bitwise operations
• signed and unsigned operations
• array indexing

7 Still not satisfied

• sometimes, we cannot solve a formula

$$(x^2 + y^2 = 42) \land (3x + 4y = 0)$$

• unsatifiable
• this works even over unbounded integers!
• however, the solver can also
• take too long
• return Unknown

8 Quantification

• more explicit equation from before: $$\exists x. \exists y. (x^2 + y^2 = 25) \land (3x + 4y = 0)$$
• compare to: $$\forall x. \forall y. (x^2 + y^2 = 25) \land (3x + 4y = 0)$$
• Unsatisfiable; can find counterexample: $$x = 23, y = 0$$

9 Again but in English Haskell

• SBV: Haskell DSL for specifying formulas
• mostly ASCII-ifies what we've seen before:
x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0

• reuses Num and Bits but not Eq or Ord
• has own class for Boolean-like things

10 Everything is better with types

• symbolic versions of normal Haskell types
• SInteger for Integer
• SWord32 for Word32; unsigned bitvectors
• SInt32 for Int32; signed bitvectors
• SBool for Bool
• SArray for unbounded arrays

11 Running

formula :: SInteger -> SInteger -> SBool
formula x y =
x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0

• run with quantified variables:
λ> sat . forSome ["x", "y"] formula


12 Everything is better with types monads

• handy monad for composing formulas
• better way to manage variables
formula =
do x :: SInteger <- exists "x"
y :: SInteger <- exists "y"
constrain $x^2 + y^2 .== 25 return$ 3 * x + 4 * y .== 0

• note: ScopedTypeVariables

13 Modelling programs

• extend to three sets of variables: $$\phi(input, program, output)$$
• fix $$input, program$$: interpreter
• fix $$program, output$$: reverse interpreter
• fix $$input, output$$: synthesizer
• also: check arbitrary invariants

14 Step by step

• encode program state with single static assignment SSA
• transform x = x + y to:
constrain $x_2 .== x_1 + y_1  • x_1, y_1…etc: existentially quantified 15 Operational Semantics • operational semantics $$\Rightarrow$$ constraints • interpreter $$\Rightarrow$$ formula compiler • let's consider very simple language: IMP • all variables 32 bit ints • small number of imperative constructs 16 Expressions • arithmetic just gets encoded as arithmetic! • x + 1 becomes… x_n + 1 • keep track of the “current” step # n • operations are signed, so we use SInt32 • remember: this is the theory of bitvectors 17 Assignment • for assignment (x = x + y), we just use SSA constrain$ x_<n + 1> .== x_n + y_n

• note how expression x + y got compiled

18 Conditions

• IMP just has if-then-else
• gets compiled to ite
• if x > 5 then y = 1 else y = 2:
constrain $x_3 .== x_2 * 2  • basically works out to >> • but remember n! 20 Loops • this is the trickiest part • SMT solver cannot handle recursion • must finitize loops • unroll some arbitrary number of times • fail if that is not enough • later: rerun formula multiple times, unrolling more and more 21 Loops • while x < 5 do x = x + 1 unrolls to: • if x < 5 then x = x + 1; <while> else skip • after we reach our max depth, <while> becomes false • convert nested if-statements as before • result: long, ugly formula full of ite's 22 Now what? • we can run our code (slowly) • we can run our code backwards • we can check invariants over our program • we can verify two functions are equivalent • we can synthesize programs • whatever else you can imagine 23 Running • forwards; solve for outputs: constrain$ x_0 .== 10
constrain $y_0 .== 20  • backwards; solve for inputs: constrain$ x_20 .== 10
constrain $y_20 .== 20  24 Invariants • easy to encode invariants • for example "x is always positive": constrain$ x_1 .> 0
&&& x_2 .> 0
&&& x_3 .> 0 ...

• universally quantify inputs:
x :: SWord32 <- forall "x"


25 Verification

• search for inputs where programs disagree
• assert outputs are not equal
constrain $p1_x_20 ./= p2_x_20 constrain$ p1_y_20 ./= p2_y_20

• existentially quantify inputs
• unsatisfiable = verified
• satisfiable = counterexample

26 Synthesis

• add variables for each part of the program
• select what to do with ite
• fix inputs and outputs
• also solve for literals (1, 2, 3…)
• this gives you a program correct only over your fixed inputs and outputs

27 CEGIS

• synthesize
• verify result
• if correct: we're done
• if wrong: verifier returns new input
• repeat
• each new input is a new corner case!

28 Sketching

• some things are easy for synthesizer
• some things are easy for humans
• programmer specifies code with holes:
while ?? do
x = ??
y = ??
z = x + y
return z


29 Fun ideas

• good fit for low-level code (ie assembly)
• use techniques to optimize Haskell
• maybe take advantage of algebraic laws?
• implement simple verified language
• check out ImProve on Hackage