## Analyzing Programs with SMT Solvers

Tikhon Jelvis (tikhon@jelv.is)

Press → key to advance.

Zoom in/out: Ctrl or Command + +/-

# 1 SAT, SMT, SBV… oh my

- acronyms are fun
- nested ones doubly so

**SAT**: Boolean satisfiability problem- classic NP-complete problem

**SMT**: SAT modulo theories**SBV**: SMT-based verification- Haskell library by Levent Erkok

# 2 I can't get no…

- SAT problem: find
**satisfying assignment**for**Boolean formula** **Boolean formula**: a bunch of true/false variables with \(\land\), \(\lor\) and \(\lnot\): $$ (a \lor \lnot b) \land (\lnot a \lor c \lor d \lor \lnot e) \land (b \lor \lnot d) $$**satisfying assignment**: values for variables such that the whole formula is true

# 3 It's not easy

- NP-complete
- naïve solution: backtracking
- best solution: backtracking
- huh?
- solvers are fast for
**real world**instances- recent improvements \(\Rightarrow\) fast enough for
*some*practical purposes

- recent improvements \(\Rightarrow\) fast enough for

# 4 A theory of everything

- SMT: extend SAT with
**theories** **theory**: new type of variable—not Boolean- bitvectors (0001010101)
*unbounded*integers (ℤ)*unbounded*arrays- algebraic numbers (almost like ℝ)
- floating-point numbers (not entirely unlike ℝ)

# 5 What can I do with…?

- formula still has to be a Boolean
- new types \(\Rightarrow\) new constraints, operations
- pretty much what you'd expect: $$ (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
- solves to: $$x = 4, y = -3$$

# 6 Bit by bit

- not just arithmetic
- “flow” control:
**ite**: if-then-else

- different theories have:
- bitwise operations
- signed
*and*unsigned operations - array indexing

# 7 Still not satisfied

- sometimes, we cannot solve a formula

$$ (x^2 + y^2 = 42) \land (3x + 4y = 0) $$

**unsatifiable**- this works even over
*unbounded integers*! - however, the solver can also
- take too long
- return
`Unknown`

# 8 Quantification

- more explicit equation from before: $$ \exists x. \exists y. (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
- compare to: $$ \forall x. \forall y. (x^2 + y^2 = 25) \land (3x + 4y = 0) $$
- Unsatisfiable; can find
**counterexample**: $$x = 23, y = 0$$

# 9 Again but in ~~English~~ Haskell

- SBV: Haskell DSL for specifying formulas
- mostly ASCII-ifies what we've seen before:

x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0

- reuses
`Num`

and`Bits`

but not`Eq`

or`Ord`

- has own class for Boolean-like things

# 10 Everything is better with types

**symbolic**versions of normal Haskell types`SInteger`

for`Integer`

`SWord32`

for`Word32`

; unsigned bitvectors`SInt32`

for`Int32`

; signed bitvectors`SBool`

for`Bool`

`SArray`

for*unbounded*arrays

# 11 Running

- formulas as Haskell functions:
formula :: SInteger -> SInteger -> SBool formula x y = x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0

- run with quantified variables:
λ> sat . forSome ["x", "y"] formula

# 12 Everything is better with ~~types~~ monads

- handy monad for composing formulas
- better way to manage variables
formula = do x :: SInteger <- exists "x" y :: SInteger <- exists "y" constrain $ x^2 + y^2 .== 25 return $ 3 * x + 4 * y .== 0

- note:
`ScopedTypeVariables`

# 13 Modelling programs

- extend to three sets of variables: $$ \phi(input, program, output) $$
- fix \(input, program\): interpreter
- fix \(program, output\): reverse interpreter
- fix \(input, output\): synthesizer
- also: check arbitrary invariants

# 14 Step by step

- encode program state with
**single static assignment**SSA - transform
`x = x + y`

to:constrain $ x_2 .== x_1 + y_1

`x_1`

,`y_1`

…etc: existentially quantified

# 15 Operational Semantics

- operational semantics \(\Rightarrow\) constraints
- interpreter \(\Rightarrow\) formula compiler
- let's consider very simple language: IMP
- all variables 32 bit ints
- small number of imperative constructs

# 16 Expressions

- arithmetic just gets encoded as arithmetic!
`x + 1`

becomes…`x_n + 1`

- keep track of the “current” step #
`n`

- keep track of the “current” step #
- operations are
*signed*, so we use`SInt32`

- remember: this is the theory of
**bitvectors**

# 17 Assignment

- for assignment (
`x = x + y`

), we just use SSAconstrain $ x_<n + 1> .== x_n + y_n

- note how expression
`x + y`

got compiled

# 18 Conditions

- IMP just has if-then-else
- gets compiled to
`ite`

`if x > 5 then y = 1 else y = 2`

:constrain $ ite (x_n .> 5) (y_<n + 1> .== 1) (y_<n + 1> .== 2)

# 19 Sequencing

- one statement follows another
- only trick: remember to increment
`n`

`x = x + y; x = x * 2`

constrain $ x_2 .== x_1 + y_1 constrain $ x_3 .== x_2 * 2

- basically works out to
`>>`

- but remember
`n`

!

- but remember

# 20 Loops

- this is the trickiest part
- SMT solver
**cannot**handle recursion - must
**finitize**loops- unroll some arbitrary number of times
- fail if that is not enough

- later: rerun formula multiple times, unrolling more and more

# 21 Loops

`while x < 5 do x = x + 1`

unrolls to:`if x < 5 then x = x + 1; <while> else skip`

- after we reach our max depth,
`<while>`

becomes`false`

- convert nested if-statements as before
- result: long, ugly formula full of
`ite`

's

# 22 Now what?

- we can run our code (slowly)
- we can run our code
*backwards* - we can check invariants over our program
- we can verify two functions are equivalent
- we can synthesize programs
- whatever else you can imagine

# 23 Running

- forwards; solve for outputs:
constrain $ x_0 .== 10 constrain $ y_0 .== 20

- backwards; solve for inputs:
constrain $ x_20 .== 10 constrain $ y_20 .== 20

# 24 Invariants

- easy to encode invariants
- for example "x is always positive":
constrain $ x_1 .> 0 &&& x_2 .> 0 &&& x_3 .> 0 ...

- universally quantify inputs:
x :: SWord32 <- forall "x"

# 25 Verification

- search for inputs where programs
**disagree** - assert outputs are
*not*equalconstrain $ p1_x_20 ./= p2_x_20 constrain $ p1_y_20 ./= p2_y_20

- existentially quantify inputs
- unsatisfiable = verified
- satisfiable = counterexample

# 26 Synthesis

- add variables for each part of the program
- select what to do with
`ite`

- fix inputs and outputs
- also solve for literals (
`1`

,`2`

,`3`

…) - this gives you a program correct only over your fixed inputs and outputs

# 27 CEGIS

- start with random input/output pair
- synthesize
- verify result
- if correct: we're done
- if wrong: verifier returns new input
- repeat

- each new input is a new corner case!

# 28 Sketching

- some things are easy for synthesizer
- some things are easy for humans
- programmer specifies code with
**holes**:while ?? do x = ?? y = ?? z = x + y return z

# 29 Fun ideas

- good fit for
**low-level**code (ie assembly) - use techniques to optimize Haskell
- maybe take advantage of algebraic laws?

- implement simple verified language
- check out ImProve on Hackage

- design interactive tools
- intelligent debugging
- maybe use for education?

# 30 Credit

- sbv package by Levent Erkok
- synthesis ideas mostly from:
- z3 SMT Solver from Microsoft Research
- most popular SMT solver
- very versatile
- commercially unfriendly license (
**not**open source)