README

==============================================================================
  CAV 2016 - Artifact Evaluation
  Precise and Complete Propagation Based Local Search for
  Satisfiability Modulo Theories
  Aina Niemetz, Mathias Preiner, Armin Biere
==============================================================================

1 - Getting Started
------------------------------------------------------------------------------

+ Download the archive at
    http://fmv.jku.at/cav16/cav16-materials.7z

+ Unpack with
    7z x /path/to/cav16-materials.7z
  -> yields /path/to/CAV16-NiemetzPreinerBiere-artifact

+ Unpack further with
    7z x /path/to/CAV16-NiemetzPreinerBiere-artifact/cav16-benchmarks.7z
    7z x /path/to/CAV16-NiemetzPreinerBiere-artifact/runs.7z
    7z x /path/to/CAV16-NiemetzPreinerBiere-artifact/solver.7z

+ Build the solvers according to the instructions in
    /path/to/CAV16-NiemetzPreinerBiere-artifact/solvers/boolector-paig/README
    /path/to/CAV16-NiemetzPreinerBiere-artifact/solvers/boolector-pw/README

Note that the provided benchmark set occupies ~14GB when unpacked.

Further note that all our experiments were run with a memory limit of 7GB,
and script 'run-experiments.sh' has been set up to use this limit. To
reproduce our results exactly, we therefore recommend adjusting the memory
limit of the VM to 8GB. If you use a lower memory limit, we advise you to
also adjust the memory limit ($spacelimit) in 'run-experiments.sh'. Note
that some results (in particular of configuration Bb) might differ as a
consequence.

------------------------------------------------------------------------------
2 - Results
------------------------------------------------------------------------------

Our experimental evaluation defines 4 physical and 2 virtual configurations.
All physical configurations are implemented in our SMT solver Boolector,
albeit in three different versions (see Section 3).

Physical configurations:
+ Pw    - the word-level approach presented in this paper
+ Paig  - the bit-level approach presented in this paper
+ Bb    - the default bit-blasting approach of Boolector
+ Bprop - the approach the work in this paper is based on, presented in [1]

Virtual configurations:
+ Bb+Pw    - combining Bb and Pw in a sequential portfolio setting
+ Bb+Bprop - combining Bb and Bprop in a sequential portfolio setting

In directory 'runs' we provide the log files of all runs on all benchmarks
used for this paper (see Section 3).

For the artifact evaluation we provide scripts to reproduce (and compare)
results on randomly selected (small) benchmark sets for configurations Pw,
Paig, Bb and Bprop. These results are automatically compared to the results
on those benchmarks used for this paper.

All runs use a time limit of 10 seconds. This time limit can be adjusted
manually in script 'run-experiments.sh'. Note that in the paper, we used
the following time limits:
+ Pw, Paig, Bprop: 10 and 1 seconds
+ Bb: 1200 seconds

To reproduce the results of the virtual configurations, as well as
additional results not reflected in Table 2 (in the paper) but mentioned in
the text, you can use the provided log files in 'runs' and
'compare-smt-runs.py'.

For a more detailed description of how to reproduce our results see
Section 4 below.

------------------------------------------------------------------------------
3 - Materials Provided
------------------------------------------------------------------------------

+ cav16-benchmarks
  ------------------------------------
  Contains all 16436 benchmarks (by family) used in our experimental
  evaluation. All benchmarks are from the QF_BV division of SMT-LIB
  (www.smt-lib.org).
  Note that our benchmark set contains a family 'Sage2_fixed'. The
  benchmarks of this family originally belong to the family 'Sage2' and
  contain an invalid 'set-info' command.
  This command has been fixed accordingly for the benchmarks in
  'Sage2_fixed'.

+ benchmarks-cav16-all
  ----------------------------------------
  A file listing all benchmarks in 'cav16-benchmarks'.

+ solvers
  ---------------------------
  Contains three solvers, all of them Boolector, but in different versions:
  * boolector-pw         (configuration Pw)
  * boolector-paig       (configuration Paig)
  * boolector-difts-2015 (configurations Bb and Bprop as in [1])

+ runs
  ------------------------
  Contains all log files of our experimental evaluation used for this
  paper.

  + configuration Bb:
    -----------------
    * bb : default bit-blasting engine, 1200 seconds time limit

  + configuration Bprop:
    --------------------
    * bprop : default configuration of [1], 10 seconds time limit

  + configuration Paig:
    -------------------
    * paig                : bit-level approach in default configuration,
                            i.e., no restarts, random selection of roots
                            for path propagation, 10 seconds time limit
    * paig+restarts       : as paig above, but with restarts enabled
    * paig+bandit         : as paig, but using the heuristic of [1] to
                            select roots for path propagation
    * paig+restarts+bandit: as paig, but with restarts and root selection
                            heuristic enabled

  + configuration Pw:
    -----------------
    * pw                : word-level approach in default configuration,
                          i.e., no restarts, random selection of roots for
                          path propagation, 10 seconds time limit
    * pw+restarts       : as pw, but with restarts enabled
    * pw+bandit         : as pw, but using the heuristic of [1] to select
                          roots for path propagation
    * pw+restarts+bandit: as pw, but with restarts and root selection
                          heuristic enabled
    * pw+seed=xxx       : as pw, but using xxx as the seed for Boolector's
                          random number generator (default seed: 0).
                          Note that given a seed, Boolector's RNG delivers
                          reproducible sequences (hence all results for a
                          certain seed are deterministic and comparable).
    * pw+inv=xxx        : as pw, but using xxx as the probability (in %)
                          with which to choose inverse values over
                          consistent values (default: 99)

------------------------------------------------------------------------------
4 - Running the Experiments
------------------------------------------------------------------------------

All our experiments were conducted on a cluster of 30 nodes of 2.83GHz
Intel Core 2 Quad machines with 8GB of memory using Ubuntu 14.04.3 LTS.
As a memory limit, we used 7GB.

Important Notice:
-----------------
Boolector reports an error on 4 benchmarks of the spear family:

  cav16-benchmarks/QF_BV/spear/
    wget_v1.10.2/src_wget_vc18196.smt2
    wget_v1.10.2/src_wget_vc18197.smt2
    wget_v1.10.2/src_wget_vc18755.smt2
    wget_v1.10.2/src_wget_vc18756.smt2

  "boolector: 'unsat' but status of benchmark in
   '/data/starexec/2015-06-01/QF_BV/spear/wget_v1.10.2/src_wget_vc18196.smt2'
   is 'sat"

This is due to the fact that up until SMT-LIB v2.5, division by zero was
defined to be handled by means of uninterpreted functions. Boolector (and
several other state-of-the-art SMT solvers), however, handles division by
zero by returning the greatest possible value, which as of v2.5 is now the
standard way of handling it. The status of these 4 benchmarks has not yet
been updated accordingly in SMT-LIB, which is also the reason why all
benchmarks with division by zero were excluded from previous SMT
competitions. Hence, these are not actual errors; the result Boolector
reports is correct and has been treated as such in this paper.
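For illustration, the v2.5 semantics described above can be sketched as
follows. This is our own minimal model, not code from the artifact; the
helper name 'bvudiv' merely mirrors the SMT-LIB operator it models:

```python
# Minimal sketch (our own, not part of the artifact) of the SMT-LIB >= 2.5
# semantics for unsigned bit-vector division: dividing by zero yields the
# greatest representable value (all bits set), matching the behavior of
# Boolector described above.
def bvudiv(x, y, width):
    if y == 0:
        return (1 << width) - 1  # all-ones, e.g. 255 for width 8
    return x // y

print(bvudiv(10, 0, 8))  # -> 255, i.e., defined, not an error
print(bvudiv(10, 3, 8))  # -> 3
```

Under these semantics the 4 spear benchmarks above are indeed 'unsat', as
Boolector reports.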
----------------------------------------------------------------------------
1) Reproducing results on certain benchmarks using run-experiments.sh
----------------------------------------------------------------------------

We provide a script to run configurations Pw, Paig, Bb, and Bprop on a set
of benchmarks, either
  * given via a benchmark file (listing the paths to the benchmarks
    selected from directory 'cav16-benchmarks'),
  * or automatically randomly selected (max. 2 per family).

We recommend the latter option, which selects 53 benchmarks in total and
finishes in a reasonable time. You can adjust the number of benchmarks to
be selected in the script provided.

The script runs all 4 configurations Pw, Paig, Bb and Bprop on the
selected benchmarks and compares the results to the results on those
benchmarks used for this paper.

Note that the benchmark selection of a run is stored in
'benchmarks-cav16-selection-$pid' in order to be able to reproduce runs on
a certain selection.

----------------------------------------------------------------------------
2) Reproducing the results of configurations Pw, Paig, Bb, Bprop as listed
   in Table 2 using compare-smt-runs.py
----------------------------------------------------------------------------

You can reproduce the overall results on all benchmarks (as listed in
Table 2) from the provided log files using the compare script provided,
e.g.,

  ./compare-smt-runs.py -g --no-color runs/pw runs/paig

This further lists results by benchmark family (as listed in Table 3).

Note that running a configuration on all benchmarks with a time limit of
10 seconds may take well over 24 hours on a single machine. Hence, we do
not recommend reproducing runs on all benchmarks by means of script
'run-experiments.sh' and benchmark file 'benchmarks-cav16-all'; use the
option above instead.
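To illustrate the "max. 2 per family" selection described in 1) above,
here is a small sketch of how such a reproducible random selection could
work. This is our own illustration, not the actual logic of
'run-experiments.sh'; it assumes benchmark paths of the form
'cav16-benchmarks/QF_BV/<family>/.../<name>.smt2':

```python
import random
from collections import defaultdict

# Illustrative sketch only (NOT the code of 'run-experiments.sh'):
# randomly pick at most `per_family` benchmarks per family; a fixed seed
# makes the selection reproducible, mirroring how the script stores its
# selection in 'benchmarks-cav16-selection-$pid' for later reruns.
def select_benchmarks(paths, per_family=2, seed=0):
    rng = random.Random(seed)          # fixed seed -> reproducible picks
    by_family = defaultdict(list)
    for path in paths:
        family = path.split('/')[2]    # third path component = family
        by_family[family].append(path)
    selection = []
    for family in sorted(by_family):
        group = by_family[family]
        selection.extend(rng.sample(group, min(per_family, len(group))))
    return selection
```

Calling the function twice with the same seed yields the same selection,
which is the property that makes reruns on a stored selection comparable.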
----------------------------------------------------------------------------
3) Reproducing results of the virtual configurations using
   compare-smt-runs.py
----------------------------------------------------------------------------

The virtual configurations are reproduced from the provided log files as
follows:

+ configuration Bb+Pw with a time limit of 10s for Pw
    ./compare-smt-runs.py -g -vbp --no-color -to 1200,10 runs/bb runs/pw

+ configuration Bb+Pw with a time limit of 1s for Pw
    ./compare-smt-runs.py -g -vbp --no-color -to 1200,1 runs/bb runs/pw

+ configuration Bb+Bprop with a time limit of 10s for Bprop
    ./compare-smt-runs.py -g -vbp --no-color -to 1200,10 runs/bb runs/bprop

+ configuration Bb+Bprop with a time limit of 1s for Bprop
    ./compare-smt-runs.py -g -vbp --no-color -to 1200,1 runs/bb runs/bprop

----------------------------------------------------------------------------
4) Reproducing any other results using compare-smt-runs.py
----------------------------------------------------------------------------

You can reproduce any other results of the paper using the compare script
and the provided log files. E.g., in order to compare the results of
configuration Pw with and without restarts enabled, do

  ./compare-smt-runs.py -g runs/pw runs/pw+restarts

If you want to compare configurations with a time limit different from the
limit used for the runs, e.g., 1s instead of 10s, do

  ./compare-smt-runs.py -g -to 1 runs/pw runs/paig

------------------------------------------------------------------------------
R - References
------------------------------------------------------------------------------

[1] Niemetz, A., Preiner, M., Biere, A., Fröhlich, A.: Improving Local
    Search for Bit-Vector Logics in SMT with Path Propagation. In: Proc.
    DIFTS'15, pp. 1-10 (2015)