README

==============================================================================
  CAV 2016 - Artifact Evaluation
  Precise and Complete Propagation Based Local Search for
  Satisfiability Modulo Theories
  Aina Niemetz, Mathias Preiner, Armin Biere
==============================================================================

1 - Getting Started
------------------------------------------------------------------------------

+ Download the archive at
    http://fmv.jku.at/cav16/cav16-materials.7z

+ Unpack with
    7z x /path/to/cav16-materials.7z
  -> yields /path/to/CAV16-NiemetzPreinerBiere-artifact

+ Unpack further with
    7z x /path/to/CAV16-NiemetzPreinerBiere-artifact/cav16-benchmarks.7z
    7z x /path/to/CAV16-NiemetzPreinerBiere-artifact/runs.7z
    7z x /path/to/CAV16-NiemetzPreinerBiere-artifact/solver.7z

+ Build the solvers according to the instructions in
    /path/to/CAV16-NiemetzPreinerBiere-artifact/solvers/boolector-paig/README
    /path/to/CAV16-NiemetzPreinerBiere-artifact/solvers/boolector-pw/README

Note that the provided benchmark set occupies ~14GB when unpacked.

Further note that all our experiments were run with a memory limit of 7GB,
and script 'run-experiments.sh' has been set up to use this limit. To
reproduce our results exactly, we therefore recommend adjusting the memory
limit of the VM to 8GB. If you use a lower memory limit, we advise you to
also adjust the memory limit ($spacelimit) in 'run-experiments.sh'. Note
that some results (in particular of configuration Bb) might differ as a
consequence.

------------------------------------------------------------------------------
2 - Results
------------------------------------------------------------------------------

Our experimental evaluation defines 4 physical and 2 virtual configurations.
All physical configurations are implemented in our SMT solver Boolector,
albeit in three different versions (see Section 3).

Physical configurations:
+ Pw    - the word-level approach presented in this paper
+ Paig  - the bit-level approach presented in this paper
+ Bb    - the default bit-blasting approach of Boolector
+ Bprop - the approach the work in this paper is based on, presented in [1]

Virtual configurations:
+ Bb+Pw    - combining Bb and Pw in a sequential portfolio setting
+ Bb+Bprop - combining Bb and Bprop in a sequential portfolio setting

In directory 'runs' we provide the log files of all runs on all benchmarks
used for this paper (see Section 3).

For the artifact evaluation we provide scripts to reproduce (and compare)
results on randomly selected (small) benchmark sets for configurations Pw,
Paig, Bb and Bprop. These results are automatically compared to the results
on those benchmarks used for this paper.

All runs use a time limit of 10 seconds. This time limit can be adjusted
manually in script 'run-experiments.sh'. Note that in the paper, we used
the following time limits:
+ Pw, Paig, Bprop: 10 and 1 seconds
+ Bb: 1200 seconds

To reproduce the results of the virtual configurations, as well as
additional results not reflected in Table 2 (in the paper) but mentioned in
the text, you can use the provided log files in 'runs' and
'compare-smt-runs.py'.

For a more detailed description of how to reproduce our results see
Section 4 below.

------------------------------------------------------------------------------
3 - Materials Provided
------------------------------------------------------------------------------

+ cav16-benchmarks
  ------------------------------------
  Contains all 16436 benchmarks (by family) used in our experimental
  evaluation. All benchmarks are from the QF_BV division of SMT-LIB
  (www.smt-lib.org).
  Note that our benchmark set contains a family 'Sage2_fixed'. The
  benchmarks of this family originally belong to the family 'Sage2' and
  contain an invalid 'set-info' command.
  This command has been fixed accordingly for the benchmarks in
  'Sage2_fixed'.

+ benchmarks-cav16-all
  ----------------------------------------
  A file listing all benchmarks in 'cav16-benchmarks'.

+ solvers
  ---------------------------
  Contains three solvers, all of them Boolector, but in different versions:
  * boolector-pw         (configuration Pw)
  * boolector-paig       (configuration Paig)
  * boolector-difts-2015 (configurations Bb and Bprop as in [1])

+ runs
  ------------------------
  Contains all log files of our experimental evaluation used for this
  paper.

  + configuration Bb:
    -----------------
    * bb : default bit-blasting engine, 1200 seconds time limit

  + configuration Bprop:
    --------------------
    * bprop : default configuration of [1], 10 seconds time limit

  + configuration Paig:
    -------------------
    * paig                : bit-level approach in default configuration,
                            i.e., no restarts, random selection of roots
                            for path propagation, 10 seconds time limit
    * paig+restarts       : as paig above, but with restarts enabled
    * paig+bandit         : as paig, but using the heuristic of [1] to
                            select roots for path propagation
    * paig+restarts+bandit: as paig, but with restarts and root selection
                            heuristic enabled

  + configuration Pw:
    -----------------
    * pw                : word-level approach in default configuration,
                          i.e., no restarts, random selection of roots for
                          path propagation, 10 seconds time limit
    * pw+restarts       : as pw, but with restarts enabled
    * pw+bandit         : as pw, but using the heuristic of [1] to select
                          roots for path propagation
    * pw+restarts+bandit: as pw, but with restarts and root selection
                          heuristic enabled
    * pw+seed=xxx       : as pw, but using xxx as the seed for Boolector's
                          random number generator (default seed: 0).
                          Note that given a seed, Boolector's RNG delivers
                          reproducible sequences (hence all results for a
                          certain seed are deterministic and comparable).
    * pw+inv=xxx        : as pw, but using xxx as the probability (in %)
                          with which to choose inverse values over
                          consistent values (default: 99)

------------------------------------------------------------------------------
4 - Running the Experiments
------------------------------------------------------------------------------

All our experiments were conducted on a cluster of 30 nodes of 2.83GHz
Intel Core 2 Quad machines with 8GB of memory using Ubuntu 14.04.3 LTS.
As a memory limit, we used 7GB.

Important Notice:
-----------------
Boolector reports an error on 4 benchmarks of the spear family:

  cav16-benchmarks/QF_BV/spear/
    wget_v1.10.2/src_wget_vc18196.smt2
    wget_v1.10.2/src_wget_vc18197.smt2
    wget_v1.10.2/src_wget_vc18755.smt2
    wget_v1.10.2/src_wget_vc18756.smt2

  "boolector: 'unsat' but status of benchmark in
   '/data/starexec/2015-06-01/QF_BV/spear/wget_v1.10.2/src_wget_vc18196.smt2'
   is 'sat"

This is due to the fact that up until SMT-LIB v2.5, division by zero was
defined to be handled by means of uninterpreted functions. Boolector (and
several other state-of-the-art SMT solvers), however, handles division by
zero by returning the greatest possible value, which as of v2.5 is now the
standard way of handling it. The status of these 4 benchmarks has not yet
been updated accordingly in SMT-LIB, which is also the reason why all
benchmarks with division by zero were excluded from previous SMT
competitions. Hence, these are not actual errors; the result Boolector
reports is correct and has been treated as such in this paper.
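For illustration, the v2.5 semantics described above can be sketched as
follows. This is our own minimal model, not code from the artifact; the
helper name 'bvudiv' merely mirrors the SMT-LIB operator it models:

```python
# Minimal sketch (our own, not part of the artifact) of the SMT-LIB >= 2.5
# semantics for unsigned bit-vector division: dividing by zero yields the
# greatest representable value (all bits set), matching the behavior of
# Boolector described above.
def bvudiv(x, y, width):
    if y == 0:
        return (1 << width) - 1  # all-ones, e.g. 255 for width 8
    return x // y

print(bvudiv(10, 0, 8))  # -> 255, i.e., defined, not an error
print(bvudiv(10, 3, 8))  # -> 3
```

Under these semantics the 4 spear benchmarks above are indeed 'unsat', as
Boolector reports.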
----------------------------------------------------------------------------
1) Reproducing results on certain benchmarks using run-experiments.sh
----------------------------------------------------------------------------

We provide a script to run configurations Pw, Paig, Bb, and Bprop on a set
of benchmarks, either
  * given via a benchmark file (listing the paths to the benchmarks
    selected from directory 'cav16-benchmarks'),
  * or automatically randomly selected (max. 2 per family).

We recommend the latter option, which selects 53 benchmarks in total and
finishes in a reasonable time. You can adjust the number of benchmarks to
be selected in the script provided.

The script runs all 4 configurations Pw, Paig, Bb and Bprop on the
selected benchmarks and compares the results to the results on those
benchmarks used for this paper.

Note that the benchmark selection of a run is stored in
'benchmarks-cav16-selection-$pid' in order to be able to reproduce runs on
a certain selection.

----------------------------------------------------------------------------
2) Reproducing the results of configurations Pw, Paig, Bb, Bprop as listed
   in Table 2 using compare-smt-runs.py
----------------------------------------------------------------------------

You can reproduce the overall results on all benchmarks (as listed in
Table 2) from the provided log files using the compare script provided,
e.g.,

  ./compare-smt-runs.py -g --no-color runs/pw runs/paig

This further lists results by benchmark family (as listed in Table 3).

Note that running a configuration on all benchmarks with a time limit of
10 seconds may take well over 24 hours on a single machine. Hence, we do
not recommend reproducing runs on all benchmarks by means of script
'run-experiments.sh' and benchmark file 'benchmarks-cav16-all'; use the
option above instead.
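To illustrate the "max. 2 per family" selection described in 1) above,
here is a small sketch of how such a reproducible random selection could
work. This is our own illustration, not the actual logic of
'run-experiments.sh'; it assumes benchmark paths of the form
'cav16-benchmarks/QF_BV/<family>/.../<name>.smt2':

```python
import random
from collections import defaultdict

# Illustrative sketch only (NOT the code of 'run-experiments.sh'):
# randomly pick at most `per_family` benchmarks per family; a fixed seed
# makes the selection reproducible, mirroring how the script stores its
# selection in 'benchmarks-cav16-selection-$pid' for later reruns.
def select_benchmarks(paths, per_family=2, seed=0):
    rng = random.Random(seed)          # fixed seed -> reproducible picks
    by_family = defaultdict(list)
    for path in paths:
        family = path.split('/')[2]    # third path component = family
        by_family[family].append(path)
    selection = []
    for family in sorted(by_family):
        group = by_family[family]
        selection.extend(rng.sample(group, min(per_family, len(group))))
    return selection
```

Calling the function twice with the same seed yields the same selection,
which is the property that makes reruns on a stored selection comparable.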
----------------------------------------------------------------------------
3) Reproducing results of the virtual configurations using
   compare-smt-runs.py
----------------------------------------------------------------------------

The virtual configurations are reproduced from the provided log files as
follows:

+ configuration Bb+Pw with a time limit of 10s for Pw
    ./compare-smt-runs.py -g -vbp --no-color -to 1200,10 runs/bb runs/pw

+ configuration Bb+Pw with a time limit of 1s for Pw
    ./compare-smt-runs.py -g -vbp --no-color -to 1200,1 runs/bb runs/pw

+ configuration Bb+Bprop with a time limit of 10s for Bprop
    ./compare-smt-runs.py -g -vbp --no-color -to 1200,10 runs/bb runs/bprop

+ configuration Bb+Bprop with a time limit of 1s for Bprop
    ./compare-smt-runs.py -g -vbp --no-color -to 1200,1 runs/bb runs/bprop

----------------------------------------------------------------------------
4) Reproducing any other results using compare-smt-runs.py
----------------------------------------------------------------------------

You can reproduce any other results of the paper using the compare script
and the provided log files. E.g., in order to compare the results of
configuration Pw with and without restarts enabled, do

  ./compare-smt-runs.py -g runs/pw runs/pw+restarts

If you want to compare configurations with a time limit different from the
limit used for the runs, e.g., 1s instead of 10s, do

  ./compare-smt-runs.py -g -to 1 runs/pw runs/paig

------------------------------------------------------------------------------
R - References
------------------------------------------------------------------------------

[1] Niemetz, A., Preiner, M., Biere, A., Fröhlich, A.: Improving Local
    Search for Bit-Vector Logics in SMT with Path Propagation. In: Proc.
    DIFTS'15, pp. 1-10 (2015)