README ============================================================================== Formal Methods in System Design - FMSD Special Issue on Satisfiability Modulo Theories, 2016 Propagation Based Local Search for Bit-Precise Reasoning Aina Niemetz, Mathias Preiner, Armin Biere ============================================================================== ------------------------------------------------------------------------------ 1 - Materials provided ------------------------------------------------------------------------------ + boolector ------------------------ the solver binary used in the evaluation + fmsd16-benchmarks-sat ------------------------ Contains all 16436 benchmarks (by family) used in our experimental evaluation. All benchmarks are from the QF_BV division of the SMT-LIB (www.smt-lib.org, status: 2015-06-01). Note that our benchmark set contains a family 'Sage2-fixed'. The benchmarks of this family originally belong to the family 'Sage2' and contain an invalid 'set-info' command. This command has been fixed accordingly for the benchmarks in 'Sage2_fixed'. These benchmarks have since been removed from the SMT-LIB. + fmsd16-benchmarks-unsat ------------------------ Contains all 21173 benchmarks (by family) with status unsat from the QF_BV division of the SMT-LIB. + compare-smt-runs.py ------------------------ A python script for comparing runs. + scatter_plot_legend_top.r box_plot_time.r box_plot_solved.r ------------------------ R scripts for generating the plots provided in the paper. * plot ------------------------ Contains all data files for generating the plots provided in the paper. + runs ------------------------ Contains all log files of our experimental evaluation used for this paper. All runs were performed on the sat benchmark set unless noted otherwise. + configuration Bb: ----------------- * Bb/2016-07-17-Bb : default bit-blasting engine, ~~~~~~~~~~~~~~~~~~ 1200 seconds time limit + configuration Bb+Pw: ----------------- * Bb+Pw : sequential portfolio combination of Bb and Pw using ~~~~~~~ a given number of propagation steps as limit for Pw, 1200 seconds time limit + configuration Pw: -------------------- * Pw/Pw : default propagation based configuration, ~~~~~~~ 10 seconds time limit * Pw/Pw-path-selection : ~~~~~~~~~~~~~~~~~~~~~~ - Pw/Pw-path-selection/Pw-path-selection-controlling-only: prioritize controlling inputs of Boolean operators only, 10 seconds time limit - Pw/Pw-path-selection/Pw-path-selection-random: do not prioritize inputs but choose randomly, 10 seconds time limit * Pw/Pw-value-selection : ~~~~~~~~~~~~~~~~~~~~~~ - Pw/Pw-value-selection/Pw-inv=X%: choose inverse over consistent values with a probability of X%, 10 seconds time limit + configuration Paig: ------------------- * Paig : default bit-level propagation based appraoch, ~~~~~~~ 10 seconds time limit + configuration Bsls: ------------------- * Bsls : the SLS engine with random walks enabled, ~~~~~~~ 10 seconds time limit Note that the set of benchmarks provided occupies ~14GB each when unpacked. ------------------------------------------------------------------------------ 2 - Experimental Setup ------------------------------------------------------------------------------ All our experiments were conducted on a cluster of 30 nodes of 2.83GHz Intel Core 2 Quad machines with 8GB of memory using Ubuntu 14.04.3 LTS. As a memory limit, we used 7GB. ---------------------------------------------------------------------------- 1) Important Notice: ---------------------------------------------------------------------------- Boolector reports an error on 4 benchmarks of the spear family: cav16-benchmarks/QF_BV/spear/ wget_v1.10.2/src_wget_vc18196.smt2 wget_v1.10.2/src_wget_vc18197.smt2 wget_v1.10.2/src_wget_vc18755.smt2 wget_v1.10.2/src_wget_vc18756.smt2 "boolector: 'unsat' but status of benchmark in '/data/starexec/2015-06-01/QF_BV/spear/wget_v1.10.2/src_wget_vc18196.smt2' is 'sat" This is due to the fact that up until SMT-LIB v2.5, division by zero was defined as being handled by means of uninterpreted functions. However, Boolector (and several other state-of-the-art SMT solvers) handles division by zero by returning the greatest possible value, which as of v2.5 is now the standard way of handling it. The status of these 4 benchmarks has not yet been updated accordingly in SMT-LIB, which is also the reason why all benchmarks with division by zero have been excluded in previous SMT competitions. Hence, these are not actual errors, the result Boolector reports is correct and has been treated as such in this paper. ---------------------------------------------------------------------------- 2) Reproducing the overall results of configurations Pw, Paig, Bb, Bb+Pw and Bsls, as listed in Table 6 and 7 using 'compare-smt-runs.py' ---------------------------------------------------------------------------- You can reproduce the overall results on all benchmarks (as listed in Table 6 and 7) by means of the provided log files using the compare script provided, e.g., ./compare-smt-runs.py -g --no-color -to 10 runs/Pw/Pw/2016-06-29-Pw runs/Paig/2016-06-29-Paig ./compare-smt-runs.py -g --no-color -to 1200 runs/Bb/2016-07-17-Bb runs/Bb+Pw/2016-07-17-Bb+Pw-10k ---------------------------------------------------------------------------- 3) Reproducing results of the virtual configurations using compare-smt-runs.py ---------------------------------------------------------------------------- Virtual configurations are reproduced by means of the provided log files as follows: + configuration Bb+Pw-virtual-1s with a time limit of 1s for Pw ./compare-smt-runs.py -g -vbp --no-color -to 1200,1 runs/Bb/2016-07-17-Bb runs/Pw/Pw/2016-06-29-Pw ---------------------------------------------------------------------------- 4) Reproducing the plots provided in the paper with the R scripts provided (Note: you need R in order to be able to reproduce the plots.) ---------------------------------------------------------------------------- + Bsls vs Pw (Figure 7): ---------------------- Rscript scatter_plot_legend_top.r plot/Bsls-vs-Pw.data Bsls Pw + Paig vs Pw (Figure 8): ---------------------- Rscript scatter_plot_legend_top.r plot/Paig-vs-Pw.data Paig Pw + Bb vs Pw (Figure 9): ---------------------- Rscript scatter_plot_legend_top.r plot/Bb-vs-Pw.data Bb Pw + Bb vs Bb+Pw-virtual-1s (Figure 10): ---------------------------------- Rscript scatter_plot_legend_top.r plot/Bb-vs-Bb+Pw-virtual-1s.data Bb Bb+Pw-virtual-1s + Bb vs Bb+Pw-1k (Figure 11): ---------------------------------- Rscript scatter_plot_legend_top.r plot/Bb-vs-Bb+Pw-1k.data Bb Bb+Pw-1k + Bb vs Bb+Pw-10k (Figure 11): ---------------------------------- Rscript scatter_plot_legend_top.r plot/Bb-vs-Bb+Pw-10k.data Bb Bb+Pw-10k + Bb vs Bb+Pw-50k (Figure 11): ---------------------------------- Rscript scatter_plot_legend_top.r plot/Bb-vs-Bb+Pw-50k.data Bb Bb+Pw-50k + Bb vs Bb+Pw-100k (Figure 11): ---------------------------------- Rscript scatter_plot_legend_top.r plot/Bb-vs-Bb+Pw-100k.data Bb Bb+Pw-100k + Randomization effects, Figure 12: ---------------------------------- Rscript box_plot_solved.r plot/Pw-random.data Rscript box_plot_time.r plot/Pw-random.data Rscript box_plot_solved.r plot/Paig-random.data Rscript box_plot_time.r plot/Paig-random.data Rscript box_plot_solved.r plot/Bsls-random.data Rscript box_plot_time.r plot/Bsls-random.data + Randomization effects, Figure 13: ---------------------------------- Rscript box_plot_solved.r plot/Pw-random.data Rscript box_plot_solved.r plot/Pw-inv=0%-random.data Rscript box_plot_solved.r plot/Pw-inv=50%-random.data Rscript box_plot_solved.r plot/Pw-inv=100%-random.data