****************************************************************************** * VoteSMT * ****************************************************************************** This is a release of VoteSMT which is in essence a majority voting framework for SMT solvers. Fuzz testing techniques are used to find defects and discrepancies between SMT solvers. See www.smtlib.org for more details about SMT in general. For more info about fuzzing, delta-debugging and majority voting in the context of SMT, see our paper: 'Robert Brummayer and Armin Biere. Fuzzing and Delta-Debugging SMT Solvers', presented at SMT'09. VoteSMT is released under GPL. See COPYING for more details about the license. VoteSMT is implemented by Bash scripts and uses my SMT fuzzer FuzzSMT. My short introduction assumes that you work on a UNIX/LINUX-like system with access to tools such as flock, bc and printf. VoteSMT needs my SMT fuzzer FuzzSMT. It can be downloaded from http://fmv.jku.at/fuzzsmt and is available under GPL as well. If you have not done it already, install the fuzzer, and make sure that the wrapper shell script 'fuzzsmt' that comes with FuzzSMT can be found in the PATH. Then, adapt the wrapper shell script in such a way that it uses an absolute path. For instance, if you have extracted FuzzSMT 0.1 into your home directory, then adapt the fuzzsmt script as follows: #/bin/sh java -jar ~/fuzzsmt-0.1/fuzzsmt.jar $* If you do not perform this step, then there may be a problem that the jar file of fuzzsmt cannot be found. Of course, other solutions are possible, but this one is straightforward. Furthermore, VoteSMT needs the 'run' tool for handling time and memory consumption. The 'run' tool comes with a BSD style license and can be downloaded from http://fmv.jku.at/run. Again, make sure that 'run' can be found in the PATH. Note that the time limit feature of the current version 1.1 may not work correctly for running threaded applications. VoteSMT assumes that the SMT solvers that should be tested are in the sub-directory 'solvers'. I provide three dummy solver shell scripts: dummysolver1 to dummysolver3. Their purpose is just that you can start and play with the system to figure out how it works. VoteSMT consists of two shell scripts: 'votesmtsp' and 'votesmtmp'. The shell script 'votesmtsp' does the whole work as one single process. The other script 'votesmtmp' does the following. It tries to find out how many processors/cores are available (alternatively, it can specified with the option '-p'). Then, it runs one instance of 'votesmtsp' on each processor/core. Moreover, it repeatedly prints out statistics after a few seconds. Note that 'votesmtmp' has not been designed to be run concurrently itself. The statistics won't work anymore if you run multiple instances of 'votesmtmp' concurrently. Finally, one note: the scripts should not be run from other directories, run them directly from the VoteSMT working directory. The script 'votesmtmp' does signal handling. Therefore, if you terminate it, it automatically terminates all generated instances of 'votesmtsp'. Moreover, as there may be buggy solvers that do signal handling and are still running afterwards, it calls 'killall' on the solvers that have been used by VoteSMT. If first sends SIGTERM and then SIGKILL to each solver. However, be careful if you have other solver instances already running on your machine. As 'killall' kills processes by names, solver instances that have been started before calling VoteSMT are terminated as well. The usage of 'killall' can be disabled with '-k'. VoteSMT assumes that for each logic you want to test, there is a file with the same name as the logic. The file contains the names of the solvers (without path) you want to test for this theory. As an example, you can find the file QF_NIA which lists the dummy solvers line by line. If you want to comment solvers out, simply put a '#' at the beginning of the line. VoteSMT assumes that the solvers read input from stdin. If this is not the case, you have to write a small wrapper script. OK, now it is time to start. Call the single process script as follows: ./votesmtsp QF_NIA The script repeatedly calls FuzzSMT with the logic QF_NIA in order to generate random SMT formulas for this logic. The formulas are used to test the solvers, i.e. the whole set of solvers is consecutively run on each formula. Let 'votesmtsp' run for a few seconds. The first two dummy solvers are identical. They just compute the result depending on the number of lines of the formula. The third dummy solver reports the opposite result. Depending on the number of lines, all three dummy solvers may also report 'unknown', or simulate a crash by terminating without a result. After waiting a few seconds, terminate the process. A sub-directory 'errors' should now have been created by VoteSMT. This directory contains all failure-inducing inputs for each theory and for each solver. Go into the directory 'errors/QF_NIA' to find sub-directories for each solver, and one sub-directory 'unresolved'. I will discuss the principle of unresolvedness later. The solver sub-directories contain failure-inducing formulas that caused errors, and formulas on which the majority of the solvers reported the opposite result. In addition to the failure-inducing SMT files 'error*.smt', there are the corresponding outputs of the solver to stdout (error*.out) and stderr (error*.err). Moreover, the files error*.txt contain the respective status reported by the 'run' tool and a short error description. The SMT formulas for which the majority reported the opposite result are named incorrect*.smt. The corresponding files incorrect*.txt contain the results of all solvers and the majority in percents. Note that both scripts, 'votesmtsp' and 'votesmtmp', create directories on demand as needed. The single process script 'votesmtsp' does not delete any files from previous runs. However, depending on the selected theory and the solvers that should be tested, 'votesmtmp' deletes the files from previous runs. So if you want to save them, make sure to do a backup before you run 'votesmtmp' again. Now, now it's time to try out 'votesmtmp': ./votesmtmp QF_NIA It executes 'votesmtsp' on each processor/core and collects statistics. You can also specify command line options for 'votesmtsp'. The script 'votesmtmp' simply passes them to each instance. The statistics are just an approximation of the current state, i.e. you have "at least" found these defects. They are not fully synchronized to avoid a slowdown, i.e. while the first lines of on report are printed the fuzzing instances might have found new failure-inducing formulas that are directly shown by the consecutive lines. Moreover, per default, each 'votesmtsp' instance updates the statistics after having tested 10 formulas in order to avoid too much locking overhead. Be careful if you run 'votesmtmp'. Some SMT solvers use wrapper scripts that read/write temporary files for input/output conversion. Typically, these run scripts are not designed for parallel usage. However, 'votesmtmp' runs them in parallel which may lead to unexpected results. As an example consider the following run script: #!/bin/bash tmpfile=/tmp/solverx.tmp # call solver to read from stdin and write result into temporary file solverx /dev/stdin > $tmpfile # analyze temporary file if [[ -n `grep SATISFIABLE $tmpfile` ]]; then echo sat elif ... fi Solver X uses its own output format. In order to convert the output into the standard format, it writes the output into a temporary file and analyzes the result afterwards. Depending on the result it then writes 'sat' or 'unsat'. This run script is not designed for parallel usage as the name of the temporary file is fixed. If multiple instances of this scripts are run in parallel, unexpected results may occur as the instances use the same temporary file with unsynchronized access. Fortunately, there is an easy way to fix this problem: encode the process ID in the file name of the temporary file: #!/bin/bash tmpfile=/tmp/solverx$$.tmp ... The shell variable $$ holds the process ID. As the ID is part of the file name, multiple instances of the script can be run in parallel. As each process has its own unique process ID, each process uses its own temporary file. So, if you want to test solvers that use run scripts that are not designed for parallel usage, fix them first or 'votesmtmp' may not work correctly. Now I want to give you an introduction to the most important options. For an overview of the options call the scripts with '-h'. As VoteSMT uses the 'run' tool, we can use time and memory limits for each solver. This is particularly useful if you test solvers that do not always terminate. You can use the options '-t ' and 's ' to set a time-limit in seconds and a memory limit in MB. Per default, if a solver exceeds a limit, it is treated as an error. However, you can use the options '-i' and '-a' to ignore these cases. Moreover, per default, VoteSMT considers the answer 'unknown' as an error. To change this behavior, use '-u'. The option '-o' can be used to pass arguments to FuzzSMT. For example, if you consider the case that the solvers take quite long to solve the random formulas, you should consider making them easier. It is obvious that if you can only test 10 formulas in one hour, the throughput is too low and fuzzing won't be effective. One way to solve this problem is to tell FuzzSMT that it should use, for instance, fewer variables or fewer reads. For example, if you are fuzzing formulas for QF_AUFBV, call 'fuzzsmt -h' to find out its default settings for this logic. Then, for example, use the VoteSMT option '-o "-mv 1 -Mv 2 -mr 1 -Mr 2"'. Note that this must be an option string. Otherwise, '-v' and '-r' would be interpreted as options for VoteSMT. The provided options tell FuzzSMT to generate SMT benchmarks with a minimum of 1 and a maximum of 2 variables and a minimum of 1 and a maximum of 2 reads. VoteSMT will pass the options to the fuzzer and the formulas will become easier which in turn will increase the throughput. Now to the principle of unresolvedness. The majority voting principle in its basic form is rather problematic as, for instance, a majority of 51% is quite unconvincing. Therefore, I introduced a concept of 'unresolved discrepancies'. The majority must reach a minimum threshold in order to classify the minority as incorrect. It this threshold (default is 66%) is not reached, VoteSMT treats this case as unresolved, i.e. there is not enough evidence in order to decide which solvers are correct and which are incorrect. For each logic, these unresolved discrepancies can be found in the sub-directory 'unresolved'. The threshold can be set via the option '-m'. Its value must be greater than 50 and less than 100. In principle, the higher this threshold is set, the more confidence you can have in the result. Finally, note that you can use my delta-debugger DeltaSMT in order to automatically minimize failure-inducing inputs. DeltaSMT is available under GPL at http://fmv.jku.at/deltasmt If you have comments, questions, improvements or bug reports, then just send them to my mail address: robert.brummayer@gmail.com I will try to answer your mails as soon as possible. Robert