Construction of Verifier Combinations Based on Off-the-Shelf Verifiers

This webpage presents the result tables for experiments conducted for the article Construction of Verifier Combinations Based on Off-the-Shelf Verifiers by Dirk Beyer, Sudeep Kanav, and Cedric Richter to appear in FASE 2022.

Compositions used for comparative evaluation

We considered the following three kinds of compositions:

Sequential portfolio: In this composition, tools are executed one after another. This kind of composition restricts the amount of time given to each tool to run, and result of the first tool in the sequence that produces the result is taken. This means, that this composition is expected to solve a lot of tasks that can be solved fast, but perform worse on the tasks that require more time.
Parallel portfolio: In this composition, tools are executed in parallel. This kind of composition restricts the amount of memory given to each tool to run, and result of the tool that produces result fastest is taken. This means, that this composition is expected to solve a lot of tasks that can be solved fast and require less memory, but perform worse on the tasks that require more memory. Also, if a tool produces an incorrect result faster; that incorrect result will be chosen.
Algorithm selection: In this composition, a selector is used to select the verifier to be executed. We have used a machine learning based selector. Since the selector trains a model harshly punishing the tools that produce incorrect results, this composition is expected to produce the least amount of incorrect results, although it might also produce lesser number of correct results in comparison to other two compositions. We expect this composition to dominate in the quadrant of verification tasks that requires more memory and time.

Tool selection

The following tools were selected for composition:

cpa-seq
esbmc-kind
symbiotic
uautomizer
cbmc
divine
goblint
utaipan

We created compositions using different number (2, 3, 4, and 8) of tools from the above list of verifiers. A composition of n tools means that we have used first n tools from the above list. This gave us a comparison of a composition of 1 tool (i.e., the best performing tool), 2, 3, and 4. And then we decided not to run combinations of all sizes, so we executed it for 8 (next power of 2) to give some insight in the performance trend.

Resource restrictions

For a fair comparison, all compositions were executed on the same infratructure as sv-comp, and with the same resource restrictions: 15 minutes of CPU time, and 15 GB of memory, and 8 CPU cores.

Result tables

Following is the list of evaluations: