Construction of Verifier Combinations From Off-the-Shelf Components

This webpage presents the result tables for experiments conducted for the article Construction of Verifier Combinations From Off-the-Shelf Components by Dirk Beyer, Sudeep Kanav, Tobias Kleinert, and Cedric Richter to be submitted to FASE 2022 special issue in the journal Formal Methods in System Design.

A previous version of this paper was published in the FASE 2022 proceedings. Supplementary webpage for the FASE 2022 version of the paper has been moved here.

Compositions used for comparative evaluation

We considered the following three kinds of compositions:

  1. Sequential portfolio: In this composition, tools are executed one after another. This kind of composition restricts the amount of time given to each tool to run, and result of the first tool in the sequence that produces the result is taken. This means, that this composition is expected to solve a lot of tasks that can be solved fast, but perform worse on the tasks that require more time.
  2. Parallel portfolio: In this composition, tools are executed in parallel. This kind of composition restricts the amount of memory given to each tool to run, and result of the tool that produces result fastest is taken. This means, that this composition is expected to solve a lot of tasks that can be solved fast and require less memory, but perform worse on the tasks that require more memory. Also, if a tool produces an incorrect result faster; that incorrect result will be chosen.
  3. Algorithm selection: In this composition, a selector is used to select the verifier to be executed. We have used a machine learning based selector. Since the selector trains a model harshly punishing the tools that produce incorrect results, this composition is expected to produce the least amount of incorrect results, although it might also produce lesser number of correct results in comparison to other two compositions. We expect this composition to dominate in the quadrant of verification tasks that requires more memory and time.

Selection of verifier+validator combinations for the compositions

Instead of creating combinations of verifiers, we used combinations of verifier+validator. Each verifier+validator combination runs a verifier and then a portfolio of validators. The output is considered only if a validator is able to confirm the results of a verifier.

The verifier+validator combinations of the following verifiers were selected for composition:

  1. cpachecker
  2. esbmc-kind
  3. symbiotic
  4. uautomizer
  5. cbmc
  6. pinaka
  7. utaipan
  8. 2ls

We created compositions using different number (2, 3, 4, and 8) of tools from the above list of verifiers. A composition of n tools means that we have used first n tools from the above list. This gave us a comparison of a composition of 1 tool (i.e., the best performing tool), 2, 3, and 4. And then we decided not to run combinations of all sizes, so we executed it for 8 (next power of 2) to give some insight in the performance trend.

Resource restrictions

For a fair comparison, all compositions were executed on the same infratructure as sv-comp, and with the same resource restrictions: 15 minutes of CPU time, and 15 GB of memory, and 8 CPU cores.

Result tables

Following is the list of evaluations:

  1. Comparison of verifier+validator combinations
  2. Comparison of tools in sequential composition
  3. Comparison of tools in portfolio composition
  4. Comparison of tools using machine learning based tool selection