CoVeriTeam: On-Demand Composition of Cooperative Verification Systems

Dirk Beyer and Sudeep Kanav
LMU Munich, Germany

Abstract

CoVeriTeam is a tool for on-demand composition of cooperative verification tools. Software verification is a hard and important problem, and many approaches with different strengths exist. Thus, it is essential to combine the strengths of many verification tools via combinations and cooperation. CoVeriTeam provides a systematic way to combine existing verification tools so that no opportunities for cooperation are missed.

The concept is based on verification artifacts (programs, specifications, witnesses) as basic objects and verification actors (verifiers, validators, testers) as basic operations; composition operators make it possible to easily describe new compositions, using verification artifacts as the interface between the verification actors. CoVeriTeam consists of a language for the composition of verification actors and an interpreter for this language.
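The interplay of artifacts, actors, and composition operators can be illustrated by the following minimal Python sketch. This is a conceptual model only: all class and method names are hypothetical, the "verification" results are hard-coded for illustration, and the real CoVeriTeam expresses such compositions in its own composition language rather than in plain Python.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of CoVeriTeam's concepts.

@dataclass
class Artifact:
    kind: str      # e.g., "program", "specification", "witness", "verdict"
    content: str

class Actor:
    """A verification actor maps input artifacts to output artifacts."""
    def act(self, inputs: dict) -> dict:
        raise NotImplementedError

class Verifier(Actor):
    def act(self, inputs: dict) -> dict:
        # A real verifier would analyze the program against the specification.
        return {"verdict": Artifact("verdict", "FALSE"),
                "witness": Artifact("witness", "<violation-witness>")}

class Validator(Actor):
    def act(self, inputs: dict) -> dict:
        # A real validator would re-check the witness produced by the verifier.
        return {"verdict": Artifact("verdict", "FALSE (confirmed)")}

class Sequence(Actor):
    """Composition operator: pass the outputs of the first actor,
    together with the original inputs, on to the second actor."""
    def __init__(self, first: Actor, second: Actor):
        self.first, self.second = first, second
    def act(self, inputs: dict) -> dict:
        intermediate = self.first.act(inputs)
        return self.second.act({**inputs, **intermediate})

# A verifier whose violation witnesses are validated before being reported:
validated_verifier = Sequence(Verifier(), Validator())
result = validated_verifier.act({
    "program": Artifact("program", "int main() { ... }"),
    "spec": Artifact("specification", "G ! call(reach_error())"),
})
print(result["verdict"].content)  # FALSE (confirmed)
```

The key point of the sketch is the last three statements: a new, more powerful actor (a validated verifier) is obtained purely by composition, without modifying either component.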

As a result of viewing tools as components, we can now create powerful verification engines that are beyond the possibilities of single tools, avoiding the repeated development of certain components. We illustrate the abilities of CoVeriTeam on a few case studies. We expect that CoVeriTeam will help verification researchers to easily experiment with new tools and assist them in rapidly prototyping various combinations of verification tools.

CoVeriTeam is open source and available as a GitLab project at: https://gitlab.com/sosy-lab/software/coveriteam

Contents

This page contains the supplementary material for the article CoVeriTeam: On-Demand Composition of Cooperative Verification Systems.

Performance Evaluation Experiment

The aim of the experiment is to quantify the overhead induced by CoVeriTeam.

In order to measure the overhead accurately, we identified and eliminated the factors that might influence the measurements (confounding factors). These factors are described in the following.

Infrastructure

Our experiments were executed on machines with the following configuration: one 3.4 GHz CPU (Intel Xeon E3-1230 v5), 8 processing units, 33 GB RAM, and running the operating system Ubuntu 20.04. Each verification task was limited to 8 CPU cores, a CPU time of 15 min, and a memory usage of 15 GB.
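Resource limits of this kind are typically declared in a BenchExec benchmark definition. The following fragment is only a sketch of how the limits above could be expressed; the tool name, run-definition name, and task-set file are placeholders, not the actual files used in the experiment.

```xml
<?xml version="1.0"?>
<!-- Hypothetical benchmark definition; limits match the experiment setup. -->
<benchmark tool="coveriteam" timelimit="15 min" memlimit="15 GB" cpuCores="8">
  <rundefinition name="overhead-measurement"/>
  <tasks name="example-tasks">
    <includesfile>tasks.set</includesfile>
  </tasks>
</benchmark>
```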

Resource Measurements

Since the tool executed in our experiment takes negligible and constant time, the measured time is attributable to CoVeriTeam itself.

To measure the overhead of CoVeriTeam, we measure the full resource consumption of each execution of CoVeriTeam including the executed tool.
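Given such measurements, the overhead is the outer measurement (the full CoVeriTeam execution) minus the inner measurement (the resources consumed by the executed tool). A minimal sketch of this computation, using made-up measurement values purely for illustration:

```python
# Hypothetical measurement pairs: (outer CPU time of the whole CoVeriTeam
# execution, inner CPU time of the executed tool), both in seconds.
measurements = [
    (1.42, 0.05),
    (1.38, 0.05),
    (1.57, 0.06),
]

# The overhead of CoVeriTeam is the outer measurement minus the portion
# consumed by the executed tool.
overheads = [outer - inner for outer, inner in measurements]
print([round(o, 2) for o in overheads])  # [1.37, 1.33, 1.51]
```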

Results: Plots of the Measurements

Fig.1: Box plot for CoVeriTeam overhead for time
Fig.2: Box plot for CoVeriTeam overhead for memory
Fig.3: Quantile plots for CoVeriTeam overhead for time
Fig.4: Quantile plots for CoVeriTeam overhead for memory

Results: HTML Tables with the Measurements

The following table was produced by the table generator of BenchExec. The resource measurements can be viewed in more detail as an HTML table. The HTML tables also support interactive plots; please select Quantile Plot in the Plot drop-down list and All in the Results drop-down list.
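The underlying result files can also be processed programmatically. The following sketch extracts the CPU time of each run from a BenchExec result file with the standard library; the embedded XML is a tiny stand-in for a real result file (real files contain many runs and more columns), and the column values follow BenchExec's convention of appending the unit to the number.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a BenchExec result file (structure follows BenchExec's
# result format; the runs and values here are made up for illustration).
RESULT_XML = """\
<result tool="coveriteam" benchmarkname="overhead">
  <run name="task-1.yml">
    <column title="cputime" value="1.42s"/>
    <column title="memory" value="52428800B"/>
  </run>
  <run name="task-2.yml">
    <column title="cputime" value="1.38s"/>
    <column title="memory" value="52690944B"/>
  </run>
</result>
"""

def cputimes(xml_text: str) -> dict:
    """Extract the CPU time (in seconds) of every run from a result XML."""
    root = ET.fromstring(xml_text)
    times = {}
    for run in root.iter("run"):
        for col in run.iter("column"):
            if col.get("title") == "cputime":
                # BenchExec appends the unit to the value, e.g. "1.42s".
                times[run.get("name")] = float(col.get("value").rstrip("s"))
    return times

print(cputimes(RESULT_XML))  # {'task-1.yml': 1.42, 'task-2.yml': 1.38}
```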

Archive Containing the Results

The supplementary archive contains the XML files with the resource measurements produced by BenchExec, a script to extract the resource measurements consumed by the executed tool itself, a script to generate the CSV files from which the plots were generated, and a script to generate the plots from the CSV files.

Discussion of Results

The plots show that the overhead varies much more in the robust configuration than in the trusted configuration. We also notice that the time overhead varies much more than the memory overhead.

We have discussed the results with the BenchExec developers, since we use BenchExec's features to execute the tools (the tool-info modules are part of the BenchExec repository). The default (robust) configuration executes the tool and the tool-info module in two separate containers. For communicating with the tool-info module, BenchExec uses the standard Python multiprocessing package, which adds some overhead. This explains the higher variance of the time overhead in the robust configuration.

We recommend the robust configuration for general use. Only a user who is concerned about performance, and who knows the tool-info modules well enough to trust that they do not execute anything suspicious while assembling the command line for the tool execution, should use the trusted configuration.