Publications about Benchmarking

Articles in journals or book chapters

  1. Dirk Beyer, Stefan Löwe, and Philipp Wendler. Reliable Benchmarking: Requirements and Solutions. International Journal on Software Tools for Technology Transfer (STTT), 21(1):1-29, 2019. doi:10.1007/s10009-017-0469-y Link to this entry Keyword(s): Benchmarking Publisher's Version PDF Presentation Supplement
    Abstract
    Benchmarking is a widely used method in experimental computer science, in particular, for the comparative evaluation of tools and algorithms. As a consequence, a number of questions need to be answered in order to ensure proper benchmarking, resource measurement, and presentation of results, all of which is essential for researchers, tool developers, and users, as well as for tool competitions. We identify a set of requirements that are indispensable for reliable benchmarking and resource measurement of time and memory usage of automatic solvers, verifiers, and similar tools, and discuss limitations of existing methods and benchmarking tools. Fulfilling these requirements in a benchmarking framework can (on Linux systems) currently only be done by using the cgroup and namespace features of the kernel. We developed BenchExec, a ready-to-use, tool-independent, and open-source implementation of a benchmarking framework that fulfills all presented requirements, making reliable benchmarking and resource measurement easy. Our framework is able to work with a wide range of different tools, has proven its reliability and usefulness in the International Competition on Software Verification, and is used by several research groups worldwide to ensure reliable benchmarking. Finally, we present guidelines on how to present measurement results in a scientifically valid and comprehensible way.
    BibTeX Entry
    @article{Benchmarking-STTT, author = {Dirk Beyer and Stefan L{\"o}we and Philipp Wendler}, title = {Reliable Benchmarking: {R}equirements and Solutions}, journal = {International Journal on Software Tools for Technology Transfer (STTT)}, volume = {21}, number = {1}, pages = {1--29}, year = {2019}, doi = {10.1007/s10009-017-0469-y}, sha256 = {a50fbc212af394b32166d6354f986e7b1d5bc87220bdc50df899d6a46fedf33c}, url = {https://www.sosy-lab.org/research/benchmarking/}, presentation = {https://www.sosy-lab.org/research/prs/Latest_ReliableBenchmarking.pdf}, abstract = {Benchmarking is a widely used method in experimental computer science, in particular, for the comparative evaluation of tools and algorithms. As a consequence, a number of questions need to be answered in order to ensure proper benchmarking, resource measurement, and presentation of results, all of which is essential for researchers, tool developers, and users, as well as for tool competitions. We identify a set of requirements that are indispensable for reliable benchmarking and resource measurement of time and memory usage of automatic solvers, verifiers, and similar tools, and discuss limitations of existing methods and benchmarking tools. Fulfilling these requirements in a benchmarking framework can (on Linux systems) currently only be done by using the cgroup and namespace features of the kernel. We developed BenchExec, a ready-to-use, tool-independent, and open-source implementation of a benchmarking framework that fulfills all presented requirements, making reliable benchmarking and resource measurement easy. Our framework is able to work with a wide range of different tools, has proven its reliability and usefulness in the International Competition on Software Verification, and is used by several research groups worldwide to ensure reliable benchmarking. Finally, we present guidelines on how to present measurement results in a scientifically valid and comprehensible way.}, keyword = {Benchmarking}, _pdf = {https://www.sosy-lab.org/research/pub/2019-STTT.Reliable_Benchmarking_Requirements_and_Solutions.pdf}, annote = {Publication appeared first online in November 2017<BR/> BenchExec is available at: <a href="https://github.com/sosy-lab/benchexec"> https://github.com/sosy-lab/benchexec</a>}, }
    Additional Infos
    Publication appeared first online in November 2017
    BenchExec is available at: https://github.com/sosy-lab/benchexec
  2. Philipp Wendler. Beiträge zu praktikabler Prädikatenanalyse. In S. Hölldobler, editor, Ausgezeichnete Informatikdissertationen 2017, LNI D-18, pages 261-270, 2018. Gesellschaft für Informatik (GI). Link to this entry Keyword(s): Benchmarking, CPAchecker, Software Model Checking Publisher's Version PDF Presentation Supplement
    Abstract
    The state of research in automatic software verification is fragmented: different techniques exist side by side, in different formalizations and with little relation to one another, and meaningful comparisons are rare. The dissertation addresses this problem. A configurable and flexible framework for unifying such techniques is developed, and several existing techniques are expressed within it. This yields new insights into the core ideas of these techniques, enables experimental studies at a new level of quality, and facilitates research on combinations and refinements of the techniques. The implementation of this framework in the successful verifier CPAchecker is evaluated in the largest experimental study of this kind to date (120 different configurations, 671,280 executions). For this purpose, a benchmarking system is presented that uses modern technologies to avoid significant qualitative measurement errors of existing systems.
    BibTeX Entry
    @incollection{DissZusammenfassungWendler, author = {Philipp Wendler}, title = {Beitr{\"{a}}ge zu praktikabler Pr{\"{a}}dikatenanalyse}, booktitle = {Ausgezeichnete Informatikdissertationen 2017}, editor = {S. H{\"{o}}lldobler}, volume = {{D-18}}, pages = {261-270}, year = {2018}, series = {{LNI}}, publisher = {Gesellschaft f{\"{u}}r Informatik ({GI})}, isbn = {978-3885799771}, url = {https://www.sosy-lab.org/research/phd/wendler/}, pdf = {https://dl.gi.de/bitstream/handle/20.500.12116/19476/invited_paper_4.pdf?sequence=1&isAllowed=y}, presentation = {https://www.sosy-lab.org/research/prs/2018-05-08_GiDiss_BeitraegeZuPraktikablerPraedikatenanalyse.pdf}, abstract = {Der Stand der Forschung im Bereich der automatischen Software-Verifikation ist fragmentiert. Verschiedene Verfahren existieren nebeneinander in unterschiedlichen Darstellungen und mit wenig Bezug zueinander, aussagekr&auml;ftige Vergleiche sind selten. Die Dissertation adressiert dieses Problem. Ein konfigurierbares und flexibles Rahmenwerk zur Vereinheitlichung solcher Verfahren wird entwickelt und mehrere vorhandene Verfahren werden in diesem Rahmenwerk ausgedr&uuml;ckt. Dies bringt neue Erkenntnisse &uuml;ber die Kernideen dieser Verfahren, erm&ouml;glicht experimentelle Studien in einer neuartigen Qualit&auml;t, und erleichtert die Forschung an Kombinationen und Weiterentwicklungen dieser Verfahren. Die Implementierung dieses Rahmenwerks im erfolgreichen Verifizierer CPAchecker wird in der bisher gr&ouml;&szlig;ten derartigen experimentellen Studie (120 verschiedene Konfigurationen, 671280 Ausf&uuml;hrungen) evaluiert. Hierzu wird ein Benchmarking-System pr&auml;sentiert, das mit Hilfe moderner Technologien signifikante qualitative Messfehler existierender Systeme vermeidet.}, keyword = {Benchmarking,CPAchecker,Software Model Checking}, annote = {This is a German summary of the dissertation <a href="https://www.sosy-lab.org/research/bib/Year/2017.complete.html#PhilippPredicateAnalysis">Towards Practical Predicate Analysis</a>.}, doifalse = {20.500.12116/19476}, urlpub = {https://dl.gi.de/handle/20.500.12116/19476}, }
    Additional Infos
    This is a German summary of the dissertation Towards Practical Predicate Analysis.
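
The BenchExec framework described in the articles above can also be driven programmatically. The sketch below is not taken from the publications; it is a minimal, illustrative use of BenchExec's runexec module for measuring a single command under resource limits. The parameter names and units follow the documented RunExecutor API, but should be checked against the BenchExec version in use.

# Minimal sketch: measuring one tool execution with BenchExec's runexec module.
# Assumes BenchExec is installed (e.g., via pip) and cgroups are set up as
# described in the BenchExec documentation; verify parameter names against
# your BenchExec version.
from benchexec.runexecutor import RunExecutor

executor = RunExecutor()
result = executor.execute_run(
    args=["/bin/echo", "hello"],      # command line of the benchmarked tool
    output_filename="run.log",        # file that receives the tool's output
    walltimelimit=60,                 # wall-time limit in seconds
    memlimit=2 * 1024 * 1024 * 1024,  # memory limit in bytes (2 GiB)
)

# The returned dictionary contains the measured values of the run,
# e.g. CPU time, wall time, and peak memory usage.
print(result.get("cputime"), result.get("walltime"), result.get("memory"))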

Articles in conference or workshop proceedings

  1. Dirk Beyer and Philipp Wendler. CPU Energy Meter: A Tool for Energy-Aware Algorithms Engineering. In Proceedings of the 26th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2020, Dublin, Ireland, April 25-30), part 2, LNCS 12079, pages 126-133, 2020. Springer. doi:10.1007/978-3-030-45237-7_8 Link to this entry Keyword(s): Benchmarking Publisher's Version PDF Presentation Video Supplement
    Abstract
    Verification algorithms are among the most resource-intensive computation tasks. Saving energy is important for our living environment and to save cost in data centers. Yet, researchers compare the efficiency of algorithms still in terms of consumption of CPU time (or even wall time). Perhaps one reason for this is that measuring energy consumption of computational processes is not as convenient as measuring the consumed time and there is no sufficient tool support. To close this gap, we contribute CPU Energy Meter, a small tool that takes care of reading the energy values that Intel CPUs track inside the chip. In order to make energy measurements as easy as possible, we integrated CPU Energy Meter into BenchExec, a benchmarking tool that is already used by many researchers and competitions in the domain of formal methods. As evidence for usefulness, we explored the energy consumption of some state-of-the-art verifiers and report some interesting insights, for example, that energy consumption is not necessarily correlated with CPU time.
    BibTeX Entry
    @inproceedings{TACAS20b, author = {Dirk Beyer and Philipp Wendler}, title = {CPU Energy Meter: A Tool for Energy-Aware Algorithms Engineering}, booktitle = {Proceedings of the 26th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS~2020, Dublin, Ireland, April 25-30), part 2}, pages = {126-133}, year = {2020}, series = {LNCS~12079}, publisher = {Springer}, doi = {10.1007/978-3-030-45237-7_8}, sha256 = {c5c8ad06f4b192e61799469a8fc6ca4661714aa2945e0ce07363a376ff06dcd7}, url = {https://www.sosy-lab.org/research/energy-measurement/}, presentation = {https://www.sosy-lab.org/research/prs/2021-03-31_TACAS20_CPU-Energy-Meter_Dirk.pdf}, abstract = {Verification algorithms are among the most resource-intensive computation tasks. Saving energy is important for our living environment and to save cost in data centers. Yet, researchers compare the efficiency of algorithms still in terms of consumption of CPU time (or even wall time). Perhaps one reason for this is that measuring energy consumption of computational processes is not as convenient as measuring the consumed time and there is no sufficient tool support. To close this gap, we contribute CPU Energy Meter, a small tool that takes care of reading the energy values that Intel CPUs track inside the chip. In order to make energy measurements as easy as possible, we integrated CPU Energy Meter into BenchExec, a benchmarking tool that is already used by many researchers and competitions in the domain of formal methods. As evidence for usefulness, we explored the energy consumption of some state-of-the-art verifiers and report some interesting insights, for example, that energy consumption is not necessarily correlated with CPU time.}, keyword = {Benchmarking}, video = {https://youtu.be/qzKAoBVTw2c}, }
  2. Dirk Beyer, Stefan Löwe, and Philipp Wendler. Benchmarking and Resource Measurement. In B. Fischer and J. Geldenhuys, editors, Proceedings of the 22nd International Symposium on Model Checking of Software (SPIN 2015, Stellenbosch, South Africa, August 24-26), LNCS 9232, pages 160-178, 2015. Springer-Verlag, Heidelberg. doi:10.1007/978-3-319-23404-5_12 Link to this entry Keyword(s): Benchmarking Publisher's Version PDF Supplement
    Abstract
    Proper benchmarking and resource measurement is an important topic, because benchmarking is a widely-used method for the comparative evaluation of tools and algorithms in many research areas. It is essential for researchers, tool developers, and users, as well as for competitions. We formulate a set of requirements that are indispensable for reproducible benchmarking and reliable resource measurement of automatic solvers, verifiers, and similar tools, and discuss limitations of existing methods and benchmarking tools. Fulfilling these requirements in a benchmarking framework is complex and can (on Linux) currently only be done by using the cgroups feature of the kernel. We provide BenchExec, a ready-to-use, tool-independent, and free implementation of a benchmarking framework that fulfills all presented requirements, making reproducible benchmarking and reliable resource measurement easy. Our framework is able to work with a wide range of different tools and has proven its reliability and usefulness in the International Competition on Software Verification.
    BibTeX Entry
    @inproceedings{SPIN15a, author = {Dirk Beyer and Stefan L{\"o}we and Philipp Wendler}, title = {Benchmarking and Resource Measurement}, booktitle = {Proceedings of the 22nd International Symposium on Model Checking of Software (SPIN~2015, Stellenbosch, South Africa, August 24-26)}, editor = {B.~Fischer and J.~Geldenhuys}, pages = {160-178}, year = {2015}, series = {LNCS~9232}, publisher = {Springer-Verlag, Heidelberg}, isbn = {978-3-319-23403-8}, doi = {10.1007/978-3-319-23404-5_12}, url = {https://www.sosy-lab.org/research/benchmarking/}, pdf = {https://www.sosy-lab.org/research/pub/2015-SPIN.Benchmarking_and_Resource_Measurement.pdf}, abstract = {Proper benchmarking and resource measurement is an important topic, because benchmarking is a widely-used method for the comparative evaluation of tools and algorithms in many research areas. It is essential for researchers, tool developers, and users, as well as for competitions. We formulate a set of requirements that are indispensable for reproducible benchmarking and reliable resource measurement of automatic solvers, verifiers, and similar tools, and discuss limitations of existing methods and benchmarking tools. Fulfilling these requirements in a benchmarking framework is complex and can (on Linux) currently only be done by using the cgroups feature of the kernel. We provide BenchExec, a ready-to-use, tool-independent, and free implementation of a benchmarking framework that fulfills all presented requirements, making reproducible benchmarking and reliable resource measurement easy. Our framework is able to work with a wide range of different tools and has proven its reliability and usefulness in the International Competition on Software Verification.}, keyword = {Benchmarking}, annote = {An <a href="https://www.sosy-lab.org/research/bib/Year/2017.complete.html#Benchmarking-STTT">extended version</a> of this article appeared in STTT.}, }
    Additional Infos
    An extended version of this article appeared in STTT.
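
The CPU Energy Meter paper above relies on the RAPL energy counters that Intel CPUs maintain on-chip. As a rough illustration of those counters only (CPU Energy Meter itself reads the RAPL model-specific registers), the following sketch samples the package-energy counter exposed by the Linux powercap subsystem; it assumes an Intel CPU with the intel_rapl driver loaded and read permission on the sysfs files.

# Hedged illustration: observing package energy via the Linux powercap
# interface. This only demonstrates the RAPL counters discussed above;
# it is not how CPU Energy Meter is implemented.
import time

RAPL_DIR = "/sys/class/powercap/intel-rapl:0"  # energy domain of CPU package 0

def read_energy_uj() -> int:
    """Return the cumulative package-energy counter in microjoules."""
    with open(f"{RAPL_DIR}/energy_uj") as f:
        return int(f.read())

start = read_energy_uj()
time.sleep(1.0)  # placeholder for the workload to be measured
end = read_energy_uj()

# The counter wraps around at max_energy_range_uj; this handles a single wrap.
with open(f"{RAPL_DIR}/max_energy_range_uj") as f:
    max_range = int(f.read())
consumed_uj = (end - start) % max_range

print(f"Package energy: {consumed_uj / 1e6:.3f} J")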

Theses and projects (PhD, MSc, BSc, Project)

  1. Charlotte Gall. Updating the BenchExec Core Assignment for Modern CPU Architecture. Bachelor's Thesis, LMU Munich, Software Systems Lab, 2023. Link to this entry Keyword(s): Benchmarking
    BibTeX Entry
    @misc{GallBenchexecCoreAssignment, author = {Charlotte Gall}, title = {Updating the BenchExec Core Assignment for Modern CPU Architecture}, year = {2023}, keyword = {Benchmarking}, field = {Computer Science}, howpublished = {Bachelor's Thesis, LMU Munich, Software Systems Lab}, }
  2. Tobias Kleinert. Developing a Verifier Based on Parallel Portfolio with CoVeriTeam. Bachelor's Thesis, LMU Munich, Software Systems Lab, 2022. Link to this entry Keyword(s): Benchmarking PDF Presentation
    BibTeX Entry
    @misc{KleinertParPortfolioCoVeriTeam, author = {Tobias Kleinert}, title = {Developing a Verifier Based on Parallel Portfolio with \textsc{CoVeriTeam}}, year = {2022}, pdf = {https://www.sosy-lab.org/research/bsc/2022.Kleinert.Parallel_Portfolio_CoVeriTeam.pdf}, presentation = {https://www.sosy-lab.org/research/prs/2022-03-16_BA_Parallel_Portfolio_CoVeriTeam.pdf}, keyword = {Benchmarking}, field = {Computer Science}, howpublished = {Bachelor's Thesis, LMU Munich, Software Systems Lab}, }
  3. Robin Gloster. Cgroups v2 Support for BenchExec. Bachelor's Thesis, LMU Munich, Software Systems Lab, 2022. Link to this entry Keyword(s): Benchmarking PDF Presentation
    BibTeX Entry
    @misc{GlosterCgroupsV2, author = {Robin Gloster}, title = {Cgroups v2 Support for \textsc{BenchExec}}, year = {2022}, pdf = {https://www.sosy-lab.org/research/bsc/2022.Gloster.Cgroups_v2_Support_for_BenchExec.pdf}, presentation = {https://www.sosy-lab.org/research/prs/2022-03-09_BA_Cgroups_v2_Support_for_BenchExec_Gloster.pdf}, keyword = {Benchmarking}, field = {Computer Science}, howpublished = {Bachelor's Thesis, LMU Munich, Software Systems Lab}, }
  4. Dennis Simon. Shareable Benchmarking Reports with Enhanced Filters and Dynamic Statistics for BenchExec. Bachelor's Thesis, LMU Munich, Software Systems Lab, 2021. Link to this entry Keyword(s): Benchmarking PDF Presentation
    BibTeX Entry
    @misc{SimonBA, author = {Dennis Simon}, title = {Shareable Benchmarking Reports with Enhanced Filters and Dynamic Statistics for \textsc{BenchExec}}, year = {2021}, pdf = {https://www.sosy-lab.org/research/bsc/2021.Simon.Shareable_Benchmarking_Reports_with_Enhanced_Filters_and_Dynamic_Statistics_for_BenchExec.pdf}, presentation = {https://www.sosy-lab.org/research/prs/2021-04-28_BA_ShareableBenchmarkingReportsWithEnhancedFiltersAndDynamicStatisticsForBenchExec_Simon.pdf}, keyword = {Benchmarking}, field = {Computer Science}, howpublished = {Bachelor's Thesis, LMU Munich, Software Systems Lab}, }
  5. Petros Isaakidis. Energy Consumption Prediction of Verification Work. Bachelor's Thesis, LMU Munich, Software Systems Lab, 2020. Link to this entry Keyword(s): CPAchecker, Benchmarking, Energy Measurement
    BibTeX Entry
    @misc{IsaakidisEnergy, author = {Petros Isaakidis}, title = {Energy Consumption Prediction of Verification Work}, year = {2020}, keyword = {CPAchecker, Benchmarking, Energy Measurement}, field = {Computer Science}, howpublished = {Bachelor's Thesis, LMU Munich, Software Systems Lab}, }
  6. Maximilian Hailer. Measuring and Optimizing Energy Consumption of Verification Work on Clusters. Bachelor's Thesis, LMU Munich, Software Systems Lab, 2019. Link to this entry Keyword(s): Benchmarking, Energy Measurement PDF Presentation
    BibTeX Entry
    @misc{HailerEnergy, author = {Maximilian Hailer}, title = {Measuring and Optimizing Energy Consumption of Verification Work on Clusters}, year = {2019}, pdf = {https://www.sosy-lab.org/research/bsc/2019.Hailer.Measuring_and_Optimizing_Energy_Consumption_of_Verification_Work_on_Clusters.pdf}, presentation = {https://www.sosy-lab.org/research/prs/2019-10-30_BA_MeasuringAndOptimizingEnergyConsumptionOfVerificationWork_Hailer.pdf}, keyword = {Benchmarking, Energy Measurement}, field = {Computer Science}, howpublished = {Bachelor's Thesis, LMU Munich, Software Systems Lab}, }
  7. Laura Bschor. Modern Architecture and Improved UI for Tables of BenchExec. Bachelor's Thesis, LMU Munich, Software Systems Lab, 2019. Link to this entry Keyword(s): Benchmarking PDF Presentation
    BibTeX Entry
    @misc{BschorTables, author = {Laura Bschor}, title = {Modern Architecture and Improved {UI} for Tables of {{\sc BenchExec}}}, year = {2019}, pdf = {https://www.sosy-lab.org/research/bsc/2019.Bschor.Modern_Architecture_and_Improved_UI_for_Tables_of_BenchExec.pdf}, presentation = {https://www.sosy-lab.org/research/prs/2019-11-06_BA_ModernArchitectureAndImprovedUIforTablesOfBenchExec_Bschor.pdf}, keyword = {Benchmarking}, field = {Computer Science}, howpublished = {Bachelor's Thesis, LMU Munich, Software Systems Lab}, }
  8. Philipp Wendler. Towards Practical Predicate Analysis. PhD Thesis, University of Passau, Software Systems Lab, 2017. Link to this entry Keyword(s): Benchmarking, CPAchecker, Software Model Checking Publisher's Version PDF Presentation Supplement
    BibTeX Entry
    @misc{PhilippPredicateAnalysis, author = {Philipp Wendler}, title = {Towards Practical Predicate Analysis}, year = {2017}, url = {https://www.sosy-lab.org/research/phd/wendler/}, pdf = {https://www.sosy-lab.org/research/phd/2017.Wendler.Towards_Practical_Predicate_Analysis.pdf}, presentation = {https://www.sosy-lab.org/research/prs/2017-11-20_RigorosumWendler_TowardsPracticalPredicateAnalysis.pdf}, keyword = {Benchmarking,CPAchecker,Software Model Checking}, annote = {Nominated for the <a href="https://gi.de/aktuelles/wettbewerbe/dissertationspreis/">Dissertation award 2017</a> of the German <a href="https://gi.de/">Gesellschaft f&uuml;r Informatik (GI)</a>}, howpublished = {PhD Thesis, University of Passau, Software Systems Lab}, urn = {urn:nbn:de:bvb:739-opus4-5098}, }
    Additional Infos
    Nominated for the Dissertation Award 2017 of the German Gesellschaft für Informatik (GI).
  9. Nils Steinger. Measuring, Visualizing, and Optimizing the Energy Consumption of Computer Clusters. Bachelor's Thesis, University of Passau, Software Systems Lab, 2017. Link to this entry Keyword(s): Benchmarking PDF Supplement
    BibTeX Entry
    @misc{SteingerMeasuring, author = {Nils Steinger}, title = {Measuring, Visualizing, and Optimizing the Energy Consumption of Computer Clusters}, year = {2017}, url = {https://www.sosy-lab.org/research/bsc/steinger}, pdf = {https://www.sosy-lab.org/research/bsc/2017.Steinger.Measuring,_Visualizing,_and_Optimizing_the_Energy_Consumption_of_Computer_Clusters.pdf}, keyword = {Benchmarking}, field = {Computer Science}, howpublished = {Bachelor's Thesis, University of Passau, Software Systems Lab}, }
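
Several of the theses above concern how benchmarking tools such as BenchExec obtain their measurements from Linux control groups (for example, the cgroups v2 support). Purely as an illustration, and independent of BenchExec's actual implementation, the sketch below reads the CPU-time and peak-memory accounting files of a hypothetical cgroup in a cgroup v2 hierarchy mounted at /sys/fs/cgroup; the file names follow the kernel's cgroup v2 documentation, and memory.peak may be absent on older kernels.

# Illustrative sketch (not BenchExec code): reading cgroup v2 accounting files.
# Assumes a unified cgroup v2 hierarchy at /sys/fs/cgroup and that the named
# cgroup exists; the cgroup path below is hypothetical.
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/benchmark.slice/run1")  # hypothetical cgroup

def read_cpu_usage_seconds(cgroup: Path) -> float:
    """CPU time consumed by all processes in the cgroup, from cpu.stat."""
    for line in (cgroup / "cpu.stat").read_text().splitlines():
        key, value = line.split()
        if key == "usage_usec":
            return int(value) / 1_000_000
    raise RuntimeError("usage_usec not found in cpu.stat")

def read_peak_memory_bytes(cgroup: Path) -> int:
    """Peak memory usage of the cgroup; falls back to current usage if
    memory.peak is not available on this kernel."""
    peak = cgroup / "memory.peak"
    current = cgroup / "memory.current"
    return int((peak if peak.exists() else current).read_text())

print("CPU time [s]:", read_cpu_usage_seconds(CGROUP))
print("Peak memory [bytes]:", read_peak_memory_bytes(CGROUP))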

Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
