Intro
mubench is an in-depth, low-level benchmark for x86 processors. Its primary goal is to provide useful information for people who optimize assembly code and for people who write compilers. It measures latency and throughput for each individual instruction (sometimes several forms of the same instruction), as well as the throughput of arbitrary instruction mixes. The results produced by mubench are typically an order of magnitude more detailed than those found in AMD or Intel manuals.
mubench results for a variety of processors are available. If you find this information useful, please run mubench on your processor and upload the results.
mubench fully supports all SIMD instruction sets for the x86, including SSSE3, SSE3, SSE2, SSE, MMX, MMX Ext, 3DNow! and 3DNow! Ext. Support for non-SIMD instructions is partial: most data move, binary arithmetic, logical, shift/rotate and bit/byte instructions are supported, but other instructions, particularly branch and function call instructions or instructions manipulating the stack, are not supported. Floating-point instructions for the x87 are not supported. mubench only uses register-to-register (or immediate) forms of the instructions; memory operands are not supported. These limitations will be gradually removed in later releases.
Running
perl mubench.pl [options]
Options:
--(no-)accurate         runs tests several times (default on)
--mhz=2500              processor speed in MHz (normally autodetected from /proc/cpuinfo; set it here if the detected value is wrong, for example if you have SpeedStep enabled)
--(no-)64bit            benchmark 64-bit (amd64, EM64T, x86-64) instructions (default autodetected)
--(no-)32bit            benchmark 32-bit instructions
--(no-)pairs            benchmark instruction mixes (default on, very slow; use --no-pairs for a very fast benchmark that runs in minutes)
--include=add,sub       benchmark only instructions matching the given list of patterns (regular expressions ok)
--output=xml|csv|text   select output format
--outfile=file.xml      output file to save results to (default mubench-results-<date>.xml if xml, standard output otherwise)
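As a rough illustration of what a comma-separated pattern list such as --include=add,sub can select, here is a minimal Python sketch. It is not mubench's own code: the function name select_instructions and the mnemonic list are hypothetical, and it assumes unanchored regular-expression matching, which this page does not confirm.

```python
import re


def select_instructions(include, mnemonics):
    """Keep the mnemonics matched by at least one pattern in a comma-separated list."""
    patterns = [re.compile(p) for p in include.split(",") if p]
    # Assumption: unanchored search, so "add" also matches "paddb" and "addps".
    return [m for m in mnemonics if any(p.search(m) for p in patterns)]


# Hypothetical mnemonic list, for illustration only.
mnemonics = ["add", "sub", "paddb", "psubw", "addps", "xor", "mov"]

print(select_instructions("add,sub", mnemonics))
print(select_instructions("^add$,^sub$", mnemonics))
```

If matching is unanchored, as assumed here, anchoring the patterns with ^ and $ is one way to restrict a run to the plain integer forms. Splitting on commas also means a pattern that itself contains a comma would not survive in this sketch.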
Run this benchmark on an otherwise idle system, or as close as possible to idle (the benchmark will try to compensate for occasional CPU usage).
The full benchmark takes 6-9 hours to complete on an x86-64 system, or 2-3 hours on an x86 system since there are fewer instructions to try.
Some errors are normal when running the benchmark, as it tries to compile and run instruction sets you may not have (just in case ;)
Contribute results
Run perl mubench.pl with no options. It will produce a file "mubench-results-<date>.xml.bz2". This takes 6-9 hours. If you would like to run a quick benchmark, run perl mubench.pl --no-pairs, which takes 5-10 minutes and produces a limited set of results. Both forms are extremely helpful, and will be used to expand this site.
To upload your results, please go to the Support Requests > Submit New part of the SourceForge project page of mubench. Under "Upload and Attach a File:" click "Browse..." and select the "mubench-results-<date>.xml.bz2" file produced by mubench. In the Summary field, write "RESULT" and a description of your processor, for example "RESULT: Pentium M 1.4GHz". Click "UPLOAD".
Thanks!
Output
When running with --output=text, the output looks like this:
instruction 1        instruction 2        latency    throughput
---------------------------------------------------------------
add r64, r64                              1.0047     1.0076
add r32, r32                              1.0043     0.47108
...
All numbers are measured in clock cycles.
Latency = 2 means it takes two clock cycles for the result to be available. Throughput = 2 means a new instruction of the same kind can only be started once every two clock cycles (this is actually the reciprocal throughput, which is the form commonly used when talking about assembly code). Note that smaller latency and smaller throughput are faster. Many instructions on recent processors have throughput < 1, meaning more than one of the same instruction can run in the same clock cycle. It is normal to have some non-integer values, although a lot of instructions will typically have throughput = 1. The same instruction with different operands may have different performance.
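To make the reciprocal-throughput convention concrete, the following Python sketch converts the two sample rows above into instructions per cycle and into nanoseconds. The helper names are illustrative and not part of mubench, and the 2500 MHz clock is just the example value from the --mhz option, not a measured speed.

```python
def instructions_per_cycle(reciprocal_throughput: float) -> float:
    """Convert reciprocal throughput (cycles per instruction) to instructions per cycle."""
    if reciprocal_throughput <= 0:
        raise ValueError("reciprocal throughput must be positive")
    return 1.0 / reciprocal_throughput


def cycles_to_ns(cycles: float, mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock speed in MHz."""
    if mhz <= 0:
        raise ValueError("clock speed must be positive")
    # One cycle lasts 1000 / mhz nanoseconds (0.4 ns at 2500 MHz).
    return cycles * 1000.0 / mhz


# Values from the sample text output: (latency, throughput) in clock cycles.
samples = {
    "add r64, r64": (1.0047, 1.0076),
    "add r32, r32": (1.0043, 0.47108),
}

for name, (latency, throughput) in samples.items():
    ipc = instructions_per_cycle(throughput)
    print(f"{name}: {ipc:.2f} per cycle, latency {cycles_to_ns(latency, 2500):.3f} ns")
```

Read this way, the add r32, r32 row with throughput 0.47108 corresponds to slightly more than two independent adds started per clock cycle, while each individual add still takes about one cycle to deliver its result.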
Requires
Perl modules: IPC::Run >= 0.80 (built-in since mubench-0.2.1)
Recent versions of gcc and binutils (gcc >= 3.3, binutils >= 2.16.92 for SSSE3/MNI support) which must be in your path
Other utilities in the path: bzip2, md5sum, uname
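Before starting a run that takes hours, it can be worth confirming that the tools listed above are reachable. The Python sketch below is one possible check, not something shipped with mubench. It assumes that the binutils assembler is installed under the name as, adds perl because mubench.pl is a Perl script, and does not verify the version requirements.

```python
import shutil

# Tools named in the requirements list; "as" stands in for binutils (an assumption).
REQUIRED_TOOLS = ["perl", "gcc", "as", "bzip2", "md5sum", "uname"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found in PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("not found in PATH:", ", ".join(missing))
    else:
        print("all required tools found in PATH")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which searches the current PATH.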
Files used
Creates test.c and test in the current working directory.
Tries to read /proc/cpuinfo on startup.
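For readers who want to see what value is likely to be picked up from /proc/cpuinfo before deciding whether to pass --mhz, here is a minimal Python sketch. It is not mubench's detection code: it assumes the cpu MHz field that Linux reports on x86 systems and simply takes the first occurrence.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the first 'cpu MHz' value in cpuinfo text as a float, or None if absent."""
    match = re.search(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, re.MULTILINE)
    return float(match.group(1)) if match else None


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the clock speed from the given file; None if unreadable or missing the field."""
    try:
        with open(path) as f:
            return parse_cpu_mhz(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    mhz = read_cpu_mhz()
    if mhz is None:
        print("clock speed not detected; pass --mhz explicitly")
    else:
        print(f"detected {mhz:.0f} MHz; pass --mhz if this is not the nominal speed")
```

With frequency scaling such as SpeedStep active, the reported value can be the current reduced clock rather than the nominal one, which is the case the --mhz option is meant to cover.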
See also
Software Optimization Guide for the AMD64 Processors, AMD (publication 25112)
IA-32 Intel Architecture Optimization Reference Manual, Intel (publication 248966)
Software optimization resources, Agner Fog
SSE/MMX docs, Stefano Tommesani
Copyright and license
Copyright 2006 by Alex Izvorski. mubench is licensed under the terms of the GNU General Public License.