Supercomputing Benchmark for CPU and GPU (FLOPS/GFLOPS)
by Marc Gloor

Copyright (C) 1992 Al Aburto <aburto@nosc.mil>
Copyright (C) 2000 Marc Gloor (K&R modifications)
Copyright (C) 2026 Marc Gloor (complete AI modernization)

Original K&R C code 1992 by Al Aburto (Upstream).
Benchmark modified 2026 for Linux by Marc Gloor.

My FLOPS/GFLOPS benchmark program was modernized in early 2026 to measure CPU and GPU speed. It measures the raw computational throughput of a processor (CPU or GPU) by running a high-intensity floating-point loop on all available cores. Because it uses fused multiply-add (FMA) style operations, the result approximates the system's raw peak GFLOPS.

Floating-point operations per second (FLOPS, flops or flop/s) is a measure of computer performance. It is useful in scientific computing and other fields that depend on floating-point calculations. As of November 2025, the fastest supercomputer was the HPE Cray EX255a, with a peak of 2821 PFLOPS. In comparison, an Intel Core i9 computer with 24 cores achieves roughly 20 GFLOPS. With the rise of AI and vector databases, raw GPU floating-point throughput has become crucial.




TECHNICAL SPECIFICATIONS

* Two versions are included in this package: a CPU version and a GPU version
* Parallelization: OpenMP for the CPU version (automatic thread scaling), CUDA for the GPU version
* Operation Type: Single-precision Floating Point (float)
* Arithmetic: a = a * b + c (one multiply and one add, i.e. 2 FLOPs each; counted as 10 ops per iteration)
* Iterations: 1,000,000,000 (Defined as ITERATIONS)
* Cache Target: L1 Cache (Minimal memory footprint to avoid RAM bottlenecks)


COMPILATION

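Use a compiler with OpenMP support (GCC/Clang recommended) for the CPU version. The GPU version additionally requires the NVIDIA CUDA Toolkit (nvcc) and an NVIDIA GPU.

To build with GNU Make, type 'make' at the console, or invoke the compilers manually: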

  $ gcc -O3 -fopenmp flops.c -o flops-cpu
  $ nvcc -O3 flops.cu -o flops-gpu

Note: -O3 is essential, because it lets the compiler vectorize the arithmetic and schedule the instruction pipeline efficiently.


RUN THE BENCHMARK

Before running the benchmark, make sure the system's CPU and GPU load is close to 0%. Stop any interfering jobs, processes, and network activity such as downloads, servers, and clients.
Run the binaries in a terminal:

  $ ./flops-cpu
  $ ./flops-gpu

The program will display:
- Detected CPU/GPU core count
- Total execution time
- Calculated MFLOPS/GFLOPS


CAUTION

This test generates significant heat as it pushes all CPU/GPU cores to 100% load. Ensure your cooling system is operational.


$Id: floatingpoint.html,v 1.19 2026/03/12 13:30:21 gloor Exp $
Author:
marc_dot_gloor_at_u_dot_nus_dot_edu

