Quick Tip: Multiply by -1.0f

An easy and often missed optimization applies when an algorithm inverts the sign of a floating-point number by multiplying it by -1.0f. The multiply is slower than it needs to be and ties up a multiplier unit on the CPU for no good reason.

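Let's look at how an IEEE-754 32-bit float (or 64-bit double) is encoded. A float is laid out as 1 sign bit, 8 exponent bits, and 23 mantissa bits; a double as 1 sign bit, 11 exponent bits, and 52 mantissa bits, with the sign bit at the top in both cases.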


The most significant bit (bit 31 for a float, bit 63 for a double) is the sign bit. Thus simply XORing a float with 0x80000000 (or a double with 0x8000000000000000) does the same thing as multiplying by -1.0f.
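
As a minimal sketch of the XOR trick in plain C, the bits can be copied out with memcpy, toggled, and copied back. The function names negate_f32 and negate_f64 are hypothetical, chosen only for illustration, and the code assumes the platform uses IEEE-754 for float and double.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: flip the sign of a 32-bit float by toggling bit 31. */
static float negate_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* well-defined way to read the bits */
    bits ^= UINT32_C(0x80000000);        /* toggle the sign bit only */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical helper: same idea for a 64-bit double, toggling bit 63. */
static double negate_f64(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= UINT64_C(0x8000000000000000);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Calling negate_f32(1.5f) gives -1.5f, and applying it twice returns the original value, since only the sign bit ever changes.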

Consider that a mulps (floating-point multiply) has a latency of 4 clock cycles on a Skylake CPU, or 3 on Broadwell, while an xorps takes only 1. Depending on the algorithm, that can be a significant saving.
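
With SSE intrinsics, the xorps form might look like the sketch below. The helper name negate_ps is hypothetical; _mm_set1_ps and _mm_xor_ps are the standard SSE intrinsics from xmmintrin.h. The mask is built from -0.0f, whose bit pattern is 0x80000000, which assumes the compiler preserves the sign of zero constants.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_set1_ps, _mm_xor_ps */

/* Hypothetical helper: negate four packed floats with a single XOR. */
static __m128 negate_ps(__m128 v)
{
    /* -0.0f has only the sign bit set in each 32-bit lane. */
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    return _mm_xor_ps(v, sign_mask);
}
```

This replaces a _mm_mul_ps(v, _mm_set1_ps(-1.0f)) call; the mask is a constant, so in a loop it can be hoisted out and kept in a register.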

With simple scalar code, most compilers will automatically generate this optimization if you are building with at least -O2 optimizations enabled.

Generalizing SIMD vector sizes

We've already seen the massive performance improvements in several real world scenarios in the previous posts. In this post I'd like...