Introduction

My job involves a lot of software optimization. I have tuned and optimized Windows and Android internals, as well as several popular applications and benchmarks. I often find myself repeating the same types of optimizations regardless of the nature of the software or algorithm. That is why I've decided to write a blog about simple optimization techniques that can produce significant gains, and to show you how to apply them to real-world applications.

Many traditional optimization techniques (threading, for example) no longer take full advantage of the capabilities of modern x86 CPUs. I therefore plan to cover a wide array of topics, such as vectorization with SIMD (AVX instructions), caching, and performance profiling with tools like VTune. I'll also link to various reusable, optimized open source libraries and provide performance figures that demonstrate the gains. What happens in theory and what happens when you put code to metal are often two very different things, and the difference can only be quantified with hard data.

So whether it's gaming, cryptography or scientific computing, if you want to get the best out of your new AMD Ryzen 7 1800X or Intel Core i7-7700K, stay tuned and let's unlock the true potential of these modern CPUs.

Generalizing SIMD vector sizes

We've already seen massive performance improvements in several real-world scenarios in the previous posts. In this post I'd like...