DescriptionMaximizing the performance on modern hardware platforms has become more and more difficult, due to different levels of parallelism and complicated memory hierarchies. Auto tuning can help developers to address these challenges by writing code that automatically adjusts to the underlying hardware platform. AutoTuneTMP is a new C++-based auto tuning framework that uses just-in-time compilation to enable runtime-instantiable C++ templates. We use C++ template metaprogramming to provide data structures and algorithms that can be used to develop tunable compute kernels. These compute kernels can be tuned with different optimization strategies. We demonstrate for a first prototype the applicability and usefulness of our approach by tuning 6 parameters of a DGEMM implementation, achieving 68% peak performance on an Intel Skylake processor.