I have recently been researching how the choice of optimisation passes affects the build-time and run-time performance of C/C++ applications compiled with Clang.
The research produced some very interesting results. It showed that the '-O3' flag does not always use the most effective selection of optimisation passes for a given piece of code. In fact, it may be better (albeit very time consuming) to produce a custom optimisation pass list per compilation unit, because a given pass ordering produces good results for some patterns of code but bad results for others. Ideally, a series of 'good' optimisation pass lists would be generated and matched against the code patterns that each one handles best.
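To make the per-compilation-unit idea concrete, here is a minimal sketch in Python. It is not the tool used for this research. The function `best_list_per_unit`, the `candidates` mapping, and the `run_time` callback are all hypothetical names. The callback is assumed to build one compilation unit with a given pass list and return a measured run time.

```python
from typing import Callable, Dict, Iterable, Mapping, Sequence

# Hypothetical callback: (compilation unit, pass list) -> measured run time in seconds.
RunTime = Callable[[str, Sequence[str]], float]


def best_list_per_unit(
    units: Iterable[str],
    candidates: Mapping[str, Sequence[str]],
    run_time: RunTime,
) -> Dict[str, str]:
    """Pick, for each compilation unit, the name of the fastest candidate pass list."""
    choice: Dict[str, str] = {}
    for unit in units:
        # Try every candidate list on this unit and keep the one with the lowest run time.
        choice[unit] = min(
            candidates,
            key=lambda name: run_time(unit, candidates[name]),
        )
    return choice
```

The cost is one build and benchmark per unit per candidate list, which is why the approach is time consuming and why a small set of known 'good' lists is preferable to searching from scratch for every file.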
The most interesting results showed how much compile time could be saved for the same run-time performance. The saving comes from optimisation passes that are not applicable to the code they are run on, which lengthen the build for no gain. It is possible to configure a benchmarking tool to find and remove these passes in a relatively short period of time. As with the increase in performance, the saving is highly dependent on the code the passes are run on. For a very large, complex application, these kinds of gains would have to be targeted at specific files, or groups of files, being compiled.
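One way such a benchmarking tool could find removable passes is a greedy elimination loop. The sketch below is written under stated assumptions and is not the tool used for this research. `prune_passes`, the `measure` callback, and the `tolerance` parameter are hypothetical. `measure` is assumed to build and benchmark the code with a given pass list and return a `(compile_seconds, run_seconds)` pair.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: pass list -> (compile time, run time), both in seconds.
Measure = Callable[[Sequence[str]], Tuple[float, float]]


def prune_passes(
    passes: Sequence[str],
    measure: Measure,
    tolerance: float = 0.01,
) -> List[str]:
    """Greedily drop passes whose removal keeps run time within tolerance
    of the original list and does not increase compile time."""
    kept = list(passes)
    current_compile, baseline_run = measure(kept)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        compile_s, run_s = measure(candidate)
        # Always compare against the ORIGINAL run time so small losses cannot accumulate.
        same_speed = run_s <= baseline_run * (1 + tolerance)
        if same_speed and compile_s <= current_compile:
            kept = candidate          # the pass gave no measurable benefit; drop it
            current_compile = compile_s
        else:
            i += 1                    # the pass matters for this code; keep it
    return kept
```

This needs one build and benchmark per pass in the list. Because passes interact, a greedy search like this is not guaranteed to find the shortest possible list, and timing noise means the tolerance has to be chosen with care.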
A proposal to present this technique and its results at EuroLLVM 2016 will be submitted soon.