The Language, Optimizer, and Tools Mess

Erik Altman, IBM


For many years, new languages promised higher performance in addition to other desirable characteristics such as higher productivity, better static checking, better reusability and composability, etc. However, in practice, virtually no new languages win in performance over historic languages like C and Fortran. Why? Higher level semantics were supposed to provide additional information so that optimizers could perform better analyses and so that conservative analyses could be less conservative.

In practice, a number of factors have conspired against this vision. For example, the number of degrees of freedom offered by new languages appears to have increased faster than their abstraction level, thus obfuscating intent and hindering analysis. Higher abstraction levels have implicitly encouraged inefficient memory use. The large size of programs enabled by modern languages makes reasoning about them more difficult, especially since many of our analyses are nonlinear, a problem whether optimizations are static or dynamic. The large size of programs and the pairing of code from multiple libraries and frameworks also makes it more challenging for developers to test their code in realistic deployment scenarios.

This talk will offer some musings about these challenges and some ways we might address them.

Erik Altman has been with IBM Research for more than 15 years, and has served as a Research Staff Member and as Technical Assistant to the Vice-President for Systems. He currently manages the Dynamic Optimization Group at the IBM T.J. Watson Research Center. His research has focused on compilers, architecture and microarchitecture, and most recently performance tools for enterprise software and enabling heterogeneous systems. He has authored or co-authored more than 30 conference and journal papers, and has more than 25 patents and pending patent applications. He was one of the originators of IBM's DAISY binary translation project that has great impact on both industry and academia. He was also one of the original architects of the Cell processor chip that is deployed in the Sony PlayStation 3 game consoles. He is currently the Editor-in-Chief of IEEE Micro and the Chair of ACM (Association for Computing Machinery) SIGMICRO (Special Interest Group on Microarchitecture). He has been the program co-chair and general chair of PACT (International Conference on Parallel Architecture and Compilation Techniques). He has also served as program chair for CASES (International Conference on Compilers, Architecture, and Synthesis for Embedded Systems), and is currently program co-chair for NPC'2011(Network and Parallel Computing). He co-founded the Workshop on Binary Translation and has served on numerous program committees in leading research conferences.

He has also served as guest editor of IEEE Computer, the ACM Journal of Instruction Level Parallelism (JILP),and the IBM Journal of Research and Development. He received his PhD from McGill University in 1995.

Formally Verifying a Compiler: Why? How? How Far?

Xavier Leroy, INRIA


Given the complexity and sophistication of code generation and optimization algorithms, and the difficulty of systematically testing a compiler, it is unsurprising that bugs occur in compilers and cause miscompilation: incorrect executable code is silently generated from a correct source program. The formal verification of a compiler is a radical solution to the miscompilation issue. By applying formal methods (program proof) to the compiler itself, compiler verification proves, with mathematical certainty, that the generated executable code behaves exactly as prescribed by the semantics of the source program.

Why indulge in compiler verification? Miscompilation bugs are uncommon enough that, for ordinary, non-critical software, they are negligible compared with the bugs already present in the source program. This is no longer true, however, for critical software, as found in aircraft, medical equipment or nuclear plants, for example. There, miscompilation is a concern, which is currently addressed in unsatisfactory ways such as turning all optimizations off. The risk of miscompilation also diminishes the usefulness of formal methods (model checking, static analysis, program proof) applied at the source level: the guarantees so painfully obtained by sourcelevel verification may not extend to the compiled code that actually runs.

How to verify a compiler? For every pass, there is a choice between 1- proving its implementation correct once and for all, and 2- leaving the implementation untrusted, but at each run feed its input and output into a validator: a separate algorithm that tries to establish semantic preservation between input and output code, and aborts compilation (or turns the optimization off) if it cannot. If the soundness of the validator is proved once and for all, approach 2 provides soundness guarantees as strong as approach 1, often at reduced proof costs. I will illustrate both approaches on the verification of two compilation passes: register allocation by graph coloring, and software pipelining.

How far can compiler verification go? The state of the art is the CompCert compiler, which compiles almost all of the C language to PowerPC, ARM and x86 assembly and has been mechanically verified using the Coq proof assistant. From a compiler standpoint, however, CompCert leaves much room for improvement: the compilation technology used is early 1980.s, few optimizations are implemented, and the performance of the compiled code barely matches that of gcc -O1. Much future work remains, in particular on loop optimizations, which raise semantic challenges that call for principled solutions.

Xavier Leroy is a senior research scientist at INRIA near Paris, where he leads the Gallium research team. He received his PhD from University of Paris 7 in 1992. His work focuses on programming languages and systems and their interface with the formal verification of software for safety and security. He is the architect and lead author of the Objective Caml functional programming language and of its implementation. He has published about 50 research papers, and received the 2007 Monpetit prize of the French Academy of sciences.