|EPHAM||Workshop on Exploiting Parallelism using GPUs and other Hardware-Assisted MethodsProgram Finalized|
|ODES||7th Workshop on Optimizations for DSP and Embedded
||OPEN64|| 2nd Annual Workshop on Open64 ||WISH|| Workshop on Infrastructures for Software/Hardware co-design
Program finalized||STMCS|| canceled|
|SSA-RA||Tutorial on SSA-based Register Allocation|
|7:30am - 5:30pm||Registration - registration desk|
|7:45am - 8:30am||Breakfast|
|8:30am - 10:30am||Morning Session 1|
|10:30am - 11:00am||Break|
|11:00am - 12:30pm||Morning Session 2|
|12:30pm - 1:30pm||Lunch|
|1:30pm - 3:00pm||Afternoon Session 1|
|3:00pm - 3:30pm||Break|
|3:30pm - 5:30pm||Afternoon Session 2|
|Room A||Room B||Room E||Room F|
Optimizations are crucial to meet the performance, power and cost requirements that DSP and embedded systems have. The aim of the ODES workshop is to give the opportunity to researchers and practitioners working on this, to share their findings and get feedback. We think interacting with the community is crucial to do relevant research, and therefore ODES tries to maximize the interaction by carefully selecting the program committee members that review the submissions and at the workshop itself, reserving enough time for discussion.
Topics of interest include, but are not limited to:
Please find more information at this workshop's URL: http://www.imec.be/odes
A considerable amount of work has been performed with the Open64 compiler infrastructure over the last few years, we anticipate hearing a number of significant research topics and encouraging results at this workshop.
Topics of discussion will be (but are not limited to) findings and results in
The workshop provides a forum for discussion of your findings and experiences with a broad range of Open64 researchers and developers. It is also the main opportunity for the participants to exchange their expectations and wishes for future development of Open64.
Please find more information at this workshop's URL: http://www.capsl.udel.edu/conferences/open64/2009/
|8:45am - 9:00am||Welcome|
|9:00am - 10:30am||Programming Models|
|Design and Implementation of a High Level Framework
|A MapReduce Framework in Heterogenous GPU Environment|
Dehao Chen, Chuntao Hong, Wenguang Chen, Haibo Lin, Weimin Zheng
|GPU Kernels as Data-Parallel Array Computations in Haskell|
Sean Lee, Manuel M.T. Chakravarty, Vinod Grover, Gabriele Keller
|10:30am - 11:00am||Break|
|11:00am - 12:00 noon||Application and Performance Frameworks|
|Analytical Performance Prediction for Evaluation and Tuning of GPGPU Applications|
Sara Kaghsorkhi, Wen-mei Hwu
|GPU-Accelerated Text Mining|
Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas Potok
|12:00 noon - 12:15pm||Wrapup|
We invite papers in this emerging discipline which include, but are not limited, to the following areas of interest.
Feb. 15th 2009: Paper submission deadline
Mar. 8th 2009: Notification of acceptance
Mar. 15th 2009: Camera-ready version of papers due
Mar. 22nd 2009: The workshop
Vinod Grover, NVIDIA Corporation
Richard Johnson, NVIDIA Corporation
Manuel M T Chakravarty, University of New South Wales
Rudi Eigenman, Purdue University
Anwar Ghuloum, Intel
Naga Govindaraju, Microsoft
Wen-mei Hwu, University of Illinois, Urbana-Champaign
Miriam Leeser, Northeastern University
Dinesh Manocha, University of North Carolina
Shane Ryoo, ZeroSoft
Bratin Saha, Intel
Bixia Zheng, AMD
Papers of 6-10 pages may be submitted using any format. The abstract should clearly state the problem being studied, the methods used, and the results. If the results are preliminary, the authors should state their expectation for the final results. To submit, please send a pdf of your submission to email@example.com Final submissions should use the standard ACM conference format (two columns with 9 pt Times Roman font, etc.)
An important key to unlocking the performance potential of current and future multicore platforms is to choose a programming model that lets you express concurrency without requiring that you explicitly manage it. This half-day tutorial introduces the Intel© Threading Building Blocks concurrency library, which offers a rich and complete approach to expressing parallelism in C++ programs.
By using a high-level concurrency platform like Intel© Threading Building Blocks (Intel TBB), developers express the concurrency in their applications and let the runtime library manage the low-level, platform-specific details of implementing and scheduling the concurrency on the hardware. It is these implementation and scheduling details that will need to change as architectures change. By using a concurrency platform instead of using a native threading library directly, an application is generally easier to write, is more likely to be correct, and will continue to enjoy increased performance as platforms evolve.
After completing this half-day tutorial, attendees will (1) understand why a concurrency library such as TBB is often a better choice than using a native threading library directly, (2) have a good understanding of the features and capabilities of the TBB library and (3) be positioned to begin using the library for developing their own applications.
ABOUT THE PRESENTER
Michael Voss is a Senior Staff Software Engineer in the Intel Performance, Analysis and Threading Lab where he is one of the senior developers of the Intel© Threading Building Blocks library. His interests include languages and compilers for parallel computing, adaptive compilation, and optimization. He received his Ph.D. from the School of Electrical and Computer Engineering at Purdue University in 2001. He was an assistant professor in the Edwards S. Rogers Department of Electrical and Computer Engineering at the University of Toronto from 2001 - 2005 and an adjunct Professor from 2005 - 2007.
OUTLINE OF TOPICS
1. Exploiting Multicore Platforms (15 minutes) a. Increasing core counts is the new free lunch b. Threading to the platform is the wrong way to go i. Achieving portable performance is very difficult ii. High potential to introduce bugs iii. It does not provide a path forward! c. Concurrency platforms are the key to forward-scaling parallelism i. Intel TBB / Microsoft* Parallel Pattern Library ii. the OpenMP API iii. Data parallel programming models 2. An overview of Intel Threading Building Blocks (15 minutes) a. A generic C++ library b. Available as a commercial and open-source package c. Overview of the basic components available in the library 3. Concurrent Containers (30 minutes) a. Introduction to concurrent_map, concurrent_queue and concurrent_vector b. compare / contrast with other approaches c. Provide code examples 4. Parallel Algorithms (30 minutes) a. introduction to parallel_for, parallel_reduce, parallel_do and pipeline b. discuss Range concept and Partitioner hint c. compare and contrast with other approaches d. Provide code examples ---- Mid-point Break --- 5. Tasks and the task scheduler (45 minutes) a. Introduction to work-stealing task schedulers b. How to define tasks using Intel TBB c. Provide code examples 6. Exception handling and cancellation support (15 minutes) a. The Intel TBB model exceptions and cancellation b. Provide code examples 7. tbb_thread (15 minutes) a. An overview of the thread interface b. When to use task and when to use threads c. Provide code examples 8. Wrap-up (15 minutes) a. Future directions for TBB b. How to get TBB c. How to interact with the TBB community
Community WEB page: http://www.threadingbuildingblocks.org/
The advent of multi-core processors in the mainstream computing market is forcing programmers to shift from writing sequential code to building concurrent applications. Today, concurrent programs are typically synchronized using locks. However, it is hard to write correct and scalable programs using locks. Fine-grained locks may not compose and are prone to deadlocks. Coarse-grained locks may not deliver the performance promised by multi-core processors. Recently, Transactional Memory (TM) has been proposed as an alternative to locks for multi-core programming. TM has attracted a lot of attention as a safer, more modular, and more scalable concurrency control mechanism.
This tutorial will provide a comprehensive overview of transactional memory: description of transactional language extensions and their semantics as well as discussion of important aspects of a software transactional memory (STM) system implementation. We will show how to extend C/C++ with transactional constructs and how these constructs alleviate some of the problems related to programming with locks. We will also present the fundamentals of implementing STM algorithms for C/C++ in the TM runtime, describe how TM support can be elegantly expressed in a form of an application interface and discuss generation of the code by the TM compiler that makes appropriate use of such interface.
ABOUT THE PRESENTERS
Yang Ni is a Research Scientist in Intel's Programming Systems Lab. He has been working on programming languages for platforms from mobile devices to chip multi processors. His current research focuses on transactional memory. He is a major contributor to the Intel C/C++ TM compiler. Yang received his Ph.D. in Computer Science from Rutgers University.
Adam Welc is a Research Scientist in Intel's Programming Systems Lab. His work is in the area of programming language design and implementation, with specific interests in concurrency control, compiler and run-time system optimizations, transactional processing as well as architectural support for programming languages and applications. Adam received the Master of Science in Computer Science from Poznan University of Technology, Poland, in July 1999. He continued his graduate studies at Purdue University, receiving the Master of Science in Computer Science in May 2003, and the Ph.D. in Computer Science in March 2006.
Traditional evaluation methodologies
assume that either the hardware or the software components of a system
are fixed. For example, an accepted methodology for evaluating
the effectiveness of a compiler optimization is to compare the execution
of two differently compiled binaries on the same hardware. Likewise,
an accepted methodology for evaluating microarchitectural hardware changes
has been to measure relatively short, but representative, samples of
the same program's execution with multiple configurations of a detailed
Nonetheless, the co-design
of hardware and software systems is being pursued as part of many industry
and academic projects. Researchers have therefore been forced
to build their own custom infrastructures and invent methodologies to
demonstrate the viability of their ideas. The purpose of this workshop
is for experienced practitioners in this area to share their gained
expertise and knowledge to a wider audience in the hopes of broadening
community understanding. Identifying readily-available building blocks
and tools, as well as opportunities for further improvements in this
area are the goals of this workshop.
Topics of interest include,
but are not limited to:
Authors should submit a 3 page double spaced extended abstract by February 16th, 2009 to firstname.lastname@example.org. Final submission is a slide set for a 20 minute presentation.
General ChairUma Srinivasan (Intel)
Program ChairAnne Holler (VMware)
ProgramSchedule for Workshop on Infrastructures for Software/Hardware co-design (WISH) Sunday, 22 March 2009, Afternoon, Room 4
David R. Ditzel
|2:30-3:00|| "Dynamic Binary Translation from 32-bit to 64-bit
mode for x86 Virtualization"|
Yu-Hsin Joyce Chen
|3:30-4:00|| "Run-time Data Dependence Analysis
Using Detected Loop Regions in Binary Codes"|
Yukinori SATO and Tadao NAKAMURA
|4:00-4:30|| "Cheetah: A Light-Weight,
Super-Fast Tracing Infrastructure for Itanium Processors"|
Ram Srinivasan, Chris Krieger, Jim Callister
|4:30-5:00|| "Simulation-based Performance
Evaluation of Co-designed Hardware/Software"|
Naveen Neelakantam and Craig Zilles
|5:00-5:30|| "Measure of Similarity Degree between
loops Based on Graph Dependency"|
Lamia Djoudi and William Jalby
In the past years, we discovered that the interference graphs of SSA-form programs are chordal. This has several attractive implications for register allocation: First, coloring the interference graph is no longer NP-complete. Second, the number of needed registers is generally less than in non SSA-form programs. Finally, this number is equal to maximal number of simultaneously live variables.
This opens the door to what we call SSA-based register allocation. The main advantage of this new approach, is that the two problems, spilling, then coloring/coalescing can be optimized separately in two consecutive (more clean and simpler) phases. This is especially interesting for the design of compilers (either aggressive or just in time) for embedded processors.
The goal of this tutorial is to explain the basics of this in-two-phases approach and to provide the elements to help compiler writers build both memory friendly, fast and competitive (fewer spill code) register allocators. To illustrate this, we will also present new possible approaches to spilling, coalescing, and SSA destruction, as well as experiments with an SSA-based register allocator.
We will start from scratch, explain the key ideas in detail via examples, and present both theoretical and experimental results. We have divided the material such that each speaker will present a well-rounded portion.
ABOUT THE PRESENTERS
Philip Brisk received his B.S., M.S., and Ph.D. degrees, all in Computer Science, from UCLA in 2002, 2003, and 2006 respectively. Since 2006, he has been a postdoctoral scholar with the Ecole Polytechnique Federale de Lausanne (EPFL). His research interests include application-specific processor design, reconfigurable computing, and compilers.
Alain Darte is a CNRS research scientist at LIP, ENS Lyon, France. He received his Ph.D. degree at ENS-Lyon in 1993. His main scientific interests are in mathematical tools, automatic program transformations, and optimizations for parallelizing compilers and for compiler-based tools used to automatically synthesize hardware accelerators. His main contributions concern polyhedral techniques for loop transformations and memory optimizations, software pipelining, and register allocation, in connexion with industrial partners such as Hewlett Packard (past Pico project) and STMicroelectronics. He is member of the editorial board of ACM TECS and has served in many program committees such as ASAP, CASES, CC, CGO, DATE, Euro-Par, ICS, PLDI.
Jens Palsberg is a Professor of Computer Science at UCLA. He received a Ph.D. in Computer Science from University of Aarhus, Denmark in 1992. His research interests span the areas of compilers, embedded systems, programming languages, software engineering, and information security. He is an associate editor of ACM Transactions of Programming Languages and Systems, a member of the editorial board of Information and Computation, and a former member of the editorial board of IEEE Transactions on Software Engineering. He is serving as the vice chair of ACM SIGBED, Special Interest Group on Embedded Systems, and he has served as vice chair of computer science at UCLA, as associate head of computer science at Purdue University, as general chair of POPL, as conference chair of LICS, and as a program chair for POPL, EMSOFT, MEMOCODE, PASTE, SAS, SREIS, and TACAS.
Fabrice Rastello is a researcher of Computer Science at Inria, ENS Lyon, France. He has done his PhD thesis at LIP, ENS Lyon, France (received in 2000) in the field of automatic parallelization (tiling, heterogeneous computing). Then he worked for two years in a compiler group at STMicroelectronics (Grenoble, France) where he mainly developed back-end optimizations in LAO (linear assembly optimizer). His research topic is the compiler optimizations for DSP/VLIW/Media like embedded processors. He still works in collaboration with STmicroelectronics' compiler team for the ST200 family DSP processors. His current research work is focused on JIT compilation, SSA-based optimizations, code compression, register allocation and instruction-cache optimizations.