Final Program
Monday September 9 | ||
---|---|---|
9:00-10:00 | Keynote: David J. Kuck | |
10:00-10:30 | Break | |
10:30-12:00 | Session 1A: Compilers | Session 1B: Power & Energy |
12:30-14:30 | Lunch with SRC Poster Session | |
14:30-16:00 | Session 2A: GPU & Energy | Session 2B: Memory System Management |
16:00-16:30 | Break | |
16:30-18:00 | SRC Presentations | |
18:30-21:00 | Reception with Surgeons' Hall Museum Visit | |
Tuesday September 10 | ||
9:00-10:00 | Keynote: Câlin Caşcaval | |
10:00-10:30 | Break | |
10:30-12:30 | Session 3: Best Papers | |
12:30-14:00 | Lunch | |
14:00-18:00 | Guided Tour Edinburgh and Edinburgh Castle | |
19:00-00:00 | Dinner at the Hub | |
22:30-00:00 | Optional Ghost Tour (Strong Nerves required) | |
Wednesday September 11 | ||
9:00-10:00 | Keynote: Per Stenström | |
10:00-10:30 | Break | |
10:30-12:00 | Session 4A: Runtime & Scheduling | Session 4B: Caches & Memory Hierarchy (1) |
12:00-13:30 | Lunch | |
13:30-15:00 | Session 5A: GPU | Session 5B: Caches & Memory Hierarchy (2) |
15:00-15:30 | Break | |
15:30-17:30 | Session 6A: Networking, Debugging, & Microarchitecture | Session 6B: Compiler Optimization |
Keynote Details:
A Comprehensive Approach to HW/SW Codesign by David J. Kuck
Abstract: Energy/performance results for parallel (and sequential) computing are still usually hard to predict and often disappointing. A model using invariant-based equations is being applied to predict energy/performance as HW and SW are changed in codesign studies. The physical model consists of HW nodes chosen to match architectural issues, together with automatically extracted SW codelets that are easy to measure and model. HW/SW measurements of computational capacity (BW used) and power [based on HW counters and SW modification (Decan)] are used by the Cape tool to evaluate tradeoffs quickly and find optimal solutions to various codesign problems. Codelets from a number of real applications are being analyzed and modeled.
Biography: David J. Kuck is an Intel Fellow working on HW/SW codesign in Intel’s Software and Solutions Group. He was a Professor of CS/ECE at the University of Illinois (UIUC) and founder of the Center for Supercomputing Research and Development. He was a founder and Chairman of KAI from 1979 until 2000, when it was acquired by Intel. He is a Fellow of the IEEE, ACM and AAAS, has received the IEEE Piore Award, the IEEE Computer Society’s Computer Pioneer Award, and the ACM-IEEE Eckert-Mauchly and Kennedy Awards, and is a member of the National Academy of Engineering.
Parallel Programming for Mobile Computing by Câlin Caşcaval
Abstract: Personal computing is going mobile, and applications are adapting to take advantage of new opportunities offered by permanent availability and connectivity. Mobile devices are a significant departure from traditional computing. On one hand, they are very personal, always on, always connected. They promise to become the hub for our digital lives. On the other hand, they are much more constrained in terms of resources than desktops. Even though progress in their computing capabilities has been staggering, they continue to rely on battery power and come in appealing packages that are a nightmare for thermal dissipation. In this talk I will present the challenges facing programmers for mobile devices driven by architectural and packaging constraints, as well as the changes in application domains. I will give examples of how we used concurrency to improve performance and power efficiency in a number of projects at Qualcomm Research, including the Zoomm parallel browser.
Biography: Dr. Câlin Caşcaval is Director of Engineering at the Qualcomm Silicon Valley Research Center, where he is leading projects in the area of parallel software for mobile computing. Previously, he was at the IBM TJ Watson Research Center, where he worked on systems software, programming models, and compilers for a number of large scale parallel systems projects, including Blue Gene and PERCS. He led the implementation of the first UPC compiler to scale to hundreds of thousands of processors, as well as research into parallel programming languages and parallel programming abstractions. He collaborates extensively with academia and has more than 50 peer-reviewed publications and more than 40 patent disclosures. Câlin has a PhD in Computer Science from the University of Illinois at Urbana-Champaign.
Towards Automatic Resource Management in Parallel Architectures by Per Stenström
Abstract: As we have embarked on the multi/many-core roadmap, resource management, especially managing parallelism, is left in the hands of programmers. A major challenge moving forward is therefore how to off-load programmers from the daunting task of managing hardware resources in future parallel architectures to meet higher demands on performance and power efficiency. In this talk I will focus on a number of emerging technologies being developed at Chalmers and elsewhere that can help off-load programmers from parallelism management. These include task-based dataflow programming models and transactional memory. I will also present a framework for resource management and recent findings concerning how to manage memory hierarchies more power efficiently.
Biography: Per Stenström is a professor at Chalmers University of Technology. His research interests are in parallel computer architecture. He has authored or co-authored three textbooks and more than 130 publications in this area. He has been program chairman of the IEEE/ACM Symposium on Computer Architecture, the IEEE High-Performance Computer Architecture Symposium, and the IEEE Parallel and Distributed Processing Symposium, and acts as Senior Associate Editor of ACM TACO and Associate Editor-in-Chief of JPDC. He is a Fellow of the ACM and the IEEE and a member of Academia Europaea and the Royal Swedish Academy of Engineering Sciences.
Session Details:
Session 1A: Compilers
INSPIRE: The Insieme Parallel Intermediate Representation
Herbert Jordan, Simone Pellegrini, Peter Thoman, Klaus Kofler, Thomas Fahringer (University of Innsbruck)
Parallel Flow-Sensitive Pointer Analysis by Graph-Rewriting
N. Vaivaswatha, R. Govindarajan (Indian Institute of Science, Bangalore)
Interprocedural Strength Reduction of Critical Sections in Explicitly-Parallel Programs
Rajkishore Barik (Intel Labs), Jisheng Zhao (Rice University), Vivek Sarkar (Rice University)
Session 1B: Power & Energy
ThermOS: System Support for Dynamic Thermal Management of Chip Multi-Processors
Filippo Sironi (Politecnico di Milano), Martina Maggio (Lund University), Riccardo Cattaneo, Giovanni F. Del Nero,
Donatella Sciuto, Marco D. Santambrogio (Politecnico di Milano)
Coordinated Power-Performance Optimization in Manycores
Hiroshi Sasaki, Satoshi Imamura, Koji Inoue (Kyushu University)
An Opportunistic Prediction-based Thread Scheduling to Maximize Throughput/Watt in AMPs
Arunachalam Annamalai, Rance Rodrigues, Israel Koren, Sandip Kundu (University of Massachusetts at Amherst)
Session 2A: GPU & Energy
APOGEE: Adaptive Prefetching on GPU for Energy Efficiency
Ankit Sethia (University of Michigan), Ganesh Dasika (ARM Inc, Austin),
Mehrzad Samadi (University of Michigan), Scott Mahlke (University of Michigan)
Parallel Frame Rendering: Trading Responsiveness for Energy on a Mobile GPU
Jose-Maria Arnau (Universitat Politecnica de Catalunya),
Joan-Manuel Parcerisa (Universitat Politecnica de Catalunya), Polychronis Xekalakis (Intel)
Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design
Bin Wang (Auburn University), Bo Wu (College of William and Mary), Dong Li (Oak Ridge National Lab),
Xipeng Shen (College of William and Mary), Weikuan Yu (Auburn University),
Yizheng Jiao (Auburn University), Jeffrey S. Vetter (Oak Ridge National Lab)
Session 2B: Memory System Management
S-CAVE: Effectively Managing SSD Caches in Virtual Machine Environments
Tian Luo (The Ohio State University), Siyuan Ma (The Ohio State University),
Rubao Lee (The Ohio State University), Xiaodong Zhang (The Ohio State University),
Deng Liu (VMware), Li Zhou (VMware)
Writeback-Aware Bandwidth Partitioning for Multi-core Systems with PCM
Miao Zhou, Yu Du, Bruce Childers, Rami Melhem, Daniel Mosse (University of Pittsburgh)
L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors
Josué Feliu, Julio Sahuquillo, Salvador Petit, José Duato (Universitat Politècnica de València)
Session 3: Best Papers
A Unified View of Non-monotonic Core Selection and Application Steering in Heterogeneous Chip Multiprocessors
Sandeep Navada, Niket K. Choudhary (Qualcomm),
Salil Wadhavkar, Eric Rotenberg (North Carolina State University)
Memory-centric System Interconnect Design with Hybrid Memory Cubes
Gwangsun Kim, John Kim (KAIST), Jung Ho Ahn, Jaeha Kim (Seoul National University)
Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
Onur Kayiran, Adwait Jog, Mahmut T. Kandemir, Chita R. Das (The Pennsylvania State University)
SMT-Centric Power-Aware Thread Placement in Chip Multiprocessors
Augusto Vega, Alper Buyuktosunoglu, Pradip Bose (IBM)
Session 4A: Runtime & Scheduling
Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores
Kenzo Van Craeynest (Ghent University), Shoaib Akram (Ghent University), Wim Heirman (Ghent University),
Aamer Jaleel (Intel), Lieven Eeckhout (Ghent University)
DANBI: Dynamic Scheduling of Irregular Stream Programs for Many-Core Systems
Changwoo Min (Sungkyunkwan University and Samsung), Young Ik Eom (Sungkyunkwan University)
An Empirical Model for Predicting Cross-Core Performance Interference on Multicore Processors
Jiacheng Zhao (Institute of Computing Technology, CAS), Huimin Cui (Institute of Computing Technology, CAS),
Jingling Xue (University of New South Wales), Xiaobing Feng (Institute of Computing Technology, CAS),
Youliang Yan (Huawei), Wensen Yang (Huawei)
Session 4B: Caches & Memory Hierarchy (1)
Jigsaw: Scalable Software-Defined Caches
Nathan Beckmann, Daniel Sanchez (MIT)
Managing Shared Last-level Cache in a Heterogeneous Multicore Processor
Vineeth Mekkat, Anup Holey, Pen-Chung Yew, Antonia Zhai (University of Minnesota)
Reshaping Cache Misses to Improve Row-Buffer Locality in Multicore Systems
Wei Ding, Jun Liu, Mahmut Kandemir, Mary Jane Irwin (The Pennsylvania State University)
Session 5A: GPU
Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
Janghaeng Lee, Mehrzad Samadi, Yongjun Park, Scott Mahlke (University of Michigan)
Starchart: Hardware and Software Optimization Using Recursive Partitioning Regression Trees
Wenhao Jia (Princeton University), Kelly A. Shaw (University of Richmond),
Margaret Martonosi (Princeton University)
RSVM: a region-based software virtual memory for GPU
Feng Ji (NCSU), Heshan Lin (Virginia Tech), Xiaosong Ma (NCSU)
Session 5B: Caches & Memory Hierarchy (2)
The Case for a Scalable Coherence Protocol for Complex on-chip Cache Hierarchies in Many Core Systems
Lucia G. Menezo, Valentin Puente, Jose Angel Gregorio (University of Cantabria)
Meeting Midway: Improving CMP Performance with Memory-Side Prefetching
Praveen Yedlapalli, Jagadish Kotra, Emre Kultursay, Mahmut Kandemir, Chita Das,
Anand Sivasubramaniam (The Pennsylvania State University)
Building Expressive, Area-Efficient Coherence Directories
Lei Fang (Zhejiang University), Peng Liu (Zhejiang University), Qi Hu (Zhejiang University),
Michael C. Huang (University of Rochester), Guofan Jiang (IBM GCG Systems & Technology Lab)
Session 6A: Networking, Debugging, & Microarchitecture
Traffic Steering Between a Low-Latency Unswitched TL Ring and a High-Throughput Switched On-chip Interconnect
Jungju Oh, Alenka Zajic, Milos Prvulovic (Georgia Institute of Technology)
McRouter: Multicast within a Router for High Performance Network-on-Chips
Yuan He (The University of Tokyo), Hiroshi Sasaki (Kyushu University),
Shinobu Miwa (The University of Tokyo), Hiroshi Nakamura (The University of Tokyo)
A Debugging Technique for Every Parallel Programmer
Justin Gottschlich, Gilles Pokam, Cristiano Pereira, Youfeng Wu (Intel Corporation)
Breaking SIMD Shackles with an Exposed Flexible Microarchitecture and the Access Execute PDG
Venkatraman Govindaraju, Tony Nowatzki, Karthikeyan Sankaralingam (University of Wisconsin - Madison)
Session 6B: Compiler Optimization
Vectorization Past Dependent Branches Through Speculation
Majedul Haque Sujon (University of Texas - San Antonio), R. Clint Whaley (University of Texas - San Antonio),
Qing Yi (University of Colorado)
Automatic Vectorization of Tree Traversals
Youngjoon Jo, Michael Goldfarb, Milind Kulkarni (Purdue University)
Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory
Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, Uday Bondhugula (Indian Institute of Science, Bangalore)
Automatic OpenCL Work-Group Size Selection for Multicore CPUs
Sangmin Seo, Jun Lee, Gangwon Jo, Jaejin Lee (Seoul National University)