PACT 2013


Final Program

Monday September 9
 9:00-10:00   Keynote: David J. Kuck
10:00-10:30   Break
10:30-12:00   Session 1A: Compilers  |  Session 1B: Power & Energy
12:30-14:30   Lunch with SRC Poster Session
14:30-16:00   Session 2A: GPU & Energy  |  Session 2B: Memory System Management
16:00-16:30   Break
16:30-18:00   SRC Presentations
18:30-21:00   Reception with Surgeons' Hall Museum Visit
Tuesday September 10
 9:00-10:00   Keynote: Călin Cașcaval
10:00-10:30   Break
10:30-12:30   Session 3: Best Papers
12:30-14:00   Lunch
14:00-18:00   Guided Tour of Edinburgh and Edinburgh Castle
19:00-00:00   Dinner at the Hub
22:30-00:00   Optional Ghost Tour (strong nerves required)
Wednesday September 11
 9:00-10:00   Keynote: Per Stenström
10:00-10:30   Break
10:30-12:00   Session 4A: Runtime & Scheduling  |  Session 4B: Caches & Memory Hierarchy (1)
12:00-13:30   Lunch
13:30-15:00   Session 5A: GPU  |  Session 5B: Caches & Memory Hierarchy (2)
15:00-15:30   Break
15:30-17:30   Session 6A: Networking, Debugging, & Microarchitecture  |  Session 6B: Compiler Optimization

Keynote Details:

A Comprehensive Approach to HW/SW Codesign by David J. Kuck

Abstract: Energy/performance results for parallel (and sequential) computing are still usually hard to predict and often disappointing. A model using invariant-based equations is being applied to predict energy/performance as HW and SW are changed in codesign studies. The physical model consists of HW nodes chosen to match architectural issues, together with automatically extracted SW codelets that are easy to measure and model. HW/SW measurements of computational capacity (bandwidth used) and power, based on HW counters and SW modification (Decan), are used by the Cape tool to evaluate tradeoffs quickly and find optimal solutions to various codesign problems. Codelets from a number of real applications are being analyzed and modeled.
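
To make the capacity-style reasoning above concrete, here is a minimal sketch of the general form such an invariant-based model can take: each HW node has a peak capacity, a codelet places a measured demand on every node, and the predicted runtime is bounded by the most saturated node. This is only an assumed illustration, not the Cape tool or Decan; the node names, capacities, and demands below are hypothetical.

     /*
      * Illustrative capacity-style (bandwidth-limited) performance model:
      * each HW node has a peak capacity, a codelet places a measured demand
      * on each node, and the predicted time is set by the most saturated node.
      * Hypothetical sketch only -- not the Cape tool or Decan; node names,
      * capacities, and demands are made up for illustration.
      */
     #include <stdio.h>

     #define NUM_NODES 3

     static const char  *node_name[NUM_NODES] = { "FP", "L3", "DRAM" };  /* hypothetical HW nodes            */
     static const double capacity[NUM_NODES]  = { 100e9, 200e9, 25e9 };  /* peak rate of each node           */
     static const double demand[NUM_NODES]    = { 40e9, 60e9, 3e9 };     /* per-codelet demand on each node  */

     int main(void)
     {
         /* Predicted time = max over nodes of demand/capacity; remember the bottleneck node. */
         double t_pred  = 0.0;
         int    limiter = 0;
         for (int i = 0; i < NUM_NODES; i++) {
             double t = demand[i] / capacity[i];
             if (t > t_pred) {
                 t_pred  = t;
                 limiter = i;
             }
         }
         printf("predicted time: %.4f s (limited by %s)\n", t_pred, node_name[limiter]);
         return 0;
     }

A codesign sweep would then vary the capacity entries (the HW side) or the demand entries (the SW side, e.g. after a code transformation) and re-evaluate this bound for each configuration.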

Biography: David J. Kuck is an Intel Fellow working on HW/SW codesign in Intel’s Software and Solutions Group. He was a Professor of CS/ECE at the University of Illinois (UIUC) and founder of the Center for Supercomputing Research and Development. He was a founder and Chairman of KAI from 1979 until 2000, when it was acquired by Intel. He is a Fellow of the IEEE, ACM, and AAAS; has received the IEEE Piore Award, the IEEE Computer Society’s Computer Pioneer Award, and the ACM-IEEE Eckert-Mauchly and Kennedy Awards; and is a member of the National Academy of Engineering.

Parallel Programming for Mobile Computing by Călin Cașcaval

Abstract: Personal computing is going mobile, and applications are adapting to take advantage of the new opportunities offered by permanent availability and connectivity. Mobile devices are a significant departure from traditional computing. On one hand, they are very personal, always on, and always connected, and they promise to become the hub for our digital lives. On the other hand, they are much more resource-constrained than desktops: even though progress in their computing capabilities has been staggering, they continue to rely on battery power and are housed in appealing packages that are a nightmare for thermal dissipation. In this talk I will present the challenges facing programmers of mobile devices, driven by architectural and packaging constraints as well as by changes in application domains. I will give examples of how we used concurrency to improve performance and power efficiency in a number of projects at Qualcomm Research, including the Zoomm parallel browser.

Biography: Dr. Călin Cașcaval is Director of Engineering at the Qualcomm Silicon Valley Research Center, where he leads projects in the area of parallel software for mobile computing. Previously, he was at the IBM T.J. Watson Research Center, where he worked on systems software, programming models, and compilers for a number of large-scale parallel systems projects, including Blue Gene and PERCS. He led the implementation of the first UPC compiler to scale to hundreds of thousands of processors, as well as research into parallel programming languages and abstractions. He collaborates extensively with academia and has more than 50 peer-reviewed publications and more than 40 patent disclosures. Călin holds a PhD in Computer Science from the University of Illinois at Urbana-Champaign.

Towards Automatic Resource Management in Parallel Architectures by Per Stenström

Abstract: As we have embarked on the multi-/many-core roadmap, resource management, especially the management of parallelism, is left in the hands of programmers. A major challenge moving forward is therefore how to off-load programmers from the daunting task of managing hardware resources in future parallel architectures in order to meet higher demands on performance and power efficiency. In this talk I will focus on a number of emerging technologies being developed at Chalmers and elsewhere that can help off-load programmers from parallelism management. These include task-based dataflow programming models and transactional memory. I will also present a framework for resource management and recent findings on how to manage memory hierarchies more power-efficiently.
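
As a concrete illustration of the task-based dataflow style mentioned above (the abstract does not prescribe a particular framework; OpenMP task dependences are assumed here purely as one widely available example), the sketch below declares what data each task reads and writes and lets the runtime, rather than the programmer, decide which tasks may execute concurrently.

     /*
      * Minimal task-based dataflow sketch using OpenMP "task depend" clauses,
      * assumed here only as one widely available example of the style (the
      * talk does not name a specific framework).  The runtime orders tasks
      * according to the declared dependences; the programmer never touches
      * threads or locks.  Compile with e.g.: gcc -fopenmp tasks.c
      */
     #include <stdio.h>

     int main(void)
     {
         int a = 0, b = 0, c = 0;

         #pragma omp parallel
         #pragma omp single
         {
             #pragma omp task depend(out: a)                   /* producer of a                              */
             a = 1;

             #pragma omp task depend(out: b)                   /* producer of b; may run in parallel with a  */
             b = 2;

             #pragma omp task depend(in: a, b) depend(out: c)  /* runs only after both producers finish      */
             c = a + b;

             #pragma omp taskwait                              /* wait for all tasks before leaving          */
         }

         printf("c = %d\n", c);                                /* prints c = 3                               */
         return 0;
     }

Transactional memory, the other technology named in the abstract, addresses the same goal from a different angle: instead of declaring dependences up front, conflicting accesses are detected and resolved by the system at run time.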

Biography: Per Stenström is a professor at Chalmers University of Technology. His research interests are in parallel computer architecture. He has authored or co-authored three textbooks and more than 130 publications in this area. He has been program chairman of the IEEE/ACM Symposium on Computer Architecture, the IEEE High-Performance Computer Architecture Symposium, and the IEEE Parallel and Distributed Processing Symposium, and he serves as Senior Associate Editor of ACM TACO and Associate Editor-in-Chief of JPDC. He is a Fellow of the ACM and the IEEE and a member of Academia Europaea and the Royal Swedish Academy of Engineering Sciences.

Session Details:

Session 1A: Compilers

INSPIRE: The Insieme Parallel Intermediate Representation
     Herbert Jordan, Simone Pellegrini, Peter Thoman, Klaus Kofler, Thomas Fahringer (University of Innsbruck)

Parallel Flow-Sensitive Pointer Analysis by Graph-Rewriting
     N. Vaivaswatha, R. Govindarajan (Indian Institute of Science, Bangalore)

Interprocedural Strength Reduction of Critical Sections in Explicitly-Parallel Programs
     Rajkishore Barik (Intel Labs), Jisheng Zhao (Rice University), Vivek Sarkar (Rice University)

Session 1B: Power & Energy

ThermOS: System Support for Dynamic Thermal Management of Chip Multi-Processors
     Filippo Sironi (Politecnico di Milano), Martina Maggio (Lund University), Riccardo Cattaneo, Giovanni F. Del Nero,
     Donatella Sciuto, Marco D. Santambrogio (Politecnico di Milano)

Coordinated Power-Performance Optimization in Manycores
     Hiroshi Sasaki, Satoshi Imamura, Koji Inoue (Kyushu University)

An Opportunistic Prediction-based Thread Scheduling to Maximize Throughput/Watt in AMPs
     Arunachalam Annamalai, Rance Rodrigues, Israel Koren, Sandip Kundu (University of Massachusetts at Amherst)

Session 2A: GPU & Energy

APOGEE: Adaptive Prefetching on GPU for Energy Efficiency
     Ankit Sethia (University of Michigan), Ganesh Dasika (ARM Inc, Austin),
     Mehrzad Samadi (University of Michigan), Scott Mahlke (University of Michigan)

Parallel Frame Rendering: Trading Responsiveness for Energy on a Mobile GPU
     Jose-Maria Arnau (Universitat Politecnica de Catalunya),
     Joan-Manuel Parcerisa (Universitat Politecnica de Catalunya), Polychronis Xekalakis (Intel)

Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design
     Bin Wang (Auburn University), Bo Wu (College of William and Mary), Dong Li (Oak Ridge National Lab),
     Xipeng Shen (College of William and Mary), Weikuan Yu (Auburn University),
     Yizheng Jiao (Auburn University), Jeffrey S. Vetter (Oak Ridge National Lab)

Session 2B: Memory System Management

S-CAVE: Effectively Managing SSD Caches in Virtual Machine Environments
     Tian Luo (The Ohio State University), Siyuan Ma (The Ohio State University),
     Rubao Lee (The Ohio State University), Xiaodong Zhang (The Ohio State University),
     Deng Liu (VMware), Li Zhou (VMware)

Writeback-Aware Bandwidth Partitioning for Multi-core Systems with PCM
     Miao Zhou, Yu Du, Bruce Childers, Rami Melhem, Daniel Mosse (University of Pittsburgh)

L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors
     Josué Feliu, Julio Sahuquillo, Salvador Petit, José Duato (Universitat Politècnica de València)

Session 3: Best Papers

A Unified View of Non-monotonic Core Selection and Application Steering in Heterogeneous Chip Multiprocessors
     Sandeep Navada, Niket K. Choudhary (Qualcomm),
     Salil Wadhavkar, Eric Rotenberg (North Carolina State University)

Memory-centric System Interconnect Design with Hybrid Memory Cubes
     Gwangsun Kim, John Kim (KAIST), Jung Ho Ahn, Jaeha Kim (Seoul National University)

Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
     Onur Kayiran, Adwait Jog, Mahmut T. Kandemir, Chita R. Das (The Pennsylvania State University)

SMT-Centric Power-Aware Thread Placement in Chip Multiprocessors
     Augusto Vega, Alper Buyuktosunoglu, Pradip Bose (IBM)

Session 4A: Runtime & Scheduling

Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores
     Kenzo Van Craeynest (Ghent University), Shoaib Akram (Ghent University), Wim Heirman (Ghent University),
     Aamer Jaleel (Intel), Lieven Eeckhout (Ghent University)

DANBI: Dynamic Scheduling of Irregular Stream Programs for Many-Core Systems
     Changwoo Min (Sungkyunkwan University and Samsung), Young Ik Eom (Sungkyunkwan University)

An Empirical Model for Predicting Cross-Core Performance Interference on Multicore Processors
     Jiacheng Zhao (Institute of Computing Technology, CAS), Huimin Cui (Institute of Computing Technology, CAS),
     Jingling Xue (University of New South Wales), Xiaobing Feng (Institute of Computing Technology, CAS),
     Youliang Yan (Huawei), Wensen Yang (Huawei)

Session 4B: Caches & Memory Hierarchy (1)

Jigsaw: Scalable Software-Defined Caches
     Nathan Beckmann, Daniel Sanchez (MIT)

Managing Shared Last-level Cache in a Heterogeneous Multicore Processor
     Vineeth Mekkat, Anup Holey, Pen-Chung Yew, Antonia Zhai (University of Minnesota)

Reshaping Cache Misses to Improve Row-Buffer Locality in Multicore Systems
     Wei Ding, Jun Liu, Mahmut Kandemir, Mary Jane Irwin (The Pennsylvania State University)

Session 5A: GPU

Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
     Janghaeng Lee, Mehrzad Samadi, Yongjun Park, Scott Mahlke (University of Michigan)

Starchart: Hardware and Software Optimization Using Recursive Partitioning Regression Trees
     Wenhao Jia (Princeton University), Kelly A. Shaw (University of Richmond),
     Margaret Martonosi (Princeton University)

RSVM: A Region-based Software Virtual Memory for GPU
     Feng Ji (NCSU), Heshan Lin (Virginia Tech), Xiaosong Ma (NCSU)

Session 5B: Caches & Memory Hierarchy (2)

The Case for a Scalable Coherence Protocol for Complex on-chip Cache Hierarchies in Many Core Systems
     Lucia G. Menezo, Valentin Puente, Jose Angel Gregorio (University of Cantabria)

Meeting Midway: Improving CMP Performance with Memory-Side Prefetching
     Praveen Yedlapalli, Jagadish Kotra, Emre Kultursay, Mahmut Kandemir, Chita Das,
     Anand Sivasubramaniam (The Pennsylvania State University)

Building Expressive, Area-Efficient Coherence Directories
     Lei Fang (Zhejiang University), Peng Liu (Zhejiang University), Qi Hu (Zhejiang University),
     Michael C. Huang (University of Rochester), Guofan Jiang (IBM GCG Systems & Technology Lab)

Session 6A: Networking, Debugging, & Microarchitecture

Traffic Steering Between a Low-Latency Unswitched TL Ring and a High-Throughput Switched On-chip Interconnect
     Jungju Oh, Alenka Zajic, Milos Prvulovic (Georgia Institute of Technology)

McRouter: Multicast within a Router for High Performance Network-on-Chips
     Yuan He (The University of Tokyo), Hiroshi Sasaki (Kyushu University),
     Shinobu Miwa (The University of Tokyo), Hiroshi Nakamura (The University of Tokyo)

A Debugging Technique for Every Parallel Programmer
     Justin Gottschlich, Gilles Pokam, Cristiano Pereira, Youfeng Wu (Intel Corporation)

Breaking SIMD Shackles with an Exposed Flexible Microarchitecture and the Access Execute PDG
     Venkatraman Govindaraju, Tony Nowatzki, Karthikeyan Sankaralingam (University of Wisconsin - Madison)

Session 6B: Compiler Optimization

Vectorization Past Dependent Branches Through Speculation
     Majedul Haque Sujon (University of Texas - San Antonio), R. Clint Whaley (University of Texas - San Antonio),
     Qing Yi (University of Colorado)

Automatic Vectorization of Tree Traversals
     Youngjoon Jo, Michael Goldfarb, Milind Kulkarni (Purdue University)

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory
     Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, Uday Bondhugula (Indian Institute of Science, Bangalore)

Automatic OpenCL Work-Group Size Selection for Multicore CPUs
     Sangmin Seo, Jun Lee, Gangwon Jo, Jaejin Lee (Seoul National University)