PACT 2013


Final Program

Monday September 9
 9:00-10:00   Keynote: David J. Kuck
10:00-10:30   Break
10:30-12:00   Session 1A: Compilers  |  Session 1B: Power & Energy
12:30-14:30   Lunch with SRC Poster Session
14:30-16:00   Session 2A: GPU & Energy  |  Session 2B: Memory System Management
16:00-16:30   Break
16:30-18:00   SRC Presentations
18:30-21:00   Reception with Surgeons' Hall Museum Visit
Tuesday September 10
 9:00-10:00   Keynote: Călin Cașcaval
10:00-10:30   Break
10:30-12:30   Session 3: Best Papers
12:30-14:00   Lunch
14:00-18:00   Guided Tour of Edinburgh and Edinburgh Castle
19:00-00:00   Dinner at the Hub
22:30-00:00   Optional Ghost Tour (strong nerves required)
Wednesday September 11
 9:00-10:00   Keynote: Per Stenström
10:00-10:30   Break
10:30-12:00   Session 4A: Runtime & Scheduling  |  Session 4B: Caches & Memory Hierarchy (1)
12:00-13:30   Lunch
13:30-15:00   Session 5A: GPU  |  Session 5B: Caches & Memory Hierarchy (2)
15:00-15:30   Break
15:30-17:30   Session 6A: Networking, Debugging, & Microarchitecture  |  Session 6B: Compiler Optimization

Keynote Details:

A Comprehensive Approach to HW/SW Codesign by David J. Kuck

Abstract: Energy/performance results for parallel (and sequential) computing are still usually hard to predict and often disappointing. A model using invariant-based equations is being applied to predict energy/performance as HW and SW are changed in codesign studies. The physical model consists of HW nodes chosen to match architectural issues, together with automatically extracted SW codelets that are easy to measure and model. HW/SW measurements of computational capacity (bandwidth used) and power, based on HW counters and SW modification (Decan), are used by the Cape tool to evaluate tradeoffs quickly and find optimal solutions to various codesign problems. Codelets from a number of real applications are being analyzed and modeled.
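
To make the capacity-style reasoning above concrete, here is a minimal sketch of the general form such an invariant-based model can take: each HW node has a peak capacity, a codelet places a measured demand on every node, and the predicted runtime is bounded by the most saturated node. This is only an assumed illustration, not the Cape tool or Decan; the node names, capacities, and demands below are hypothetical.

     /*
      * Illustrative capacity-style (bandwidth-limited) performance model:
      * each HW node has a peak capacity, a codelet places a measured demand
      * on each node, and the predicted time is set by the most saturated node.
      * Hypothetical sketch only -- not the Cape tool or Decan; node names,
      * capacities, and demands are made up for illustration.
      */
     #include <stdio.h>

     #define NUM_NODES 3

     static const char  *node_name[NUM_NODES] = { "FP", "L3", "DRAM" };  /* hypothetical HW nodes            */
     static const double capacity[NUM_NODES]  = { 100e9, 200e9, 25e9 };  /* peak rate of each node           */
     static const double demand[NUM_NODES]    = { 40e9, 60e9, 3e9 };     /* per-codelet demand on each node  */

     int main(void)
     {
         /* Predicted time = max over nodes of demand/capacity; remember the bottleneck node. */
         double t_pred  = 0.0;
         int    limiter = 0;
         for (int i = 0; i < NUM_NODES; i++) {
             double t = demand[i] / capacity[i];
             if (t > t_pred) {
                 t_pred  = t;
                 limiter = i;
             }
         }
         printf("predicted time: %.4f s (limited by %s)\n", t_pred, node_name[limiter]);
         return 0;
     }

A codesign sweep would then vary the capacity entries (the HW side) or the demand entries (the SW side, e.g. after a code transformation) and re-evaluate this bound for each configuration.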

Biography: David J. Kuck is an Intel Fellow working on HW/SW codesign in Intel’s Software and Solutions Group. He was a Professor of CS/ECE at the University of Illinois (UIUC) and founder of the Center for Supercomputing Research and Development. He was a founder and Chairman of KAI from 1979 until 2000, when it was acquired by Intel. He is a Fellow of the IEEE, ACM, and AAAS; has received the IEEE Piore Award, the IEEE Computer Society’s Computer Pioneer Award, and the ACM-IEEE Eckert-Mauchly and Kennedy Awards; and is a member of the National Academy of Engineering.

Parallel Programming for Mobile Computing by Călin Cașcaval

Abstract: Personal computing is going mobile, and applications are adapting to take advantage of the new opportunities offered by permanent availability and connectivity. Mobile devices are a significant departure from traditional computing. On one hand, they are very personal, always on, and always connected, and they promise to become the hub for our digital lives. On the other hand, they are much more resource-constrained than desktops: even though progress in their computing capabilities has been staggering, they continue to rely on battery power and are housed in appealing packages that are a nightmare for thermal dissipation. In this talk I will present the challenges facing programmers of mobile devices, driven by architectural and packaging constraints as well as by changes in application domains. I will give examples of how we used concurrency to improve performance and power efficiency in a number of projects at Qualcomm Research, including the Zoomm parallel browser.

Biography: Dr. Călin Cașcaval is Director of Engineering at the Qualcomm Silicon Valley Research Center, where he leads projects in the area of parallel software for mobile computing. Previously, he was at the IBM T.J. Watson Research Center, where he worked on systems software, programming models, and compilers for a number of large-scale parallel systems projects, including Blue Gene and PERCS. He led the implementation of the first UPC compiler to scale to hundreds of thousands of processors, as well as research into parallel programming languages and abstractions. He collaborates extensively with academia and has more than 50 peer-reviewed publications and more than 40 patent disclosures. Călin holds a PhD in Computer Science from the University of Illinois at Urbana-Champaign.

Towards Automatic Resource Management in Parallel Architectures by Per Stenström

Abstract: As we have embarked on the multi-/many-core roadmap, resource management, especially the management of parallelism, is left in the hands of programmers. A major challenge moving forward is therefore how to off-load programmers from the daunting task of managing hardware resources in future parallel architectures in order to meet higher demands on performance and power efficiency. In this talk I will focus on a number of emerging technologies being developed at Chalmers and elsewhere that can help off-load programmers from parallelism management. These include task-based dataflow programming models and transactional memory. I will also present a framework for resource management and recent findings on how to manage memory hierarchies more power-efficiently.
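
As a concrete illustration of the task-based dataflow style mentioned above (the abstract does not prescribe a particular framework; OpenMP task dependences are assumed here purely as one widely available example), the sketch below declares what data each task reads and writes and lets the runtime, rather than the programmer, decide which tasks may execute concurrently.

     /*
      * Minimal task-based dataflow sketch using OpenMP "task depend" clauses,
      * assumed here only as one widely available example of the style (the
      * talk does not name a specific framework).  The runtime orders tasks
      * according to the declared dependences; the programmer never touches
      * threads or locks.  Compile with e.g.: gcc -fopenmp tasks.c
      */
     #include <stdio.h>

     int main(void)
     {
         int a = 0, b = 0, c = 0;

         #pragma omp parallel
         #pragma omp single
         {
             #pragma omp task depend(out: a)                   /* producer of a                              */
             a = 1;

             #pragma omp task depend(out: b)                   /* producer of b; may run in parallel with a  */
             b = 2;

             #pragma omp task depend(in: a, b) depend(out: c)  /* runs only after both producers finish      */
             c = a + b;

             #pragma omp taskwait                              /* wait for all tasks before leaving          */
         }

         printf("c = %d\n", c);                                /* prints c = 3                               */
         return 0;
     }

Transactional memory, the other technology named in the abstract, addresses the same goal from a different angle: instead of declaring dependences up front, conflicting accesses are detected and resolved by the system at run time.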

Biography: Per Stenström is a professor at Chalmers University of Technology. His research interests are in parallel computer architecture. He has authored or co-authored three textbooks and more than 130 publications in this area. He has been program chairman of the IEEE/ACM Symposium on Computer Architecture, the IEEE High-Performance Computer Architecture Symposium, and the IEEE Parallel and Distributed Processing Symposium, and he serves as Senior Associate Editor of ACM TACO and Associate Editor-in-Chief of JPDC. He is a Fellow of the ACM and the IEEE and a member of Academia Europaea and the Royal Swedish Academy of Engineering Sciences.

Session Details:

Session 1A: Compilers

INSPIRE: The Insieme Parallel Intermediate Representation
     Herbert Jordan, Simone Pellegrini, Peter Thoman, Klaus Kofler, Thomas Fahringer (University of Innsbruck)

Parallel Flow-Sensitive Pointer Analysis by Graph-Rewriting
     N. Vaivaswatha, R. Govindarajan (Indian Institute of Science, Bangalore)

Interprocedural Strength Reduction of Critical Sections in Explicitly-Parallel Programs
     Rajkishore Barik (Intel Labs), Jisheng Zhao (Rice University), Vivek Sarkar (Rice University)

Session 1B: Power & Energy

ThermOS: System Support for Dynamic Thermal Management of Chip Multi-Processors
     Filippo Sironi (Politecnico di Milano), Martina Maggio (Lund University), Riccardo Cattaneo, Giovanni F. Del Nero,
     Donatella Sciuto, Marco D. Santambrogio (Politecnico di Milano)

Coordinated Power-Performance Optimization in Manycores
     Hiroshi Sasaki, Satoshi Imamura, Koji Inoue (Kyushu University)

An Opportunistic Prediction-based Thread Scheduling to Maximize Throughput/Watt in AMPs
     Arunachalam Annamalai, Rance Rodrigues, Israel Koren, Sandip Kundu (University of Massachusetts at Amherst)

Session 2A: GPU & Energy

APOGEE: Adaptive Prefetching on GPU for Energy Efficiency
     Ankit Sethia (University of Michigan), Ganesh Dasika (ARM Inc, Austin),
     Mehrzad Samadi (University of Michigan), Scott Mahlke (University of Michigan)

Parallel Frame Rendering: Trading Responsiveness for Energy on a Mobile GPU
     Jose-Maria Arnau (Universitat Politecnica de Catalunya),
     Joan-Manuel Parcerisa (Universitat Politecnica de Catalunya), Polychronis Xekalakis (Intel)

Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design
     Bin Wang (Auburn University), Bo Wu (College of William and Mary), Dong Li (Oak Ridge National Lab),
     Xipeng Shen (College of William and Mary), Weikuan Yu (Auburn University),
     Yizheng Jiao (Auburn University), Jeffrey S. Vetter (Oak Ridge National Lab)

Session 2B: Memory System Management

S-CAVE: Effectively Managing SSD Caches in Virtual Machine Environments
     Tian Luo (The Ohio State University), Siyuan Ma (The Ohio State University),
     Rubao Lee (The Ohio State University), Xiaodong Zhang (The Ohio State University),
     Deng Liu (VMware), Li Zhou (VMware)

Writeback-Aware Bandwidth Partitioning for Multi-core Systems with PCM
     Miao Zhou, Yu Du, Bruce Childers, Rami Melhem, Daniel Mosse (University of Pittsburgh)

L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors
     Josué Feliu, Julio Sahuquillo, Salvador Petit, José Duato (Universitat Politècnica de València)

Session 3: Best Papers

A Unified View of Non-monotonic Core Selection and Application Steering in Heterogeneous Chip Multiprocessors
     Sandeep Navada, Niket K. Choudhary (Qualcomm),
     Salil Wadhavkar, Eric Rotenberg (North Carolina State University)

Memory-centric System Interconnect Design with Hybrid Memory Cubes
     Gwangsun Kim, John Kim (KAIST), Jung Ho Ahn, Jaeha Kim (Seoul National University)

Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
     Onur Kayiran, Adwait Jog, Mahmut T. Kandemir, Chita R. Das (The Pennsylvania State University)

SMT-Centric Power-Aware Thread Placement in Chip Multiprocessors
     Augusto Vega, Alper Buyuktosunoglu, Pradip Bose (IBM)

Session 4A: Runtime & Scheduling

Fairness-Aware Scheduling on Single-ISA Heterogeneous Multi-Cores
     Kenzo Van Craeynest (Ghent University), Shoaib Akram (Ghent University), Wim Heirman (Ghent University),
     Aamer Jaleel (Intel), Lieven Eeckhout (Ghent University)

DANBI: Dynamic Scheduling of Irregular Stream Programs for Many-Core Systems
     Changwoo Min (Sungkyunkwan University and Samsung), Young Ik Eom (Sungkyunkwan University)

An Empirical Model for Predicting Cross-Core Performance Interference on Multicore Processors
     Jiacheng Zhao (Institute of Computing Technology, CAS), Huimin Cui (Institute of Computing Technology, CAS),
     Jingling Xue (University of New South Wales), Xiaobing Feng (Institute of Computing Technology, CAS),
     Youliang Yan (Huawei), Wensen Yang (Huawei)

Session 4B: Caches & Memory Hierarchy (1)

Jigsaw: Scalable Software-Defined Caches
     Nathan Beckmann, Daniel Sanchez (MIT)

Managing Shared Last-level Cache in a Heterogeneous Multicore Processor
     Vineeth Mekkat, Anup Holey, Pen-Chung Yew, Antonia Zhai (University of Minnesota)

Reshaping Cache Misses to Improve Row-Buffer Locality in Multicore Systems
     Wei Ding, Jun Liu, Mahmut Kandemir, Mary Jane Irwin (The Pennsylvania State University)

Session 5A: GPU

Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
     Janghaeng Lee, Mehrzad Samadi, Yongjun Park, Scott Mahlke (University of Michigan)

Starchart: Hardware and Software Optimization Using Recursive Partitioning Regression Trees
     Wenhao Jia (Princeton University), Kelly A. Shaw (University of Richmond),
     Margaret Martonosi (Princeton University)

RSVM: A Region-based Software Virtual Memory for GPU
     Feng Ji (NCSU), Heshan Lin (Virginia Tech), Xiaosong Ma (NCSU)

Session 5B: Caches & Memory Hierarchy (2)

The Case for a Scalable Coherence Protocol for Complex on-chip Cache Hierarchies in Many Core Systems
     Lucia G. Menezo, Valentin Puente, Jose Angel Gregorio (University of Cantabria)

Meeting Midway: Improving CMP Performance with Memory-Side Prefetching
     Praveen Yedlapalli, Jagadish Kotra, Emre Kultursay, Mahmut Kandemir, Chita Das,
     Anand Sivasubramaniam (The Pennsylvania State University)

Building Expressive, Area-Efficient Coherence Directories
     Lei Fang (Zhejiang University), Peng Liu (Zhejiang University), Qi Hu (Zhejiang University),
     Michael C. Huang (University of Rochester), Guofan Jiang (IBM GCG Systems & Technology Lab)

Session 6A: Networking, Debugging, & Microarchitecture

Traffic Steering Between a Low-Latency Unswitched TL Ring and a High-Throughput Switched On-chip Interconnect
     Jungju Oh, Alenka Zajic, Milos Prvulovic (Georgia Institute of Technology)

McRouter: Multicast within a Router for High Performance Network-on-Chips
     Yuan He (The University of Tokyo), Hiroshi Sasaki (Kyushu University),
     Shinobu Miwa (The University of Tokyo), Hiroshi Nakamura (The University of Tokyo)

A Debugging Technique for Every Parallel Programmer
     Justin Gottschlich, Gilles Pokam, Cristiano Pereira, Youfeng Wu (Intel Corporation)

Breaking SIMD Shackles with an Exposed Flexible Microarchitecture and the Access Execute PDG
     Venkatraman Govindaraju, Tony Nowatzki, Karthikeyan Sankaralingam (University of Wisconsin - Madison)

Session 6B: Compiler Optimization

Vectorization Past Dependent Branches Through Speculation
     Majedul Haque Sujon (University of Texas - San Antonio), R. Clint Whaley (University of Texas - San Antonio),
     Qing Yi (University of Colorado)

Automatic Vectorization of Tree Traversals
     Youngjoon Jo, Michael Goldfarb, Milind Kulkarni (Purdue University)

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory
     Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, Uday Bondhugula (Indian Institute of Science, Bangalore)

Automatic OpenCL Work-Group Size Selection for Multicore CPUs
     Sangmin Seo, Jun Lee, Gangwon Jo, Jaejin Lee (Seoul National University)