School of Informatics - University of Edinburgh Institute for Computing Systems Architecture - School of Informatics
CArD - Compiler and Architecture Design Group

Publications

2013

A Large-Scale Cross-Architecture Evaluation of Thread-Coarsening
Alberto Magni, C. Dubach, and M.F.P. O'Boyle, In Proceedings of the 2013 Conference on High Performance Computing, Networking, Storage and Analysis (SC '13), November 2013

Dynamic microarchitectural adaptation using machine learning
C. Dubach, T. M. Jones, and E. V. Bonilla, To appear in ACM Transactions on Architecture and Code Optimization (TACO '13), September 2013

PARTANS: An Autotuning Framework for Stencil Computation on Multi-GPU Systems
Thibaut Lutz, Christian Fensch, and Murray Cole, ACM Transactions on Architecture and Code Optimization (TACO '13), Vol 9(4), September 2013

CAeSaR: unified Cluster-Assignment Scheduling and communication Reuse for clustered VLIW processors
Vasileios Porpodas and Marcelo Cintra, International Conference on Compilers Architecture and Synthesis for Embedded Systems (CASES '13), September 2013

Aligned Scheduling: Cache-efficient Instruction Scheduling for VLIW Processors
Vasileios Porpodas and Marcelo Cintra, Intl. Wksp. on Languages and Compilers for Parallel Computing (LCPC '13), September 2013

DRIFT: Decoupled compileR-based Instruction-level Fault-Tolerance
Konstantina Mitropoulou, Vasileios Porpodas, and Marcelo Cintra, Intl. Wksp. on Languages and Compilers for Parallel Computing (LCPC '13), September 2013.

OpenCL Task Partitioning in the Presence of GPU Contention
Dominik Grewe, Zheng Wang, and Michael F.P. O'Boyle, In International Workshop on Languages and Compilers for Parallel Computing (LCPC '13), September 2013.

Conference Proceedings
Björn Franke, Jingling Xue, SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES '13), Seattle, WA, USA, June 20-21, 2013

Early partial evaluation in a JIT-compiled, retargetable instruction set simulator generated from a high-level architecture description
Harry Wagstaff, Miles Gould, Björn Franke, and Nigel P. Topham, Design Automation Conference (DAC '13), June 2013

LUCAS: Latency-adaptive Unified Cluster Assignment and instruction Scheduling
Vasileios Porpodas and Marcelo Cintra, Conf. on Languages, Compilers and Tools for Embedded Systems (LCTES '13), June 2013

CASTED: Core-Adaptive Software Transient Error Detection for Tightly Coupled Cores
Konstantina Mitropoulou, Vasileios Porpodas, and Marcelo Cintra, Intl. Parallel and Distributed Processing Symposium (IPDPS '13), May 2013.

Designing a Physical Locality Aware Coherence Protocol for Chip-Multiprocessors
Christian Fensch, Nick Barrow-Williams, Robert Mullins, and Simon Moore, IEEE Transactions on Computers, Vol 62(5), May 2013.

A Parallel Dynamic Binary Translator for Efficient Multi-Core Simulation
O. Almer, I. Böhm, T. Edler von Koch, B. Franke, S. Kyle, V. Seeker, C. Thompson, and N. Topham, International Journal of Parallel Programming (IJPP '12), Volume 41, Issue 2, April 2013

The Smart Cache: An Energy-Efficient Cache Architecture Through Dynamic Cache Adaptation
Karthik T. Sundararajan, Timothy M. Jones, and Nigel P. Topham, International Journal of Parallel Programming (IJPP '12), Volume 41, Issue 2, Special issue of best papers from SAMOS 2011, April 2013

Limits of region-based dynamic binary parallelization
Tobias J. K. Edler von Koch and Björn Franke, International conference on Virtual Execution Environments (VEE '13), March 2013

Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems
Dominik Grewe, Zheng Wang, and Michael F.P. O'Boyle, In International Symposium on Code Generation and Optimization (CGO '13), February 2013.

Smart, Adaptive Mapping of Parallelism in the Presence of External Workload
Murali Krishna Emani, Zheng Wang, and Michael F.P. O'Boyle, In International Symposium on Code Generation and Optimization (CGO '13), February 2013.

2012

Exploring and predicting the effects of microarchitectural parameters and compiler optimisations on performance and energy
C. Dubach, T. M. Jones, and M. F. P. O'Boyle, ACM Transactions on Embedded Computing Systems / Special Issue on Software and Compilers for Embedded Systems (ACM TECS), 2012

Compiling a high-level language for GPUs (via language support for architectures and compilers)
C. Dubach, P. Cheng, R. Rabbah, D. Bacon, and S. Fink, In Proceedings of the 33rd ACM SIGPLAN Symposium on Programming Language Design and Implementation (PLDI '12)

Autotuning Wavefront Abstractions for Heterogeneous Architectures
Siddharth Mohanty and M. Cole, In Proceedings of the 2012 Third Workshop on Applications for Multi-Core Architectures (WAMCA '12)

UCIFF: Unified Cluster Assignment, Instruction Scheduling, and Fast Frequency Selection for Heterogeneous Clustered VLIW Cores
Vasileios Porpodas and Marcelo Cintra, International Workshop on Languages and Compilers for Parallel Computing (LCPC '12), September 2012

Cooperative Partitioning: Energy-Efficient Cache Partitioning for High-Performance CMPs
Karthik T. Sundararajan, V. Porpodas, T.M. Jones, N.P. Topham, and B. Franke, International Symposium on High-Performance Computer Architecture (HPCA '12), February 2012

Auto-Tuning Parallel Skeletons
Alexander Collins, Christian Fensch, and Hugh Leather, Parallel Processing Letters, 22(02), 1240005 (16 pages), 2012.

Compiling for Automatically Generated Instruction Set Extensions
Alastair Murray and Björn Franke, Proceedings of the International Symposium on Code Generation and Optimization (CGO '12), April 2012, San Jose, CA, USA.

2011

An empirical architecture-centric approach to microarchitectural design space exploration
C. Dubach, T. M. Jones, and M. F. P. O'Boyle, IEEE Transactions on Computers (IEEE TC), October 2011.

Increasing the Energy Efficiency of TLS Systems Using Intermediate Checkpointing
Salman Khan, Nikolas Ioannou, Polychronis Xekalakis, and Marcelo Cintra, Proceedings of the International Conference on High Performance Computing (HiPC '11), December 2011.

A Machine Learning-Based Approach for Thread Mapping on Transactional Memory Applications
Marcio Castro, Luis Fabricio W. Goes, Christiane P. Ribeiro, Murray Cole, Marcelo Cintra, and Jean-Francois Mehaut, Proceedings of the International Conference on High Performance Computing (HiPC '11), December 2011.

Phase-Based Application-Driven Power Management on the Single-chip Cloud Computer
Nikolas Ioannou, Matthias Gries, Michael Kauschke, and Marcelo Cintra, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '11), October 2011.

Cycle-Accurate Performance Modelling in an Ultra-Fast Just-In-Time Dynamic Binary Translation Instruction Set Simulator
I. Böhm, B. Franke, and N. Topham, Transactions on High-Performance Embedded Architectures and Compilers (HiPEAC '11), Volume 5, Issue 4, 2011.

Scalable Multi-Core Simulation Using Parallel Dynamic Binary Translation
O. Almer, I. Böhm, T. Edler von Koch, B. Franke, S. Kyle, V. Seeker, C. Thompson, and N. Topham, Proceedings of the International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS '11), Samos, Greece, July 19-22, 2011.

Smart Cache: A Self Adaptive Cache Architecture for Energy Efficiency
K. Sundararajan, T. Jones, and N. Topham, Proceedings of the International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS '11), Samos, Greece, July 19-22, 2011

Generalized Just-In-Time Trace Compilation using a Parallel Task Farm in a Dynamic Binary Translator
Igor Böhm, Tobias J.K. Edler von Koch, Stephen Kyle, Björn Franke, and Nigel Topham, Proceedings of the ACM SIGPLAN 2011 Conference on Programming Language Design and Implementation (PLDI '11), June 2011.

An Evaluation of an OS-Based Coherence Scheme for Tiled CMPs
Christian Fensch and Marcelo Cintra, International Journal of Parallel Programming (IJPP), Volume 39, issue 3, June 2011.

A Reconfigurable Cache Architecture for Energy Efficiency
K. Sundararajan, T. Jones and N. Topham, Proceedings of the ACM International Conference on Computing Frontiers (CF '11), Ischia, Italy, May 3-5, 2011.

Best Paper Award: A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL
Dominik Grewe and Michael F.P. O'Boyle, International Conference on Compiler Construction (CC'11), March 2011.

Automatically Generating and Tuning GPU Code for Sparse Matrix-Vector Multiplication from a High-Level Representation
Dominik Grewe and Anton Lokhmotov, 4th Workshop on General Purpose Processing on Graphics Processing Units (GPGPU '11), March 2011.

A Learning-Based Approach to the Automated Design of MPSoC Networks
O. Almer, N. Topham and B. Franke, Architecture of Computing Systems (ARCS '11), Como, Italy, February 2011.

A Workload-Aware Mapping Approach For Data-Parallel Programs
Dominik Grewe, Zheng Wang and Michael O'Boyle, In 6th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC '11), January 2011.

2010

A Predictive Model for Dynamic Microarchitectural Adaptivity Control
Christophe Dubach, Timothy M. Jones, Edwin V. Bonilla, and Michael F.P. O'Boyle, In 43rd IEEE/ACM International Symposium on Microarchitecture (MICRO '10), December 2010.

Toward a More Accurate Understanding of the Limits of the TLS Execution Paradigm
Nikolas Ioannou, Jeremy Singer, Salman Khan, Polychronis Xekalakis, Paraskevas Yiapanis, Adam Pocock, Gavin Brown, Mikel Lujan, Ian Watson, and Marcelo Cintra, Proceedings of the International Symposium on Workload Characterization (IISWC '10), December 2010.

Best Paper Award: Partitioning Streaming Parallelism for Multi-cores: A Machine Learning Based Approach
Zheng Wang and Michael O'Boyle, In 19th International Conference on Parallel Architectures and Compilation Techniques (PACT '10), September 2010.

Proximity Coherence for Chip Multiprocessors
Nick Barrow-Williams, Christian Fensch and Simon Moore, In 19th International Conference on Parallel Architectures and Compilation Techniques (PACT '10), September 2010.

Best Paper Award: Efficient Sequential Consistency Using Conditional Fences
Changhui Lin, Vijay Nagarajan, and Rajiv Gupta, In 19th International Conference on Parallel Architectures and Compilation Techniques (PACT '10), September 2010.

Semi-Automatic Extraction and Exploitation of Hierarchical Pipeline Parallelism Using Profiling Information
Georgios Tournavitis and Björn Franke, In 19th International Conference on Parallel Architectures and Compilation Techniques (PACT '10), September 2010.

Exploring the Unified Design-Space of Custom-Instruction Selection and Resource Sharing
Marcela Zuluaga and Nigel Topham, Proceedings of the International Symposium on Systems, Architectures, Modeling, and Simulation (IC-SAMOS '10), Samos, Greece, July 19-22, 2010.

Best Paper Award: Cycle-Accurate Performance Modelling in an Ultra-Fast Just-In-Time Dynamic Binary Translation Instruction Set Simulator
Igor Böhm, Björn Franke and Nigel Topham, Proceedings of the International Symposium on Systems, Architectures, Modeling, and Simulation (IC-SAMOS '10), Samos, Greece, July 19-22, 2010.

Empirical Evaluation of Data Transformations for Network Infrastructure Applications
Damon Fenacci and Björn Franke, Proceedings of the International Symposium on Systems, Architectures, Modeling, and Simulation (IC-SAMOS '10), Samos, Greece, July 19-22, 2010.

Workload Characterization Supporting the Development of Domain-Specific Compiler Optimizations Using Decision Trees for Data Mining
Damon Fenacci, Björn Franke and John Thomson, 13th International Workshop on Software and Compilers for Embedded Systems (SCOPES'10), June 28-29 2010, Schloss Rheinfels, St. Goar, Germany.

Execution Suppression: An Automated Iterative Technique for Locating Memory Errors
Dennis Jeffrey, Vijay Nagarajan, Rajiv Gupta and Neelam Gupta, ACM Transactions on Programming Languages and Systems (TOPLAS '10), Vol. 32, No. 5, 36 pages, May 2010.

Profitability-Based Power Allocation for Speculative Multithreaded Systems
Polychronis Xekalakis, Nikolas Ioannou, Salman Khan and Marcelo Cintra, International Parallel and Distributed Processing Symposium (IPDPS '10), April 2010.

Integrated Instruction Selection and Register Allocation for Compact Code Generation Exploiting Freeform Mixing of 16- and 32-bit Instructions
Tobias Edler von Koch, Igor Böhm and Björn Franke, Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization (CGO '10), Toronto, Canada, April 2010.

Statistical Performance Modeling in Functional Instruction Set Simulators
Björn Franke, To Appear In: ACM Transactions on Embedded Computing Systems (TECS '10), 2010.

Adaptive Source-Level Data Assignment to Dual Memory Banks
Alastair C. Murray and Björn Franke, To Appear In: ACM Transactions on Embedded Computing Systems (TECS '10), 2010.

Generating Code for Holistic Query Evaluation
Konstantinos Krikellas, Stratis D. Viglas and Marcelo Cintra, International Conference on Data Engineering (ICDE '10), March 2010.

Adaptive Structured Parallelism for Distributed Heterogeneous Architectures: A Methodological Approach
Horacio Gonzalez-Velez and M. Cole, Concurrency and Computation: Practice and Experience, 2010.

Compiler-Directed Performance Model Construction for Parallel Programs
Martin Schindewolf, David Kramer and Marcelo Cintra, International Conference on Architecture of Computing Systems (ARCS '10), February 2010.

Adaptive Statistical Scheduling of Divisible Workloads in Heterogeneous Systems
H. Gonzalez-Velez and M. Cole, Journal of Scheduling, 2010.

Handling Branches in TLS Systems with Multi-Path Execution
Polychronis Xekalakis and Marcelo Cintra, International Symposium on High-Performance Computer Architecture (HPCA '10), January 2010.

2009

Distance-Aware Round-Robin Mapping for Large NUCA Caches
Alberto Ros, Marcelo Cintra, Manuel E. Acacio and Jose M. Garcia, International Conference on High Performance Computing (HiPC '09), December 2009.

Portable Compiler Optimization Across Embedded Programs and Microarchitectures using Machine Learning
Christophe Dubach, Timothy M. Jones, Edwin V. Bonilla, Grigori Fursin and Michael F.P. O'Boyle, 42nd IEEE/ACM International Symposium on Microarchitecture (MICRO '09), December 2009.

Characterising Effective Resource Analyses for Parallel and Distributed Coordination
P. W. Trinder, M. I. Cole, H-W. Loidl and G. J. Michaelson, International Workshop on Foundational and Practical Aspects of Resource Analysis (FOPARA '09), Eindhoven, The Netherlands, November 2009.

Using continuous statistical machine learning to enable high-speed performance prediction in hybrid instruction-/cycle-accurate instruction set simulators
D. Powell and B. Franke, Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis (CODES+ISSS '09), Grenoble, France, October 11-16, 2009.

Rapid Early-Stage Microarchitecture Design Using Predictive Models
Christophe Dubach, Timothy M. Jones and Michael F.P. O'Boyle, IEEE International Conference on Computer Design (ICCD '09), October 2009.

Energy-Efficient Register Caching with Compiler Assistance
Timothy M. Jones, Michael F.P. O'Boyle, Jaume Abella, Antonio González and Oğuz Ergin, ACM Transactions on Architecture and Code Optimization (TACO '09), volume 6, issue 4, October 2009.

Reducing Training Time in a One-shot Machine Learning-based Compiler
John Thomson, Michael O'Boyle, Grigori Fursin and Björn Franke, In Proceedings of International Workshop on Languages and Compilers for Parallel Computing (LCPC '09), Newark, Delaware, October 2009.

Exploring the Limits of Early Register Release: Exploiting Compiler Analysis
Timothy M. Jones, Michael F.P. O'Boyle, Jaume Abella, Antonio González and Oğuz Ergin, ACM Transactions on Architecture and Code Optimization (TACO '09), Volume 6, issue 3, September 2009.

Introducing Control-Flow Inclusion to Support Pipelining in Custom Instruction Set Extensions
M. Zuluaga, T. Kluter, P. Brisk, P. Ienne and N. Topham, In Proceedings of the 7th IEEE Symposium on Application Specific Processors (SASP '09), pages 114-21, San Francisco, CA, July 2009.

Design Space Exploration of Resource Sharing Solutions for Custom Instruction Set Extensions
Marcela Zuluaga and Nigel Topham, In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD '09), volume 28, issue 12, pages 1788-1801, December 2009.

Code Transformation and Instruction Set Extension
A.C. Murray, R.V. Bennett, B. Franke and N. Topham, ACM Transactions on Embedded Computing Systems (TECS '09), volume 8, issue 4, 2009.

Compiler Directed Issue Queue Energy Reduction
Timothy M. Jones, Michael F.P. O'Boyle, Jaume Abella and Antonio González, Transactions on High-Performance Embedded Architectures and Compilers (HiPEAC '09), Volume 4, issue 1, 2009.

Stream Chaining: Exploiting Multiple Levels of Correlation in Data Prefetching
Pedro Diaz and Marcelo Cintra, Proceedings of the International Symposium on Computer Architecture (ISCA '09), June 2009.

Raced Profiles: Efficient Selection of Competing Compiler Optimizations
Hugh Leather, Michael O'Boyle and Bruce Worton, Proceedings of the ACM SIGPLAN/SIGBED 2009 Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES '09), June 2009.

Towards a Holistic Approach to Auto-Parallelization: Integrating Profile-Driven Parallelism Detection and Machine-Learning Based Mapping
Georgios Tournavitis, Zheng Wang, Björn Franke and Michael O'Boyle, Proceedings of the ACM SIGPLAN 2009 Conference on Programming Language Design and Implementation (PLDI '09), June 2009.

Combining Thread Level Speculation, Helper Threads, and Runahead Execution
Polychronis Xekalakis, Nikolas Ioannou, and Marcelo Cintra, Proceedings of the International Conference on Supercomputing (ICS '09), June 2009.

Automatic Feature Generation for Machine Learning Based Optimizing Compilation
Hugh Leather, Edwin Bonilla and Michael F.P. O'Boyle, Proceedings of the International Symposium on Code Generation and Optimization (CGO '09), March 2009.

Mapping Parallelism to Multi-cores: A Machine Learning Based Approach
Zheng Wang and Michael F.P. O'Boyle, Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2009.

High Speed CPU Simulation using LTU Dynamic Binary Translation
Daniel Jones and Nigel Topham, Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers (HiPEAC '09), pages 50-64, January 2009.

Using Genetic Programming for Source-Level Data Assignment to Dual Memory Banks
Alastair Murray and Björn Franke, Proceedings of the 3rd Workshop on Statistical and Machine Learning Approaches to Architecture and Compilation (SMART '09), pages 75-89, January 2009.

An End-to-End Design Flow for Automated Instruction Set Extension and Complex Instruction Selection based on GCC
Oscar Almer, Richard Bennett, Igor Böhm, Alastair Murray, Xinhao Qu, Marcela Zuluaga, Björn Franke, Nigel Topham, Proceedings of the 1st International Workshop on GCC Research Opportunities (GROW '09), pages 49-60, January 2009.

Towards Automatic Profile-Driven Parallelization of Embedded Multimedia Applications
Georgios Tournavitis and Björn Franke, Proceedings of the 2nd Workshop on Programmability Issues for Multi-Core Computers (MULTIPROG '09), January 2009.

2008

Exploring and Predicting the Architecture/Optimising Compiler Co-Design Space
Christophe Dubach, Timothy M. Jones and Michael F.P. O'Boyle, Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '08), October 2008.

Resource Sharing in Custom Instruction Set Extensions
M. Zuluaga and N. Topham, Proceedings of the 6th IEEE Symposium on Application Specific Processors (SASP '08), pages 7-13, June 2008.

MILEPOST GCC: Machine Learning Based Research Compiler
Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle, Proceedings of the GCC Summit 2008, June 2008.

A Partial Scan Based Test Generation for Asynchronous Circuits
Dilip P. Vasudevan and Aris Efthymiou, Proceedings of IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS '08), April 2008.

Fast Cycle-Approximate Instruction Set Simulation
Björn Franke, Proceedings of the 11th International Workshop on Software and Compilers for Embedded Systems (SCOPES '08), pages 69-78, March 2008.

Fast Source-Level Data Assignment to Dual Memory Banks
Alastair Murray and Björn Franke, Proceedings of the 11th International Workshop on Software and Compilers for Embedded Systems (SCOPES '08), pages 43-52, March 2008.

Instruction Cache Energy Saving Through Compiler Way-Placement
Timothy M. Jones, Sandro Bartolini, Bruno De Bus, John Cavazos and Michael F.P. O'Boyle, Proceedings of Design, Automation and Test in Europe (DATE '08), March 2008.

An OS-Based Alternative to Full Hardware Coherence on Tiled CMPs
Christian Fensch and Marcelo Cintra, Proceedings of the 14th International Symposium on High Performance Computer Architecture (HPCA '08), pages 355-366, February 2008.

Evaluating the Effects of Compiler Optimisations on AVF
Timothy M. Jones, Michael F.P. O'Boyle and Oğuz Ergin, 12th Annual Workshop on the Interaction between Compilers and Computer Architecture (INTERACT) in conjunction with HPCA-14, February 2008.

Automatic Code Generation Using Dynamic Programming
Igor Böhm, VDM Verlag Dr. Mueller e.K., February 2008.

Automatic Feature Generation for Setting Compilers Heuristics
Hugh Leather, Elad Yom-Tov, Mircea Namolaru and Ari Freund, Proceedings of the 2nd Workshop on Statistical and Machine Learning Approaches to Architecture and Compilation (SMART '08), January 2008.

An Adaptive Parallel Pipeline Pattern for Grids
Horacio Gonzalez and Murray Cole, IEEE International Parallel & Distributed Processing Symposium (IPDPS 2008), pages 1-11, 2008.

Scheduling DAGs on Grids with Copying and Migration
Israel Hernandez and Murray Cole, Parallel Processing and Applied Mathematics 2007 (PPAM 2007), pages 1019-1028 (LNCS 4967), 2008.

2007

Microarchitectural Design Space Exploration Using An Architecture-Centric Approach
Christophe Dubach, Timothy M. Jones, and Michael F.P. O'Boyle, Proceedings of the 40th International Symposium on Microarchitecture (MICRO '07), pages 262-273, December 2007.

A Cost-Aware Parallel Workload Allocation Approach Based on Machine Learning
Shun Long, Grigori Fursin, and Björn Franke, Proceedings of the International Conference on Network and Parallel Computing (NPC '07), pages 506-515 (LNCS 4672), September 2007.

Using Predictive Modeling for Cross-Program Design Space Exploration in Multicore Systems
Salman Khan, Polychronis Xekalakis, John Cavazos, and Marcelo Cintra, Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT '07), pages 327-338, September 2007.

Quick and Practical Run-Time Evaluation of Multiple Program Optimizations
Grigori Fursin, Albert Cohen, Michael F.P. O'Boyle and Olivier Temam, In Transactions on High Performance Embedded Architectures and Compilation Techniques, July 2007.

A Compiler Cost Model for Speculative Parallelization
Jialin Dou and Marcelo Cintra, ACM Transactions on Architecture and Code Optimization (TACO), vol. 4, no. 2, June 2007.

Combining Source-to-Source Transformations and Processor Instruction Set Extension for the Automated Design-Space Exploration of Embedded Systems
Richard V. Bennett, Alastair C. Murray, Björn Franke, and Nigel Topham, Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES '07), pages 83-92, June 2007.

Fast Compiler Optimisation using Code-Feature Based Performance
Christophe Dubach, John Cavazos, Björn Franke, Grigori Fursin, Michael O'Boyle and Olivier Temam, Proceedings of the 4th International Conference on Computing Frontiers, pages 131-142, May 2007.

Rapidly Selecting Good Compiler Optimizations using Performance Counters
John Cavazos, Grigori Fursin, Felix Agakov, Edwin Bonilla, Michael F.P. O'Boyle and Olivier Temam, Proceedings of the International Symposium on Code Generation and Optimization (CGO '07), pages 185-197, March 2007.

Designing Efficient Processors Using Compiler-Directed Optimisations
Timothy M. Jones, Michael F.P. O'Boyle, Jaume Abella, Antonio González and Oğuz Ergin, 11th Annual Workshop on the Interaction between Compilers and Computer Architecture (INTERACT '07), pages 50-57, February 2007.

MiDataSets: Creating the Conditions for a More Realistic Evaluation of Iterative Optimization
Grigori Fursin, John Cavazos, Michael O'Boyle and Olivier Temam, Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2007), pages 245-260, January 2007.

A Structural Approach for Modelling Performance of Systems using Skeletons
Gagarine Yaikhom, Murray Cole, Stephen Gilmore and Jane Hillston, Electronic Notes in Theoretical Computer Science 190 (ENTCS 2007), pages 167-183, 2007.

Reliable DAG Scheduling with Rewinding and Migration
Israel Hernandez and Murray Cole, First International Conference on Networks for Grid Applications (GridNets 2007), pages 1-8, 2007.

A structural approach for modelling performance of workflow systems
Gagarine Yaikhom, Murray Cole, Stephen Gilmore and Jane Hillston, Proceedings of 5th International Workshop on the Quantitative Aspects of Programming Languages (QAPL 2007), 2007.

Adaptive Structured Parallelism for Computational Grids
Horacio Gonzalez and Murray Cole, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2007), pages 140-141, 2007.

Reactive Grid Scheduling of DAG applications
Israel Hernandez and Murray Cole, Parallel and Distributed Computing and Networks (PDCN 2007), pages 92-97, 2007.

2006

Towards fully adaptive pipeline parallelism for heterogeneous distributed environments
Horacio Gonzalez-Velez and Murray Cole, Proceedings of International Symposium on Parallel and Distributed Processing and Applications (ISPA '06), pages 916-926 (LNCS 4330), December 2006.

Quantifying Uncertainty in Points-To Relations
Constantino G. Ribeiro and Marcelo Cintra, Intl. Workshop on Languages and Compilers for Parallel Computing (LCPC '06), November 2006.

Automatic Performance Model Construction for the Fast Software Exploration of New Hardware Designs
John Cavazos, Christophe Dubach, Felix Agakov, Edwin Bonilla, Michael F.P. O'Boyle, Grigori Fursin, and Olivier Temam, Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES '06), pages 24-34, October 2006.

Method-Specific Dynamic Compilation using Logistic Regression
John Cavazos and Michael F.P. O'Boyle, Proceedings of the 21st Conference on Object-Oriented Programming Languages, Systems, and Applications (OOPSLA '06), pages 229-240, October 2006.

Predictive Search Distributions
Edwin V. Bonilla, Christopher K.I. Williams, Felix Agakov, John Cavazos, John Thomson, and Michael F.P. O'Boyle, Proceedings of the 23rd International Conference on Machine Learning (ICML'06), pages 121-128, June 2006.

Combining measurement and stochastic modelling to enhance scheduling decisions for a parallel Mean Value Analysis algorithm
Gagarine Yaikhom, Murray Cole and Stephen Gilmore, Proceedings of the International Conference on Computational Science (ICCS '06), pages 929-936, May 2006.

Using Machine Learning to Focus Iterative Optimization
Felix Agakov, Edwin V. Bonilla, John Cavazos, Björn Franke, Grigori Fursin, Michael F.P. O'Boyle, John Thomson, Marc Toussaint, and Christopher K.I. Williams, Proceedings of the International Symposium on Code Generation and Optimization (CGO '06), pages 295-305, March 2006.

Iterative Collective Loop Fusion
Tom J. Ashby and Michael F.P. O'Boyle, International Conference on Compiler Construction (part of ETAPS 2006), pages 202-216, March 2006.

Hybrid Optimizations: Which Optimization Algorithm to Use?
John Cavazos, J. Eliot B. Moss, and Michael F.P. O'Boyle, International Conference on Compiler Construction (part of ETAPS 2006), pages 124-138, March 2006.

Self-adaptive skeletal task farm for computational grids
Horacio Gonzalez-Velez, Parallel Computing 32(7-8), pages 479-490, 2006.

2005

Automatic Tuning of Inlining Heuristics
John Cavazos and Michael F.P. O'Boyle, International Conference for High Performance Computing, Networking, and Storage (SC|05), November 2005.

A Practical Method For Quickly Evaluating Program Optimizations
Grigori Fursin, Albert Cohen, Michael F.P. O'Boyle, and Olivier Temam, Proceedings of the 1st International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005), pages 29-46 (LNCS 3793), November 2005.

Compiler Directed Early Register Release
Timothy M. Jones, Michael F.P. O'Boyle, Jaume Abella, Antonio González, and Oğuz Ergin, Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT '05), pages 110-119, September 2005.

A heuristic search algorithm based on Unified Transformation Framework
Shun Long and Grigori Fursin, Proceedings of the 7th International Workshop on High Performance Scientific and Engineering Computing (HPSEC '05), pages 137-144, June 2005.

Design Space Exploration of a Software Speculative Parallelization Scheme
Marcelo Cintra and Diego R. Llanos, IEEE Transactions on Parallel and Distributed Systems, 16(6), pages 562-576, June 2005.

Probabilistic Source-Level Optimisation of Embedded Programs
Björn Franke, Michael F.P. O'Boyle, John Thomson, and Grigori Fursin, Proceedings of the 2005 Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'05), pages 78-86, June 2005.

IATAC: A Smart Predictor to Turn-Off L2 Cache Lines
Jaume Abella, Antonio González, Xavier Vera, and Michael F.P. O'Boyle, ACM Transactions on Architecture and Code Optimization, 2(1), pages 55-77, March 2005.

A Complete Compiler Approach to Auto-Parallelizing C Programs for Multi-DSP Systems
Björn Franke and Michael F.P. O'Boyle, IEEE Transactions on Parallel and Distributed Systems, 16(3), pages 234-245, March 2005.

Software Directed Issue Queue Power Reduction
Timothy M. Jones, Michael F.P. O'Boyle, Jaume Abella, and Antonio González, Proceedings of the International Symposium on High Performance Computer Architecture (HPCA '05), pages 144-153, February 2005.

2004

Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization
Jialin Dou and Marcelo Cintra, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '04), pages 203-214, September 2004.

Cross Component Optimisation in a High Level Category-Based Language
T.J. Ashby, A.D. Kennedy, and M.F.P. O'Boyle, Proceedings of Euro-Par, pages 654-661 (LNCS 3149), August 2004.

Why Skeletal Parallel Programming Matters
Murray Cole, Proceedings of Euro-Par, page 37 (LNCS 3149), August 2004.

Adaptive Java Optimisation using Instance-based Learning
Shun Long and Michael O'Boyle, 18th Annual ACM International Conference on Supercomputing (ICS'04), pages 237-246, June 2004.

A comparative study of intrinsic parallel programming methodologies
H. Gonzalez-Velez, A. de Luca and V. Gonzalez-Velez, Proceedings of the First International Conference on Electrical and Electronics Engineering (ICEEE), pages 200-205, June 2004.

Evaluating the performance of skeleton-based high level parallel programs
Anne Benoit, Murray Cole, Stephen Gilmore and Jane Hillston, Proceedings of the International Conference on Computational Science (ICCS '04), pages 289-296 (LNCS 3038), June 2004.

Speculative Parallelization of a Randomized Incremental Convex Hull Algorithm.
Marcelo Cintra, Diego R. Llanos, and Belén Palop, Workshop on Computational Geometry and Applications (CGA), pages 188-197 (LNCS 3045), May 2004.

Fast and Accurate Method for Determining a Lower Bound on Execution Time.
Grigori Fursin, Michael F.P. O'Boyle, Olivier Temam, and Gregory Watts, Concurrency Practice and Experience, 16(2-3), pages 271-292, February 2004.

The Effect of Cache Models on Iterative Compilation for Combined Tiling and Unrolling.
Peter Knijnenburg, Toru Kisuki, Kyle Gallivan and Michael F.P. O'Boyle, Concurrency Practice and Experience, 16(2-3), pages 247-270, February 2004.

Bringing Skeletons out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming
Murray Cole, Parallel Computing, 30(3), pages 389-406, 2004.

The Integration of Task and Data Parallel Skeletons
Herbert Kuchen and Murray Cole, Parallel Processing Letters, 12(2), pages 141-156, 2002.

Automated Cost Analysis of a Parallel Maximum Segment Sum Program Derivation
Yasushi Hayashi and Murray Cole, Parallel Processing Letters, 12(1), pages 95-112, 2002.

Static Performance Prediction of Skeletal Programs
Yasushi Hayashi and Murray Cole, Parallel Algorithms and Applications, 17(1), pages 59-84, 2002.

2003

Towards general and exact distributed invalidation.
Michael F.P. O'Boyle, Rupert W. Ford, and Elena A. Stöhr, Journal of Parallel and Distributed Computing, 63(11), pages 1123-1137, November 2003.

Compiler Parallelization of C Programs for Multi-Core DSPs with Multiple Address Spaces.
Björn Franke and Michael F.P. O'Boyle, ACM SIGDA CODES-ISSS, pages 219-224, October 2003.

Combining Program Recovery, Auto-parallelisation and Locality Analysis for C programs on Multi-processor Embedded Systems.
Björn Franke and Michael F.P. O'Boyle, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '03), pages 104-113, September 2003.

Toward Efficient and Robust Software Speculative Parallelization in Multiprocessors.
Marcelo Cintra and Diego R. Llanos, International Symposium on Principles and Practice of Parallel Programming (PPoPP '03), pages 13-24, June 2003.

Array recovery and high-level transformations for DSP applications.
Björn Franke and Michael F.P. O'Boyle, ACM Transactions on Embedded Computing Systems 2(2), pages 132-162, May 2003.

Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation.
Peter M.W. Knijnenburg, Toru Kisuki, and Michael F.P. O'Boyle, Journal of Supercomputing 24(1), pages 43-67, January 2003.

2002

Iterative Compilation
Grigori Fursin, Michael F.P. O'Boyle, and Peter M.W. Knijnenburg, Proceedings of Languages and Compilers for Parallel Computing (LCPC), pages 171-187 (LNCS 2268), October 2002.

Compile Time Barrier Synchronisation Minimisation.
Michael F.P. O'Boyle and Elena A. Stöhr, IEEE Transactions on Parallel and Distributed Systems 13(6), pages 529-543, June 2002.

Integrating Loop and Data Transformations for Global Optimisation.
Michael F.P. O'Boyle and Peter M.W. Knijnenburg, Journal of Parallel and Distributed Computing 62, pages 563-590, April 2002.

Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors.
Marcelo Cintra and Josep Torrellas, International Symposium on High Performance Computer Architecture (HPCA '02), pages 43-54, February 2002.

2001

An Empirical Evaluation of High Level Transformations for Embedded Processors.
Björn Franke and Michael F.P. O'Boyle, International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), pages 59-66, November 2001.

Towards an Adaptive Java Optimising Compiler: An Empirical Evaluation of Program Transformations.
Shun Long and Michael F.P. O'Boyle, 3rd Workshop on Java for High Performance Computing, ACM ICS, June 2001.

Compiler Transformation of Pointers to Explicit Array Accesses in DSP Applications.
Björn Franke and Michael F.P. O'Boyle, International Conference on Compiler Construction (part of ETAPS 2001), pages 69-87 (LNCS 2027), April 2001.

Towards Automatic Parallelisation for Multi-Processor DSPs.
Björn Franke and Michael F.P. O'Boyle, Workshop on Software and Compilers for Embedded Systems (SCOPES '01), March 2001.

Coordinating Heterogeneous Parallel Systems with Skeletons and Activity Graphs
Murray Cole and Andrea Zavanella, Journal of Systems Integration, 10(2), pages 127-143, 2001.

2000

Automatic Array Access Recovery in Pointer based DSP Codes.
Björn Franke and Michael F.P. O'Boyle, 2nd Workshop on Media Processors and DSPs (MP-DSP), IEEE Micro, December 2000.

The Effect of Cache Models on Iterative Compilation for Combined Tiling and Unrolling.
Peter M.W. Knijnenburg, Toru Kisuki, Kyle Gallivan, and Michael F.P. O'Boyle, Proceedings of the 3rd Workshop on Feedback Directed and Dynamic Optimization, pages 31-40, November 2000.

Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation.
Toru Kisuki, Peter M.W. Knijnenburg, and Michael F.P. O'Boyle, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT 2000), pages 237-246, October 2000.

Exact Distributed Invalidation.
Rupert W. Ford, Elena A. Stöhr, and Michael F.P. O'Boyle, Proceedings of Euro-Par, pages 395-404 (LNCS 1900), August 2000.

Architectural Support for Scalable Speculative Parallelization in Shared-Memory Multiprocessors.
Marcelo Cintra, José F. Martínez, and Josep Torrellas, International Symposium on Computer Architecture (ISCA 2000), pages 13-24, June 2000.

Frame: An Imperative Coordination Language for Parallel Programming
Murray Cole, Technical report EDI-INF-RR0026, 2000.

Activity Graphs: A Model-Independent Intermediate Layer for Skeletal Co-ordination
Murray Cole and Andrea Zavanella, Proceedings of ACM Symposium on Applied Computing (SAC 2000), Vol 1, pages 255-261, March 2000.

1999

Efficient Parallelization using Combined Loop and Data Transformations.
Michael F.P. O'Boyle and Peter M.W. Knijnenburg, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '99), pages 283-291, October 1999.

OCEANS: Optimizing Compilers for Embedded Applications.
Michel Barreteau, Peter M.W. Knijnenburg, Michael F.P. O'Boyle et al., Proceedings of Euro-Par, pages 1171-1175 (LNCS 1685), August 1999.

BSP-based Cost Analysis of Skeletal Programs
Yasushi Hayashi and Murray Cole, Proceedings of the Scottish Workshop on Functional Programming, pages 20-28, August 1999.

Non-singular Data Transformations: Definition, Validity, Applications.
Michael F.P. O'Boyle and Peter M.W. Knijnenburg, International Journal on Parallel Programming 27(3), pages 131-159, June 1999.

A Feasibility Study in Iterative Compilation.
Toru Kisuki, Peter M.W. Knijnenburg, Michael F.P. O'Boyle, François Bodin, and Harry A.G. Wijshoff, 2nd International Symposium on High Performance Computing, pages 121-132 (LNCS 1615), May 1999.

Excel-NUMA: Toward Programmability, Simplicity, and High Performance.
Zheng Zhang, Marcelo Cintra, and Josep Torrellas, IEEE Transactions on Computers, Special Issue on Cache Memory and Related Problems, 48(2), pages 256-264, February 1999.

1998

Integrating Loop and Data Transformations for Global Optimisation.
Michael F.P. O'Boyle and Peter M.W. Knijnenburg, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '98), pages 12-19, October 1998.

Iterative Compilation in a Non-linear Optimisation Space.
François Bodin, Toru Kisuki, Peter M.W. Knijnenburg, Michael F.P. O'Boyle, and Erven Rohou, Proceedings of the Workshop on Profile and Feedback Directed Compilation, organized in conjunction with PACT '98, October 1998.

OCEANS: Optimizing Compilers for Embedded Systems.
Michel Barreteau, François Bodin, Peter Brinkhaus, Zbigniew Chamski, Henri-Pierre Charles, Christine Eisenbeis, John Gurd, Jan Hoogerbrugge, Ping Hu, William Jalby, Peter M.W. Knijnenburg, Michael O'Boyle, Erven Rohou, Rizos Sakellariou, André Seznec, Elena A. Stöhr, Menno Treffers, and Harry A.G. Wijshoff, Proceedings of Euro-Par, pages 1123-1130 (LNCS 1470), September 1998.

MARS: A Distributed Memory Approach to Shared Memory Compilation.
Michael F.P. O'Boyle, Languages, Compilers and Runtime Systems for Scalable Computing, pages 259-274 (LNCS 1511), May 1998.

First Fast Sink: A compiler algorithm for barrier placement optimisation
Elena A. Stöhr and Michael F.P. O'Boyle, Future Generation Computer Systems, 13(4-5), North-Holland, March 1998.

1997

OCEANS: Optimizing Compilers for Embedded Applications
Bas Aarts, Michel Barreteau, François Bodin, Peter Brinkhaus, Zbigniew Chamski, Henri-Pierre Charles, Christine Eisenbeis, John R. Gurd, Jan Hoogerbrugge, Ping Hu, William Jalby, Peter M.W. Knijnenburg, Michael F.P. O'Boyle, Erven Rohou, Rizos Sakellariou, Henk Schepers, André Seznec, Elena A. Stöhr, Marco Verhoeven, and Harry A.G. Wijshoff, Proceedings of Euro-Par, pages 1351-1356 (LNCS 1300), August 1997.

Prefetching and Multithreading Performance on a Bus-based Multiprocessor with Petri Nets.
Edward Moreno, Marcelo Cintra, and Sergio Kofuji, Proceedings of Euro-Par, pages 1017-1024 (LNCS 1300), August 1997.

A Monadic Calculus for Parallel Costing of a Functional Language of Arrays
C. B. Jay, M. I. Cole, M. Sekanina and P. A. Steckler, Proceedings of Euro-Par, pages 650-661 (LNCS 1300), August 1997.

Non-Singular Data Transformations: Definition, Validity and Application.
Michael F.P. O'Boyle and Peter M.W. Knijnenburg, ACM 11th International Conference on Supercomputing, pages 309-316, July 1997.

On Dividing and Conquering Independently
Murray Cole, Proceedings of Euro-Par, pages 634-637 (LNCS 1300), August 1997.

A Graph Based Approach to Minimising Barrier Synchronisation.
Elena A. Stöhr and Michael F.P. O'Boyle, ACM 11th International Conference on Supercomputing, pages 156-163, July 1997.

Barrier Synchronisation Minimisation.
Elena A. Stöhr and Michael F.P. O'Boyle, High Performance Computing and Networking, pages 791-800 (LNCS 1225), April 1997.

Recursive 3D Mesh Indexing with Improved Locality
George Chochia and Murray Cole, High Performance Computing and Networking, pages 1014-1015 (LNCS 1225), April 1997.