The 13th International Symposium on Applied Reconfigurable Computing (ARC2017) is held from Monday the 3rd of April to Friday the 7th of April 2017.

On Monday and Friday, there will be tutorials. The main symposium will be held on Tuesday, Wednesday and Thursday.

The schedule below shows the sessions of each category in their respective timeslots; each session is described in the detailed program that follows.


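ARC2017 Time Schedule:

Monday, April 3
  • 09:00 Tutorial: PYNQ
  • 10:00 Break
  • 10:30 Tutorial: PYNQ (cont.)
  • 14:00 Tutorial: PYNQ (cont.)
  • 15:30 Break
  • 16:00 Tutorial: PYNQ (cont.)

Tuesday, April 4
  • 09:00 Keynote 1: Onur Mutlu (ETH Zurich)
  • 10:00 Break
  • 10:30 Adaptive Architectures
  • 14:00 Embedded Computing and Security
  • 15:30 Break
  • 16:00 Simulation and Synthesis

Wednesday, April 5
  • 09:00 Keynote 2: Walid Najjar (UC Riverside)
  • 10:00 Break
  • 10:30 Design Space Exploration
  • 14:00 Fault Tolerance
  • 15:30 Break
  • 16:00 Social Event (including dinner)

Thursday, April 6
  • 09:00 Keynote 3: Cathal McCabe (Xilinx)
  • 10:00 Break
  • 10:30 FPGA Based Designs
  • 14:00 Neural Networks
  • 15:30 Break
  • 16:00 Languages and Estimation Techniques

Friday, April 7
  • 10:00 Break
  • 10:30 Tutorial: ρ-VEX (cont.)
  • 14:00 Tutorial: ρ-VEX (cont.)
  • 15:30 Break
  • 16:00 Tutorial: ρ-VEX (cont.)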


Rethinking Memory System Design
(and the Computing Platforms We Design Around It)
Onur Mutlu (ETH Zurich)
Tuesday, April 4, 09:00

Abstract: The memory system is a fundamental performance and energy bottleneck in almost all computing systems. Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck. At the same time, DRAM and flash technologies are experiencing difficult technology scaling challenges that make the maintenance and enhancement of their capacity, energy efficiency, and reliability significantly more costly with conventional techniques. In fact, recent reliability issues with DRAM, such as the RowHammer problem, are already threatening system security and predictability.

In this talk, we first discuss major challenges facing modern memory systems in the presence of greatly increasing demand for data and its fast analysis. We then examine some promising research and design directions to overcome these challenges and thus enable scalable memory systems for the future. We discuss three key solution directions: 1) enabling new memory architectures, functions, interfaces, and better integration of memory and the rest of the system, 2) designing a memory system that intelligently employs emerging non-volatile memory (NVM) technologies and coordinates memory and storage management, 3) reducing memory interference and providing predictable performance to applications sharing the memory system. If time permits, we will also touch upon our ongoing related work in combating scaling challenges of NAND flash memory.

An accompanying paper, slightly outdated (circa 2015), is available.

Speaker bio: Onur Mutlu is a Professor of Computer Science at ETH Zurich. He is also a faculty member at Carnegie Mellon University, where he previously held the William D. and Nancy W. Strecker Early Career Professorship. His current broader research interests are in computer architecture, systems, and bioinformatics. He is especially interested in interactions across domains and between applications, system software, compilers, and microarchitecture, with a major current focus on memory and storage systems. He obtained his PhD and MS in ECE from the University of Texas at Austin and BS degrees in Computer Engineering and Psychology from the University of Michigan, Ann Arbor. His industrial experience spans starting the Computer Architecture Group at Microsoft Research (2006-2009), and various product and research positions at Intel Corporation, Advanced Micro Devices, VMware, and Google. He received the inaugural IEEE Computer Society Young Computer Architect Award, the inaugural Intel Early Career Faculty Award, faculty partnership awards from various companies, and a healthy number of best paper or "Top Pick" paper recognitions at various computer systems and architecture venues. His computer architecture course lectures and materials are freely available on YouTube, and his research group makes software artifacts freely available online. For more information, please see his webpage at

Acceleration Through Hardware Multithreading
Walid Najjar (University of California Riverside, US)
Wednesday, April 5, 09:00

Abstract: Long memory latencies, as measured in CPU clock cycles, are probably the most daunting challenge to modern computer architecture. In multicore designs, long memory latency is mitigated by massive cache hierarchies. This solution presupposes some form of temporal or spatial locality. Irregular applications, by their very nature, suffer from poor data locality, which results in high cache miss rates and long off-chip memory latency. Latency-masking multithreading, where threads relinquish control after issuing a memory request, has been demonstrated to be an effective approach to achieving higher throughput. Multithreaded CPUs are designed for a fixed maximum number of threads tailored to an average application. FPGAs, however, can be customized to specific applications. Their massive parallelism is well known and ideally suited to dynamically managing hundreds, or thousands, of threads. Multithreading, in essence, trades off memory bandwidth for latency. In this talk, I describe how latency-masking multithreaded execution on FPGAs can achieve higher throughput than CPUs and/or GPUs on two sets of applications: sparse linear algebra and database operations.
Speaker bio: Walid A. Najjar is a Professor in the Department of Computer Science and Engineering at the University of California Riverside. His areas of research include computer architectures and compilers for parallel and high-performance computing, embedded systems, FPGA-based code acceleration and reconfigurable computing. He received a B.E. in Electrical Engineering from the American University of Beirut in 1979, and the M.S. and Ph.D. in Computer Engineering from the University of Southern California in 1985 and 1988 respectively. From 1989 to 2000 he was on the faculty of the Department of Computer Science at Colorado State University; before that, he was with the USC Information Sciences Institute. He was elected Fellow of the IEEE and the AAAS.

Enabling Software Engineers to Program Heterogeneous, Reconfigurable SoCs
Cathal McCabe (Xilinx)
Thursday, April 6, 09:00

Abstract: In this talk, modern software trends will be explored with a focus on how we can enable software developers to exploit the benefits of reconfigurable hardware. This talk introduces PYNQ, a new open-source framework for designing with Xilinx Zynq devices, a class of All Programmable Systems on Chip (APSoCs) which integrates multiple processors and Field Programmable Gate Arrays (FPGAs) into single integrated circuits. The main goal of the framework is to make it easier for designers of embedded systems to use APSoCs in their applications. The APSoC is programmed in Python and the code is developed and tested directly on the embedded system. The programmable logic circuits are imported as hardware libraries and programmed through their APIs, in essentially the same way that software libraries are imported and programmed.

The framework combines three main elements:

  • the use of a high-level productivity language, Python in this case
  • Python-callable hardware libraries based on FPGA overlays
  • a web-based architecture incorporating the open-source Jupyter Notebook infrastructure served from Zynq's embedded processors
The result is a programming environment that is web-centric, so it can be accessed from any browser on any computing platform or operating system. It enables software programmers to work at higher levels of design abstraction and to re-use both software and hardware libraries for reconfigurable computing. The framework is inherently extensible and integrates coherently with hardware-dependent code written in C and C++. The talk concludes with an outline of areas for continued development, and a call for community participation.
Speaker bio:

Cathal McCabe is a senior applications engineer in the Xilinx CTO (Chief Technology Officer) department. He is based at the Xilinx European HQ in Dublin, Ireland, and manages the Xilinx University Program in EMEA.

Alongside his existing responsibilities, Cathal has been part of the development team in Xilinx working on hardware and software architectures for PYNQ.


PYNQ Workshop
Monday, April 3, 10:30

PYNQ is an open-source framework that enables programmers who want to use embedded systems to exploit the capabilities of Xilinx Zynq All Programmable SoCs (APSoCs). It allows users to exploit custom hardware in the programmable logic without having to use ASIC-style CAD tools. Instead, the APSoC is programmed in Python, and the code is developed and tested directly on the embedded system. The programmable logic circuits are imported as hardware libraries and programmed through their APIs, in essentially the same way that software libraries are imported and programmed.

The framework combines four main elements: (1) the use of a high-level productivity language, Python in this case; (2) Python-callable hardware libraries based on FPGA overlays; (3) a web-based architecture incorporating the open-source Jupyter Notebook infrastructure served from Zynq's embedded processors; and (4) Jupyter Notebook's client-side web apps. The result is a web-centric programming environment that enables software programmers to work at higher levels of design abstraction and to re-use both software and hardware libraries.

This tutorial will give a hands-on introduction to the PYNQ framework. It will feature the latest version of PYNQ with Python 3.6 and asyncio support for processor and fabric interrupts. Several new overlays will be introduced, along with examples of overlay creation and binding into the PYNQ framework.

ρ-VEX Tutorial
Computer Engineering Laboratory, Delft University of Technology
Friday, April 7, 10:00

On the last day of ARC, a tutorial is organized to familiarize the participants with the ρ-VEX platform that is developed at Delft University of Technology. It is an open-source implementation of a design-time reconfigurable and run-time parameterizable VLIW processor. Design-time reconfigurability is realized through highly generic VHDL code. The platform comes with a complete toolchain, simulator, debug & trace hardware, and interfacing software.

The tutorial will highlight two use cases of the platform:

  • the FPGA prototype of the dynamic core
  • an FPGA overlay fabric consisting of 64 cores running at 200 MHz, targeting streaming image processing workloads

There will also be room for participants to port their application of interest to one (or both) of the platforms, to experiment with either the reconfigurable properties or the streaming fabric under the guidance of the ρ-VEX developers. We have an industrial-grade compiler, floating-point emulation, math and C standard libraries, and a simple Linux port, so we expect to be able to run most applications that are not too complex.
For more information about the platform, see
A full release (4.1) is available on the site if you wish to do some experiments before the tutorial.

Preliminary Program:

  • Intro
  • Demos
  • Release download & setup
  • Lunch
  • Hands-on running programs (compilation, simulation, synthesis & circuit simulation, run on board)
  • Dynamic core
  • Streaming platform
  • Maybe some larger programs that need OS support (FreeRTOS/Linux)
  • (configuration) Scheduling
  • Running participants' code

Detailed Program

TUESDAY (April 04, 2017)

Speaker: Onur Mutlu (ETH Zurich)
Session 1 - Adaptive Architectures
Improving the Performance of Adaptive Cache in Reconfigurable VLIW Processor
Sensen Hu, Anthony Brandon, Qi Guo and Yizhuo Wang
LP-P2IP: A Low-power Version of P2IP Architecture using Partial Reconfiguration (FP)
Álvaro Avelino, Valentin Obac, Naim Harb, Carlos Valderrama, Glauberto Albuquerque and Paulo Possa
NIM: An HMC-based Machine for Neuron Computation (SP)
Geraldo F. Oliveira, Paulo C. Santos, Marco A. Z. Alves and Luigi Carro
VLIW-based FPGA Computation Fabric for Medical Imaging (SP)
Joost Hoozemans, Rolf Heij, Jeroen van Straten and Zaid Al-Ars
Session 2 - Embedded Computing and Security
Hardware Sandboxing: A Novel Defense Paradigm Against Hardware Trojans in Systems on Chip (FP)
Christophe Bobda, Joshua Mead, Taylor Whitaker, Charles Kamhoua and Kevin Kwiat
Rapid Development of Gzip with MaxJ (FP)
Nils Voss, Tobias Becker, Oskar Mencer and Georgi Gaydadjiev
On the Use of (Non-)Cryptographic Hashes on FPGAs (SP)
Andreas Fiessler, Daniel Loebenberger, Sven Hager and Björn Scheuermann
An FPGA-based Implementation of a Pipelined FFT Processor for High-Speed Signal Processing Applications (SP)
Ngoc-Hung Nguyen, Sheraz Khan, Cheol-Hong Kim and Jong-Myon Kim
Session 3 - Simulation and Synthesis
Soft timing closure for soft programmable logic cores: The ARGen approach (FP)
Théotime Bollengier, Loïc Lagadec, Mohamad Najem, Jean-Christophe Le Lann and Pierre Guilloux
FPGA Debugging with MATLAB using a Rule-based Inference System (FP)
Habib Ul Hasan Khan and Diana Göhringer
Hardness Analysis and Instrumentation of Verilog Gate Level Code for FPGA-based Designs (SP)
Abdul Rafay Khatri, Ali Hayek and Josef Börcsök
A Framework for High Level Simulation and Optimization of Coarse-Grained Reconfigurable Architectures (SP)
Muhammad Adeel Pasha, Umer Farooq, Muhammad Ali and Bilal Siddiqui

WEDNESDAY (April 05, 2017)

Speaker: Walid Najjar (University of California Riverside, US)
Session 4 - Design Space Exploration
Parameter Sensitivity in Virtual FPGA Architectures (FP)
Peter Figuli, Weiqiao Ding, Shalina Percy Delicia Figuli, Kostas Siozios, Dimitrios Soudris and Jürgen Becker
Custom Framework for Run-time Trading Strategies (FP)
Andreea Ingrid Funie, Liucheng Guo, Xinyu Niu, Wayne Luk and Mark Salmon
Exploring HLS Optimizations for Efficient Stereo Matching Hardware Implementation (SP)
Karim M. A. Ali, Rabie Ben Atitallah, Nizar Fakhfakh and Jean-Luc Dekeyser
Architecture Reconfiguration as a Mechanism for Sustainable Performance of Embedded Systems in case of Variations in Available Power (SP)
Dimple Sharma, Victor Dimitriu and Lev Kirischian
Session 5 - Fault Tolerance
Exploring Performance and Soft Error Recovery in Dual-Core LockStep ARM A9 Processor Embedded into Xilinx Zynq-7000 APSoC (FP)
Ádria Oliveira, Lucas Antunes Tambara and Fernanda Kastensmidt
Applying TMR in Hardware Accelerators Generated by High-Level Synthesis Design Flow for Mitigating Multiple Bit Upsets in SRAM-based FPGAs (FP)
André Flores Dos Santos, Fabio Benevenuti, Lucas Tambara, Jorge Tonfat and Fernanda Lima Kastensmidt

THURSDAY (April 06, 2017)

Speaker: Cathal McCabe (Xilinx)
Session 6 - FPGA Based Designs
FPGA Applications in Unmanned Aerial Vehicles - A Review (FP)
Mustapha Bouhali, Farid Shamani, Zine Elabadine Dahmane, Abdelkader Belaidi and Jari Nurmi
Genomic Data Clustering on FPGAs for Compression (FP)
Enrico Petraglio, Rick Wertenbroek, Flavio Capitao, Nicolas Guex, Christian Iseli and Yann Thoma
A Quantitative Analysis of the Memory Architecture of FPGA-SoCs (FP)
Matthias Göbel, Ahmed Elhossini, Chi Ching Chi, Mauricio Alvarez Mesa and Ben Juurlink
Best Paper Announcement - Award sponsored by Springer
Session 7 - Neural Networks
Optimizing CNN-based Object Detection Algorithms on Embedded FPGA Platforms (FP)
Ruizhe Zhao, Xinyu Niu, Yajie Wu, Wayne Luk and Qiang Liu
An FPGA Realization of a Deep Convolutional Neural Network using a Threshold Neuron Pruning (FP)
Tomoya Fujii, Shimpei Sato, Hiroki Nakahara and Masato Motomura
Accuracy Evaluation of Long Short Term Memory Network Based Language Model with Fixed-Point Arithmetic (SP)
Ruochun Jin, Jingfei Jiang and Yong Dou
FPGA Implementation of a Short Read Mapping Accelerator (SP)
Mostafa Morshedi and Hamid Noori
Session 8 - Languages and Estimation Techniques
dfesnippets: An Open-Source Library for Data flow Acceleration on FPGAs (FP)
Paul Grigoras, Pavel Burovskiy, James Arram, Xinyu Niu, Kit Cheung, Junyi Xie and Wayne Luk
A Machine Learning Methodology for Cache Recommendation (FP)
Osvaldo Navarro, Jones Mori, Javier Hoffmann, Fabian Stuckmann and Michael Hübner
ArPALib: A Big Number Arithmetic Library for Hardware and Software implementations. A Case Study for the Miller-Rabin Primality Test (SP)
Jan Macheta, Agnieszka Dąbrowska-Boruch, Paweł Russek and Kazimierz Wiatr