Review of prominent strategies for mapping CNNs onto embedded systems

Authors

  • Moises Arredondo-Velazquez Electronics Engineering Department, Technological Institute of Celaya, https://orcid.org/0000-0003-0198-274X
  • Javier Diaz-Carmona Instituto Tecnológico de Celaya, Tecnológico Nacional de México, México, Av. Tecnológico y G. Cubas, s/n, 38010 Celaya, GTO, Mexico.
  • Alejandro-Israel Barranco-Gutiérrez Instituto Tecnológico de Celaya, Tecnológico Nacional de México, México, Av. Tecnológico y G. Cubas, s/n, 38010 Celaya, GTO, Mexico. https://orcid.org/0000-0002-5050-6208
  • Cesar Torres-Huitzil Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Campus Puebla, Av. Atlixcayotl 5718, Puebla C.P. 72453 Puebla, Mexico. https://orcid.org/0000-0002-8980-0615

Keywords:

Convolutional Neural Networks (CNN), Deep Learning, Embedded systems, Field Programmable Gate Arrays (FPGAs), Hardware accelerators, Layer Operation Chaining, Machine Learning, Single computation engine, Streaming architectures

Abstract

Convolutional neural networks (CNNs) have become one of the key machine learning algorithms for content classification of digital images. Nevertheless, the computational complexity of CNNs is considerably larger than that of classic algorithms; thus, CPU- or GPU-based platforms are generally used for CNN implementations in many applications, but these often do not meet portability requirements due to resource, energy and real-time constraints. Therefore, there is growing interest in real-time object recognition solutions based on CNNs implemented on embedded systems, which are limited in both resources and energy consumption. This paper presents an updated review of prominent reported approaches for mapping CNNs onto embedded systems. Through a deduced taxonomy, two main solution trends for reducing the hardware CNN workload are distinguished. The first focuses on algorithm-level solutions that reduce the number of multiplications and CNN coefficients. The second comprises hardware-level solutions whose goal is to reduce processing time, power consumption and hardware resources. Two dominant hardware-level design strategies are identified: one oriented to reducing energy consumption and resource utilization while meeting real-time requirements, and another oriented to increasing throughput at the expense of resource utilization. Finally, two identified design strategies for CNN hardware accelerators are proposed as research opportunity areas.
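As a rough illustration of the algorithm-level trend mentioned in the abstract, the following Python sketch shows one common way to reduce coefficient storage and move the multiply-accumulate work to integer arithmetic: symmetric 8-bit quantization of a convolution kernel and feature map. The array sizes, function names and scaling scheme are illustrative assumptions for this sketch only; they are not taken from the reviewed works.

    import numpy as np

    def quantize_int8(w):
        """Symmetric per-tensor quantization: float32 values -> int8 plus a scale factor."""
        scale = np.max(np.abs(w)) / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def conv2d_int8(x_q, w_q, x_scale, w_scale):
        """Valid 2-D convolution using integer MACs; a single float rescale at the end."""
        H, W = x_q.shape
        k, _ = w_q.shape
        out = np.zeros((H - k + 1, W - k + 1), dtype=np.int32)
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(
                    x_q[i:i + k, j:j + k].astype(np.int32) * w_q.astype(np.int32)
                )
        return out.astype(np.float32) * (x_scale * w_scale)

    # Toy usage: each 3x3 kernel coefficient is stored in 1 byte instead of 4.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8)).astype(np.float32)
    w = rng.standard_normal((3, 3)).astype(np.float32)
    x_q, x_s = quantize_int8(x)
    w_q, w_s = quantize_int8(w)
    print(conv2d_int8(x_q, w_q, x_s, w_s))

The same idea underlies many embedded CNN mappings: narrower coefficients shrink on-chip memory traffic, and integer multipliers are cheaper than floating-point units on FPGAs and microcontrollers.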

Published

2020-04-24

How to Cite

Arredondo-Velazquez, M., Diaz-Carmona, J., Barranco-Gutiérrez, A.-I., & Torres-Huitzil, C. (2020). Review of prominent strategies for mapping CNNs onto embedded systems. IEEE Latin America Transactions, 18(5), 971–982. Retrieved from https://latamt.ieeer9.org/index.php/transactions/article/view/3104