Application of Electronic Technique (《电子技术应用》)
Parallel accelerator design for convolutional neural networks based on FPGA
Application of Electronic Technique, 2021, Issue 2
Wang Ting, Chen Binyue, Zhang Fuhai
College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, China
Abstract: In recent years, convolutional neural networks (CNNs) have played an increasingly important role in many fields, but power consumption and speed remain the main factors limiting their application. To overcome these limitations, a CNN parallel accelerator based on an FPGA platform is designed, with the Ultra96-V2 as the experimental development platform. The CNN computing IP core is designed and implemented with a high-level synthesis tool, and the complete FPGA-based CNN accelerator system is built with the Vivado development tools. Comparison experiments on recognition rate against a GPU and a CPU show that the FPGA-optimized CNN takes much less time than the CPU to process one image and consumes more than 30 times less power than the GPU. These results show the performance and power advantages of the FPGA accelerator design and verify the effectiveness of the method.
CLC number: TN402
Document code: A
DOI: 10.16157/j.issn.0258-7998.200858
Chinese citation format: 王婷,陈斌岳,张福海. 基于FPGA的卷积神经网络并行加速器设计[J]. 电子技术应用,2021,47(2):81-84.
English citation format: Wang Ting, Chen Binyue, Zhang Fuhai. Parallel accelerator design for convolutional neural networks based on FPGA[J]. Application of Electronic Technique, 2021, 47(2): 81-84.
Key words: parallel computing; convolutional neural network; accelerator; pipeline

0 Introduction

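    With the rapid development of artificial intelligence, convolutional neural networks (CNNs) have attracted growing attention. Because of their high adaptability and excellent recognition ability, they have been widely applied to classification and recognition, object detection, object tracking, and other fields [1]. Compared with traditional algorithms, a CNN has much higher computational complexity, and general-purpose CPUs can no longer meet the computing demand. At present, the main solution is to run CNN computation on GPUs. Although GPUs have a natural advantage in parallel computing, they have major drawbacks in cost and power consumption. An implementation of the CNN inference process occupies a large footprint and consumes a great deal of energy [2], so it cannot meet the CNN computing requirements of terminal systems. FPGAs offer strong parallel processing capability, flexible configurability, and very low power consumption, which makes them an ideal platform for implementing CNNs. Their reconfigurability also suits neural network structures that keep changing. For these reasons, many researchers have studied methods of accelerating CNNs with FPGAs [3]. This paper draws on the structure of MobileNet, the lightweight network proposed by Google [4], designs a high-speed CNN system on an FPGA using parallel processing and a pipelined structure, and compares it with CPU and GPU implementations.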
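The introduction does not say here why MobileNet counts as lightweight. A minimal sketch, assuming the depthwise separable convolution used by MobileNet [4] (a depthwise k x k convolution followed by a 1 x 1 pointwise convolution), can compare multiply-accumulate (MAC) counts with a standard convolution. The function names and the layer dimensions are hypothetical, chosen only for illustration, and are not taken from the paper's design.

```python
def standard_conv_macs(out_h, out_w, kernel, in_ch, out_ch):
    """MACs for a standard kernel x kernel convolution layer."""
    return out_h * out_w * kernel * kernel * in_ch * out_ch


def depthwise_separable_macs(out_h, out_w, kernel, in_ch, out_ch):
    """MACs for a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = out_h * out_w * kernel * kernel * in_ch  # one filter per input channel
    pointwise = out_h * out_w * in_ch * out_ch           # 1x1 conv mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer shape (assumption, not from the paper).
    shape = dict(out_h=56, out_w=56, kernel=3, in_ch=64, out_ch=128)
    std = standard_conv_macs(**shape)
    sep = depthwise_separable_macs(**shape)
    print(f"standard conv:       {std} MACs")
    print(f"depthwise separable: {sep} MACs")
    # The ratio equals 1/out_ch + 1/kernel**2 for any output size.
    print(f"ratio: {sep / std:.4f}")
```

For a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is one reason such a network is a plausible fit for a resource-limited FPGA.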




The full text of this article can be downloaded at: http://m.ihrv.cn/resource/share/2000003393




