Xilinx INT8 optimization for embedded vision

Xilinx INT8 optimization delivers leading performance and energy efficiency for embedded vision applications that use deep learning inference and traditional computer vision. Compared with other FPGA DSP architectures, Xilinx's integrated DSP architecture delivers 1.75x solution-level performance on INT8 deep learning operations.

This white paper explores how INT8 operations for embedded vision applications that combine deep learning inference and computer vision can be implemented on Xilinx DSP48E2 slices, and compares the result with other FPGAs. Using the same amount of resources, Xilinx's DSP architecture achieves 1.75x the peak solution-level performance of other FPGAs on INT8 multiply-accumulate (MACC) operations. Because embedded vision applications can tolerate lower bit precision without sacrificing accuracy, an efficient INT8 implementation is needed.
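As a rough illustration of what running at lower bit precision means in practice, the short C sketch below (an assumption-level example, not taken from the white paper) quantizes a handful of floating-point weights to INT8 with a single symmetric scale factor and prints the reconstruction error, which stays within one quantization step.

```c
#include <stdio.h>
#include <math.h>
#include <stdint.h>

/* Symmetric per-tensor INT8 quantization: map floats in [-max, max]
 * onto the signed 8-bit range [-127, 127] with one scale factor.
 * Illustrative values only. */
int main(void)
{
    const float w[5] = {0.42f, -1.30f, 0.07f, 0.95f, -0.61f};

    /* Scale chosen from the largest magnitude in the tensor. */
    float max_abs = 0.0f;
    for (int i = 0; i < 5; i++)
        if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);
    float scale = max_abs / 127.0f;

    for (int i = 0; i < 5; i++) {
        int8_t q = (int8_t)lrintf(w[i] / scale);   /* quantize   */
        float  r = q * scale;                      /* dequantize */
        printf("w=% .4f  q=%4d  reconstructed=% .4f  err=% .5f\n",
               w[i], q, r, r - w[i]);
    }
    return 0;
}
```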

Xilinx's DSP architecture and libraries are carefully optimized for INT8 operations. This white paper describes how the DSP48E2 slice in Xilinx 16nm and 20nm All Programmable devices can perform two parallel INT8 MACC operations that share the same kernel weight. It also explains why this Xilinx-specific technique requires a minimum input bit width of 24 bits, and details how the DSP48E2 slice can be used in SIMD mode for basic arithmetic operations. Examples of how these capabilities can be applied to deep learning and other computer vision processing tasks in embedded vision are also provided.
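To make the shared-weight technique concrete, the following C sketch emulates the underlying arithmetic; it is an illustration under stated assumptions, not Xilinx library or RTL code. Two INT8 activations a and b are packed into one wide operand so that a single wide multiplication by a shared INT8 weight w produces both products at once. Since each 8-bit x 8-bit product occupies 16 bits, the upper operand must be shifted left by at least 16 bits, which is why the packed input needs at least 8 + 16 = 24 bits; the shift of 18 used here simply mirrors the DSP48E2's 27x18 multiplier ports. The function name dual_int8_macc and the unpacking details are illustrative choices.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative sketch only: emulate packing two INT8 activations a and b
 * into one wide operand so that a single wide multiplication by a shared
 * INT8 weight w yields both products. */
static void dual_int8_macc(int8_t a, int8_t b, int8_t w,
                           int32_t *prod_a, int32_t *prod_b)
{
    /* Pack: a sits in the upper bits, b in the lower 18 bits. */
    int64_t packed = ((int64_t)a << 18) + (int64_t)b;

    /* One wide multiply produces both partial products:
     * packed * w = (a*w << 18) + b*w                       */
    int64_t wide = packed * (int64_t)w;

    /* Recover b*w from the lower 18-bit field (sign-extended). */
    int32_t low = (int32_t)(wide & 0x3FFFF);
    if (low & 0x20000)
        low -= 1 << 18;

    /* Remove b*w before extracting a*w; this corrects for the
     * borrow that a negative b*w causes in the upper field.   */
    int32_t high = (int32_t)((wide - (int64_t)low) / (1LL << 18));

    *prod_a = high;   /* equals a*w */
    *prod_b = low;    /* equals b*w */
}

int main(void)
{
    int32_t pa, pb;
    dual_int8_macc(-113, 87, 42, &pa, &pb);
    printf("a*w = %d (expected %d)\n", pa, -113 * 42);
    printf("b*w = %d (expected %d)\n", pb,  87 * 42);
    return 0;
}
```

In the device itself, the two partial products can then be accumulated in separate fields of the DSP48E2's 48-bit accumulator across a MACC chain, with the spare bits of the 18-bit spacing acting as guard bits; the sketch above only demonstrates the packing for a single multiply.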

Contents

INT8 for deep learning and computer vision

INT8 operations on Xilinx DSP slices

Scalable INT8 optimization

DSP48E2 SIMD mode

Mapping INT8 optimization to deep learning applications

Other ways to create chained INT8 MACCs

Mapping INT8 optimization to computer vision

Custom 2D convolution with scalable INT8 optimization

Median filter using SIMD operations

Competitive analysis

The competitive analysis compares Intel's Arria 10 devices with Xilinx's Zynq® UltraScale+™ MPSoC devices. For comparing the computational efficiency of embedded vision applications, the selected devices have comparable DSP density and device power consumption:

• Arria 10 SoC: SX220, SX270 and SX480

• Zynq UltraScale+ MPSoC: ZU3, ZU7 and ZU9 devices

The analysis focuses on general MACC performance, which applies to a wide range of applications including deep learning and computer vision.
