Xilinx INT8 optimization for embedded vision

Xilinx INT8 optimization delivers leading performance and energy efficiency for embedded vision applications that use deep learning inference and traditional computer vision. Compared with other FPGA DSP architectures, Xilinx's integrated DSP architecture delivers 1.75x solution-level performance on INT8 deep learning operations.

This white paper explores how INT8 operations for embedded vision applications that combine deep learning inference and computer vision can be implemented on Xilinx DSP48E2 slices, and compares the result with other FPGAs. Using the same amount of resources, Xilinx's DSP architecture achieves 1.75x the peak solution-level performance of other FPGAs on INT8 multiply-accumulate (MACC) operations. Because embedded vision applications can tolerate lower bit precision without sacrificing accuracy, an efficient INT8 implementation is needed.
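As a rough illustration of what running at lower bit precision means in practice, the short C sketch below (an assumption-level example, not taken from the white paper) quantizes a handful of floating-point weights to INT8 with a single symmetric scale factor and prints the reconstruction error, which stays within one quantization step.

```c
#include <stdio.h>
#include <math.h>
#include <stdint.h>

/* Symmetric per-tensor INT8 quantization: map floats in [-max, max]
 * onto the signed 8-bit range [-127, 127] with one scale factor.
 * Illustrative values only. */
int main(void)
{
    const float w[5] = {0.42f, -1.30f, 0.07f, 0.95f, -0.61f};

    /* Scale chosen from the largest magnitude in the tensor. */
    float max_abs = 0.0f;
    for (int i = 0; i < 5; i++)
        if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);
    float scale = max_abs / 127.0f;

    for (int i = 0; i < 5; i++) {
        int8_t q = (int8_t)lrintf(w[i] / scale);   /* quantize   */
        float  r = q * scale;                      /* dequantize */
        printf("w=% .4f  q=%4d  reconstructed=% .4f  err=% .5f\n",
               w[i], q, r, r - w[i]);
    }
    return 0;
}
```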

Xilinx's DSP architecture and libraries are carefully optimized for INT8 operations. This white paper describes how the DSP48E2 slice in Xilinx 16nm and 20nm All Programmable devices can perform two parallel INT8 MACC operations that share the same kernel weight. It also explains why this Xilinx-specific technique requires a minimum input bit width of 24 bits, and details how the DSP48E2 slice can be used in SIMD mode for basic arithmetic operations. Examples of how these capabilities can be applied to deep learning and other computer vision processing tasks in embedded vision are also provided.
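To make the shared-weight technique concrete, the following C sketch emulates the underlying arithmetic; it is an illustration under stated assumptions, not Xilinx library or RTL code. Two INT8 activations a and b are packed into one wide operand so that a single wide multiplication by a shared INT8 weight w produces both products at once. Since each 8-bit x 8-bit product occupies 16 bits, the upper operand must be shifted left by at least 16 bits, which is why the packed input needs at least 8 + 16 = 24 bits; the shift of 18 used here simply mirrors the DSP48E2's 27x18 multiplier ports. The function name dual_int8_macc and the unpacking details are illustrative choices.

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative sketch only: emulate packing two INT8 activations a and b
 * into one wide operand so that a single wide multiplication by a shared
 * INT8 weight w yields both products. */
static void dual_int8_macc(int8_t a, int8_t b, int8_t w,
                           int32_t *prod_a, int32_t *prod_b)
{
    /* Pack: a sits in the upper bits, b in the lower 18 bits. */
    int64_t packed = ((int64_t)a << 18) + (int64_t)b;

    /* One wide multiply produces both partial products:
     * packed * w = (a*w << 18) + b*w                       */
    int64_t wide = packed * (int64_t)w;

    /* Recover b*w from the lower 18-bit field (sign-extended). */
    int32_t low = (int32_t)(wide & 0x3FFFF);
    if (low & 0x20000)
        low -= 1 << 18;

    /* Remove b*w before extracting a*w; this corrects for the
     * borrow that a negative b*w causes in the upper field.   */
    int32_t high = (int32_t)((wide - (int64_t)low) / (1LL << 18));

    *prod_a = high;   /* equals a*w */
    *prod_b = low;    /* equals b*w */
}

int main(void)
{
    int32_t pa, pb;
    dual_int8_macc(-113, 87, 42, &pa, &pb);
    printf("a*w = %d (expected %d)\n", pa, -113 * 42);
    printf("b*w = %d (expected %d)\n", pb,  87 * 42);
    return 0;
}
```

In the device itself, the two partial products can then be accumulated in separate fields of the DSP48E2's 48-bit accumulator across a MACC chain, with the spare bits of the 18-bit spacing acting as guard bits; the sketch above only demonstrates the packing for a single multiply.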

Contents

INT8 for deep learning and computer vision

INT8 operations on Xilinx DSP slices

Scalable INT8 optimization

DSP48E2 SIMD mode

Mapping INT8 optimization to deep learning applications

Other ways to create chained INT8 MACCs

Mapping INT8 optimization to computer vision

Custom 2D convolution with scalable INT8 optimization

Median filter using SIMD operations

Competitive analysis

The competitive analysis compares Intel's Arria 10 devices with Xilinx's Zynq® UltraScale+™ MPSoC devices. For comparing the computational efficiency of embedded vision applications, the selected devices have comparable DSP density and device power consumption:

• Arria 10 SoC: SX220, SX270 and SX480

• Zynq UltraScale+ MPSoC: ZU3, ZU7 and ZU9 devices

The analysis focuses on general MACC performance, which applies to a wide range of applications including deep learning and computer vision.
