GPU Basics in Detail

A graphics processing unit (GPU), also known as a display core, visual processor, or display chip, is a microprocessor dedicated to image and graphics operations in personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones).

The graphics processing unit (GPU for short) handles everything that ends up on the connected display: whether you are playing games, editing video, or just staring at the desktop, every image you see was rendered by the GPU. This article introduces what the GPU is, what it does, and why a dedicated graphics card is needed for games and image-intensive applications.


For the average user, no dedicated graphics card is needed just to get content onto the display. On most laptops and tablets, a GPU core is integrated into the CPU chip itself, commonly known as "integrated graphics". For low-power devices with modest performance requirements, this offers better value for money.

Because of this, many laptop, tablet, and some PC users find it difficult, or even impossible, to upgrade to a more powerful graphics processor. The result is poor performance in games (and in video editing, etc.), forcing the graphics quality settings down to low. For desktop users, as long as the motherboard and the remaining space allow it, adding a new graphics card can take (gaming) performance to a new level.

GPU development and status quo

1. The GPU was originally designed to accelerate 3D rendering and was later adopted for general-purpose computation.

2. The GPU now supports general-purpose instructions and can be programmed in traditional C and C++, as well as Fortran.

3. The performance of a single high-end GPU now reaches that of a traditional multi-core CPU cluster.

4. Some applications can achieve a 100x speedup on a GPU compared to a traditional multi-core CPU. Even so, the GPU remains best suited to certain specific kinds of applications.

GPU programming model

1. On the GPU, work is distributed by mapping a function (a kernel) in parallel over a scheduling space; for example, every element of a matrix can be a point in the scheduling space.

2. The kernel describes the work done by a single thread at one point of the scheduling space; one thread is launched for each point in the space.

3. Since the GPU is a coprocessor on a single PCI-e card, data must be explicitly copied from system memory to the GPU onboard memory.

4. The GPU is organized as groups of SIMD units. Within each SIMD group (called a warp, 32 threads in NVIDIA CUDA), all threads execute the same instruction in lockstep. Branches are allowed, but if threads in the same warp take different execution paths, a performance penalty is incurred.

5. For memory-bound applications, adjacent threads in the same warp should access adjacent data elements. This may require rearranging the data layout and data access patterns.

6. The GPU has multiple memory spaces for exploiting different data access patterns. In addition to global memory, there are constant memory (read-only, cached), texture memory (read-only, cached, optimized for accessing neighboring regions of an array), and per-block shared memory (a fast memory space within each multiprocessor, managed explicitly by the programmer).

7. GPU programming has two main platforms: OpenCL, an open industry standard in the spirit of OpenGL; and CUDA, NVIDIA's platform for programming NVIDIA GPUs in C/C++ and Fortran.

8. The OpenCL/CUDA compiler does not automatically convert ordinary C code into GPU code. The programmer's main job is to choose the algorithm and data structures: on a GPU, for example, radix sort and merge sort work better than heapsort and quicksort. Some programming effort is also required to write the necessary CUDA kernel(s), as well as to add code to transfer data to the GPU, launch the kernel(s), and read the results back from the GPU.

What applications are suitable for the GPU

1. Applications with many parallel threads in the kernel.

2. Applications where data exchange happens between threads that are adjacent in the kernel's scheduling space, because per-block shared memory can then be used.

3. Data-parallel applications: many threads do similar work, and loops are the main source of data parallelism.

4. Applications that use operations with good native hardware support, such as reciprocal and reciprocal square root; enable the "fast math" option when compiling to make sure the hardware instructions are used.

5. Applications that do a large amount of computation per data element, or that can make full use of the GPU's wide memory interface.

6. Applications that need little synchronization.

What applications are not suitable for the GPU

1. Applications with a low degree of parallelism: if fewer than about 100 threads are needed, GPU acceleration is not significant.

2. Applications with irregular task parallelism: the application needs many threads, but the threads do different work, so the GPU cannot be used effectively. This still depends on the specific workload and on how often threads are scheduled; some speedup may remain.

3. Applications with frequent global synchronization, which requires a global barrier and carries a large performance overhead.

4. Applications with arbitrary point-to-point synchronization between threads. The GPU does not support this directly; each such synchronization usually has to be implemented as a global barrier. If you want to use the GPU, it is better to restructure the algorithm to avoid the problem.

5. Applications that do little computation relative to the amount of data. Although a CPU+GPU architecture can increase computing performance, the gain is swallowed by the cost of transferring the data to the GPU. For example, the sum of two vectors is generally computed on the CPU, even for very large vectors, because the time spent transferring them to the GPU outweighs the computation.

Hardware requirements

You need an NVIDIA GeForce FX or ATI RADEON 9500 (or better) graphics card. Older graphics cards may not support the features we need, mainly single-precision floating-point data access and arithmetic.

Software Requirements

First, you need a C/C++ compiler. There are many to choose from: Visual Studio .NET 2003, Eclipse 3.1 with CDT/MinGW, the Intel C++ Compiler 9.0, GCC 3.4+, and so on. Then update your graphics card driver to get support for the latest features.

The source code accompanying this article uses two extension libraries, GLUT and GLEW. For Windows, GLUT can be downloaded here; most Linux distributions already ship the freeglut and freeglut-devel packages. GLEW can be downloaded from SourceForge. For the shading language, you can choose GLSL or Cg. GLSL is installed along with the graphics driver; if you want to use Cg, you have to download the Cg Toolkit.

Choose one of them

If you are looking for a DirectX example, take a look at Jens Krüger's "Implicit Water Surface" demo (an OpenGL version of it also appears to exist). Note that this is sample source code, not a tutorial.

There are also metaprogramming languages for GPUs that abstract away graphics shader programming entirely: the underlying shading language is encapsulated so that you can use the advanced features of the graphics card without learning it. BrookGPU and Sh are two well-known projects.


Initialize OpenGL

GLUT

GLUT (OpenGL Utility Toolkit) mainly provides a set of window functions for handling window events and generating simple menus. We use it to set up an OpenGL development environment quickly, with as little code as possible. The toolkit is also highly platform independent and runs on all current mainstream operating systems (MS Windows, Linux/Unix with XFree86/Xorg, and Mac OS).

// include the GLUT header file
#include "GL/glut.h"

// call this and pass the command line arguments from main()
void initGLUT(int argc, char **argv) {
    glutInit(&argc, argv);
    glutCreateWindow("SAXPY TESTS");
}
