One of the biggest challenges facing audio signals after digital encoding is the massive amount of data that must be stored and transmitted. Digital audio compression technology is a very important part of the digital television broadcasting system: compression efficiency and compression quality directly affect the transmission efficiency of digital TV broadcasting and the quality of the delivered audio and video. This paper presents a brief analysis of digital audio compression technology.
Digital signals have obvious advantages over analog signals, but they also have a corresponding disadvantage: they require greater storage capacity and more channel capacity for transmission. Audio compression technology refers to applying appropriate digital signal processing techniques to the original digital audio stream (PCM encoded) to reduce (compress) its bit rate with no loss of useful information, or with negligible loss; this is called compression coding. It must have a corresponding inverse transform, called decompression or decoding. In general, audio compression technology can be divided into two categories: lossless data compression and lossy data compression.
Lossless Data Compression A lossless compression scheme recovers the original data bit by bit after decompression. It eliminates statistical redundancy in the audio signal by predicting sample values from past samples. Only a small compression ratio can be achieved, at best about 2:1, depending on the complexity of the original audio signal. Lossless compression is made feasible by time-domain predictive coding techniques, which include the following:
1. Differential algorithm The audio signal contains repetitive sounds, as well as a large amount of redundant and perceptually irrelevant content. Repeated data is removed during encoding and reintroduced during decoding. The audio signal is first decomposed into a number of sub-bands containing discrete tones. DPCM is then applied using a predictor suited to short-term periodic signals. The encoding is adaptive: it monitors the input signal energy and modifies the quantization step size accordingly. This leads to so-called adaptive DPCM (ADPCM).
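The idea can be sketched in a few lines of Python. This is a toy illustration of ADPCM, not any standardized codec: the 4-bit residual range and the step-adaptation rule below are ad hoc choices (real codecs such as IMA ADPCM use fixed adaptation tables).

```python
# Toy ADPCM sketch: transmit quantized prediction residuals and adapt
# the quantization step size to the signal. Illustrative only.

def adpcm_encode(samples, step=16):
    """Encode samples as 4-bit quantized residuals with adaptive step size."""
    pred, codes = 0, []
    for s in samples:
        residual = s - pred                          # predict "same as last sample"
        q = max(-8, min(7, round(residual / step)))  # 4-bit residual code
        codes.append(q)
        pred += q * step                             # track decoder reconstruction
        # ad hoc adaptation: grow step on large residuals, shrink otherwise
        step = max(4, min(512, step * 2 if abs(q) >= 6 else step * 9 // 10))
    return codes

def adpcm_decode(codes, step=16):
    """Mirror the encoder's prediction and step adaptation exactly."""
    pred, out = 0, []
    for q in codes:
        pred += q * step
        out.append(pred)
        step = max(4, min(512, step * 2 if abs(q) >= 6 else step * 9 // 10))
    return out
```

Because the decoder repeats the encoder's prediction and adaptation, the two stay in lockstep; only the small residual codes need to be transmitted.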
2. Entropy coding The entropy encoder exploits redundancy in the representation of the quantized subband coefficients to improve coding efficiency. These coefficients are transmitted in order of increasing frequency, producing large values at low frequencies and long runs of near-zero values at high frequencies. Variable-length codes (VLCs) are taken from different Huffman tables, choosing the table that most closely matches the statistics of the low- and high-frequency values.
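A generic Huffman table can be built from symbol statistics with Python's `heapq`; this sketch shows the principle (frequent values get short codes) and is not the specific Huffman tables defined by any audio standard:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman table {symbol: bitstring} from sample data."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    # heap entries: (count, tiebreaker, list of symbols in this subtree)
    heap = [(n, i, [s]) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    code = {s: "" for s in freq}
    tick = len(heap)
    while len(heap) > 1:
        n1, _, g1 = heapq.heappop(heap)      # merge the two rarest subtrees
        n2, _, g2 = heapq.heappop(heap)
        for s in g1:
            code[s] = "0" + code[s]          # prepend a bit for each merge
        for s in g2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (n1 + n2, tick, g1 + g2))
        tick += 1
    return code
```

Rare symbols sit deep in the merge tree and therefore receive long codes; frequent symbols (such as the near-zero high-frequency coefficients mentioned above) receive short ones.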
3. Block floating-point system The binary values from the A/D conversion process are grouped into data blocks, either in the time domain, using adjacent samples at the transmitting end of the A/D conversion, or in the frequency domain, using adjacent frequency coefficients from the FDCT output. The binary values in each data block are then scaled up so that the largest value is just below full scale. This scaling factor is called the exponent and is common to all values in the block.
Therefore, each value can be represented by a mantissa (the sample value) and an exponent. The bit allocation calculation is derived from a model of the human auditory system (HAS), and data-rate compression is achieved by sending the exponent only once per data block. Coding performance is good, but the noise is correlated with the signal content. Masking techniques help to reduce this audible noise.
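The shared-exponent idea can be sketched as follows. This is a simplified illustration that assumes samples normalized to [-1.0, 1.0) and 8-bit mantissas; the function names and parameters are hypothetical:

```python
def block_float_encode(block, mantissa_bits=8):
    """Share one exponent per block: scale so the peak sample is just below full scale."""
    peak = max(abs(v) for v in block) or 1.0  # avoid a zero peak for silent blocks
    exponent = 0
    while peak * 2 ** exponent < 0.5:         # shift up until peak is in [0.5, 1.0)
        exponent += 1
    scale = 2 ** exponent
    q = 2 ** (mantissa_bits - 1)
    # quantize scaled samples to signed mantissas, clamped to the mantissa range
    mantissas = [max(-q, min(q - 1, round(v * scale * q))) for v in block]
    return exponent, mantissas

def block_float_decode(exponent, mantissas, mantissa_bits=8):
    """Undo the mantissa quantization and the shared block scaling."""
    q = 2 ** (mantissa_bits - 1)
    return [m / q / 2 ** exponent for m in mantissas]
```

Only one exponent is sent per block, so the side-information cost is amortized over all samples in the block.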
Lossy Data Compression Lossy data compression combines two or more processing techniques to exploit the fact that the HAS cannot detect specific spectral components in the presence of other, higher-amplitude components. In this way, high-performance data compression schemes with much higher compression ratios, from 2:1 up to 20:1, can be obtained, depending on the complexity of the encoding/decoding process and the audio quality requirements.
Lossy data compression systems use perceptual coding techniques. The basic principle is to discard all signal components below the masking threshold curve, eliminating perceptual redundancy in the audio signal. These lossy compression systems are therefore also described as perceptually lossless. Perceptually lossless compression is feasible thanks to a combination of several techniques, such as:
1. Time and frequency domain masking of signal components.
2. Quantization-noise masking of each audible tone, achieved by assigning enough bits to ensure that the quantization noise level always stays below the masking curve. At frequencies near an audible signal, an SNR of 20 or 30 dB is acceptable.
3. Joint coding. This technique takes advantage of redundancy in multi-channel audio systems, where a large amount of identical data appears in all channels. Data compression can therefore be obtained by encoding this common data once and signaling to the decoder that it must be repeated in the other channels.
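One common form of joint coding is mid/side stereo, sketched below. When the two channels carry identical content, the side channel collapses to zero and compresses to almost nothing. This is an illustrative matrix, not the exact joint-stereo scheme of any particular standard:

```python
def ms_encode(left, right):
    """Mid/side transform: shared content goes to mid, differences to side."""
    mid  = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Invert the mid/side transform back to left/right channels."""
    left  = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

The transform is exactly invertible; the gain comes from the side channel being small (often near zero) for typical stereo material, so it needs far fewer bits.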
The most important masking effect exploited in the audio coding process is in the frequency domain. To take advantage of this property, the audio signal spectrum is decomposed into multiple sub-bands whose time and frequency resolution is matched to the critical bandwidths of the HAS.
The structure of the perceptual encoder consists of the following parts:
1. Multi-band filters, often referred to as filter banks, whose function is to decompose the spectrum into sub-bands.
2. The bit allocator, which estimates the masking threshold and assigns bits based on the spectral energy of the audio signal and a psychoacoustic model.
3. The conversion and quantization processor.
4. The data multiplexer, which receives the quantized data and adds the side information (bit allocation and scale factor information) needed by the decoding process.
3.1 Filter banks (there are three types of filter banks)
(1) Subband filter bank. The signal spectrum is divided into frequency sub-bands of equal width. This resembles the frequency analysis performed by the HAS, which divides the audio spectrum into critical bands of variable width: about 100 Hz below 500 Hz, increasing to several kHz above 10 kHz. Sub-bands below 500 Hz therefore contain several critical bands. The subband filters have small overlaps and typically operate on adjacent time samples. Each subband signal is then uniformly quantized, with the bit allocation for that sub-band chosen to maintain a positive mask-to-noise ratio (MNR); this ratio is positive when the masking curve lies above the noise curve.
(2) Transform filter bank. The modified DCT (MDCT) algorithm is commonly used to convert time-domain audio signals into a large number of sub-bands (256 to 1024). There is also some overlap in this filter bank.
(3) Hybrid filter bank. This consists of a subband filter bank followed by an MDCT filter bank. The combination provides finer frequency resolution.
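The MDCT used in the transform and hybrid filter banks can be sketched in plain Python. This is a minimal, unoptimized reference (real encoders use fast algorithms); the sine window and the 2/N inverse scaling used here are one common convention:

```python
import math

def mdct(x, N):
    """Forward MDCT: 2N windowed time samples -> N subband coefficients."""
    w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
    return [sum(w[n] * x[n] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(X, N):
    """Inverse MDCT: N coefficients -> 2N windowed samples (with time aliasing)."""
    w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
    return [(2 / N) * w[n] *
            sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N))
            for n in range(2 * N)]
```

Each block of 2N samples overlaps its neighbor by N samples; the time-domain aliasing left in each inverse transform cancels when consecutive blocks are overlap-added, which is why the MDCT can be critically sampled despite the 50% block overlap.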
3.2 Perceptual Model, Masking Curves, and Bit Allocation An accurate psychoacoustic analysis of the input PCM signal's frequency and energy content is performed using the fast Fourier transform (FFT) algorithm. The masking curve is calculated from the hearing threshold and the frequency-masking properties of the HAS; its shape and level depend on the signal content. The difference between the spectral signal envelope and the masking curve determines the maximum number of bits (based on 6 dB per bit) required to encode all spectral components of the audio signal. This bit allocation process ensures that the quantization noise remains below the audible threshold.
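The 6 dB-per-bit rule can be sketched as a toy per-band allocator. The function name and its inputs (per-band signal and mask levels in dB) are illustrative, not part of any standard:

```python
import math

def allocate_bits(signal_db, mask_db, max_bits=16):
    """Per-band bit allocation: roughly 6 dB of SNR per bit of resolution."""
    bits = []
    for sig, mask in zip(signal_db, mask_db):
        smr = sig - mask                      # signal-to-mask ratio in dB
        b = 0 if smr <= 0 else min(max_bits, math.ceil(smr / 6))
        bits.append(b)                        # 0 bits: band is fully masked
    return bits
```

Bands whose signal level lies below the masking curve receive zero bits and are simply not transmitted, which is where most of the compression gain comes from.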
The masking threshold for each subband is derived from the masking curve. Each threshold determines the maximum quantization noise energy acceptable in that sub-band; at the threshold, the noise of a perceptually lossless compression system is on the verge of becoming audible.
3.3 Scaling and Quantization The output samples from each subband filter are converted and quantized by one of two methods:
(1) Block floating-point system. The system normalizes the maximum value in each data block to full scale. This block scaling factor is transmitted within the data stream, and the decoder uses it to scale down all data values in the block. In the first MPEG layer, a data block consists of 12 consecutive samples, and an audio frame consists of 384 samples (32 sub-bands, 12 samples per sub-band). The values in each data block are then quantized, with the quantization step size determined by the bit allocator.
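A simplified sketch of this per-block scaling and quantization follows. Real MPEG Layer I draws scale factors from a fixed table and uses specific midtread quantizers; here the scale factor is just the block peak, and the function names are hypothetical:

```python
def block_quantize(subband, nbits):
    """Quantize one 12-sample subband block with a shared scale factor."""
    scale = max(abs(s) for s in subband) or 1.0   # one scale factor per block
    levels = 2 ** (nbits - 1) - 1                 # symmetric signed range
    codes = [round(s / scale * levels) for s in subband]
    return scale, codes

def block_dequantize(scale, codes, nbits):
    """Decoder side: undo the quantization using the transmitted scale factor."""
    levels = 2 ** (nbits - 1) - 1
    return [c * scale / levels for c in codes]
```

The quantization error is bounded by half a step at the block's scale, so louder blocks tolerate proportionally larger absolute noise, consistent with masking.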
(2) Noise allocation and scalar quantization. In the previous method, each sub-band has its own scaling factor.
The second method uses the same scaling factor for several bands of approximately critical bandwidth. The value of this scaling factor is not derived from the standard bit-allocation process but is part of the noise-allocation process; no separate bit allocation is performed. After the sub-band masking threshold is estimated, the scaling factor is used to modify all the quantization step sizes in the scale-factor band, shaping the quantization noise to better match the threshold across frequency. A non-uniform quantization process adapts the quantization noise to the signal amplitude in an optimized way. The audio spectral values are then encoded with Huffman coding for better data compression.
3.4 Data Multiplexer The blocks of 12 data samples from each quantizer output are multiplexed with the corresponding scale factors and bit allocation information to form an audio frame in the encoded bitstream. Optional ancillary data can also be inserted into the bitstream. The MPEG standard does not specify the types of data that can be transmitted or how they are formatted in the bitstream.