DVD Technical Guide
12. Audio Format - Page 2
Review Pages
2. Concepts and Structure of the DVD Format
3. The Future of DVD
4. Design Concept of the Physical Specification
5. Features of the DVD Physical Specification
6. The DVD Data Format
7. Read-Only Disc File Format
8. Video Format
9. Video Format - Page 2
10. Video Format - Page 3
11. Audio Format
12. Audio Format - Page 2
13. Audio Format - Page 3
14. Audio Format - Page 4
15. Audio Format - Page 5
16. DVD-R and DVD-RW
17. DVD-R and DVD-RW - Page 2
18. DVD-R and DVD-RW - Page 3
19. DVD-R and DVD-RW - Page 4
20. DVD-RAM
21. DVD-RAM - Page 2
22. DVD-RAM - Page 3
23. DVD-RAM - Page 4
5.3 Audio Specification
5.3.1 Super Hi-Fi stereo audio
As a result of discussions with the music industry, the highest priority in the development of the DVD-Audio specification was given to the ability to perfectly reproduce the musical characteristics desired by music creators. That is, the producers of music strongly requested the features of complete transparency (that is, the playback sound quality is the same as the production sound quality), complete compatibility with the signal processing, editing, etc. capabilities of current and future studio equipment, and the ability to record past musical assets. As a result, linear PCM was chosen as the encoding scheme for the core audio content. Linear PCM is also used in DVD-Video, but as DVD-Audio gives even more weight to audio quality, the capabilities of DVD-Audio are considerably expanded to provide a 96 kHz bandwidth and a 144 dB dynamic range.
5.3.2 Audio specification details
Audio Specifications
Item | Audio Object | Video Object |
---|---|---|
Encoding methods (mandatory) | | |
Encoding methods (optional) | none | |
Audio specifications for Linear PCM and Packed PCM encoding schemes | | |
Sampling frequency | 48/96/192 kHz, 44.1/88.2/176.4 kHz | 48/96 kHz |
Quantization depth | 16/20/24 bits | 16/20/24 bits |
Maximum number of channels | 6ch (fs: 48/96/44.1/88.2 kHz) or 2ch (fs: 192/176.4 kHz) | 8ch (2ch for Stereo + 6ch for Multi channel) |
Maximum bit rate | 9.6 Mbps (Linear PCM / Packed PCM) | 6.144 Mbps (Linear PCM) |
Frame rate | 1200 Hz (fs: 48/96/192 kHz), 1102.5 Hz (fs: 44.1/88.2/176.4 kHz) | 600 Hz (fs: 48/96 kHz) |
As with DVD-Video, audio data in DVD-Audio is combined with header information and management information to form audio packets, which are then combined into 2048-byte packs. These packs are multiplexed with other packs to form Objects and are recorded to the disc. Two types of objects are specified by DVD-Audio. Audio Objects (AOB) are intended for use in main audio playback, while Video Objects (VOB) are used for playback of images and audio. The encoding methods defined for the two kinds of objects are different. The encoding method for VOB follows and is identical to the DVD-Video specification, to maintain compatibility with that format. The AOB format, however, uses a new Linear PCM (Scalable) and Packed PCM to provide higher audio quality.
In order to satisfy various requests from the music industry, the audio specifications for Audio Objects are based on the DVD-Video specification, with extensions to provide further capabilities.
Sampling rates that are multiples of 44.1 kHz were added to allow the recording of currently-existing music assets with perfect transparency and no processing required. Sampling rates of 192 kHz and 176.4 kHz were added to meet the ultra-high bandwidth demands of next-generation audio discs. The bit rate was expanded to 9.6 Mbps to support specification extensions for multi-channel audio, in addition to providing ultra-high bandwidth.
Lossless compression (Packed PCM) was added as a means to achieve 96 kHz, 24-bit, 6-channel (13.824 Mbps) recording and a greater than 74 minute recording length.
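The 13.824 Mbps figure follows from multiplying sampling frequency, bit depth, and channel count. The short sketch below reproduces that arithmetic and compares it with the 9.6 Mbps maximum; the function and constant names are illustrative and do not come from the specification.

```python
def lpcm_bit_rate(fs_hz, bits, channels):
    """Uncompressed Linear PCM bit rate in bits per second."""
    return fs_hz * bits * channels


# Maximum bit rate for Audio Objects as stated in this guide (9.6 Mbps).
AOB_MAX_BPS = 9_600_000

rate = lpcm_bit_rate(96_000, 24, 6)
print(rate / 1e6)           # 13.824
print(rate <= AOB_MAX_BPS)  # False: too high for plain Linear PCM
```

Because the uncompressed rate exceeds the 9.6 Mbps limit, a stream of this kind can only be carried if lossless packing brings the recorded rate below that limit.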
5.3.3 Data rate and recording time
DVD-Audio takes advantage of the large (4.7 GB) capacity and high (10.08 Mbps) transfer rate of the DVD format to make it possible to record extremely high quality audio content and multi-channel audio content that was not possible with previous media. On the other hand, DVD-Audio also makes it possible to record over 400 minutes of CD-quality audio. The use of lossless compression further expands the recording time and effective transfer rate (to a maximum of 13.824 Mbps before compression).
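As a rough check on these recording times, the sketch below divides disc capacity by the uncompressed Linear PCM bit rate. It assumes that 4.7 GB means 4.7 × 10^9 bytes and that the whole disc holds audio samples, ignoring pack headers and management data, so the results are upper bounds rather than specification values.

```python
DISC_BYTES = 4.7e9  # assumption: 4.7 GB = 4.7e9 bytes, all used for audio


def recording_minutes(fs_hz, bits, channels, disc_bytes=DISC_BYTES):
    """Approximate playing time in minutes for uncompressed Linear PCM."""
    bits_per_second = fs_hz * bits * channels
    return disc_bytes * 8 / bits_per_second / 60


cd_quality = recording_minutes(44_100, 16, 2)     # roughly 444 minutes
stereo_192k = recording_minutes(192_000, 24, 2)   # roughly 68 minutes
six_ch_96k = recording_minutes(96_000, 24, 6)     # roughly 45 minutes

print(round(cd_quality), round(stereo_192k), round(six_ch_96k))
```

The last figure shows why lossless compression matters: without it, 96 kHz, 24-bit, 6-channel content would fall well short of a 74 minute recording length.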
5.3.4 Scalable multi-channel
One major benefit provided by the Linear PCM encoding used in the DVD-Audio specification is scalability. Even in actual multi-channel recording, the surround channel signals, which consist primarily of echo, typically have much lower levels than the front channel signals and also require less bandwidth. In such cases, it is possible to improve efficiency by recording the surround channel signals with lower sampling frequencies and quantization bit depths.
In this context, scalability is the concept of grouping the channels into multiple channel groups according to various parameters of the source data, and then setting the optimal sampling frequency and quantization bit depth for each channel group. A channel group is simply a set of channels which are encoded with a common sampling frequency and bit depth. For example, the front three channels could be encoded at 96 kHz, 24 bits per sample, while the rear two channels and LFE (Low Frequency Effect) channel could be encoded at 48 kHz and 16 bits per sample. Or front right and left channels and rear channels could be encoded at 96 kHz, 20 bits per sample, while the center and LFE channels could be encoded at 48 kHz with 20 bits per sample. The following restrictions apply to channel group audio specifications in DVD-Audio.
- There may be at most two channel groups.
- 192 kHz and 176.4 kHz sampling frequencies may not be used in conjunction with scalability.
- The sampling frequencies must have common factors.
- The sampling frequency and quantization bit depth for channel group 2 must be less than or equal to those of channel group 1.
There are 21 allowable configurations of channels assigned to channel groups, and the relationship between channels, groups, and speaker position is also specified. Even if the sampling frequency and bit depth are common to all channels, the channels must be assigned to two channel groups if there are three or more channels.
Note 1: must include left and right front channels
Note 2: channel group 1 only, two channels max.
Note 3: if the sampling frequency differs between the channel groups, sample timing must be synchronized to the timing of the lower-frequency channel
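The channel group restrictions can be sketched as a small validity check. This is only an illustration under stated assumptions: `ChannelGroup` and `check_groups` are hypothetical names, the "common factors" rule is interpreted here as "the higher sampling frequency is an integer multiple of the lower one", and the 9.6 Mbps limit is applied to the sum of both groups.

```python
from dataclasses import dataclass


@dataclass
class ChannelGroup:
    fs_hz: int      # sampling frequency in Hz
    bits: int       # quantization bit depth
    channels: int   # number of channels in the group

    def bit_rate(self):
        return self.fs_hz * self.bits * self.channels


def check_groups(g1, g2=None, max_bps=9_600_000):
    """Return a list of violated restrictions (empty if none)."""
    problems = []
    total = g1.bit_rate()
    if g2 is not None:
        total += g2.bit_rate()
        if 192_000 in (g1.fs_hz, g2.fs_hz) or 176_400 in (g1.fs_hz, g2.fs_hz):
            problems.append("192/176.4 kHz not allowed with scalability")
        # Assumption: "common factors" means the same sampling-rate family.
        if g1.fs_hz % g2.fs_hz != 0:
            problems.append("sampling frequencies lack a common factor")
        if g2.fs_hz > g1.fs_hz or g2.bits > g1.bits:
            problems.append("group 2 exceeds group 1")
    if total > max_bps:
        problems.append("bit rate exceeds maximum")
    return problems


# First example from the text: front three at 96 kHz/24 bits,
# rear two plus LFE at 48 kHz/16 bits.
front = ChannelGroup(96_000, 24, 3)
rear = ChannelGroup(48_000, 16, 3)
print(front.bit_rate() + rear.bit_rate())  # 9216000
print(check_groups(front, rear))           # []
```

Under these assumptions both examples given in the text fit within 9.6 Mbps, whereas six channels at 96 kHz and 24 bits in a single configuration would not.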
5.3.5 Downmixing
In addition to conventional stereo playback, DVD-Audio also supports multi-channel audio to provide new kinds of sound stages. These new sound stages, wherein each channel provides audio quality which far surpasses that of CD audio, should make music a more powerful experience for the listener than ever before possible. Practically speaking, however, not all users will have an environment which allows them to reproduce multi-channel sound. Further, there will be different kinds of listening environments, such as listening outdoors. For these reasons, the DVD-Audio specification was designed to provide robust support for both multi-channel audio and two-channel stereo, and thus provides for various ways to reproduce multi-channel content in two-channel environments.
One of these methods is called downmixing. Each disc which contains multi-channel audio content can have recorded on the disc the relationship between the various channels and the left and right channels (Lmix and Rmix) of a mixed-down two-channel audio stream. When such content is played, the player performs the downmix processing according to the following formulas.

$$L_{mix} = \phi\alpha_0 L_f + \phi\alpha_1 R_f + \phi\alpha_2 C + \phi\alpha_3 L_s + \phi\alpha_4 R_s + \phi\alpha_5 LFE$$

$$R_{mix} = \phi\beta_0 L_f + \phi\beta_1 R_f + \phi\beta_2 C + \phi\beta_3 L_s + \phi\beta_4 R_s + \phi\beta_5 LFE$$

Here $\phi$ indicates the phase of each term (in phase, or inverted by 180°). The downmix coefficients $\alpha$ and $\beta$ may only be set for Linear PCM data in the AOBs, and the values may be different for each track. The coefficients may be set in extremely fine steps, with a minimum step size of 0.2 dB. This allows the artist to provide audio playback with exactly the desired feel.
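The downmix is a weighted sum of the six source channels, with a gain and a phase sign per channel and a separate set of weights for each output. A minimal sketch follows, assuming gains are given in dB and phase as an "inverted" flag. The function names and the example coefficient values are illustrative only and are not taken from the specification.

```python
CHANNELS = ("Lf", "Rf", "C", "Ls", "Rs", "LFE")


def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def downmix(sample, left_coeffs, right_coeffs):
    """Mix one six-channel sample down to (Lmix, Rmix).

    sample: dict mapping channel name to amplitude.
    left_coeffs/right_coeffs: six (gain, inverted) pairs in CHANNELS order,
    where gain is linear and inverted=True flips the phase by 180 degrees.
    """
    def mix(coeffs):
        total = 0.0
        for name, (gain, inverted) in zip(CHANNELS, coeffs):
            sign = -1.0 if inverted else 1.0
            total += sign * gain * sample[name]
        return total

    return mix(left_coeffs), mix(right_coeffs)


# Hypothetical coefficient set: fronts at 0 dB, centre and surrounds at
# -3 dB, LFE muted. Real discs carry their own per-track values.
g = db_to_gain(-3.0)
left = [(1.0, False), (0.0, False), (g, False), (g, False), (0.0, False), (0.0, False)]
right = [(0.0, False), (1.0, False), (g, False), (0.0, False), (g, False), (0.0, False)]

frame = {"Lf": 0.5, "Rf": 0.25, "C": 0.2, "Ls": 0.0, "Rs": 0.0, "LFE": 0.1}
print(downmix(frame, left, right))
```

Keeping the coefficients as data rather than code mirrors the idea that they are recorded on the disc and can change from track to track.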
5.3.6 Bit shifting
When the quantization bit depths of the channel groups differ in a multi-channel audio stream, the player treats both digital full scale values as the same signal level. As a result, when reproducing the multi-channel audio, the channels with the lower bit depth will have relatively greater quantization noise, which has a larger influence on the total dynamic range. The DVD-Audio specification incorporates a method called bit shifting to reduce this influence.
For instance, if the peak signal level for channel group 2 is less than -12 dB, the upper three bits will always have the same value as the MSB. This means that the signal can be shifted upward by two bits, which allows 18 bits of data to maintain the original 20 bits of precision. After channel group 2 signals are shifted upward, data for channel group 1 is recorded to the disc at 20 bits per sample, while the lower four bits of the channel group 2 signals are truncated and the signals are recorded at 16 bits per sample. At the same time, information indicating that channel group 2 has been recorded with a two-bit upshift is also recorded to the disc.
During playback, the channel group 2 signals will be downshifted by two bits. That is, the MSB of the 16-bit data is expanded to add two high-order bits that are the same as the MSB, while two bits of zeros are added to the low-order end of the samples, producing 20-bit data for playback. This allows the precision of an 18-bit sample to be preserved, thereby increasing the dynamic range by 12 dB over what would be obtained by simply truncating the channel group 2 signals to 16-bit samples, and thus reducing the overall noise level.
This bit shifting is done to increase the efficiency of sample usage and expand the dynamic range of Linear PCM multi-channel audio. It cannot be used with Packed PCM, and may only be applied to Linear PCM channel group 2 signals. Samples may be shifted by one to four bits, and the shift amount may be changed on a per-track basis. Bit shifting may not be used when both channel groups are using the same quantization bit depth. In combination with the scalability features, bit shifting provides efficient data transfer while maintaining maximum bit resolution for channel group 2 signals.
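The 20-bit to 16-bit example above can be traced with plain integer shifts. The sketch below is an illustration rather than the on-disc procedure: it assumes signed two's-complement samples held in Python integers, and the function names are hypothetical.

```python
def encode_group2(sample20, shift=2):
    """Shift a 20-bit sample up, then drop the low 4 bits to get 16 bits."""
    shifted = sample20 << shift
    # The shift is only valid if the result still fits in 20 signed bits,
    # i.e. the peak level was low enough (below -12 dB for shift=2).
    if not -(1 << 19) <= shifted < (1 << 19):
        raise ValueError("peak level too high for this shift amount")
    return shifted >> 4


def decode_group2(sample16, shift=2):
    """Restore a 20-bit sample: pad 4 low bits, then shift back down."""
    return (sample16 << 4) >> shift


def truncate_only(sample20):
    """Plain truncation to 16 bits with no bit shifting, for comparison."""
    return (sample20 >> 4) << 4


x = 12345  # a 20-bit sample well below -12 dB of full scale
print(decode_group2(encode_group2(x)))  # 12344: the low 2 bits are lost
print(truncate_only(x))                 # 12336: the low 4 bits are lost
```

With a two-bit shift the restored sample keeps all but its lowest two bits, which corresponds to the 18-bit precision described above, while plain truncation discards four bits.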
5.3.7 Audio Selection
Figure: Audio selection playback example
Note: AVTT = Audio Video Title (an audio title recorded in Video Format)
AOTT = Audio Only Title (an audio title recorded in the new Audio Format; LPCM, Packed PCM)
The DVD-Audio specification defines a feature called Audio Selection, which allows a single track to carry audio data in two different formats. The user first configures the system according to the capabilities of the player and playback equipment, and the player can then automatically select which of the two formats to play back. The user may also specify the data to be played back at playback time. Audio Selection is applicable to the following combinations of the two types of data (objects or streams).
- A combination of different numbers of channels (stereo, multi-channel)
- A combination of different encoding methods
- A combination of different numbers of channels and different encoding methods