MPEG-4 Overview
MPEG-4: The Interactive Revolution - Page 3
Special consideration has also been given to synthetic, or computer-generated, media for both graphics and audio. As an example, text-to-speech takes plain text as input, along with parameters that characterise the type of voice to be used (age, gender, speech rate), and generates intelligible synthetic speech. A special language called Structured Audio Orchestra Language (SAOL) allows for the definition of instruments that can emulate the characteristic sounds of natural acoustic instruments. These instruments can be combined to form an orchestra, while a musical composition, or score, is created by sending a time-sequenced set of commands expressed in a language known as Structured Audio Score Language (SASL). Added functionality, which can also be applied to other audio objects, includes changing playback speed without altering pitch, and changing pitch without altering the time scale.
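To give a flavour of the two languages, the fragment below sketches a minimal SAOL instrument in the style of the published SAOL examples. It has not been run through an MPEG-4 structured audio decoder, so treat the names (`beep`, `wave`) and the exact syntax as illustrative rather than normative:

```
// SAOL: a minimal wavetable instrument
instr beep(pitch, amp) {
  table wave(harm, 2048, 1);         // single-harmonic (sine) wavetable
  asig sound;                        // audio-rate signal variable
  sound = amp * oscil(wave, pitch);  // oscillate through the table
  output(sound);                     // send to the decoder's output
}
```

A SASL score then drives the instrument as a time-sequenced set of commands; in this sketch each line gives a start time, an instrument name, a duration, and the instrument's parameters (here pitch in Hz and amplitude):

```
0.25 beep 0.25 440 0.5
0.75 beep 0.50 880 0.5
```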
In the realm of synthetic graphics, things are just as interesting. Among others, two objects are defined: the facial animation object and the body animation object, both working in much the same way. Facial Definition Parameters (FDP) and Facial Animation Parameters (FAP) control facial characteristics, while the corresponding Body Definition Parameters (BDP) and Body Animation Parameters (BAP) control the virtual body model. It is thus possible to describe and render a face with almost any characteristics, which can change dynamically and whose behaviour can also be defined. Lip movement can be synchronised with text-to-speech (creating a compound object) to produce a talking head.
Artificial movement can also be applied to real objects. For example, a static two-dimensional image of a flag can be made to wave and flutter in the wind by mapping a 2-D mesh onto the image. Because the flag and the mesh are tied together, applying mathematical transformations to the mesh deforms the underlying image correspondingly, giving the illusion of movement.
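The mesh idea can be sketched in a few lines of Python. This is purely illustrative: `wave_mesh` and its sinusoidal "wind" are stand-ins of my own, not part of the MPEG-4 toolset. The point is that, because each image region is pinned to its mesh vertices, displacing the vertices is all that is needed to deform the image:

```python
import math

def wave_mesh(vertices, t, amplitude=5.0, wavelength=40.0):
    """Displace 2-D mesh vertices with a travelling sine wave.

    vertices: list of (x, y) points the image is pinned to.
    t:        animation time; advancing t makes the "flag" ripple.
    Pixels tied to each vertex move with it, so warping the mesh
    warps the image attached to it.
    """
    return [
        (x, y + amplitude * math.sin(2 * math.pi * x / wavelength + t))
        for (x, y) in vertices
    ]

# A flat strip of mesh points along the top edge of the flag image:
flat = [(x, 0.0) for x in range(0, 81, 10)]
rippled = wave_mesh(flat, t=0.0)
```

Rendering the image repeatedly with an increasing `t` produces the fluttering effect, even though the underlying picture never changes.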
Another feature is alpha coding, whereby, for any graphics-based object, the level of transparency of the rendered pixels is determined by the corresponding pixel values in the alpha channel. That is, the value of a pixel in the alpha channel determines whether the corresponding pixel in the image is visible, and how it blends with any underlying graphical content. This can be used not only to create arbitrarily shaped (i.e. not just rectangular) objects, but also to provide smooth blending with any background objects, thus avoiding what is known as aliasing, which appears as jagged edges around an overlaid object.
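The blending performed with the alpha channel is, in essence, the standard "over" compositing operation. The sketch below shows it per pixel intensity; the function name and value ranges are illustrative, not drawn from the MPEG-4 specification:

```python
def alpha_blend(fg, bg, alpha):
    """Blend a foreground pixel over a background pixel.

    fg, bg: pixel intensities in [0, 255]
    alpha:  opacity in [0.0, 1.0]; 0 = fully transparent,
            1 = fully opaque.
    """
    return round(alpha * fg + (1.0 - alpha) * bg)

# A fully opaque pixel hides the background entirely...
print(alpha_blend(200, 50, 1.0))   # -> 200
# ...while a half-transparent edge pixel mixes the two, which is
# what smooths the jagged edges of an overlaid shape.
print(alpha_blend(200, 50, 0.5))   # -> 125
```

Intermediate alpha values along an object's boundary are what give the smooth anti-aliased transition described above.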
Objects are represented in a compact coded form, so that only a minimum amount of information needs to be transmitted to the receiving terminal, with rendering of the object accomplished locally on the receiving equipment. Local rendering provides several advantages: the terminal can determine how best to render any object based on its own capabilities.
For example, on the web today, something as simple as screen resolution poses a major problem for designers, who are forced to target the most common resolution, with pages overflowing the screen boundaries on lower-resolution displays and leaving wide, blank spaces on larger ones. And if providers wish to cater for wireless devices, a completely separate design needs to be created, not only because of differing screen resolutions and capabilities, but also because of bandwidth constraints. On MPEG-4 enabled devices this is not a problem, as the receiving equipment decides how best to render each object.