MPEG-4 Overview
MPEG-4: The Interactive Revolution
The standard also caters for a wider range of devices operating over varying communications channels, from broadcast TV and broadband networks down to low bit-rate networks such as dial-up connections and wireless (mobile telephone) networks. As with the internet, content compilation and composition are performed at the receiving equipment. Unlike a typical internet client, however, an MPEG-4-enabled receiving device can render the presentation intelligently according to its capabilities or limitations, scaling content down when necessary or even ignoring it altogether. This means that content needs to be designed only once, leaving it to the receiving device to decide what to render and how.
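The idea of designing content once and letting each receiver decide how to present it can be sketched in a few lines. This is a toy model under stated assumptions: the `ContentObject` and `DeviceProfile` types, the per-object bit-rate thresholds and the three-way full/scaled/skip decision are all hypothetical and are not taken from the MPEG-4 specification. For simplicity each object is compared against the whole channel bandwidth independently, which a real receiver would not do.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentObject:
    # Hypothetical description of one object in a presentation.
    name: str
    kind: str        # e.g. "video", "audio", "3d_mesh"
    full_kbps: int   # bit rate needed to present the object at full quality
    min_kbps: int    # lowest bit rate at which a scaled-down version is still usable

@dataclass(frozen=True)
class DeviceProfile:
    # Hypothetical summary of what a receiving device can handle.
    supported_kinds: frozenset  # object kinds this receiver can decode at all
    channel_kbps: int           # bandwidth available on the current channel

def plan_rendering(objects, device):
    """Decide per object whether to render it in full, scale it down, or ignore it."""
    plan = {}
    for obj in objects:
        if obj.kind not in device.supported_kinds or device.channel_kbps < obj.min_kbps:
            plan[obj.name] = "skip"      # beyond this device's capabilities
        elif device.channel_kbps < obj.full_kbps:
            plan[obj.name] = "scaled"    # usable, but only at reduced quality
        else:
            plan[obj.name] = "full"
    return plan

# The same content, designed once...
scene = [
    ContentObject("presenter", "video", full_kbps=1500, min_kbps=64),
    ContentObject("background_music", "audio", full_kbps=128, min_kbps=16),
    ContentObject("car_model", "3d_mesh", full_kbps=300, min_kbps=100),
]

# ...planned differently by two hypothetical receivers.
phone = DeviceProfile(frozenset({"video", "audio"}), channel_kbps=100)
tv = DeviceProfile(frozenset({"video", "audio", "3d_mesh"}), channel_kbps=8000)

print(plan_rendering(scene, phone))
print(plan_rendering(scene, tv))
```

Here the low-bandwidth phone scales the video and audio down and ignores the 3-D model it cannot decode, while the TV set renders everything in full, without the author having produced two versions of the content.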
The eventual aim is that the TV set of the future will be able to handle interactive multimedia content alongside high-quality broadcast content. Similarly, the internet will deliver broadcast-quality performance and presentations within its already multimedia-rich layouts.
As was stated earlier, the internet uses HTML to describe scene content. On its own, however, HTML describes only the placement of content, not its behaviour, so designers usually need to resort to scripting languages to achieve dynamic, interactive presentations. MPEG-4 provides a much more versatile language known as the Binary Format for Scenes (BIFS). BIFS is closely related to the Virtual Reality Modelling Language (VRML), which is widely used on the internet for 3-D modelling and the manipulation of 3-D objects.
BIFS describes not only a scene's contents and their placement but also how objects behave in response to user events. It can also be used to animate objects and change their characteristics dynamically. A further feature of BIFS, in keeping with the standard's overall philosophy of making the most efficient use of the available bandwidth, is that it is a binary format rather than a text format like HTML and VRML, which makes it much more compact.
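To give a rough feel for why a binary scene description is more compact than text, the sketch below encodes the same information both ways. It is purely illustrative: the VRML-style text line and the 1-byte node code followed by three 32-bit floats are a made-up layout, not the real BIFS bitstream syntax, which is defined by the MPEG-4 standard itself and is considerably more elaborate.

```python
import struct

# A textual, VRML-style description of one node (illustrative only).
text_node = "Transform { translation 10.0 20.0 0.0 }"

# A hypothetical binary encoding of the same information:
# a 1-byte node-type code followed by three little-endian 32-bit floats.
TRANSFORM_CODE = 1  # made-up code, not a real BIFS node identifier
binary_node = struct.pack("<B3f", TRANSFORM_CODE, 10.0, 20.0, 0.0)

print(len(text_node.encode("ascii")), "bytes as text")  # 39 bytes as text
print(len(binary_node), "bytes as binary")              # 13 bytes as binary

# For these values, decoding recovers exactly what was encoded.
print(struct.unpack("<B3f", binary_node))               # (1, 10.0, 20.0, 0.0)
```

Even this naive layout is about a third the size of the text; the saving grows with scene size, since every node name, keyword and digit in a text format costs at least a byte.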
At the heart of any presentation is its content, and MPEG-4 defines that content precisely. A single element used in a presentation is referred to as an object. Objects can be combined to produce compound objects, and a collection of objects and/or compound objects makes up a scene.
An object can be natural or synthetic (i.e. computer-generated), and object types include still images, audio, video, text, 2-D and 3-D meshes, and synthetic face and body objects. Each object is independent of all other objects and can be placed in two- or three-dimensional space; this applies to sound objects as well. As noted above, objects can be combined to form new, compound objects, such as a human figure with its associated voice. The designer or author creates a scene by combining as many objects and compound objects as required.
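The object/compound object/scene hierarchy is essentially a tree, which can be sketched as a small data structure. The class names `MediaObject`, `CompoundObject` and `Scene` and the helper `leaf_objects` are hypothetical, chosen for this illustration only; they do not correspond to BIFS node types.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class MediaObject:
    """A single, independent element: video, audio, still image, text, mesh, face..."""
    name: str
    kind: str

@dataclass
class CompoundObject:
    """Objects and/or other compound objects grouped into one unit."""
    name: str
    parts: List["Node"] = field(default_factory=list)

Node = Union[MediaObject, CompoundObject]

@dataclass
class Scene:
    """A scene is a collection of objects and/or compound objects."""
    contents: List[Node] = field(default_factory=list)

def leaf_objects(node):
    """Yield every elementary object contained in a node, depth first."""
    if isinstance(node, MediaObject):
        yield node
    else:
        for part in node.parts:
            yield from leaf_objects(part)

# A human figure with its associated voice, combined into one compound object.
presenter = CompoundObject("presenter", [
    MediaObject("face", "synthetic_face"),
    MediaObject("voice", "audio"),
])
scene = Scene([MediaObject("backdrop", "still_image"), presenter])

print([obj.name for node in scene.contents for obj in leaf_objects(node)])
```

Because compound objects can contain other compound objects, the author can build scenes of any depth while each elementary object remains independent.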
An obvious advantage of using objects is reusability: an object is defined once but can be used as often as required in a scene, with each instance having different characteristics such as size or colour. Perhaps the major advance, however, is that the author can allow the end user to interact with the scene's objects: moving them to a different location, changing an object's characteristics or changing the viewpoint. For example, in a hypothetical advertisement in which the latest four-wheel-drive model is cruising down a serene country road, the user could change the colour of the car, swap between the 2-door and 4-door models, add or remove roof racks, or even change the background scenery from an autumn day to a summer's day.
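The two ideas in this paragraph, one definition shared by many differently styled instances and author-controlled user interaction, can be sketched as follows. Everything here is hypothetical: `ObjectDefinition`, `Instance`, the `USER_EDITABLE` policy and `handle_user_event` are illustrative names, and the sketch does not show how BIFS itself expresses reuse or event handling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectDefinition:
    """Defined once and shared by every instance (e.g. one car model)."""
    name: str
    doors: int

@dataclass
class Instance:
    """One appearance of a definition in the scene, with its own characteristics."""
    definition: ObjectDefinition
    colour: str
    scale: float = 1.0

# Hypothetical author policy: which characteristics the end user may change.
USER_EDITABLE = {"colour", "definition"}

def handle_user_event(instance, attribute, value):
    """Apply a user's change only if the author has allowed it; report success."""
    if attribute not in USER_EDITABLE:
        return False
    setattr(instance, attribute, value)
    return True

two_door = ObjectDefinition("4x4", doors=2)
four_door = ObjectDefinition("4x4", doors=4)

# The same definition reused at different sizes and colours.
hero = Instance(four_door, colour="silver")
distant = Instance(four_door, colour="red", scale=0.3)

handle_user_event(hero, "colour", "green")       # allowed: repaint the car
handle_user_event(hero, "definition", two_door)  # allowed: swap in the 2-door model
handle_user_event(hero, "scale", 5.0)            # refused: not user-editable
print(hero)
```

Keeping the editable set under the author's control mirrors the point above: the author decides which kinds of interaction the end user is allowed.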