Intel Unveils New Benchmark Tool For PC Platform Performance Evaluation
Intel unveiled a different approach to the sensitive task of evaluating the performance of a PC platform, introducing two new benchmark tools. The tools will be capable of evaluating the performance of a PC during gameplay and when handling consumer audio/video workloads.
The first utility is codenamed the "Gaming Capabilities Assessment Tool" (G-CAT) and the second the "Digital Home Capabilities Assessment Tool" (DH-CAT). Information about both tools will soon be available through the www.intelcapabilitiesforum.net website, managed by Intel.
Intel's new approach to platform testing moves away from synthetic benchmarks and timedemos. Both synthetic and timedemos do a good job of testing and compiling the raw technical capabilities of a piece of hardware. Synthetic benchmarks are excellent for testing things like data throughput and memory latency, while timedemos do a reasonable job of testing complex loads that combine the various performance aspects of a piece of hardware.
Intel claims that the problem with current benchmarks is that they are difficult to interpret. For example, it believes that measuring the average frame rate of a game demo does not reliably reflect the gaming experience a user would actually get.
In addition, timedemos typically disable critical game features, such as physics and AI, according to Intel. A timedemo is just a 3D movie; no actual gameplay is involved, so the average fps that is reported is based on how the system renders a 3D movie, not an actual game. However, actual games do use computing power for physics and AI, and this has an impact on gaming performance.
Intel Gaming Capabilities Assessment Tool (G-CAT)
Intel's Gaming Capabilities Assessment Tool (G-CAT) is unique in two respects. The results are based on an actual three minute session of gameplay, and they are given not in average frames per second, but in a five-point "user satisfaction" scale.
What Intel came up with is still related to frame rate, but it is no longer the average. Instead, two separate mathematical models are used to predict how actual users would react: The Threshold Model and the Bayesian Model.
The Threshold Model takes into account the fact that frame rate is irrelevant as long as any drop is imperceptible to humans. According to Intel's research, frame rate had no effect on how users rated their experience so long as it was above 40-45 fps. Below this threshold, frame rate did affect the user experience. So, instead of using the average frame rate to predict user experience, the Threshold Model uses the number of frames below the threshold (in a three minute period). The higher the number of frames below the threshold, the lower the users would rate their experience.
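The idea can be sketched in a few lines. This is a hypothetical illustration, not Intel's actual implementation: the 40 fps threshold comes from the research figure above, while the linear mapping from slow-frame count to a 1-5 score and the `max_slow` normalization constant are assumptions made here for demonstration.

```python
def threshold_score(frame_times_ms, threshold_fps=40.0, max_slow=500):
    """Map per-frame render times (ms) to a 1-5 satisfaction score."""
    # Frames slower than this render time fall below the fps threshold.
    slow_limit_ms = 1000.0 / threshold_fps
    slow_frames = sum(1 for t in frame_times_ms if t > slow_limit_ms)
    # More slow frames -> lower score; clamp to the 1-5 scale.
    score = 5 - 4 * min(slow_frames, max_slow) / max_slow
    return round(score, 2), slow_frames

# Example: a session of mostly 60 fps frames with fifty 30 fps stutters.
times = [16.7] * 1000 + [33.3] * 50
print(threshold_score(times))  # → (4.6, 50)
```

Note that the average frame rate of this example is still well above 40 fps, yet the model penalizes it for the stutters, which is exactly the behavior Intel's research motivates.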
The Bayesian Model takes the variability of the frame rate into account as well as the speed. This should help increase the confidence interval for its results in most cases, but there are a few situations where its error of prediction can be quite high.
Neither of the two models that Intel proposed is perfect. However, both could be considered superior to the existing timedemo style of benchmark because they reflect the actual user's experience, not just the average frame rate, which has been shown to be a poor indicator of the gaming experience. Intel includes the results from both models in the Gaming Capabilities Assessment Tool, so it is easy to compare them side by side.
Intel does not yet consider the G-CAT a finished product. Although it is scheduled for its 1.0 release sometime soon, there is a substantial roadmap for future features and improvements. Not only will the pool of games and genres be expanded thanks to additional market research that is being conducted, but work is being done to develop a general model that can estimate the gaming experience for all games.
Intel Digital Home Capability Assessment Tool (DH-CAT)
The purpose of the Digital Home Capabilities Assessment Tool (DH-CAT) is similar to that of the G-CAT: To evaluate the performance of a system in terms of how the user experiences it subjectively. The difference is in the type of experience. The G-CAT is designed to evaluate gaming performance, and thus focuses on 3D performance. However, there is another common application that is also quite system-intensive: Media playback. Although almost any modern system is powerful enough to handle simple DVD playback, recording and playing HDTV footage requires a significantly more powerful system.
Unlike the G-CAT, the DH-CAT is not based on market research about user experience. Instead it evaluates a system based on which tasks it can perform with adequate performance. Because the tool is designed to test a system's suitability for use as a Media Center / Home Theater PC, it is quite obvious what tasks it must be capable of: Playing, recording, and even streaming various audio and video formats.
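A task-based evaluation of this kind can be sketched as a simple pass/fail checklist. The task names and the required real-time throughput factors below are illustrative assumptions, not Intel's actual test suite.

```python
def evaluate_system(measured, tasks):
    """Return a pass/fail verdict for each media task.

    measured: task name -> measured throughput factor (1.0 = real time)
    tasks:    task name -> minimum throughput factor required
    """
    return {name: measured.get(name, 0.0) >= required
            for name, required in tasks.items()}

# Hypothetical requirements: recording one HDTV stream while playing
# another needs roughly twice real-time throughput.
tasks = {
    "dvd_playback": 1.0,
    "hdtv_playback": 1.0,
    "hdtv_record_while_play": 2.0,
}
# Measured throughput factors for a hypothetical mid-range system.
measured = {"dvd_playback": 6.2, "hdtv_playback": 1.4,
            "hdtv_record_while_play": 1.1}
print(evaluate_system(measured, tasks))
# → {'dvd_playback': True, 'hdtv_playback': True, 'hdtv_record_while_play': False}
```

The output is a capability list rather than a single score, which matches the DH-CAT's goal of reporting which Media Center tasks a system can handle adequately.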
While Intel has designed these tools for Intel's customers to evaluate its platforms, both the DH-CAT and G-CAT tools will run on any platform that runs Windows XP or Windows MCE.
Although Intel's new approach to performance assessment could be an important step for the industry, it's possible that they're being developed partly because Intel has been edged consistently by AMD in standard processor benchmarks for the last year or two.