Arm Announces Ethos-N57 and Ethos-N37 NPUs, Mali-G57 GPU and Mali-D37 DPU
Arm is launching two new mainstream ML processors, as well as the company's latest Mali graphics and display processors.
The new suite of IP includes:
- Ethos-N57 and Ethos-N37 NPUs; enabling AI applications and balancing ML performance with cost, area, bandwidth, and battery life constraints
- Mali-G57 GPU; the first mainstream Valhall architecture-based GPU, bringing performance improvements
- Mali-D37 DPU; delivering a rich display feature set within the smallest area, making it the perfect DPU for entry-level devices and small display screens
Ethos-N57 and Ethos-N37 NPUs: enabling heterogeneous compute
Following the earlier introduction of the Arm ML processor (now referred to as the Ethos-N77), the Ethos-N57 and Ethos-N37 are the newest members of the Ethos NPU family. Arm Ethos is a suite of products designed to solve complex AI and ML compute challenges, allowing the creation of more personalized experiences in everyday devices. As consumer devices become smarter, there is a need for additional AI performance and efficiency via dedicated ML processors. Optimized for the most cost- and battery-life-sensitive designs, the new Ethos NPUs promise to deliver premium AI experiences on our everyday devices.
Both the Ethos-N57 and Ethos-N37 are designed with some basic principles in mind, for example:
- Optimized around support for Int8 and Int16 datatypes
- Advanced data management techniques minimizing data movement and associated power
- Over 200% performance uplift over many other NPUs through techniques such as Winograd implementation
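To make the Int8 datatype in the list above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, a common generic scheme. It is an illustration only: the announcement does not describe how Ethos NPUs or their toolchain quantize models, and the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
# Symmetric per-tensor int8 quantization (generic illustration, not Arm's method).

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the int8 range used here.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes, restored)
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy cost paid for storing and moving 8-bit values instead of 32-bit floats.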
Further, Ethos-N57 features include:
- Designed to provide a balance of ML performance and power efficiency
- Optimized for 2 TOP/s ML performance range
Ethos-N37 features include:
- Designed to provide the smallest-footprint ML inference processor (<1 mm²)
- Optimized for 1 TOP/s ML performance range
Arm is already offering the Ethos-N77. Formerly known as the Arm ML processor, it remains the NPU for premium applications, with internal memory footprint configurable from 1MB to 4MB.
Here is how the numbers stack up:
| Product | Throughput | MAC/Cycle | Internal Memory | Target |
|---|---|---|---|---|
| Ethos-N77 | Up to 4 TOP/s | 2048 8x8 | 1-4 MB | Computational photography, premium smartphones, AR/VR |
| Ethos-N57 | Up to 2 TOP/s | 1024 8x8 | 512 KB | Mainstream smartphones, smart home hubs |
| Ethos-N37 | Up to 1 TOP/s | 512 8x8 | 512 KB | Smart cameras, entry smartphones, DTV |
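As a rough sanity check on these figures, peak throughput can be estimated as MACs per cycle, times two operations per MAC (one multiply and one add), times the clock frequency. The 1 GHz clock in the sketch below is an assumption chosen for illustration; the announcement does not state a frequency, and the function name `peak_tops` is hypothetical.

```python
# Rough peak-throughput estimate from MAC/cycle figures.
# ASSUMPTION: a 1 GHz clock; the announcement does not give a frequency.

def peak_tops(macs_per_cycle: int, clock_hz: float = 1.0e9) -> float:
    """Return estimated peak throughput in tera-operations per second."""
    ops_per_cycle = 2 * macs_per_cycle  # each MAC = one multiply + one add
    return ops_per_cycle * clock_hz / 1e12


MACS_PER_CYCLE = {"Ethos-N77": 2048, "Ethos-N57": 1024, "Ethos-N37": 512}

for name, macs in MACS_PER_CYCLE.items():
    print(f"{name}: ~{peak_tops(macs):.2f} TOP/s")
```

Under that assumed clock, the estimates land close to the "up to 4, 2, and 1 TOP/s" figures quoted for the three parts.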
All of these processors include end-to-end compression technology that lowers DRAM requirements, reducing system bandwidth by 1.5-3x through lossless compression of weights and activations using clustering, sparsity, and workload tiling. This allows easy integration into existing designs without major modification to the memory structure. Ethos NPUs also provide hardware support for Winograd, along with power-gating optimizations for sparsity, allowing demanding ML workloads to be run at the endpoint.
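For readers unfamiliar with Winograd, the sketch below shows the smallest standard case, F(2,3): two outputs of a 3-tap 1-D filter computed with 4 multiplications instead of the 6 a direct computation needs. It illustrates the general technique only; the announcement does not describe how Ethos NPUs implement it in hardware, and the function names are hypothetical.

```python
# Winograd minimal filtering F(2,3): 2 outputs of a 3-tap filter
# using 4 multiplications in the data path instead of 6.

def direct_f23(d, g):
    """Direct computation of two filter outputs: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): same two outputs from 4 multiplications."""
    # Filter transform; in practice this is precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by four element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


data = [1, 2, 3, 4]
taps = [2, 4, 6]
print(direct_f23(data, taps), winograd_f23(data, taps))
```

The saving grows in the 2-D case used for 3x3 convolutions, which is why Winograd is attractive for CNN inference.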
The Ethos processor family can be used with Arm NN – an inference engine that bridges the gap between existing NN frameworks and the underlying CPU, GPU, and NPU IP. Arm NN allows developers to write applications just once, yet still target a wide range of endpoints. This is because Arm NN provides an abstraction layer, eliminating the challenges of programming multiple, heterogeneous processors and allowing workloads to be run across devices like phones, TVs, and throughout the smart home, with minimal effort.
Mali-G57 GPU
Mali-G57 brings high-fidelity gaming, console-like graphics on mobile, 4K/8K user interfaces on DTVs, and more complex augmented and virtual reality workloads to the mainstream. This is the biggest segment of the mobile market and Arm's recent announcement with Unity highlights our work to further optimize performance on Arm-based SoCs, CPUs, and GPUs, allowing developers to spend more time creating new content.
Key features of Mali-G57 include:
- 1.3x better performance density across a range of content compared to Mali-G52
- 1.3x improvements in energy efficiency leading to longer battery life
- Foveated rendering support for VR and 60 percent better on-device ML performance for more complex xR workloads
Mali-G57 is the first mainstream GPU based on the new Valhall architecture, following the premium Mali-G77 GPU launched back in May this year.
The GPU offers a 30 percent better performance density across a range of content, from high-fidelity games like Fortnite to high resolution 2D content, compared with Mali-G52 GPU. In addition, Mali-G57 provides double the texturing performance when compared with Mali-G52. This improves the performance for high resolution UI in 4K and 8K DTVs, AR and VR and gaming. The increase in compute and texture capabilities makes Mali-G57 a good choice to tackle more complex 3D workloads, such as physically based rendering (PBR), HDR rendering and volumetric effects, that are becoming common features on mobile devices.
Whereas the premium Mali-G77 has at least 7 cores, Mali-G57 has 1 to 6 cores depending on the configuration. The smaller area lends itself to mainstream smartphone solutions.
Mali-G77 saw a big uplift in ML performance – up to 60 percent – with Mali-G57 seeing similar improvements that will take more complex ML workloads to mainstream devices. The 60 percent increase in on-device ML performance is made possible by 2x more FMAs when compared to Mali-G52 (depending on the configuration) and architectural optimizations. This provides faster responsiveness to a wide range of ML use cases common on today’s smartphones, such as speech recognition, face detection and image quality enhancement.
The Valhall architecture is the basis of Arm’s latest generation of GPUs, and contains various improvements and new features following the Bifrost architecture. In addition to aligning better with the modern Vulkan API, the key features of the architecture are a new superscalar engine, a simplified scalar ISA, and new dynamic instruction scheduling.
Mali-D37 display processor
The Mali-D37 DPU packs an array of display and performance features within the smallest area possible. For the end-user, this means better visuals and performance on lower cost devices where area matters most, such as entry-level smartphones, tablets, and small display screens up to 2K.
Key features of Mali-D37 include:
- Extreme area efficiency, with DPU configurable to an area of less than 1 mm² on 16nm for Full HD and 2K resolutions
- System power savings of up to 30% through offloading core display tasks from the GPU and memory management features, including MMU-600
- Preservation of key display features from premium Mali-D71, including mixed HDR and SDR composition when combined with Assertive Display 5
The display processor, a mainstream to entry-level DPU based on the Komeda architecture, keeps some of the Mali-D71’s core display and performance features, all within the smallest area possible. This is the first time that an Arm Mali DPU has targeted entry-level mobile markets. These range from mainstream and entry-level smartphones and tablets to small screens such as those in cars and airplanes and the latest smart speakers and home appliances (even these have small display screens).
The premium Mali-D71 uses a dual-pipeline Komeda architecture and can drive dual displays, supporting up to 4K resolutions with a maximum of 8 composition layers (8 layers while driving a single display, 4 layers each when driving dual displays). Mali-D37 is the area-optimized Komeda architecture for mainstream and entry-level devices, running a single pipeline with 4 composition layers.
Despite being designed for mainstream and entry-level devices, Mali-D37 preserves some of the core features of the premium Mali-D71. When combined with Assertive Display 5, and with the help of some key image processing features, Mali-D37 offers an HDR experience, even on SDR displays, by supporting HDR10 and HLG as well as mixed HDR-SDR composition. This is achieved by performing the necessary format conversions, composition, and alpha-blending of mixed HDR and SDR layers simultaneously, resulting in a number of different display options, such as full-screen HDR video, full-screen HDR video with SDR UI/controls, HDR video picture-in-picture (PiP) with SDR background, or HDR video PiP with SDR UI and SDR background layer. The combination with Assertive Display 5 also delivers great sunlight visibility due to the product’s local tone mapping engine.
In addition, Mali-D37 maintains the same memory management benefits as Mali-D71 due to the MMU-600 and AFBC 1.2 support. The direct interface between the DPU and MMU-600 optimizes memory management, providing memory latency and bandwidth benefits. It also contains a TLB prefetch to support efficient memory management for all pixel formats and rotations.
Depending on the use case, performing scaling, composition, and rotation operations on Mali-D37 can save up to 30 percent of system power compared to running the same operations on the GPU.
The additional features of Assertive Display 5 and memory management also contribute towards the system power savings. The AFBC encoder unit saves memory bandwidth and system power by addressing the rotation of uncompressed layers. Assertive Display offers display backlight power savings by preserving visibility even at reduced backlight levels. Finally, the native command mode support of the Mali-D37 enables panels to self-refresh for static scenes, leading to significant power savings on the panel.