Intel Details Processor Graphics Gen11 Architecture
Intel has released a white paper that focuses on the components of the Gen11 architecture, the company's processor graphics technology that will be included in Intel's 10nm Ice Lake processors, scheduled for release later this year.
Intel’s on-die integrated processor graphics architecture offers real-time 3D rendering and media performance. In addition, its underlying compute architecture offers general-purpose compute capabilities that deliver up to a teraFLOP of performance. The architecture of Intel processor graphics delivers a full complement of high-throughput floating-point and integer compute capabilities, a layered high-bandwidth memory hierarchy, and deep integration with on-die CPUs and other on-die SoC devices. While Gen11 will typically ship with 64 EUs, there may be different configurations.
Up to a TeraFLOP of Performance
Gen11 processor graphics is based on Intel’s 10nm process, utilizing 3rd generation FinFET technology. Additional refinements have been implemented throughout the microarchitecture to provide significant performance-per-watt improvements. Gen11 supports all the major APIs: DirectX, OpenGL, Vulkan, OpenCL and Metal.
Gen11 consists of 64 execution units (EUs), which increases the core compute capability by 2.67x over Gen9. Gen11 addresses the corresponding bandwidth needs by improving compression, increasing the L3 cache size, and increasing peak memory bandwidth. In addition to the raw increases in compute and memory bandwidth, Gen11 introduces key new features that enable higher performance by reducing the amount of redundant work.
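The 2.67x figure and the "up to a teraFLOP" claim can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the Gen9 GT2 count of 24 EUs, the per-EU layout (two SIMD-4 FPUs, each issuing one fused multiply-add per lane per clock) and the 1.0 GHz clock are assumptions that do not come from this article, and Intel has not disclosed final product clocks.

```python
# Rough peak-throughput arithmetic for an EU-based GPU.
# Assumptions (not stated in the article): per-EU layout and Gen9 GT2 EU count.
SIMD_UNITS_PER_EU = 2   # assumed: two SIMD-4 FPUs per EU
LANES_PER_SIMD = 4      # assumed: 4 FP32 lanes per FPU
FLOPS_PER_FMA = 2       # a fused multiply-add counts as 2 FLOPs

GEN11_EUS = 64          # from the article
GEN9_GT2_EUS = 24       # assumed Gen9 GT2 configuration


def fp32_flops_per_clock(eu_count: int) -> int:
    """Theoretical peak FP32 FLOPs per clock cycle for eu_count EUs."""
    return eu_count * SIMD_UNITS_PER_EU * LANES_PER_SIMD * FLOPS_PER_FMA


def peak_tflops(eu_count: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPS at a hypothetical clock rate."""
    return fp32_flops_per_clock(eu_count) * clock_ghz / 1000.0


ratio = GEN11_EUS / GEN9_GT2_EUS
gen11_per_clock = fp32_flops_per_clock(GEN11_EUS)

print(f"EU ratio Gen11/Gen9 GT2: {ratio:.2f}x")
print(f"Gen11 FP32 FLOPs per clock: {gen11_per_clock}")
print(f"Peak at a hypothetical 1.0 GHz: {peak_tflops(GEN11_EUS, 1.0):.3f} TFLOPS")
```

Under these assumptions, the EU count alone accounts for the 2.67x ratio, and a clock of roughly 1 GHz would be enough to reach about one teraFLOP of FP32 throughput.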
Intel’s processors are SoCs integrating multiple CPU cores, Intel Gen11 processor graphics and additional fixed functions, all on a single shared silicon die. The architecture implements multiple unique clock domains, which have been partitioned as a per-CPU core clock domain, a processor graphics clock domain, and a ring interconnect clock domain. The SoC architecture is designed to be extensible for a range of products and to enable efficient wire routing between components within the SoC.
The following table presents the theoretical peak throughput of the compute architecture of Intel processor graphics, aggregated across the entire graphics product architecture. Values are stated as “per clock cycle”, as final product clock rates were not disclosed by Intel. It also shows a comparison to Gen9 GT2:
Key Gen11 technologies include:
- Coarse Pixel Shading (CPS)
- Position only shading (POSH) tile based rendering
- Adaptive sync
Coarse pixel shading (CPS)
Games today typically provide a mechanism to render at a lower resolution and then upscale to the selected screen resolution, to enable playable frame rates on high-DPI screens. This may result in artifacts such as aliasing or jaggies, which markedly diminish visual quality. Intel says that coarse pixel shading gives application developers a new rate control on pixel shading invocations. According to the company, CPS is better than upscaling because it allows developers to preserve the visibility sampling at the render target resolution while sampling the more slowly varying color values at the coarse pixel rate. By removing the upsampling stage, CPS can improve the overall performance.
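To make the rate control concrete, the following sketch counts pixel shader invocations for a full-screen pass at the regular rate and at a coarse rate. It is a back-of-the-envelope model, not Intel's implementation: the function name and the 2x2 coarse pixel size are illustrative assumptions.

```python
import math


def shading_invocations(width: int, height: int,
                        coarse_w: int = 1, coarse_h: int = 1) -> int:
    """Pixel shader invocations for a full-screen pass when one invocation
    shades a coarse_w x coarse_h block of pixels (hypothetical model)."""
    return math.ceil(width / coarse_w) * math.ceil(height / coarse_h)


WIDTH, HEIGHT = 1920, 1080

# Visibility (coverage and depth) is still sampled per render-target pixel.
visibility_samples = WIDTH * HEIGHT

full_rate = shading_invocations(WIDTH, HEIGHT)          # one invocation per pixel
coarse_rate = shading_invocations(WIDTH, HEIGHT, 2, 2)  # assumed 2x2 coarse pixels

print(f"visibility samples:       {visibility_samples}")
print(f"full-rate invocations:    {full_rate}")
print(f"coarse-rate invocations:  {coarse_rate}")
print(f"shading work reduced by:  {full_rate / coarse_rate:.1f}x")
```

In this model the number of visibility samples is unchanged, so geometric edges stay at full resolution, while the color shading work drops by the area of the coarse pixel.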
Position Only Shading Tile Based Rendering (PTBR)
The motivation for tile-based rendering is to reduce memory bandwidth by keeping the data that multiple render passes touch for each tile on die.
To support tile-based rendering, Gen11 adds a parallel geometry pipeline that acts as a tile binning engine. It runs ahead of the render pipeline as a per-tile visibility binning pre-pass. The render pipeline then loops over the geometry per tile and consumes the visibility stream for that tile. PTBR accomplishes its goal of keeping per-tile data on die by utilizing the L3 cache, which has been enhanced to support color and Z formats. This collapses all the memory reads and writes within the L3 cache, thereby reducing the external bandwidth.
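The binning pre-pass can be pictured with a small software model. This is a minimal sketch that bins triangles by their screen-space bounding boxes; the article does not describe how the hardware decides visibility per tile, so the bounding-box test, the tile size, and all names here are assumptions for illustration.

```python
from collections import defaultdict


def bin_triangles(triangles, tile_size, width, height):
    """Map each tile (tx, ty) to the indices of triangles whose screen-space
    bounding box overlaps it. Conservative: a triangle may be listed for a
    tile it does not actually cover."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile_size
    max_ty = (height - 1) // tile_size
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles that lie entirely outside the render target.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        tx0 = max(0, int(min(xs)) // tile_size)
        tx1 = min(max_tx, int(max(xs)) // tile_size)
        ty0 = max(0, int(min(ys)) // tile_size)
        ty1 = min(max_ty, int(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins


triangles = [
    [(10, 10), (50, 20), (30, 60)],      # fits inside tile (0, 0)
    [(60, 10), (130, 30), (100, 70)],    # spans several tiles
    [(300, 10), (320, 20), (310, 40)],   # off screen, never binned
]
bins = bin_triangles(triangles, tile_size=64, width=256, height=128)

# The render pass would then walk tile by tile, touching only the listed triangles.
for tile in sorted(bins):
    print(tile, bins[tile])
```

Once each tile has its own triangle list, all passes for that tile can run against a working set small enough to stay in on-die cache, which is the bandwidth saving the article describes.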
Adaptive sync
Adaptive sync is a VESA DisplayPort (DP) standard whose function is to dynamically synchronize the display panel refresh rate with the varying GPU render rate. In Gen11, it makes stutter-free and tear-free gaming possible on eDP panels that support a dynamically adjustable refresh rate range. When the GPU's frame render rate falls within the supported refresh rate range of the panel, the display controller adapts and syncs the display refresh rate to that of the GPU.
When the GPU render rate is lower than the minimum refresh rate supported by the panel, the display controller's low frame rate compensation feature fills in additional frames to decrease visual artifacts. When the GPU render rate is higher than the maximum refresh rate, the panel refreshes at its maximum refresh rate.
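The three cases above can be summarized in a few lines of code. This is a simplified sketch, not the display controller's actual logic: the article only says that low frame rate compensation "fills in" frames, so modeling it as repeating each frame a whole number of times is an assumption, as are the function name and the example panel range.

```python
import math


def panel_refresh(render_fps: float, panel_min_hz: float, panel_max_hz: float):
    """Return (refresh_hz, repeats_per_frame) for a panel with an adjustable
    refresh range, given the current GPU render rate (hypothetical model)."""
    if render_fps <= 0:
        raise ValueError("render_fps must be positive")
    if render_fps >= panel_max_hz:
        # GPU is faster than the panel: refresh at the panel maximum.
        return panel_max_hz, 1
    if render_fps >= panel_min_hz:
        # Inside the supported range: refresh rate tracks the render rate.
        return render_fps, 1
    # Below the range: assumed compensation repeats each frame until the
    # effective refresh rate is back inside the supported range.
    repeats = math.ceil(panel_min_hz / render_fps)
    return min(render_fps * repeats, panel_max_hz), repeats


# Example with an assumed 40-60 Hz panel.
for fps in (90, 50, 25):
    hz, repeats = panel_refresh(fps, 40, 60)
    print(f"render {fps} fps -> refresh {hz} Hz, each frame shown {repeats}x")
```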
Memory
Intel processor graphics architecture has long pioneered sharing DRAM physical memory with the CPU. This unified memory architecture offers a number of system design, power efficiency, and programmability advantages over PCI Express-hosted discrete memory systems. The obvious advantage is that shared physical memory enables zero-copy buffer transfers between the CPUs and the Gen11 compute architecture.
Gen11 supports LPDDR4 memory technology capable of delivering much higher bandwidth than previous generations.
Display Controller
Like Gen9, the display controller in Gen11 is integrated in the system agent, largely because of the display’s affinity with memory. Over the life of a device, the display controller can consume far more memory bandwidth than any other client. This means that the display controller is also one of the most active participants in the power management of the SoC.
Another key feature of the Gen11 platform is the integration of the USB Type-C subsystem. The display controller has dedicated outputs for USB Type-C, and DisplayPort alt mode is supported on all USB Type-C outputs.
Additionally, output of the display controller can target the Thunderbolt controller, which can also tunnel DisplayPort.
Finally, support for DisplayPort Adaptive Sync has been added for the embedded display. This feature allows the display controller, in combination with a supported panel, to adjust the refresh rate based on the workload.