User:ScotXW/amdgpu

amdgpu is one half of the device driver for AMD's GCN-based GPUs. There are two other "halves" available: radeonsi, which is part of Mesa 3D, and a proprietary half called AMDGPU-PRO. Originally, the UMD radeonsi was written to work on top of the KMD radeon, and AMD's proprietary UMD worked on top of fglrx.ko.

There is also RADV, which implements the Vulkan 1.0 specification. RADV is heavily based on Intel's ANV driver but works on top of amdgpu (like radeonsi, except that radeonsi implements OpenGL).

History

 * On January 9th, 2012 the first GCN-based product, the "Tahiti XT"-codenamed GPU (4313 million transistors, organized as 2048:128:32) found on the Radeon HD 7970 graphics cards, was released. Support for this card was added to the proprietary fglrx.ko as part of the AMD Catalyst for Linux driver package and later to /drm/radeon, serving as the basis for the new UMD radeonsi in Mesa. Linux kernel 3.2 was released on January 4th, 2012 and 3.3 on March 18th of the same year.
 * On March 22nd, 2013 the first GCN 2-based product, the "Bonaire XT"-codenamed GPU (2080 million transistors organized as 896:56:16) found on Radeon HD 7790 was released.
 * On October 24th, 2013 the second GCN 2-based product, the "Hawaii XT"-codenamed GPU (6200 million transistors organized as 2816:176:64) was released.
 * On October 7th, 2014 at the XDC 2014, Alex Deucher made the very first announcement of AMD's new Linux support with a shared open-source kernel-mode driver. This was confirmed at the GDC 2015.
 * In April 2015, the initial release of the new amdgpu-stack was announced on the dri-devel mailing list.
 * On August 30th, 2015 Linux kernel 4.2, including drivers/gpu/drm/amd/amdgpu, was released; a dedicated mailing list, "amd-gfx", was created for this development.
 * DAL (Display Abstraction Layer) is part of amdgpu and was initially a huge set of patches (about 93,000 LoC) against the Linux kernel. DAL was not accepted into the mainline kernel, so it has to be distributed as part of AMD's new proprietary Linux driver "AMDGPU-PRO" and is maintained and developed out-of-tree.
 * The DAL code was initially made available to the public in April 2015 on cgit.freedesktop.org/~agd5f
 * On September 17th, 2015 at XDC there was another talk by Deucher and Zhou; the slides mention a new display component called "DAL" and a new power component called "Powerplay" (though "powerplay" was replaced with "powertune")
 * On 2016-02-11 Harry Wentland proposed to upstream DAL ("Enabling new DAL display driver for amdgpu on Carrizo and Tonga"), but the proposal was rejected
 * On 2016-09-23 Harry Wentland presented DAL at XDC 2016: DAL presentation
 * On 2016-12-08 Harry Wentland wrote an RFC regarding the acceptance of the DAL patch set
 * The issue with out-of-tree code is that its developers need to rebase it from time to time to align it with current upstream. This consumes a lot of time that could otherwise be used for fixing actual issues. To save time, DAL/DC was not rebased from 4.4 onto newer kernels. It seems, though, that it has finally been moved to 4.14-devel, to be mainlined into 4.15.
 * On February 16th, 2016 Vulkan 1.0 was released (Vulkan press release)
 * In H1 2017, Vega GPUs will become available: hardware-wise, the rasterisation and render output units were changed, and much improved energy efficiency is expected from that.

Table
1 SteamOS is based on a Debian stable release, though a newer Linux kernel is later back-ported. The current version of SteamOS uses Linux kernel 4.1. It seems likely that SteamOS 3 will be based on Debian 9.

Other half
amdgpu is only "half" of the graphics driver stack. The other halves are Mesa and AMDGPU-PRO. AMDGPU-PRO Driver 17.30 (released 2017-07-27) is available for:
 * RHEL 6.9 / CentOS 6.9 (Linux 2.6.32)
 * RHEL 7.3 / CentOS 7.3 (Linux 3.10)
 * SLED/SLES 12 SP2
 * Ubuntu 16.04.2 (Linux 4.4, but also the 4.8 HWE kernel until 16.04.3, scheduled for August 2017)

Wild guesses

 * Phoronix ran some tests and reported a comparison of "Linux 4.10.0" against the "current DRM-Next state", both with "Mesa 17.1-dev". DRM-Next could mean
 * Alex Deucher's: https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.11
 * Dave Airlie's: https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next
 * something else … sadly the author didn't bother to specify exactly which code he tested
 * the exact code of "Mesa 17.1-dev" is also only known to the author…
 * The tested hardware was a Sapphire Radeon RX 470 – a Radeon RX 480 was not tested – on a quad-core Xeon E3-1280 v5 (Skylake).

Deus Ex: Mankind Divided (2016-aug) – using the "Dawn Engine", reportedly based on the Glacier 2 game engine, which itself was used in Hitman: Absolution (2012) – scored significantly higher frame rates on the 4.11-drm-next code. Why? It would be interesting to compare the two git branches and see what kind of mechanics achieve twice the performance!

Explanations

 * https://www.x.org/wiki/RadeonFeature/#index5h2

The interface libraries for amdgpu are libdrm_generic and libdrm_amdgpu, cf. https://lwn.net/Articles/654542/. There are three different Device Dependent X (DDX) drivers available:
 * fglrx = the old proprietary DDX driver; works on top of fglrx.ko in parallel with the proprietary OpenGL driver.
 * xf86-video-ati (the "radeon" driver) = the FOSS DDX driver; works on top of /drm/radeon
 * xf86-video-amdgpu = the FOSS DDX driver by AMD; works on top of /drm/amd/amdgpu, in parallel with the proprietary OpenGL driver and with Mesa; uses Glamor!

cf. Free_and_open-source_graphics_device_driver, Graphics Core Next, commons:Category:Diagrams illustrating AMD technology


 * Note: AMD publishes 3D shader and command queue documentation, but does NOT publish register docs for recent GPUs. Information:
 * https://fail0verflow.com/media/33c3-slides/index.html#/74
 * https://www.x.org/wiki/RadeonFeature/#index5h2
 * https://wiki.gentoo.org/wiki/AMDGPU
 * drivers/gpu/drm/amd/include/asic_reg/
 * bif = Bus InterFace (implements i.a. the PCI Express endpoint)
 * dce = Display Controller Engine (this actually received its own brand name: AMD Eyefinity)
 * gca = Graphics and Compute Array
 * gmc = Graphics Memory Controller (implements i.a. the GDDR5 SDRAM/HBM controller)
 * oss = Operating System Services
 * smu = System Management Unit (includes a LatticeMico32, fulfills system management and power management tasks)
 * uvd = Unified Video Decoder
 * vce = Video Coding Engine

For more statistics see e.g. heise.de and of course the man page for git.

Commands are sent to the GPU by putting them in rings:
 * Graphics ring
 * Compute rings
 * DMA rings (seemingly only on GCN 1.1, aka GCN 2nd gen, i.e. Bonaire, Hawaii and newer)

Commands are processed by the GPU's Command Processor (CP). It contains multiple sub-units (ME (Micro Engine), PFP (Pre-Fetch Parser), CE (Constant Engine), DE (Dispatch Engine)), each of which is a custom "F32" CPU running microcode firmware. Rings can call out to Indirect Buffers (IBs) containing more commands.
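The ring mechanism described above can be sketched as a plain circular buffer of dwords. This is an illustrative model only: in the real driver the rings live in GPU-visible memory, the hardware fetches from them, and the write pointer is published via a doorbell/WPTR register; all names and sizes below are made up.

```c
#include <stdint.h>

/* Illustrative sketch of a command ring: a circular buffer of dwords.
 * RING_DWORDS must be a power of two so wrapping can use a bitmask. */
#define RING_DWORDS 256

struct ring {
    uint32_t buf[RING_DWORDS];
    uint32_t wptr;  /* driver's write pointer, in dwords */
};

/* Append count dwords of commands to the ring, wrapping at the end. */
static void ring_write(struct ring *r, const uint32_t *cmds, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        r->buf[(r->wptr + i) & (RING_DWORDS - 1)] = cmds[i];
    r->wptr = (r->wptr + count) & (RING_DWORDS - 1);
    /* A real driver would now write r->wptr to the ring's doorbell so the
     * Command Processor starts fetching the newly queued packets. */
}
```

An IB submission then amounts to writing a packet into this ring that points the CP at a second buffer of commands, rather than inlining them all in the ring itself.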

GCN 1:

All resource descriptors (sometimes called “fetch constants” on previous hardware) are read from memory instead of registers. In order to improve performance, the CP block adds a new constant engine. The constant engine (CE) runs in parallel to the 3D engine, and allows constants to be written to memory ahead of the main command stream.

Updating Shader Resource Descriptors (SRDs) using the CPU would be too slow. The CP provides a constant update engine to accelerate the process.

PM4 is the packet API used to program the GPU to perform a variety of tasks. The driver does not write directly to the GPU registers to carry out drawing operations on the screen. Instead, it prepares data in the form of "PM4 Command Packets" in either system or video (a.k.a. local) memory, and lets the Micro Engine do the rest of the job.
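As a rough illustration of the packet format, the open radeon/amdgpu kernel headers encode a PM4 "type-3" packet header as: bits 31:30 = packet type (3), bits 29:16 = payload size in dwords minus one, bits 15:8 = opcode. The helper below is a sketch based on that layout; concrete opcode values are ASIC-generation-specific and should be treated as assumptions.

```c
#include <stdint.h>

/* Sketch of a PM4 type-3 packet header, following the field layout used in
 * the open radeon/amdgpu kernel headers (PACKET3):
 *   [31:30] packet type = 3
 *   [29:16] number of payload dwords minus one
 *   [15: 8] opcode (ASIC-specific)
 * ndwords = number of payload dwords following the header (>= 1). */
static uint32_t pm4_type3_header(uint8_t opcode, uint16_t ndwords)
{
    return (3u << 30)
         | ((uint32_t)((ndwords - 1) & 0x3FFF) << 16)
         | ((uint32_t)opcode << 8);
}
```

A full packet is then this header dword followed by its payload dwords, placed in the ring or in an indirect buffer for the CP to parse.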

There are two CP engines on SI (Southern Islands, GCN 1.0): the Constant Engine (CE) and the Drawing Engine (DE). Previous ASICs only had a DE (previously referred to as the Micro-Engine or ME). Certain packets are only allowed on the DE or the CE. Packets can be executed in the DE via the ring or via an INDIRECT_BUFFER packet. Packets can only be executed on the CE by using the INDIRECT_BUFFER_CONST packet.


 * Draw Engine (DE): The standard graphics engine is now referred to as the Draw Engine. Most PM4 commands continue to be submitted to the DE command buffer.
 * Constant Engine (CE): The constant engine uses a second, separate command buffer to control constant uploads. The engine runs in parallel with the draw engine, allowing constant updates to get sufficiently ahead of the draws/dispatches that will use them. Additionally, CP has 64KiB of on-chip RAM (CE RAM) that acts as a staging buffer for constant updates. Shaders cannot read directly from the CE RAM.

The 64 KiB is carved up among the three rings, with ring-0 (gfx) getting 32 KiB. The driver is responsible for further subdividing these partitions to store an on-chip copy of the most up-to-date version of every SRD.

The purpose of this packet is to provide a generic and flexible way for the CP to write n Dwords of data to any destination to which it has access. As applicable, the writes can be sent from the CE, PFP, ME or DE (Dispatch Engine). The CE (and PFP) are limited to the GRBM (Graphics Register Backbone Manager) and MC (memory controller?) as destinations.

Note: The split between SRBM and GRBM is mostly between "3D engine" and "everything else" – register writes to the 3D engine need to be synchronized with the data flowing through the engine so the register logic is *really* complicated. The rest of the registers don't need to be pipelined and so are managed by a separate block. Most graphics/3D registers sit behind a FIFO, while other (display/asic setup/config) registers do not.

Hardware
Note: The following tables always refer to the most current stable Linux kernel version; e.g. support for Polaris is available only since Linux kernel 4.7 and is not present in earlier versions!