How to optimize code for Apple’s Metal graphics framework


There are many ways to optimize your Metal graphics code for top performance. Here’s how to start getting your code into better shape for the Metal framework.

Apple GPU architecture

Apple GPUs are Tile-Based Deferred Renderers (TBDR), which means they use two main passes: tiling and rendering. The overall rendering pipeline is shown below.

You can think of these two phases as one in which geometry is calculated and created, and another in which all pixel rendering is processed.

In most modern Apple GPU software, geometry is calculated and broken down into meshes and polygons, then rendered to a pixel-based image, one image per frame.

Modern Apple GPUs have specific subsections in each core that handle shaders, textures, a pixel backend, and dedicated tile memory. Each core uses these four areas during rendering.

During each frame render, multiple passes are used, running on multiple GPU cores, with each core processing multiple tasks. In general, the more cores, the better the performance.

Modern Apple GPU rendering pipeline.

GPU Counters

To measure this performance, GPU counters are used.

GPU counters keep track of each GPU’s load and measure whether each does or doesn’t have enough work. They also find performance bottlenecks.

Finally, GPU counters help you identify the commands that take the longest, so you can optimize them and speed up performance.

There are over one hundred and fifty types of Apple GPU performance counters, and covering them all is beyond the scope of this article.

There is also the problem of making sense of all the performance counter data. To do this, you use the Metal System Trace and Metal Debugger built into Xcode and Instruments.

There are four groups of Metal GPU counters that cover important ways to optimize Metal in your apps and games. They are:

  1. Performance limiters
  2. Memory bandwidth
  3. Occupancy
  4. Hidden Surface Removal

Performance limiters, or limiter counters, measure the activity of multiple GPU subsystems by finding the work being executed and finding stalls that can block or slow down parallel execution.

Modern GPUs execute math, memory, and rasterization work in parallel (at the same time). Performance limiters help identify performance bottlenecks that slow down your code.

You can use the performance limiters in Apple’s Instruments app to optimize your code. There are half a dozen different performance limiters in Instruments.

Apple's Instruments app.

Memory Bandwidth Counters

Memory bandwidth GPU counters measure transfers between the GPU and system memory. The GPU accesses system memory whenever buffers or textures are accessed.

But be aware that the System Level Cache can also come into play, which means you may occasionally notice small bursts of higher memory throughput than actual DRAM transfer speeds. This is normal.

If you see a memory bandwidth counter with a high value, it likely means that transfer is slowing down your rendering. To alleviate these bottlenecks, there are several things you can do.

One way to reduce memory bandwidth slowdowns is to reduce the size of working data sets. This speeds things up because less data is being transferred from system memory.

Another way is to only load data needed by the current render pass, and to only store data needed by future render passes. This also reduces the overall amount of data transferred.
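To give a rough sense of why load and store choices matter, the sketch below estimates attachment traffic for a single render pass. It is a simplified model rather than Metal API code: the function name, the action strings (modeled on Metal's load and store actions), and the example attachment size are assumptions for illustration, and the model ignores caches and compression.

```python
def attachment_traffic_bytes(width, height, bytes_per_pixel, load_action, store_action):
    """Estimate bytes moved between system memory and tile memory for one attachment."""
    size = width * height * bytes_per_pixel
    # Only "load" reads the previous contents; "clear" and "dontCare" read nothing.
    read = size if load_action == "load" else 0
    # Only "store" writes the result back; "dontCare" discards it on chip.
    written = size if store_action == "store" else 0
    return read + written


# Hypothetical 1920x1080 depth attachment at 4 bytes per pixel.
wasteful = attachment_traffic_bytes(1920, 1080, 4, "load", "store")
lean = attachment_traffic_bytes(1920, 1080, 4, "clear", "dontCare")

print(f"load + store:     {wasteful} bytes per pass")
print(f"clear + dontCare: {lean} bytes per pass")
```

In this model, a depth attachment that is only needed inside the pass costs nothing in system memory traffic once it is cleared on load and discarded on store.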

You can also use block texture compression (ASTC) to reduce texture asset sizes, and lossless compression for textures generated at runtime.
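The saving from ASTC can be estimated with simple arithmetic, because every ASTC block occupies 128 bits regardless of how many texels it covers. The sketch below compares a single mip level of an uncompressed 8-bit RGBA texture with a few ASTC block footprints. The function names and the 1024x1024 texture size are assumptions chosen for illustration.

```python
import math

ASTC_BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint


def astc_size_bytes(width, height, block_w, block_h):
    """Size of one mip level compressed with the given ASTC block footprint."""
    blocks_x = math.ceil(width / block_w)
    blocks_y = math.ceil(height / block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES


def rgba8_size_bytes(width, height):
    """Size of one uncompressed mip level at 4 bytes per texel."""
    return width * height * 4


uncompressed = rgba8_size_bytes(1024, 1024)
for block_w, block_h in [(4, 4), (6, 6), (8, 8)]:
    compressed = astc_size_bytes(1024, 1024, block_w, block_h)
    ratio = uncompressed / compressed
    print(f"ASTC {block_w}x{block_h}: {compressed} bytes ({ratio:.1f}x smaller)")
```

Larger block footprints trade image quality for size, so the right choice depends on the asset.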

Occupancy measures how many threads are currently executing out of the total thread pool. 100% occupancy means a given GPU is currently maxed out in terms of the number of threads and overall work it can handle.

The Occupancy GPU counter measures the percentage of total thread capacity used by the GPU. This total is the sum of the compute, vertex, and fragment occupancy.
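As a small worked example of that sum, the sketch below turns per-stage thread counts into occupancy percentages. The thread capacity and the per-stage counts are invented numbers, and the function name is an assumption; real values come from the counters in Xcode and Instruments.

```python
def occupancy_percent(active_threads, thread_capacity):
    """Share of the GPU's thread capacity that is currently in use."""
    if thread_capacity <= 0:
        raise ValueError("thread_capacity must be positive")
    return 100.0 * active_threads / thread_capacity


# Hypothetical snapshot: threads resident per stage on a GPU that can hold 4096.
capacity = 4096
active = {"compute": 512, "vertex": 256, "fragment": 2048}

per_stage = {stage: occupancy_percent(count, capacity) for stage, count in active.items()}
total = sum(per_stage.values())

for stage, percent in per_stage.items():
    print(f"{stage}: {percent:.2f}%")
print(f"total occupancy: {total:.2f}%")
```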

Hidden Surface Removal usually happens somewhere in the middle of each render pass before fragment processing, shortly after the Tiled Vertex Buffer is sent to the GPU to be rasterized.

Depth buffers and hidden surface removal are used to eliminate any surfaces that aren’t visible to the view’s camera in the current scene. This speeds up performance because those surfaces don’t need to be drawn.

For example, surfaces on the back sides of opaque 3D objects don’t need to be drawn because the camera (and the viewer) never sees them.

Surfaces hidden by other 3D objects in front of them relative to the camera are also removed.

GPU counters can be used during hidden surface removal to find the total number of pixels rasterized, the number of fragment shader invocations, and the number of pixels stored.
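One figure that can be derived from these counters is overdraw, taken here as fragment shader invocations divided by pixels stored. The sketch below is a minimal illustration under that assumed definition; the function name and the counter values are made up.

```python
def overdraw(fragment_shader_invocations, pixels_stored):
    """Average number of times each stored pixel was shaded."""
    if pixels_stored == 0:
        return 0.0
    return fragment_shader_invocations / pixels_stored


# Hypothetical counter readings for one 1920x1080 render pass.
invocations = 6_220_800
stored = 1920 * 1080

print(f"overdraw: {overdraw(invocations, stored):.1f}x")
```

A value close to 1 suggests hidden surface removal is rejecting most occluded fragments, while larger values point to blending or draw ordering that forces extra shading.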

GPU counters can also help you minimize blending, which incurs a performance cost of its own.

To optimize drawing with hidden surface removal, you’ll want to draw objects in order of visibility state: namely, testing whether or not objects are opaque, testing by translucency, and trying to avoid interleaving opaque and non-opaque meshes.
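The ordering advice above can be sketched as a simple sort over a draw list. This is an illustrative model only: the Draw record, its opaque flag, and the mesh names are hypothetical and do not correspond to any Metal type.

```python
from dataclasses import dataclass


@dataclass
class Draw:
    name: str
    opaque: bool


def order_for_hsr(draws):
    """Put opaque draws first, then non-opaque ones, keeping order within each group."""
    # sorted() is stable and False sorts before True, so opaque draws come first.
    return sorted(draws, key=lambda draw: not draw.opaque)


def opacity_switches(draws):
    """Count how often consecutive draws flip between opaque and non-opaque."""
    return sum(1 for a, b in zip(draws, draws[1:]) if a.opaque != b.opaque)


frame = [
    Draw("terrain", True),
    Draw("smoke", False),
    Draw("building", True),
    Draw("glass", False),
    Draw("character", True),
]

ordered = order_for_hsr(frame)
print([draw.name for draw in ordered])
print("switches before:", opacity_switches(frame), "after:", opacity_switches(ordered))
```

Because the sort is stable, any ordering already applied within the non-opaque group, such as back-to-front, is preserved.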


To get started with Metal optimization, be sure to check out the WWDC videos Optimize Metal apps and games with GPU counters from WWDC20, Harness Apple GPUs with Metal also from WWDC20, and Delivering Optimized Metal Apps and Games from WWDC19.

Next, read Capturing a Metal workload in Xcode and Metal debugging types on the Metal Debugger pages on Apple’s Developer Documentation website.

There is also Analyzing your Metal workload in the Metal Debugger documentation.

You’ll definitely want to spend a lot of time with Xcode’s Metal Debugger and Trace documentation to learn in depth how the different GPU counters and performance graphs work. Without these, you can’t get a detail-level view of what’s really going on in your Metal code.

For compressed textures, it’s also worthwhile to read up on Adaptive Scalable Texture Compression (ASTC) and how it works in modern rendering pipelines.

Metal performance optimization is a vast and complex subject. We’ve just barely gotten started and will further explore this topic in future articles.