Overview of the Graphics Pipeline

Kern · Dec 24, 2023 · 15 min read

I’ve found that 3D rendering involves a number of techniques & terms which some people aren’t too familiar with, so I’d like to demystify the process a bit. I’ll be speaking from my experience in Unity, but some of these details might vary in other situations.

With a better understanding of how rendering works, you can start to create more interesting and realistic graphics, write your own shaders, or just better appreciate different graphics techniques in 3D games & animations. So let’s get into it.

What’s the Graphics Pipeline?

By “graphics pipeline”, I’m referring to the process which renders a set of 3D information into a 2D image, ready to be sent to a screen or written to an image file.

Akin to a factory assembly line, each “station” along the pipeline has a different job, and they all work together to assemble a final product.

The history of 3D modeling roughly dates back to the ’60s, when Ivan Sutherland developed one of the earliest computer-aided design systems (called Sketchpad).

For anyone interested in the interplay between art and technology, it’s extremely useful to at least understand the basics of this fascinating process.

Before Rendering

Of course, first you need some kind of 3D content to be rendered.

Since this article is really more about rendering, I’ll only cover the content creation process very briefly. If you’re interested in learning more about how 3D content is created, check out the links at the end of the article!

3D Modeling

As you probably know, any 3D scene is made up of individual 3D models, called meshes. Each mesh must first be created in a modeling program such as Blender. Artists can build these virtual objects vertex-by-vertex, or use higher-level sculpting tools to shape the desired geometry.

UV Mapping

Before a mesh can receive a texture, it must first be “unwrapped”, because texture files are 2D. This process is analogous to peeling an orange and laying the peel flat.

Each point on the mesh is associated with a 2D coordinate, and this is stored within the mesh file as a UV map. This allows specific parts of a 2D texture to correspond with particular surfaces on the 3D model.

(The terms U and V refer to the axes of the 2D texture space, since X, Y, and Z are already used for the 3D object space.)

If you’ve ever seen a 3D render that had visible seams or weird gaps in the texture, that’s due to poor UV mapping. Precise UV maps are crucial to ensure that the textures will look natural once applied to the model. Creating an efficient UV map that minimizes seams and distortion is a form of art in itself.

Texturing

A UV-mapped mesh is ready for a texture to be projected onto it. This is where models gain personality and realism.

It’s important to understand that textures aren’t just for color: they can also be normal maps (which create the illusion of bumps and dents), specular maps or roughness maps (dictating reflective properties), and any other effect a technical artist can dream up. In any case, the idea is to imbue surfaces with the illusion of real-world properties, convincing the eye that they have depth, wear, or reflectivity.

Rigging & Animation

If the model has moving parts, such as limbs, it must be rigged. Rigging involves assigning a skeleton to a mesh, where bones and joints define how the model can be animated. Weights are assigned to each vertex, signaling how much influence a particular bone has over it.

Once properly rigged, animators can manipulate the bones to create animations.

Setting the Scene

Individual meshes are then placed into some kind of scene, where they are positioned, rotated, and scaled to build the final layout of the world that will be rendered.

This stage also involves setting up lights that simulate real-world lighting conditions, cameras that define the perspective and field of view, and other environmental factors (like fog or particle systems) that contribute to the atmosphere of the scene.

Rendering

Finally, the scene is loaded into the rendering engine or game engine.

Here, additional pre-processing might occur such as further optimization, level-of-detail (LOD) generation, collision mesh generation for physics interactions, and the setup of data structures needed for efficient rendering (such as octrees or BSP trees).

Only after all these steps are completed can the rendering pipeline take over, processing the fully prepared scene to generate the final images you see on the screen or in a rendered video.

Stages of the Pipeline

The 3D scene is transferred to the graphics card piece by piece by packing vertex and material data into vertex buffer objects (VBOs), which the graphics card can read quickly from VRAM instead of constantly pulling the data from system RAM.
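To make that a bit more concrete, here’s a minimal sketch (in Python with NumPy, purely illustrative) of what packing vertex data into a single interleaved buffer looks like before it’s handed to the graphics API. The attribute layout and array names are assumptions for the example, not any engine’s actual format.

```python
import numpy as np

# Hypothetical vertex data for a single triangle: position (x, y, z),
# normal (nx, ny, nz), and texture coordinate (u, v) per vertex.
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], dtype=np.float32)
normals   = np.array([[0.0, 0.0, 1.0]] * 3, dtype=np.float32)
uvs       = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], dtype=np.float32)

# Interleave the attributes into one tightly packed buffer:
# [x y z nx ny nz u v | x y z nx ny nz u v | ...]
# This flat block of bytes is what would be uploaded to VRAM as a vertex buffer.
vertex_buffer = np.hstack([positions, normals, uvs]).astype(np.float32)

print(vertex_buffer.shape)            # (3, 8): three vertices, eight floats each
print(len(vertex_buffer.tobytes()))   # raw byte size, ready for the GPU upload
```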

Vertex Processing

Once the graphics card has received the 3D scene information, the first stage of the actual pipeline is vertex processing, led by the vertex shader.

Vertex data is a combination of information, such as position, vertex color (not always used), vertex normals, and texture coordinates.

  1. First, the Model Transformation moves the vertex positions from their local (model) space into a globally shared coordinate system known as world space.
  2. Then, the View Transformation rotates and translates the vertices according to the camera’s position and orientation, producing a camera-relative view space.
  3. Next, the Projection Transformation maps the vertices into clip space in preparation for the perspective divide, emulating the way the human eye perceives depth: objects closer to the camera appear larger than those further away. This step uses a Projection Matrix, which describes the 3D volume to be projected (often referred to as the “frustum”), to determine what should be visible on-screen.
  4. After the vertices are processed, Clipping discards any geometry outside the viewing frustum. The surviving vertices then go through the perspective divide, which normalizes their positions into a canonical cube known as Normalized Device Coordinates (NDC).
  5. Finally, the Viewport Transformation scales these NDCs into screen-space coordinates, leaving the vertices laid out on a 2D plane, ready for rasterization.

That’s how the vertex stage lets us convert 3D space coordinates into 2D screen coordinates! We’re already one step closer to the final 2D render.
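If you’d like to see the math spelled out, below is a toy NumPy sketch of this chain of transformations for a single vertex. It assumes an OpenGL-style projection matrix and a trivial camera; the function names and values are just for illustration.

```python
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    """A standard OpenGL-style perspective projection matrix (one common convention)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0,  0,                            0],
        [0,          f,  0,                            0],
        [0,          0,  (far + near) / (near - far),  2 * far * near / (near - far)],
        [0,          0, -1,                            0],
    ], dtype=np.float32)

def transform_vertex(v_local, model, view, proj, width, height):
    # 1-3. Model, view, and projection transforms (applied right to left).
    clip = proj @ view @ model @ np.append(v_local, 1.0)
    # 4. Perspective divide -> normalized device coordinates in [-1, 1].
    ndc = clip[:3] / clip[3]
    # 5. Viewport transform -> pixel coordinates on a width x height screen.
    x = (ndc[0] * 0.5 + 0.5) * width
    y = (1.0 - (ndc[1] * 0.5 + 0.5)) * height  # flip Y: screen origin at top-left
    return x, y, ndc[2]                        # keep NDC depth for later depth testing

model = np.eye(4, dtype=np.float32)                     # object already at the origin
view = np.eye(4, dtype=np.float32); view[2, 3] = -5.0   # camera pulled back 5 units
proj = perspective(60.0, 16 / 9, 0.1, 100.0)

print(transform_vertex(np.array([0.0, 1.0, 0.0]), model, view, proj, 1920, 1080))
```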

Primitive Assembly

Once our vertices have been transformed and projected in the vertex shader stage, we move on to primitive assembly.

In this stage, the transformed vertices are grouped into triangles. Graphics hardware can only render primitive shapes (like triangles) because vertices are just infinitesimal points in space; they have no surface area or volume.

A surface of any shape and complexity can be represented by tessellating triangles, i.e., fitting them together like tiles, to form the desired geometry. And their mathematical simplicity makes triangles highly efficient for rendering calculations.

So the pipeline takes a set of vertices and applies rules based on the primitive type: two vertices form an edge, three edges form a triangular face, and so on.

Also during primitive assembly, an optimization process known as face culling usually occurs. Face culling discards primitives (faces) that don’t contribute to the final image. For instance, back-face culling removes triangles facing away from the camera. Culling helps to speed up rendering by conserving computational resources (there’s no need to account for faces that aren’t visible in the scene).
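Here’s a tiny sketch of the idea behind back-face culling, assuming counter-clockwise winding marks a triangle’s front face (conventions vary by API). It’s purely conceptual; the real test happens in fixed-function hardware.

```python
import numpy as np

def is_back_face(v0, v1, v2, camera_pos):
    """Cull test for one triangle: compare its face normal with the view direction.

    Assumes counter-clockwise winding marks the front face (a common convention).
    """
    normal = np.cross(v1 - v0, v2 - v0)      # face normal derived from the winding order
    to_camera = camera_pos - v0              # direction from the face toward the camera
    return np.dot(normal, to_camera) <= 0.0  # facing away (or edge-on) -> cull it

camera = np.array([0.0, 0.0, 5.0])
front = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
back  = list(reversed(front))                # same triangle, opposite winding

print(is_back_face(*front, camera))  # False: kept
print(is_back_face(*back, camera))   # True: discarded before rasterization
```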

Later on in the pipeline, these flat faces can be cleverly textured and shaded to create the impression of having more 3D depth and detail than they actually do.

Geometry Shader & Tessellation

In more advanced pipelines, tessellation & geometry shaders further refine shapes immediately after vertex processing.

Tessellation is the process of subdividing surface geometry into smaller patches, allowing for finer detail. Controlled via Hull and Domain shaders, tessellation takes “patch primitives” (groupings of vertices) and subdivides them based on control points and tessellation factors. The tessellated output is mathematically refined to produce smoother surfaces, enhancing fine features like the curvature of a character’s face or the subtleties of a dynamic surface such as rippling water.

Following tessellation, the Geometry Shader receives these primitives and can add vertices and primitives or modify existing ones. This flexibility enables a variety of complex visual effects. For instance, the geometry shader can take a single triangle and output a volume of geometry to simulate 3D fur or grass. It operates on whole primitives (points, lines, or triangles) as input, and can output zero or more primitives, making it highly versatile (but also potentially performance-intensive).

Basically, these stages give you ways of creating and manipulating geometry directly within the shader pipeline. In the case of a water surface, a low-polygon plane can be converted into a realistic, high-poly water body: tessellation refines the plane into finer geometry, and the vertex positions are then adjusted with mathematical wave functions to produce a natural appearance (a toy sketch of this idea follows below).
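Below is a rough NumPy sketch of that water example: a coarse plane is “tessellated” into a denser grid, and the vertex heights are then displaced with sine waves. Real tessellation and geometry shaders do this on the GPU, per frame; the functions here are hypothetical stand-ins for the concept.

```python
import numpy as np

def make_grid(size, resolution):
    """A flat plane in the XZ plane, tessellated into resolution x resolution vertices."""
    coords = np.linspace(-size / 2, size / 2, resolution)
    xs, zs = np.meshgrid(coords, coords)
    ys = np.zeros_like(xs)
    return np.stack([xs, ys, zs], axis=-1)  # shape: (resolution, resolution, 3)

def displace_waves(grid, time, amplitude=0.2, frequency=2.0, speed=1.5):
    """Raise each vertex with overlapping sine waves -- a crude stand-in for a water
    displacement that would normally run in a vertex/tessellation/geometry shader."""
    x, z = grid[..., 0], grid[..., 2]
    height = amplitude * (np.sin(frequency * x + speed * time)
                          + 0.5 * np.sin(frequency * 1.7 * z + speed * 0.8 * time))
    displaced = grid.copy()
    displaced[..., 1] = height
    return displaced

coarse = make_grid(size=10.0, resolution=8)    # low-poly plane
fine   = make_grid(size=10.0, resolution=64)   # "tessellated" version: many more vertices
water  = displace_waves(fine, time=0.0)
print(coarse.shape, fine.shape, water[..., 1].min(), water[..., 1].max())
```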

It’s worth noting that (even though the geometry shader receives primitives as inputs) the geometry shader’s output is a group of vertex sequences which may need to undergo another primitive assembly step prior to rasterization.

Rasterization

Rasterization is essentially where the abstract 3D primitives take on a concrete 2D representation. The rasterizer needs to figure out which pixels on your screen correspond to which triangle (from the primitive assembly stage).

A useful analogy is that of a spotlight casting the primitives’ silhouettes onto a flat surface.

To accomplish this, the rasterizer uses a scan-line method or an edge function to test pixels, one by one, to see whether they lie within the boundaries of a given primitive.

Once a pixel is determined to lie within a primitive (i.e. inside a given triangle), it’s converted into a fragment. A fragment is closely related to the concept of a pixel; fragments can be thought of as potential pixels. Each one represents all the data required to compute the final pixel color: position, depth, and vertex attributes like color, texture coordinates, and normal vectors.

This data is calculated through a process called interpolation, where vertex properties are blended across the fragments.

The problem here is that we lose an entire dimension. So, Z-buffering (depth testing) is performed to manage the apparent depth of each fragment. Each fragment carries a ‘z’ value that signifies its depth in the scene. The rasterizer checks this value against a depth buffer which stores depth information of all the pixels rendered so far. This way, we ensure that only the closest fragment to the camera is used for each pixel, to appropriately handle opaque obstructions (called “occlusions”).
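To tie rasterization, interpolation, and depth testing together, here’s a miniature software-rasterizer sketch in Python. It uses the edge-function approach over a tiny depth buffer; everything here is a conceptual stand-in for what the GPU does in dedicated hardware, and the triangle coordinates are assumed to already be in screen space.

```python
import numpy as np

def edge(a, b, p):
    """Signed area test: which side of edge a->b does point p lie on?"""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize_triangle(v0, v1, v2, depth_buffer):
    """v0..v2 are (x, y, z) in screen space; z is the depth used for the Z-test."""
    h, w = depth_buffer.shape
    area = edge(v0, v1, v2)
    if area == 0:
        return []                                   # degenerate triangle, nothing to draw
    fragments = []
    for y in range(h):
        for x in range(w):
            p = (x + 0.5, y + 0.5)                  # sample at the pixel centre
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                # Barycentric weights: used to interpolate depth (and any vertex attribute).
                b0, b1, b2 = w0 / area, w1 / area, w2 / area
                z = b0 * v0[2] + b1 * v1[2] + b2 * v2[2]
                if z < depth_buffer[y, x]:          # depth test: keep only the closest fragment
                    depth_buffer[y, x] = z
                    fragments.append((x, y, z, (b0, b1, b2)))
    return fragments

depth = np.full((8, 8), np.inf)                      # depth buffer starts "infinitely far away"
frags = rasterize_triangle((1, 1, 0.5), (6, 2, 0.5), (3, 6, 0.2), depth)
print(len(frags), "fragments passed the depth test")
```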

The fragments which pass the depth test are then transferred to the next stage.

Fragment Shader

At this stage in the pipeline, fragments are processed by their corresponding fragment shaders (sometimes also referred to as pixel shaders). Keep in mind that each fragment contains data generated by previous pipeline stages such as vertex positions, interpolated texture coordinates, normals, and color data.

The function of the fragment shader is to calculate the final color and other attributes of each pixel on the screen. In other words, it’s the pixel-level shader, where previous shaders worked on the vertex level.

Fragment shaders are highly programmable and can be written using high-level shading languages such as GLSL (OpenGL Shading Language), HLSL (High-Level Shading Language, for Direct3D), or the Metal Shading Language (for Apple’s Metal API). A fragment shader can receive several kinds of input (modeled in the sketch after this list):

  • Interpolated values from the vertex shader (like texture coordinates, normal vectors, and vertex colors).
  • Uniforms, which stay constant during the rendering of a mesh and can be used to pass scene-wide information like lights or camera properties.
  • Samplers that can refer to textures which the shader can access to retrieve color data or perform more complex operations like shadow mapping or normal mapping.
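As a rough illustration (not real shader code, which would be written in GLSL or HLSL), here’s a toy Python stand-in for a fragment shader that uses all three kinds of input: interpolated values, a uniform-style light direction, and a sampler-style texture lookup. The texture and function names are invented for the example.

```python
import numpy as np

def sample_nearest(texture, uv):
    """A minimal 'sampler': nearest-texel lookup into a (height, width, 3) array."""
    h, w, _ = texture.shape
    x = min(int(uv[0] * w), w - 1)
    y = min(int(uv[1] * h), h - 1)
    return texture[y, x]

def fragment_shader(uv, normal, light_dir, albedo_texture):
    """Inputs mirror the list above:
    - uv, normal: values interpolated from the vertices by the rasterizer
    - light_dir:  a 'uniform', constant for the whole draw call
    - albedo_texture: accessed through a sampler-style lookup
    """
    albedo = sample_nearest(albedo_texture, uv)
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    diffuse = max(np.dot(n, l), 0.0)            # simple Lambertian lighting term
    return np.clip(albedo * diffuse, 0.0, 1.0)  # final RGB colour for this fragment

checker = np.indices((8, 8)).sum(axis=0) % 2     # tiny procedural checkerboard texture
texture = np.stack([checker] * 3, axis=-1).astype(np.float32)

color = fragment_shader(uv=np.array([0.3, 0.7]),
                        normal=np.array([0.0, 0.0, 1.0]),
                        light_dir=np.array([0.3, 0.4, 1.0]),
                        albedo_texture=texture)
print(color)
```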

Since shaders at this stage are so fine-grained and flexible, this is where a lot of really awesome visual effects can happen. To fully understand the power of fragment shaders, we need to talk about texture sampling and texture filtering.

Notes on Texture Sampling & Filtering

Often, a fragment needs to get information from a texture (like color information). Remember, fragments are essentially potential pixels.

Once the texture is loaded into VRAM, the fragment shader performs a lookup operation (such as texture2D, depending on the shading language). This operation samples the coordinate on the texture image which corresponds to the fragment's position.

This is where mesh UVs become relevant again. Because the vertices of each triangle on the mesh are assigned a UV coordinate, the graphics pipeline can take these coordinates and interpolate them across the surface of the triangle to provide the individual fragments with their corresponding UVs.

In other words, the sample coordinate is initially derived from the vertex, but then it’s interpolated using the primitive to determine the coordinate of each individual fragment. That’s how the UVs help the fragment shader sample and manipulate textures on the pixel level (instead of just the vertex level, as before).

Texture data are sampled on the pixel level by using these interpolated UV coordinates. For every fragment generated, the graphics card performs a texture lookup operation to fetch the texel that corresponds to the fragment’s UVs.

Texture filtering is then used to resolve the scenarios where a single texel may cover multiple screen pixels (magnification) or where multiple texels fall within a single screen pixel (minification). In simpler terms, filtering is necessary because the mapping between texture coordinates and screen pixels is almost never one-to-one. (A small bilinear-filtering sketch follows the list below.)

  • Magnification (texture is closer than its natural resolution): The texture appears larger than its source resolution. In this case, filtering options like Bilinear or Trilinear filtering may be used, which blend the nearest texels.
  • Minification (texture is farther away than its natural resolution): The texture appears smaller than its source resolution. Here, the process can become more complex, as simply picking the nearest texel (as in Nearest Neighbor filtering) would produce a very pixelated look. This is where Mipmapping comes into play. Mipmaps are in fact pre-calculated, down-scaled versions of the texture that are used when the texture is viewed at smaller sizes. For even higher quality, Anisotropic filtering can be applied (which takes into account the angle at which the texture is viewed).
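As promised, here’s a minimal sketch of bilinear filtering in Python: the sampler blends the four texels nearest to the requested UV coordinate. Real GPUs do this in dedicated texture units; the clamp-to-edge addressing here is just one possible choice.

```python
import numpy as np

def sample_bilinear(texture, u, v):
    """Blend the four texels surrounding (u, v); texture is a (height, width, channels) array."""
    h, w, _ = texture.shape
    # Map UV in [0, 1] to continuous texel coordinates (texel centres at half-integers).
    x = u * w - 0.5
    y = v * h - 0.5
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0                        # fractional position between texels

    def texel(ix, iy):                             # clamp-to-edge addressing
        return texture[np.clip(iy, 0, h - 1), np.clip(ix, 0, w - 1)]

    top    = (1 - fx) * texel(x0, y0)     + fx * texel(x0 + 1, y0)
    bottom = (1 - fx) * texel(x0, y0 + 1) + fx * texel(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom

# A 2x2 texture: black, red, green, and blue texels.
tex = np.array([[[0, 0, 0], [1, 0, 0]],
                [[0, 1, 0], [0, 0, 1]]], dtype=np.float32)
print(sample_bilinear(tex, 0.5, 0.5))  # the exact centre: an even blend of all four texels
```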

Most graphics cards have a lot of specialized hardware designed to make this happen as quickly as possible. For example, texture mapping units perform high-speed lookups and filtering just for this purpose.

(The actual hardware and driver implementations could optimize this process in various ways, but conceptually this is the role textures play within the fragment shader stage.)

Notes on Graphics Hardware

Now that we’ve covered several different types of shaders, let’s briefly talk about how they’re actually executed. You might be thinking this is a huge amount of data to process in the short time that is one frame, and you’d be right.

One of the key features of modern GPUs is their shader cores, which are individual processing units designed to handle the various types of shaders used in the graphics pipeline (vertex, geometry, tessellation, fragment/pixel, and compute shaders). You see, unlike CPUs that typically have fewer cores with higher individual processing capabilities, GPUs consist of thousands of smaller, more specialized cores that work concurrently to quickly process massive blocks of data. These cores can run thousands of shader programs in parallel, allowing for the simultaneous manipulation of incredible quantities of vertices and/or fragments.

Furthermore, as you might imagine, many stages in the graphics pipeline can be processed in parallel. While a group of vertices are being processed in the vertex shader stage, previously processed vertices can be assembled into primitives, and even earlier processed fragments can be undergoing pixel operations at the same time.

And as a final note, GPUs leverage SIMD architecture, where one instruction is applied to multiple data points simultaneously. For example, the same lighting calculation can be applied to all the vertices of a mesh, all in a single pass, which significantly speeds up processing.
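A loose analogy in Python: NumPy applies one operation across a large array in a single call, which is similar in spirit (though very different in implementation) to how a GPU applies one lighting calculation across thousands of vertices at once.

```python
import numpy as np

# One lighting rule applied to every vertex normal "at once" -- the same idea,
# conceptually, as SIMD: a single instruction stream over many data elements.
rng = np.random.default_rng(0)
normals = rng.normal(size=(100_000, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)   # normalize all vectors at once

light_dir = np.array([0.0, 1.0, 0.0])
diffuse = np.clip(normals @ light_dir, 0.0, 1.0)             # 100,000 dot products in one call

print(diffuse.shape, float(diffuse.mean()))
```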

All this is to say that graphics cards are highly optimized for processing many, many pixels in a very short amount of time. SIMD, in addition to parallel processing capability, is essential for the real-time rendering (in video games and interactive media) that we know and love.

Depth & Stencil Testing

This is often the final per-fragment stage, after the fragments have been shaded. Each surviving fragment is run through the stencil and depth tests one last time, and the colors of the fragments that pass are blended into the framebuffer to create the final image.

There’s a lot of flexibility in the choice of blending function. Blending is used in effects such as temporal anti-aliasing (TAA) and motion blur. Transparent or semi-transparent materials can be simulated with alpha blending. In the “replace” blending mode, the fragment’s color simply overwrites whatever is already in the framebuffer instead of blending with it. And so on.
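For a concrete feel, here’s a toy Python version of the blending step for a single pixel, covering alpha, replace, and additive modes. In a real pipeline you would configure these through the graphics API rather than writing them by hand.

```python
import numpy as np

def blend(src_rgb, src_alpha, dst_rgb, mode="alpha"):
    """Combine a new fragment colour (src) with what's already in the framebuffer (dst)."""
    if mode == "alpha":     # classic transparency: src_alpha * src + (1 - src_alpha) * dst
        return src_alpha * src_rgb + (1.0 - src_alpha) * dst_rgb
    if mode == "replace":   # overwrite whatever was in the framebuffer
        return src_rgb
    if mode == "additive":  # often used for glows, fire, and similar effects
        return np.clip(src_rgb + dst_rgb, 0.0, 1.0)
    raise ValueError(f"unknown blend mode: {mode}")

framebuffer_pixel = np.array([0.2, 0.2, 0.8])   # existing blue-ish background colour
glass_fragment = np.array([1.0, 0.0, 0.0])      # red, 50% transparent surface

print(blend(glass_fragment, 0.5, framebuffer_pixel, mode="alpha"))    # [0.6 0.1 0.4]
print(blend(glass_fragment, 0.5, framebuffer_pixel, mode="replace"))  # [1. 0. 0.]
```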

Post-Processing

As the name suggests, the post-processing stage happens after the render has been processed (but before the final image is sent to the screen).

Some effects can’t be integrated into earlier stages because they depend on information from the complete render. For instance, a certain effect might sample a group of neighboring pixels, which means the pixel colors must be calculated before the effect is possible.

Some common examples include:

  • Bloom: An approximation of how bright light disperses to create a “glow” effect
  • Color Correction: Adjusts color tones, like contrast or saturation, giving technical artists more control over the mood of the final render
  • Depth of Field: Simulates the effect of a camera lens focus, where a certain depth is emphasized and other depths get blurred
  • Motion Blur: Simulates blurring when objects (or the camera) move rapidly
  • Anti-Aliasing: Rasterization can produce jagged edges because of the square nature of pixels. AA is a technique of smoothing out these edges, but it can be performance-intensive

Postprocessing is usually done within an additional fragment shader, processed just before the presentation stage.
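As an example of a post-processing pass that needs the finished render, here’s a very rough bloom approximation in Python: bright pixels are extracted, blurred, and added back on top of the image. A production implementation would run as a fragment shader over downsampled buffers; the threshold and blur values here are arbitrary.

```python
import numpy as np

def box_blur(image, radius=1):
    """Average each pixel with its neighbours -- the crudest possible blur."""
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(image)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out += padded[radius + dy : radius + dy + image.shape[0],
                          radius + dx : radius + dx + image.shape[1]]
    return out / (2 * radius + 1) ** 2

def bloom(image, threshold=0.8, strength=0.6):
    """Keep only the bright parts, blur them, and add them back on top of the render."""
    brightness = image.max(axis=-1, keepdims=True)
    bright_parts = np.where(brightness > threshold, image, 0.0)
    return np.clip(image + strength * box_blur(bright_parts, radius=2), 0.0, 1.0)

render = np.zeros((32, 32, 3), dtype=np.float32)
render[14:18, 14:18] = 1.0                 # a small, very bright square "light"
final = bloom(render)
print(final[12, 12], final[16, 16])        # the glow spills outside the bright square
```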

Blitting / Presenting

We’re basically done; it’s time to present the final image data.

At this point in the process, the color data has been written to an off-screen buffer, called the backbuffer. In order to display the render on the screen (or write it to a file), that backbuffer needs to be copied over to the front buffer.

If you’re familiar with data structures, think of a buffer as an array of pixels.

Blitting — short for “block image transfer” — is the term for pushing image data to the front buffer, i.e., transferring the entirety of the pixel data to the screen or to a file.

On older GPUs, blitting was often performed by specialized hardware (predictably known as the “blitter”), optimized for copying pixel data between buffers.

Modern GPUs are much more streamlined. Rather than copying all the data from one buffer to another, the buffers can simply swap roles, so the backbuffer becomes the next frame’s front buffer. Also worth noting: modern GPUs don’t typically have dedicated blitting hardware; their generalized shader cores handle bulk copies along with a whole variety of more complex operations.
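Here’s a minimal sketch of the double-buffering idea, treating each buffer as a plain pixel array: blit copies the back buffer forward, while swap just trades the two buffers’ roles. The class is hypothetical, not any API’s actual interface.

```python
import numpy as np

class DoubleBufferedDisplay:
    """Two pixel buffers: the front buffer is 'on screen', the back buffer is being drawn."""

    def __init__(self, width, height):
        self.front = np.zeros((height, width, 3), dtype=np.float32)
        self.back = np.zeros((height, width, 3), dtype=np.float32)

    def blit(self):
        """Classic approach: copy the finished back buffer into the front buffer."""
        self.front[:] = self.back

    def swap(self):
        """Modern approach: no copy, the two buffers just trade roles."""
        self.front, self.back = self.back, self.front

display = DoubleBufferedDisplay(640, 360)
display.back[:] = [0.1, 0.3, 0.6]   # "render" a frame of solid colour off-screen
display.swap()                       # present it without copying any pixel data
print(display.front[0, 0])
```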

So, in other words, “blitting” is the classic term for transferring the final image. While you’re likely to hear that term a lot when studying 3D graphics, these days blitting is actually more of a subset of the presentation process.

That’s It!

So that’s how an arbitrary collection of 3D data gets rendered into a coherent flat image on your screen!

I’d like to write more about 3D graphics in the future. In the meantime, if these concepts are new to you and you found this interesting, you should download Blender and start learning how to use it!

Glossary of Terms

  • BSP: Binary Space Partitioning is an optimization technique used to divide a 3D scene into smaller subspaces, making it easier to cull non-visible geometry.
  • Blitting: A basic operation where pixel data are rapidly transferred from one buffer to another.
  • Fragment: A “potential pixel”, containing lots of information such as color, depth, texture coordinates, etc.
  • Interpolation: The process of estimating/blending values between two endpoint values (often 0–1). Often used to calculate intermediate values, especially colors, between vertices.
  • LoD: Level of Detail is another optimization technique, which adjusts the amount of detail depending on an object’s distance from the viewer (camera). For instance, far-away meshes might switch to low-polycount versions.
  • Matrix: A mathematical object, used for perspective transformations. Matrices are used “under the hood” for all kinds of spatial operations, like rotation and scaling.
  • Mipmap: A set of pre-computed textures at progressively lower resolutions. Similar to the concept of LoD, the idea is to optimize performance by using lower-quality resources when something is too far away for much detail to be visible.
  • NDC: Normalized Device Coordinates are the coordinate space after the perspective division.
  • Normals: A “normal” is a vector which is perpendicular to a surface. So, they define the direction that a surface faces.
  • Primitive: A basic geometric shape, like lines, triangles, or even squares.
  • Raster: In short, rasterization converts vector-based graphics (like 3D vertices) into pixel-based graphics.
  • Rigging: The process of defining an internal “skeleton” structure, which allows a 3D model to be animated.
  • Shader: A small program which runs on the GPU. Shaders take data such as vertices, primitives, or fragments as input and output transformed results (new vertex positions, new primitives, or pixel colors). Several types of shaders can be combined within the pipeline.
  • Tessellation: Division of geometry into smaller, simpler shapes. This process allows the GPU to dynamically increase geometric detail.
  • UV (mapping): A method of mapping 2D images onto a 3D object. “U” and “V” are simply coordinate axes like “X” and “Y”.
  • Vertex/Vertices: An infinitesimal point in 3D space. Multiple vertices can be linked together, forming primitive shapes.

Further Reading

In no particular order:

Originally published at https://kersed.net.
