
The Pipeline

Understanding the GPU rendering pipeline and how BWSL shader blocks map to each stage.


BWSL is designed around the modern GPU rasterization pipeline. Understanding this pipeline helps you write efficient shaders and know exactly where your code executes on the GPU.

The GPU Rasterization Pipeline

The GPU processes geometry through a series of fixed stages. BWSL gives you control over the programmable stages (highlighted below) while the GPU handles the fixed-function stages automatically.

GPU Rasterization Pipeline (data flows from top to bottom):

  1. Input Assembly: reads vertex data from buffers
  2. Vertex Shader (PROGRAMMABLE): transforms vertices to clip space
  3. Primitive Assembly: groups vertices into triangles
  4. Clipping & Culling: removes invisible geometry
  5. Rasterization: converts triangles to fragments
  6. Fragment Shader (PROGRAMMABLE): computes final pixel colors
  7. Blending: combines with framebuffer
  8. Render Target: output texture or screen

Pipeline Stages Explained

1. Input Assembly

The GPU reads vertex data from memory based on your attributes declaration:

bwsl
attributes {
    position: float3   // Vertex positions
    normal: float3     // Surface normals
    texcoord: float2   // Texture coordinates
}

Each vertex is packaged with these attributes and sent to the vertex shader. The order and layout match what your engine provides via vertex buffers.
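To see what "order and layout" means concretely, here is a small sketch (plain Python, not part of any BWSL tooling) that computes the byte offset of each attribute and the vertex stride, assuming tightly packed 32-bit floats:

```python
# Sketch: byte offsets and stride for the attribute layout above,
# assuming tightly packed 32-bit floats (4 bytes per component).
# The packing rules are illustrative; the real layout is whatever
# your engine's vertex buffers provide.
attributes = [("position", 3), ("normal", 3), ("texcoord", 2)]

offset = 0
layout = {}
for name, components in attributes:
    layout[name] = offset          # byte offset of this attribute
    offset += components * 4       # advance by components * sizeof(float)

stride = offset                    # bytes from one vertex to the next

print(layout)   # {'position': 0, 'normal': 12, 'texcoord': 24}
print(stride)   # 32
```

If the engine packs the buffer differently (padding, separate streams), these numbers change; the point is only that the shader's declared order must agree with the buffer's actual layout.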

2. Vertex Shader (Programmable)

The vertex block in BWSL runs once per vertex. Its primary job is to transform vertices from model space to clip space:

bwsl
vertex {
    // Transform to clip space (required)
    output.position = resources.mvp * float4(attributes.position, 1.0);
    // Pass data to the fragment shader
    output.worldNormal = resources.normalMatrix * attributes.normal;
    output.uv = attributes.texcoord;
}

The vertex shader must write to output.position. This clip-space position tells the GPU where to draw the vertex on screen. See Shader I/O for details on passing data between stages.

3. Primitive Assembly

After vertex processing, the GPU groups vertices into primitives (usually triangles). Three vertices become one triangle, ready for the next stages.

4. Clipping & Culling

The GPU automatically:

  • Clips triangles against the view frustum (removes parts outside the screen)
  • Culls back-facing triangles depending on the pipeline state your engine creates

These are fixed-function stages: you don't write code for them, and they run automatically.
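Back-face culling usually works by checking the winding order of the projected triangle. A minimal sketch (plain Python; the "clockwise is back-facing" convention is an assumption, since the real convention comes from your engine's pipeline state):

```python
# Sketch: back-face culling via screen-space winding order.
# A triangle whose projected vertices wind clockwise (negative
# signed area here) is treated as back-facing. Which winding counts
# as front-facing is pipeline state, not shader code.
def signed_area(v0, v1, v2):
    return 0.5 * ((v1[0] - v0[0]) * (v2[1] - v0[1])
                - (v2[0] - v0[0]) * (v1[1] - v0[1]))

def is_back_facing(v0, v1, v2):
    return signed_area(v0, v1, v2) < 0.0

# Counter-clockwise triangle: front-facing
print(is_back_facing((0, 0), (1, 0), (0, 1)))  # False
# Same triangle with two vertices swapped: back-facing
print(is_back_facing((0, 0), (0, 1), (1, 0)))  # True
```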

5. Rasterization

The rasterizer converts each triangle into fragments—potential pixels that might be drawn. For each fragment, the GPU:

  1. Determines which pixel it covers
  2. Interpolates all output values from the vertex shader across the triangle surface

Rasterizer Interpolation: a fragment (frag) inside the triangle formed by vertices v0, v1, and v2 receives automatically interpolated values. Everything written to output in your vertex shader is smoothly blended across the triangle surface.

Each fragment receives values based on its barycentric coordinates — its relative position within the triangle.

input.color = v0.color × w0 + v1.color × w1 + v2.color × w2
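That weighted sum can be sketched directly (plain Python, ignoring the perspective correction real GPUs apply; the centroid weights are just an example):

```python
# Sketch: barycentric interpolation of a per-vertex color for one
# fragment. The weights w0, w1, w2 are the fragment's barycentric
# coordinates and sum to 1. Perspective correction is omitted.
def interpolate(values, weights):
    return tuple(sum(v[i] * w for v, w in zip(values, weights))
                 for i in range(len(values[0])))

v0_color = (1.0, 0.0, 0.0)   # red at v0
v1_color = (0.0, 1.0, 0.0)   # green at v1
v2_color = (0.0, 0.0, 1.0)   # blue at v2

# A fragment at the triangle's centroid gets equal weights.
frag_color = interpolate([v0_color, v1_color, v2_color],
                         [1/3, 1/3, 1/3])
print(frag_color)   # roughly (0.333, 0.333, 0.333): an even gray-ish mix
```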

6. Early Depth/Stencil Test

Before running your fragment shader, the GPU can reject fragments that would be hidden behind already-drawn geometry. This optimization (called "early-Z") saves processing power.
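The depth test itself is a simple comparison. A sketch of the idea (plain Python; the "less than passes" comparison and the 1.0 clear value are assumptions, as both are configured by the host pipeline):

```python
# Sketch: the depth test the GPU performs, here before the fragment
# shader as in early-Z. A fragment survives only if it is closer
# than what the depth buffer already holds ("less" comparison assumed).
depth_buffer = {(4, 2): 0.30}   # pixel (4, 2) already covered at depth 0.30

def depth_test(pixel, depth):
    stored = depth_buffer.get(pixel, 1.0)   # 1.0 = far plane, nothing drawn yet
    if depth < stored:
        depth_buffer[pixel] = depth          # fragment survives, buffer updated
        return True
    return False                             # fragment rejected, shader skipped

print(depth_test((4, 2), 0.50))  # False: hidden behind existing geometry
print(depth_test((4, 2), 0.10))  # True: closer, replaces stored depth
```

When a fragment fails this test early, its fragment shader never runs at all, which is where the savings come from.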

7. Fragment Shader (Programmable)

The fragment block runs once per fragment (potential pixel). It receives interpolated data via input and computes the final color:

bwsl
fragment {
    // Access interpolated values from vertex shader
    float3 normal = normalize(input.worldNormal);
    float2 uv = input.uv;
    // Sample textures
    float4 albedo = sample(resources.albedoMap, resources.albedoSampler, uv);
    // Compute lighting
    float light = max(dot(normal, resources.lightDir), 0.0);
    // Write final color (required)
    output.color = float4(albedo.rgb * light, 1.0);
}

8. Blending

The GPU combines your fragment's color with whatever is already in the render target. Common blend modes include:

  • Opaque: Replace existing color entirely
  • Alpha blend: Mix based on alpha for transparency
  • Additive: Add colors together for glow effects

Blend mode is configured by the host pipeline state, not in BWSL code.
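The three modes above reduce to simple per-channel equations. A sketch (plain Python, single channel; src is the incoming fragment color, dst the existing framebuffer color, a the fragment alpha):

```python
# Sketch: common blend modes as per-channel equations.
def opaque(src, dst, a):
    return src                          # replace existing color entirely

def alpha_blend(src, dst, a):
    return src * a + dst * (1.0 - a)    # classic "over" transparency

def additive(src, dst, a):
    return min(src + dst, 1.0)          # add and clamp, for glow effects

print(alpha_blend(1.0, 0.0, 0.25))  # 0.25: mostly the old (black) background
print(additive(0.8, 0.5, 1.0))      # 1.0: sum clamps at white
```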

9. Render Target

Final colors are written to render targets—textures or the screen. These can be used as inputs to subsequent passes.

How BWSL Maps to the Pipeline

BWSL provides a clean abstraction over the GPU pipeline:

BWSL Construct     Pipeline Stage      Purpose
attributes { }     Input Assembly      Define vertex data layout
vertex { }         Vertex Shader       Transform vertices, prepare interpolants
output.position    Vertex Output       Required clip-space position
output.*           Rasterizer Input    Values to interpolate per-fragment
input.*            Fragment Input      Interpolated values from vertex stage
fragment { }       Fragment Shader     Compute final pixel color
output.color       Fragment Output     Color written to render target
pass "Name" { }    Full Pipeline       One complete render pipeline execution

Multi-Pass Rendering

Real-world rendering often requires multiple passes through the pipeline. BWSL's pass construct represents one complete trip through the GPU pipeline.

Each pass is a complete pipeline execution:

  • pass "MainPass": vertex (transforms geometry) → fragment (computes colors) → Scene Color, Depth
  • pass "PostFX": vertex (transforms geometry) → fragment (computes colors) → Final Output

Here's a complete deferred rendering example with three passes:

bwsl
pipeline DeferredRenderer {
    attributes {
        position: float3
        normal: float3
        texcoord: float2
    }
    resources {
        mvp: mat4
        model: mat4
        normalMatrix: mat3
        albedoMap: texture2D
        gBufferAlbedo: texture2D
        gBufferNormal: texture2D
        gBufferPosition: texture2D
        litScene: texture2D
        linearSampler: sampler
    }
    pass "GBuffer" {
        use attributes { position, normal, texcoord }
        use resources { mvp, model, normalMatrix, albedoMap, linearSampler }
        // First pass: render geometry data to multiple targets
        vertex {
            output.position = resources.mvp * float4(attributes.position, 1.0);
            output.worldPos = (resources.model * float4(attributes.position, 1.0)).xyz;
            output.normal = resources.normalMatrix * attributes.normal;
            output.uv = attributes.texcoord;
        }
        fragment {
            // Write to G-buffer textures
            output.color = sample(resources.albedoMap, resources.linearSampler, input.uv);
            // Additional render targets are host pipeline outputs.
        }
    }
    pass "Lighting" {
        use attributes { position, texcoord }
        use resources { gBufferAlbedo, gBufferNormal, gBufferPosition, linearSampler }
        // Second pass: fullscreen lighting calculation
        vertex {
            output.position = float4(attributes.position, 1.0);
            output.uv = attributes.texcoord;
        }
        fragment {
            // Read from G-buffer, compute lighting
            float4 albedo = sample(resources.gBufferAlbedo, resources.linearSampler, input.uv);
            float3 normal = sample(resources.gBufferNormal, resources.linearSampler, input.uv).xyz;
            float3 worldPos = sample(resources.gBufferPosition, resources.linearSampler, input.uv).xyz;
            // Lighting calculations (a simple directional light shown here)
            float3 lightDir = normalize(float3(0.5, 1.0, 0.3));
            float3 finalColor = albedo.rgb * max(dot(normal, lightDir), 0.0);
            output.color = float4(finalColor, 1.0);
        }
    }
    pass "PostFX" {
        use attributes { position, texcoord }
        use resources { litScene, linearSampler }
        // Third pass: post-processing effects
        vertex {
            output.position = float4(attributes.position, 1.0);
            output.uv = attributes.texcoord;
        }
        fragment {
            float4 color = sample(resources.litScene, resources.linearSampler, input.uv);
            // Vignette
            float2 uv = input.uv * 2.0 - 1.0;
            float vignette = 1.0 - dot(uv, uv) * 0.3;
            output.color = float4(color.rgb * vignette, 1.0);
        }
    }
}

Each pass is a complete traversal of the pipeline:

  1. GBuffer: Renders 3D geometry, outputs multiple textures
  2. Lighting: Reads G-buffer, computes lighting as a fullscreen quad
  3. PostFX: Final fullscreen effects before display

Performance Considerations

Understanding the pipeline helps you write efficient shaders:

Vertex vs Fragment

Code in vertex { } runs once per vertex (typically thousands). Code in fragment { } runs once per pixel (potentially millions). Move calculations to the vertex shader when possible—interpolation is free.

bwsl
// Efficient: calculate in the vertex shader, interpolate for free
vertex {
    output.position = resources.mvp * float4(attributes.position, 1.0);
    output.worldPos = (resources.model * float4(attributes.position, 1.0)).xyz;
    output.viewDir = normalize(resources.cameraPos - output.worldPos); // Per-vertex
}
fragment {
    float3 viewDir = normalize(input.viewDir); // Just normalize the interpolated value
    // ...
}

Pipeline Stalls to Avoid

  • Excessive texture samples: Each sample() call can stall waiting for memory
  • Dependent texture reads: Sampling at coordinates computed from another sample
  • Complex branching: GPUs prefer uniform execution across fragments
  • Discard overuse: Breaks early-Z optimization

Summary

The GPU pipeline is a production line for rendering:

  1. Input Assembly reads your vertex attributes
  2. Vertex Shader (vertex { }) transforms geometry and prepares data
  3. Rasterization converts triangles to fragments, interpolates data
  4. Fragment Shader (fragment { }) computes final pixel colors
  5. Blending combines with existing framebuffer content
  6. Output writes to render targets or screen

BWSL gives you direct control over the programmable stages (vertex and fragment) while the GPU handles the fixed-function stages automatically.

See Also

  • Shader I/O — How data flows between vertex and fragment shaders
  • Shader Variants — Specialize one pipeline into multiple compiled shader combinations
  • The Pass — Configuring individual render passes
  • Vertex Attributes — Defining vertex input data
  • Resources — Uniforms, textures, and buffers