The Pipeline
Understanding the GPU rendering pipeline and how BWSL shader blocks map to each stage.
Pressure-test the syntax
Take the concept from this page into the playground and deliberately break a pass, binding, or type signature to see how the compiler responds.
BWSL is designed around the modern GPU rasterization pipeline. Understanding this pipeline helps you write efficient shaders and know exactly where your code executes on the GPU.
The GPU Rasterization Pipeline
The GPU processes geometry through a series of fixed stages. BWSL gives you control over the programmable stages (highlighted below) while the GPU handles the fixed-function stages automatically.
GPU Rasterization Pipeline (data flows from top to bottom):
- Input Assembly: reads vertex data from buffers
- Vertex Shader (programmable): transforms vertices to clip space
- Primitive Assembly: groups vertices into triangles
- Clipping & Culling: removes invisible geometry
- Rasterization: converts triangles to fragments
- Fragment Shader (programmable): computes final pixel colors
- Blending: combines with the framebuffer
- Render Target: output texture or screen
Pipeline Stages Explained
1. Input Assembly
The GPU reads vertex data from memory based on your attributes declaration:
attributes {
    position: float3  // Vertex positions
    normal: float3    // Surface normals
    texcoord: float2  // Texture coordinates
}
Each vertex is packaged with these attributes and sent to the vertex shader. The order and layout match what your engine provides via vertex buffers.
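For intuition, here is a CPU-side Python sketch (illustrative, not BWSL) of how an engine might pack one interleaved vertex matching this layout: three floats for position, three for normal, two for texcoord.

```python
import struct

# One interleaved vertex matching the attributes block:
# position (float3) + normal (float3) + texcoord (float2) = 8 floats
vertex = (0.0, 1.0, 0.0,   # position
          0.0, 0.0, 1.0,   # normal
          0.5, 0.5)        # texcoord

packed = struct.pack("8f", *vertex)
stride = len(packed)       # bytes between consecutive vertices in the buffer
print(stride)              # 32
```

The 32-byte stride is what the input assembler steps by to fetch each vertex; the per-attribute byte offsets (0, 12, 24) come from the same layout.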
2. Vertex Shader (Programmable)
The vertex block in BWSL runs once per vertex. Its primary job is to transform vertices from model space to clip space:
vertex {
    // Transform to clip space (required)
    output.position = resources.mvp * float4(attributes.position, 1.0);

    // Pass data to the fragment shader
    output.worldNormal = resources.normalMatrix * attributes.normal;
    output.uv = attributes.texcoord;
}
The vertex shader must write to output.position. This clip-space position tells the GPU where to draw the vertex on screen. See Shader I/O for details on passing data between stages.
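As a rough illustration of what that transform computes, here is a Python sketch of a 4x4 matrix times `float4(position, 1.0)`. The matrix here is a hypothetical translation-only MVP, chosen so the result is easy to verify by hand.

```python
def mat4_mul_vec4(m, v):
    # Row-major 4x4 matrix times a 4-component column vector
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Hypothetical MVP: pure translation by (2, 0, 0)
mvp = [
    [1, 0, 0, 2],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

position = [1.0, 5.0, -3.0]                   # attributes.position
clip = mat4_mul_vec4(mvp, position + [1.0])   # output.position
print(clip)  # [3.0, 5.0, -3.0, 1.0]
```

The appended 1.0 (the w component) is what lets a 4x4 matrix encode translation; a real MVP also folds in view and projection, after which the GPU divides x, y, z by w.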
3. Primitive Assembly
After vertex processing, the GPU groups vertices into primitives (usually triangles). Three vertices become one triangle, ready for the next stages.
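For a triangle list (the most common topology), the grouping is just consecutive index triples. A Python sketch:

```python
# Triangle-list assembly: every three consecutive indices form one triangle
indices = [0, 1, 2, 2, 1, 3]
triangles = [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]
print(triangles)  # [(0, 1, 2), (2, 1, 3)]
```

Other topologies (strips, fans) reuse vertices differently, but the principle is the same: the GPU walks the index stream and emits primitives.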
4. Clipping & Culling
The GPU automatically:
- Clips triangles against the view frustum (removes parts outside the screen)
- Culls back-facing triangles depending on the pipeline state your engine creates
These are fixed-function stages—you don't write code for them, but they happen automatically.
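Back-face culling is typically decided by the winding order of the projected triangle. A Python sketch of the usual signed-area test (the front-face convention varies by API and pipeline state):

```python
def signed_area(a, b, c):
    # Twice the signed area of screen-space triangle abc;
    # the sign encodes winding order (here: positive = counter-clockwise)
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

ccw = signed_area((0, 0), (1, 0), (0, 1))  # counter-clockwise winding
cw = signed_area((0, 0), (0, 1), (1, 0))   # same triangle, clockwise winding
print(ccw, cw)  # 1 -1

# With "front = counter-clockwise", the clockwise triangle would be culled
culled = cw <= 0
```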
5. Rasterization
The rasterizer converts each triangle into fragments—potential pixels that might be drawn. For each fragment, the GPU:
- Determines which pixel it covers
- Interpolates all output values from the vertex shader across the triangle surface
Rasterizer Interpolation
Automatic interpolation — Values from output in your vertex shader are smoothly blended across the triangle surface.
Each fragment receives values based on its barycentric coordinates — its relative position within the triangle.
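That weighted blend can be checked numerically. A Python sketch (illustrative, not BWSL):

```python
def interpolate(v0, v1, v2, w0, w1, w2):
    # Blend a per-vertex value using barycentric weights
    # (the weights sum to 1 for any point inside the triangle)
    return tuple(a * w0 + b * w1 + c * w2 for a, b, c in zip(v0, v1, v2))

# RGB colors at the three corners
red, green, blue = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

# A fragment sitting closest to the first vertex
color = interpolate(red, green, blue, 0.5, 0.25, 0.25)
print(color)  # (0.5, 0.25, 0.25)
```

This is what happens to every value you write to output in the vertex shader, component by component (perspective correction is applied on real hardware, omitted here).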
input.color = v0.color × w0 + v1.color × w1 + v2.color × w2
6. Early Depth/Stencil Test
Before running your fragment shader, the GPU can reject fragments that would be hidden behind already-drawn geometry. This optimization (called "early-Z") saves processing power.
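A Python sketch of the idea behind the depth test (details such as the comparison function and depth range are set by the host pipeline state; this assumes "less = nearer wins"):

```python
# Depth buffer holds the nearest depth drawn so far at each pixel
# (0.0 = near plane, 1.0 = far plane)
depth_buffer = {(10, 20): 0.4}

def early_z_pass(pixel, fragment_depth):
    # Reject the fragment before shading if something nearer was already drawn
    return fragment_depth < depth_buffer.get(pixel, 1.0)

print(early_z_pass((10, 20), 0.3))  # True  (nearer: gets shaded)
print(early_z_pass((10, 20), 0.9))  # False (hidden: shader never runs)
```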
7. Fragment Shader (Programmable)
The fragment block runs once per fragment (potential pixel). It receives interpolated data via input and computes the final color:
fragment {
    // Access interpolated values from the vertex shader
    float3 normal = normalize(input.worldNormal);
    float2 uv = input.uv;

    // Sample textures
    float4 albedo = sample(resources.albedoMap, resources.albedoSampler, uv);

    // Compute lighting
    float light = max(dot(normal, resources.lightDir), 0.0);

    // Write final color (required)
    output.color = float4(albedo.rgb * light, 1.0);
}
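The lighting line above is standard Lambert diffuse; the same math in a CPU-side Python sketch:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lambert(normal, light_dir):
    # Diffuse term from the fragment shader: max(dot(N, L), 0.0)
    # Both vectors are assumed normalized
    return max(dot(normal, light_dir), 0.0)

up = (0.0, 1.0, 0.0)
print(lambert(up, (0.0, 1.0, 0.0)))   # 1.0  (light directly overhead)
print(lambert(up, (0.0, -1.0, 0.0)))  # 0.0  (light from below: clamped)
```

The max(…, 0.0) clamp is what keeps surfaces facing away from the light black instead of negative.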
8. Blending
The GPU combines your fragment's color with whatever is already in the render target. Common blend modes include:
- Opaque: Replace existing color entirely
- Alpha blend: Mix based on alpha for transparency
- Additive: Add colors together for glow effects
Blend mode is configured by the host pipeline state, not in BWSL code.
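The first two modes reduce to simple per-channel arithmetic. A Python sketch (the clamp in the additive case is an assumption here; the actual blend math is fixed by the host pipeline state):

```python
def alpha_blend(src, dst, alpha):
    # Classic "over" blending: src*alpha + dst*(1 - alpha)
    return tuple(s * alpha + d * (1 - alpha) for s, d in zip(src, dst))

def additive(src, dst):
    # Additive blending for glow: colors accumulate (clamped to 1.0 here)
    return tuple(min(s + d, 1.0) for s, d in zip(src, dst))

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(alpha_blend(red, blue, 0.5))  # (0.5, 0.0, 0.5)
print(additive(red, blue))          # (1.0, 0.0, 1.0)
```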
9. Render Target
Final colors are written to render targets—textures or the screen. These can be used as inputs to subsequent passes.
How BWSL Maps to the Pipeline
BWSL provides a clean abstraction over the GPU pipeline:
| BWSL Construct | Pipeline Stage | Purpose |
|---|---|---|
| attributes { } | Input Assembly | Define vertex data layout |
| vertex { } | Vertex Shader | Transform vertices, prepare interpolants |
| output.position | Vertex Output | Required clip-space position |
| output.* | Rasterizer Input | Values to interpolate per fragment |
| input.* | Fragment Input | Interpolated values from the vertex stage |
| fragment { } | Fragment Shader | Compute final pixel color |
| output.color | Fragment Output | Color written to the render target |
| pass "Name" { } | Full Pipeline | One complete render pipeline execution |
Multi-Pass Rendering
Real-world rendering often requires multiple passes through the pipeline. BWSL's pass construct represents one complete trip through the GPU pipeline.
Here's a complete deferred rendering example with three passes:
pipeline DeferredRenderer {
    attributes {
        position: float3
        normal: float3
        texcoord: float2
    }

    resources {
        mvp: mat4
        model: mat4
        normalMatrix: mat3
        albedoMap: texture2D
        gBufferAlbedo: texture2D
        gBufferNormal: texture2D
        gBufferPosition: texture2D
        litScene: texture2D
        linearSampler: sampler
    }

    pass "GBuffer" {
        use attributes { position, normal, texcoord }
        use resources { mvp, model, normalMatrix, albedoMap, linearSampler }

        // First pass: render geometry data to multiple targets
        vertex {
            output.position = resources.mvp * float4(attributes.position, 1.0);
            output.worldPos = (resources.model * float4(attributes.position, 1.0)).xyz;
            output.normal = resources.normalMatrix * attributes.normal;
            output.uv = attributes.texcoord;
        }

        fragment {
            // Write to G-buffer textures
            output.color = sample(resources.albedoMap, resources.linearSampler, input.uv);
            // Additional render targets are host pipeline outputs.
        }
    }

    pass "Lighting" {
        use attributes { position, texcoord }
        use resources { gBufferAlbedo, gBufferNormal, gBufferPosition, linearSampler }

        // Second pass: fullscreen lighting calculation
        vertex {
            output.position = float4(attributes.position, 1.0);
            output.uv = attributes.texcoord;
        }

        fragment {
            // Read from the G-buffer, compute lighting
            float4 albedo = sample(resources.gBufferAlbedo, resources.linearSampler, input.uv);
            float3 normal = sample(resources.gBufferNormal, resources.linearSampler, input.uv).xyz;
            float3 worldPos = sample(resources.gBufferPosition, resources.linearSampler, input.uv).xyz;
            // Lighting calculations...
            output.color = float4(finalColor, 1.0);
        }
    }

    pass "PostFX" {
        use attributes { position, texcoord }
        use resources { litScene, linearSampler }

        // Third pass: post-processing effects
        vertex {
            output.position = float4(attributes.position, 1.0);
            output.uv = attributes.texcoord;
        }

        fragment {
            float4 color = sample(resources.litScene, resources.linearSampler, input.uv);

            // Vignette
            float2 uv = input.uv * 2.0 - 1.0;
            float vignette = 1.0 - dot(uv, uv) * 0.3;
            output.color = float4(color.rgb * vignette, 1.0);
        }
    }
}
Each pass is a complete traversal of the pipeline:
- GBuffer: Renders 3D geometry, outputs multiple textures
- Lighting: Reads G-buffer, computes lighting as a fullscreen quad
- PostFX: Final fullscreen effects before display
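The vignette math in the PostFX pass can be verified on the CPU. A Python sketch mirroring those two shader lines:

```python
def vignette(u, v):
    # Mirrors the PostFX pass: remap UV from [0,1] to [-1,1],
    # then darken proportionally to squared distance from the center
    x, y = u * 2.0 - 1.0, v * 2.0 - 1.0
    return 1.0 - (x * x + y * y) * 0.3

print(vignette(0.5, 0.5))  # 1.0  (screen center: full brightness)
print(vignette(0.0, 0.0))  # 0.4  (corner: x = y = -1, so 1 - 2 * 0.3)
```

Because the falloff is quadratic (dot(uv, uv) is the squared distance), the darkening is gentle near the center and accelerates toward the corners.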
Performance Considerations
Understanding the pipeline helps you write efficient shaders:
Vertex vs Fragment
Code in vertex { } runs once per vertex (typically thousands). Code in fragment { } runs once per pixel (potentially millions). Move calculations to the vertex shader when possible—interpolation is free.
// Efficient: calculate in the vertex shader, interpolate for free
vertex {
    output.position = resources.mvp * float4(attributes.position, 1.0);
    output.worldPos = (resources.model * float4(attributes.position, 1.0)).xyz;
    output.viewDir = normalize(resources.cameraPos - output.worldPos); // Per-vertex
}

fragment {
    float3 viewDir = normalize(input.viewDir); // Just renormalize the interpolated value
    // ...
}
Pipeline Stalls to Avoid
- Excessive texture samples: each sample() call can stall waiting for memory
- Dependent texture reads: sampling at coordinates computed from another sample
- Complex branching: GPUs prefer uniform execution across fragments
- Discard overuse: breaks early-Z optimization
Summary
The GPU pipeline is a production line for rendering:
- Input Assembly reads your vertex attributes
- Vertex Shader (vertex { }) transforms geometry and prepares data
- Rasterization converts triangles to fragments and interpolates data
- Fragment Shader (fragment { }) computes final pixel colors
- Blending combines with existing framebuffer content
- Output writes to render targets or the screen
BWSL gives you direct control over the programmable stages (vertex and fragment) while the GPU handles the fixed-function stages automatically.
See Also
- Shader I/O — How data flows between vertex and fragment shaders
- Shader Variants — Specialize one pipeline into multiple compiled shader combinations
- The Pass — Configuring individual render passes
- Vertex Attributes — Defining vertex input data
- Resources — Uniforms, textures, and buffers