The Pipeline
Understanding the GPU rendering pipeline and how BWSL shader blocks map to each stage.
Pressure-test the syntax
Take the concept from this page into the playground and deliberately break a pass, binding, or type signature to see how the compiler responds.
BWSL is designed around the modern GPU rasterization pipeline. Understanding this pipeline helps you write efficient shaders and know exactly where your code executes on the GPU.
The GPU Rasterization Pipeline
The GPU processes geometry through a series of fixed stages. BWSL gives you control over the programmable stages (highlighted below) while the GPU handles the fixed-function stages automatically.
GPU Rasterization Pipeline (data flows from top to bottom):
- Input Assembly: reads vertex data from buffers
- Vertex Shader (programmable): transforms vertices to clip space
- Primitive Assembly: groups vertices into triangles
- Clipping & Culling: removes invisible geometry
- Rasterization: converts triangles to fragments
- Fragment Shader (programmable): computes final pixel colors
- Blending: combines with the framebuffer
- Render Target: output texture or screen
Pipeline Stages Explained
1. Input Assembly
The GPU reads vertex data from memory based on your attributes declaration:
attributes {
    position: float3  // Vertex positions
    normal: float3    // Surface normals
    texcoord: float2  // Texture coordinates
}
Each vertex is packaged with these attributes and sent to the vertex shader. The order and layout match what your engine provides via vertex buffers.
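For intuition, here is a CPU-side Python sketch (illustrative, not BWSL) of how an engine might pack one interleaved vertex matching this layout: three floats for position, three for normal, two for texcoord.

```python
import struct

# One interleaved vertex matching the attributes block:
# position (float3) + normal (float3) + texcoord (float2) = 8 floats
vertex = (0.0, 1.0, 0.0,   # position
          0.0, 0.0, 1.0,   # normal
          0.5, 0.5)        # texcoord

packed = struct.pack("8f", *vertex)
stride = len(packed)       # bytes between consecutive vertices in the buffer
print(stride)              # 32
```

The 32-byte stride is what the input assembler steps by to fetch each vertex; the per-attribute byte offsets (0, 12, 24) come from the same layout.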
2. Vertex Shader (Programmable)
The vertex block in BWSL runs once per vertex. Its primary job is to transform vertices from model space to clip space:
vertex {
    // Transform to clip space (required)
    output.position = resources.mvp * float4(attributes.position, 1.0);

    // Pass data to the fragment shader
    output.worldNormal = resources.normalMatrix * attributes.normal;
    output.uv = attributes.texcoord;
}
The vertex shader must write to output.position. This clip-space position tells the GPU where to draw the vertex on screen. See Shader I/O for details on passing data between stages.
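As a rough illustration of what that transform computes, here is a Python sketch of a 4x4 matrix times `float4(position, 1.0)`. The matrix here is a hypothetical translation-only MVP, chosen so the result is easy to verify by hand.

```python
def mat4_mul_vec4(m, v):
    # Row-major 4x4 matrix times a 4-component column vector
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Hypothetical MVP: pure translation by (2, 0, 0)
mvp = [
    [1, 0, 0, 2],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

position = [1.0, 5.0, -3.0]                   # attributes.position
clip = mat4_mul_vec4(mvp, position + [1.0])   # output.position
print(clip)  # [3.0, 5.0, -3.0, 1.0]
```

The appended 1.0 (the w component) is what lets a 4x4 matrix encode translation; a real MVP also folds in view and projection, after which the GPU divides x, y, z by w.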
3. Primitive Assembly
After vertex processing, the GPU groups vertices into primitives (usually triangles). Three vertices become one triangle, ready for the next stages.
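For a triangle list (the most common topology), the grouping is just consecutive index triples. A Python sketch:

```python
# Triangle-list assembly: every three consecutive indices form one triangle
indices = [0, 1, 2, 2, 1, 3]
triangles = [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]
print(triangles)  # [(0, 1, 2), (2, 1, 3)]
```

Other topologies (strips, fans) reuse vertices differently, but the principle is the same: the GPU walks the index stream and emits primitives.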
4. Clipping & Culling
The GPU automatically:
- Clips triangles against the view frustum (removes parts outside the screen)
- Culls back-facing triangles depending on the pipeline state your engine creates
These are fixed-function stages—you don't write code for them, but they happen automatically.
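Back-face culling is typically decided by the winding order of the projected triangle. A Python sketch of the usual signed-area test (the front-face convention varies by API and pipeline state):

```python
def signed_area(a, b, c):
    # Twice the signed area of screen-space triangle abc;
    # the sign encodes winding order (here: positive = counter-clockwise)
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

ccw = signed_area((0, 0), (1, 0), (0, 1))  # counter-clockwise winding
cw = signed_area((0, 0), (0, 1), (1, 0))   # same triangle, clockwise winding
print(ccw, cw)  # 1 -1

# With "front = counter-clockwise", the clockwise triangle would be culled
culled = cw <= 0
```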
5. Rasterization
The rasterizer converts each triangle into fragments—potential pixels that might be drawn. For each fragment, the GPU:
- Determines which pixel it covers
- Interpolates all output values from the vertex shader across the triangle surface
Rasterizer Interpolation
Automatic interpolation — Values from output in your vertex shader are smoothly blended across the triangle surface.
Each fragment receives values based on its barycentric coordinates — its relative position within the triangle.
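That weighted blend can be checked numerically. A Python sketch (illustrative, not BWSL):

```python
def interpolate(v0, v1, v2, w0, w1, w2):
    # Blend a per-vertex value using barycentric weights
    # (the weights sum to 1 for any point inside the triangle)
    return tuple(a * w0 + b * w1 + c * w2 for a, b, c in zip(v0, v1, v2))

# RGB colors at the three corners
red, green, blue = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

# A fragment sitting closest to the first vertex
color = interpolate(red, green, blue, 0.5, 0.25, 0.25)
print(color)  # (0.5, 0.25, 0.25)
```

This is what happens to every value you write to output in the vertex shader, component by component (perspective correction is applied on real hardware, omitted here).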
input.color = v0.color × w0 + v1.color × w1 + v2.color × w2
6. Early Depth/Stencil Test
Before running your fragment shader, the GPU can reject fragments that would be hidden behind already-drawn geometry. This optimization (called "early-Z") saves processing power.
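A Python sketch of the idea behind the depth test (details such as the comparison function and depth range are set by the host pipeline state; this assumes "less = nearer wins"):

```python
# Depth buffer holds the nearest depth drawn so far at each pixel
# (0.0 = near plane, 1.0 = far plane)
depth_buffer = {(10, 20): 0.4}

def early_z_pass(pixel, fragment_depth):
    # Reject the fragment before shading if something nearer was already drawn
    return fragment_depth < depth_buffer.get(pixel, 1.0)

print(early_z_pass((10, 20), 0.3))  # True  (nearer: gets shaded)
print(early_z_pass((10, 20), 0.9))  # False (hidden: shader never runs)
```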
7. Fragment Shader (Programmable)
The fragment block runs once per fragment (potential pixel). It receives interpolated data via input and computes the final color:
fragment {
    // Access interpolated values from the vertex shader
    float3 normal = normalize(input.worldNormal);
    float2 uv = input.uv;

    // Sample textures
    float4 albedo = sample(resources.albedoMap, resources.albedoSampler, uv);

    // Compute lighting
    float light = max(dot(normal, resources.lightDir), 0.0);

    // Write final color (required)
    output.color = float4(albedo.rgb * light, 1.0);
}
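The lighting line above is standard Lambert diffuse; the same math in a CPU-side Python sketch:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lambert(normal, light_dir):
    # Diffuse term from the fragment shader: max(dot(N, L), 0.0)
    # Both vectors are assumed normalized
    return max(dot(normal, light_dir), 0.0)

up = (0.0, 1.0, 0.0)
print(lambert(up, (0.0, 1.0, 0.0)))   # 1.0  (light directly overhead)
print(lambert(up, (0.0, -1.0, 0.0)))  # 0.0  (light from below: clamped)
```

The max(…, 0.0) clamp is what keeps surfaces facing away from the light black instead of negative.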
8. Blending
The GPU combines your fragment's color with whatever is already in the render target. Common blend modes include:
- Opaque: Replace existing color entirely
- Alpha blend: Mix based on alpha for transparency
- Additive: Add colors together for glow effects
Blend mode is configured by the host pipeline state, not in BWSL code.
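The first two modes reduce to simple per-channel arithmetic. A Python sketch (the clamp in the additive case is an assumption here; the actual blend math is fixed by the host pipeline state):

```python
def alpha_blend(src, dst, alpha):
    # Classic "over" blending: src*alpha + dst*(1 - alpha)
    return tuple(s * alpha + d * (1 - alpha) for s, d in zip(src, dst))

def additive(src, dst):
    # Additive blending for glow: colors accumulate (clamped to 1.0 here)
    return tuple(min(s + d, 1.0) for s, d in zip(src, dst))

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(alpha_blend(red, blue, 0.5))  # (0.5, 0.0, 0.5)
print(additive(red, blue))          # (1.0, 0.0, 1.0)
```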
9. Render Target
Final colors are written to render targets—textures or the screen. These can be used as inputs to subsequent passes.
How BWSL Maps to the Pipeline
BWSL provides a clean abstraction over the GPU pipeline:
| BWSL Construct | Pipeline Stage | Purpose |
|---|---|---|
| attributes { } | Input Assembly | Define vertex data layout |
| vertex { } | Vertex Shader | Transform vertices, prepare interpolants |
| output.position | Vertex Output | Required clip-space position |
| output.* | Rasterizer Input | Values to interpolate per fragment |
| input.* | Fragment Input | Interpolated values from the vertex stage |
| fragment { } | Fragment Shader | Compute final pixel color |
| output.color | Fragment Output | Color written to the render target |
| pass "Name" { } | Full Pipeline | One complete render pipeline execution |
Multi-Pass Rendering
Real-world rendering often requires multiple passes through the pipeline. BWSL's pass construct represents one complete trip through the GPU pipeline.
Here's a complete deferred rendering example with three passes:
pipeline DeferredRenderer {
    attributes {
        position: float3
        normal: float3
        texcoord: float2
    }

    resources {
        mvp: mat4
        model: mat4
        normalMatrix: mat3
        albedoMap: texture2D
        gBufferAlbedo: texture2D
        gBufferNormal: texture2D
        gBufferPosition: texture2D
        litScene: texture2D
        linearSampler: sampler
    }

    pass "GBuffer" {
        use attributes { position, normal, texcoord }
        use resources { mvp, model, normalMatrix, albedoMap, linearSampler }

        // First pass: render geometry data to multiple targets
        vertex {
            output.position = resources.mvp * float4(attributes.position, 1.0);
            output.worldPos = (resources.model * float4(attributes.position, 1.0)).xyz;
            output.normal = resources.normalMatrix * attributes.normal;
            output.uv = attributes.texcoord;
        }

        fragment {
            // Write to G-buffer textures
            output.color = sample(resources.albedoMap, resources.linearSampler, input.uv);
            // Additional render targets are host pipeline outputs.
        }
    }

    pass "Lighting" {
        use attributes { position, texcoord }
        use resources { gBufferAlbedo, gBufferNormal, gBufferPosition, linearSampler }

        // Second pass: fullscreen lighting calculation
        vertex {
            output.position = float4(attributes.position, 1.0);
            output.uv = attributes.texcoord;
        }

        fragment {
            // Read from the G-buffer, compute lighting
            float4 albedo = sample(resources.gBufferAlbedo, resources.linearSampler, input.uv);
            float3 normal = sample(resources.gBufferNormal, resources.linearSampler, input.uv).xyz;
            float3 worldPos = sample(resources.gBufferPosition, resources.linearSampler, input.uv).xyz;
            // Lighting calculations...
            output.color = float4(finalColor, 1.0);
        }
    }

    pass "PostFX" {
        use attributes { position, texcoord }
        use resources { litScene, linearSampler }

        // Third pass: post-processing effects
        vertex {
            output.position = float4(attributes.position, 1.0);
            output.uv = attributes.texcoord;
        }

        fragment {
            float4 color = sample(resources.litScene, resources.linearSampler, input.uv);

            // Vignette
            float2 uv = input.uv * 2.0 - 1.0;
            float vignette = 1.0 - dot(uv, uv) * 0.3;
            output.color = float4(color.rgb * vignette, 1.0);
        }
    }
}
Each pass is a complete traversal of the pipeline:
- GBuffer: Renders 3D geometry, outputs multiple textures
- Lighting: Reads G-buffer, computes lighting as a fullscreen quad
- PostFX: Final fullscreen effects before display
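The vignette math in the PostFX pass can be verified on the CPU. A Python sketch mirroring those two shader lines:

```python
def vignette(u, v):
    # Mirrors the PostFX pass: remap UV from [0,1] to [-1,1],
    # then darken proportionally to squared distance from the center
    x, y = u * 2.0 - 1.0, v * 2.0 - 1.0
    return 1.0 - (x * x + y * y) * 0.3

print(vignette(0.5, 0.5))  # 1.0  (screen center: full brightness)
print(vignette(0.0, 0.0))  # 0.4  (corner: x = y = -1, so 1 - 2 * 0.3)
```

Because the falloff is quadratic (dot(uv, uv) is the squared distance), the darkening is gentle near the center and accelerates toward the corners.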
Performance Considerations
Understanding the pipeline helps you write efficient shaders:
Vertex vs Fragment
Code in vertex { } runs once per vertex (typically thousands). Code in fragment { } runs once per pixel (potentially millions). Move calculations to the vertex shader when possible—interpolation is free.
// Efficient: calculate in the vertex shader, interpolate for free
vertex {
    output.position = resources.mvp * float4(attributes.position, 1.0);
    output.worldPos = (resources.model * float4(attributes.position, 1.0)).xyz;
    output.viewDir = normalize(resources.cameraPos - output.worldPos); // Per-vertex
}

fragment {
    float3 viewDir = normalize(input.viewDir); // Just renormalize the interpolated value
    // ...
}
Pipeline Stalls to Avoid
- Excessive texture samples: each sample() call can stall waiting for memory
- Dependent texture reads: sampling at coordinates computed from another sample
- Complex branching: GPUs prefer uniform execution across fragments
- Discard overuse: breaks early-Z optimization
Summary
The GPU pipeline is a production line for rendering:
- Input Assembly reads your vertex attributes
- Vertex Shader (vertex { }) transforms geometry and prepares data
- Rasterization converts triangles to fragments and interpolates data
- Fragment Shader (fragment { }) computes final pixel colors
- Blending combines with existing framebuffer content
- Output writes to render targets or the screen
BWSL gives you direct control over the programmable stages (vertex and fragment) while the GPU handles the fixed-function stages automatically.
See Also
- Shader I/O — How data flows between vertex and fragment shaders
- Shader Variants — Specialize one pipeline into multiple compiled shader combinations
- The Pass — Configuring individual render passes
- Vertex Attributes — Defining vertex input data
- Resources — Uniforms, textures, and buffers