
Chunk optimisations #978

Open
Racc-Boi wants to merge 4 commits into smartcmd:main from Racc-Boi:main

Conversation


@Racc-Boi Racc-Boi commented Mar 8, 2026

Description

This PR optimizes the chunk rebuilding process in two ways. It replaces repeated branching logic with a pre-calculated lookup table, and it eliminates a redundant pre-calculation loop together with the 3D boolean array that loop filled. Both changes reduce memory traffic and CPU overhead on the chunk rebuild threads, which should give quicker mesh building for newly generated terrain and smoother rendering during player movement.

Changes

Previous Behavior

During chunk rebuilding, the engine utilized a two-pass system:

  1. Pass 1 (Occlusion): The engine iterated through a 16x16x16 grid, performing manual comparisons against a list of solid block types (e.g., stone, dirt, unbreakable, or ID 255) for all neighbors to determine if a block was fully surrounded. It then stored the result in a bool isOccluded[16][16][16] array.
  2. Pass 2 (Meshing): The engine looped through the chunk again. If a block wasn't occluded, it had to manually check the neighbors a second time to figure out exactly which faces to render.
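
The first pass described above can be sketched roughly as follows. This is an illustration only, not the engine's code: the tile ID values, the type aliases, and the names isSolidByComparison and markOccluded are assumptions made for the example.

```cpp
#include <array>
#include <cstdint>

// Hypothetical tile IDs, chosen only for illustration.
constexpr std::uint8_t kStone = 1;
constexpr std::uint8_t kDirt = 3;
constexpr std::uint8_t kUnbreakable = 7;
constexpr std::uint8_t kSentinel = 255;

using TileGrid = std::array<std::array<std::array<std::uint8_t, 16>, 16>, 16>;
using OcclusionGrid = std::array<std::array<std::array<bool, 16>, 16>, 16>;

// Up to four comparisons per neighbor; || stops at the first match.
inline bool isSolidByComparison(std::uint8_t id) {
    return id == kStone || id == kDirt || id == kUnbreakable || id == kSentinel;
}

// Pass 1: an interior tile is occluded when all six neighbors are solid.
// Border entries of isOccluded are left untouched.
inline void markOccluded(const TileGrid& tiles, OcclusionGrid& isOccluded) {
    for (int x = 1; x < 15; ++x)
        for (int y = 1; y < 15; ++y)
            for (int z = 1; z < 15; ++z)
                isOccluded[x][y][z] =
                    isSolidByComparison(tiles[x - 1][y][z]) &&
                    isSolidByComparison(tiles[x + 1][y][z]) &&
                    isSolidByComparison(tiles[x][y - 1][z]) &&
                    isSolidByComparison(tiles[x][y + 1][z]) &&
                    isSolidByComparison(tiles[x][y][z - 1]) &&
                    isSolidByComparison(tiles[x][y][z + 1]);
}
```

In this sketch, a second pass would have to call isSolidByComparison on the same six neighbors again for every tile that pass 1 did not mark as occluded.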

Root Cause

  • Branching Overhead: Every manual block ID comparison chain (multiple if or || conditions) adds conditional branches that the CPU has to predict, and each misprediction stalls the pipeline.
  • Redundant Iterations & Memory: The engine was unnecessarily initializing and looping through a 4,096-element (16x16x16) array every time a chunk was rebuilt.
  • Double-checking Neighbors: The engine evaluated the exact same block-solidness logic twice per visible block (once to check occlusion, once to cull faces).

New Behavior

The engine now calculates occlusion purely inline using a single-pass system:

  • Lookup Table: A pre-calculated boolean array isOccluder[256] is used to fetch a "True/False" solidness result using the Tile ID as a memory index (an O(1) operation).
  • Inline Face Culling: The isOccluded array and the first iteration pass have been completely removed.
  • Reused Evaluations: The engine calculates face visibility (drawLeft, drawRight, etc.) directly. If any face is visible, it renders the block and instantly reuses those boolean checks to build the specific faces, completely eliminating redundant neighbor checks.
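
A minimal sketch of the single-pass approach, assuming C++17 or later. Only isOccluder, drawLeft, and drawRight are names taken from this description; the tile IDs, the remaining flag names, the treatment of ID 0 as air, and countVisibleFaces are hypothetical and exist only to keep the example self-contained.

```cpp
#include <array>
#include <cstdint>

using TileGrid = std::array<std::array<std::array<std::uint8_t, 16>, 16>, 16>;

// Built once at compile time. The solid IDs (1, 3, 7, 255) are hypothetical.
constexpr std::array<bool, 256> makeOccluderTable() {
    std::array<bool, 256> table{};
    table[1] = true;
    table[3] = true;
    table[7] = true;
    table[255] = true;
    return table;
}
constexpr std::array<bool, 256> isOccluder = makeOccluderTable();

// Single pass over the interior: one table lookup per face, and the same
// flags decide both whether the tile is drawn and which faces are emitted.
inline int countVisibleFaces(const TileGrid& tiles) {
    int faces = 0;
    for (int x = 1; x < 15; ++x)
        for (int y = 1; y < 15; ++y)
            for (int z = 1; z < 15; ++z) {
                if (tiles[x][y][z] == 0) continue;  // assumed: ID 0 is air
                const bool drawLeft  = !isOccluder[tiles[x - 1][y][z]];
                const bool drawRight = !isOccluder[tiles[x + 1][y][z]];
                const bool drawDown  = !isOccluder[tiles[x][y - 1][z]];
                const bool drawUp    = !isOccluder[tiles[x][y + 1][z]];
                const bool drawBack  = !isOccluder[tiles[x][y][z - 1]];
                const bool drawFront = !isOccluder[tiles[x][y][z + 1]];
                if (!(drawLeft || drawRight || drawDown ||
                      drawUp || drawBack || drawFront))
                    continue;  // fully surrounded: nothing to render
                // A real mesher would emit geometry here; the sketch counts.
                faces += drawLeft + drawRight + drawDown +
                         drawUp + drawBack + drawFront;
            }
    return faces;
}
```

For example, a grid filled entirely with a solid tile yields no visible interior faces, and carving one air tile out of the interior exposes exactly one face on each of its six neighbors.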

Fix Implementation

  • Tile Occlusion Lookup Table: Implemented the isOccluder[256] lookup table centrally.
  • Removed Redundant State: Deleted bool isOccluded[16][16][16], saving 4KB of memory initialization per chunk rebuild and removing an entire nested for loop.
  • Optimization Impact (Mathematical Breakdown):
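    • Iteration: ~2,744 interior tiles per chunk (14 x 14 x 14, i.e. the 16x16x16 grid without its outer layer).
    • Old Method (Two Passes), worst case:
      • Pass 1: 6 neighbor checks x up to 4 manual comparisons.
      • Pass 2: 6 neighbor checks x up to 4 manual comparisons, run only for tiles that were not occluded.
      • Total: at most 2,744 x 12 x 4 = 131,712 branch comparisons per chunk.
    • New Method (Single Pass + Lookup Table):
      • 6 face checks x 1 table lookup.
      • Total: 2,744 x 6 x 1 = 16,464 table lookups per chunk.
    • Efficiency: 16,464 / 131,712 = 12.5%, so this is up to an 87.5% reduction in the operations required for neighbor-occlusion and face-culling checks per chunk. The real saving is smaller when few tiles reach Pass 2 or when the || chains short-circuit early.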
  • Performance Gain: With fewer comparisons per tile and one fewer pass over the chunk, mesh generation needs less CPU time on the rebuild threads. This should reduce CPU spikes during rebuilds, so chunks appear on-screen sooner when generating new terrain or loading into the world.
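
The arithmetic in the breakdown above can be checked at compile time. The sketch below only restates the figures already given (interior tile count, checks per tile, comparisons per check); the constant names are made up for the example, and it assumes C++17 for message-less static_assert.

```cpp
// Figures from the breakdown above; the names are illustrative.
constexpr int kInteriorTiles = 14 * 14 * 14;     // 16^3 grid minus outer layer
constexpr int kOldChecksPerTile = 6 + 6;         // pass 1 + pass 2 neighbors
constexpr int kOldComparisonsPerCheck = 4;       // worst case, no short-circuit
constexpr int kNewLookupsPerTile = 6;            // one table lookup per face

constexpr int kOldTotal =
    kInteriorTiles * kOldChecksPerTile * kOldComparisonsPerCheck;
constexpr int kNewTotal = kInteriorTiles * kNewLookupsPerTile;

static_assert(kInteriorTiles == 2744);
static_assert(kOldTotal == 131712);
static_assert(kNewTotal == 16464);
static_assert(kOldTotal == 8 * kNewTotal);       // new cost is 1/8 of old

// Percentage reduction relative to the old worst case.
constexpr double reductionPercent() {
    return 100.0 * (1.0 - static_cast<double>(kNewTotal) / kOldTotal);
}
```

The ratio counts one table lookup as equivalent to one comparison, which is a simplification: the two operations do not necessarily cost the same on real hardware.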

AI Use Disclosure

No AI was used

Related Issues

N/A

Racc-Boi added 2 commits March 8, 2026 21:30
Tesselator.h: Branchless min/max in Bounds via Min/Max; const ref on addBounds
Tesselator.cpp: clamp for color clamping; Min/Max initializer lists in packCompactQuad min/max finding and du/dv clamping
Chunk.cpp: bool[256] lookup table for occluder tile IDs in occlusion culling; [[unlikely]] hints on rebuild inner loop early-outs
LevelRenderer.cpp: __restrict + local caching in frustum clip(); [[likely]]/[[unlikely]] on chunk render loop continues
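
As a rough illustration of the clamp and initializer-list min/max changes named in this commit message, the sketch below uses the standard library's std::clamp, std::min, and std::max. That choice is an assumption: the commit's capitalised Min/Max may be the project's own helpers, and toColorByte, Extent, and quadExtent are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// Clamp an integer channel value into the 0..255 range in a single call
// instead of two separate if-statements.
inline std::uint8_t toColorByte(int value) {
    return static_cast<std::uint8_t>(std::clamp(value, 0, 255));
}

struct Extent {
    float lo;
    float hi;
};

// Smallest and largest of four quad coordinates via the initializer-list
// overloads of std::min and std::max.
inline Extent quadExtent(float a, float b, float c, float d) {
    return {std::min({a, b, c, d}), std::max({a, b, c, d})};
}
```

Whether these calls compile to branch-free instructions depends on the compiler, target, and optimisation level, so the generated code is worth inspecting before relying on that.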
Contributor

eh-K commented Mar 8, 2026

Amazing

Contributor

eh-K commented Mar 8, 2026

Initial spawn does seem a bit quicker upon spawning in. The rendering while moving is about the same which is a shame, considering your optimizations.

Racc-Boi added 2 commits March 9, 2026 14:35
Chunk.cpp/h:
- thread_local occluder table via std::array<bool,256> IIFE; queries
  isSolidRender() for all 255 tile types instead of only stone/dirt/bedrock
- Null-check getChunkAt() early-out with correct COMPILED/NOTSKYLIT/EMPTY flags
- Defer Region + TileRenderer construction past empty-check to skip 9 chunk
  lookups and 128KB memset for empty chunks (common post-teleport)
- Replace Win32 TLS API (TlsAlloc/TlsSetValue/TlsGetValue) + new/delete
  with static thread_local array for per-thread tileIds storage

LevelRenderer.cpp:
- Rewrite updateDirtyChunks nearest-chunk search as linear ClipChunk scan;
  dirty-flag checked before any distance work, batch-clear empty dirty chunks
  upfront via isRenderChunkEmpty to shrink dirty set across frames
- const on all distance/flag locals

Region.h/cpp:
- Stack buffer flatChunks_stack[16] for common render-chunk regions (4x4),
  std::unique_ptr<LevelChunk*[]> heap fallback for large pathfinding regions;
  eliminates per-rebuild heap alloc/free on the hot path
- Deleted copy/move ops; std::fill_n for null-init; static constexpr constant

TileRenderer.h/cpp:
- static thread_local cache array replaces per-instance new/delete
- Remove dead getLightColorCount, cacheOwned, conditional delete[]
- static constexpr cache size; defaulted destructor

Level.cpp: Region constructed directly on stack (no copy)
stdafx.h: added <array>
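
The thread_local occluder table and per-thread tile storage from the Chunk.cpp/h notes above could look roughly like this. It is a sketch under stated assumptions: the isSolidRender here is a hypothetical free-function stand-in for the engine's query, and occluderTable, tileIdScratch, and the buffer size are invented for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the engine's per-tile isSolidRender() query.
inline bool isSolidRender(int tileId) {
    return tileId == 1 || tileId == 3 || tileId == 7;
}

// Built once per thread by an immediately invoked lambda (IIFE) the first
// time the function is called on that thread.
inline const std::array<bool, 256>& occluderTable() {
    static thread_local const std::array<bool, 256> table = [] {
        std::array<bool, 256> t{};
        for (int id = 0; id < 256; ++id)
            t[static_cast<std::size_t>(id)] = isSolidRender(id);
        return t;
    }();
    return table;
}

// Per-thread scratch storage with no TlsAlloc/TlsGetValue and no new/delete.
// The size is an assumption for illustration only.
inline std::uint8_t* tileIdScratch() {
    static thread_local std::array<std::uint8_t, 16 * 16 * 16> buffer{};
    return buffer.data();
}
```

With thread_local, each rebuild thread gets its own table and buffer, which avoids any sharing between threads at the cost of building the table once per thread.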
@aetherwingz

bump, this works.

Author

Racc-Boi commented Mar 9, 2026

> Initial spawn does seem a bit quicker upon spawning in. The rendering while moving is about the same which is a shame, considering your optimizations.

More optimisations done now, with a noticeable difference in my testing.
