Could you explain why the SPD GLSL version only uses one barrier when cutting off all workgroups except the last?

    // Only last active workgroup should proceed
    bool SpdExitWorkgroup(AU1 numWorkGroups, AU1 localInvocationIndex, AU1 slice)
    {
        // global atomic counter
        if (localInvocationIndex == 0)
        {
            SpdIncreaseAtomicCounter(slice);
        }
        SpdWorkgroupShuffleBarrier();
        return (SpdGetAtomicCounter() != (numWorkGroups - 1));
    }

    void SpdWorkgroupShuffleBarrier() {
    #ifdef A_GLSL
        barrier();
    #endif
    #ifdef A_HLSL
        GroupMemoryBarrierWithGroupSync();
    #endif
    }

According to the GLSL specification, there should be a pair of calls: barrier() + memoryBarrierImage().
It can't be that AMD and all users of the algorithm rely on UB.
The direct link to official SPD snippet
Important notes:
- Mip 5 lives in VRAM, and writes to it should be synchronized before the last single workgroup runs. The barrier() call alone is not enough, I think. We need an additional memoryBarrierImage().
- The mip 5 image has the coherent specifier, but the spec's description of it is very hazy to me.
- I can't imagine any example that uses coherent without memoryBarrierImage().

Side note: the HLSL code would need additional sync (DeviceMemoryBarrier?) as well.
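
To make the question concrete, here is a minimal sketch of the variant I would have expected, with memoryBarrierImage() paired with barrier() before the counter is incremented. This is not the SPD code: the bindings, the names (mip5Image, mip6Image, GlobalAtomic, sharedCounter, ExitWorkgroupWithImageBarrier) and the placeholder store/load in main() are all assumptions made up for illustration, and whether the extra barrier is actually required is exactly what I am asking.

```glsl
#version 450
layout(local_size_x = 256) in;

// Hypothetical resources, not the SPD bindings.
layout(binding = 0, r32f) coherent uniform image2D mip5Image;
layout(binding = 1, r32f) writeonly uniform image2D mip6Image;
layout(std430, binding = 2) coherent buffer GlobalAtomic { uint counter; } globalAtomic;

// Counter value fetched by invocation 0, shared with the rest of the workgroup.
shared uint sharedCounter;

// Returns true for every workgroup except the last one to arrive.
bool ExitWorkgroupWithImageBarrier(uint numWorkGroups, uint localInvocationIndex)
{
    // Order this invocation's earlier image stores before what follows.
    memoryBarrierImage();
    // Wait until every invocation of the workgroup has passed its memory barrier.
    barrier();

    if (localInvocationIndex == 0u)
    {
        sharedCounter = atomicAdd(globalAtomic.counter, 1u);
    }
    // Make sharedCounter visible to the whole workgroup.
    barrier();

    return sharedCounter != (numWorkGroups - 1u);
}

void main()
{
    // Placeholder for the real mip 5 reduction: one texel per workgroup.
    if (gl_LocalInvocationIndex == 0u)
    {
        imageStore(mip5Image, ivec2(gl_WorkGroupID.xy), vec4(1.0));
    }

    uint numWorkGroups = gl_NumWorkGroups.x * gl_NumWorkGroups.y;
    if (ExitWorkgroupWithImageBarrier(numWorkGroups, gl_LocalInvocationIndex))
    {
        return;
    }

    // Only the last workgroup gets here and reads texels written by the others.
    if (gl_LocalInvocationIndex == 0u)
    {
        vec4 texel = imageLoad(mip5Image, ivec2(0, 0));
        imageStore(mip6Image, ivec2(0, 0), texel);
    }
}
```

The early return is taken by the whole workgroup at once, because every invocation reads the same sharedCounter, so both barrier() calls stay in uniform control flow.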