I've hit these a few times because of hardware faults.
The first time, the GPU memory controller was overheating because the thermal interface (thermal paste) was damaged: one spot on the GPU had no paste at all, yet the GPU still reported normal temps. You can check for this by lowering the max GPU power limit, the GPU clock, and the memory clock, and seeing whether the errors go away.
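If the card happens to be an NVIDIA one (an assumption, the fault above isn't vendor-specific), the limits can be set from a script via `nvidia-smi`. This is only a rough sketch: the helper names and the example numbers are made up for illustration, `-lmc` needs a fairly recent driver, and using 0 as the lower clock bound assumes the driver clamps it to the lowest supported clock. On other vendors, use whatever tuning tool the driver ships with.

```python
import subprocess


def build_limit_commands(power_watts, max_core_mhz, max_mem_mhz, gpu_index=0):
    """Build nvidia-smi commands that cap power, core clock and memory clock.

    Hypothetical helper: the values passed in are placeholders, pick ones
    that are well below your card's stock limits.
    """
    base = ["nvidia-smi", "-i", str(gpu_index)]
    return [
        base + ["-pl", str(power_watts)],        # power limit in watts
        base + ["-lgc", f"0,{max_core_mhz}"],    # lock core clock range (min,max)
        base + ["-lmc", f"0,{max_mem_mhz}"],     # lock memory clock range (min,max)
    ]


def apply_limits(commands, dry_run=True):
    """Print the commands, and run them only when dry_run is False.

    Running them for real usually needs admin/root rights.
    """
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example values only, not a recommendation for any specific card.
    apply_limits(build_limit_commands(200, 1500, 5000), dry_run=True)
```

If the errors stop with the limits applied and come back at stock settings, that points at cooling or hardware rather than software. `nvidia-smi -rgc` and `nvidia-smi -rmc` reset the clock locks afterwards.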
The second time it was a RAM and CPU misconfiguration in the BIOS. The usual test programs didn't detect it, but single-threaded Prime95 did (run via CoreCycler, for example). I had to set +4 on the fastest core in Curve Optimizer, manually set LLC to High for both SoC and CPU, manually set a few memory timings, and fix the memory voltage.