Contemporary M1 / M2 / M3 / M4 machines from Apple have (at least) four different ways for low-level programmers to perform heavy computations:

1. Standard ARMv8 SIMD/NEON vector instructions on CPU cores (128 bits wide, issue up to four per cycle on Firestorm)
2. Apple's undocumented AMX…