Can this project help for you? https://github.com/philipturner/metal-flash-attention So far, metal-flash-attention can indeed provide the fastest generation speed for stable diffusion on MacOS.