Cheng (@zcbenz)
2025-04-11 | ❤️ 203 | 🔁 18
It is wild that with GPU programming you start measuring performance in units of µs. I wrote some notes on how I tracked down some of my suboptimal code that delayed the kernel execution by 40µs but made the whole program 4x slower. https://github.com/ml-explore/mlx/pull/1983#issuecomment-2781855436