Cheng (@zcbenz)

2025-04-11 | ❤️ 203 | 🔁 18


It is wild that with GPU programming you start measuring performance in units of µs. I wrote some notes on how I tracked down some of my suboptimal code that delayed the kernel execution by 40µs but made the whole program 4x slower. https://github.com/ml-explore/mlx/pull/1983#issuecomment-2781855436

🔗 Original link


Auto-generated - needs manual review

Tags

domain-ai-ml domain-dev-tools domain-visionos