Hey everyone! I've been working on a matrix multiplication kernel and I'd love your help testing it out to gather performance metrics across different devices. Most of my testing has been on my M2, so I'm curious if I might have over-optimized for that architecture.
I believe this is the fastest pure WGSL matrix-multiplication shader I've come across, though I haven't searched extensively, so if you know of a faster implementation, please share! Note that the kernel requires matrices to be 128-bit aligned, so some padding is necessary; in my opinion that's a fair trade-off for speed.
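For anyone unsure what the alignment requirement means in practice: the post doesn't show how the padding is actually done, but here's a minimal sketch of one way to do it, assuming row-major f32 data where each row is padded out to a multiple of four floats (4 × 32 bits = 128 bits). The function name and layout are my own illustration, not the kernel's real API.

```javascript
// Pad a row-major Float32Array matrix so each row occupies a multiple of
// 4 floats (128 bits). Extra slots are zero-filled, which leaves the
// mathematical result of a matmul unchanged.
function padMatrix(data, rows, cols) {
  const paddedCols = Math.ceil(cols / 4) * 4; // round columns up to a multiple of 4
  const out = new Float32Array(rows * paddedCols); // zero-initialized
  for (let r = 0; r < rows; r++) {
    // copy row r into its padded slot
    out.set(data.subarray(r * cols, (r + 1) * cols), r * paddedCols);
  }
  return { data: out, rows, cols: paddedCols };
}
```

For example, a 2×3 matrix becomes 2×4, with a zero after each row.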
If you decide to test it, could you post the fastest multiplication time you see in the console, or the complete output, along with your graphics card details? The site runs the kernel about ten times as a warmup, so keep that in mind. I'm also open to suggestions for improving the implementation!
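The warmup-then-measure pattern above can be sketched as a small harness. This is my own illustrative version, not the site's actual code; `runKernel` is a placeholder for whatever dispatches the real WGSL shader.

```javascript
// Run a kernel several times for warmup (JIT, pipeline compilation, cache
// effects), then report the fastest of the timed runs in milliseconds.
async function benchmark(runKernel, { warmup = 10, runs = 10 } = {}) {
  for (let i = 0; i < warmup; i++) await runKernel(); // discard warmup timings
  let fastest = Infinity;
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    await runKernel();
    const elapsed = performance.now() - t0;
    if (elapsed < fastest) fastest = elapsed;
  }
  return fastest;
}
```

Reporting the fastest run (rather than the mean) is a common choice for kernels, since it best approximates the noise-free execution time.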
This project started because I wanted to build a neural network for a reinforcement learning agent that solves a Rubik's Cube. It's been a fun adventure!
Here's the link to the GitHub pages: https://mukoroor.github.io/Puzzles/
2 Answers
Have you thought about letting the website handle the padding during multiplication? That would simplify things for users. By the way, I opened your linked page, but it appears empty in Firefox on mobile.
You can check out this link for implementation status on WebGPU: https://github.com/gpuweb/gpuweb/wiki/Implementation-Status#implementation-status.
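Since support varies by browser, a quick runtime check can save confusion before dispatching anything. A minimal sketch (the function name is mine; the `navigator.gpu` entry point is the standard WebGPU detection hook):

```javascript
// Feature-detect WebGPU: navigator.gpu is only defined where the API is
// available (e.g. it's absent in browsers without WebGPU enabled).
function hasWebGPU(nav = globalThis.navigator) {
  return nav != null && "gpu" in nav;
}
```

On a page, you'd call `hasWebGPU()` and show a fallback message instead of a blank screen when it returns false.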
Could you clarify what you mean by "linear transformation multiplication kernel"? I'm not sure I follow.
I was just referring to the kind of kernel used for multiplying matrices in linear transformations.
Yeah, I think the output actually shows up in the console. Also, keep in mind that Firefox might not fully support WebGPU yet.