I'm preparing for an in-domain system design interview, and the recruiter told me that part of it will cover how key AI model classes (mostly GenAI, RecSys, and ranking) behave when parallelised over AI infrastructure, including communication primitives, potential bottlenecks, etc.
I'm not very familiar with this side of ML, and I would appreciate any useful resources at my level. I know DL and ML very well, so that's not an issue; it's the systems side I'm concerned about. Example questions would be optimising a cluster of GPUs for training an ML model, or designing and serving an LLM.
submitted by /u/ready_eddi