How to Scale Your Model


What You Learned

  • LLM scaling fundamentals

  • TPU and GPU architecture basics

  • Roofline analysis

  • Compute vs memory vs communication bottlenecks

  • Strong scaling concepts

  • Transformer architecture internals

  • Transformer FLOPs and parameter calculation

  • Multi-GPU/TPU training

  • Model parallelism techniques

  • Data parallelism

  • Tensor parallelism

  • Pipeline parallelism

  • Expert parallelism (MoE)

  • Model sharding and FSDP

  • ZeRO optimization

  • Gradient accumulation

  • Rematerialization (activation checkpointing)

  • LLM training cost estimation

  • LLM inference optimization

  • KV cache management

  • Latency vs throughput trade-offs

  • LLaMA 3 training and serving

  • TPU networking and communication

  • JAX for large-scale AI

  • TPU/GPU profiling and debugging

  • Hardware-aware AI system design

You learned how large language models are trained, scaled, parallelized, optimized, and served efficiently across thousands of GPUs/TPUs.
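
To make the "Transformer FLOPs and parameter calculation" and "LLM training cost estimation" items above concrete, here is a minimal sketch of the common back-of-the-envelope estimate: training a dense Transformer takes roughly 6 FLOPs per parameter per token. The function names and every input value below (parameter count, token count, chip count, peak FLOPs per chip, utilization) are illustrative assumptions, not figures taken from these notes.

```python
SECONDS_PER_DAY = 86_400


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense Transformer.

    Uses the rule of thumb 6 * N * D: about 2 FLOPs per parameter per
    token in the forward pass and about 4 in the backward pass.
    Attention FLOPs are ignored, which is reasonable at short context.
    """
    return 6.0 * n_params * n_tokens


def training_days(
    n_params: float,
    n_tokens: float,
    n_chips: int,
    peak_flops_per_chip: float,
    utilization: float,
) -> float:
    """Estimate wall-clock training time in days.

    `utilization` is the assumed fraction of peak FLOPs/s actually
    sustained (often called model FLOPs utilization).
    """
    sustained_flops_per_s = n_chips * peak_flops_per_chip * utilization
    return training_flops(n_params, n_tokens) / sustained_flops_per_s / SECONDS_PER_DAY


# Hypothetical inputs, chosen only for illustration.
flops = training_flops(n_params=70e9, n_tokens=15e12)
days = training_days(
    n_params=70e9,
    n_tokens=15e12,
    n_chips=16_000,
    peak_flops_per_chip=1e15,  # assumed peak, FLOPs/s per chip
    utilization=0.4,           # assumed sustained fraction of peak
)
print(f"total training FLOPs: {flops:.2e}")
print(f"estimated training time: {days:.1f} days")
```

The estimate is only as good as the utilization assumption: communication and memory bottlenecks show up as a lower sustained fraction of peak, which is why the roofline and parallelism topics in the list feed directly into cost estimation.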