CS231n: Deep Learning for Computer Vision

Stanford - Spring 2026

What You'll Learn

Deep Learning Fundamentals
  • Data-driven machine learning

  • K-Nearest Neighbors (KNN)

  • Linear classifiers

  • Softmax loss

  • Regularization techniques

  • Gradient Descent & Stochastic Gradient Descent (SGD)

  • Optimization algorithms: Momentum, AdaGrad, Adam

  • Learning rate scheduling
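
As a rough illustration of the softmax loss topic listed above, the sketch below computes the cross-entropy loss for a single example. The function name `softmax_loss` and the sample scores are illustrative assumptions, not taken from the course materials.

```python
import math


def softmax_loss(scores, correct_class):
    """Return (loss, probabilities) for one example's class scores."""
    # Subtract the max score for numerical stability; probabilities are unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Cross-entropy loss: negative log-probability of the correct class.
    loss = -math.log(probs[correct_class])
    return loss, probs


# Hypothetical scores for a 3-class problem where class 0 is correct.
loss, probs = softmax_loss([2.0, 1.0, 0.1], correct_class=0)
print(loss, probs)
```

With equal scores for every class, the loss reduces to log(number of classes), which is a common sanity check at the start of training.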

Neural Networks
  • Multi-Layer Perceptrons (MLPs)

  • Backpropagation

  • Computing gradients efficiently

  • Training deep neural networks
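
To make the backpropagation topic concrete, here is a minimal sketch on the small function f(x, y, z) = (x + y) * z. It applies the chain rule by hand and compares the result against a numerical estimate. The function names are illustrative assumptions.

```python
def forward_backward(x, y, z):
    """Forward and backward pass for f = (x + y) * z."""
    # Forward pass, keeping the intermediate value q.
    q = x + y
    f = q * z
    # Backward pass: apply the chain rule from the output back to the inputs.
    df_dq = z          # d(q * z) / dq
    df_dz = q          # d(q * z) / dz
    df_dx = df_dq * 1  # dq/dx = 1
    df_dy = df_dq * 1  # dq/dy = 1
    return f, (df_dx, df_dy, df_dz)


def numerical_gradient(func, args, index, h=1e-5):
    """Centered finite-difference estimate of d func / d args[index]."""
    plus = list(args)
    minus = list(args)
    plus[index] += h
    minus[index] -= h
    return (func(*plus) - func(*minus)) / (2 * h)


inputs = (-2.0, 5.0, -4.0)
f, grads = forward_backward(*inputs)
checks = [numerical_gradient(lambda x, y, z: (x + y) * z, inputs, i) for i in range(3)]
print(f, grads, checks)
```

Comparing analytic gradients with a finite-difference estimate in this way is a standard method for debugging a hand-written backward pass.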

Computer Vision with CNNs
  • Convolutional Neural Networks (CNNs)

  • Convolution and pooling operations

  • Feature extraction from images

  • Transfer learning

  • Batch Normalization

  • Famous architectures:

    • AlexNet

    • VGGNet

    • ResNet

    • GoogLeNet
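
For the convolution and pooling operations listed above, the spatial output size along one dimension follows (input + 2 * padding - kernel) / stride + 1. The helper below is a minimal sketch of that formula. Its name is an assumption, and raising an error on a non-integer result is a design choice made here (some frameworks round down instead).

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution or pooling layer."""
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    if span % stride != 0:
        # Assumption: treat a non-integer output size as a configuration error.
        raise ValueError("stride does not evenly tile the padded input")
    return span // stride + 1


# A 5x5 convolution with padding 2 and stride 1 preserves a 32-pixel side.
print(conv_output_size(32, kernel_size=5, stride=1, padding=2))
# A 2x2 pooling window with stride 2 halves it.
print(conv_output_size(32, kernel_size=2, stride=2))
```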

Sequence Models & NLP
  • Recurrent Neural Networks (RNNs)

  • LSTM and GRU

  • Language Modeling

  • Image Captioning

  • Sequence-to-Sequence Models

Transformers & Attention
  • Self-Attention Mechanism

  • Transformer Architecture

  • Vision Transformers (ViT)

  • Modern foundation models
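
The self-attention mechanism listed above can be sketched as single-head scaled dot-product attention. The version below uses plain Python lists to stay dependency-free. The helper names and the toy input are assumptions for illustration; in practice a tensor library such as NumPy or PyTorch would be used.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy sequence of 3 tokens with 2 features; identity projections for clarity.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens, identity, identity, identity)
print(output)
```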

Advanced Computer Vision
  • Object Detection

  • Image Segmentation

    • Semantic Segmentation

    • Instance Segmentation

    • Panoptic Segmentation

  • YOLO and the R-CNN family

  • DETR (Transformer-based Detection)

Video Understanding
  • Video Classification

  • 3D CNNs

  • Two-Stream Networks

  • Multimodal Video Analysis

Large-Scale AI Training
  • Distributed Training

  • Model Parallelism

  • Data Parallelism

  • Activation Checkpointing

  • GPU Utilization Optimization

Self-Supervised Learning
  • Contrastive Learning

  • Pretext Tasks

  • Learning without labels

  • DINO and other modern SSL methods

Generative AI
  • Variational Autoencoders (VAEs)

  • Generative Adversarial Networks (GANs)

  • Autoregressive Models

  • Diffusion Models (used in modern image generators)

3D Vision
  • 3D Shape Representations

  • Shape Reconstruction

  • Neural Implicit Representations

  • Scene Understanding

Vision + Language
  • Connecting images and text

  • Multimodal AI systems

  • Foundations of systems such as image captioners and visual assistants

World Models & Future AI
  • World Modeling

  • Environment Understanding

  • Agent-based AI concepts

Responsible AI
  • Human-Centered AI

  • AI ethics

  • Building AI systems that work well with humans

Practical Skills
  • Python for Deep Learning

  • NumPy

  • PyTorch

  • Training and debugging neural networks

  • Building an end-to-end computer vision project

  • Research paper reading and implementation