CS109: Introduction to Probability for Computer Scientists

What You'll Learn

CS109 is one of Stanford University's most respected courses in probability, statistics, and machine learning. Created by Chris Piech with contributions from leading Stanford faculty including Mehran Sahami, the course is designed to help students understand how uncertainty, data, and intelligent decision-making systems work. It begins with mathematical foundations and gradually builds toward modern machine learning applications, allowing students to see how probability theory powers many of today's AI technologies.

The course starts with combinatorics, the mathematics of counting. Although counting may seem basic, it is one of the most important foundations of probability. Students learn how to calculate the number of possible outcomes in different situations using techniques such as permutations and combinations. These concepts are useful in computer science, cryptography, search algorithms, optimization problems, and game theory.

Key concepts learned in counting:

  • Fundamental counting principles

  • Permutations

  • Combinations

  • Arrangements and selections

  • Probability through counting

  • Real-world counting applications
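
The counting ideas above can be sketched in a few lines of Python. This is an illustrative example rather than course material: the function names are hypothetical, and the card-hand question is just one assumed instance of "probability through counting".

```python
import math

def count_permutations(n: int, k: int) -> int:
    """Ordered arrangements of k items chosen from n distinct items."""
    return math.perm(n, k)

def count_combinations(n: int, k: int) -> int:
    """Unordered selections of k items chosen from n distinct items."""
    return math.comb(n, k)

def prob_four_aces() -> float:
    """Probability that a 5-card hand from a 52-card deck holds all four aces.

    Every hand is equally likely, so the probability is
    (number of favorable hands) / (number of possible hands).
    """
    favorable = math.comb(4, 4) * math.comb(48, 1)  # all 4 aces plus 1 other card
    total = math.comb(52, 5)                        # all 5-card hands
    return favorable / total

print(count_permutations(5, 3))  # 60
print(count_combinations(5, 3))  # 10
print(prob_four_aces())
```

Each ordered arrangement of 3 items corresponds to 3! = 6 orderings of the same selection, which is why the permutation count (60) is six times the combination count (10).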

After establishing a foundation in counting, students move into probability theory. Probability provides a mathematical framework for dealing with uncertainty. Since most real-world systems involve incomplete information, understanding probability is essential for data science, machine learning, artificial intelligence, finance, and scientific research.

Students learn the basic axioms of probability and discover how probabilities can be combined, manipulated, and interpreted. They gain an understanding of events, sample spaces, probability laws, and mathematical reasoning under uncertainty.

One of the most important topics in the course is conditional probability. Students learn that probabilities can change when additional information becomes available. This idea is extremely important because intelligent systems constantly update their predictions as new data arrives.

For example, a medical diagnosis system changes its prediction when new test results become available. Recommendation systems update suggestions when users interact with products. Search engines adjust rankings based on user behavior. All of these rely on conditional probability.

Students learn how Bayes' Theorem updates beliefs using new evidence: the probability of a hypothesis H given evidence E is P(H | E) = P(E | H) P(H) / P(E). This rule forms the basis for many AI applications.

Applications of Bayes' Theorem:

  • Spam filtering

  • Medical diagnosis

  • Fraud detection

  • Recommendation systems

  • Machine learning classification

  • Risk prediction
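
To make the medical diagnosis case concrete, here is a minimal sketch of a Bayesian update. The function name and the numbers (1% prevalence, 95% sensitivity, 5% false positive rate) are assumptions chosen for illustration, not figures from the course.

```python
def posterior_given_positive(prior: float,
                             sensitivity: float,
                             false_positive_rate: float) -> float:
    """P(disease | positive test) computed with Bayes' Theorem.

    prior               = P(disease)
    sensitivity         = P(positive | disease)
    false_positive_rate = P(positive | no disease)
    """
    # Law of total probability: overall chance of a positive test.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers: a rare condition and a fairly accurate test.
print(posterior_given_positive(prior=0.01, sensitivity=0.95,
                               false_positive_rate=0.05))
```

With these assumed numbers the posterior is only about 0.16, because false positives from the large healthy population outnumber true positives from the small affected one.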

The course also explores independence. Students learn how to determine whether events influence one another and how independence simplifies probability calculations. Understanding independence is critical because many statistical models assume variables behave independently.
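
One way to check independence is to test the definition directly: events A and B are independent exactly when P(A and B) = P(A) P(B). The sketch below enumerates two fair dice; the event names and helper functions are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first roll, second roll) outcomes.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """Exact probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def independent(a, b) -> bool:
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_is_even(o):
    return o[0] % 2 == 0

def second_is_six(o):
    return o[1] == 6

def sum_is_twelve(o):
    return o[0] + o[1] == 12

print(independent(first_is_even, second_is_six))  # True
print(independent(second_is_six, sum_is_twelve))  # False
```

The two rolls do not influence each other, so the first pair of events is independent. Knowing the second roll is a six changes the chance that the sum is twelve, so the second pair is not.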

The next major topic introduces random variables. Random variables allow uncertain outcomes to be represented numerically. Instead of thinking only about events, students learn to analyze measurable quantities mathematically.

Topics include:

  • Discrete random variables

  • Continuous random variables

  • Probability mass functions

  • Probability density functions

  • Cumulative distribution functions

Students then study expectation, often called the expected value. Expectation represents the long-run average outcome of a random process. This concept appears everywhere in machine learning, economics, finance, and decision-making systems.

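Closely related to expectation is variance, which measures how widely values spread around the average. Formally, it is the expected squared deviation from the mean, Var(X) = E[(X - E[X])^2], and its square root is the standard deviation. Students discover that understanding variability is just as important as understanding averages.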

Important statistical measures learned:

  • Mean

  • Expectation

  • Variance

  • Standard deviation

  • Distribution spread

  • Risk measurement
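
As a small worked example of these measures, the sketch below computes the expectation, variance, and standard deviation of a fair six-sided die from its probability mass function. The helper names are hypothetical, and exact fractions are used so the results are not affected by rounding.

```python
import math
from fractions import Fraction

def expectation(pmf: dict) -> Fraction:
    """E[X] = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf: dict) -> Fraction:
    """Var(X) = E[(X - E[X])^2]."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# PMF of a fair die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(expectation(die_pmf))                  # 7/2
print(variance(die_pmf))                     # 35/12
print(math.sqrt(float(variance(die_pmf))))   # standard deviation
```

The expected value 3.5 is not a face of the die, which illustrates that expectation is a long-run average rather than a likely single outcome.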

The course next introduces probability distributions. These distributions model different types of uncertainty and are fundamental building blocks of machine learning.

Students begin with the Bernoulli distribution, which models simple yes-or-no outcomes. Building on this foundation, they study the Binomial distribution, which models repeated trials.

Examples include:

  • Coin flips

  • Product quality tests

  • Customer conversions

  • Survey responses
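
The link between the two distributions can be sketched in code: a Binomial(n, p) variable counts the successes in n independent Bernoulli(p) trials. The function name below is hypothetical, and the coin-flip numbers are only an illustration.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success prob p)."""
    # comb(n, k) counts which k trials succeed; the rest is the probability
    # of any one such pattern of successes and failures.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 fair coin flips.
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# The PMF sums to 1 over all possible counts.
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))
```

Setting n = 1 recovers the Bernoulli distribution, since `binomial_pmf(1, 1, p)` is just p.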

Students then explore a variety of discrete probability distributions and learn how each can model different types of real-world data.

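The course subsequently transitions into continuous random variables. Unlike discrete variables, which take values from a countable set, continuous variables can take any value within an interval, so probabilities are assigned to ranges of values rather than to single points.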

Examples include:

  • Height

  • Weight

  • Temperature

  • Time

  • Income

  • Distance

One of the most important topics in the course is the Normal Distribution. Students learn why the familiar bell curve appears throughout nature, science, business, and machine learning.

Why the Normal Distribution matters:

  • Models natural variation

  • Supports statistical inference

  • Used in hypothesis testing

  • Appears in machine learning algorithms

  • Forms the basis of many predictive models
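
For a sense of how the Normal Distribution is used in practice, the sketch below relies on `statistics.NormalDist` from the Python standard library to compute the probability that a value falls in a range. The wrapper function and the height figures (mean 170 cm, standard deviation 10 cm) are assumptions for illustration.

```python
from statistics import NormalDist

def prob_between(mu: float, sigma: float, low: float, high: float) -> float:
    """P(low <= X <= high) for X ~ Normal(mu, sigma^2), via the CDF."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(high) - dist.cdf(low)

# Roughly 68% of values lie within one standard deviation of the mean,
# and roughly 95% within two.
print(prob_between(0, 1, -1, 1))
print(prob_between(0, 1, -2, 2))

# Hypothetical heights: mean 170 cm, standard deviation 10 cm.
print(prob_between(170, 10, 160, 180))
```

The last call gives the same answer as the first, because both ask for the probability of landing within one standard deviation of the mean.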

The curriculum then becomes more advanced by introducing joint distributions. Students learn how multiple random variables interact with one another. This allows them to analyze more realistic systems where several factors influence outcomes simultaneously.

The course further explores conditional distributions, teaching students how probability changes when specific information is already known. This concept strengthens their understanding of predictive reasoning and uncertainty management.

Students are also introduced to the Beta Distribution, an important distribution in Bayesian statistics. It provides a flexible way to represent uncertainty about probabilities themselves and plays a major role in modern machine learning methods.
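
A common use of the Beta Distribution, sketched here under stated assumptions, is tracking belief about an unknown success probability. With a Beta(a, b) prior and Bernoulli observations, the posterior is again a Beta distribution with the success and failure counts added to the parameters. The function names and the example counts are hypothetical.

```python
def update_beta(a: float, b: float, successes: int, failures: int):
    """Posterior Beta parameters after observing Bernoulli outcomes.

    Prior Beta(a, b) plus s successes and f failures gives Beta(a + s, b + f).
    """
    return a + successes, b + failures

def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Start from a uniform prior, Beta(1, 1), then observe 7 successes, 3 failures.
a, b = update_beta(1, 1, successes=7, failures=3)
print(a, b)             # 8 4
print(beta_mean(a, b))  # posterior mean estimate of the success probability
```

The posterior mean 8/12 sits between the prior mean (0.5) and the observed frequency (0.7), and it moves closer to the observed frequency as more data arrives.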

Another significant section focuses on covariance and correlation.

Students learn:

  • Measuring relationships between variables

  • Positive correlation

  • Negative correlation

  • No correlation

  • Linear dependence

  • Data pattern discovery

These tools help identify hidden relationships within datasets and form the basis for many predictive analytics techniques.
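
A minimal sketch of both measures, computed from their definitions, is shown below. The helper names and the small datasets are hypothetical, and the population form (dividing by n) is an assumed choice.

```python
import math

def covariance(xs, ys) -> float:
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys) -> float:
    """Pearson correlation: covariance rescaled to lie between -1 and 1."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [1, 2, 3, 4]
print(correlation(xs, [2, 4, 6, 8]))   # perfect positive linear relationship
print(correlation(xs, [8, 6, 4, 2]))   # perfect negative linear relationship

# Correlation only captures linear dependence: here y = x^2 is fully
# determined by x, yet the correlation is zero.
zs = [-2, -1, 0, 1, 2]
print(correlation(zs, [z * z for z in zs]))
```

The last case is a useful caution: zero correlation does not imply independence.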

Conditional expectation is another sophisticated concept covered in the course. Students learn how expected values change when certain information is known beforehand. This topic is important in economics, finance, reinforcement learning, and statistical modeling.
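
As an illustration, the sketch below computes a conditional expectation by restricting the sample space to the outcomes consistent with what is known. The two-dice setup and the function name are assumptions chosen to keep the example small.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def conditional_expectation(value, given) -> Fraction:
    """E[value | given]: average of value over the outcomes where given holds."""
    matching = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(value(o) for o in matching), len(matching))

def total(o):
    return o[0] + o[1]

# Without any information, the expected total is 7.
print(conditional_expectation(total, lambda o: True))       # 7

# Knowing the first die shows 6 raises the expected total to 6 + 3.5.
print(conditional_expectation(total, lambda o: o[0] == 6))  # 19/2
```

Averaging the conditional expectations over all six values of the first die, each weighted by 1/6, recovers the unconditional value of 7, which is the law of total expectation.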

One of the most powerful theoretical ideas presented is the Central Limit Theorem. Students discover that when many independent, identically distributed random variables with finite variance are added together, their sum is approximately normally distributed, regardless of the shape of the original distribution.

Key insights from the Central Limit Theorem:

  • Explains why normal distributions appear frequently

  • Enables statistical inference

  • Supports confidence intervals

  • Forms the foundation of modern statistics

  • Essential for data science
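
The theorem can be checked empirically with a short simulation. In this sketch, each trial sums 30 uniform random numbers, which individually look nothing like a bell curve, and the sums are standardized using the known mean (n/2) and variance (n/12) of such a sum. The function names, the seed, and the sample sizes are arbitrary assumptions.

```python
import math
import random

def standardized_sums(n_terms: int, n_trials: int, seed: int = 0) -> list:
    """Sums of n_terms Uniform(0, 1) draws, shifted and scaled to mean 0, std 1."""
    rng = random.Random(seed)         # fixed seed so the run is repeatable
    mean = n_terms * 0.5              # E[sum] = n * 1/2
    std = math.sqrt(n_terms / 12)     # Var(sum) = n * 1/12
    return [
        (sum(rng.random() for _ in range(n_terms)) - mean) / std
        for _ in range(n_trials)
    ]

def fraction_within(zs: list, k: float) -> float:
    """Share of standardized values within k standard deviations of zero."""
    return sum(1 for z in zs if abs(z) <= k) / len(zs)

zs = standardized_sums(n_terms=30, n_trials=10_000)
print(fraction_within(zs, 1))  # close to 0.68 if the sums are roughly normal
print(fraction_within(zs, 2))  # close to 0.95 if the sums are roughly normal
```

The observed fractions should land near the values a standard normal distribution predicts, even though the underlying draws are uniform.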

After building a strong foundation in probability, the course shifts toward machine learning. This transition helps students understand how mathematical theories directly power intelligent systems.

The machine learning section begins with parameters and learning. Students learn how machine learning models use data to automatically adjust internal parameters and improve predictions.

Topics include:

  • Training models

  • Learning from data

  • Model fitting

  • Prediction

  • Generalization

  • Performance evaluation

Students then study Maximum Likelihood Estimation (MLE), one of the most important techniques in machine learning.

MLE teaches students how to:

  • Estimate model parameters

  • Fit statistical models

  • Optimize predictions

  • Analyze uncertainty
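
A minimal sketch of MLE for the simplest case, estimating the success probability of a Bernoulli variable, is given below. The closed-form estimate is the sample mean; a coarse grid search over the log-likelihood is included only as a sanity check. The data and function names are hypothetical.

```python
import math

def log_likelihood(p: float, data: list) -> float:
    """Log-probability of the observed 0/1 data if each trial succeeds with prob p."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

def mle_bernoulli(data: list) -> float:
    """Closed-form maximum likelihood estimate: the fraction of successes."""
    return sum(data) / len(data)

def grid_search_mle(data: list, steps: int = 100) -> float:
    """Pick the p on a grid strictly inside (0, 1) with the highest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, data))

data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 7 successes in 10 trials
print(mle_bernoulli(data))    # 0.7
print(grid_search_mle(data))  # 0.7
```

Both approaches agree because the Bernoulli log-likelihood has a single peak at the observed success frequency.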

The course also covers Maximum A Posteriori (MAP) estimation, which extends likelihood-based methods by incorporating prior knowledge into learning.
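
To show how a prior changes the estimate, here is a sketch of MAP estimation for a Bernoulli success probability under an assumed Beta(a, b) prior with a, b > 1. In that setting the MAP estimate is the mode of the Beta posterior, (s + a - 1) / (n + a + b - 2), where s is the number of successes in n trials. The function name and the prior parameters are illustrative assumptions.

```python
def map_bernoulli(successes: int, trials: int, a: float, b: float) -> float:
    """MAP estimate of a Bernoulli parameter under a Beta(a, b) prior (a, b > 1).

    The posterior is Beta(a + s, b + n - s); its mode is the MAP estimate.
    """
    return (successes + a - 1) / (trials + a + b - 2)

# 7 successes in 10 trials.
print(map_bernoulli(7, 10, a=1, b=1))  # 0.7, a uniform prior matches plain likelihood
print(map_bernoulli(7, 10, a=2, b=2))  # pulled toward 0.5 by the prior
```

With Beta(2, 2) the estimate is 8/12, which behaves like adding one imaginary success and one imaginary failure to the data.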

This leads naturally into Naive Bayes, one of the most famous probabilistic machine learning algorithms.

Students learn how Naive Bayes can:

  • Classify emails

  • Detect spam

  • Categorize documents

  • Predict outcomes

  • Perform efficient classification

Although simple, Naive Bayes remains widely used because of its speed and effectiveness.
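
The sketch below shows one way a Naive Bayes text classifier can be written from scratch. It is an illustrative toy, not the course's implementation: the training messages, the function names, and the use of add-one (Laplace) smoothing are all assumptions. The "naive" part is treating words as independent given the label, so their log-probabilities simply add.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (label, list of words). Returns the counts the model needs."""
    label_counts = Counter(label for label, _ in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, words in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def predict(model, words):
    """Return the label with the highest posterior log-score for the words."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny made-up training set.
training_docs = [
    ("spam", "win money now".split()),
    ("spam", "free money offer".split()),
    ("ham", "meeting at noon".split()),
    ("ham", "lunch at noon tomorrow".split()),
]
model = train(training_docs)
print(predict(model, "free money".split()))        # spam
print(predict(model, "meeting tomorrow".split()))  # ham
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once messages contain more than a handful of words.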

Logistic Regression is another major topic covered in the machine learning section. Students learn how to predict probabilities and perform classification tasks.

Applications of Logistic Regression:

  • Customer churn prediction

  • Disease diagnosis

  • Credit risk assessment

  • Marketing analytics

  • Fraud detection

  • User behavior prediction
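
As a rough sketch of how Logistic Regression learns, the code below fits a model with batch gradient ascent on the log-likelihood. The dataset, learning rate, step count, and function names are assumptions for illustration; practical work would normally use a tested library.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def predict_proba(weights, bias, x) -> float:
    """P(y = 1 | x) under the model."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(xs, ys, lr: float = 0.05, steps: int = 2000):
    """Fit weights and bias by gradient ascent on the log-likelihood."""
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = y - predict_proba(weights, bias, x)  # gradient of the log-likelihood
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        weights = [w + lr * g for w, g in zip(weights, grad_w)]
        bias += lr * grad_b
    return weights, bias

# Made-up one-feature data: small values are class 0, large values are class 1.
xs = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
ys = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(xs, ys)
print(predict_proba(weights, bias, [1.0]))  # below 0.5
print(predict_proba(weights, bias, [3.5]))  # above 0.5
```

The output is a probability rather than a hard label, so a classification is obtained by comparing it with a threshold such as 0.5.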

The course also provides an introduction to Deep Learning. Students gain exposure to neural networks and learn how modern AI systems recognize patterns in large datasets.

Deep learning topics include:

  • Neural networks

  • Hidden layers

  • Learning representations

  • Pattern recognition

  • Modern AI systems

  • Large-scale prediction
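
To illustrate what a hidden layer adds, the sketch below runs the forward pass of a tiny two-layer network that computes XOR, a function no single linear unit can represent. The weights are picked by hand for illustration rather than learned, and the function names are hypothetical.

```python
def relu(z: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def forward(x1: float, x2: float) -> float:
    """Forward pass through one hidden layer (two ReLU units) and a linear output."""
    # Hidden layer: each unit is a weighted sum of the inputs plus a bias.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: linear combination of the hidden activations.
    return 1.0 * h1 - 2.0 * h2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, forward(a, b))
```

The hidden units re-represent the inputs so that the output becomes a simple linear function of them. In a trained network, weights like these are found from data by gradient-based optimization instead of being set by hand.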

Although the course does not focus exclusively on deep learning, it helps students understand how probability and statistics support today's most advanced AI technologies.

The final machine learning lessons emphasize practical implementation. Students learn how to apply algorithms to real-world problems, evaluate model performance, interpret results, and understand the strengths and limitations of different approaches.

Throughout the course, students develop analytical and computational skills in two broad areas.

Technical skills gained:

  • Statistical reasoning

  • Probability modeling

  • Data analysis

  • Machine learning fundamentals

  • Predictive modeling

  • Mathematical thinking

  • Data interpretation

  • Decision-making under uncertainty

Machine learning skills gained:

  • Naive Bayes

  • Logistic Regression

  • Parameter estimation

  • Model evaluation

  • Classification techniques

  • Predictive analytics

  • Applied machine learning

By the end of CS109, students possess a strong understanding of probability theory, statistical inference, machine learning fundamentals, and data-driven decision-making. They learn how concepts such as counting, Bayes' Theorem, probability distributions, expectation, variance, correlation, and conditional reasoning directly contribute to modern artificial intelligence systems. The course serves as an excellent foundation for advanced studies in machine learning, data science, artificial intelligence, statistics, and computer science while teaching students how to think rigorously about uncertainty, evidence, and prediction in the real world.