CS109: Introduction to Probability for Computer Scientists
What You'll Learn
CS109 is one of Stanford University's most respected courses in probability, statistics, and machine learning. Taught and developed by Chris Piech, building on material from Stanford faculty including Mehran Sahami, the course is designed to help students understand how uncertainty, data, and intelligent decision-making systems work. It begins with mathematical foundations and gradually builds toward modern machine learning applications, allowing students to see how probability theory powers many of today's AI technologies.
The course starts with combinatorics, the mathematics of counting. Although counting may seem basic, it is one of the most important foundations of probability: when all outcomes are equally likely, the probability of an event is the number of favorable outcomes divided by the total number of outcomes. Students learn how to calculate the number of possible outcomes in different situations using techniques such as permutations and combinations. These concepts are useful in computer science, cryptography, search algorithms, optimization problems, and game theory.
Key concepts learned in counting:
Fundamental counting principles
Permutations
Combinations
Arrangements and selections
Probability through counting
Real-world counting applications
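As a small illustration of these counting ideas, the sketch below uses Python's standard math module. The card-hand example is an assumption chosen for illustration and is not taken from the course materials.

```python
import math

# Permutations: ordered arrangements of 3 items chosen from 5.
ordered = math.perm(5, 3)      # 5 * 4 * 3 = 60

# Combinations: unordered selections of 3 items chosen from 5.
unordered = math.comb(5, 3)    # 60 / 3! = 10

# Probability through counting: chance that a 5-card hand drawn from a
# standard 52-card deck contains exactly 2 aces (all hands equally likely).
favorable = math.comb(4, 2) * math.comb(48, 3)
total = math.comb(52, 5)
p_two_aces = favorable / total

print(ordered, unordered, round(p_two_aces, 4))
```

The last calculation counts favorable hands (2 of the 4 aces, 3 of the 48 other cards) and divides by the number of possible hands.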
After establishing a foundation in counting, students move into probability theory. Probability provides a mathematical framework for dealing with uncertainty. Since most real-world systems involve incomplete information, understanding probability is essential for data science, machine learning, artificial intelligence, finance, and scientific research.
Students learn the basic axioms of probability and discover how probabilities can be combined, manipulated, and interpreted. They gain an understanding of events, sample spaces, probability laws, and mathematical reasoning under uncertainty.
One of the most important topics in the course is conditional probability. Students learn that probabilities can change when additional information becomes available. This idea is extremely important because intelligent systems constantly update their predictions as new data arrives.
For example, a medical diagnosis system changes its prediction when new test results become available. Recommendation systems update suggestions when users interact with products. Search engines adjust rankings based on user behavior. All of these rely on conditional probability.
Students learn how Bayes' Theorem updates beliefs using new evidence and forms the basis for many AI applications.
Applications of Bayes' Theorem:
Spam filtering
Medical diagnosis
Fraud detection
Recommendation systems
Machine learning classification
Risk prediction
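The medical diagnosis case can be sketched in a few lines. The function name and the numbers below (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical values chosen for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' Theorem."""
    # Law of total probability: P(positive) over both disease states.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```

Even with an accurate test, the posterior stays modest because the condition is rare, which is the kind of belief update Bayes' Theorem makes explicit.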
The course also explores independence. Students learn how to determine whether events influence one another and how independence simplifies probability calculations. Understanding independence is critical because many statistical models assume variables behave independently.
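One way to check independence directly is to enumerate a small sample space and test whether P(A and B) equals P(A) times P(B). The sketch below assumes two fair dice; the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(a, b):
    """True when P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

first_is_even = lambda o: o[0] % 2 == 0
second_is_six = lambda o: o[1] == 6
sum_is_twelve = lambda o: o[0] + o[1] == 12

print(independent(first_is_even, second_is_six))  # True
print(independent(second_is_six, sum_is_twelve))  # False
```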
The next major topic introduces random variables. Random variables allow uncertain outcomes to be represented numerically. Instead of thinking only about events, students learn to analyze measurable quantities mathematically.
Topics include:
Discrete random variables
Continuous random variables
Probability mass functions
Probability density functions
Cumulative distribution functions
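For the discrete case, a probability mass function and its cumulative distribution function can be built by enumeration. The sketch below assumes the random variable is the sum of two fair dice. Continuous densities are not shown here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = sum of two fair dice, a discrete random variable.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

def cdf(x):
    """P(X <= x), obtained by summing the PMF."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

print(pmf[7])   # 1/6
print(cdf(4))   # 1/6
```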
Students then study expectation, often called the expected value. Expectation represents the long-run average outcome of a random process. This concept appears everywhere in machine learning, economics, finance, and decision-making systems.
Closely related to expectation is variance, which measures how far values spread around the mean and is defined as the expected squared deviation from it. Students discover that understanding variability is just as important as understanding averages.
Important statistical measures learned:
Mean
Expectation
Variance
Standard deviation
Distribution spread
Risk measurement
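These measures follow directly from a probability mass function. A minimal sketch, assuming a single fair six-sided die:

```python
from fractions import Fraction

# PMF of a single fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                    # E[X]
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - E[X])^2]
std_dev = float(variance) ** 0.5

print(mean, variance)  # 7/2 35/12
```

The standard deviation is the square root of the variance, which puts the spread back into the same units as the values themselves.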
The course next introduces probability distributions. These distributions model different types of uncertainty and are fundamental building blocks of machine learning.
Students begin with the Bernoulli distribution, which models simple yes-or-no outcomes. Building on this foundation, they study the Binomial distribution, which models repeated trials.
Examples include:
Coin flips
Product quality tests
Customer conversions
Survey responses
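The Binomial distribution counts successes across repeated, independent Bernoulli trials. The sketch below assumes a hypothetical customer-conversion scenario with 10 visitors and a 0.3 conversion probability.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 visitors, each converting with probability 0.3.
n, p = 10, 0.3
p_exactly_3 = binomial_pmf(3, n, p)
p_at_least_1 = 1 - binomial_pmf(0, n, p)

print(round(p_exactly_3, 4), round(p_at_least_1, 4))  # 0.2668 0.9718
```

Setting n to 1 recovers the Bernoulli distribution as a special case.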
Students then explore a variety of discrete probability distributions and learn how each can model different types of real-world data.
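The course subsequently transitions into continuous random variables. Unlike discrete variables, which take values from a countable set, continuous variables can take any value within an interval, so probabilities are assigned to ranges of values through a density function rather than to individual points.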
Examples include:
Height
Weight
Temperature
Time
Income
Distance
One of the most important topics in the course is the Normal Distribution. Students learn why the familiar bell curve appears throughout nature, science, business, and machine learning.
Why the Normal Distribution matters:
Models natural variation
Supports statistical inference
Used in hypothesis testing
Appears in machine learning algorithms
Forms the basis of many predictive models
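Probabilities for a normally distributed quantity come from its cumulative distribution function. The sketch below uses NormalDist from Python's standard statistics module; the height model (mean 170 cm, standard deviation 10 cm) is a hypothetical assumption.

```python
from statistics import NormalDist

# Hypothetical model: adult heights ~ Normal(mean=170 cm, sd=10 cm).
heights = NormalDist(mu=170, sigma=10)

# A continuous variable has P(X == x) = 0, so we ask about intervals.
p_within_one_sd = heights.cdf(180) - heights.cdf(160)
p_taller_than_190 = 1 - heights.cdf(190)

print(round(p_within_one_sd, 4))    # 0.6827
print(round(p_taller_than_190, 4))  # 0.0228
```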
The curriculum then becomes more advanced by introducing joint distributions. Students learn how multiple random variables interact with one another. This allows them to analyze more realistic systems where several factors influence outcomes simultaneously.
The course further explores conditional distributions, teaching students how probability changes when specific information is already known. This concept strengthens their understanding of predictive reasoning and uncertainty management.
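For discrete variables, a joint distribution can be stored as a table, with marginal and conditional distributions derived from it. The weather and commute variables below, and their probabilities, are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint PMF P(weather, commute) over two discrete variables.
joint = {
    ("rain", "late"): Fraction(3, 20),
    ("rain", "on_time"): Fraction(1, 20),
    ("dry", "late"): Fraction(2, 20),
    ("dry", "on_time"): Fraction(14, 20),
}

def marginal_weather(w):
    """P(weather = w), summing the joint over commute outcomes."""
    return sum(p for (weather, _), p in joint.items() if weather == w)

def conditional_commute(c, w):
    """P(commute = c | weather = w) = joint / marginal."""
    return joint[(w, c)] / marginal_weather(w)

print(marginal_weather("rain"))             # 1/5
print(conditional_commute("late", "rain"))  # 3/4
print(conditional_commute("late", "dry"))   # 1/8
```

Knowing the weather changes the probability of a late commute from 1/4 overall to 3/4 or 1/8, which is what conditioning on known information means here.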
Students are also introduced to the Beta Distribution, an important distribution in Bayesian statistics. It provides a flexible way to represent uncertainty about probabilities themselves and plays a major role in modern machine learning methods.
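A common use of the Beta distribution is as a belief about an unknown success probability, updated by adding observed successes and failures to its two parameters. The helper names below are illustrative, and the uniform Beta(1, 1) starting point is an assumption.

```python
def beta_update(a, b, successes, failures):
    """Posterior Beta parameters after observing Bernoulli outcomes."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Start from a uniform prior, Beta(1, 1), then observe 7 heads and 3 tails.
a, b = beta_update(1, 1, successes=7, failures=3)
print(a, b, round(beta_mean(a, b), 3))  # 8 4 0.667
```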
Another significant section focuses on covariance and correlation.
Students learn:
Measuring relationships between variables
Positive correlation
Negative correlation
No correlation
Linear dependence
Data pattern discovery
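These tools help identify hidden relationships within datasets and form the basis for many predictive analytics techniques. Correlation captures only linear dependence, so a correlation of zero does not by itself imply that two variables are independent.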
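Covariance and correlation can be computed from their definitions. The sketch below uses small hypothetical datasets, including one where the variables are clearly related but not linearly.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled into the range [-1, 1]."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # perfectly linear in hours
falling = [10, 8, 6, 4, 2]   # perfectly linear, opposite direction
u_shape = [4, 1, 0, 1, 4]    # depends on hours, but not linearly

print(correlation(hours, rising))   # 1.0
print(correlation(hours, falling))  # -1.0
print(correlation(hours, u_shape))  # 0.0
```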
Conditional expectation is another sophisticated concept covered in the course. Students learn how expected values change when certain information is known beforehand. This topic is important in economics, finance, reinforcement learning, and statistical modeling.
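A conditional expectation is an average taken only over the outcomes consistent with what is known. The sketch below assumes two fair dice and also checks the law of total expectation, which recovers the overall mean by averaging the conditional means.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two fair dice

def expected_sum(condition=lambda o: True):
    """E[die1 + die2 | condition], averaging over the outcomes that remain."""
    kept = [o for o in outcomes if condition(o)]
    return Fraction(sum(a + b for a, b in kept), len(kept))

print(expected_sum())                     # 7
print(expected_sum(lambda o: o[0] == 6))  # 19/2

# Law of total expectation: average E[sum | die1 = k] over the values of die1.
total = sum(
    Fraction(1, 6) * expected_sum(lambda o, k=k: o[0] == k)
    for k in range(1, 7)
)
print(total)  # 7
```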
One of the most powerful theoretical ideas presented is the Central Limit Theorem. Students discover that when many independent, identically distributed random variables with finite variance are added together, their sum (or average) is approximately normally distributed, regardless of the shape of the original distribution.
Key insights from the Central Limit Theorem:
Explains why normal distributions appear frequently
Enables statistical inference
Supports confidence intervals
Forms the foundation of modern statistics
Essential for data science
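The theorem can be observed in a quick simulation. The sketch below assumes draws from an Exponential(1) distribution, which is strongly skewed, and checks that averages of 50 draws behave roughly like a normal distribution with mean 1 and standard deviation 1/sqrt(50). The seed and sample sizes are arbitrary choices.

```python
import random
import statistics

random.seed(109)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from Exponential(rate=1), a heavily skewed distribution."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n, trials = 50, 10_000
means = [sample_mean(n) for _ in range(trials)]

# CLT prediction: the sample means are roughly Normal(1, 1/sqrt(n)).
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))

# About 68% of a normal distribution lies within one standard deviation.
sd = 1 / n ** 0.5
within = sum(abs(m - 1) <= sd for m in means) / trials
print(round(within, 2))
```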
After building a strong foundation in probability, the course shifts toward machine learning. This transition helps students understand how mathematical theories directly power intelligent systems.
The machine learning section begins with parameters and learning. Students learn how machine learning models use data to automatically adjust internal parameters and improve predictions.
Topics include:
Training models
Learning from data
Model fitting
Prediction
Generalization
Performance evaluation
Students then study Maximum Likelihood Estimation (MLE), one of the most important techniques in machine learning. MLE chooses the parameter values that make the observed data most probable under the model.
MLE teaches students how to:
Estimate model parameters
Fit statistical models
Optimize predictions
Analyze uncertainty
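For a Bernoulli model, the maximum likelihood estimate is simply the observed proportion of successes. The sketch below uses a hypothetical dataset and confirms the closed-form answer with a coarse grid search over the log-likelihood.

```python
import math

def log_likelihood(p, data):
    """Log-likelihood of Bernoulli(p) for a list of 0/1 observations."""
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical: 7 successes in 10 trials

# Closed form: the Bernoulli MLE is the sample proportion of successes.
p_mle = sum(data) / len(data)

# Numerical check: search a grid of candidate values for the maximizer.
grid = [i / 100 for i in range(1, 100)]
p_grid = max(grid, key=lambda p: log_likelihood(p, data))

print(p_mle, p_grid)  # 0.7 0.7
```

Working with the log-likelihood rather than the likelihood turns a product of probabilities into a sum, which is easier to optimize and numerically more stable.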
The course also covers Maximum A Posteriori (MAP) estimation, which extends likelihood-based methods by incorporating prior knowledge into learning.
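The effect of a prior is easiest to see with very little data. The sketch below assumes a Bernoulli model with a Beta(a, b) prior, for which the MAP estimate is the mode of the Beta posterior. The function names and the Beta(2, 2) prior are illustrative assumptions.

```python
def bernoulli_mle(successes, trials):
    """Maximum likelihood estimate: the observed proportion."""
    return successes / trials

def bernoulli_map(successes, trials, a, b):
    """Mode of the Beta(a + successes, b + failures) posterior, for a, b > 1."""
    return (successes + a - 1) / (trials + a + b - 2)

# Hypothetical data: 3 heads in 3 flips, with a Beta(2, 2) prior favoring 0.5.
print(bernoulli_mle(3, 3))            # 1.0
print(bernoulli_map(3, 3, a=2, b=2))  # 0.8
```

With only three flips, MLE concludes the coin always lands heads, while the prior pulls the MAP estimate back toward 0.5.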
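This leads naturally into Naive Bayes, one of the most famous probabilistic machine learning algorithms. It applies Bayes' Theorem under the simplifying ("naive") assumption that features are conditionally independent given the class.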
Students learn how Naive Bayes can:
Classify emails
Detect spam
Categorize documents
Predict outcomes
Perform efficient classification
Although simple, Naive Bayes remains widely used because of its speed and effectiveness.
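A minimal sketch of a Naive Bayes text classifier follows, assuming a bag-of-words representation with add-one (Laplace) smoothing. The function names and the four-document training set are hypothetical, and this is not presented as the course's own implementation.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, list_of_words). Returns counts needed to classify."""
    label_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in label_counts}
    for label, words in docs:
        word_counts[label].update(words)
    vocab = {w for _, words in docs for w in words}
    return label_counts, word_counts, vocab

def classify(words, model):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior plus log likelihoods; the "naive" step treats words as
        # conditionally independent given the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            if w in vocab:
                # Laplace (add-one) smoothing avoids zero probabilities.
                smoothed = (word_counts[label][w] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set.
docs = [
    ("spam", ["win", "money", "now"]),
    ("spam", ["free", "money"]),
    ("ham", ["meeting", "at", "noon"]),
    ("ham", ["lunch", "money", "later"]),
]
model = train(docs)
print(classify(["free", "money", "now"], model))  # spam
print(classify(["lunch", "meeting"], model))      # ham
```

Training is just counting, and classification is a sum of logarithms per label, which is why the method is fast even on large vocabularies.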
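Logistic Regression is another major topic covered in the machine learning section. The model passes a weighted sum of the input features through the sigmoid function to produce a probability, and students learn how to use it to predict probabilities and perform classification tasks.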
Applications of Logistic Regression:
Customer churn prediction
Disease diagnosis
Credit risk assessment
Marketing analytics
Fraud detection
User behavior prediction
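A minimal sketch of logistic regression with a single feature, trained by gradient ascent on the log-likelihood, is shown below. The dataset, learning rate, and step count are hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient ascent on log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(w * x + b)  # per-example log-likelihood gradient
            grad_w += error * x
            grad_b += error
        w += lr * grad_w / len(xs)
        b += lr * grad_b / len(xs)
    return w, b

# Hypothetical data: hours studied vs. pass (1) / fail (0), with some overlap.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(hours, passed)
p_low = sigmoid(w * 2 + b)    # predicted pass probability after 2 hours
p_high = sigmoid(w * 7 + b)   # predicted pass probability after 7 hours
print(round(p_low, 2), round(p_high, 2))
```

The output of the model is a probability, so turning it into a classification only requires choosing a threshold such as 0.5.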
The course also provides an introduction to Deep Learning. Students gain exposure to neural networks and learn how modern AI systems recognize patterns in large datasets.
Deep learning topics include:
Neural networks
Hidden layers
Learning representations
Pattern recognition
Modern AI systems
Large-scale prediction
Although the course does not focus exclusively on deep learning, it helps students understand how probability and statistics support today's most advanced AI technologies.
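To make the idea of a hidden layer concrete, the sketch below runs the forward pass of a very small network. The weights are hand-picked for illustration rather than learned from data, and the network computes XOR, a function that a single logistic regression unit cannot represent.

```python
def relu(z):
    return max(0.0, z)

def tiny_network(x1, x2):
    """Two inputs, one hidden layer of two ReLU units, one linear output."""
    # Hidden layer: each unit is a weighted sum passed through a nonlinearity.
    h1 = relu(1.0 * x1 + 1.0 * x2)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer combines the hidden representation.
    return h1 - 2.0 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, tiny_network(x1, x2))
```

In practice the weights are found by the same kind of gradient-based parameter learning used for logistic regression, applied across all layers.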
The final machine learning lessons emphasize practical implementation. Students learn how to apply algorithms to real-world problems, evaluate model performance, interpret results, and understand the strengths and limitations of different approaches.
Throughout the course, students develop valuable analytical and computational skills.
Technical skills gained:
Statistical reasoning
Probability modeling
Data analysis
Machine learning fundamentals
Predictive modeling
Mathematical thinking
Data interpretation
Decision-making under uncertainty
Machine learning skills gained:
Naive Bayes
Logistic Regression
Parameter estimation
Model evaluation
Classification techniques
Predictive analytics
Applied machine learning
By the end of CS109, students possess a strong understanding of probability theory, statistical inference, machine learning fundamentals, and data-driven decision-making. They learn how concepts such as counting, Bayes' Theorem, probability distributions, expectation, variance, correlation, and conditional reasoning directly contribute to modern artificial intelligence systems. The course serves as an excellent foundation for advanced studies in machine learning, data science, artificial intelligence, statistics, and computer science while teaching students how to think rigorously about uncertainty, evidence, and prediction in the real world.
