Pattern Recognition (PR) and Machine Learning (ML) are two closely related subfields of Artificial Intelligence that aim to find regularities in data. This module gives a general introduction to PR & ML, from applications to the theoretical foundations of Statistical Learning Theory. After presenting the different learning settings, we will address some of the main issues a learner faces during training: the curse of dimensionality and the overfitting/underfitting phenomena. We will then give an overview of supervised learning, showing that learning typically boils down to minimizing a convex loss function under regularization constraints. We will study the bias/variance trade-off and the main theoretical conditions under which an algorithm learns well. Lastly, we will focus on Occam's Razor, the principle that a good model is the simplest one consistent with the training data. In this context, we will see how to induce sparse models by making use of mathematical norms.
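As a concrete illustration of the last point, the sketch below (not part of the module material; all names and parameter values are illustrative) minimizes a convex squared loss under an L1-norm penalty, the lasso, using proximal gradient descent (ISTA). The soft-thresholding step is what drives most coefficients exactly to zero, yielding a sparse model.

```python
import numpy as np

# Synthetic regression problem: only the first 3 of 20 features are informative.
rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(n)

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrinks entries toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, iters=500):
    """Minimize (1/2n)||Xw - y||^2 + lam * ||w||_1 by proximal gradient descent."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n       # gradient of the smooth squared loss
        w = soft_threshold(w - step * grad, step * lam)
    return w

w_hat = lasso_ista(X, y, lam=0.1)
print("nonzero coefficients:", int(np.sum(np.abs(w_hat) > 1e-8)))
```

With the L2 norm in place of the L1 norm (ridge regression), the same procedure would merely shrink coefficients without zeroing them out, which is why the choice of norm governs sparsity.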