AI, ML, DL 🤖
You must have seen these terms popping up almost everywhere. Ever wondered what the connections and differences are among them? Here is a quick summary.
🤖 Artificial Intelligence (AI)
AI is the broadest term of the three. TL;DR: it’s about making machines capable of doing things that would normally require human intelligence.
As computers became more powerful and widely adopted, the ambition grew too: humans wanted computers to handle not just calculations, but tasks requiring human judgement, in order to automate processes and improve productivity.
AI covers the fields of Machine Learning and Deep Learning, and much more besides: computer vision, game playing, robotics, and so on.
AI isn’t a new idea. As Wikipedia notes:
Artificial intelligence was founded as an academic discipline in 1956
So why is it suddenly everywhere now? A few things came together:
- More data – the internet and smartphones generated an explosion of data to learn from.
- More compute – GPUs and cloud computing made it affordable to train large models.
- Better algorithms – breakthroughs like deep neural networks and the Transformer architecture unlocked capabilities that older approaches couldn’t match.
These three forces compounded over time, and what was once a niche research field crossed a threshold where it became genuinely useful – and then unavoidable.
Before ML and DL, most AI was rule-based – explicit instructions written by humans telling the machine exactly what to do. Here are a couple of examples:
- A cooking robot following a fried rice recipe: set the temperature to X, add this much of each ingredient, then execute the steps in order.
- The COM opponent in old video games: already quite hard to beat, yet likely powered entirely by hand-crafted rules.
📈 Machine Learning (ML)
Machine Learning is a subset of AI.
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions.
There are several approaches:
- Supervised learning – the model is trained on labelled data (e.g. emails marked as spam or not spam), and learns to make predictions on new, unseen data.
- Unsupervised learning – the model is given unlabelled data and finds patterns or groupings on its own (e.g. clustering customers by behaviour).
- Reinforcement learning – the model learns by trial and error, receiving rewards for good actions and penalties for bad ones (e.g. training a game-playing agent).
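To make the supervised case concrete, here is a minimal sketch in pure Python. The data and function names are my own invention for illustration, not any real library – but the shape is the same: train on labelled examples, then predict labels for unseen text.

```python
from collections import Counter

# Toy labelled dataset: (message, label) pairs, all made up for illustration.
training_data = [
    ("win a free prize now", "spam"),
    ("claim your free reward", "spam"),
    ("meeting moved to tuesday", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Label a new message by which class its words appeared in more often."""
    scores = {
        label: sum(counter[word] for word in text.split())
        for label, counter in counts.items()
    }
    return max(scores, key=scores.get)

model = train(training_data)
print(predict(model, "free prize inside"))
```

Nobody wrote a rule saying “the word free means spam” – that association was learned from the labelled data, which is the whole point.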
ML solves problems that can’t practically be preprogrammed, like content personalisation, recommendations, or discovering patterns in data.
The obvious examples are the feeds in your social media and shopping accounts: what you see is chosen by a model trained to predict what you would like to see.
Another example is detecting spam email or flagging bad content – these are classic cases of supervised learning, where the model is trained on labelled examples of spam vs. not spam.
I remember using fastText to train a model for our review spam detection during one of the REA hack days. I had to find open-source SMS data, label it as spam vs. non-spam, then feed it through fastText – the output was a trained model. The biggest challenge wasn’t the training itself; it was producing the input data.
🧠 Deep Learning (DL)
Deep Learning is a subset of ML.
Deep learning (DL) focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and revolves around stacking artificial neurons into layers and “training” them to process data. The adjective “deep” refers to the use of multiple layers (ranging from three to several hundred or thousands) in the network.
Essentially, it is using many layers of neural networks to learn from large amounts of data and perform tasks.
One of the key things DL unlocked is automatic feature learning. In traditional ML, engineers have to manually decide what inputs the model should look at – for example, word count, presence of certain keywords, sender reputation. This is called “feature engineering”, and it doesn’t scale. I don’t have much first-hand experience with it, but my understanding is the pipeline can look something like:
raw data -> human -> feature -> model -> output
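In code, that hand-crafted pipeline looks roughly like this hypothetical sketch – the features, keywords, and thresholds are all invented for illustration, and every one of them is a human decision:

```python
# Hand-picked keywords: a human decision, not something learned from data.
SPAM_KEYWORDS = {"free", "winner", "prize"}

def extract_features(email_text: str) -> dict:
    """The 'human -> feature' step: the engineer decides what the model sees."""
    words = email_text.lower().split()
    return {
        "word_count": len(words),
        "has_spam_keyword": any(w in SPAM_KEYWORDS for w in words),
        "exclamation_marks": email_text.count("!"),
    }

def simple_model(features: dict) -> str:
    """A stand-in model that only ever sees the hand-picked features."""
    score = features["has_spam_keyword"] + (features["exclamation_marks"] > 2)
    return "spam" if score >= 1 else "not spam"

features = extract_features("You are a WINNER! Claim your FREE prize!!!")
print(simple_model(features))
```

The model here never touches the raw email – it only sees whatever the engineer thought to extract, which is exactly the bottleneck DL removes.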
DL removes that burden. As Wikipedia puts it:
In the deep learning approach, features are not hand-crafted and the model discovers useful feature representations from the data automatically.
You just feed it raw data – text, images, audio – and it figures out what to look at on its own.
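To see what “multiple layers” and learned representations mean, here is a toy two-layer network in pure Python – my own sketch, nothing like production DL code. It trains on XOR, a mapping no single linear rule can express, so the hidden layer has to invent its own intermediate features from the raw inputs:

```python
import math
import random

random.seed(42)

# XOR: output is 1 exactly when the two raw inputs differ.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

HIDDEN = 3
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [random.uniform(-1, 1) for _ in range(HIDDEN)]
b2 = 0.0

def forward(x):
    """Raw inputs -> hidden representation -> output."""
    hidden = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
              for j in range(HIDDEN)]
    out = sigmoid(sum(w2[j] * hidden[j] for j in range(HIDDEN)) + b2)
    return hidden, out

def mean_squared_error():
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

loss_before = mean_squared_error()

lr = 0.5
for _ in range(10000):
    for x, y in data:
        hidden, out = forward(x)
        d_out = (out - y) * out * (1 - out)  # gradient at the output unit
        for j in range(HIDDEN):
            d_hid = d_out * w2[j] * hidden[j] * (1 - hidden[j])
            w2[j] -= lr * d_out * hidden[j]
            for i in range(2):
                w1[j][i] -= lr * d_hid * x[i]
            b1[j] -= lr * d_hid
        b2 -= lr * d_out

loss_after = mean_squared_error()
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

No one told the network which combinations of inputs matter; the hidden weights drift towards a useful internal representation purely from the error signal. Real DL is this same loop with far more layers, data, and compute.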
The most popular applications we’re using now are Large Language Models (LLMs) – think ChatGPT, Claude, etc.
DL laid the foundation for these applications, and a key breakthrough that made it possible was the Transformer architecture – the innovation that led to today’s GenAI boom.
Let’s explore it in a future chapter.
So, how do they relate?
Think of them as three nested circles:
- AI is the broadest field – any technique that enables machines to mimic human intelligence, whether rule-based or learned.
- ML sits inside AI – it’s the subset where machines learn from data rather than follow hand-written rules.
- DL sits inside ML – it’s the subset of ML that uses deep neural networks to learn features automatically from raw data, removing the need for manual feature engineering.
The key practical difference between ML and DL comes down to this:
| | ML | DL |
|---|---|---|
| Input | Structured, human-defined features | Raw data (text, images, audio) |
| Feature extraction | Done manually by engineers | Learned automatically by the model |
| Data needed | Works with smaller datasets | Typically needs large amounts of data |
| Example | Spam filter using word counts | ChatGPT understanding your question |
In short: all DL is ML, and all ML is AI – but not the other way around.