AI, ML, DL & Generative AI
An Evidence-First Primer
The Core Definitions
Artificial Intelligence (AI)
A broad field of computer science focused on creating systems that simulate human intelligence and rationality, such as reasoning, learning, and self-improvement.
Source: ANSI / NIST
Machine Learning (ML)
A subset of AI that provides systems the “ability to learn without being explicitly programmed.” It improves at a task (T), as measured by a performance metric (P), with experience (E).
Source: Arthur Samuel (1959), Tom Mitchell (1997)
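Mitchell's T/P/E framing can be sketched in a few lines of Python; the task, data, and estimator below are invented purely for illustration:

```python
# Task T: estimate an unknown quantity from noisy measurements.
# Performance P: absolute error (lower is better).
# Experience E: the number of observations seen so far.

def estimate_mean(observations):
    """Task T: estimate the true value from the data seen so far."""
    return sum(observations) / len(observations)

def performance(estimate, truth):
    """Performance measure P: absolute error (lower is better)."""
    return abs(estimate - truth)

# Experience E: a (made-up) stream of noisy measurements of a true value of 10.0.
true_value = 10.0
stream = [12.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9]

# The system "learns" in Mitchell's sense: P improves as E grows.
error_after_2 = performance(estimate_mean(stream[:2]), true_value)  # 0.5
error_after_8 = performance(estimate_mean(stream), true_value)      # 0.125
```

Even this trivial estimator satisfies the definition: no rule about the answer is hard-coded, yet performance at the task improves with experience.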
Deep Learning (DL)
A subset of ML that uses deep, multi-layered artificial neural networks to automatically learn hierarchical features from raw data (like pixels or text).
Source: Goodfellow et al. (2016)
Generative AI (GenAI)
A category of models (mostly DL) defined by its *purpose*: to *create new, plausible data* (e.g., text, images, code), not just make predictions.
Source: MIT / AWS
How They Relate: A Conceptual Hierarchy
The relationship is a hierarchy of scope, combined with functional paradigms that leverage the underlying methods. Not all AI is ML, but all DL is ML.
- Artificial Intelligence (AI): the entire field, including all methods and paradigms.
  - Symbolic AI (GOFAI): e.g., Expert Systems, Logic.
  - Machine Learning (ML): a subset of AI; systems that learn from data.
    - ‘Classic’ ML: e.g., SVMs, Random Forests.
    - Deep Learning (DL): a subset of ML; multi-layer neural networks (e.g., CNNs, Transformers).
- Generative AI (GenAI): a purpose (to create); powered by Deep Learning.
- Reinforcement Learning (RL): a paradigm (to decide); often uses Deep Learning.
Key Milestones: A Historical Timeline
1955-56: Birth of “Artificial Intelligence”
John McCarthy coins the term in the 1955 workshop proposal; the 1956 Dartmouth workshop establishes the *Symbolic AI* paradigm.
1958: The Perceptron
Frank Rosenblatt introduces a single-layer neural network, a foundational *Connectionist* model.
1986: Backpropagation
Rumelhart, Hinton, & Williams describe an efficient method for training *multi-layer* networks, enabling DL.
1995: Support-Vector Networks (SVM)
Cortes & Vapnik publish the SVM, a pinnacle of “classic” feature-engineered ML.
1998: LeNet-5
Yann LeCun et al. deploy a Convolutional Neural Network (CNN) to read bank checks, an early commercial proof of DL.
2012: AlexNet
Krizhevsky et al. win the ImageNet competition, proving DL’s dominance and starting the modern “Big Bang” of AI.
2014: Generative Adversarial Networks (GANs)
Ian Goodfellow et al. introduce GANs, kicking off the boom in high-fidelity generative image modeling.
2017: The Transformer
Vaswani et al. publish “Attention Is All You Need,” the foundational architecture for all modern LLMs (e.g., GPT).
2020: GPT-3 & Diffusion Models
GPT-3 (175B parameters) proves scale unlocks general-purpose “few-shot” learning. DDPMs become the new SOTA for image generation.
Core Technical Distinctions
Symbolic AI (GOFAI)
Key Idea: Manipulating high-level symbols based on explicit, human-coded rules and logic.
Algorithms: Expert Systems, LISP, Prolog.
Strengths: Explainable, traceable logic.
Limits: Brittle, poor handling of uncertainty, “knowledge acquisition bottleneck.”
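A toy forward-chaining "expert system" makes the GOFAI trade-offs concrete. The rules below are hypothetical; real systems used engines such as CLIPS or Prolog resolution:

```python
# Knowledge encoded as explicit, human-written if-then rules over symbols.
# Each rule: (set of condition symbols, concluded symbol).
RULES = [
    ({"has_fever", "has_cough"}, "suspect_flu"),
    ({"suspect_flu", "short_of_breath"}, "refer_to_doctor"),
]

def forward_chain(facts):
    """Repeatedly fire rules until no new symbol can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"has_fever", "has_cough", "short_of_breath"})
```

Every fired rule is traceable (the cited strength), but present one symptom the rule author never anticipated and nothing fires at all: the brittleness and knowledge-acquisition bottleneck in miniature.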
‘Classic’ Machine Learning
Key Idea: Learning from data to improve at a task, but requires human “feature engineering.”
Algorithms: SVMs, Random Forests, Clustering.
Strengths: Effective on smaller data, efficient.
Limits: Cannot process raw data (e.g., pixels) directly; relies on manual feature creation.
Deep Learning (DL)
Key Idea: Using deep neural networks to *automatically* learn hierarchical features from raw data.
Algorithms: CNNs, RNNs, Transformers.
Strengths: State-of-the-art on complex perception tasks (vision, language).
Limits: Requires massive datasets and compute; often a “black box” (lacks explainability).
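The contrast with classic ML can be shown with a minimal two-layer network trained by backpropagation on a single example. All numbers (weights, input, target, learning rate) are illustrative:

```python
import math

# Raw input, target, and small fixed initial weights.
x, target = [1.0, 2.0], 1.0
W1 = [[0.5, -0.3], [0.8, 0.2]]   # input -> hidden
b1 = [0.1, -0.1]
W2 = [0.7, -0.4]                 # hidden -> output
b2 = 0.0
lr = 0.1                         # learning rate

def forward():
    """Hidden units act as learned features; the output combines them."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(W2, h)) + b2
    return h, y

h, y = forward()
loss_before = (y - target) ** 2

# Backpropagation: apply the chain rule from the loss back to each weight.
dy = 2 * (y - target)
dW2 = [dy * hi for hi in h]
db2 = dy
dh = [dy * w for w in W2]
dz = [dhi * (1 - hi ** 2) for dhi, hi in zip(dh, h)]  # tanh'(z) = 1 - h^2
dW1 = [[dzi * xi for xi in x] for dzi in dz]

# One gradient-descent step on every parameter.
W2 = [w - lr * g for w, g in zip(W2, dW2)]
b2 -= lr * db2
W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, dW1)]
b1 = [b - lr * g for b, g in zip(b1, dz)]

_, y_new = forward()
loss_after = (y_new - target) ** 2   # smaller than loss_before
```

No feature was designed by hand: the hidden-layer weights themselves move toward whatever intermediate representation reduces the loss, which is the mechanism the 1986 backpropagation milestone above made practical.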
The Explosion of Model Scale (Parameters)
The DL era is defined by exponential growth in model size, spanning several orders of magnitude in under a decade. (Parameter counts sourced from the original publication papers, e.g., AlexNet, GPT-3.)
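The growth is easy to quantify from two landmark models: AlexNet's ~60 million parameters (as reported in the 2012 paper) versus GPT-3's 175 billion:

```python
import math

# Parameter counts as reported in the original papers.
alexnet_params = 60e6    # AlexNet (2012): ~60 million
gpt3_params = 175e9      # GPT-3 (2020): 175 billion

growth_factor = gpt3_params / alexnet_params      # ~2,900x in eight years
orders_of_magnitude = math.log10(growth_factor)   # ~3.5
```

A roughly 3,000-fold jump in eight years is why such charts are drawn on a logarithmic scale: on a linear axis, every pre-2018 model would be invisible.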
How We Measure Success (Evaluation Metrics)
Prediction Metrics
Accuracy / F1: Measures correctness for classification tasks (e.g., spam filtering).
RMSE: Measures error magnitude for regression tasks (e.g., forecasts).
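These prediction metrics are simple enough to compute from scratch; the labels and values below are toy data for illustration:

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (binary labels, 1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_pred):
    """Root-mean-squared error for regression."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy spam-filter labels (1 = spam): 4 of 5 predictions correct.
acc = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])  # 0.8
f1 = f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])   # 0.8
err = rmse([3.0, 5.0], [2.0, 6.0])                # 1.0
```

F1 matters when classes are imbalanced (e.g., rare spam): a model that predicts "not spam" for everything can score high accuracy yet zero recall.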
Language Metrics (GenAI)
Perplexity (PPL): Measures how “confused” a model is. Lower is better.
BLEU / ROUGE: Measures n-gram overlap with human references for translation/summarization.
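Perplexity follows directly from the probabilities a model assigns to each true next token; the probabilities below are invented for illustration:

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-probability of the true tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

confident = perplexity([0.9, 0.8, 0.95])  # model is rarely "surprised"
confused  = perplexity([0.1, 0.2, 0.05])  # model is often "surprised"
```

Lower is better: a perplexity of 10 means the model was, on average, as uncertain as if it were choosing uniformly among 10 tokens at each step.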
Image Metrics (GenAI)
Inception Score (IS): Measures image quality (confident, distinct class predictions) and diversity. Higher is better.
Fréchet Inception Distance (FID): Measures distance between real and fake image distributions. Lower is better.
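For reference, FID compares Gaussians fitted to Inception-network features of the real (r) and generated (g) image sets, combining a mean term and a covariance term:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

A perfect generator reproduces both the mean and the covariance of the real feature distribution, driving both terms (and the FID) to zero.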
Use Cases & Key Limitations
Use Cases
- ✅ Spam Filtering: A canonical ‘classic’ ML classification task.
- ✅ Computer Vision: Object detection, medical image analysis (DL / ResNet).
- ✅ Machine Translation: Language translation (DL / Transformer).
- ✅ Content Creation: Generating code, stories, and images (GenAI / LLMs / Diffusion).
- ✅ Robotics & Games: Learning optimal control policies (RL / DQN, AlphaGo).
Limitations & Risks
- ⚠️ Data Bias (COMPAS): Models inherit and amplify human biases from data, leading to unfair outcomes.
- ⚠️ “Black Box” Problem: DL model decisions are often opaque, a major barrier in high-stakes fields like law and medicine.
- ⚠️ AI Hallucinations: GenAI models confidently produce plausible but factually inaccurate information.
- ⚠️ Adversarial Attacks: Evasion attacks (e.g., perturbations that make a vision model misread a stop sign) can cause physical-world failures.
- ⚠️ Model Collapse: A risk where future models degrade by training on AI-generated data.
Governance Frameworks
Because of these risks, frameworks like the NIST AI RMF (Govern, Map, Measure, Manage) and IEEE Ethically Aligned Design are critical for managing AI deployment responsibly.