Tier 01 / Foundations
Before you touch a prompt, you need a working mental model of what's happening on the other side of the screen. This tier covers the eight ideas that make every other tier click into place. No math. No jargon you can't say out loud.
01 · The nesting dolls
AI, machine learning, deep learning, generative AI, LLM: these words get used interchangeably and it's killing your ability to understand anything. They are nested. Each one is a subset of the one before it.
AI (artificial intelligence): The widest umbrella. Any computer system that does something we'd call "intelligent" if a human did it. Includes chess engines, GPS routing, spam filters. Most AI in the wild is not new and not generative.
Machine learning (ML): A subset of AI where the system learns patterns from data instead of being programmed with explicit rules. Show it ten thousand cat photos, it learns what a cat looks like. Recommendation engines, fraud detection, weather models.
Deep learning: A flavor of ML that uses neural networks with many layers. The "deep" just means "many layers." This is what unlocked the modern wave of AI starting around 2012, and almost every AI product you actually use day-to-day is built on it.
Generative AI: Deep-learning systems that produce new content (text, images, audio, video, code) instead of just classifying or predicting. ChatGPT writes, Midjourney paints, ElevenLabs speaks. All generative.
LLM (large language model): A generative AI specialized in language. GPT-4, Claude, Gemini, Llama. It reads text and writes text. It's the thing under the chat box. When people say "AI" in 2026, they almost always mean an LLM.
If you're reading a headline that says "AI did X," ask yourself: was that an LLM, or was it old-school ML? The answer changes the meaning of the headline. A bank's fraud system is AI. ChatGPT is AI. They have almost nothing in common.
02 · How an LLM works
This sentence is the most important thing on this page. Internalize it and most "weird AI behavior" stops being weird.
It starts with the input: your prompt, plus, often, a hidden system prompt and any earlier messages in the conversation.
A token is a chunk of text, usually a word or part of a word. The model assigns a probability to every possible next token and picks one. That's the whole trick.
Token, token, token, until it decides to stop or hits a limit. What looks like a "response" is just this loop running fast.
The model doesn't look anything up. It has patterns from training data baked into its weights. There is no database of facts it queries. When it sounds confident and wrong, that's not a bug, it's the default behavior of the system.
People reach for the word "thinking" and get themselves into trouble. An LLM is not thinking in any sense you'd recognize. It is producing the next likely token given the previous tokens. If you keep that image in your head, you will write better prompts and you will stop being shocked when it makes things up.
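To make the loop concrete, here is a minimal sketch in Python. Everything in it is a toy assumption: `TOY_MODEL` is a hand-written probability table invented for illustration, and it only looks at the single previous token, whereas a real LLM computes its probabilities with a neural network that considers every token so far. The shape of the loop is the point: get probabilities, pick one token, append it, repeat until a stop token or a limit.

```python
import random

# Hypothetical toy "model": for each previous token, a probability
# distribution over possible next tokens. Invented for illustration only.
TOY_MODEL = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.5, "ran": 0.5},
    "sat": {"<stop>": 1.0},
    "ran": {"<stop>": 1.0},
}


def generate(model, max_tokens=10, seed=None):
    """Run the predict-pick-append loop until <stop> or the token limit."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        # Step 1: get a probability for every possible next token.
        distribution = model[tokens[-1]]
        candidates = list(distribution.keys())
        weights = list(distribution.values())
        # Step 2: pick one token, weighted by probability.
        next_token = rng.choices(candidates, weights=weights, k=1)[0]
        # Step 3: stop if the model "decides" to stop, else keep going.
        if next_token == "<stop>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate(TOY_MODEL)))
```

Run it a few times and the sentence changes, because the pick in step 2 is a weighted random choice. Nothing in the loop checks whether the output is true; it only checks what is likely to come next.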
03 · Training vs inference
Training is how the model gets built. Inference is what happens every time you press Enter. Confusing these is the source of half the "but does my data get used?" anxiety.
| | Training | Inference |
|---|---|---|
| When it happens | Once, before the model is released (then occasionally re-done for new versions) | Every time you send a message |
| Who does it | The lab (OpenAI, Anthropic, Google) on huge GPU clusters | You, with one click. The model runs on the lab's servers. |
| Cost & time | Millions of dollars, weeks or months | Fractions of a cent, seconds |
| Data flow | Massive corpus → model weights (the model "absorbs" patterns) | Your prompt → response. Weights don't change. |
| What changes | The model's parameters | Nothing about the model. Just the conversation log. |
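The split in the table can be sketched with a deliberately tiny stand-in. `TinyModel` below is hypothetical and has a single parameter instead of billions, so it is an analogy and not how any lab's system is built. What it shows is the division of labor: `train` writes to the weight, `infer` only reads it.

```python
class TinyModel:
    """A one-parameter toy 'model': prediction = weight * x."""

    def __init__(self):
        self.weight = 0.0

    def train(self, examples, learning_rate=0.01, epochs=200):
        # Training: nudge the weight a little after every example
        # so that predictions move closer to the targets.
        for _ in range(epochs):
            for x, target in examples:
                error = self.weight * x - target
                self.weight -= learning_rate * error * x

    def infer(self, x):
        # Inference: use the weight as-is. Nothing is written back.
        return self.weight * x


model = TinyModel()
model.train([(1, 2), (2, 4), (3, 6)])  # examples of "double the input"

weight_before = model.weight
answer = model.infer(10)               # ask a question
weight_after = model.weight

print(round(answer, 3), weight_before == weight_after)
```

However many times `infer` is called, and whatever is passed to it, the weight stays where training left it.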
"If I tell ChatGPT a secret, it'll learn it and tell other users." Almost certainly not. Your message is used at inference time to generate your reply, and that's it. Whether the provider stores your conversation for future training is a separate question with a per-product answer, usually controllable in settings. Inference doesn't change the model; training does.
04 · Tokens & context
If you only memorize two technical words from this whole guide, make it these. Tokens are the unit. Context window is the budget.
Token: A chunk of text the model sees as a unit. Roughly 4 characters of English, or about 3/4 of a word. Exact counts depend on the model's tokenizer: "Strawberry" is typically about 3 tokens, "Hello" is 1. The model never actually sees letters; it sees tokens.
Context window: The maximum number of tokens the model can consider at once (your prompt + the conversation history + its reply). If you exceed it, the oldest stuff typically falls off the front.
Input tokens: Everything you and the system send into the model.
Output tokens: Everything the model writes back. Output tokens cost more than input tokens, usually 3-5x more, because generating is harder than reading: output is produced one token at a time, while the input can be read in a single pass.
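The rules of thumb above are enough for back-of-the-envelope arithmetic. The sketch below is an estimate only: `estimate_tokens` uses the "about 4 characters per token" heuristic and is not a real tokenizer, and the price and the 4x output multiplier are placeholder assumptions picked from the middle of the 3-5x range, not any provider's actual rates.

```python
import math


def estimate_tokens(text):
    """Rough token estimate from the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / 4)


def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million=1.00, output_multiplier=4.0):
    """Estimated cost in dollars. Both prices are hypothetical placeholders."""
    output_price_per_million = input_price_per_million * output_multiplier
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


prompt = "Summarize the attached meeting notes in five bullet points."
prompt_tokens = estimate_tokens(prompt)
# Assume, for illustration, a reply of roughly 300 tokens.
print(prompt_tokens, estimate_cost(prompt_tokens, 300))
```

With these placeholder numbers a short reply costs more than the short prompt that asked for it, which is the practical meaning of "output tokens cost more."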
| Context window | Roughly equals | What that lets you do |
|---|---|---|
| 8K tokens | ~16 pages of text | A long chat or a short document. Old models. |
| 128K tokens | ~250 pages | A book chapter or a long PDF. Modern frontier baseline. |
| 200K tokens | ~400 pages | A full novel. Claude's standard window. |
| 1M tokens | ~2,000 pages | A whole bookshelf. Gemini's flagship, and Claude's enterprise tier. |
"Why did the AI forget what I said earlier?" Almost always: the conversation outgrew the context window, or the model had a smaller window than you assumed. Long chats degrade. Start fresh with a clean summary when you feel a thread getting incoherent.
05 · Parameters & "size"
A parameter is one of the numbers inside the model that gets adjusted during training. Labs rarely publish exact counts, but modern frontier models are believed to have hundreds of billions to trillions of them. You'll see numbers like 7B, 70B, 405B thrown around; the B stands for billion.
For years, bigger more or less meant better. Today, training technique matters at least as much as raw parameter count. A well-trained smaller model can beat a poorly-trained larger one.
Larger models are slower and cost more per token to run. That's why the frontier labs ship a fleet (e.g. Haiku, Sonnet, Opus): a tier for cheap-and-fast, a tier for smart-and-slow.
In the consumer apps, you pick a product (ChatGPT Plus, Claude Pro, Gemini Advanced) and the app picks the model. You only think about parameters if you're using the API or running an open-source model locally.
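One way to get a feel for what those "B" numbers mean in practice is to work out how much memory the weights alone would occupy. The sketch below assumes 2 bytes per parameter (a common 16-bit storage format); that figure is an assumption for illustration, and real deployments often use other formats and need extra memory beyond the weights.

```python
def weights_size_gb(parameter_count, bytes_per_parameter=2):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9


for label, count in [("7B", 7e9), ("70B", 70e9), ("405B", 405e9)]:
    size = weights_size_gb(count)
    print(f"{label}: about {size:.0f} GB of weights at 2 bytes each")
```

Under that assumption a 7B model is about 14 GB, which can fit on a well-equipped personal machine, while a 405B model is about 810 GB and needs a rack of server hardware. That gap is a large part of why bigger models cost more per token to run.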
06 · Hallucinations
A hallucination is when the model produces something that sounds right but isn't true. Fake citations, fake quotes, fake URLs, fake legal cases, fake people. Built into how the technology works. Mitigated, not eliminated.
The model is rewarded during training for producing plausible-sounding text, not for being right. When it doesn't know, it produces something that sounds like the right kind of answer instead of saying "I don't know."
Be most suspicious of names, dates, numbers, citations, quotes, laws, and medical claims. Treat the LLM's answer as a confident draft, not a fact.
Asking for a source is not enough, because the model may invent the source itself. The check is mandatory, not the ask.
Use grounded modes where you can: "Search the web," "use the files I uploaded," ChatGPT Search, Gemini Grounding, Perplexity. These tie answers to real retrieved documents and cut hallucination rates dramatically. Not to zero.
Most prompts implicitly punish "I don't know" by demanding a confident answer. Explicitly tell it: "If you don't know, say so. Don't guess." That single line helps.
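If you build prompts in code (for example through an API), that line is easy to attach automatically. The helper below is a hypothetical convenience function, not part of any library; it only edits the prompt text and does not call a model.

```python
UNCERTAINTY_INSTRUCTION = "If you don't know, say so. Don't guess."


def with_uncertainty_permission(prompt):
    """Append the instruction unless the prompt already contains it."""
    if UNCERTAINTY_INSTRUCTION in prompt:
        return prompt
    return f"{prompt.rstrip()}\n\n{UNCERTAINTY_INSTRUCTION}"


question = "Who argued the 1987 appeal mentioned in this filing?"
print(with_uncertainty_permission(question))
```

This lowers the pressure to produce a confident answer; it does not replace checking anything load-bearing yourself.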
Hallucination is the cost of the technology, not a sign you're using it wrong. The skill isn't picking a model that doesn't hallucinate. It's building the habit of verifying anything load-bearing, and structuring your workflow so a wrong answer is caught before it does damage. Trust, but read.
07 · The model landscape
You don't need to track every release. You need the rough shape of the map so you don't get lost when a new headline drops.
The frontier labs are the companies racing to build the most capable closed models. You access them through their consumer apps and APIs.
Open-source models are ones you can download and run yourself (or have someone host for you). You will not use these directly as a beginner, but you should know the names because they show up everywhere.
As a non-technical user, your world is three frontier labs and three consumer products: ChatGPT, Claude, Gemini. That's it. Open-source models matter for what they enable other tools to do cheaply, but you won't typically pick one yourself.
08 · Multimodality
"Multimodal" means the model can take more than just text as input: images, audio, sometimes video. Most modern frontier models are multimodal by default.
| Input | What you can do | Example |
|---|---|---|
| Text | Standard chat | Ask a question, draft an email |
| Image | Describe, transcribe, analyze, critique | "Read this screenshot of an error message," "What's wrong with this chart?" |
| PDF / file | Summarize, extract, compare | "Pull every dollar figure out of this contract" |
| Audio (voice in) | Speech-to-text in flight | Voice mode in ChatGPT or Gemini Live |
| Audio (voice out) | Spoken reply | Same voice modes, replying back to you |
| Video | Frame-by-frame understanding | Gemini's video features; growing in others |
The biggest unlock for new users is the screenshot. Once you realize you can paste a screenshot of literally anything (an error, a chart, a contract, a UI you don't understand) and ask the model to read it, your daily workflow changes overnight. You stop typing descriptions of things you could just show.
09 · Before you climb
If you can't answer each of these in one or two sentences without scrolling up, re-read the section before climbing to Tier 2.