Chief A.I., Oh!

Tier 05  /  Adjacent AI

The rest of the AI map. So nothing surprises you.

Chat is the front door. The rest of AI is image generation, voice synthesis, transcription, agents, and coding assistants. You don't need to master each one. You need to know they exist, what they do, and when to reach for them so the next product launch doesn't disorient you.

Approx. 25 min read · Tour, not deep dive · Pick one or two to try

01 · The map

Six adjacent surfaces.

Each block below is a category, not a single product. Skim them all. Bookmark the one or two that fit how you work.

1

Image generation

From text, get a picture. Photos, illustrations, logos, mockups.

2

Video generation

Same idea, moving. Still rougher than image. Improving fast.

3

Voice & audio

Synthetic speech, voice cloning, music. Replace your "I'll record a voiceover later."

4

Transcription

Audio in, text out. Meetings, calls, voice notes, interviews.

5

Agents

AI that takes actions on your behalf: browsing, clicking, filling forms, running jobs.

6

Coding assistants

AI inside the developer's editor (or terminal). Even non-coders benefit indirectly.

02 · Image generation

From a sentence to a picture.

The most accessible adjacent skill. Free tools exist. Paid tools are still cheap by any pre-AI standard.

ChatGPT (DALL-E + GPT-4o) built in

  • Inside the same chat box
  • Best for conversational editing ("now make the sky orange")
  • Good at text inside images

Gemini (Imagen / Nano Banana) built in

  • Inside Gemini app + Workspace
  • Excellent photorealism
  • Consistent characters across multiple images

Midjourney separate product

  • The aesthete's favorite
  • Best stylized / artistic output
  • Lives in a web app (formerly Discord-only)

Flux (Black Forest Labs) open model

  • Open-weights model behind many third-party image apps
  • Strong quality, runs anywhere
  • You'll meet it inside platforms like Krea and Replicate

Adobe Firefly creative pro

  • Inside Photoshop, Illustrator, Express
  • Trained on Adobe Stock, commercially safe
  • If you're already in the Adobe stack, the easiest entry

Ideogram design-friendly

  • Best in class at text inside images
  • Posters, logos, ads with readable copy
  • Free tier is generous

Prompting an image model is different

Describe the picture, not the task

"A photo of an empty desk by a window, morning light, shallow depth of field, 35mm" beats "make me a desk image."

Style words matter

"Watercolor," "ink wash," "Polaroid," "studio lighting," "isometric vector." Concrete style words drive the look.

Iterate by editing, not restarting

Most modern image tools let you say "same, but X" and keep continuity. Use that.
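The three habits above can be sketched as a tiny prompt builder. This is only an illustration: the helper names `build_image_prompt` and `build_edit_prompt` are made up for this example, and nothing here calls a real image tool. It just assembles the text you would paste into one.

```python
def build_image_prompt(subject, details=(), style_words=()):
    """Join a concrete subject, scene details, and style words into one prompt."""
    parts = [subject.strip()]
    parts.extend(d.strip() for d in details if d.strip())
    parts.extend(s.strip() for s in style_words if s.strip())
    return ", ".join(parts)


def build_edit_prompt(change):
    """Phrase a follow-up as an edit, so the tool keeps the existing picture."""
    return f"Same image, but {change.strip()}"


# Describe the picture, not the task.
prompt = build_image_prompt(
    "A photo of an empty desk by a window",
    details=["morning light", "shallow depth of field"],
    style_words=["35mm"],
)
print(prompt)

# Iterate by editing, not restarting.
print(build_edit_prompt("make the sky orange"))
```

Swap the style words ("watercolor," "isometric vector") and keep the subject fixed to see how much of the look they control.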

Rights matter

Commercial use rules vary by tool. Firefly is trained on licensed data, and Adobe positions it as safe for commercial work. The others are more ambiguous: generally fine for personal or internal use, riskier for ads or anything you'd sell. When in doubt, check the tool's terms.

03 · Video generation

From a sentence to moving pictures. Still rough, improving fast.

Video gen is roughly where image gen was two years ago: amazing demos, awkward production reality. Worth tracking, not yet a daily tool for most beginners.

OpenAI Sora

  • Bundled into ChatGPT for paid users
  • Strong at short, cinematic clips
  • Currently best for B-roll and concept videos

Google Veo

  • Inside Gemini and dedicated Google products
  • Best multimodal understanding (knows what you're asking for)
  • Tight integration with Workspace and YouTube

Runway

  • Independent video AI shop
  • Most professional editing surface: masks, motion, style transfer
  • Where many real creators are doing real work

Pika / Luma / Kling

  • A bench of fast-moving smaller players
  • Worth scanning quarterly; the leader changes
AiAi Bro

Don't subscribe to a video AI yet unless you have a specific job in mind. The free trials of Sora-in-ChatGPT or Veo-in-Gemini are enough to learn the shape. Real production workflows are still finicky (coherent characters, lip sync, anything longer than 10 seconds), so the best practice is to wait until you have a project that demands it.

04 · Voice & audio

Synthetic speech that's actually convincing.

Two separate things live under "voice AI": the voice mode inside ChatGPT/Claude/Gemini (Tier 4) and standalone voice generation tools that turn your text into a custom-sounding audio file. This section is about the second.

ElevenLabs market leader

  • The gold standard for synthetic voice
  • Voice cloning from about 30 seconds of sample audio
  • Hundreds of preset voices in dozens of languages
  • What podcasters, audiobook narrators, video producers use

OpenAI TTS

  • Available via API; a few preset voices
  • Powers ChatGPT's voice mode
  • Good enough for most use cases; less expressive than ElevenLabs

Google TTS / Chirp

  • Inside Google Cloud + Workspace
  • Strong on multilingual
  • What you'll use if you're building inside Google's stack

Suno / Udio music

  • Different category: text-to-music
  • Type a song prompt, get a full track with vocals
  • Worth knowing exists; mostly novelty for non-music people

Common uses for synthetic voice

05 · Transcription

Audio in, accurate text out.

The unsexy adjacent skill that pays back the fastest. Anyone with meetings, calls, or voice notes saves real hours.

OpenAI Whisper

  • Open-source model; near-state-of-the-art accuracy
  • Powers many of the tools below
  • You can run it yourself; most people use it via a product

Otter.ai

  • Real-time meeting transcription + summaries
  • Integrates with Zoom, Meet, Teams
  • Mainstream choice for business users

Granola

  • Lighter, less intrusive; runs in the background of any call
  • Auto-summary, action items, your typed notes blended in
  • Increasingly the operator favorite

AssemblyAI / Deepgram

  • Developer-facing transcription APIs
  • What the products you use are built on
  • Mentioned only so the names aren't unfamiliar

Built-in (Apple, Google)

  • Voice Memos transcribes natively on iOS
  • Pixel's Recorder app transcribes on-device
  • Free, instant, surprisingly good

The transcription → LLM workflow

Record a 10-minute voice note → transcribe → paste into an LLM → ask it to extract structure ("decisions made, follow-ups, open questions"). This is the single highest-leverage adjacent-AI move for operators. Costs nothing, saves hours weekly.
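For the "paste into an LLM" step, here is a minimal sketch of the prompt you might assemble. It assumes you already have the transcript as plain text from whichever tool you use. The function name `build_extraction_prompt` and the exact wording are illustrative, not a required format, and no transcription or LLM service is called.

```python
DEFAULT_SECTIONS = ("Decisions made", "Follow-ups", "Open questions")


def build_extraction_prompt(transcript, sections=DEFAULT_SECTIONS):
    """Wrap a raw transcript in instructions that ask for a structured summary."""
    headings = "\n".join(f"- {name}" for name in sections)
    return (
        "Below is a transcript of a voice note. "
        "Extract its structure under these headings, "
        "using only what the transcript says:\n"
        f"{headings}\n\n"
        "Transcript:\n"
        f"{transcript.strip()}"
    )


# Hypothetical transcript text, standing in for real transcription output.
transcript = (
    "We agreed to ship on Friday. Sam will email the vendor. "
    "Still unsure about pricing."
)
print(build_extraction_prompt(transcript))
```

Paste the printed text into the LLM of your choice. Changing the `sections` list is the quickest way to adapt this to interviews or sales calls.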

06 · Agents

AI that takes actions, not just answers.

An agent is a system that decides which step to take next without you telling it each one. Booking a flight, filling a form, scraping a site, sending emails, running a workflow. As a beginner, you'll mostly meet agents through the four doors below.

OpenAI Operator computer use

  • An AI that drives a virtual browser for you
  • "Book me a table at X for Friday at 7"
  • Available in Pro plans; promising, still flaky for hard tasks

Claude with computer use / Claude Code

  • Claude Code runs in your terminal and acts on real files
  • Claude's computer-use API lets it click around a screen
  • The most capable developer-facing agent today

Gemini Deep Research / Agent surfaces

  • Multi-step research that browses the web for you and produces a report
  • Lives inside Gemini's app
  • Best for "research this topic for 20 minutes and come back to me"

n8n / Zapier / Make + AI

  • Workflow automation tools, now AI-aware
  • You wire steps together visually; LLMs handle the "thinking" steps
  • Where most real business agents actually live today

Agents are not yet a beginner sport

Setting up real agents is more involved than building a Custom GPT. As a beginner, treat agents as "interesting, watch the space." Once you're confident at Tier 4 across a couple of LLMs, then consider taking one workflow and turning it into an actual agent.
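To make the definition at the top of this section concrete, here is a toy sketch of the loop inside an agent: something picks the next step, a tool carries it out, and the result feeds the next decision. Everything here is a stand-in. `choose_next_step` is a hard-coded placeholder for the model, and the "tools" only return strings. Real agent products are far more involved.

```python
def choose_next_step(goal, history):
    """Stand-in for the model: pick the next action from what has happened so far."""
    plan = ["open_site", "fill_form", "submit"]
    if len(history) < len(plan):
        return plan[len(history)]
    return "done"


def run_agent(goal, tools, max_steps=10):
    """Loop until the chooser says 'done' or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action = choose_next_step(goal, history)
        if action == "done":
            break
        result = tools[action]()  # the agent acts, not just answers
        history.append((action, result))
    return history


# Pretend tools; a real agent would drive a browser or call an API here.
tools = {
    "open_site": lambda: "site opened",
    "fill_form": lambda: "form filled",
    "submit": lambda: "booking submitted",
}

for action, result in run_agent("Book a table for Friday at 7", tools):
    print(action, "->", result)
```

The `max_steps` cap is the part worth noticing: because the system chooses its own steps, a budget is what stops it when it gets stuck.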

07 · Coding assistants

Even if you don't write code, know the names.

"AI-powered coding tool" is one of the biggest product categories in tech right now. Beginners can do real, useful things with these tools even without traditional programming skills.

Claude Code terminal

  • Runs in your terminal; acts on your real files
  • The most capable agentic coder as of this guide
  • Surprisingly approachable for non-developers

Cursor IDE

  • A code editor (forked from VS Code) with AI deeply built in
  • Industry favorite among professional developers
  • Pulls from multiple models; you choose

GitHub Copilot

  • Inside VS Code, JetBrains, and other editors
  • Microsoft + OpenAI partnership
  • The original mainstream coding AI; still solid

Windsurf / Cline / Continue

  • A growing field of agentic coders
  • Worth a scan if Cursor and Claude Code don't fit

Lovable / Bolt / v0 no-code app builders

  • Generate a working web app from a sentence
  • Best for prototypes, landing pages, internal tools
  • True entry point for non-coders who want to build software
AiAi Bro

If you have ever wanted "a little tool that does X" and stopped because you don't code: try Lovable or Bolt this week. You will be shocked what a non-developer can ship in an evening. The bar to build software has moved. The bar to build useful software has moved more.

08 · Search & research

AI that grounds answers in real sources.

A specialized adjacent category: tools that don't just generate, they go look something up first. Lower hallucination rate. Better for facts.

Perplexity

  • Routes through multiple models, grounds every answer in cited sources
  • The "research" search engine
  • Free tier is excellent; paid unlocks Pro Search and Spaces

ChatGPT Search

  • Built into ChatGPT
  • Pulls live web results, cites them inline
  • Solid for everyday quick research

Google AI Overviews / Gemini Grounding

  • The AI answer at the top of Google results
  • Grounding option inside Gemini for verifiable answers
  • Tied to the world's largest index

NotebookLM

  • Discussed in Tier 4, worth re-mentioning
  • Grounds answers in a finite set of sources you provide
  • Best for due diligence and study

09 · If you only do four things

The minimum-viable adjacent AI stack.

You don't need to subscribe to ten products. Most operators benefit from exactly four of the categories above. Pick one tool from each.

1
Image

One image generator

Whichever is built into your main LLM is usually enough. Add Midjourney only if you care about aesthetics.

2
Transcription

One transcription tool

Granola or Otter for meetings. Apple/Pixel voice memos for solo notes. Free options are great. Don't overthink it.

3
Voice gen (optional)

ElevenLabs (free tier)

Set up an account. Clone your voice. You'll find uses for it over the next year.

4
Search

Perplexity

Pin it as a tab. Use it whenever you'd reach for Google for a factual question. Single biggest research upgrade.

10 · Where you've landed

What you can do now.

If you've worked through all five tiers:

You can explain what an LLM is, in plain English, to someone who's never used one.
You can write a prompt that gets a useful answer on the first try.
You can pick between ChatGPT, Claude, and Gemini on purpose, based on the job.
You can build a Custom GPT, a Project, or a Gem for any recurring workflow.
You know what image gen, voice, transcription, agents, and coding tools are and when to reach for them.
You have the glossary as a permanent reference for anything you forgot.
AiAi Bro

You are no longer a beginner. You're a competent intermediate user across the three flagship LLMs. The next level (building agents, training your own models, integrating AI into business systems) is real, but it isn't this guide. Stay here a season. Get fluent. The depth comes from repetition, not from the next course.