7  Working With AI

The durable thing about using a language model well is a method, not a trick list. The top result on any search for “best ChatGPT prompts” today will be obsolete by the time the next model ships, and the one after that will be obsolete by the model after. The half-life of a prompt-engineering hack is roughly one model generation. The half-life of the underlying discipline — knowing when to trust the answer, when to push back, when to throw the conversation away and start over — is much longer.

This chapter is the universal version of that discipline. By the end of it you will have a way of working with these tools that does not depend on which model is on the market this quarter, and that the rest of Part II will then sharpen for specific fields. The shape is three layers. A mindset that decides how you treat the thing you are talking to. A small set of tactics that are durable conversational moves. A system that takes what worked once and turns it into something you can use again. Each layer is independent of the others, and each is independently improvable. None of it requires you to write code.

One ground rule first. The point of working with a language model is to augment your thinking, not to subcontract it. The chapter’s whole architecture is bent toward that end, and if a passage ever reads as if the model should be doing your job for you, I have written it badly.

The mindset — partner, not oracle

The biggest single change is upstream of any specific technique. Stop treating the model as a search engine that happens to talk back. Start treating it as a conversational partner with the patience of a saint and the memory of a goldfish.

A search engine returns a list. A partner enters a back-and-forth. The first interaction is rarely the answer; it is the first turn of a process that, if you stay in it, gets better with every move. The model is good at producing material — drafts, options, lists, restatements, candidate analyses — and bad at deciding what to do with that material. Deciding what to do is your job. The skill is in keeping the conversation pointed at your question rather than the model’s first plausible response to it.

The change shows up most clearly in how you open the conversation. The wrong opening asks the model to do the work. The right opening asks the model to help you frame the work. “Write an email asking my manager for a raise” is the wrong opening; the model will write you a competent template that nobody who knows your situation would have written. The right opening is the one where you stay the thinker: “I need to write that email. Before I draft it, what should I have figured out first? What evidence belongs in it, what tone usually works in this context, what objections am I likely to face?” The model now has a job it is genuinely good at — laying out a structured first cut of the problem — and you have a starting point you can defend.

Same shift in lower-stakes contexts. Planning a child’s birthday party with a vague “give me a list of party ideas” gets you a list. The same conversation reopened as “I’m planning a science-themed party for my 7-year-old. What logistical details do I need to think through to make it run smoothly?” gets you a checklist of things you would have forgotten on the day. The model is no smarter in the second case. It has just been pointed at a question it can actually answer.

The mechanical reason this works — why the same model gives you a thoughtful checklist in one framing and a generic list in the other — sits in Part I. What you are doing, in the partner framing, is loading the conversation with the structure of the problem before you ask for an answer to it; every token that goes into the window changes what comes out next. The Part I chapters on language modelling and on the model’s known limitations are the long version. The short version is: the partner framing makes you write better contexts, and a model is only ever as good as the context you have built for it.

A consequence worth stating directly. Because the partner is fluent and friendly, it is easy to relax the critical filter you would apply to anything else. Don’t. The reasons over-trust is dangerous — confident hallucination, plausible-sounding wrong answers, the inability of the model to know when it does not know — are the subject of the limitations chapter in Part III. The mindset move here is upstream of that one: treat the model as a brilliant colleague who has never met you, has read everything, and will sometimes confidently invent. Push back. Ask for sources. Disagree.

The tactics — durable conversation moves

A small number of moves do most of the work. None of them require you to be technical. All of them are independent of which model you are using, and most of them were already true in 2022 and will still be true in 2030. Treat them as named tools, not as a recipe; pick the ones that fit the situation.

Be explicit and strategic

The single highest-leverage habit. The model does not know what you actually want unless you say it, and almost every disappointing answer is downstream of a vague prompt. “Summarise this article” is vague. “Summarise this article in 200 words for someone who already knows the basics of monetary policy but has never heard of yield-curve inversion” is a request the model can act on. The difference is not effort; it is specificity. Specificity gives the model the same thing you give a good assistant: enough constraint that the first attempt is in the right neighbourhood.

The same lever, run in a different direction, is telling the model where to look. A generic medical query is dangerous because the model will average across everything it has ever read, including bad sources. A constrained query — “Use guidance from the Mayo Clinic and the WHO on the common symptoms of iron deficiency” — is bounded by named institutions whose authority you can independently evaluate. For consumer decisions: “Compare the Tesla Model 3, Hyundai Ioniq 5, and Ford Mustang Mach-E for a family of four. Focus on real-world range, charging speed on a standard home charger, and cargo capacity.” A question that names its dimensions gets back an answer that addresses them.

Show the model the answer you want

The named version is few-shot prompting.1 Instead of describing the output you want in the abstract, give one or two examples of the kind of output you want and let the model generalise from them. This works almost embarrassingly well for two things: tasks that are easier to show than to explain (style, tone, structural quirks of a report) and tasks where you care about the shape of the answer more than its content. When you need a list of book recommendations in a particular tone you have used before, paste two of your old ones; the third comes back matching. The baseline — no examples, just an instruction — is zero-shot, and it should always be the first thing you try. Few-shot is what you reach for when zero-shot drifts.
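Nothing in this chapter needs code, but readers who script their prompts may find it useful to see how little machinery few-shot prompting involves. The sketch below is only an illustration: the function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the example texts are my own assumptions, not a standard format. It assembles an instruction, a couple of worked examples, and the new request into one block of text.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble an instruction, worked examples, and a new input into one prompt."""
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # Ending on a bare "Output:" invites the model to continue the pattern.
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)


# Two earlier recommendations act as the "shots"; the wording is made up.
examples = [
    ("a novel about the sea", "Try 'Moby-Dick'. Long, strange, worth it."),
    ("a short book on writing", "Try 'On Writing Well'. Brisk, practical, kind."),
]
prompt = build_few_shot_prompt(
    "Recommend one book in the same tone as the examples.",
    examples,
    "a history of mathematics",
)
print(prompt)
```

Passing an empty list of examples gives the zero-shot baseline, which makes it easy to try the instruction alone first and add examples only if the output drifts.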

Tell the model who it is

Roleplay is the move where you set the model’s audience, register, and stance before asking the question. “Explain the Reed-Solomon error-correcting code.” “Now explain it as a high-school maths teacher to a curious 14-year-old who has just learned about polynomials.” The two answers come from the same model and are wildly different documents. The named role is doing the same job as the specificity move above, but along a different axis: instead of constraining what the answer talks about, it constrains how the answer talks. Use it whenever the audience or tone matters more than the raw content.

Ask the model to think before it answers

The most-named technique in the prompt-engineering literature. Asking the model to show its work, that is, to write out a chain of reasoning before delivering a conclusion, measurably improves its accuracy on problems that involve multiple steps. The 2022 paper that named the technique chain-of-thought prompting reported the gains on grade-school maths problems; the technique has since been industrialised into the default behaviour of reasoning models.2 What it costs you is words; what it buys you is a better chance of correctness on the kinds of question where the first plausible answer is also the wrong one.

The Part I chapter on language modelling has the mechanical reason this works (the model performs a fixed amount of computation per token, so more tokens are literally more thinking). The user-side fact is simpler: when the question is complex, ask the model to lay out the reasoning. “Walk me through how you arrived at that.” “Before you decide, list the assumptions you are making.” “Think it through step by step.” Three sentences. Worth their weight in gold on any non-trivial question.

Ask for structure

When the answer has parts — a comparison, a table, a checklist, a set of fields, a step list — say so. “Give me the comparison as a table with one row per option and columns for price, range, charging speed, and cargo space.” Structure forces the model to surface the parts of the answer that would otherwise hide in a paragraph, and it lets you spot a missing column at a glance. The format also reads faster, which means you can iterate faster.

Ask the model to criticise its own answer

Cheap, ugly, surprisingly effective. Once the model gives you a draft, ask it to read what it just wrote and find what is wrong with it. “Read the email you just drafted. Now act as my manager, who is busy and skeptical. What lands flat? Is the tone too demanding, or not confident enough? What objection are you most worried about?” The second pass routinely surfaces problems the first pass papered over. It is the simplest possible form of the more general move where the model checks its own work, and it costs nothing to try.

Convene a committee

For decisions where one answer is not enough, generate several and then synthesise. The everyday form: open three conversations. In one, ask the model to argue for option A as a pragmatic engineer. In the second, for option B as an enthusiast. In the third, for option C as a family-focused reviewer. Then paste all three into a fourth conversation and ask it, as a senior editor, to weigh the cases and recommend. You have just done by hand what the research literature calls self-consistency — sampling multiple independent reasoning paths and selecting the answer they converge on — and it works for the same reason there: an unusual answer is much more likely to be a mistake than the answer that survives three different framings.3 Use the committee when the cost of a wrong choice is high enough to justify three minutes of setup.
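For readers curious about the research version, the selection step in self-consistency is a plain majority vote over the sampled answers. The sketch below shows only that step, under loud assumptions: the answers are already short labels, `majority_answer` is a hypothetical helper name, and the three sample strings are invented. The hand-run committee above replaces the vote with a fourth conversation that weighs the cases, so this illustrates the underlying idea and is not a substitute for that judgement.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and the share of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one answer")
    # Light normalisation so "Option B" and "option b " count as the same vote.
    normalised = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalised).most_common(1)[0]
    return winner, votes / len(normalised)


# Hypothetical final recommendations from three separate conversations.
samples = ["Option B", "option b ", "Option A"]
winner, agreement = majority_answer(samples)
print(winner, round(agreement, 2))
```

A low agreement share is the useful signal here: when the samples scatter, that is a reason to look harder at the question, not to trust whichever answer happened to come first.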

A short list of principles that sit under all of the above

These are not techniques; they are facts about the medium that explain why the techniques work, recast as practical guidance.

Context matters, all of it. Everything you put in the window influences what comes out. Tone, ordering, examples, even single word choices. Be deliberate about what is in there.

One task at a time. Models are bad at conditional, branching prompts (“if the user says X, reply Y, otherwise Z”) because you are asking them to solve two problems in one shot. Split the task. Ask the model to classify first, then handle each case in its own turn.

Verbosity is compute. Asking the model to “be brief” on a hard question costs you accuracy, because there is genuinely less computation happening when the answer is shorter. For hard questions, let the model talk; for cosmetic ones, ask for terseness.

Find the minimum valuable prompt. The shortest prompt that contains everything the model needs to give a good answer. Longer is not better; denser is better. If a sentence adds nothing, cut it.

Experiment. Prompting is closer to a craft than a science, and the same instruction can produce noticeably different results in different models. Try the same task across a couple of providers, vary the framing, see what happens. None of the rules above replace running the experiment.
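The "one task at a time" principle above is the easiest of these to show in code, for readers who build small scripts around a model. The sketch below is an assumption-laden illustration: `ask_model` stands for whatever function sends a prompt and returns text, `fake_model` is a stand-in so the example runs offline, and the two labels are invented. The first call only classifies; the second call handles the one case that applies.

```python
def classify(message, ask_model):
    """First turn: ask only for a label."""
    prompt = (
        "Classify the message as 'complaint' or 'question'. "
        "Reply with the single label only.\n\n"
        f"Message: {message}"
    )
    return ask_model(prompt).strip().lower()


def respond(message, ask_model):
    """Second turn: use a prompt written for the one case that applies."""
    handlers = {
        "complaint": "Write a short, apologetic reply to this complaint:\n\n",
        "question": "Answer this question in two or three sentences:\n\n",
    }
    label = classify(message, ask_model)
    if label not in handlers:
        raise ValueError(f"unexpected label: {label!r}")
    return ask_model(handlers[label] + message)


def fake_model(prompt):
    """Stand-in for a real model call, so the sketch runs offline."""
    if prompt.startswith("Classify"):
        return "Complaint"
    return "[reply to] " + prompt.splitlines()[-1]


print(respond("My order arrived broken.", fake_model))
```

Each prompt now asks for exactly one thing, and an unexpected label fails loudly instead of being absorbed into a muddled reply.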

The system — reusable natural-language programs

The third layer is the one most people skip, and skipping it is the single biggest reason that working with a language model feels like reinventing the wheel every morning.

When you find a prompt that worked, save it. The conversation you ran to plan the birthday party, the one that produced exactly the kind of checklist you needed — that conversation is a program, written in natural language, that solves a class of problems you will face again. Keep it. Six months later, when you are planning another party, you will start at the working version instead of from a blank page.

Over time the saved prompts cluster into a personal library: a Kids’ Party Planner, a Career Conversation Prep, a Trip Planner, a Document Reviewer, a Code Review Buddy if you write code. Each one is a small natural-language program with a known shape, known inputs, and a known kind of output. The library compounds. The fifth prompt is easier to write than the first because you have learned what the moves above do for you in your specific contexts.

The platform features that have grown up around this — Custom GPTs, Projects, system prompts that persist across conversations, saved instructions, shared workspaces — are all the same idea industrialised. Take a prompt that works, give it a name, attach the reference material it needs (a CSV, a brief, a style guide), and the result is a small dedicated tool that you or your team can invoke without retyping the prompt every time. The names of the features will change. The idea — that a good prompt is an asset, not a one-shot — will not.

A small habit makes this real. Whenever a conversation produces something genuinely useful, before you close the tab, ask the model: “If I wanted to use this exact workflow again for a different input next time, what would the reusable prompt look like? Write it as a template with placeholders.” The model is good at this kind of meta-task. The template it gives you is the first draft of the saved program.
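A saved template does not need special tooling; a text file is enough. For readers who prefer something scriptable, here is a minimal sketch using Python's standard `string.Template`. The placeholder names (`theme`, `age`, `guests`) and the helper `fill` are illustrative assumptions, and the wording simply reuses the party-planning prompt from earlier in the chapter.

```python
from string import Template

# Placeholder names are illustrative, not a fixed schema.
PARTY_PLANNER = Template(
    "I'm planning a ${theme}-themed party for my ${age}-year-old, "
    "with about ${guests} guests.\n"
    "What logistical details do I need to think through "
    "to make it run smoothly?"
)


def fill(template, **values):
    """Fill every placeholder; a missing one raises KeyError instead of slipping through."""
    return template.substitute(**values)


prompt = fill(PARTY_PLANNER, theme="science", age=7, guests=12)
print(prompt)
```

Using `substitute` and not `safe_substitute` is a deliberate choice in this sketch: a forgotten input stops the script, so a half-filled prompt never reaches the model.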

One step further. Each time you reuse a saved prompt, the next person to reuse it is you-two-months-from-now, and you-now is the only one who knows what worked and what didn’t. So write a short note alongside the template — two or three lines, free-form — recording what the prompt is for, the kind of input it expects, and the one or two things you learned the last time you ran it that the prompt itself does not yet encode. The library compounds in two directions at once: the prompts get sharper, and the marginalia get richer. A year of this habit and the library is a small private operations manual for the kinds of thinking you do most often.

A worked example — planning a trip with an AI partner

To see the three layers stack, the canonical example: planning a ten-day family vacation to Italy. The point is not the trip; the point is the shape of the conversation.

The opening is the partner move. Not “plan a trip to Italy” but “I want to plan a 10-day family vacation to Italy. Before you draft anything, what do you need to know from me to produce a useful itinerary?” The model now plays consultant, and the first reply is a structured set of questions: travellers, ages, budget, interests, pace, must-sees, dietary constraints, mobility. The questions force the inputs into the open.

Once you have answered, lock the constraints. “Great. Based on what I just told you, please summarise my constraints in a structured list, and wait for me to confirm before drafting anything.” This is the “be explicit” tactic and the “ask for structure” tactic working together: you make sure that you and the model share the same picture of the problem before any planning happens. Wrong assumptions get caught here, cheaply, instead of after a draft itinerary that has already absorbed them.

Now the first draft. The model produces a day-by-day plan. Read it; resist the urge to either accept it or argue with it. Instead, run the criticism tactic: “Act as a skeptical travel agent who has run a hundred trips like this. Tell me what is missing, what is unrealistic, what is going to go wrong on day six.” A useful answer might be that three major cities in ten days is too ambitious with young children; that the gap between two trains on day four is too tight; that the second hotel is in a neighbourhood that is loud at night. None of these are things the model invented in a vacuum. They are things you would otherwise have found out the hard way.

Now the loop. You feed the criticisms back. The model revises. Run criticism again. Revise again. Two or three turns of this and the itinerary is genuinely good. Not because the model is suddenly smarter on the third try, but because the conversation has accumulated the kind of context that a competent human travel agent would have built up over the same hour.

Finally, the system move. Before closing the tab: “Convert this entire conversation into a reusable Family Vacation Planner template, with placeholders for travellers, ages, country, budget, and length. Output it as a single prompt I can paste next time.” The next family trip — wherever it goes — starts at the working version of this one. One afternoon’s work becomes a reusable asset.

Notice what each layer did in this example, because the pattern repeats across every domain in the chapters ahead. The mindset move kept you in the driver’s seat at the very beginning, where it would have been easy to outsource the framing. The tactics each carved out a particular cognitive job — eliciting constraints, locking them, drafting, criticising, looping — and asked the model to do that job well, one at a time. The system move converted the work into something you can replay. None of it required a magic prompt. All of it required showing up for the conversation as the thinker, not the consumer.

Common pitfalls every user must know

The moves above are designed to make the model’s output better. The pitfalls below are about what can go wrong with you. Each one is universal — there is no domain in Part II that escapes them — and each one is worth getting in front of before the harm shows up.

The Eliza effect — misplaced trust in a conversational machine

The original ELIZA was a 200-line script written in the mid-1960s that matched input against a couple of dozen templates, so that a sentence like “I am sad” came back as “Why are you sad?”.4 Its author, Joseph Weizenbaum, was unnerved to discover that even people who knew how the program worked, including his own secretary, began ascribing understanding to it and telling it things they had not told a person. The pattern recurs with every chat interface and is much stronger with modern language models, because the surface fluency is incomparably better.

The danger is not that you will fall in love with the chatbot. The danger is subtler: a model that talks like a thoughtful person triggers the social reflexes you would use with a thoughtful person, including the reflex to believe what they say. You stop fact-checking because the answer sounds confident, because the explanation reads well, because the model used the right vocabulary for your field. The mechanical reasons the answer can still be wrong are in the limitations chapter in Part III. The behavioural rule here is simpler. Read the model’s answers with the same suspicion you would apply to a stranger’s confident assertion on Wikipedia. Fluent does not mean correct. Trust is earned per claim, not per interface.

Cognitive offloading — the lazy-brain problem

There is a phenomenon in psychology called cognitive offloading: the use of physical action to reduce the mental effort of a task, as a 2016 peer-reviewed review defined it.5 It is the reason you write a shopping list, the reason you save a phone number rather than memorising it. It is mostly a good thing. But like every cognitive aid, it has a cost: the offloaded skill does not get exercised, and over time it can erode. The classic demonstration is the so-called Google effect: a 2011 study in Science showed that people who expected information to be searchable later remembered the information less well than people who expected to have to recall it from memory.6 The information became external the moment they trusted the externality.

Language models extend this in a sharper direction. A search engine externalises facts. A language model can externalise the act of thinking itself — summarising, drafting, structuring, deciding — and that is a larger surface area to lose. Use the model to think better and you come out ahead; use it to think less and you slowly lose the capacity to think well. The asymmetry is uncomfortable but real. Notice when you are reaching for the model to escape a hard piece of cognitive work that you would have grown from doing. The escape is real and the cost is real and both are paid by you.

The privacy risk of casual conversation

When you talk to a chatbot in the easy register the interface invites, it is easy to forget that the conversation is being processed by a server you do not control, owned by a company whose policy could change tomorrow, under whatever retention behaviour its terms of service, which you have not read, specify. People paste medical reports, financial spreadsheets, private legal correspondence, proprietary code, salary numbers, and full draft emails about colleagues (material they would never copy into a public Google search) and forget that, conversationally easy or not, the data has left the room.

A 2024 ICLR paper made the structural point clearly. Even when models are explicitly instructed to be discreet, they routinely leak information across contexts that humans treat as separate.7 The framing the paper uses, contextual integrity, names exactly the problem. A conversation with your doctor is information that belongs in one context. The fact that it is structurally easy to paste it into a chat window does not mean the model — or the company behind it — will respect the context you assumed. Treat any sensitive material the way you would treat data you were sending to an unfamiliar SaaS vendor with default settings, because that is what you are doing.

You are the final authority

The tactics make the raw material from the model better. This last layer is about what you do with that material, and it is the layer that the rest of the book repeatedly comes back to.

Never trust, always verify. The model is an unreliable narrator. Read its output the way you would read an article by a competent stringer with a deadline and no editor: often correct, occasionally invented, never the last word. For anything you would act on — a date, a number, a medical claim, a legal point, a piece of code about to touch production — verify against an independent source. The model can often help you find the source. It cannot be the source.

Synthesise, don’t copy-paste. The model outputs information; what you are after is knowledge. The two are not the same thing. Information is what the conversation produced; knowledge is what survives your judgement applied to it. The real work of any session happens after the model’s reply has landed: deciding what to keep, what to throw away, what to combine with something you already knew that the model could not. The Italy itinerary in the example above is not finished when the model stops typing. It is finished when you have made the calls.

Own the outcome. Anything that goes out of the conversation with your name on it — the email sent, the decision taken, the report filed — is yours. The model is not a co-signer. This is not a moral injunction; it is just a fact about how responsibility works. The party that acted is the party accountable. The most useful working stance toward the tool, across every Part II domain to come, is this: the model is an extraordinary assistant that helps me think. The thinking is still mine, and so is what happens next.

The temptation to wriggle out of this layer is real, and worth naming. When the output is good, it is easy to take more credit than the conversation produced. When the output is wrong, it is easy to point at the model and shrug. Both moves are forms of the same mistake: treating the model as if it had agency it does not have. A language model has no stake in your situation, no memory of who you are, no view about whether the thing you are about to send is wise. Those views are yours alone. The book’s techno-pragmatist claim about every chapter ahead is the same claim applied here, in miniature: the technology is a tool; the outcomes are choices; the choices are made by the humans wielding it.

What the rest of Part II does with this

The three layers are the universal toolkit. The chapters ahead each take it into a specific field: the mindset stays the same, the tactics get sharpened against the field’s particular failure modes, and the system layer grows into the working habits of someone who has done the job for a while.

What does change is the cost of a mistake. In planning a trip, a wrong recommendation is an unpleasant afternoon. In knowledge work, science, software, education, creative work, or policy, the stakes scale up, and the responsibility layer above scales with them. The same shape; a different field; the same final authority sitting in the same chair.


  1. Brown, T. B. et al. Language Models are Few-Shot Learners. NeurIPS, 2020. arXiv:2005.14165. The GPT-3 paper that named and demonstrated in-context learning (zero-shot, one-shot, and few-shot) as a usable interface to the model, without any fine-tuning.

  2. Wei, J. et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS, 2022. arXiv:2201.11903. The paper that named the technique and reported the accuracy gains on multi-step arithmetic and commonsense reasoning that subsequent reasoning-tuned models industrialised.

  3. Wang, X. et al. Self-Consistency Improves Chain-of-Thought Reasoning in Language Models. ICLR, 2023. arXiv:2203.11171. Sample multiple chains of thought independently, then select the answer they most often agree on; the committee move is the user-side analogue.

  4. Weizenbaum, J. ELIZA — A Computer Program for the Study of Natural Language Communication Between Man and Machine. Communications of the ACM 9(1):36–45, 1966. Weizenbaum’s later book, Computer Power and Human Reason (1976), is the long-form indictment; the 1966 paper is the historical anchor.

  5. Risko, E. F. & Gilbert, S. J. Cognitive Offloading. Trends in Cognitive Sciences 20(9):676–688, 2016. The canonical review; defines offloading as “the use of physical action to alter the information-processing requirements of a task so as to reduce cognitive demand.”

  6. Sparrow, B., Liu, J. & Wegner, D. M. Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips. Science 333(6043):776–778, 2011. The original demonstration that expected availability of an external memory store changes what people internalise.

  7. Mireshghallah, N. et al. Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory. ICLR, 2024. arXiv:2310.17884. Models, including instruction-tuned chat systems explicitly asked to be discreet, routinely leak information across contexts that humans treat as separate.