Prompt Engineering
With the integration of large language models, the necessity for good prompt engineering emerged. Prompt engineering is the easiest and most accessible way to make an LLM do what you want, especially compared with more resource-intensive methods such as pre-training, post-training (which often includes fine-tuning and alignment), and full fine-tuning.

What is Prompt Engineering?
I recently read the book "AI Engineering", which dedicates an insightful chapter to prompt engineering. The book highlights that prompt engineering lets you leverage the immense capabilities of a pre-trained foundation model without needing to alter its weights or undergo extensive training processes, which require significant computational power, data, and expertise. It's presented as the initial — and often sufficient — step for adapting LLMs to various tasks. This perspective is shared by many industry leaders. For instance, Chip Huyen, the author of the book, along with Sam Altman from OpenAI, emphasizes that prompt engineering is crucial for effectively interacting with AI systems. Going even further, Joe Davis, EVP of Platform Engineering & AI Technology at ServiceNow, views prompt engineering not just as a role but as a baseline expectation for working effectively with AI, highlighting its value in producing faster, more accurate results and driving efficiency.
Now that we understand its importance, let's look at what prompt engineering actually is. There are many definitions out there, but based on the book I would describe it as "the process of designing inputs (prompts) to guide a large language model (LLM) or foundation model to produce a desired output." Based on this definition, getting the right response from the LLM might seem easy, right? This is where it gets tricky, and it is why mastering prompt engineering matters: the better the prompt, the more likely the LLM returns relevant information and avoids hallucinations. Let's dive into the different aspects of prompt engineering.
System prompt and user prompt
Many LLM APIs let you split a prompt into a system prompt and a user prompt. The system prompt gives the LLM high-level instructions and often specifies the persona it should adopt. The user prompt is the specific task or question the user wants to address. For example, a user might interact with our application via a web or desktop interface and ask it something. Having a system prompt helps the LLM narrow its behavior, add guardrails, and even improve performance.
To make it more illustrative, let's imagine we build the following wrapper around an LLM as a coding assistant:
System prompt: You are an expert software engineer and coding
assistant. You help with code explanation, give recommendations on
improvements, share guidelines and much more. Keep
explanations concise (2–4 sentences). Provide code first,
explanation after. You should also not access external
networks or secrets. Do not invent unspecified project context.
User prompt: I am starting a new project in Java. I would like to
build a service that allows setting daily goals. This means I
would need to be able to create, update, select and delete goals.
How can I start?
It's good to know that, behind the scenes, these two prompts get combined into a single prompt using a template, and different models use different templates for this. One reason system prompts are effective is that they come first: research shows models tend to follow instructions that appear earlier in the context.
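To make this concrete, here is a minimal sketch of how the two prompts might be sent to a model, assuming the OpenAI Python client; the model name and the shortened prompt texts are illustrative, not prescribed by the article.

```python
# Minimal sketch: passing a system prompt and a user prompt separately,
# assuming an OpenAI-style chat completions API. Model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "You are an expert software engineer and coding assistant. "
    "Keep explanations concise (2-4 sentences). Provide code first, explanation after. "
    "Do not invent unspecified project context."
)
user_prompt = (
    "I am starting a new project in Java. I would like to build a service "
    "that allows setting daily goals. How can I start?"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```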
Clarity
The more context you can give the LLM, the better. Sure, context windows have grown tremendously in recent years, but you still need to communicate your intent clearly and leave no room for ambiguity. If you want the model to score homework, tell it what the scoring range is. You might assume it would use school grades, but you cannot guarantee that. By explicitly stating in the prompt that it should use high-school grades, you give the LLM the context it needs.
If we go with the scoring example, you might want to give further context. You can apply a persona to the LLM; in this case it could be "You are an English high-school teacher for first-year students." This will ensure the grading is done at the expected level, and not, for example, at a senior level.
As with humans, it helps to give examples. Prompting with no examples, one example, or several examples is called zero-shot, one-shot, or few-shot prompting, respectively. Examples provide additional context. Building on the example above, you might show the LLM what an "A" or "A+" essay looks like, with a concrete example for each grade. This helps the LLM learn how to rate the homework.
You also have the option to define the desired output. You can clarify this in the prompt and back it up with a concrete example. To further build upon the example I created, imagine you want the response to be concise to save on tokens. You can do this by stating: "The output should only contain the score and one or two sentences explaining the score." You can also enforce formatting, by specifying an example:
Score: A
Reason: The essay was well written with minimal grammar mistakes.
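As a sketch of how these pieces can fit together, the snippet below assembles a persona, an output format, and a few grading examples into one prompt; the example grades and essay placeholder are purely illustrative, not taken from the article.

```python
# Minimal sketch: a few-shot grading prompt combining a persona, an explicit
# output format, and example gradings. All example content is illustrative.
PERSONA = (
    "You are an English high-school teacher for first-year students. "
    "Grade essays on a scale from A+ to F."
)

OUTPUT_FORMAT = (
    "The output should only contain the score and one or two sentences "
    "explaining the score, in this format:\n"
    "Score: <grade>\n"
    "Reason: <short explanation>"
)

FEW_SHOT_EXAMPLES = (
    "Example of an A:\n"
    "Score: A\n"
    "Reason: The essay was well written with minimal grammar mistakes.\n\n"
    "Example of a C:\n"
    "Score: C\n"
    "Reason: The argument is hard to follow and grammar errors distract the reader."
)

def build_grading_prompt(essay: str) -> str:
    """Assemble persona, format instructions, examples, and the essay to grade."""
    return (
        f"{PERSONA}\n\n{OUTPUT_FORMAT}\n\n{FEW_SHOT_EXAMPLES}\n\n"
        f"Now grade the following essay:\n{essay}"
    )

print(build_grading_prompt("My summer holidays were ..."))
```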
Break down problems
LLMs are generally better at simple tasks than at complex, multi-step ones. Without diminishing the tremendous progress LLMs have made so far, and acknowledging that models keep getting "smarter", it is still recommended to break complex problems down into smaller tasks when possible. Let's say you bought an electronic device and need to contact technical support. A helpful chatbot might follow these three steps:
- Ask you to describe the problem. Based on that it would identify which electronic device is affected.
- Based on the device, it would offer the most obvious solutions first.
- If the user is not satisfied with the solutions, the chatbot would ask whether they want to get in touch with a technical specialist to resolve the issue.
This problem can be broken down into three prompts:
- The first prompt identifies the device
You are a technical support assistant. Based on the customer's
description of their issue, identify which electronic device
they are having trouble with. Ask clarifying questions if
needed to narrow down the specific product. Once identified,
confirm the device with the customer before proceeding.
- The second prompt looks for obvious solutions
The customer is experiencing issues with [DEVICE_NAME]. Review
the most common problems for this device and provide 2–3 solutions
that have resolved similar issues for other customers. Present
these solutions in a clear, step-by-step format that is easy
to follow.
- The last prompt checks whether the customer would prefer a human specialist to resolve the issue
The customer has tried the suggested solutions but their issue
persists. Acknowledge their frustration and ask if they would
like to speak with a technical specialist for personalized
assistance. If yes, collect necessary contact information and
explain the next steps.
How small each subtask should be depends on the situation: performance, cost, latency, and accuracy all play a role, and you will most likely have to tune the breakdown by trying various options. Splitting a problem into simpler parts also makes monitoring, debugging, and parallelization easier and lets you focus effort where it is needed, though it comes with the downside of added latency.
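As an illustration of what such a breakdown could look like in code, here is a sketch that chains the three prompts above; the `complete` helper is a hypothetical stand-in for whichever LLM API you use, and the sample customer message is made up.

```python
# Minimal sketch: chaining three small prompts instead of one complex prompt.
# `complete` is a hypothetical placeholder for a real LLM API call.
def complete(prompt: str) -> str:
    """Placeholder: wire this to whatever LLM API you use."""
    return f"[LLM response for prompt of {len(prompt)} characters]"

def identify_device(problem_description: str) -> str:
    prompt = (
        "You are a technical support assistant. Based on the customer's description "
        "of their issue, identify which electronic device they are having trouble with.\n\n"
        f"Customer: {problem_description}"
    )
    return complete(prompt)

def suggest_solutions(device_name: str) -> str:
    prompt = (
        f"The customer is experiencing issues with {device_name}. Review the most "
        "common problems for this device and provide 2-3 step-by-step solutions."
    )
    return complete(prompt)

def offer_escalation() -> str:
    prompt = (
        "The customer has tried the suggested solutions but their issue persists. "
        "Acknowledge their frustration and ask if they would like to speak with a "
        "technical specialist for personalized assistance."
    )
    return complete(prompt)

# Each step feeds the next, so every individual prompt stays small and focused.
device = identify_device("My laptop screen flickers whenever I unplug the charger.")
solutions = suggest_solutions(device)
escalation = offer_escalation()
```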
Use reasoning
You can ask the LLM to "think" about a question before answering. Chain-of-thought (CoT) prompting and self-critique are useful techniques. CoT means explicitly asking the model to think step by step, instead of just producing the final output. You can nudge the model by adding "think step by step" or "explain your decision" to the prompt. Besides being easy to apply, CoT can also reduce hallucinations.
Besides CoT, there is also self-critique for introducing reasoning. Self-critique means asking the model to validate its response before finalizing the answer. Like CoT, self-critique nudges the model to think more critically about a problem.
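A rough sketch of both techniques is shown below, again using a hypothetical `complete` helper in place of a real LLM call; the question and the exact wording of the nudges are illustrative.

```python
# Minimal sketch: chain-of-thought followed by a self-critique pass.
# `complete` is a hypothetical placeholder for a real LLM API call.
def complete(prompt: str) -> str:
    """Placeholder: wire this to whatever LLM API you use."""
    return f"[LLM response for prompt of {len(prompt)} characters]"

question = "A train leaves at 09:10 and arrives at 11:45. How long is the journey?"

# Chain-of-thought: nudge the model to reason step by step before answering.
cot_prompt = f"{question}\n\nThink step by step, then give the final answer."
draft_answer = complete(cot_prompt)

# Self-critique: ask the model to validate its draft before finalizing the answer.
critique_prompt = (
    f"Question: {question}\n"
    f"Draft answer: {draft_answer}\n\n"
    "Check the draft for mistakes. If it is wrong, correct it; "
    "otherwise repeat the final answer."
)
final_answer = complete(critique_prompt)
```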
Iterate on prompts
As already mentioned, you might try a variety of prompts before you arrive at something satisfying. Hopefully these tips will help you refine your prompts and make your life easier. One final tip: when defining a prompt, always ask yourself whether the LLM has proper context. If I explained the same problem to a human and gave them the same context, would they be able to answer? I ask myself this question every time to evaluate how good a prompt is. You can go further: tools like PromptLayer or PromptFlow are already at your disposal, and you can also use an LLM itself to refine or evaluate a prompt.
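As a sketch of that last idea, you could ask an LLM to review a prompt against a simple checklist before you ship it; the `complete` helper and the checklist wording below are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch: using an LLM to review a candidate prompt.
# `complete` is a hypothetical placeholder for a real LLM API call.
def complete(prompt: str) -> str:
    """Placeholder: wire this to whatever LLM API you use."""
    return f"[LLM response for prompt of {len(prompt)} characters]"

candidate_prompt = (
    "You are an English high-school teacher. Grade the following essay "
    "and return only the score and a short reason."
)

review_prompt = (
    "You are a prompt engineering reviewer. Evaluate the prompt below:\n"
    f"---\n{candidate_prompt}\n---\n"
    "Does it state the task, the persona, the scoring range, and the output format? "
    "List anything ambiguous and suggest a clearer rewrite."
)
print(complete(review_prompt))
```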
What is your favorite technique for prompt engineering? What is something that never fails you when chatting with an LLM?