How to Make LLMs Actually Listen - Prompt Engineering guide

By Rashmik Kabiraj posted 26 days ago

  

How an LLM works

An LLM is a prediction engine. The model takes sequential text as input and then predicts what the following token should be, based on the data it was trained on. A prompt (text or visuals) is the input the model uses to predict a specific output. Building the most effective prompt can be tricky. Many aspects affect a prompt's effectiveness:

  • the model in use
  • the model’s training data
  • the model’s configuration settings
  • the prompt’s word choice, style, tone, structure, and context

Therefore, prompt engineering is an iterative process. Inadequate prompts can lead to ambiguous or inaccurate responses and can hinder the model’s ability to provide meaningful output.
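To make the “prediction engine” idea concrete, here is a minimal sketch of the token-by-token generation loop. The `model` and `tokenizer` objects are hypothetical stand-ins for whatever LLM stack is in use, not a specific library API.

```python
def generate(model, tokenizer, prompt, max_new_tokens=50):
    """Sketch of autoregressive generation: predict one token at a time."""
    tokens = tokenizer.encode(prompt)              # text -> token ids
    for _ in range(max_new_tokens):
        # The model scores every token in its vocabulary given the text so far
        # and returns the id of the chosen next token.
        next_token = model.predict_next_token(tokens)
        tokens.append(next_token)
        if next_token == tokenizer.eos_token_id:   # model signals "end of text"
            break
    return tokenizer.decode(tokens)
```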

Prompt Engineering

Prompt engineering is the process of designing high-quality prompts that guide LLMs to produce accurate outputs. This process involves experimenting to find the best prompt, optimizing prompt length, and evaluating a prompt’s writing style and structure in relation to the task. In the context of natural language processing and LLMs, a prompt is an input provided to the model to generate a response or prediction.

These prompts can be used for various understanding and generation tasks such as text summarization, information extraction, question answering, text classification, image generation or classification, language or code translation, code generation, and code documentation or reasoning.

How IBM Consulting Advantage helps with prompt engineering

IBM Consulting Advantage is used to illustrate some of the prompt engineering concepts in this article. IBM Consulting Advantage is an AI-powered platform that enables consultants (not only software professionals) to augment their capabilities. It uses IBM’s Granite and many other market-leading models to perform AI tasks. More details about the platform can be found here - https://www.ibm.com/consulting/advantage

LLM Output Configuration

Most LLMs come with various configuration options that control the LLM’s output. Effective prompt engineering requires setting these configurations optimally for the task.

Output Length

An important configuration setting is the number of tokens to generate in a response. Generating more tokens requires more computation from the LLM, leading to higher energy consumption, potentially slower response times, and higher costs. Reducing the output length setting doesn’t make the LLM’s output more succinct or stylistically different; it just causes the LLM to stop predicting tokens once the limit is reached.
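Most APIs and tools expose this as a maximum-token setting. The snippet below uses a hypothetical `client.generate` call and parameter names purely for illustration; real platforms use names such as max_new_tokens or max_output_tokens.

```python
# Hypothetical client and parameter names, shown only to illustrate the setting.
notes = "…meeting notes pasted here…"
response = client.generate(
    prompt="Summarize the meeting notes below in three bullet points.\n\n" + notes,
    max_new_tokens=150,   # generation simply stops once 150 tokens have been produced
    temperature=0.2,
)
print(response.text)
```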

Temperature

Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that expect a more deterministic response, while higher temperatures can lead to more diverse or unexpected results. A temperature of 0 is deterministic: the highest-probability token is always selected (though if two tokens tie for the highest predicted probability, the output may still vary depending on how ties are broken).

Temperatures close to the maximum tend to create more random output. As temperature gets higher and higher, all tokens become equally likely to be the next predicted token.
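This is not how ICA implements it internally, but a minimal, self-contained sketch of the standard softmax-with-temperature sampling idea:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick the index of the next token from raw scores (logits) at a given temperature."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token (deterministic).
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale the scores, then convert them to probabilities with a softmax.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Higher temperature -> flatter distribution -> more random choices.
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy example with three candidate tokens:
print(sample_with_temperature([2.0, 1.0, 0.1], temperature=0))    # always index 0
print(sample_with_temperature([2.0, 1.0, 0.1], temperature=1.5))  # sometimes 1 or 2
```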

Top K and Top P

Top K and top P are two sampling settings used in LLMs to restrict the predicted next token to come from tokens with the top predicted probabilities.

Top K sampling selects the top K most likely tokens from the model’s predicted distribution. The higher the top K, the more creative and varied the model’s output; the lower the top K, the more restrictive the output.

Top P sampling selects the top tokens whose cumulative probability does not exceed a certain value (P). Values for P range from 0 (only the single most likely token) to 1 (all tokens in the LLM’s vocabulary).

If temperature, top K, and top P are all available (as in IBM Consulting Advantage), tokens that meet both the top K and top P criteria are candidates for the next predicted token, and then temperature is applied to sample from the tokens that passed the top K and top P criteria.
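A simplified, self-contained sketch of that filtering step (real implementations differ in details such as how the top P boundary is handled):

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return the candidate tokens that survive both the top K and top P cutoffs.

    `probs` maps each token to its predicted probability. Temperature sampling
    would then pick the next token from the returned candidates.
    """
    # Rank tokens from most to least likely and apply the top K cutoff.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # Apply the top P cutoff: keep adding tokens until the cumulative
    # probability reaches P (at least one token is always kept).
    kept, cumulative = [], 0.0
    for token, p in ranked:
        if kept and cumulative >= top_p:
            break
        kept.append(token)
        cumulative += p
    return kept

probs = {"Paris": 0.55, "London": 0.25, "Rome": 0.12, "Berlin": 0.08}
print(top_k_top_p_filter(probs, top_k=3, top_p=0.8))  # ['Paris', 'London']
```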

Below is an illustration of how temperature, top P, and top K can be set in the ICA tool.


Prompting Techniques

Prompting techniques are specific ways of structuring prompts that take advantage of how LLMs are trained and how they work, which helps in getting relevant results from them. Below are a few prompt engineering techniques.

Zero Shot

Zero shot means no examples: the prompt provides only a description of the task and some text or an image for the LLM to get started with. This input could be a question, a short story, or instructions. An illustrated example using ICA is provided below.
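As a rough sketch (not the exact ICA example), a zero-shot prompt for a simple classification task could be just the task description and the input, with no examples:

```python
# Zero-shot: a task description plus the input to work on, and nothing else.
# In ICA you would paste this text directly; here it is just a Python string.
prompt = """Classify the following movie review as POSITIVE, NEUTRAL or NEGATIVE.

Review: "The plot dragged in places, but the performances were outstanding."
Sentiment:"""
```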

One shot & few shot

It is possible to provide one or a few examples in the prompt, which helps the LLM understand the direction of the required output.

One shot -> provides a single example. The idea is that the model has an example it can imitate to best complete the task.

Few shot -> provides multiple examples to the model. This approach shows the model a pattern that it needs to follow. The number of examples depends on the complexity of the task, the quality of the examples, and the capability of the gen AI model. Three to four examples are a good starting point, but complex tasks may require more.

For this example, I want JSON to be returned as the output, because it might be useful to integrate the LLM output with an external application. I changed the temperature and max new tokens a little for this purpose, and the model came back with the requested JSON as its output.
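As a rough sketch of what such a few-shot, JSON-returning prompt and its response might look like (the order-parsing scenario is illustrative, not the exact example from the ICA screenshots):

```python
# Few-shot: two worked examples establish the pattern, then the real input follows.
prompt = """Parse a customer's pizza order into JSON.

EXAMPLE:
Order: I want a small pizza with cheese and tomato sauce.
JSON: {"size": "small", "ingredients": ["cheese", "tomato sauce"]}

EXAMPLE:
Order: Give me a large pizza with ham and pineapple.
JSON: {"size": "large", "ingredients": ["ham", "pineapple"]}

Order: I'd like a medium pizza with mushrooms and extra cheese.
JSON:"""

# With a low temperature and enough max new tokens, the model would be expected
# to answer with something like:
# {"size": "medium", "ingredients": ["mushrooms", "extra cheese"]}
```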

System, Contextual and Role Prompting

System prompting sets the overall context and purpose for the language model. It defines the ‘what’ factor, i.e., what the model should be doing.

Contextual prompting provides specific details or background information relevant to the current conversation or task. It helps the model to understand the nuances of what’s being asked and prepare the response accordingly.

Role prompting assigns a specific character or identity for the language model to adopt. This helps the model generate responses that are consistent with the assigned role and its associated knowledge and behavior.
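One way to picture how the three layers fit together is a generic chat-style message list (the exact structure depends on the model or tool; this is not ICA’s actual format):

```python
messages = [
    # System prompt: the overall purpose, i.e. what the model should be doing.
    {"role": "system",
     "content": "You are a travel assistant. Answer concisely and factually."},
    # Contextual prompt: background details relevant to this particular task.
    {"role": "user",
     "content": "Context: I am planning a three-day, low-budget trip to Paris in winter."},
    # Role prompt plus the actual request: the persona the model should adopt.
    {"role": "user",
     "content": "Act as a local food guide and suggest affordable places to eat near the Louvre."},
]
```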

System prompting

System prompts can be useful for generating output that meets specific requirements. The name ‘system prompt’ here stands for ‘providing an additional task to the system’. For example, a system prompt can be used to make the model return a certain structure (in this case JSON).
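For instance, a sketch of a system prompt that constrains the output to a JSON structure might look like this (the schema and field names are illustrative, not taken from the article’s screenshot):

```python
system_prompt = """You are a review classifier.
Return ONLY valid JSON that follows this schema:
{"sentiment": "POSITIVE" | "NEUTRAL" | "NEGATIVE", "reason": "<one short sentence>"}
Do not add any text outside the JSON object."""

user_prompt = 'Review: "Delivery was late and the packaging was damaged."'
```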

Role Prompting

Role prompting is a technique that involves assigning a specific role to the gen AI model, which can help the model generate more relevant and informative output. For example, I want the ICA tool to play the role of my local food guide. The prompt could be along the lines below -
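A sketch of what such a role prompt might look like (the wording is illustrative, not the exact prompt from the ICA screenshot):

```python
prompt = """Act as my local food guide.
I am visiting for a weekend with my family and would like vegetarian-friendly
restaurant suggestions within walking distance of the city centre.
Give three options with one short sentence about each."""
```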

It comes back with suggestions written in that persona.

Here are some writing styles that I find effective: Descriptive, Direct, Formal, Humorous, Influential, Informal, Inspirational, Persuasive. Below is the same example with the ‘Humorous’ writing style.

Contextual Prompting

By providing the right contextual prompts, you can ensure that AI interactions are as seamless and efficient as possible. The model can more quickly understand the request and generate more accurate and relevant responses. For example, I want to write an article about ‘Parisian Food’; if that context is set at the beginning, the interactions are much more relevant. The input could be as below.
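A sketch of what such a contextual prompt might look like (illustrative wording, not the exact ICA input):

```python
prompt = """Context: I am writing a blog article about Parisian food for readers
who have never visited France.

Suggest three topics the article should cover, with one sentence per topic
explaining why it would interest first-time visitors."""
```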

As output, the model guides how the article should be written.

These are the major techniques for prompt engineering. In my next blog post, I'll cover chain of thought and a few other considerations for prompt engineering.


Comments

26 days ago

Thanks for this insightful guide on Prompt Engineering! The analogy of 'making LLMs actually listen' really resonates. It highlights that while these models predict, the art of crafting effective prompts is crucial for truly guiding their predictions towards useful and relevant output.
From my own experience working with Gemini, I've found that clarity in defining the desired output format (e.g., asking for a structured analysis or a specific kind of support) significantly enhances the model's 'listening' capabilities. It's fascinating how specific instructions around tone, style, or desired response structure can dramatically shift the outcome. This iterative process you describe is absolutely key.