1. Install Vertex AI SDK and other required packages
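If the SDK isn't installed in your environment yet, the install cell is typically a pip command for the Vertex AI SDK for Python:

%pip install --upgrade --user google-cloud-aiplatform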
2. Restart runtime

import IPython

# Shut down the kernel so the newly installed packages are picked up on restart.
app = IPython.Application.instance()
app.kernel.do_shutdown(True)
if "google.colab" in sys.modules:
from google.colab import auth
auth.authenticate_user()
4. Set Google Cloud project information and initialize Vertex AI SDK
To get started using Vertex AI, you must have an existing Google Cloud project and enable the Vertex AI API.
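If the Vertex AI API isn't enabled on your project yet, you can enable it in the Cloud console or with the gcloud CLI, for example:

!gcloud services enable aiplatform.googleapis.com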
LOCATION = "lokasi server google" # @param {type:"string"}
import vertexai
vertexai.init(project=PROJECT_ID, location=LOCATION)
5. Load model

import time

from vertexai.generative_models import GenerationConfig, GenerativeModel

# Load the Gemini model used throughout this notebook.
model = GenerativeModel("gemini-1.5-flash")

def call_gemini(prompt, generation_config=GenerationConfig(temperature=1.0)):
    """Generate a response, retrying with exponential backoff on transient errors."""
    wait_time = 1
    while True:
        try:
            response = model.generate_content(prompt, generation_config=generation_config).text
            return response  # Exit the loop if successful
        except Exception:  # Retry on transient API errors such as rate limiting
            time.sleep(wait_time)
            wait_time *= 2  # Double the wait time
def send_message_gemini(model, prompt):
    """Send a chat message, retrying with exponential backoff on transient errors."""
    wait_time = 1
    while True:
        try:
            response = model.send_message(prompt).text
            return response  # Exit the loop if successful
        except Exception:  # Retry on transient API errors such as rate limiting
            time.sleep(wait_time)
            wait_time *= 2  # Double the wait time
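As a quick smoke test of the two helpers (the prompts below are just illustrative):

print(call_gemini("In one sentence, what is exponential backoff?"))

chat = model.start_chat()
print(send_message_gemini(chat, "Why is exponential backoff useful when calling rate-limited APIs?"))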
6. Prompt engineering best practices
Prompt engineering is all about designing your prompts so that the response is what you actually want to see.
The idea of using "unfancy" prompts is to minimize the noise in your prompt to reduce the possibility of the LLM misinterpreting the intent of the prompt. Below are a few guidelines on how to engineer "unfancy" prompts.
In this section, you'll cover the following best practices when engineering prompts:
- Be concise
- Be specific, and well-defined
- Ask one task at a time
- Improve response quality by including examples
- Turn generative tasks into classification tasks to improve safety
- Be concise

# Illustrative prompt: a short, direct request.
prompt = "Suggest a name for a flower shop that sells bouquets of dried flowers."
print(call_gemini(prompt))
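For contrast, a wordier version of the same request (the wording below is illustrative) adds filler that only gives the model more room to misread the intent:

prompt = "Hi! I was wondering, what do you think could maybe be a good name for a flower shop that specializes in selling bouquets of dried flowers rather than fresh flowers? Thank you so much!"
print(call_gemini(prompt))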
- Be specific, and well-defined

# Illustrative prompt: specific and well-defined rather than open-ended.
prompt = "Generate a list of ways that make Earth unique compared to other planets."
print(call_gemini(prompt))

- Ask one task at a time

# Illustrative prompts: one task per prompt.
prompt = "What's the best method of boiling water?"
print(call_gemini(prompt))

prompt = "Why is the sky blue?"
print(call_gemini(prompt))
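For contrast, packing both questions into a single prompt (again, illustrative wording) tends to produce a longer, less focused answer:

prompt = "What's the best method of boiling water and why is the sky blue?"
print(call_gemini(prompt))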
7. Watch out for hallucinations
Although LLMs have been trained on a large amount of data, they can generate text containing statements not grounded in truth or reality; these responses are often referred to as "hallucinations", a consequence of the models' limited memorization capabilities. Note that simply prompting the LLM to provide a citation isn't a fix for this problem, as there are instances of LLMs providing false or inaccurate citations. Dealing with hallucinations is a fundamental challenge of LLMs and an ongoing research area, so it is important to be aware that LLMs may give you confident, correct-sounding statements that are in fact incorrect.
Note that if you intend to use LLMs for creative use cases, hallucination can actually be quite useful.
prompt = "What day is it today?"
generation_config = GenerationConfig(temperature=1.0)
print(call_gemini(prompt, generation_config))
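One simple mitigation is to ground the prompt with the facts the model needs, for example by passing today's date in explicitly (a sketch):

import datetime

today = datetime.date.today().strftime("%A, %B %d, %Y")
prompt = f"Today is {today}. What day is it today?"
print(call_gemini(prompt, generation_config))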
8. Using system instructions to guardrail the model from irrelevant responses
How can we attempt to reduce the chances of irrelevant responses and hallucinations?
One way is to provide the LLM with system instructions.
Let's see how system instructions work and how you can use them to reduce hallucinations and off-topic responses for a travel chatbot.
Suppose we ask a simple question about one of Italy's most famous tourist spots.
model_name="gemini-1.5-flash",
system_instruction=[
"Hello! You are an AI chatbot for a travel web site.",
"Your mission is to provide helpful queries for travelers.",
"Remember that before you answer a question, you must check to see if it complies with your mission.",
"If not, you can say, Sorry I can't answer that question.",
],
)
chat = model_travel.start_chat()
prompt = "What is the best place for sightseeing in Milan, Italy?"
print(send_message_gemini(chat, prompt))
# Illustrative follow-up: an off-topic question the chatbot should decline.
prompt = "What's for dinner?"
print(send_message_gemini(chat, prompt))
9. Generative tasks lead to higher output variability
prompt = "I'm a high school student. Recommend me a programming activity to improve my skills."
print(call_gemini(prompt))
10. Classification tasks reduce output variability
prompt = """I'm a high school student. Which of these activities do you suggest and why:
a) learn Python
b) learn JavaScript
c) learn Fortran
"""
print(call_gemini(prompt))
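To see the difference in variability for yourself, you can re-run each prompt a few times at the same temperature and compare the answers (a quick sketch reusing the prompts above):

def sample_responses(prompt, n=3, temperature=1.0):
    # Call the model several times with the same settings to compare output variability.
    return [call_gemini(prompt, GenerationConfig(temperature=temperature)) for _ in range(n)]

for response in sample_responses(prompt):
    print(response)
    print("---")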
11. Improve response quality by including examples
Another way to improve response quality is to add examples in your prompt. The LLM learns in-context from the examples on how to respond. Typically, one to five examples (shots) are enough to improve the quality of responses. Including too many examples can cause the model to over-fit the data and reduce the quality of responses.
Similar to classical model training, the quality and distribution of the examples is very important. Pick examples that are representative of the scenarios that you need the model to learn, and keep the distribution of the examples (e.g. number of examples per class in the case of classification) aligned with your actual distribution.
- Zero-shot prompt
Below is an example of zero-shot prompting, where you don't provide any examples to the LLM within the prompt itself.
prompt = """Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I loved the new YouTube video you made!
Sentiment:
"""
print(call_gemini(prompt))
- One-shot prompt
Below is an example of one-shot prompting, where you provide one example to the LLM within the prompt to give some guidance on what type of response you want.
prompt = """Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I loved the new YouTube video you made!
Sentiment: positive

Tweet: That was awful. Super boring 😞
Sentiment:
"""
print(call_gemini(prompt))
- Few-shot prompt

Below is an example of few-shot prompting, where you provide a few examples to the LLM within the prompt to give it additional guidance on what type of response you want.
prompt = """Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I loved the new YouTube video you made!
Sentiment: positive

Tweet: That was awful. Super boring 😞
Sentiment: negative

Tweet: Something surprised me about this video - it was actually original. It was not the same old recycled stuff that I always see. Watch it - you will not regret it.
Sentiment:
"""
print(call_gemini(prompt))
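Following the earlier advice about keeping the example distribution representative, one option is to assemble the few-shot prompt from a small labeled list rather than hard-coding it (a sketch; build_few_shot_prompt is a hypothetical helper):

examples = [
    ("I loved the new YouTube video you made!", "positive"),
    ("That was awful. Super boring 😞", "negative"),
]

def build_few_shot_prompt(labeled_examples, new_tweet):
    # Interleave labeled examples, then leave the final Sentiment blank for the model to fill in.
    blocks = [f"Tweet: {tweet}\nSentiment: {label}" for tweet, label in labeled_examples]
    blocks.append(f"Tweet: {new_tweet}\nSentiment:")
    return "\n\n".join(blocks)

print(call_gemini(build_few_shot_prompt(examples, "Watch it - you will not regret it.")))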