1. Install Vertex AI SDK and other required packages
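If the SDK isn't installed in your environment yet, the install cell is typically a pip command for the Vertex AI SDK for Python:

%pip install --upgrade --user google-cloud-aiplatform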
2. Restart runtime

import IPython

# Shut down the kernel so the newly installed packages are picked up on restart.
app = IPython.Application.instance()
app.kernel.do_shutdown(True)
if "google.colab" in sys.modules:
from google.colab import auth
auth.authenticate_user()
4. Set Google Cloud project information and initialize Vertex AI SDK
To get started using Vertex AI, you must have an existing Google Cloud project and enable the Vertex AI API.
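If the Vertex AI API isn't enabled on your project yet, you can enable it in the Cloud console or with the gcloud CLI, for example:

!gcloud services enable aiplatform.googleapis.com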
LOCATION = "lokasi server google" # @param {type:"string"}
import vertexai
vertexai.init(project=PROJECT_ID, location=LOCATION)
5. Load model

import time

from vertexai.generative_models import GenerationConfig, GenerativeModel

# Load the Gemini model used throughout this notebook.
model = GenerativeModel("gemini-1.5-flash")

def call_gemini(prompt, generation_config=GenerationConfig(temperature=1.0)):
    """Generate a response, retrying with exponential backoff on transient errors."""
    wait_time = 1
    while True:
        try:
            response = model.generate_content(prompt, generation_config=generation_config).text
            return response  # Exit the loop if successful
        except Exception:  # Retry on transient API errors such as rate limiting
            time.sleep(wait_time)
            wait_time *= 2  # Double the wait time
def send_message_gemini(model, prompt):
    """Send a chat message, retrying with exponential backoff on transient errors."""
    wait_time = 1
    while True:
        try:
            response = model.send_message(prompt).text
            return response  # Exit the loop if successful
        except Exception:  # Retry on transient API errors such as rate limiting
            time.sleep(wait_time)
            wait_time *= 2  # Double the wait time
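As a quick smoke test of the two helpers (the prompts below are just illustrative):

print(call_gemini("In one sentence, what is exponential backoff?"))

chat = model.start_chat()
print(send_message_gemini(chat, "Why is exponential backoff useful when calling rate-limited APIs?"))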
6. Prompt engineering best practices
Prompt engineering is all about designing your prompts so that the response is what you actually want to see.
The idea of using "unfancy" prompts is to minimize the noise in your prompt to reduce the possibility of the LLM misinterpreting the intent of the prompt. Below are a few guidelines on how to engineer "unfancy" prompts.
In this section, you'll cover the following best practices when engineering prompts:
- Be concise
- Be specific, and well-defined
- Ask one task at a time
- Improve response quality by including examples
- Turn generative tasks into classification tasks to improve safety
- Be concise

# Illustrative prompt: a short, direct request.
prompt = "Suggest a name for a flower shop that sells bouquets of dried flowers."
print(call_gemini(prompt))
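For contrast, a wordier version of the same request (the wording below is illustrative) adds filler that only gives the model more room to misread the intent:

prompt = "Hi! I was wondering, what do you think could maybe be a good name for a flower shop that specializes in selling bouquets of dried flowers rather than fresh flowers? Thank you so much!"
print(call_gemini(prompt))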
- Be specific, and well-defined

# Illustrative prompt: specific and well-defined rather than open-ended.
prompt = "Generate a list of ways that make Earth unique compared to other planets."
print(call_gemini(prompt))

- Ask one task at a time

# Illustrative prompts: one task per prompt.
prompt = "What's the best method of boiling water?"
print(call_gemini(prompt))

prompt = "Why is the sky blue?"
print(call_gemini(prompt))
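For contrast, packing both questions into a single prompt (again, illustrative wording) tends to produce a longer, less focused answer:

prompt = "What's the best method of boiling water and why is the sky blue?"
print(call_gemini(prompt))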
7. Watch out for hallucinations
Although LLMs have been trained on a large amount of data, they can generate text containing statements not grounded in truth or reality; these responses are often referred to as "hallucinations", a consequence of the models' limited memorization capabilities. Note that simply prompting the LLM to provide a citation isn't a fix for this problem, as there are instances of LLMs providing false or inaccurate citations. Dealing with hallucinations is a fundamental challenge of LLMs and an ongoing research area, so it is important to be aware that LLMs may give you confident, correct-sounding statements that are in fact incorrect.
Note that if you intend to use LLMs for creative use cases, hallucination can actually be quite useful.
prompt = "What day is it today?"
generation_config = GenerationConfig(temperature=1.0)
print(call_gemini(prompt, generation_config))
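One simple mitigation is to ground the prompt with the facts the model needs, for example by passing today's date in explicitly (a sketch):

import datetime

today = datetime.date.today().strftime("%A, %B %d, %Y")
prompt = f"Today is {today}. What day is it today?"
print(call_gemini(prompt, generation_config))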
8. Using system instructions to guardrail the model from irrelevant responses
How can we attempt to reduce the chances of irrelevant responses and hallucinations?
One way is to provide the LLM with system instructions.
Let's see how system instructions work and how you can use them to reduce hallucinations and off-topic responses for a travel chatbot.
Suppose we ask a simple question about one of Italy's most famous tourist spots.
model_name="gemini-1.5-flash",
system_instruction=[
"Hello! You are an AI chatbot for a travel web site.",
"Your mission is to provide helpful queries for travelers.",
"Remember that before you answer a question, you must check to see if it complies with your mission.",
"If not, you can say, Sorry I can't answer that question.",
],
)
chat = model_travel.start_chat()
prompt = "What is the best place for sightseeing in Milan, Italy?"
print(send_message_gemini(chat, prompt))
# Illustrative follow-up: an off-topic question the chatbot should decline.
prompt = "What's for dinner?"
print(send_message_gemini(chat, prompt))
9. Generative tasks lead to higher output variability
prompt = "I'm a high school student. Recommend me a programming activity to improve my skills."
print(call_gemini(prompt))
10. Classification tasks reduce output variability
prompt = """I'm a high school student. Which of these activities do you suggest and why:
a) learn Python
b) learn JavaScript
c) learn Fortran
"""
print(call_gemini(prompt))
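To see the difference in variability for yourself, you can re-run each prompt a few times at the same temperature and compare the answers (a quick sketch reusing the prompts above):

def sample_responses(prompt, n=3, temperature=1.0):
    # Call the model several times with the same settings to compare output variability.
    return [call_gemini(prompt, GenerationConfig(temperature=temperature)) for _ in range(n)]

for response in sample_responses(prompt):
    print(response)
    print("---")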
11. Improve response quality by including examples
Another way to improve response quality is to add examples in your prompt. The LLM learns in-context from the examples on how to respond. Typically, one to five examples (shots) are enough to improve the quality of responses. Including too many examples can cause the model to over-fit the data and reduce the quality of responses.
Similar to classical model training, the quality and distribution of the examples is very important. Pick examples that are representative of the scenarios that you need the model to learn, and keep the distribution of the examples (e.g. number of examples per class in the case of classification) aligned with your actual distribution.
- Zero-shot prompt
Below is an example of zero-shot prompting, where you don't provide any examples to the LLM within the prompt itself.
prompt = """Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I loved the new YouTube video you made!
Sentiment:
"""
print(call_gemini(prompt))
- One-shot prompt
Below is an example of one-shot prompting, where you provide one example to the LLM within the prompt to give some guidance on what type of response you want.
prompt = """Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I loved the new YouTube video you made!
Sentiment: positive

Tweet: That was awful. Super boring 😞
Sentiment:
"""
print(call_gemini(prompt))
- Few-shot prompt

Below is an example of few-shot prompting, where you provide a few examples to the LLM within the prompt to give it additional guidance on what type of response you want.
prompt = """Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I loved the new YouTube video you made!
Sentiment: positive

Tweet: That was awful. Super boring 😞
Sentiment: negative

Tweet: Something surprised me about this video - it was actually original. It was not the same old recycled stuff that I always see. Watch it - you will not regret it.
Sentiment:
"""
print(call_gemini(prompt))
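Following the earlier advice about keeping the example distribution representative, one option is to assemble the few-shot prompt from a small labeled list rather than hard-coding it (a sketch; build_few_shot_prompt is a hypothetical helper):

examples = [
    ("I loved the new YouTube video you made!", "positive"),
    ("That was awful. Super boring 😞", "negative"),
]

def build_few_shot_prompt(labeled_examples, new_tweet):
    # Interleave labeled examples, then leave the final Sentiment blank for the model to fill in.
    blocks = [f"Tweet: {tweet}\nSentiment: {label}" for tweet, label in labeled_examples]
    blocks.append(f"Tweet: {new_tweet}\nSentiment:")
    return "\n\n".join(blocks)

print(call_gemini(build_few_shot_prompt(examples, "Watch it - you will not regret it.")))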