Teaching an AI Dog New Tricks (The Gemini Integration)

8 November 2024

Gemini API, Express, Supabase, PostgreSQL, TypeScript, AI, Wooster

The Art of Prompt Engineering (Or: How I Learned to Stop Worrying and Trust the AI)

First attempt at prompting Gemini:

const prompt = "Plan a nice trip";

Result: "Have you considered going somewhere? Perhaps doing things when you get there? Maybe eating food?"

Right. Maybe I need to be a bit more specific. Turns out "plan a nice trip" is to Gemini what "fetch" is to a golden retriever - technically understood, but interpreted rather loosely.

Take Two: The First Real Attempt

My first serious attempt at structuring the prompts was... well, let's say Wooster had some creative interpretations:

type BasicTripRequest = {
  destination: string;
  duration: number;
  startDate: string;
};
 
// Assumes a tripRequest of type BasicTripRequest is already in scope
const firstPrompt = `
Plan a ${tripRequest.duration} day trip to ${tripRequest.destination}
starting on ${tripRequest.startDate}.
 
Please include:
 
- Daily activities
- Locations
- Approximate costs
- Duration of activities
`;

Result: "Day 1 you could maybe explore downtown? Stay as long as you're having fun! Costs depend on how many treats you buy along the way. Some people spend a little, some spend a lot. Day 2 there's this AMAZING park the locals love..."

Not quite the structured response I was hoping for. Time to try JSON.

Adding More Structure (But Not Enough)

Here was my first JSON attempt:

const secondPrompt = `
  Generate a JSON itinerary for a ${tripRequest.duration}-day trip to ${tripRequest.destination}.
  Each day should have 2-3 activities.
 
  Format:
  {
    "days": [
      {
        "dayNumber": number,
        "activities": [
          {
            "name": string,
            "location": string,
            "description": string,
            "duration": string,
            "category": string,  // This caused problems later...
            "cost": string
          }
        ]
      }
    ]
  }
`;

This worked better, but the categorization was a mess. I got everything from "Fun stuff" to "Walking around looking at things" as categories. It was like asking Wooster to sort his toys - everything ended up in the "things I can put in my mouth" category.

Take this response for example:

{
  "days": [
    {
      "dayNumber": 1,
      "activities": [
        {
          "name": "Explore the city vibes",
          "location": "wherever the wind takes us",
          "description": "Just soak in the atmosphere, you know?",
          "duration": "as long as we're having fun",
          "category": "Places with good smells",
          "cost": "whatever's in the wallet"
        }
      ]
    }
  ]
}

Time to get serious about types.

The Final Evolution

After much trial and error (and Wooster suggesting "belly rubs" as a valid activity category), I landed on this much more structured approach:

export const createPrompt = (
  days: number,
  location: string,
  startDate: string,
): string => `
  Generate **ONLY** JSON data for a ${days}-day trip to ${location}, starting on ${startDate}.
  Have no more than three activities per day. Exclude arrival and departure logistics.
 
  Duration must be in format "X hours" or "X.5 hours".
  Price must be in format "$X" where X is a number.
 
  Activities MUST include:
  - "activityName": string
  - "description": string (20-50 words)
  - "location": string
  - "price": string
  - "duration": string
  - "difficulty": "Easy" | "Moderate" | "Challenging"
  - "category": "Adventure" | "Cultural" | "Nature" | "Food & Drink" | "Shopping" | "Entertainment"
  - "bestTime": "Early Morning" | "Morning" | "Afternoon" | "Evening" | "Night"
  - "bookingRequired": boolean
`;
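The two format rules in this prompt are easy to check mechanically once a response comes back. Here is a minimal sketch, assuming hypothetical helper names `parseDurationHours` and `parsePriceDollars` (neither is part of the project), and assuming that "$X" may carry optional decimals since the prompt only says X is a number:

```typescript
// Hypothetical helpers: the names are illustrative, not from the project.

// Matches "2 hours" or "2.5 hours", the two shapes the prompt asks for.
// Anything else, including singular "1 hour", is treated as a format violation.
const DURATION_PATTERN = /^(\d+(?:\.5)?) hours$/;

// Matches "$15", "$0" or "$12.50": a dollar sign followed by a plain number.
const PRICE_PATTERN = /^\$(\d+(?:\.\d+)?)$/;

// Returns the duration in hours, or null when the string breaks the format.
export const parseDurationHours = (duration: string): number | null => {
  const match = DURATION_PATTERN.exec(duration);
  return match ? Number(match[1]) : null;
};

// Returns the price in dollars, or null when the string breaks the format.
export const parsePriceDollars = (price: string): number | null => {
  const match = PRICE_PATTERN.exec(price);
  return match ? Number(match[1]) : null;
};
```

Returning `null` rather than throwing keeps the decision about what to do with a bad value (retry, drop the activity, show a fallback) with the caller.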

And later, after realizing I needed geolocation to plot activities on a map:

export const destinationPromptTemplate = (destination: string) => `
  Generate **ONLY** a valid JSON object for ${destination}.
  Include:
  - "latitude": number (decimal degrees)
  - "longitude": number (decimal degrees)
  - "destinationName": string
  - "country": string
  - "description": string (50-200 words)
  // ... other structured fields
`;
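Since these coordinates end up on a map, a range check is a cheap guard against a swapped or invented value. This is a sketch only: `DestinationCoordinates` and `hasValidCoordinates` are hypothetical names, and only the two coordinate fields from the prompt are modelled.

```typescript
// Hypothetical shape: only the coordinate fields named in the prompt are modelled.
interface DestinationCoordinates {
  latitude: number;
  longitude: number;
}

const isFiniteNumber = (value: unknown): value is number =>
  typeof value === "number" && Number.isFinite(value);

// True when the value carries numeric coordinates within valid decimal-degree ranges.
export const hasValidCoordinates = (
  value: unknown,
): value is DestinationCoordinates => {
  if (typeof value !== "object" || value === null) return false;
  const { latitude, longitude } = value as Record<string, unknown>;
  return (
    isFiniteNumber(latitude) &&
    isFiniteNumber(longitude) &&
    Math.abs(latitude) <= 90 &&
    Math.abs(longitude) <= 180
  );
};
```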

Finally, I was getting responses that looked like actual travel plans rather than Wooster's diary entries:

{
  "days": [
    {
      "dayNumber": 1,
      "activities": [
        {
          "activityName": "Morning Market Tour",
          "description": "Explore the historic Camden Market, featuring local artisans and food vendors. Perfect for gathering travel supplies (and snacks).",
          "location": "Camden Lock Place, London",
          "price": "$15",
          "duration": "2.5 hours",
          "difficulty": "Easy",
          "category": "Food & Drink",
          "bestTime": "Morning",
          "bookingRequired": false
        },
        {
          "activityName": "Regent's Park Walk",
          "description": "A scenic stroll through one of London's most beautiful parks. Watch for squirrels (Wooster insisted this was important).",
          "location": "Regent's Park, London",
          "price": "$0",
          "duration": "1.5 hours",
          "difficulty": "Easy",
          "category": "Nature",
          "bestTime": "Afternoon",
          "bookingRequired": false
        }
      ]
    }
  ]
}

Still with a hint of Wooster's personality, but now in a format that wouldn't make TypeScript cry.
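To make that concrete, the fields from the prompt can be mirrored as a TypeScript type with a runtime guard, so a stray "belly rubs" category is caught before it reaches the rest of the app. This is a sketch under assumptions: the names `Activity` and `isActivity` are hypothetical, and the allowed values are copied from the prompt rather than from the project's real types.

```typescript
// The allowed values mirror the unions listed in the prompt.
const DIFFICULTIES = ["Easy", "Moderate", "Challenging"] as const;
const CATEGORIES = [
  "Adventure",
  "Cultural",
  "Nature",
  "Food & Drink",
  "Shopping",
  "Entertainment",
] as const;
const BEST_TIMES = ["Early Morning", "Morning", "Afternoon", "Evening", "Night"] as const;

// Hypothetical interface name; the field names come from the prompt.
export interface Activity {
  activityName: string;
  description: string;
  location: string;
  price: string;
  duration: string;
  difficulty: (typeof DIFFICULTIES)[number];
  category: (typeof CATEGORIES)[number];
  bestTime: (typeof BEST_TIMES)[number];
  bookingRequired: boolean;
}

const isOneOf = (allowed: readonly string[], value: unknown): boolean =>
  typeof value === "string" && allowed.includes(value);

// Runtime check that an unknown value has every field with an allowed value.
export const isActivity = (value: unknown): value is Activity => {
  if (typeof value !== "object" || value === null) return false;
  const a = value as Record<string, unknown>;
  const stringFields = ["activityName", "description", "location", "price", "duration"];
  return (
    stringFields.every((field) => typeof a[field] === "string") &&
    isOneOf(DIFFICULTIES, a.difficulty) &&
    isOneOf(CATEGORIES, a.category) &&
    isOneOf(BEST_TIMES, a.bestTime) &&
    typeof a.bookingRequired === "boolean"
  );
};
```

Deriving the union types from the `as const` arrays keeps the compile-time type and the runtime check in one place, so adding a category means editing a single list.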

Handling the Responses

Of course, getting the prompt right was only half the battle. Gemini's responses weren't always perfectly formatted JSON, so I needed some utility functions to clean things up:

export const cleanLLMJsonResponse = (text: string): string => {
  // Step 1: Unwrap markdown code fences, with or without a "json" tag
  const withoutCodeBlocks = text.replace(
    /```(?:json)?\s*([\s\S]*?)\s*```/g,
    "$1",
  );

  // Step 2: Remove comments. String literals are matched first and kept,
  // so the "//" inside "https://..." is not mistaken for a line comment.
  const withoutComments = withoutCodeBlocks.replace(
    /("(?:\\.|[^"\\\n])*")|\/\*[\s\S]*?\*\/|\/\/.*/g,
    (_match: string, stringLiteral?: string) => stringLiteral ?? "",
  );

  // Step 3: Replace a "website" URL that is cut off before its closing
  // quote with a placeholder; complete URLs are left untouched.
  const withCompleteUrls = withoutComments.replace(
    /"website":\s*"https:[^"\n]*$/gm,
    `"website": "https://example.com"`,
  );

  // Step 4: Trim whitespace
  return withCompleteUrls.trim();
};
 
// Validate the JSON structure
export const validateJSON = (jsonString: string): void => {
  try {
    const parsed = JSON.parse(jsonString);
    if (typeof parsed !== "object" || parsed === null) {
      throw new Error("Response is not a valid JSON object or array");
    }
  } catch (error) {
    console.error(
      "JSON Validation failed. Invalid JSON string:",
      jsonString.slice(0, 500),
    );
    throw new Error(
      `Invalid JSON response: ${error instanceof Error ? error.message : "Unknown error"}`,
    );
  }
};
 
// Strip everything outside printable ASCII
export const cleanJSON = (jsonString: string): string => {
  // Remove control characters (ASCII 0 to 31), including newlines and tabs
  const cleanedString = jsonString.replace(/[\x00-\x1F]/g, "");

  // Remove anything else outside printable ASCII (0x20 to 0x7E).
  // This drops emoji, but also accented letters such as "é".
  return cleanedString.replace(/[^\x20-\x7E]/g, "");
};

Why did I need all this? Well, Gemini had some... interesting habits:

Sometimes wrapping responses in markdown code blocks
Occasionally including helpful comments (which broke the JSON)
Returning incomplete URLs
Adding mysterious control characters
And my personal favorite: sneaking in emoji (not strictly non-printable, but outside plain ASCII, so they had to go)
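One side effect worth knowing: the ASCII-only filter in `cleanJSON` also removes accented letters, so "Café de Flore" would come back as "Caf de Flore". A narrower filter is possible. The sketch below is an assumption-laden alternative, not the project's code: `stripControlAndEmoji` is a hypothetical name, and the emoji pattern is a rough approximation rather than the full Unicode emoji set.

```typescript
// Hypothetical alternative to an ASCII-only filter; the name is illustrative.

// ASCII control characters. Raw ones below 0x20 are invalid inside JSON strings.
const CONTROL_CHARS = /[\x00-\x1F\x7F]/g;

// Rough emoji match (an approximation, not the full Unicode emoji set):
// surrogate pairs covering U+1F000..U+1FBFF, the U+2600..U+27BF symbol blocks,
// plus the zero-width joiner and variation selector that glue emoji together.
const EMOJI = /[\uD83C-\uD83E][\uDC00-\uDFFF]|[\u2600-\u27BF\u200D\uFE0F]/g;

// Drops control characters and common emoji, but keeps other non-ASCII text.
export const stripControlAndEmoji = (jsonString: string): string =>
  jsonString.replace(CONTROL_CHARS, "").replace(EMOJI, "");
```

Like the original, this removes newlines between JSON tokens, which is harmless for parsing.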

Using these utilities together:

async function generateTripPlan(tripRequest: TripRequest) {
  try {
    const result = await model.generateContent(createPrompt(/* ... */));
    const text = result.response.text();
 
    // Clean up the response
    const cleanedResponse = cleanLLMJsonResponse(text);
    const sanitizedJSON = cleanJSON(cleanedResponse);
 
    // Validate before parsing
    validateJSON(sanitizedJSON);
 
    return JSON.parse(sanitizedJSON);
  } catch (error) {
    console.error("Failed to generate valid trip plan:", error);
    throw new Error("Failed to generate trip plan");
  }
}
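The function above parses the string twice: once inside `validateJSON` and once for the return value. A possible variation parses once and reports failure as a value instead of an exception. This is a sketch only; `ParseResult` and `safeParseJsonObject` are hypothetical names, not part of the utilities shown earlier.

```typescript
// Hypothetical result type; not part of the original utilities.
type ParseResult =
  | { ok: true; value: object }
  | { ok: false; error: string };

// Parses once and returns either the parsed object/array or an error message.
export const safeParseJsonObject = (jsonString: string): ParseResult => {
  try {
    const parsed: unknown = JSON.parse(jsonString);
    if (typeof parsed !== "object" || parsed === null) {
      return { ok: false, error: "Response is not a JSON object or array" };
    }
    return { ok: true, value: parsed };
  } catch (error) {
    return {
      ok: false,
      error: error instanceof Error ? error.message : "Unknown error",
    };
  }
};
```

The discriminated union forces callers to check `ok` before touching `value`, which suits a retry loop better than nested try/catch blocks.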

What I Actually Learned

Start with strict types from the beginning
Be explicit about formats
The more specific the prompt constraints, the better
Always sanitize and validate AI responses
Geolocation data should have been there from the start
Rate limiting is important
Regex is your best friend when dealing with AI responses
Never trust an AI to format URLs correctly
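On the rate-limiting point, here is a minimal sketch of a fixed-window limiter. Everything in it is an assumption for illustration: the name `createRateLimiter`, the choice of a fixed window, and the injectable clock. The real quota depends on the Gemini plan in use.

```typescript
// Hypothetical helper: a fixed-window limiter with an injectable clock,
// so the behaviour can be tested without waiting for real time to pass.
export const createRateLimiter = (
  maxCalls: number,
  windowMs: number,
  now: () => number = Date.now,
) => {
  let windowStart = now();
  let callsInWindow = 0;

  // Returns true when the caller may proceed, false when the window is full.
  return (): boolean => {
    const current = now();
    if (current - windowStart >= windowMs) {
      // A new window has started, so reset the counter.
      windowStart = current;
      callsInWindow = 0;
    }
    if (callsInWindow >= maxCalls) return false;
    callsInWindow += 1;
    return true;
  };
};
```

A caller would check the returned function before each Gemini request and either wait or reject when it returns `false`.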

Next up: the frontend implementation, or as I like to call it, "Making Wooster Look Presentable for Company" (and this time with proper TypeScript interfaces from day one).

Next up: wrestling with Express endpoints, where I discover that parsing AI responses is like teaching Wooster that "sit" and "lay down" are different commands.