Building a Real-Time AI Chatbot w/ Model Training using HarperDB, Pinecone, Next.js, LangChain & Vercel

In this post, I talk about how I built an AI Chatbot with Next.js, which updates it’s model on the cloud behind the scenes using LangChain, and serves the latest response(s) using Pinecone memory. HarperDB helped me persist OpenAI responses in a NoSQL database to cut down on costs of OpenAI, and enforce rate limiting right at the edge with Next.js Middleware on Vercel.

Tech Stack

Next.js (Front-end and Back-end)
LangChain (framework for developing applications powered by language models)
Pinecone (for persisting trained indexes on Cloud)
HarperDB (Caching OpenAI Responses & Rate Limiting)
Tailwind CSS (Styling)
Vercel (Deployment)

Prerequisites

A HarperDB account (for setting up NoSQL database)
An OpenAI account (for OpenAI API Key)
A Pinecone account (for persisting/saving trained indexes)
A Vercel account (for deploying your website)

Setting up the project

To set up, just clone the app repo and follow this tutorial to learn everything that's in it. To fork the project, run:

git clone https://github.com/rishi-raj-jain/pinecone-langchain-harperdb-chatbot
cd pinecone-langchain-harperdb-chatbot
npm install

Once you have cloned the repo, you are going to create a .env file. You are going to add the values we obtain from the steps below.

Setting up HarperDB

Let’s start by creating our database instance. Sign in to Harper Studio and click on Create New HarperDB Cloud Instance
‍

‍

Fill in the database instance information, like here, we’ve added chatbot as the instance name with a username and password.

‍

Go with the default instance setup for RAM and Storage Size while optimizing the Instance Region to be as close to your serverless functions region in Vercel.

After some time, you’d see the instance (here, chatbot) ready to have databases and it’s tables. The dashboard would look something like as below:

‍

Let’s start by creating a database (here, cache_and_ratelimit) inside which we’ll spin our storage table, make sure to click the check icon to successfully spin up the database.

Let’s start by creating a table (here, all) with a hashing key (here, hash) which will be the named primary key of the table. Make sure to click the check icon to successfully spin up the table.

Once done,

Open lib/harper.js and update the database and table values per the names given above
Click on config at the top right corner in the dashboard, and:

Copy the Instance URL and save it as HARPER_DB_URL in your .env file
Copy the Instance API Auth Header and save it as HARPER_AUTH_TOKEN in your .env file

Awesome, you’re good to go. This is how the data looks for a record of rate limiting and a cached response.

Setting up Pinecone

Let’s start by creating our index instance. Sign in to Pinecone and during onboarding select Chatbot Application as the use case.

Let’s proceed with Creating an Index by clicking on Create Index:

‍

Once done, give it a name (here, chatbot) and update the PINECONE_INDEX variable in .env file. Also, copy the environment name (here, gcp-starter) and update the PINECONE_ENVIRONMENT variable in .env file.

The final step is to head to API Keys in the Pinecone dashboard, copy the value and update the PINECONE_API_KEY variable in .env file.

Nice, the whole setup is ready. Let’s dive into the code!

Configuring NoSQL CRUD helpers for HarperDB for Vercel Edge and Middleware Compatibility

To interact with the HarperDB database, we’ll use NoSQL HarperDB REST APIs called over fetch. This approach will help us opt out of any specific runtime requirements, and keep things simple and ready to deploy to Vercel Edge and Middleware.
‍

In the code below, we’ve defined the CRUD helpers, namely insert, update, deleteRecords and searchByValue for respective actions.

// Define a fetch function that takes care of common setup
// Such as the DB_URL and passing header token
const harperFetch = (body) =>
  fetch(process.env.HARPER_DB_URL, {
    method: 'POST',
    body: JSON.stringify({
      ...body,
      database: 'cache_and_ratelimit',
      table: 'all'
    }),
    headers: {
      'Content-Type': 'application/json',
      Authorization: 'Basic ' + process.env.HARPER_AUTH_TOKEN,
    },
})

// To insert records to the table, we supply records to be added as an array
// https://docs.harperdb.io/docs/developers/operations-api/nosql-operations#insert
export const insert = async (records = []) => {
  const t = await harperFetch({
    records,
    operation: 'insert',
  })
  if (!t.ok) return {}
  return await t.json()
}

// To update records in the table, we supply records
// to be updated as an array with the hash value
// https://docs.harperdb.io/docs/developers/operations-api/nosql-operations#update
export const update = async (records = []) => {
  await harperFetch({
    records,
    operation: 'update',
}) }
// To delete records from the table,
// we supply hash ids to be deleted
// https://docs.harperdb.io/docs/developers/operations-api/nosql-operations#delete
export const deleteRecords = async (ids = []) => {
  await harperFetch({
    ids,
    operation: 'delete',
  })
}

// To search records, we supply
// a given value for a given attribute
// https://docs.harperdb.io/docs/developers/operations-api/nosql-operations#search-by-value
export const searchByValue = async (search_value, search_attribute = 'id', get_attributes = ['*']) => {
  const t = await harperFetch({
    search_value,
    get_attributes,
    search_attribute,
    operation: 'search_by_value',
  })
  if (!t.ok) return []
  return await t.json()
}

Rate Limiting Requests with HarperDB and Next.js Middleware

To ensure reliability, and as less spam as possible, we’ve implemented Rate Limiting with HarperDB at Next.js Middleware. We obtain the x-forwarded-for header in the request which is the user’s IP Address, and use that as the unique key to rate limit users with.
‍

If the rate limit exceeds, we return Rate Limit Exceeded response directly from the middleware, saving our time on running the edge function for chat API.
‍

// File: middleware.js

import rateLimit from './lib/ratelimit'
import { NextResponse } from 'next/server'

export async function middleware(request) {
  // If method is not POST, return with `Forbidden Access`
  if (request.method !== 'POST') return new NextResponse('Bad Request.', { status: 400 })
  
   // Clone the request headers and read the ip
  const requestHeaders = new Headers(request.headers)
  const ip = requestHeaders.get('x-forwarded-for')
  if (ip) {
    // Check the Rate Limit
    const success = await rateLimit(ip.split(',')[0])
    // If within rate limit, send to the function logic
    if (success) return NextResponse.next()
    // If exceeded, return with a 401
    else return new NextResponse('Rate Limit Exceeded.', { status: 401 })
}
  return NextResponse.next()
}

export const config = {
  matcher: ['/api/chat', '/api/model'],
}

The logical flow of the rateLimit function is as follows:

It searches for records by the IP Address value in HarperDB table
If no record is found, the user is not rate limited, and a record with number of uses as 1 is set in HarperDB table

If a record is found:

The difference between the last use time to the time now is calculated and if it exceeds the time span permitted, it updates the record with number of uses as 1 in the HarperDB table
The number of uses derived from the record if found less the maximum number of uses allowed, it increments the uses in the HarperDB table with the latest timestamp
Else, the request is Rate Limited!


import { searchByValue, update, insert } from './harper'

const rateLimitConfig = { maxUpdates: 10, timeSpan: 86400 }

const rateLimit = async (key) => {
  // Check the Rate Limit
  const t = await searchByValue(key, 'id', ['count_updates', 'last_update', 'hash'])
  if (t && t[0]) {
    // Get rate limit data
    const { count_updates, last_update, hash } = t[0]
    // Calculate time difference in seconds
    const currentTime = new Date().getTime()
    const timeDifference = Math.floor((currentTime - new Date(last_update).getTime()) / 1000)
    // If time window has passed, reset count
    if (timeDifference >= rateLimitConfig.timeSpan) {
      await update([{ id: key, count_updates: 1, last_update: currentTime, hash }])
return true }
    // Check if the request count is below the limit
    if (Number(count_updates) < rateLimitConfig.maxUpdates) {
      // Assuming the limit is 5 requests in 24 hours
      await update([{ id: key, count_updates: Number(count_updates) + 1, last_update: new Date().getTime(), hash }])
      return true
    } else {
      return false
}
} else {
    await insert([{ id: key, count_updates: 1, last_update: new Date().getTime() }])
return true }
}

export default rateLimit

Retrieve Persisted Vector Index from Pinecone and Caching Personalized Responses from OpenAI with HarperDB

In this section, we explore how the vector store is retrieved from Pinecone, and OpenAI API is used to serve responses while caching them with HarperDB.

Retrieval of Vector Store from Pinecone

To load the vector store from Pinecone, in each chat API request, we create a new instance of Pinecone database class, and wait for the Vector Store class instance to be derived for our existing Pinecone Index (here, chatbot)


// File: app/api/chat/route.js

// Reference Function to loadVectorStore
import { loadVectorStore } from '@/lib/vectorStore'
// Load the trained model
const vectorStore = await loadVectorStore()

// ...

// Vectore Store Function
// File: lib/vectorStore.js

import { Pinecone } from '@pinecone-database/pinecone'
import { OpenAIEmbeddings } from 'langchain/embeddings/openai'
import { PineconeStore } from 'langchain/vectorstores/pinecone'

export async function loadVectorStore() {
  const pinecone = new Pinecone()
  const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX)
  return await PineconeStore.fromExistingIndex(new OpenAIEmbeddings(), { pineconeIndex })
}

Lazily Streaming Responses from OpenAI API

To make sure that we’re not calling OpenAI APIs for the same set of questions repeatedly, we maintain the flow for obtaining responses as follows:

If a record pertaining to question asked from the chatbot is found by searching with the id value in HarperDB, we return answer key’s value from our existing record stored
In case no existing record for the question is found, using Vercel Streaming we send each chunk of response from OpenAI API as soon as possible, while when the response is completely sent, we insert a record to cache the response in our HarperDB table. Notice that we use the isChat attribute so that we can clean this record after the model is updated in the model training POST request.

// File: app/api/chat/route.js

import { insert, searchByValue } from '@/lib/harper'
import { ChatOpenAI } from 'langchain/chat_models/openai'

// Load the trained model
// ...

// LookUp for response in HarperDB
const cachedResponse = await searchByValue(input, 'id', ['answer'])

// If cached response found, return as is
if (cachedResponse[0] && cachedResponse[0]['answer']) return new Response(cachedResponse[0]['answer'])

const model = new ChatOpenAI({
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token) {
        controller.enqueue(encoder.encode(token))
      },
      async handleLLMEnd(output) {
        // Once the response is sent, cache it in HarperDB
        await insert([{ id: input, answer: output.generations[0][0].text, isChat: true }])
        controller.close()
}, },
], })

// Create a LLM QA Chain and respond
// ...

Training Content with LangChain and Persisting Vector Index in Pinecone for retrieval during ChatBot Conversations

With Pinecone, we’re able to save the latest indexed vector store into the cloud. This allows us to send out responses to the user that are based on the latest and relevant knowledge of the model. Let’s dive into how one can train their model on set of URLs passed in the POST request to /api/model.

In the code (for app/api/model/route.js), we’re ensuring:

The functions runs on Vercel Edge, made possible with export const runtime = 'edge'
The response is always dynamic, made possible with export const dynamic = 'force-dynamic'
Waits for the train function to finish, which is invoked with the URLs list that came in with the request. Inside the train function (file lib/train.js in the project), it takes care of fetching each URL’s content, break it into LangChain compatible documents, and update the Pinecone Index with the generated documents.
As soon as training is done, it clears out the cached conversation responses and queries in HarperDB. This is made possible by searching all records with the value of isChat key as true, and deleting all the records by passing on the primary key (here, hash) from HarperDB. This approach allows us to cache the new responses that will be generated based on the updated knowledge of the model.

// File: app/api/model/route.js

export const runtime = 'edge'

export const dynamic = 'force-dynamic'

import train from '@/lib/train'
import { deleteRecords, searchByValue } from '@/lib/harper'

export async function POST(req) {
  // If `urls` is not in body, return with `Bad Request`
  const { urls } = await req.json()
  if (!urls) return new Response('Bad Request.', { status: 400 })
  // Train on the particular URLs
  await train(urls)
  // Get all the cached responses ID
  const t = await searchByValue(true, 'isChat', ['hash'])
  // Once trained, delete all the cached responses
  await deleteRecords(t.map((i) => i.hash))
  return new Response(null, { status: 200 })
}

‍

By now, you’ve learnt how to cache the responses from OpenAI API and rate limit users using HarperDB. We’ve also learnt how to train the model to have the latest knowledge and save the updated vector store using Pinecone.

Deploy to Vercel

The repository is ready to deploy to Vercel. Follow the steps below to deploy seamlessly with Vercel 👇🏻

Create a GitHub Repository with the app code
Create a New Project in Vercel Dashboard
Link the created GitHub Repository as your new project
Scroll down and update the Environment Variables from the .env locally
Deploy! 🚀

‍

References

GitHub Repo: https://github.com/rishi-raj-jain/pinecone-langchain-harperdb-chatbot

‍LangChain Docs: https://js.langchain.com/

‍HarperDB NoSQL Operations: https://docs.harperdb.io/docs/developers/operations-api/nosql-operations

‍Pinecone Vector Index: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/pinecone

Dev Center

Our Story