How To Build A Knowledge Base AI Chat App
If you use a service such as Notion for your knowledge base, you may get built-in AI chat features that let you search and ask questions about your content. However, those features are locked to Notion (or whichever app you’re using). If you want full control, or you’re a developer who wants to build an AI-based knowledge base product, let’s dive into how you can do that with Pinata.
We’re going to use public data for our chat app, and what better source than Pinata’s own documentation? Fortunately, the documentation is hosted in a public GitHub repository, with each page in the docs represented by a markdown file.
Getting Started
To start, you’ll need a free Pinata account. We’ll be making use of the Pinata Private Files API, which on the free plan is limited to 1GB of storage and 500 files. We’ll stay well under those limits, but if you build the next great knowledge base product using Pinata, you can easily upgrade to the Picnic Plan for $20 per month.
You will also need Node.js installed on your machine and a code editor.
We will be making use of a new feature from Pinata for this project: Pinata’s Vector Storage, an innovative take on vector storage and search. In most vector storage solutions, the file being vectorized is treated as a second-class citizen, even when the file is critical for AI prompting. With Pinata Vector Storage, we start with the file.
But I’m getting ahead of myself. What are vectors, and what is vector storage and retrieval?
At their core, vector embeddings are a way to represent complex things like words, images, or even user behavior as numbers (more precisely, as lists of numbers, called vectors). These numbers are designed so that similar things have similar numbers.
Imagine you're organizing a bookshelf:
- Books about cats might go on the left.
- Books about dogs near them.
- Books about space on the far right.
You’re grouping books by similarity. In the world of vector embeddings, instead of shelves, we use numbers to represent how "close" or "far" two things are. For example:
- A book about cats might be represented as [0.1, 0.3].
- A book about dogs as [0.2, 0.4].
- A book about space as [10.0, 20.0].
The closer the numbers, the more similar the concepts (cats and dogs are close; space is far away).
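To make “closeness” concrete, here’s a quick sketch (separate from the app we’re building) that computes the Euclidean distance between those example book vectors; smaller numbers mean more similar concepts:

// Toy example: measure how "close" two concepts are with Euclidean distance
const cats = [0.1, 0.3];
const dogs = [0.2, 0.4];
const space = [10.0, 20.0];

const distance = (a, b) =>
  Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));

console.log(distance(cats, dogs)); // ~0.14, very similar
console.log(distance(cats, space)); // ~22, very different

Real embedding models produce vectors with hundreds or thousands of dimensions, but the idea is the same.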
Vector search will find the closest matching numbers (vectors) to a given query, but there’s an extra step you have to take before you can use the result to prompt the AI model you’re using: you have to fetch the file associated with the vector embedding. This has traditionally been done by storing the file in one system, storing the vector in another, then referencing them through a third system or some sort of metadata.
Pinata changes this model by combining the systems. When you get a match, you receive the associated file immediately.
Ok, enough background. Let’s code.
Setting up the app
We’re going to be using Next.js for this application. Let’s start by creating a new project in Next. Fire up your terminal, and run the following command:
npx create-next-app ai-knowledge-base
Answer the prompts however you’d like, but we’ll be using App Router from Next.js.
Now, we’ll need to install the Pinata SDK. Pinata has two SDKs: one for private file storage and retrieval, and one for IPFS, the peer-to-peer network Pinata has built on for years. We’ll be using the private files SDK for this. Change into your project directory and install Pinata:
cd ai-knowledge-base && npm i pinata
Now, before we proceed, let’s grab an API key from Pinata. Sign in to your account, then navigate to the API Keys page. Create a new key, give it a name, then select Admin as your option. This key should not be used on the client side of your application. To help protect it, we will add it as an environment variable in a way that ensures it can only be accessed on the backend.
Grab the JWT result from your API key creation, then open your project in a text editor. Create a new file in the root of the project called .env. In that file, add the following:
PINATA_JWT=YOUR_JWT_HERE
Paste your JWT in and save the file.
Ok, we have the basic starting point for the app. There’s a lot to do still, but this sets the stage. Let’s move on to getting the data from Pinata’s docs, uploading it to Pinata, and vectorizing it.
Seeding our knowledge base
The easiest way to seed our knowledge base is going to be through code. Pinata’s docs are public. You can read them here. However, we need them in a text-based format we can upload. To do that, we’ll take two steps:
- Clone the repository for the docs
- Write a script in our project directory that crawls all the docs’ markdown files and uploads/vectorizes them
Since there are fewer than 500 files in the docs repository, we will be able to stay within the Pinata free plan limits.
Let’s first clone the docs repo. In a new terminal window, make sure you navigate outside of your ai-knowledge-base project directory. Then run:
git clone https://github.com/PinataCloud/docs
We don’t need to do anything with that repo, but it does need to exist on our machine, so we’re all set. Now, let’s go ahead and create a scripts folder at the root of our project. Inside that folder, create a file called crawl_docs.js, and inside that file, add the following:
const fs = require("fs");
const path = require("path");

// Input directory (update with your source directory)
const sourceDir = "Absolute path to your pinata docs directory";

// Output directory
const outputDir = path.join(__dirname, "upload");

// Create output directory if it doesn't exist
if (!fs.existsSync(outputDir)) {
  fs.mkdirSync(outputDir);
}

// Helper function to recursively get all `.md` files in a directory
const getMarkdownFiles = (dir) => {
  let results = [];
  const files = fs.readdirSync(dir);
  for (const file of files) {
    const fullPath = path.join(dir, file);
    const stat = fs.statSync(fullPath);
    if (stat.isDirectory()) {
      results = results.concat(getMarkdownFiles(fullPath));
    } else if (path.extname(fullPath) === ".md" || path.extname(fullPath) === ".mdx") {
      results.push(fullPath);
    }
  }
  return results;
};

// Combine the markdown files into batch files of `batchSize` files each
const combineMarkdownFiles = (files, outputDir, batchSize = 10) => {
  let batchContent = "";
  let batchCount = 0;
  for (let i = 0; i < files.length; i++) {
    const content = fs.readFileSync(files[i], "utf8");
    batchContent += content + "\n\n";
    if ((i + 1) % batchSize === 0 || i === files.length - 1) {
      const outputFilePath = path.join(outputDir, `batch_${batchCount + 1}.md`);
      fs.writeFileSync(outputFilePath, batchContent, "utf8");
      console.log(`Created: ${outputFilePath}`);
      batchContent = "";
      batchCount++;
    }
  }
};

const markdownFiles = getMarkdownFiles(sourceDir);

if (markdownFiles.length === 0) {
  console.log("No markdown files found.");
} else {
  console.log(`Found ${markdownFiles.length} markdown files.`);
  combineMarkdownFiles(markdownFiles, outputDir);
  console.log("Combining complete!");
}
Ok, let’s look at what this script is doing. We are using the absolute path to the Pinata docs repo directory on our computer. You can get the absolute path by going to the terminal window where you cloned the docs repo, changing into the repo directory (cd docs), and typing pwd.
The rest of the code crawls through the nested folders to find all the markdown files, then combines their content into batch files and stores them in an upload directory.
This script takes things a step further in case you want to use it for a knowledge base that has a lot of files. You may want to combine some of those markdown files together, so there is a batching mechanism built into this script. Simply change the batchSize parameter of the combineMarkdownFiles function, and the script will combine that many markdown files into a single batch file.
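For example, to combine 25 markdown files per batch instead of the default 10, you could change the call at the bottom of the script to pass the batch size explicitly (the value 25 here is just an illustration):

// Combine 25 markdown files per batch instead of the default 10
combineMarkdownFiles(markdownFiles, outputDir, 25);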
Ok, now let’s go ahead and run this script. From your terminal, run the command:
node scripts/crawl_docs.js
The output will show the batching in progress, and when it’s complete, you should have an upload folder as a child of the scripts folder, containing the batched markdown files.
Now, let’s create another script in our scripts folder called upload_and_vectorize.js. We’re going to use Pinata’s SDK here, so we’ll import that at the top, but before we do, let’s install one more dependency. Because this is a standalone script that we will run once, we don’t have access to the Next.js build system. So, for environment variables, we’ll need to install the dotenv package like this:
npm i dotenv
Then, in that script file, add the following:
const fs = require("fs");
const { PinataSDK } = require("pinata");
const { Blob } = require("buffer");
require("dotenv").config();

const pinata = new PinataSDK({
  pinataJwt: process.env.PINATA_JWT,
  pinataGateway: process.env.PINATA_GATEWAY_URL,
});

const loadDir = async () => {
  try {
    const dir = fs.readdirSync("./scripts/upload");
    console.log(dir);
    return dir;
  } catch (error) {
    console.error("Error reading directory:", error);
    throw error;
  }
};

const processFile = async (file) => {
  try {
    if (file.includes(".md") || file.includes(".txt")) {
      const blob = new Blob([fs.readFileSync(`./scripts/upload/${file}`)]);
      const fileData = new File([blob], file, { type: "text/plain" });
      await pinata.upload.file(fileData).group(process.env.GROUP_ID).vectorize();
      console.log("Vectorized file name:", file);
      fs.unlinkSync(`./scripts/upload/${file}`);
    }
  } catch (error) {
    console.error(`Error processing file ${file}:`, error);
    process.exit(1);
  }
};

(async () => {
  try {
    const pLimit = await import("p-limit").then((mod) => mod.default);
    const files = await loadDir();
    const limit = pLimit(20); // Limit the concurrency to 20, adjust this as needed for your network and machine
    const fileProcessingPromises = files.map((file) => limit(() => processFile(file)));
    await Promise.all(fileProcessingPromises);
    console.log("All files processed successfully.");
  } catch (error) {
    console.error("Error in main process:", error);
    process.exit(1);
  }
})();
To speed up the upload process in this script, we’re going to upload concurrently. To achieve this, we’re using a library called p-limit. Install that by running:
npm i p-limit
In addition to the concurrency part of this script, there are a couple of things we should take a look at right away. First, when we initialize the Pinata SDK, we need the JWT (which we already stored in our environment variable file), and we need a gateway URL. You can get that URL by logging into the Pinata web app and clicking the Gateways tab on the left. Just copy the gateway URL and add it as another environment variable like this:
PINATA_GATEWAY_URL=YOUR_GATEWAY_URL
Now, the second thing we should address is the GROUP_ID variable. Pinata, like other vector storage providers, stores vector embeddings in an index. With Pinata, we’ve simplified that process by using a paradigm you may already be used to if you’ve used Pinata before: groups. Groups allow you to keep similar files together, and we use that same model for vector indexes. So, if you have a bunch of vector data you want to be able to search, simply put it in the same group.
You can create a group with the API, but it’s pretty simple to log into the web app. Under the files section, click Groups on the left. Then, create a new group. When that group is created, click on it. You’ll see there are no files in it, and that’s expected. What we want right now is the GROUP_ID, which will be in the URL. You’ll see the URL looks like this:
https://app.devpinata.cloud/storage/groups/0193ac7e-9f0c-7a06-9792-a60e525d7395
The last part of the URL is the GROUP_ID. Copy that and enter it as the value of another environment variable like this:
GROUP_ID=YOUR_GROUP_ID
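If you’d rather create the group from code instead of the web app, the SDK exposes a groups API. Here’s a minimal standalone sketch, assuming a groups.create method that takes a name and returns the new group’s id (double-check the exact method and response shape against the SDK docs for your version):

const { PinataSDK } = require("pinata");
require("dotenv").config();

const pinata = new PinataSDK({
  pinataJwt: process.env.PINATA_JWT,
  pinataGateway: process.env.PINATA_GATEWAY_URL,
});

(async () => {
  // Assumes pinata.groups.create({ name }); verify the signature for your SDK version
  const group = await pinata.groups.create({ name: "ai-knowledge-base" });
  console.log("New group ID:", group.id);
})();

Either way, once you have the ID, drop it into your .env file as shown above.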
Back to the upload script: the rest of it reads the files to upload from their directory, then uses p-limit to cap how many uploads run concurrently. We pass the GROUP_ID as part of each upload, and we call the vectorize() method to ensure each file gets an associated vector embedding.
Go ahead and run this script with:
node scripts/upload_and_vectorize.js
When the process is complete, your files will have been uploaded and each one will have an associated vector embedding ready to be queried. Now, we’re ready to build the app!
Building the app
Our app will have a client and a backend. The backend will consist of a Next.js serverless function. We’ll start with the backend, then we’ll work on the interface.
In your project directory, create an api folder inside the app folder. Next, add a chat folder inside the api folder. And finally, add a file called route.ts inside the chat folder. Before we write the code for this API route, we’ll need to install one more dependency.
Our app can make use of any LLM (Large Language Model), especially if the LLM has an API compatible with the OpenAI API. For simplicity, we will just use the OpenAI API. You’ll need to have an OpenAI developer account, and you will need to get an OpenAI API key. Follow this guide to do so.
When you have your OpenAI API key, open your .env file and add it like so:
OPENAI_API_KEY=YOUR_OPEN_AI_KEY
Now, let’s create a helper file that will make our lives easier when using the Pinata SDK. Create a pinata.ts file in the root of the app folder. Similar to how we set up our upload script, we’re going to be using the Pinata SDK. However, we are going to export it so that it’s available throughout our app. Add the following to your pinata.ts file:
import { PinataSDK } from "pinata";

export const pinata = new PinataSDK({
  pinataJwt: process.env.PINATA_JWT,
  pinataGateway: process.env.PINATA_GATEWAY_URL,
});

export const GROUP_ID = process.env.GROUP_ID;
Now, we could use the OpenAI SDK, and if we wanted more flexibility, we would. But we’re focused on a chat app, and we want to be able to stream the AI model’s responses back to the user interface easily. So, we’re going to make use of an open source tool created by the Vercel team called, well, ai. Install that like this:
npm i @ai-sdk/openai ai
Let’s start writing our API route.
Inside your app/api/chat/route.ts file, add the following:
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { GROUP_ID, pinata } from "@/app/pinata";
import { NextResponse, NextRequest } from "next/server";

export async function POST(request: NextRequest) {
  try {
    const body = await request.json();
    const { messages } = body;
    const text = messages[messages.length - 1].content;

    const data = await pinata.files.queryVectors({
      groupId: GROUP_ID,
      query: text,
      returnFile: true,
    });

    const result = streamText({
      model: openai("gpt-4o"),
      system: `Please only use the following to help inform your response and do not make up additional information: ${data.data}`,
      messages,
    });

    return result.toTextStreamResponse({
      headers: {
        "Content-Type": "text/event-stream",
      },
    });
  } catch (error) {
    console.log(error);
    return NextResponse.json({ message: "Server error" }, { status: 500 });
  }
}
Let’s take a look at what’s going on in this file. First, we’re importing our Pinata client and Group ID from the helper file, along with the openai provider and streamText from the ai library. In our POST request, we parse the request body and extract the messages array. This array is provided by the frontend, and it matches the expected format for OpenAI chat completions.
We need to extract the user’s actual query from the messages array so we can use it in our vector search, so we do that by grabbing the content property of the last message in the array.
Next, we start working some magic. Using Pinata’s SDK, we can use the query from the user to search our vector embeddings and find the best match. Because Pinata has built vector storage and search to be file-first, we cut out a lot of boilerplate code and latency. With other services, you’d have to:
- Vectorize the text input
- Use that vector embedding to search against your vector database index
- Take the result and find the best match
- Get the metadata from that best match to find the file which will be stored somewhere else
- Fetch the file
- Use the file as part of the next steps
With Pinata’s vector storage, you just do the following:
- Pass in the text input
- Get the best matched file back
- Use the file as part of the next steps
We are setting the returnFile property to true in our vector search, which means we’ll get back the raw contents of the top match. We provide that data as part of our prompt context to get the best response from the AI model.
Finally, we use the ai library to help us stream the response from OpenAI back to the client. Now, before we start building the application interface, let’s test this. Fire up your app by running:
npm run dev
Then, from the terminal, let’s try a curl request with a text input like so:
curl --location 'http://localhost:3000/api/chat' \
--header 'Content-Type: application/json' \
--data '{
  "messages": [
    {
      "role": "user",
      "content": "How can I upload a file to a group?"
    }
  ]
}'
You should see a response output in the terminal that answers your question. Pretty cool, right?
Time to build our UI. Open up the page.tsx file found in your app folder in the project directory. Replace everything in there with the following:
"use client";
import { useChat } from "ai/react";
import { useRef, useEffect } from "react";
import { User, Bot } from "lucide-react";
export default function Page() {
const { messages, input, handleInputChange, handleSubmit, error } = useChat({
streamProtocol: "text",
});
// Ref to track the last message for auto-scrolling
const bottomRef = useRef(null);
useEffect(() => {
if (bottomRef.current) {
bottomRef.current.scrollIntoView({ behavior: "smooth" });
}
}, [messages]);
return (
<div className="w-screen min-h-screen bg-green flex flex-col">
<h1 className="text-center text-2xl font-bold p-4">Ask Pinnie</h1>
{/* Scrollable Message Container */}
<div className="bg-white rounded-md w-3/4 max-w-[1200px] m-auto flex-grow overflow-y-auto p-4 shadow-lg">
{messages.map((m) => (
<div key={m.id} className="whitespace-pre-wrap">
{m.role === "user" ? (
<div className="w-full flex justify-end my-2">
<span className="w-3/4 bg-purple text-white rounded-md p-2 mr-2">
{m.content}
</span>
<User />
</div>
) : (
<div className="w-full flex justify-start my-2">
<Bot />
<span className="w-3/4 bg-gray-200 text-black rounded-md p-2 ml-2">
{m.content}
</span>
</div>
)}
</div>
))}
{/* Dummy element for auto-scroll */}
<div ref={bottomRef} />
</div>
{/* Input Bar */}
<form
onSubmit={handleSubmit}
className="w-screen bg-white border-t p-4 shadow fixed bottom-0"
>
<input
autoFocus
className="w-full max-w-[1200px] p-2 border outline-none rounded shadow-lg m-auto block text-black"
value={input}
placeholder="Ask Pinnie something..."
onChange={handleInputChange}
/>
</form>
</div>
);
}
This single component is doing a lot of heavy lifting in fewer than 100 lines of code. For a little flair, we’re using icons from lucide-react, so you’ll need to install that library like this:
npm i lucide-react
We’re using the ai library on the frontend here as well. Specifically, we’re using its built-in React hooks to handle the streaming response from our backend. We have some UI flair to make this look like a chat interface, and we also have an auto-scroll feature that scrolls to the most recent message.
And that’s it. We have a chat app.
Go ahead and load http://localhost:3000 in your browser and take a look at your chat app in action.
Conclusion
In about 200 total lines of code (including our one-time scripts), we have built an AI chat app that allows us to ask questions about our knowledge base. In this case, that knowledge base is Pinata’s developer documentation.
You can see the live app here!
By using Pinata’s Vector Storage feature, we not only saved time (and lines of code), but we also sped up the process of prompting our AI model. By using file-first vector embeddings, we have the context the AI model needs immediately.
You can extend this to any knowledge base where you have access to the text-based files. You could also extend this further to save chat history (tip: this would be as simple as storing the messages array from the frontend as a JSON file using Pinata).
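If you want to try that, here’s a minimal sketch of what saving a conversation could look like, reusing the pinata helper and GROUP_ID from earlier and assuming the SDK’s JSON upload helper (pinata.upload.json) supports the same group() chaining we used for files; treat the method names and response fields as something to verify against the SDK docs:

import { GROUP_ID, pinata } from "@/app/pinata";

// Hypothetical helper: persist a chat session as a JSON file in your group
// `messages` is the same array the useChat hook manages on the frontend
export async function saveChatHistory(sessionId: string, messages: { role: string; content: string }[]) {
  // Assumes pinata.upload.json(...) and .group(...) chaining; verify against the SDK docs
  return await pinata.upload
    .json({ sessionId, savedAt: new Date().toISOString(), messages })
    .group(GROUP_ID!);
}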
Now, go out and build the next killer AI app!