Build a Voice-Powered Sharing App: Convert Text to Speech and Store on Pinata

Ever wished you could turn text into speech and share it effortlessly? Imagine an app where you type in some text, and within seconds, you have an audio file hosted on a decentralized platform, ready to share with the world. In this tutorial, we’ll build exactly that. The finished web app lets you:

Convert text into speech using the ElevenLabs API.
Store audio files on Pinata, leveraging decentralized storage with IPFS.
Generate a shareable URL for your audio file.

By the end of this guide, you’ll have a fully functional app that combines powerful technologies to create a practical solution for sharing audio content. This step-by-step tutorial will walk you through every part, making sure it’s straightforward and easy to follow.

Here’s what you’ll learn along the way:

Setting up a Next.js project with TypeScript.
Integrating the ElevenLabs Text-to-Speech API.
Uploading and managing files on Pinata.
Building a simple yet effective frontend interface.
Best practices for organizing and structuring your code.

Prerequisites

Before we get started, make sure you have the following:

Node.js and npm installed.
A Pinata account with your JWT (JSON Web Token).
An ElevenLabs API key.
Basic familiarity with React, Next.js, and TypeScript.
A working understanding of asynchronous JavaScript.

If you’re ready, let’s start building!

Project Setup

Step 1: Create a New Next.js Project

Open your terminal and run:

npx create-next-app@latest voice-powered-sharing --typescript
cd voice-powered-sharing

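This command sets up a new Next.js project named voice-powered-sharing with TypeScript support. The rest of this tutorial places files under src/pages and src/utils, so if the installer prompts you, choose the src/ directory layout and the Pages Router rather than the App Router.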

Configuring Environment Variables

Sensitive information like API keys should be stored securely using environment variables.

Step 1: Create a `.env.local` File

In your project's root directory, create a .env.local file:

touch .env.local

Step 2: Add Your API Keys and Configuration

Open .env.local and add:

# Pinata Configuration
PINATA_JWT=your_pinata_jwt
NEXT_PUBLIC_GATEWAY_URL=your_pinata_gateway_url

# ElevenLabs API Key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

Replace the placeholders with your actual API keys and gateway URL.

PINATA_JWT: Your Pinata JWT for authentication.
NEXT_PUBLIC_GATEWAY_URL: Your public Pinata gateway domain, without the https:// prefix (the getFileUrl helper later in this tutorial adds it).
ELEVENLABS_API_KEY: Your ElevenLabs API key.
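
A missing or misspelled variable otherwise only surfaces later as a confusing authentication error. As a minimal sketch, a small lookup helper can fail fast with a clear message instead. The name requireEnv is hypothetical (it is not part of Next.js, Pinata, or ElevenLabs), and the example passes an in-memory object in place of process.env so it can run on its own.

```typescript
// Hypothetical helper: reads a variable and throws if it is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(key: string, env: Env): string {
  const value = env[key]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// In-memory stand-in for process.env, using placeholder values.
const exampleEnv: Env = {
  PINATA_JWT: "example-jwt",
  NEXT_PUBLIC_GATEWAY_URL: "example.mypinata.cloud",
};

console.log(requireEnv("PINATA_JWT", exampleEnv)); // prints "example-jwt"
```

In the app itself you would pass process.env as the second argument, for example requireEnv('PINATA_JWT', process.env).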

Installing Dependencies

Install the packages needed to interact with the Pinata and ElevenLabs APIs.

Step 1: Install Required Packages

Run:

npm install pinata elevenlabs

Creating Helper Functions

Organize your code with helper functions for configuration and common tasks.

Step 1: Create a `utils` Directory and `config.ts` File

Create a utils directory and a config.ts file:

mkdir src/utils
touch src/utils/config.ts

Step 2: Configure Pinata SDK

In src/utils/config.ts, add:

// src/utils/config.ts

import { PinataSDK } from 'pinata';

export const pinata = new PinataSDK({
  pinataJwt: process.env.PINATA_JWT!,
  pinataGateway: process.env.NEXT_PUBLIC_GATEWAY_URL!,
});

export const getFileUrl = (cid: string): string => {
  return `https://${process.env.NEXT_PUBLIC_GATEWAY_URL}/files/${cid}`;
};

Explanation:

Initializes the Pinata SDK with your JWT and gateway URL.
Provides a function getFileUrl to generate the full URL for a file using its CID.
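
Because getFileUrl prepends https:// itself, a gateway value that already includes a scheme or a trailing slash would produce a malformed URL. The sketch below shows one way to normalize the value first. The name buildFileUrl and the example domain and CID are assumptions made for illustration; the /files/ path is taken from the helper above.

```typescript
// Hypothetical variant of getFileUrl that tolerates "https://host/" as well as "host".
function buildFileUrl(gateway: string, cid: string): string {
  const host = gateway
    .trim()
    .replace(/^https?:\/\//i, "") // drop a scheme if one was included
    .replace(/\/+$/, ""); // drop trailing slashes
  return `https://${host}/files/${encodeURIComponent(cid)}`;
}

console.log(buildFileUrl("https://example.mypinata.cloud/", "bafyExampleCid"));
// prints "https://example.mypinata.cloud/files/bafyExampleCid"
```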

Building the Backend API

Create API routes to handle text-to-speech conversion and file uploading.

Step 1: Set Up the API Route

Create the API route file:

mkdir -p src/pages/api/public-text2speech
touch src/pages/api/public-text2speech/index.ts

Step 2: Implement the Backend Logic

Open src/pages/api/public-text2speech/index.ts and add:

// src/pages/api/public-text2speech/index.ts

import { NextApiRequest, NextApiResponse } from 'next';
import { pinata, getFileUrl } from '@/utils/config';
import { ElevenLabsClient } from 'elevenlabs';

const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY!;
const client = new ElevenLabsClient({
  apiKey: ELEVENLABS_API_KEY,
});

// Helper functions will be added here...

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // Handle POST and GET requests
}

Step 2.1: Create Helper Function to Generate Audio

Add:

async function createAudioFromText(text: string, voiceId: string = 'Rachel') {
  try {
    const audio = await client.generate({
      voice: voiceId,
      model_id: 'eleven_multilingual_v2',
      text,
    });

    const chunks: Buffer[] = [];
    for await (const chunk of audio) {
      chunks.push(chunk);
    }
    return Buffer.concat(chunks);
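  } catch (error) {
    // Log the underlying cause on the server before returning a generic message.
    console.error('ElevenLabs request failed:', error);
    throw new Error('Audio generation failed');
  }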
}

Explanation:

Generates audio from text using the ElevenLabs API.
Collects audio chunks and concatenates them into a single Buffer.
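
The chunk-collection step can be tried in isolation, without an API key. The sketch below is an illustration rather than the ElevenLabs client itself: fakeAudioStream is a made-up stand-in for the streamed response, and the example uses Uint8Array instead of Buffer so that it does not depend on Node-specific types.

```typescript
// Read every chunk from an async iterable and join them into one byte array.
async function collectChunks(source: AsyncIterable<Uint8Array>): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of source) {
    chunks.push(chunk);
    total += chunk.length;
  }
  // Copy each chunk into its position in the final array.
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.length;
  }
  return result;
}

// Hypothetical stand-in for a streaming audio response: yields two small chunks.
async function* fakeAudioStream(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2, 3]);
  yield new Uint8Array([4, 5]);
}

collectChunks(fakeAudioStream()).then((bytes) => {
  console.log(bytes.length); // prints 5
});
```

Buffer.concat in the tutorial's function does the same joining in a single call; the manual copy here only makes the step explicit.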

Step 2.2: Create Function to Handle File Upload

Add:

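async function uploadAudioToPinata(audioContent: Buffer, groupName: string, fileName: string) {
  // Wrap the raw bytes in a File so the upload carries a name and MIME type.
  // Note: File is a global only in Node.js 20 and later.
  const audioFile = new File([audioContent], fileName, {
    type: 'audio/mpeg',
  });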

  const groupId = await getOrCreatePublicGroup(groupName);

  const uploadResponse = await pinata.upload.file(audioFile).group(groupId);
  return getFileUrl(uploadResponse.cid);
}

async function getOrCreatePublicGroup(groupName: string) {
  const groups = await pinata.groups.list().isPublic(true);
  const existingGroup = groups.groups.find(
    (group) => group.name === groupName && group.is_public
  );

  if (existingGroup) {
    return existingGroup.id;
  }

  const newGroup = await pinata.groups.create({
    name: groupName,
    isPublic: true,
  });
  return newGroup.id;
}

Explanation:

uploadAudioToPinata handles uploading the audio file to Pinata and returns the file URL.
getOrCreatePublicGroup ensures that the specified public group exists on Pinata.
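
The get-or-create pattern behind getOrCreatePublicGroup can be exercised without a Pinata account. In the sketch below, GroupStore and createMemoryStore are hypothetical names that stand in for the real client; they are not part of the Pinata SDK. The point it illustrates is that calling the function twice with the same name returns the same group instead of creating a duplicate.

```typescript
interface Group {
  id: string;
  name: string;
  isPublic: boolean;
}

// Hypothetical storage interface standing in for a real groups client.
interface GroupStore {
  list(): Promise<Group[]>;
  create(groupName: string): Promise<Group>;
}

// Return the id of the public group with this name, creating it if needed.
async function getOrCreateGroupId(store: GroupStore, groupName: string): Promise<string> {
  const groups = await store.list();
  const existing = groups.find((g) => g.name === groupName && g.isPublic);
  if (existing) {
    return existing.id;
  }
  const created = await store.create(groupName);
  return created.id;
}

// In-memory store used only for this example.
function createMemoryStore(): GroupStore {
  const groups: Group[] = [];
  return {
    list: async () => [...groups],
    create: async (groupName) => {
      const group: Group = { id: `group-${groups.length + 1}`, name: groupName, isPublic: true };
      groups.push(group);
      return group;
    },
  };
}

async function demo(): Promise<void> {
  const store = createMemoryStore();
  const first = await getOrCreateGroupId(store, "Public Files");
  const second = await getOrCreateGroupId(store, "Public Files");
  console.log(first === second); // prints true
}
demo();
```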

Step 2.3: Implement the API Handler

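Replace the placeholder handler from Step 2 with the full implementation:

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // POST sends parameters in the JSON body; GET sends them in the query string.
  const { text, groupName, voiceId = 'Rachel' } =
    req.method === 'POST' ? req.body : req.query;

  if (!text || typeof text !== 'string' || text.trim() === '') {
    return res.status(400).json({ error: "'text' is required and must be a non-empty string." });
  }

  try {
    const audioContent = await createAudioFromText(text, voiceId);
    const fileName = `generated-audio-${Date.now()}.mp3`;
    // Fall back to the default group when groupName is missing or an empty string.
    const fileUrl = await uploadAudioToPinata(audioContent, groupName || 'Public Files', fileName);

    res.status(200).json({ fileUrl });
  } catch (error) {
    // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`.
    const message = error instanceof Error ? error.message : 'Unexpected error';
    res.status(500).json({ error: message });
  }
}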

Explanation:

Extracts text, groupName, and voiceId from the request.
Validates the text input.
Generates audio and uploads it to Pinata.
Returns the file URL in the response.
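
The extraction and validation steps can also live in a pure function, which makes them easy to test apart from the handler. The sketch below is one possible shape, with hypothetical names (SpeechParams, parseSpeechParams). It additionally assumes two cases the handler may meet: query-string values that arrive as arrays, and an empty groupName sent by the form.

```typescript
interface SpeechParams {
  text: string;
  groupName: string;
  voiceId: string;
}

// Query values can be a string or a string array; keep the first string only.
function firstString(value: unknown): string | undefined {
  const candidate: unknown = Array.isArray(value) ? value[0] : value;
  return typeof candidate === "string" ? candidate : undefined;
}

// Hypothetical parser: returns validated parameters or an error message.
function parseSpeechParams(input: Record<string, unknown>): SpeechParams | { error: string } {
  const text = firstString(input.text)?.trim();
  if (!text) {
    return { error: "'text' is required and must be a non-empty string." };
  }
  return {
    text,
    // `||` rather than a destructuring default, so "" also falls back.
    groupName: firstString(input.groupName)?.trim() || "Public Files",
    voiceId: firstString(input.voiceId)?.trim() || "Rachel",
  };
}

// groupName falls back to "Public Files" and voiceId to "Rachel".
console.log(parseSpeechParams({ text: " Hello ", groupName: "" }));
```

Inside the handler, the input would be req.body for POST requests and req.query for GET requests.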

Developing the Frontend Interface

Build a user-friendly interface for users to input text and receive the audio URL.

Step 1: Create the Frontend Page

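Create src/pages/text2speech.tsx. The file name determines the route, so this page is served at /text2speech.

Step 2: Build the Frontend Component

Open src/pages/text2speech.tsx and add:

// src/pages/text2speech.tsx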

import { useState } from 'react';

export default function Text2SpeechPage() {
  const [text, setText] = useState('');
  const [groupName, setGroupName] = useState('');
  const [voiceId, setVoiceId] = useState('Rachel');
  const [audioUrl, setAudioUrl] = useState('');
  const [loading, setLoading] = useState(false);

  // Step 2.1: Handle Form Submission
  const handleSubmit = async (event: React.FormEvent) => {
    event.preventDefault();

    if (!text.trim()) {
      alert('Please enter some text.');
      return;
    }

    setLoading(true);

    try {
      const response = await fetch('/api/public-text2speech', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ text, groupName, voiceId }),
      });

      if (!response.ok) {
        throw new Error('Failed to generate audio');
      }

      const data = await response.json();
      setAudioUrl(data.fileUrl);
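    } catch (error) {
      // `error` is `unknown` in strict TypeScript; narrow it before reading `.message`.
      alert(error instanceof Error ? error.message : 'Something went wrong');
    } finally {
      // Runs on success and on failure, so the button is always re-enabled.
      setLoading(false);
    }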
  };

  // Step 2.2: Render the Form and Result
  return (
    <div style={styles.container}>
      <h1 style={styles.title}>Convert Text to Speech and Share</h1>

      <form onSubmit={handleSubmit} style={styles.form}>
        {/* Text Input */}
        <div style={styles.field}>
          <label>Text:</label>
          <textarea
            value={text}
            onChange={(e) => setText(e.target.value)}
            required
            rows={4}
            style={styles.textarea}
          />
        </div>

        {/* Group Name Input */}
        <div style={styles.field}>
          <label>Group Name (Optional):</label>
          <input
            type="text"
            value={groupName}
            onChange={(e) => setGroupName(e.target.value)}
            style={styles.input}
          />
        </div>

        {/* Voice Selection */}
        <div style={styles.field}>
          <label>Voice:</label>
          <select value={voiceId} onChange={(e) => setVoiceId(e.target.value)} style={styles.select}>
            <option value="Rachel">Rachel</option>
            <option value="Bill">Bill</option>
            <option value="Charlie">Charlie</option>
            <option value="Charlotte">Charlotte</option>
          </select>
        </div>

        {/* Submit Button */}
        <button type="submit" disabled={loading} style={styles.button}>
          {loading ? 'Generating...' : 'Submit'}
        </button>
      </form>

      {/* Display the Result */}
      {audioUrl && (
        <div style={styles.result}>
          <h2>Generated Audio:</h2>
          <audio controls src={audioUrl} style={styles.audio}></audio>
          <p>
            <a href={audioUrl} download>
              Download Audio
            </a>
          </p>
        </div>
      )}
    </div>
  );
}

// Step 2.3: Define Styles
const styles: { [key: string]: React.CSSProperties } = {
  container: {
    maxWidth: '600px',
    margin: 'auto',
    padding: '2rem',
    fontFamily: 'Arial, sans-serif',
  },
  title: {
    textAlign: 'center',
    marginBottom: '2rem',
  },
  form: {
    marginBottom: '2rem',
  },
  field: {
    marginBottom: '1rem',
  },
  textarea: {
    width: '100%',
    padding: '0.5rem',
    fontSize: '1rem',
  },
  input: {
    width: '100%',
    padding: '0.5rem',
    fontSize: '1rem',
  },
  select: {
    width: '100%',
    padding: '0.5rem',
    fontSize: '1rem',
  },
  button: {
    width: '100%',
    padding: '0.75rem',
    backgroundColor: '#0070f3',
    color: '#fff',
    border: 'none',
    fontSize: '1rem',
    cursor: 'pointer',
  },
  result: {
    textAlign: 'center',
  },
  audio: {
    width: '100%',
  },
};

Step-by-Step Explanation:

State Management: Uses useState to manage form inputs and loading state.
Form Submission (handleSubmit):
- Prevents default form behavior.
- Validates that text is not empty.
- Sends a POST request to the backend API.
- Handles the response and updates the audioUrl state.
Rendering the Form:
- Text Input: A textarea for users to input the text to convert.
- Group Name Input: An optional input for the group name.
- Voice Selection: A dropdown to select the voice.
- Submit Button: A button to submit the form; displays 'Generating...' when loading.
Displaying the Result:
- If audioUrl is available, displays an audio player and a download link.
Styling:
- Uses inline styles defined in the styles object for simplicity.
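
The form currently shows a generic message whenever the request fails, although the API route returns a JSON error field that is more specific. As a minimal sketch, the response handling could be moved into a small function that prefers the server's message. The name readApiResult is hypothetical, and the function takes the already-parsed body so it can run without a browser or a server.

```typescript
// Shape of the JSON returned by the tutorial's API route.
type ApiBody = { fileUrl?: unknown; error?: unknown };

// Hypothetical helper: returns the file URL or throws with the best available message.
function readApiResult(ok: boolean, body: ApiBody): string {
  if (!ok) {
    const message = typeof body.error === "string" ? body.error : "Failed to generate audio";
    throw new Error(message);
  }
  if (typeof body.fileUrl !== "string" || body.fileUrl === "") {
    throw new Error("Response did not include a file URL");
  }
  return body.fileUrl;
}

console.log(readApiResult(true, { fileUrl: "https://example.mypinata.cloud/files/bafyExampleCid" }));
// prints "https://example.mypinata.cloud/files/bafyExampleCid"
```

In handleSubmit this would be called as readApiResult(response.ok, await response.json()), with the result passed to setAudioUrl.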

Testing the Application

Step 1: Run the Development Server

Start your app:

npm run dev

Step 2: Access the Application

Navigate to:

http://localhost:3000/text2speech

Step 3: Test the Functionality

Enter Text: Type the text you want to convert to speech.
Select Voice: Choose a voice from the dropdown menu.
Group Name: Optionally, enter a group name.
Submit: Click the "Submit" button.

Step 4: Verify the Result

Wait for the audio to be generated.
An audio player should appear with your generated audio.
You can play it directly or download it using the provided link.

Congratulations! You’ve successfully built a web app that converts text to speech, stores the generated audio on Pinata, and provides a shareable link—all while leveraging decentralized storage with IPFS. This project demonstrates how modern technologies can come together to solve real-world problems, offering both practicality and innovation.

By following this step-by-step guide, you’ve learned how to:

Set up a Next.js project with TypeScript.
Integrate the ElevenLabs Text-to-Speech API.
Upload and manage files on Pinata using IPFS.
Build a user-friendly frontend interface.
Implement best practices for code organization and scalability.

Optional Next Steps

Ready to take your project to the next level? Here are a few ideas for enhancements:

Error Handling: Improve feedback to the user in case of errors or invalid inputs.
Voice Customization: Add more voice options or allow users to adjust speech parameters.
User Authentication: Implement authentication to manage private files and user-specific groups.
UI/UX Improvements: Enhance the interface with better styling or responsive design.
Deployment: Deploy your application to a hosting platform like Vercel or Netlify.

Conclusion

You now have a working application and, more importantly, the tools and knowledge to build on it. Ready to try it for yourself? Sign up for Pinata today and start building your next decentralized app!
