Build a Voice-Powered Sharing App: Convert Text to Speech and Store on Pinata
Ever wished you could turn text into speech and share it effortlessly? Imagine an app where you type in some text, and within seconds, you have an audio file hosted on a decentralized platform, ready to share with the world. In this tutorial, we’re going to build just that: a web app that lets you:
- Convert text into speech using the ElevenLabs API.
- Store audio files on Pinata, leveraging decentralized storage with IPFS.
- Generate a shareable URL for your audio file.
By the end of this guide, you’ll have a working app that turns typed text into an MP3 file, stores it on IPFS through Pinata, and gives you a link to share it. This step-by-step tutorial walks you through every part of the build.
Here’s what you’ll learn along the way:
- Setting up a Next.js project with TypeScript.
- Integrating the ElevenLabs Text-to-Speech API.
- Uploading and managing files on Pinata.
- Building a simple yet effective frontend interface.
- Best practices for organizing and structuring your code.
Prerequisites
Before we get started, make sure you have the following:
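- A recent Node.js LTS release and npm installed (the upload code uses the global File class, which older Node.js versions do not provide).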
- A Pinata account with your JWT (JSON Web Token).
- An ElevenLabs API key.
- Basic familiarity with React, Next.js, and TypeScript.
- A working understanding of asynchronous JavaScript.
If you’re ready, let’s start building!
Project Setup
Step 1: Create a New Next.js Project
Open your terminal and run:
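npx create-next-app@latest voice-powered-sharing --typescript --src-dir
cd voice-powered-sharing
This command sets up a new Next.js project named voice-powered-sharing with TypeScript support and a src/ directory. When the installer asks whether to use the App Router, answer No: the API route and page in this tutorial use the Pages Router (src/pages). Keep the default @/* import alias, which the imports later in this tutorial rely on.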
Configuring Environment Variables
Sensitive information like API keys should be stored securely using environment variables.
Step 1: Create a .env.local File
In your project's root directory, create a .env.local file:
touch .env.local
Step 2: Add Your API Keys and Configuration
Open .env.local and add:
# Pinata Configuration
PINATA_JWT=your_pinata_jwt
NEXT_PUBLIC_GATEWAY_URL=your_pinata_gateway_url
# ElevenLabs API Key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
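Replace the placeholders with your actual API keys and gateway domain:
- PINATA_JWT: Your Pinata JWT for authentication.
- NEXT_PUBLIC_GATEWAY_URL: Your Pinata gateway domain without the protocol (for example your-gateway.mypinata.cloud), because the getFileUrl helper defined below adds https:// itself.
- ELEVENLABS_API_KEY: Your ElevenLabs API key.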
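The ! after process.env.PINATA_JWT and the other variables in the code below only silences TypeScript. If a key is missing from .env.local, the failure shows up later as a less obvious SDK or HTTP error. A minimal fail-fast sketch follows; requireEnv is a hypothetical helper, not part of Next.js or either SDK, and the optional env parameter exists only so the helper can be tested without touching process.env:

```typescript
// Hypothetical helper: read a required environment variable or fail fast.
export function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  // Treat unset and blank values the same way
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

With it, the Pinata setup could read pinataJwt: requireEnv('PINATA_JWT') instead of process.env.PINATA_JWT!. Use it in server-side code only: Next.js replaces only literal process.env.NEXT_PUBLIC_... references in browser bundles, so a dynamic lookup like env[name] finds nothing there.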
Installing Dependencies
Install the necessary packages to interact with Pinata and ElevenLabs APIs.
Step 1: Install Required Packages
Run:
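npm install pinata elevenlabs
The code in this tutorial uses the SDK methods that were current when it was written (for example pinata.upload.file and client.generate). Later major versions of both packages have renamed or moved some of these calls, so if a method is missing, check the SDK documentation for the version you installed.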
Creating Helper Functions
Organize your code with helper functions for configuration and common tasks.
Step 1: Create a utils Directory and config.ts File
Create a utils directory and a config.ts file:
mkdir src/utils
touch src/utils/config.ts
Step 2: Configure Pinata SDK
In src/utils/config.ts, add:
// src/utils/config.ts
import { PinataSDK } from 'pinata';
export const pinata = new PinataSDK({
pinataJwt: process.env.PINATA_JWT!,
pinataGateway: process.env.NEXT_PUBLIC_GATEWAY_URL!,
});
export const getFileUrl = (cid: string): string => {
return `https://${process.env.NEXT_PUBLIC_GATEWAY_URL}/files/${cid}`;
};
Explanation:
- Initializes the Pinata SDK with your JWT and gateway URL.
- Provides a function getFileUrl to generate the full URL for a file using its CID.
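getFileUrl assumes NEXT_PUBLIC_GATEWAY_URL holds a bare domain, so a value that already starts with https:// or ends with / would produce a malformed URL. A more forgiving variant could normalize the value first. This is a sketch: buildFileUrl is a hypothetical name, and the /files/ path is simply copied from getFileUrl:

```typescript
// Hypothetical variant of getFileUrl that tolerates common gateway formats.
export function buildFileUrl(gateway: string, cid: string): string {
  const host = gateway
    .trim()
    .replace(/^https?:\/\//, '') // drop any protocol prefix
    .replace(/\/+$/, ''); // drop trailing slashes
  if (host === '') {
    throw new Error('Gateway domain is empty');
  }
  // Encode the CID defensively, even though valid CIDs are URL-safe
  return `https://${host}/files/${encodeURIComponent(cid)}`;
}
```

Called as buildFileUrl(process.env.NEXT_PUBLIC_GATEWAY_URL ?? '', cid), it returns the same URL as getFileUrl for a bare domain and also accepts values pasted with the protocol or a trailing slash.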
Building the Backend API
Create API routes to handle text-to-speech conversion and file uploading.
Step 1: Set Up the API Route
Create the API route file:
mkdir -p src/pages/api/public-text2speech
touch src/pages/api/public-text2speech/index.ts
Step 2: Implement the Backend Logic
Open src/pages/api/public-text2speech/index.ts and add (in the Pages Router, index.ts is served at /api/public-text2speech, the path the frontend calls):
// src/pages/api/public-text2speech/index.ts
import { NextApiRequest, NextApiResponse } from 'next';
import { pinata, getFileUrl } from '@/utils/config';
import { ElevenLabsClient } from 'elevenlabs';
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY!;
const client = new ElevenLabsClient({
apiKey: ELEVENLABS_API_KEY,
});
// Helper functions (Steps 2.1 and 2.2) go here.
// The default-exported API handler (Step 2.3) goes at the end of this file.
Step 2.1: Create a Helper Function to Generate Audio
Add:
async function createAudioFromText(text: string, voiceId: string = 'Rachel') {
try {
const audio = await client.generate({
voice: voiceId,
model_id: 'eleven_multilingual_v2',
text,
});
const chunks: Buffer[] = [];
for await (const chunk of audio) {
chunks.push(chunk);
}
return Buffer.concat(chunks);
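} catch (error) {
// Keep the underlying reason so it reaches the API's error response
const reason = error instanceof Error ? error.message : String(error);
throw new Error(`Audio generation failed: ${reason}`);
}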
}
Explanation:
- Generates audio from text using the ElevenLabs API.
- Collects audio chunks and concatenates them into a single Buffer.
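The chunk-collecting loop does not depend on ElevenLabs: any async iterable of byte chunks can be gathered into one Buffer the same way. Pulling it into a small helper makes it testable on its own; collectChunks is a hypothetical name, not an SDK function:

```typescript
// Hypothetical helper: drain an async iterable of byte chunks into one Buffer.
export async function collectChunks(source: AsyncIterable<Uint8Array>): Promise<Buffer> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of source) {
    chunks.push(chunk);
  }
  // Buffer.concat accepts both Buffers and plain Uint8Arrays
  return Buffer.concat(chunks);
}
```

Inside createAudioFromText, the loop could then become return await collectChunks(audio), since the Node.js readable stream returned by the SDK is async iterable and yields Buffer chunks.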
Step 2.2: Create Functions to Handle File Upload
Add:
async function uploadAudioToPinata(audioContent: Buffer, groupName: string, fileName: string) {
const audioFile = new File([audioContent], fileName, {
type: 'audio/mpeg',
});
const groupId = await getOrCreatePublicGroup(groupName);
const uploadResponse = await pinata.upload.file(audioFile).group(groupId);
return getFileUrl(uploadResponse.cid);
}
async function getOrCreatePublicGroup(groupName: string) {
const groups = await pinata.groups.list().isPublic(true);
const existingGroup = groups.groups.find(
(group) => group.name === groupName && group.is_public
);
if (existingGroup) {
return existingGroup.id;
}
const newGroup = await pinata.groups.create({
name: groupName,
isPublic: true,
});
return newGroup.id;
}
Explanation:
- uploadAudioToPinata uploads the audio file to Pinata inside the given group and returns the file URL.
- getOrCreatePublicGroup ensures that the specified public group exists on Pinata, reusing an existing group with the same name.
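getOrCreatePublicGroup follows a find-or-create pattern. The sketch below isolates that pattern behind a hypothetical GroupStore interface so it can run without network calls; GroupStore, GroupRecord, and InMemoryGroupStore are illustrative stand-ins, not Pinata SDK types:

```typescript
// Hypothetical minimal group shape; real Pinata responses carry more fields.
interface GroupRecord {
  id: string;
  name: string;
  isPublic: boolean;
}

// Hypothetical storage interface standing in for the Pinata SDK calls.
interface GroupStore {
  listPublic(): Promise<GroupRecord[]>;
  createPublic(name: string): Promise<GroupRecord>;
}

// Reuse a public group with a matching name, or create one if none exists.
async function getOrCreateGroupId(store: GroupStore, name: string): Promise<string> {
  const groups = await store.listPublic();
  const existing = groups.find((g) => g.name === name && g.isPublic);
  if (existing) return existing.id;
  const created = await store.createPublic(name);
  return created.id;
}

// In-memory store for local testing; counts create calls for inspection.
class InMemoryGroupStore implements GroupStore {
  private groups: GroupRecord[] = [];
  private nextId = 1;
  createCalls = 0;

  async listPublic(): Promise<GroupRecord[]> {
    return this.groups.filter((g) => g.isPublic);
  }

  async createPublic(name: string): Promise<GroupRecord> {
    this.createCalls += 1;
    const group: GroupRecord = { id: `group-${this.nextId++}`, name, isPublic: true };
    this.groups.push(group);
    return group;
  }
}
```

Two caveats apply to the real version as well: if two requests arrive at the same moment, both can miss the lookup and create duplicate groups, and if the group listing returns results in pages, a group beyond the first page will not be found. Caching the group ID after the first lookup is one way to reduce both effects.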
Step 2.3: Implement the API Handler
Add:
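export default async function handler(req: NextApiRequest, res: NextApiResponse) {
// Only GET (query string) and POST (JSON body) are supported
if (req.method !== 'GET' && req.method !== 'POST') {
res.setHeader('Allow', 'GET, POST');
return res.status(405).json({ error: 'Method not allowed' });
}
const input = (req.method === 'POST' ? req.body : req.query) ?? {};
const { text } = input;
// Empty strings from the form fall back to the defaults
const groupName =
typeof input.groupName === 'string' && input.groupName.trim() !== '' ? input.groupName.trim() : 'Public Files';
const voiceId = typeof input.voiceId === 'string' && input.voiceId !== '' ? input.voiceId : 'Rachel';
if (!text || typeof text !== 'string' || text.trim() === '') {
return res.status(400).json({ error: "'text' is required and must be a non-empty string." });
}
try {
const audioContent = await createAudioFromText(text, voiceId);
const fileName = `generated-audio-${Date.now()}.mp3`;
const fileUrl = await uploadAudioToPinata(audioContent, groupName, fileName);
return res.status(200).json({ fileUrl });
} catch (error) {
// Caught values are typed unknown under strict mode
const message = error instanceof Error ? error.message : 'Unknown error';
return res.status(500).json({ error: message });
}
}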
Explanation:
- Extracts text, groupName, and voiceId from the request body (POST) or query string (GET).
- Validates the text input.
- Generates audio and uploads it to Pinata.
- Returns the file URL in the response.
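Because the handler also reads req.query, the same endpoint can be called with a GET request. One way to build such a URL safely is with the standard URL and URLSearchParams classes, which handle percent-encoding. buildTextToSpeechUrl is a hypothetical helper, and http://localhost:3000 below assumes the default development server:

```typescript
// Hypothetical parameter shape mirroring the fields the handler reads.
interface TextToSpeechParams {
  text: string;
  voiceId?: string;
  groupName?: string;
}

// Hypothetical helper: build a GET URL for the text-to-speech endpoint.
export function buildTextToSpeechUrl(baseUrl: string, params: TextToSpeechParams): string {
  const url = new URL('/api/public-text2speech', baseUrl);
  // searchParams encodes spaces, &, and non-ASCII characters for us
  url.searchParams.set('text', params.text);
  if (params.voiceId) url.searchParams.set('voiceId', params.voiceId);
  if (params.groupName) url.searchParams.set('groupName', params.groupName);
  return url.toString();
}
```

For example, fetch(buildTextToSpeechUrl('http://localhost:3000', { text: 'Hello world' })) exercises the endpoint without the form. For long passages, POST remains the better fit, since browsers and servers limit URL length.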
Developing the Frontend Interface
Build a user-friendly interface for users to input text and receive the audio URL.
Step 1: Create the Frontend Page
Create src/pages/text2speech.tsx. In the Pages Router, this file is served at /text2speech.
Step 2: Build the Frontend Component
Open src/pages/text2speech.tsx and add:
// src/pages/text2speech.tsx
import { useState } from 'react';
export default function Text2SpeechPage() {
const [text, setText] = useState('');
const [groupName, setGroupName] = useState('');
const [voiceId, setVoiceId] = useState('Rachel');
const [audioUrl, setAudioUrl] = useState('');
const [loading, setLoading] = useState(false);
// Step 2.1: Handle Form Submission
const handleSubmit = async (event: React.FormEvent) => {
event.preventDefault();
if (!text.trim()) {
alert('Please enter some text.');
return;
}
setLoading(true);
try {
const response = await fetch('/api/public-text2speech', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text, groupName, voiceId }),
});
if (!response.ok) {
throw new Error('Failed to generate audio');
}
const data = await response.json();
setAudioUrl(data.fileUrl);
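} catch (error) {
// Caught values are typed unknown under strict mode
alert(error instanceof Error ? error.message : 'Something went wrong');
} finally {
setLoading(false);
}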
};
// Step 2.2: Render the Form and Result
return (
<div style={styles.container}>
<h1 style={styles.title}>Convert Text to Speech and Share</h1>
<form onSubmit={handleSubmit} style={styles.form}>
{/* Text Input */}
<div style={styles.field}>
<label>Text:</label>
<textarea
value={text}
onChange={(e) => setText(e.target.value)}
required
rows={4}
style={styles.textarea}
/>
</div>
{/* Group Name Input */}
<div style={styles.field}>
<label>Group Name (Optional):</label>
<input
type="text"
value={groupName}
onChange={(e) => setGroupName(e.target.value)}
style={styles.input}
/>
</div>
{/* Voice Selection */}
<div style={styles.field}>
<label>Voice:</label>
<select value={voiceId} onChange={(e) => setVoiceId(e.target.value)} style={styles.select}>
<option value="Rachel">Rachel</option>
<option value="Bill">Bill</option>
<option value="Charlie">Charlie</option>
<option value="Charlotte">Charlotte</option>
</select>
</div>
{/* Submit Button */}
<button type="submit" disabled={loading} style={styles.button}>
{loading ? 'Generating...' : 'Submit'}
</button>
</form>
{/* Display the Result */}
{audioUrl && (
<div style={styles.result}>
<h2>Generated Audio:</h2>
<audio controls src={audioUrl} style={styles.audio}></audio>
<p>
<a href={audioUrl} download>
Download Audio
</a>
</p>
</div>
)}
</div>
);
}
// Step 2.3: Define Styles
const styles: { [key: string]: React.CSSProperties } = {
container: {
maxWidth: '600px',
margin: 'auto',
padding: '2rem',
fontFamily: 'Arial, sans-serif',
},
title: {
textAlign: 'center',
marginBottom: '2rem',
},
form: {
marginBottom: '2rem',
},
field: {
marginBottom: '1rem',
},
textarea: {
width: '100%',
padding: '0.5rem',
fontSize: '1rem',
},
input: {
width: '100%',
padding: '0.5rem',
fontSize: '1rem',
},
select: {
width: '100%',
padding: '0.5rem',
fontSize: '1rem',
},
button: {
width: '100%',
padding: '0.75rem',
backgroundColor: '#0070f3',
color: '#fff',
border: 'none',
fontSize: '1rem',
cursor: 'pointer',
},
result: {
textAlign: 'center',
},
audio: {
width: '100%',
},
};
Step-by-Step Explanation:
- State Management: Uses useState to manage form inputs and loading state.
- Form Submission (handleSubmit):
  - Prevents default form behavior.
  - Validates that text is not empty.
  - Sends a POST request to the backend API.
  - Handles the response and updates the audioUrl state.
- Rendering the Form:
  - Text Input: A textarea for users to input the text to convert.
  - Group Name Input: An optional input for the group name.
  - Voice Selection: A dropdown to select the voice.
  - Submit Button: A button to submit the form; displays 'Generating...' while loading.
- Displaying the Result:
  - If audioUrl is available, displays an audio player and a download link. Browsers ignore the download attribute on cross-origin links, so because the file is served from your Pinata gateway, the link may open the audio instead of saving it.
- Styling:
  - Uses inline styles defined in the styles object for simplicity.
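The component reports a generic 'Failed to generate audio' message even when the API sent back a more specific error field. A small helper could prefer the server's message and confirm that fileUrl is present. readFileUrl is a hypothetical name, and it assumes the { fileUrl } and { error } response shapes used by the API route in this tutorial:

```typescript
// Hypothetical helper: turn the API's JSON body into a file URL or a useful error.
export function readFileUrl(ok: boolean, body: unknown): string {
  const record =
    typeof body === 'object' && body !== null ? (body as Record<string, unknown>) : {};
  if (ok && typeof record.fileUrl === 'string' && record.fileUrl !== '') {
    return record.fileUrl;
  }
  // Prefer the server's error message when it sent one
  const message = typeof record.error === 'string' ? record.error : 'Failed to generate audio';
  throw new Error(message);
}
```

In handleSubmit, the response check could become const data = await response.json().catch(() => null); followed by setAudioUrl(readFileUrl(response.ok, data));, and the existing catch block would then show the server's message.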
Testing the Application
Step 1: Run the Development Server
Start your app:
npm run dev
Step 2: Access the Application
Navigate to:
http://localhost:3000/text2speech
Step 3: Test the Functionality
- Enter Text: Type the text you want to convert to speech.
- Select Voice: Choose a voice from the dropdown menu.
- Group Name: Optionally, enter a group name.
- Submit: Click the "Submit" button.
Step 4: Verify the Result
- Wait for the audio to be generated.
- An audio player should appear with your generated audio.
- You can play it directly or download it using the provided link.
Congratulations! You’ve built a web app that converts text to speech, stores the generated audio on Pinata, and returns a shareable link, all backed by decentralized storage on IPFS. The project shows how an AI speech API and decentralized storage can be combined into a small, practical tool.
By following this step-by-step guide, you’ve learned how to:
- Set up a Next.js project with TypeScript.
- Integrate the ElevenLabs Text-to-Speech API.
- Upload and manage files on Pinata using IPFS.
- Build a user-friendly frontend interface.
- Organize your code into reusable configuration and helper functions.
Optional Next Steps
Ready to take your project to the next level? Here are a few ideas for enhancements:
- Error Handling: Improve feedback to the user in case of errors or invalid inputs.
- Voice Customization: Add more voice options or allow users to adjust speech parameters.
- User Authentication: Implement authentication to manage private files and user-specific groups.
- UI/UX Improvements: Enhance the interface with better styling or responsive design.
- Deployment: Deploy your application to a hosting platform like Vercel or Netlify.
Conclusion
You now have a working application and, more importantly, the tools and knowledge to build on it. Ready to try it for yourself? Sign up for Pinata today and start building your next decentralized app!