4.3 KiB

Raw Blame History

Indonesian Learning App with AI Speech Integration

Setup Instructions

1. Prerequisites

Python 3.11+
Node.js 16+
Google Cloud Account
OpenAI API Key

2. Google Cloud Setup

Create a new Google Cloud project or use existing one
Enable the following APIs:
- Cloud Speech-to-Text API
- Cloud Text-to-Speech API
Create a service account with the following roles:
- Speech Client
- Text-to-Speech Client
Download the service account key JSON file

3. Environment Configuration

Backend Configuration

Copy the environment template:
```
cd backend
cp .env.example .env
```

Edit backend/.env with your credentials:

# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key-here

# Optional - customize as needed
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
HOST=0.0.0.0
PORT=8000
CORS_ORIGINS=http://localhost:3000,http://localhost:5173

Frontend Configuration

Copy the environment template:
```
cp .env.example .env
```

Edit .env if needed (defaults should work):

VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true

4. Backend Setup

cd backend
pip install uv  # if not already installed
uv sync

5. Frontend Setup

npm install

6. Running the Application

Start the backend:

cd backend
uv run python main.py

The backend will run on http://localhost:8000

Start the frontend:

npm run dev

The frontend will run on http://localhost:5173

7. Using the App

Traditional Mode: The original structured learning experience
AI Chat Mode: New conversational AI with speech-to-text and text-to-speech

AI Chat Features:

Speech Input: Click "🎤 Speak" to record your voice in Indonesian
Text Input: Type messages in Indonesian
AI Response: GPT-4o-mini responds in Indonesian with educational guidance
Speech Output: AI responses are automatically converted to speech
Real-time: WebSocket streaming for low-latency conversation

8. Environment Variables Summary

Backend (.env file):

# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key

# Optional Configuration
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
SPEECH_SAMPLE_RATE=48000
SPEECH_ENCODING=WEBM_OPUS
TTS_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
TTS_SPEAKING_RATE=1.0
TTS_PITCH=0.0
HOST=0.0.0.0
PORT=8000
DEBUG=false
CORS_ORIGINS=http://localhost:3000,http://localhost:5173

Frontend (.env file):

VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_DEV_MODE=true
VITE_LOG_LEVEL=info
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
VITE_ENABLE_TRADITIONAL_MODE=true

9. Testing

Visit any scenario (warung, ojek, alfamart)
Toggle between "📝 Traditional" and "🗣️ AI Chat" modes
Test speech input (requires microphone permission)
Verify audio output plays automatically

10. Troubleshooting

Common Issues:

Microphone not working: Check browser permissions
Audio not playing: Check browser audio settings
Google Cloud errors: Verify service account permissions
OpenAI errors: Check API key and usage limits
WebSocket connection issues: Check backend is running on port 8000

Browser Compatibility:

Chrome/Edge: Full support
Firefox: Limited WebRTC support
Safari: May require additional permissions

11. Architecture

User speaks → Browser captures audio → WebSocket → 
Google Cloud Speech-to-Text → OpenAI GPT-4o-mini → 
Google Cloud Text-to-Speech → WebSocket → Browser plays audio

12. Cost Considerations

Google Cloud Speech-to-Text: ~$0.006 per 15-second chunk
Google Cloud Text-to-Speech: ~$0.000004 per character
OpenAI GPT-4o-mini: ~$0.150 per 1M input tokens, ~$0.600 per 1M output tokens

For typical usage (5-10 minutes of conversation), costs should be under $0.50 per session.

4.3 KiB Raw Blame History