Indonesian Learning App with AI Speech Integration
Setup Instructions
1. Prerequisites
- Python 3.11+
- Node.js 16+
- Google Cloud Account
- OpenAI API Key
2. Google Cloud Setup
- Create a new Google Cloud project or use an existing one
- Enable the following APIs:
  - Cloud Speech-to-Text API
  - Cloud Text-to-Speech API
- Create a service account with the following roles:
  - Speech Client
  - Text-to-Speech Client
- Download the service account key JSON file
3. Environment Configuration
Backend Configuration
- Copy the environment template:
cd backend
cp .env.example .env
- Edit backend/.env with your credentials:
# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key-here
# Optional - customize as needed
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
HOST=0.0.0.0
PORT=8000
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
Frontend Configuration
- From the project root (not backend/), copy the environment template:
cp .env.example .env
- Edit .env if needed (the defaults should work):
VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
4. Backend Setup
cd backend
pip install uv # if not already installed
uv sync
5. Frontend Setup
npm install
6. Running the Application
Start the backend:
cd backend
uv run python main.py
The backend will run on http://localhost:8000
Start the frontend:
npm run dev
The frontend will run on http://localhost:5173
7. Using the App
- Traditional Mode: The original structured learning experience
- AI Chat Mode: New conversational AI with speech-to-text and text-to-speech
AI Chat Features:
- Speech Input: Click "🎤 Speak" to record your voice in Indonesian
- Text Input: Type messages in Indonesian
- AI Response: GPT-4o-mini responds in Indonesian with educational guidance
- Speech Output: AI responses are automatically converted to speech
- Real-time: WebSocket streaming for low-latency conversation
8. Environment Variables Summary
Backend (.env file):
# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key
# Optional Configuration
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
SPEECH_SAMPLE_RATE=48000
SPEECH_ENCODING=WEBM_OPUS
TTS_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
TTS_SPEAKING_RATE=1.0
TTS_PITCH=0.0
HOST=0.0.0.0
PORT=8000
DEBUG=false
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
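The sketch below shows one way a backend could read and validate a subset of these variables at startup. It is not taken from the project: the `Settings` class, the `load_settings` function, and the fallback values (which simply mirror the sample values listed above) are assumptions for illustration. It also assumes the variables are already present in the process environment; a `.env` file is not read by Python automatically.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the backend variables."""
    google_credentials: str
    openai_api_key: str
    openai_model: str
    speech_language_code: str
    speech_sample_rate: int
    tts_speaking_rate: float
    port: int
    debug: bool
    cors_origins: tuple


# The two variables the README marks as required.
REQUIRED = ("GOOGLE_APPLICATION_CREDENTIALS", "OPENAI_API_KEY")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of variables (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    # Assumed fallback: the sample origins shown in the README.
    origins = env.get("CORS_ORIGINS", "http://localhost:3000,http://localhost:5173")
    return Settings(
        google_credentials=env["GOOGLE_APPLICATION_CREDENTIALS"],
        openai_api_key=env["OPENAI_API_KEY"],
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        speech_language_code=env.get("SPEECH_LANGUAGE_CODE", "id-ID"),
        speech_sample_rate=int(env.get("SPEECH_SAMPLE_RATE", "48000")),
        tts_speaking_rate=float(env.get("TTS_SPEAKING_RATE", "1.0")),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        # Split the comma-separated list and drop empty entries.
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
    )
```

Failing fast on the two required variables tends to give a clearer error than a later authentication failure from Google Cloud or OpenAI.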
Frontend (.env file):
VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_DEV_MODE=true
VITE_LOG_LEVEL=info
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
VITE_ENABLE_TRADITIONAL_MODE=true
9. Testing
- Visit any scenario (warung, ojek, alfamart)
- Toggle between "📝 Traditional" and "🗣️ AI Chat" modes
- Test speech input (requires microphone permission)
- Verify audio output plays automatically
10. Troubleshooting
Common Issues:
- Microphone not working: Check browser permissions
- Audio not playing: Check browser audio settings
- Google Cloud errors: Verify service account permissions
- OpenAI errors: Check API key and usage limits
- WebSocket connection issues: Check that the backend is running and reachable on port 8000
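For the WebSocket item, a quick way to confirm that something is listening on the backend port is a plain TCP connection attempt. The helper below is a minimal sketch using only the Python standard library; the name `backend_reachable` is hypothetical, and a successful TCP connection only shows that the port is open, not that the WebSocket endpoint itself is healthy.

```python
import socket


def backend_reachable(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_reachable())
```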
Browser Compatibility:
- Chrome/Edge: Full support
- Firefox: Limited WebRTC support
- Safari: May require additional permissions
11. Architecture
User speaks → Browser captures audio → WebSocket →
Google Cloud Speech-to-Text → OpenAI GPT-4o-mini →
Google Cloud Text-to-Speech → WebSocket → Browser plays audio
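The flow above can be sketched as a single function that chains the three services for one utterance. This is an illustration of the data flow only, not the project's code: `handle_utterance` and its three callable parameters are hypothetical names, and the real backend would pass in wrappers around Google Cloud Speech-to-Text, the OpenAI API, and Google Cloud Text-to-Speech.

```python
from typing import Callable, Dict, Union


def handle_utterance(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Dict[str, Union[str, bytes]]:
    """Run one round trip: audio in, transcript, AI reply, audio out."""
    transcript = transcribe(audio)        # speech-to-text step
    reply = generate_reply(transcript)    # language-model step
    reply_audio = synthesize(reply)       # text-to-speech step
    return {"transcript": transcript, "reply": reply, "audio": reply_audio}


if __name__ == "__main__":
    # Stand-in services so the sketch runs without any credentials.
    result = handle_utterance(
        b"fake-audio-bytes",
        transcribe=lambda _audio: "Selamat pagi",
        generate_reply=lambda text: "Anda berkata: " + text,
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(result["reply"])
```

Passing the services in as callables keeps the chain testable without network access; in a WebSocket handler, the returned transcript, reply, and audio would be sent back to the browser.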
12. Cost Considerations
- Google Cloud Speech-to-Text: ~$0.006 per 15-second chunk
- Google Cloud Text-to-Speech: ~$0.000004 per character
- OpenAI GPT-4o-mini: ~$0.150 per 1M input tokens, ~$0.600 per 1M output tokens
For typical usage (5-10 minutes of conversation), costs should be under $0.50 per session.
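As a rough check on that estimate, the prices listed above can be combined into a small calculator. The function name and the session figures in the example are assumptions chosen for illustration, and rounding speech up to whole 15-second chunks is also an assumption about billing; actual prices and billing rules may differ from the approximate values in this section.

```python
import math

# Approximate prices copied from the list above.
STT_PRICE_PER_15S = 0.006
TTS_PRICE_PER_CHAR = 0.000004
GPT_INPUT_PRICE_PER_TOKEN = 0.150 / 1_000_000
GPT_OUTPUT_PRICE_PER_TOKEN = 0.600 / 1_000_000


def estimate_session_cost(
    speech_seconds: float,
    tts_characters: int,
    input_tokens: int,
    output_tokens: int,
) -> float:
    """Estimate the cost in USD of one conversation session."""
    # Assumption: speech is billed in whole 15-second chunks, rounded up.
    stt = math.ceil(speech_seconds / 15) * STT_PRICE_PER_15S
    tts = tts_characters * TTS_PRICE_PER_CHAR
    llm = (input_tokens * GPT_INPUT_PRICE_PER_TOKEN
           + output_tokens * GPT_OUTPUT_PRICE_PER_TOKEN)
    return stt + tts + llm


if __name__ == "__main__":
    # Hypothetical session: 5 minutes of user speech, 3,000 spoken characters,
    # 20,000 input tokens and 2,000 output tokens.
    cost = estimate_session_cost(300, 3000, 20_000, 2_000)
    print(f"Estimated session cost: ${cost:.4f}")
```

With these assumed figures the total is about $0.14 (0.12 for speech-to-text, 0.012 for text-to-speech, 0.0042 for the model), so speech recognition dominates the cost.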