# Indonesian Learning App with AI Speech Integration

## Setup Instructions

### 1. Prerequisites

- Python 3.11+
- Node.js 16+
- Google Cloud Account
- OpenAI API Key

### 2. Google Cloud Setup

1. Create a new Google Cloud project or use an existing one
2. Enable the following APIs:
   - Cloud Speech-to-Text API
   - Cloud Text-to-Speech API
3. Create a service account with the following roles:
   - Speech Client
   - Text-to-Speech Client
4. Download the service account key JSON file

### 3. Environment Configuration

#### Backend Configuration

1. Copy the environment template:

   ```bash
   cd backend
   cp .env.example .env
   ```

2. Edit `backend/.env` with your credentials:

   ```bash
   # Required
   GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
   OPENAI_API_KEY=your-openai-api-key-here

   # Optional - customize as needed
   OPENAI_MODEL=gpt-4o-mini
   GOOGLE_CLOUD_PROJECT=your-project-id
   SPEECH_LANGUAGE_CODE=id-ID
   TTS_VOICE_NAME=id-ID-Standard-A
   TTS_VOICE_GENDER=FEMALE
   HOST=0.0.0.0
   PORT=8000
   CORS_ORIGINS=http://localhost:3000,http://localhost:5173
   ```

#### Frontend Configuration

1. Copy the environment template:

   ```bash
   cp .env.example .env
   ```

2. Edit `.env` if needed (the defaults should work):

   ```bash
   VITE_API_BASE_URL=http://localhost:8000
   VITE_WS_BASE_URL=ws://localhost:8000
   VITE_ENABLE_SPEECH_FEATURES=true
   VITE_ENABLE_AI_CHAT=true
   ```

### 4. Backend Setup

```bash
cd backend
pip install uv  # if not already installed
uv sync
```

### 5. Frontend Setup

```bash
npm install
```

### 6. Running the Application

#### Start the backend:

```bash
cd backend
uv run python main.py
```

The backend will run on `http://localhost:8000`

#### Start the frontend:

```bash
npm run dev
```

The frontend will run on `http://localhost:5173`

### 7. Using the App

1. **Traditional Mode**: The original structured learning experience
2. **AI Chat Mode**: New conversational AI with speech-to-text and text-to-speech

#### AI Chat Features:

- **Speech Input**: Click "🎤 Speak" to record your voice in Indonesian
- **Text Input**: Type messages in Indonesian
- **AI Response**: GPT-4o-mini responds in Indonesian with educational guidance
- **Speech Output**: AI responses are automatically converted to speech
- **Real-time**: WebSocket streaming for low-latency conversation

### 8. Environment Variables Summary

#### Backend (.env file):

```bash
# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key

# Optional Configuration
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
SPEECH_SAMPLE_RATE=48000
SPEECH_ENCODING=WEBM_OPUS
TTS_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
TTS_SPEAKING_RATE=1.0
TTS_PITCH=0.0
HOST=0.0.0.0
PORT=8000
DEBUG=false
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
```

#### Frontend (.env file):

```bash
VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_DEV_MODE=true
VITE_LOG_LEVEL=info
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
VITE_ENABLE_TRADITIONAL_MODE=true
```

### 9. Testing

- Visit any scenario (warung, ojek, alfamart)
- Toggle between "📝 Traditional" and "🗣️ AI Chat" modes
- Test speech input (requires microphone permission)
- Verify that audio output plays automatically

### 10. Troubleshooting

#### Common Issues:

1. **Microphone not working**: Check browser permissions
2. **Audio not playing**: Check browser audio settings
3. **Google Cloud errors**: Verify service account permissions
4. **OpenAI errors**: Check API key and usage limits
5. **WebSocket connection issues**: Check that the backend is running on port 8000

#### Browser Compatibility:

- Chrome/Edge: Full support
- Firefox: Limited WebRTC support
- Safari: May require additional permissions

### 11. Architecture

```
User speaks → Browser captures audio → WebSocket →
Google Cloud Speech-to-Text → OpenAI GPT-4o-mini →
Google Cloud Text-to-Speech → WebSocket → Browser plays audio
```

### 12. Cost Considerations

- Google Cloud Speech-to-Text: ~$0.006 per 15-second chunk
- Google Cloud Text-to-Speech: ~$0.000004 per character
- OpenAI GPT-4o-mini: ~$0.150 per 1M input tokens, ~$0.600 per 1M output tokens

For typical usage (5-10 minutes of conversation), costs should be under $0.50 per session.
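
To sanity-check the per-session figure, the approximate unit prices listed above can be plugged into a small calculation. The following is a rough sketch, not part of the app: the function name `estimate_session_cost` and the sample usage numbers are hypothetical, and it assumes speech is billed in 15-second chunks rounded up. Actual billing depends on current provider pricing.

```python
import math

# Approximate unit prices quoted in this README (USD); check provider pricing pages.
STT_PRICE_PER_15S_CHUNK = 0.006
TTS_PRICE_PER_CHAR = 0.000004
LLM_PRICE_PER_INPUT_TOKEN = 0.150 / 1_000_000
LLM_PRICE_PER_OUTPUT_TOKEN = 0.600 / 1_000_000


def estimate_session_cost(speech_seconds, tts_chars, input_tokens, output_tokens):
    """Return a rough cost estimate in USD for one conversation session.

    Assumption: speech-to-text is billed per 15-second chunk, rounded up.
    """
    stt_chunks = math.ceil(speech_seconds / 15)
    stt_cost = stt_chunks * STT_PRICE_PER_15S_CHUNK
    tts_cost = tts_chars * TTS_PRICE_PER_CHAR
    llm_cost = (input_tokens * LLM_PRICE_PER_INPUT_TOKEN
                + output_tokens * LLM_PRICE_PER_OUTPUT_TOKEN)
    return stt_cost + tts_cost + llm_cost


if __name__ == "__main__":
    # Hypothetical session: 10 minutes of user speech, 6,000 characters of
    # synthesized replies, 20,000 input tokens and 3,000 output tokens.
    cost = estimate_session_cost(600, 6000, 20_000, 3_000)
    print(f"Estimated session cost: ${cost:.2f}")
```

With these sample numbers, speech recognition dominates (40 chunks, about $0.24), and the total comes to roughly $0.27, which is consistent with the under-$0.50 estimate.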