# Indonesian Learning App with AI Speech Integration

## Setup Instructions

### 1. Prerequisites

- Python 3.11+
- Node.js 16+
- Google Cloud Account
- OpenAI API Key

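A quick way to confirm the two local toolchain prerequisites is a small script like the one below. This is only a sketch: the function names `node_major` and `check_prerequisites` are hypothetical and not part of the project, and the script assumes `node --version` prints a string of the form `v18.17.0`.

```python
import shutil
import subprocess
import sys


def node_major(version_output: str) -> int:
    """Parse the major version from `node --version` output such as 'v18.17.0'."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def check_prerequisites() -> list[str]:
    """Return a list of unmet prerequisites (empty if all checks pass)."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ is required")
    node = shutil.which("node")
    if node is None:
        problems.append("Node.js was not found on PATH")
    else:
        # Ask the installed Node.js binary for its version string.
        out = subprocess.run(
            [node, "--version"], capture_output=True, text=True, check=False
        ).stdout
        if not out.strip() or node_major(out) < 16:
            problems.append("Node.js 16+ is required")
    return problems
```

Calling `check_prerequisites()` returns an empty list when both checks pass. The Google Cloud account and OpenAI API key cannot be verified this way and still need to be set up manually.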
### 2. Google Cloud Setup

1. Create a new Google Cloud project or use an existing one
2. Enable the following APIs:
   - Cloud Speech-to-Text API
   - Cloud Text-to-Speech API
3. Create a service account with the following roles:
   - Speech Client
   - Text-to-Speech Client
4. Download the service account key JSON file

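Before pointing the backend at the downloaded key, it can help to sanity-check the file. The sketch below assumes the standard layout of a Google service account key (a JSON object with fields such as `type`, `project_id`, `private_key`, and `client_email`). The helper name `check_service_account_key` is hypothetical. It only inspects the file locally and does not contact Google Cloud or verify the roles.

```python
import json
from pathlib import Path

# Fields that a Google service account key file normally contains.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str) -> list[str]:
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"key file not found: {path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"key file is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["key file must contain a JSON object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if name not in data
    ]
    if data.get("type") != "service_account":
        problems.append("field 'type' should be 'service_account'")
    return problems
```

An empty result means the file looks structurally like a service account key. Permission problems only show up once the backend actually calls the APIs.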
### 3. Environment Configuration

#### Backend Configuration

1. Copy the environment template:
```bash
cd backend
cp .env.example .env
```

2. Edit `backend/.env` with your credentials:
```bash
# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key-here

# Optional - customize as needed
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
HOST=0.0.0.0
PORT=8000
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
```

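To illustrate how these variables might be consumed, here is a minimal sketch that reads them from the process environment, fails early when a required one is missing, and splits `CORS_ORIGINS` on commas. The `Settings` class and `load_settings` function are hypothetical and not taken from the backend. The fallback values simply mirror the example values above, and the sketch assumes the `.env` file has already been loaded into the environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    google_credentials: str
    openai_api_key: str
    openai_model: str
    host: str
    port: int
    cors_origins: list[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, raising if a required one is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    # Assumed defaults: these mirror the example values in the .env template.
    origins = env.get("CORS_ORIGINS", "http://localhost:3000,http://localhost:5173")
    return Settings(
        google_credentials=env["GOOGLE_APPLICATION_CREDENTIALS"],
        openai_api_key=env["OPENAI_API_KEY"],
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.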
#### Frontend Configuration

1. From the project root, copy the environment template:
```bash
cp .env.example .env
```

2. Edit `.env` if needed (defaults should work):
```bash
VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
```

### 4. Backend Setup

```bash
cd backend
pip install uv # if not already installed
uv sync
```

### 5. Frontend Setup

```bash
npm install
```

### 6. Running the Application

#### Start the backend:

```bash
cd backend
uv run python main.py
```
The backend will run on `http://localhost:8000`

#### Start the frontend:

```bash
npm run dev
```
The frontend will run on `http://localhost:5173`

### 7. Using the App

1. **Traditional Mode**: The original structured learning experience
2. **AI Chat Mode**: New conversational AI with speech-to-text and text-to-speech

#### AI Chat Features:

- **Speech Input**: Click "🎤 Speak" to record your voice in Indonesian
- **Text Input**: Type messages in Indonesian
- **AI Response**: GPT-4o-mini responds in Indonesian with educational guidance
- **Speech Output**: AI responses are automatically converted to speech
- **Real-time**: WebSocket streaming for low-latency conversation

### 8. Environment Variables Summary

#### Backend (.env file):

```bash
# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key

# Optional Configuration
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
SPEECH_SAMPLE_RATE=48000
SPEECH_ENCODING=WEBM_OPUS
TTS_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
TTS_SPEAKING_RATE=1.0
TTS_PITCH=0.0
HOST=0.0.0.0
PORT=8000
DEBUG=false
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
```

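The `SPEECH_*` and `TTS_*` variables map naturally onto the request settings of the two Google Cloud APIs. The sketch below collects them into plain dictionaries whose keys are chosen to mirror field names in Google's Python client libraries (`encoding`, `sample_rate_hertz`, `language_code`, `ssml_gender`, `speaking_rate`, `pitch`). The function names `speech_config` and `tts_config` are hypothetical, and how the backend actually builds its requests is not shown in this document.

```python
import os
from typing import Mapping


def speech_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect Speech-to-Text settings, falling back to the documented example values."""
    return {
        "encoding": env.get("SPEECH_ENCODING", "WEBM_OPUS"),
        "sample_rate_hertz": int(env.get("SPEECH_SAMPLE_RATE", "48000")),
        "language_code": env.get("SPEECH_LANGUAGE_CODE", "id-ID"),
    }


def tts_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect Text-to-Speech voice and audio settings."""
    return {
        "voice": {
            "language_code": env.get("TTS_LANGUAGE_CODE", "id-ID"),
            "name": env.get("TTS_VOICE_NAME", "id-ID-Standard-A"),
            "ssml_gender": env.get("TTS_VOICE_GENDER", "FEMALE"),
        },
        "audio_config": {
            "speaking_rate": float(env.get("TTS_SPEAKING_RATE", "1.0")),
            "pitch": float(env.get("TTS_PITCH", "0.0")),
        },
    }
```

Converting the numeric values up front means a typo such as `SPEECH_SAMPLE_RATE=48k` fails at startup with a `ValueError` rather than later inside an API call.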
#### Frontend (.env file):

```bash
VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_DEV_MODE=true
VITE_LOG_LEVEL=info
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
VITE_ENABLE_TRADITIONAL_MODE=true
```

### 9. Testing

- Visit any scenario (warung, ojek, alfamart)
- Toggle between "📝 Traditional" and "🗣️ AI Chat" modes
- Test speech input (requires microphone permission)
- Verify audio output plays automatically

### 10. Troubleshooting

#### Common Issues:

1. **Microphone not working**: Check browser permissions
2. **Audio not playing**: Check browser audio settings
3. **Google Cloud errors**: Verify service account permissions
4. **OpenAI errors**: Check API key and usage limits
5. **WebSocket connection issues**: Check backend is running on port 8000

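For the WebSocket issue in particular, a quick first check is whether anything is listening on the backend port at all. The sketch below uses only the standard library to attempt a TCP connection. The helper name `is_port_open` is hypothetical, and a successful connection only shows that some process is listening there, not that it is this backend.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

If `is_port_open()` returns `False`, start the backend as described in section 6 and confirm that `PORT` in `backend/.env` matches the port in `VITE_WS_BASE_URL`.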
#### Browser Compatibility:

- Chrome/Edge: Full support
- Firefox: Limited WebRTC support
- Safari: May require additional permissions

### 11. Architecture

```
User speaks → Browser captures audio → WebSocket →
Google Cloud Speech-to-Text → OpenAI GPT-4o-mini →
Google Cloud Text-to-Speech → WebSocket → Browser plays audio
```

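The server-side part of this flow can be thought of as three stages chained together. The sketch below models each stage as a plain callable so that the real Google Cloud and OpenAI clients could be plugged in later. All names here (`handle_utterance` and the three type aliases) are hypothetical, and the actual backend may structure this differently, for example by streaming partial results over the WebSocket.

```python
from typing import Callable

# Each stage is a plain callable, so real API clients can be swapped in later.
Transcribe = Callable[[bytes], str]   # audio in -> recognized Indonesian text
Reply = Callable[[str], str]          # user text -> assistant text
Synthesize = Callable[[str], bytes]   # assistant text -> audio out


def handle_utterance(
    audio: bytes,
    transcribe: Transcribe,
    reply: Reply,
    synthesize: Synthesize,
) -> tuple[str, str, bytes]:
    """Run one turn: speech-to-text, then the chat model, then text-to-speech."""
    user_text = transcribe(audio)
    answer = reply(user_text)
    return user_text, answer, synthesize(answer)


# Usage with stand-in stages (no network calls):
turn = handle_utterance(
    b"\x00\x01",
    transcribe=lambda audio: "selamat pagi",
    reply=lambda text: f"Anda berkata: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Keeping the stages as injected callables makes it possible to test the turn logic without any cloud credentials.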
### 12. Cost Considerations

- Google Cloud Speech-to-Text: ~$0.006 per 15-second chunk
- Google Cloud Text-to-Speech: ~$0.000004 per character
- OpenAI GPT-4o-mini: ~$0.150 per 1M input tokens, ~$0.600 per 1M output tokens

For typical usage (5-10 minutes of conversation), costs should be under $0.50 per session.
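
The per-session figure can be reproduced from the approximate rates listed above. The following sketch is an estimate only: the function name `estimate_session_cost` is hypothetical, the usage numbers in the example are made up for illustration, and actual pricing should be checked against the providers' current rate cards.

```python
import math

# Approximate rates taken from the list above (USD).
STT_USD_PER_15S = 0.006
TTS_USD_PER_CHAR = 0.000004
LLM_USD_PER_INPUT_TOKEN = 0.150 / 1_000_000
LLM_USD_PER_OUTPUT_TOKEN = 0.600 / 1_000_000


def estimate_session_cost(
    speech_seconds: float, tts_chars: int, input_tokens: int, output_tokens: int
) -> float:
    """Estimate the cost of one session in USD, billing speech in 15-second chunks."""
    stt = math.ceil(speech_seconds / 15) * STT_USD_PER_15S
    tts = tts_chars * TTS_USD_PER_CHAR
    llm = (
        input_tokens * LLM_USD_PER_INPUT_TOKEN
        + output_tokens * LLM_USD_PER_OUTPUT_TOKEN
    )
    return stt + tts + llm


# Hypothetical session: 10 minutes of recorded speech, 5,000 synthesized
# characters, 20,000 input tokens, and 5,000 output tokens.
session_cost = estimate_session_cost(600, 5_000, 20_000, 5_000)
```

With these assumed numbers the estimate comes to about $0.27, most of it from speech recognition, which is consistent with the sub-$0.50 figure above.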