# Indonesian Learning App with AI Speech Integration
## Setup Instructions
### 1. Prerequisites
- Python 3.11+
- Node.js 16+
- Google Cloud account
- OpenAI API key
### 2. Google Cloud Setup
1. Create a new Google Cloud project or use an existing one
2. Enable the following APIs:
   - Cloud Speech-to-Text API
   - Cloud Text-to-Speech API
3. Create a service account with the following roles:
   - Speech Client
   - Text-to-Speech Client
4. Download the service account key JSON file
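
Before pointing the backend at the downloaded key, it can help to confirm that the file is readable and looks like a service account key. The following is a minimal sketch, not part of the project: `check_service_account_key` is a hypothetical helper, and the field list assumes the usual layout of a Google Cloud service account key file.

```python
import json
from pathlib import Path

# Fields normally present in a Google Cloud service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str) -> list[str]:
    """Return a list of problems found in the key file (empty if it looks fine)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"key file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"key file is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["key file must contain a JSON object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not data.get(name)]
    # A present but different "type" means this is some other kind of credential.
    if data.get("type") not in (None, "service_account"):
        problems.append(f"unexpected key type: {data['type']!r}")
    return problems
```

Calling `check_service_account_key("path/to/your/service-account-key.json")` returns an empty list when the file passes these checks. It does not verify that the key is still valid or that the service account has the required roles.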
### 3. Environment Configuration
#### Backend Configuration
1. Copy the environment template:
```bash
cd backend
cp .env.example .env
```
2. Edit `backend/.env` with your credentials:
```bash
# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key-here
# Optional - customize as needed
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
HOST=0.0.0.0
PORT=8000
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
```
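
To illustrate how these variables might be consumed, here is a minimal sketch of a settings loader that fails fast when a required variable is missing. The `Settings` class and `load_settings` function are hypothetical names, not the backend's actual code, and the sketch assumes the values from `backend/.env` have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass

# The two variables marked "Required" in backend/.env.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    google_credentials: str
    openai_api_key: str
    openai_model: str
    host: str
    port: int
    cors_origins: list[str]


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), using the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    origins = env.get("CORS_ORIGINS", "http://localhost:3000,http://localhost:5173")
    return Settings(
        google_credentials=env["GOOGLE_APPLICATION_CREDENTIALS"],
        openai_api_key=env["OPENAI_API_KEY"],
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # CORS_ORIGINS is a comma-separated list; drop blanks and stray spaces.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )
```

Passing a plain dictionary instead of relying on `os.environ` makes the loader easy to exercise in tests.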
#### Frontend Configuration
1. Copy the environment template:
```bash
cp .env.example .env
```
2. Edit `.env` if needed (defaults should work):
```bash
VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
```
### 4. Backend Setup
```bash
cd backend
pip install uv # if not already installed
uv sync
```
### 5. Frontend Setup
```bash
npm install
```
### 6. Running the Application
#### Start the backend:
```bash
cd backend
uv run python main.py
```
The backend is then reachable at `http://localhost:8000`, assuming the default `PORT=8000`.
#### Start the frontend:
```bash
npm run dev
```
The frontend dev server runs at `http://localhost:5173`.
### 7. Using the App
1. **Traditional Mode**: The original structured learning experience
2. **AI Chat Mode**: New conversational AI with speech-to-text and text-to-speech
#### AI Chat Features:
- **Speech Input**: Click "🎤 Speak" to record your voice in Indonesian
- **Text Input**: Type messages in Indonesian
- **AI Response**: The model (`gpt-4o-mini` by default, configurable via `OPENAI_MODEL`) responds in Indonesian with educational guidance
- **Speech Output**: AI responses are automatically converted to speech
- **Real-time**: WebSocket streaming for low-latency conversation
### 8. Environment Variables Summary
#### Backend (.env file):
```bash
# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key
# Optional Configuration
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
SPEECH_SAMPLE_RATE=48000
SPEECH_ENCODING=WEBM_OPUS
TTS_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
TTS_SPEAKING_RATE=1.0
TTS_PITCH=0.0
HOST=0.0.0.0
PORT=8000
DEBUG=false
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
```
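
The speech and TTS variables correspond to fields of the Google Cloud requests. The sketch below shows one plausible mapping onto the JSON bodies of the REST `speech:recognize` and `text:synthesize` methods. The function names are hypothetical, the `MP3` output encoding is an assumption (no variable above sets it), and the backend may well use the client libraries instead of REST.

```python
import base64
import os


def build_stt_request(audio_bytes: bytes, env=None) -> dict:
    """Body for Speech-to-Text v1 speech:recognize, built from the SPEECH_* variables."""
    env = os.environ if env is None else env
    return {
        "config": {
            "encoding": env.get("SPEECH_ENCODING", "WEBM_OPUS"),
            "sampleRateHertz": int(env.get("SPEECH_SAMPLE_RATE", "48000")),
            "languageCode": env.get("SPEECH_LANGUAGE_CODE", "id-ID"),
        },
        # The REST API expects the audio as a base64-encoded string.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def build_tts_request(text: str, env=None) -> dict:
    """Body for Text-to-Speech v1 text:synthesize, built from the TTS_* variables."""
    env = os.environ if env is None else env
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": env.get("TTS_LANGUAGE_CODE", "id-ID"),
            "name": env.get("TTS_VOICE_NAME", "id-ID-Standard-A"),
            "ssmlGender": env.get("TTS_VOICE_GENDER", "FEMALE"),
        },
        "audioConfig": {
            "audioEncoding": "MP3",  # assumption: not configured by any variable above
            "speakingRate": float(env.get("TTS_SPEAKING_RATE", "1.0")),
            "pitch": float(env.get("TTS_PITCH", "0.0")),
        },
    }
```

Both functions only assemble dictionaries, so they can be inspected or logged without making any network call.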
#### Frontend (.env file):
```bash
VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_DEV_MODE=true
VITE_LOG_LEVEL=info
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
VITE_ENABLE_TRADITIONAL_MODE=true
```
### 9. Testing
- Open any scenario (warung, ojek, alfamart)
- Toggle between "📝 Traditional" and "🗣️ AI Chat" modes
- Test speech input (requires microphone permission)
- Verify that the AI response plays as audio automatically
### 10. Troubleshooting
#### Common Issues:
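1. **Microphone not working**: Check that the browser has microphone permission for the site
2. **Audio not playing**: Check browser audio settings, including any autoplay restrictions
3. **Google Cloud errors**: Verify that both APIs are enabled and that `GOOGLE_APPLICATION_CREDENTIALS` points to a key whose service account has the required roles
4. **OpenAI errors**: Check `OPENAI_API_KEY` and your usage limits
5. **WebSocket connection issues**: Check that the backend is running on port 8000 and that `VITE_WS_BASE_URL` points to it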
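
For the last item, a quick way to tell whether anything is listening on the backend port is a plain TCP connection attempt. This is a minimal sketch using only the standard library. `is_port_open` is a hypothetical helper, and a successful connection only shows that some process accepts connections there, not that it is this backend.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("backend reachable" if is_port_open() else "nothing listening on port 8000")
```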
#### Browser Compatibility:
- Chrome/Edge: Full support
- Firefox: Limited WebRTC support
- Safari: May require additional permissions
### 11. Architecture
```
User speaks → Browser captures audio → WebSocket →
Google Cloud Speech-to-Text → OpenAI GPT-4o-mini →
Google Cloud Text-to-Speech → WebSocket → Browser plays audio
```
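
The flow above can be sketched as three awaited stages chained per utterance. This is an illustration under stated assumptions, not the backend's implementation: `handle_utterance` and the `fake_*` stages are hypothetical, and the real stages would call Google Cloud and OpenAI and stream results over the WebSocket.

```python
import asyncio
from typing import Awaitable, Callable

Transcribe = Callable[[bytes], Awaitable[str]]   # audio -> Indonesian text
Chat = Callable[[str], Awaitable[str]]           # user text -> AI reply
Synthesize = Callable[[str], Awaitable[bytes]]   # reply text -> audio


async def handle_utterance(
    audio: bytes, transcribe: Transcribe, chat: Chat, synthesize: Synthesize
) -> dict:
    """Run one utterance through speech-to-text, chat, and text-to-speech in order."""
    transcript = await transcribe(audio)
    reply = await chat(transcript)
    reply_audio = await synthesize(reply)
    return {"transcript": transcript, "reply": reply, "audio": reply_audio}


# Stand-in stages so the sketch runs without any credentials.
async def fake_transcribe(audio: bytes) -> str:
    return "Selamat pagi"


async def fake_chat(text: str) -> str:
    return f"Anda berkata: {text}"


async def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    result = asyncio.run(
        handle_utterance(b"\x00\x01", fake_transcribe, fake_chat, fake_synthesize)
    )
    print(result["reply"])
```

Passing the stages in as callables keeps the orchestration testable with fakes and leaves the choice of provider outside the pipeline function.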
### 12. Cost Considerations
- Google Cloud Speech-to-Text: ~$0.006 per 15-second chunk
- Google Cloud Text-to-Speech: ~$0.000004 per character
- OpenAI GPT-4o-mini: ~$0.150 per 1M input tokens, ~$0.600 per 1M output tokens

For typical usage (5-10 minutes of conversation), costs should be under $0.50 per session.
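
As a rough check on that figure, the rates above can be combined into a per-session estimate. The sketch below is illustrative only: `estimate_session_cost` is a hypothetical helper, the usage numbers in the example are assumptions, and speech is rounded up to whole 15-second chunks to match the rate as quoted here.

```python
import math

# Approximate rates quoted in this section.
STT_USD_PER_15S = 0.006
TTS_USD_PER_CHAR = 0.000004
LLM_USD_PER_INPUT_TOKEN = 0.150 / 1_000_000
LLM_USD_PER_OUTPUT_TOKEN = 0.600 / 1_000_000


def estimate_session_cost(
    speech_seconds: float, tts_characters: int, input_tokens: int, output_tokens: int
) -> float:
    """Estimate the cost in USD of one session from its usage figures."""
    stt = math.ceil(speech_seconds / 15) * STT_USD_PER_15S
    tts = tts_characters * TTS_USD_PER_CHAR
    llm = input_tokens * LLM_USD_PER_INPUT_TOKEN + output_tokens * LLM_USD_PER_OUTPUT_TOKEN
    return stt + tts + llm


if __name__ == "__main__":
    # Assumed usage: 10 minutes of recorded speech, 5,000 synthesized characters,
    # 20,000 input tokens and 5,000 output tokens.
    print(f"${estimate_session_cost(600, 5_000, 20_000, 5_000):.2f}")
```

With these assumed figures the estimate comes to about $0.27, and speech-to-text accounts for most of it ($0.24).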