street-lingo/SETUP.md

# Indonesian Learning App with AI Speech Integration

## Setup Instructions

### 1. Prerequisites
- Python 3.11+
- Node.js 16+
- Google Cloud Account
- OpenAI API Key

### 2. Google Cloud Setup
1. Create a new Google Cloud project or use an existing one
2. Enable the following APIs:
   - Cloud Speech-to-Text API
   - Cloud Text-to-Speech API
3. Create a service account with the following roles:
   - Speech Client
   - Text-to-Speech Client
4. Download the service account key JSON file
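
Before wiring the key into the backend, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch, not part of the app: `check_service_account_key` is a hypothetical helper name, and it only checks for the fields a service account key file normally contains.

```python
import json
from pathlib import Path

# Fields normally present in a Google Cloud service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str) -> list[str]:
    """Return a list of problems found in the key file (empty if it looks valid)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not data.get(name)]
    # A user credential file has a different "type" and will not work here.
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

Calling `check_service_account_key("path/to/your/service-account-key.json")` returns an empty list when the file looks usable. It does not verify that the account has the roles listed above.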

### 3. Environment Configuration

#### Backend Configuration
1. Copy the environment template:
   ```bash
   cd backend
   cp .env.example .env
   ```

2. Edit `backend/.env` with your credentials:
   ```bash
   # Required
   GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
   OPENAI_API_KEY=your-openai-api-key-here

   # Optional - customize as needed
   OPENAI_MODEL=gpt-4o-mini
   GOOGLE_CLOUD_PROJECT=your-project-id
   SPEECH_LANGUAGE_CODE=id-ID
   TTS_VOICE_NAME=id-ID-Standard-A
   TTS_VOICE_GENDER=FEMALE
   HOST=0.0.0.0
   PORT=8000
   CORS_ORIGINS=http://localhost:3000,http://localhost:5173
   ```
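
A quick way to catch a missing credential before starting the server is to check the two required variables up front. This is a sketch only: `missing_required_vars` is a hypothetical helper, and it assumes the values from `backend/.env` have already been loaded into the process environment (how the backend loads them is not shown in this guide).

```python
import os
from collections.abc import Mapping

# The two variables marked "Required" in backend/.env.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "OPENAI_API_KEY")


def missing_required_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

An empty result means both required settings are present. Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.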

#### Frontend Configuration
1. From the project root (not `backend/`), copy the environment template:
   ```bash
   cp .env.example .env
   ```

2. Edit `.env` if needed (defaults should work):
   ```bash
   VITE_API_BASE_URL=http://localhost:8000
   VITE_WS_BASE_URL=ws://localhost:8000
   VITE_ENABLE_SPEECH_FEATURES=true
   VITE_ENABLE_AI_CHAT=true
   ```

### 4. Backend Setup
```bash
cd backend
pip install uv  # if not already installed
uv sync
```

### 5. Frontend Setup
```bash
npm install  # run from the project root
```

### 6. Running the Application

#### Start the backend:
```bash
cd backend
uv run python main.py
```
The backend listens on `http://localhost:8000` (controlled by `HOST` and `PORT` in `backend/.env`).

#### Start the frontend:
```bash
npm run dev
```
The frontend dev server runs on `http://localhost:5173`.

### 7. Using the App

1. **Traditional Mode**: The original structured learning experience
2. **AI Chat Mode**: New conversational AI with speech-to-text and text-to-speech

#### AI Chat Features:
- **Speech Input**: Click "🎤 Speak" to record your voice in Indonesian
- **Text Input**: Type messages in Indonesian
- **AI Response**: GPT-4o-mini responds in Indonesian with educational guidance
- **Speech Output**: AI responses are automatically converted to speech
- **Real-time**: WebSocket streaming for low-latency conversation

### 8. Environment Variables Summary

#### Backend (.env file):
```bash
# Required
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
OPENAI_API_KEY=your-openai-api-key

# Optional Configuration
OPENAI_MODEL=gpt-4o-mini
GOOGLE_CLOUD_PROJECT=your-project-id
SPEECH_LANGUAGE_CODE=id-ID
SPEECH_SAMPLE_RATE=48000
SPEECH_ENCODING=WEBM_OPUS
TTS_LANGUAGE_CODE=id-ID
TTS_VOICE_NAME=id-ID-Standard-A
TTS_VOICE_GENDER=FEMALE
TTS_SPEAKING_RATE=1.0
TTS_PITCH=0.0
HOST=0.0.0.0
PORT=8000
DEBUG=false
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
```
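
All values in a `.env` file arrive as strings, so numeric, boolean, and list settings need converting before use. The sketch below shows one way to do that for a few of the variables above. `BackendSettings` and `parse_settings` are hypothetical names, and treating the values shown above as defaults is an assumption rather than something confirmed by the backend code.

```python
from collections.abc import Mapping
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class BackendSettings:
    speech_sample_rate: int
    tts_speaking_rate: float
    tts_pitch: float
    port: int
    debug: bool
    cors_origins: list[str]


def parse_settings(env: Mapping[str, str]) -> BackendSettings:
    """Convert raw string settings into typed values (defaults are assumed)."""
    origins = env.get("CORS_ORIGINS", "")
    return BackendSettings(
        speech_sample_rate=int(env.get("SPEECH_SAMPLE_RATE", "48000")),
        tts_speaking_rate=float(env.get("TTS_SPEAKING_RATE", "1.0")),
        tts_pitch=float(env.get("TTS_PITCH", "0.0")),
        port=int(env.get("PORT", "8000")),
        debug=_as_bool(env.get("DEBUG", "false")),
        # CORS_ORIGINS is a comma-separated list; drop blanks and whitespace.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )
```

For example, `parse_settings(os.environ)` would yield a `cors_origins` list with one entry per origin, which is the shape most CORS middleware expects.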

#### Frontend (.env file):
```bash
VITE_API_BASE_URL=http://localhost:8000
VITE_WS_BASE_URL=ws://localhost:8000
VITE_DEV_MODE=true
VITE_LOG_LEVEL=info
VITE_ENABLE_SPEECH_FEATURES=true
VITE_ENABLE_AI_CHAT=true
VITE_ENABLE_TRADITIONAL_MODE=true
```

### 9. Testing
- Visit any scenario (warung, ojek, alfamart)
- Toggle between "📝 Traditional" and "🗣️ AI Chat" modes
- Test speech input (requires microphone permission)
- Verify audio output plays automatically

### 10. Troubleshooting

#### Common Issues:
1. **Microphone not working**: Check browser permissions
2. **Audio not playing**: Check browser audio settings
3. **Google Cloud errors**: Verify service account permissions
4. **OpenAI errors**: Check API key and usage limits
5. **WebSocket connection issues**: Check that the backend is running on port 8000
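
For the WebSocket case, a plain TCP check tells you whether anything is listening on the backend port at all. This is a small diagnostic sketch using only the standard library. `is_port_open` is a hypothetical helper, and it does not confirm that the listener is actually this backend.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

If `is_port_open()` returns `False`, start the backend or check the `PORT` setting. If it returns `True` and the WebSocket still fails, compare `VITE_WS_BASE_URL` with the backend address.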

#### Browser Compatibility:
- Chrome/Edge: Full support
- Firefox: Limited WebRTC support
- Safari: May require additional permissions

### 11. Architecture
```
User speaks → Browser captures audio → WebSocket →
Google Cloud Speech-to-Text → OpenAI GPT-4o-mini →
Google Cloud Text-to-Speech → WebSocket → Browser plays audio
```
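
The flow above is three stages run in sequence for each spoken turn. The sketch below models that ordering with the stages passed in as plain functions, so it runs without any cloud credentials. `handle_turn` and `TurnResult` are hypothetical names, the stand-in lambdas replace the real Google Cloud and OpenAI calls, and the actual backend streams over a WebSocket rather than calling synchronously like this.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class TurnResult:
    transcript: str
    reply_text: str
    reply_audio: bytes


def handle_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> TurnResult:
    """Run one conversational turn through the three stages in order."""
    transcript = transcribe(audio)            # Speech-to-Text stage
    reply_text = generate_reply(transcript)   # chat model stage
    reply_audio = synthesize(reply_text)      # Text-to-Speech stage
    return TurnResult(transcript, reply_text, reply_audio)


# Stand-in stages for illustration only; real ones would call the cloud APIs.
result = handle_turn(
    b"fake-audio-bytes",
    transcribe=lambda audio: "selamat pagi",
    generate_reply=lambda text: f"Anda bilang: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
print(result.reply_text)
```

Keeping the stages injectable like this also makes it possible to test the ordering without spending API credits.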

### 12. Cost Considerations
- Google Cloud Speech-to-Text: ~$0.006 per 15-second chunk
- Google Cloud Text-to-Speech: ~$0.000004 per character
- OpenAI GPT-4o-mini: ~$0.150 per 1M input tokens, ~$0.600 per 1M output tokens

For typical usage (5-10 minutes of conversation), costs should be under $0.50 per session.
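
The per-session figure can be sanity-checked from the unit prices listed above. The sketch below is a rough estimator. The function name and the usage numbers in the example (600 seconds of speech, 5,000 synthesized characters, 20,000 input and 2,000 output tokens) are illustrative assumptions, as is rounding speech up to whole 15-second chunks.

```python
import math

# Unit prices in USD, taken from the list above.
STT_PER_15S_CHUNK = 0.006
TTS_PER_CHAR = 0.000004
GPT_INPUT_PER_TOKEN = 0.150 / 1_000_000
GPT_OUTPUT_PER_TOKEN = 0.600 / 1_000_000


def estimate_session_cost(
    speech_seconds: float, tts_chars: int, input_tokens: int, output_tokens: int
) -> float:
    """Estimate the cost of one session in USD from the listed unit prices."""
    # Assumption: speech is billed in whole 15-second chunks, rounded up.
    stt = math.ceil(speech_seconds / 15) * STT_PER_15S_CHUNK
    tts = tts_chars * TTS_PER_CHAR
    llm = input_tokens * GPT_INPUT_PER_TOKEN + output_tokens * GPT_OUTPUT_PER_TOKEN
    return stt + tts + llm


# Assumed usage for a 10-minute session; adjust to your own traffic.
cost = estimate_session_cost(
    speech_seconds=600, tts_chars=5_000, input_tokens=20_000, output_tokens=2_000
)
print(f"${cost:.2f}")  # about $0.26 under these assumptions
```

With these assumed numbers, speech recognition dominates at $0.24 (40 chunks), speech synthesis adds $0.02, and the chat model adds well under one cent.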