Audio Feedback Generator
A cloud-based tool where users record audio and receive instant feedback on both content and delivery.
Overview
The Audio Feedback Generator helps users improve their speaking skills. With a simple interface, users can record audio (up to 15 minutes) and receive structured feedback powered by audio processing and AI models.
System Architecture
1. Recording & Upload
- Frontend (React + Vite) for recording (max 15 min).
- Signed URL requested from backend with anonymous Firestore ID.
- Cloud Run service generates signed GCS URL & logs metadata in BigQuery.
- Raw audio file uploaded to audio-bucket/raw_audio_webm/.
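The signed-URL step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual backend code: the object naming scheme, the 15-minute URL lifetime, and the function names are assumptions. The `bucket` argument is expected to be a `google.cloud.storage.Bucket` (for example `storage.Client().bucket("audio-bucket")`), passed in so the logic can be exercised without cloud credentials.

```python
import uuid
from datetime import timedelta

RAW_PREFIX = "raw_audio_webm/"   # upload prefix named in the text above
URL_TTL = timedelta(minutes=15)  # assumed lifetime; not stated in the source


def build_object_name(user_id: str) -> str:
    """Return a unique object path under the raw-audio prefix (hypothetical naming)."""
    return f"{RAW_PREFIX}{user_id}/{uuid.uuid4().hex}.webm"


def create_upload_url(bucket, user_id: str) -> dict:
    """Create a V4 signed PUT URL for a single recording.

    `bucket` should behave like google.cloud.storage.Bucket.
    """
    object_name = build_object_name(user_id)
    blob = bucket.blob(object_name)
    url = blob.generate_signed_url(
        version="v4",
        expiration=URL_TTL,
        method="PUT",
        content_type="audio/webm",  # the upload must send this same Content-Type
    )
    return {"upload_url": url, "object_name": object_name}
```

The frontend would then send the recording with an HTTP PUT to `upload_url`, so the audio bytes never pass through the Cloud Run service itself.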
2. Preprocessing
- GCS finalize event triggers Pub/Sub message.
- Uploader converts audio file from its original extension to .wav, and cleans audio.
- Outputs stored in raw_audio_wav/ & clean_audio_wav/.
- Updates Firestore & publishes to the Audio process success Pub/Sub topic.
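As a rough sketch of the preprocessing entry point, the snippet below reads a Pub/Sub push envelope carrying a Cloud Storage notification and derives the two output paths. The `eventType`, `bucketId`, and `objectId` attributes are the ones Cloud Storage attaches to its notifications; the helper names, the flat output naming, and the use of ffmpeg with mono 16 kHz output are assumptions for illustration, since the source does not say how the conversion is done.

```python
from pathlib import PurePosixPath


def parse_finalize_event(envelope: dict):
    """Return (bucket, object_name) for an OBJECT_FINALIZE notification, else None."""
    attributes = envelope.get("message", {}).get("attributes", {})
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None  # ignore deletes, metadata updates, and malformed messages
    return attributes["bucketId"], attributes["objectId"]


def output_paths(object_name: str) -> dict:
    """Map an uploaded object to its converted and cleaned .wav locations."""
    stem = PurePosixPath(object_name).stem
    return {
        "raw_wav": f"raw_audio_wav/{stem}.wav",
        "clean_wav": f"clean_audio_wav/{stem}.wav",
    }


def ffmpeg_command(src: str, dst: str) -> list:
    """Build an ffmpeg invocation (assumed tool) producing mono 16 kHz WAV."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]
```

Filtering on the event type first keeps the service idempotent with respect to other bucket events that may arrive on the same topic.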
3. Processing & Feedback
- Cloud Run Processor listens to the Audio process success Pub/Sub topic.
- Performs:
- Spectral feature analysis
- Speech-to-text with Whisper
- Feedback generation with Gemini
- Pushes results to Firestore.
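The source does not list which spectral features the processor computes. As one common example, the spectral centroid (the magnitude-weighted mean frequency, often read as the "brightness" of a voice) can be sketched with the standard library alone. The naive DFT below is for illustration only; a real pipeline would more likely use an FFT from a numerical library, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 (illustration only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(acc))
    return spectrum


def spectral_centroid(samples, sample_rate):
    """Return the magnitude-weighted mean frequency in Hz (0.0 for silence)."""
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(samples)
    # Bin k corresponds to frequency k * sample_rate / n.
    return sum((k * sample_rate / n) * m for k, m in enumerate(mags)) / total
```

For a pure tone whose frequency falls exactly on a DFT bin, the centroid equals the tone's frequency, which makes for a convenient sanity check before running the feature on real speech.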
4. Real-time Updates
Firestore provides near real-time updates to the frontend, restricted by Firestore rules so that each user can only access their own audio and feedback results.
Infrastructure
The project runs fully on Google Cloud Platform, with services provisioned via Terraform:
- Cloud Storage
- Cloud Run
- Pub/Sub
- Firestore
- BigQuery
- IAM & Service Accounts
Key Learnings
✔️ Designing an event-driven architecture with GCP services.
✔️ Handling large file uploads securely with signed URLs.
✔️ Integrating AI models (open-source Whisper and Google's hosted Gemini) into a production pipeline.
✔️ Managing real-time client updates with Firestore rules.