Audio Feedback Generator

A cloud-based tool where users record audio and receive instant feedback on both content and delivery.

Overview

The Audio Feedback Generator helps users improve their speaking skills. With a simple interface, users can record audio (up to 15 minutes) and receive structured feedback powered by audio processing and AI models.

System Architecture

1. Recording & Upload

The frontend (React + Vite) records audio, up to 15 minutes per recording.
The frontend requests a signed upload URL from the backend, identifying the user by an anonymous Firestore ID.
A Cloud Run service generates the signed GCS URL and logs upload metadata in BigQuery.
The raw audio file is uploaded to audio-bucket/raw_audio_webm/.
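
A minimal sketch of how the signed-URL step could look in Python, assuming the google-cloud-storage client library is available in the Cloud Run service. The function names, the flat `<user>_<recording>.webm` naming scheme, and the 15-minute URL lifetime are illustrative assumptions, not the project's actual code.

```python
from datetime import timedelta

RAW_PREFIX = "raw_audio_webm"  # folder named in this README


def build_object_name(user_id: str, recording_id: str) -> str:
    """Return the GCS object path for one raw recording (hypothetical layout)."""
    for value in (user_id, recording_id):
        if not value or "/" in value:
            raise ValueError("ids must be non-empty and must not contain '/'")
    return f"{RAW_PREFIX}/{user_id}_{recording_id}.webm"


def create_upload_url(bucket_name: str, object_name: str) -> str:
    """Ask GCS for a short-lived V4 signed URL that allows a PUT upload."""
    # Imported lazily so build_object_name stays usable without the SDK.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),  # assumed lifetime of the URL
        method="PUT",
        content_type="audio/webm",  # the browser must send the same header
    )
```

The browser would then PUT the recording directly to the returned URL, so the audio never passes through the backend. On Cloud Run, signing may additionally require passing a service account email and access token to `generate_signed_url`, because the default credentials do not carry a private key.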

2. Preprocessing

A GCS object-finalize event publishes a Pub/Sub message.
The Uploader service converts the audio file from its original format to .wav and cleans the audio.
The outputs are stored in raw_audio_wav/ and clean_audio_wav/.
The service then updates Firestore and publishes to the Audio process success Pub/Sub topic.
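
The event-handling part of this step can be sketched as follows, assuming the service receives the notification through a Pub/Sub push subscription (an assumption; a pull subscription would deliver the same payload differently). The function names are hypothetical, and the actual conversion and cleaning are left out.

```python
import base64
import json
from pathlib import PurePosixPath


def parse_gcs_event(envelope: dict) -> tuple[str, str]:
    """Extract (bucket, object name) from a Pub/Sub push envelope."""
    message = envelope["message"]
    event_type = message.get("attributes", {}).get("eventType")
    if event_type != "OBJECT_FINALIZE":
        raise ValueError(f"unexpected event type: {event_type}")
    # The message data is the base64-encoded JSON description of the object.
    payload = json.loads(base64.b64decode(message["data"]))
    return payload["bucket"], payload["name"]


def output_paths(object_name: str) -> dict[str, str]:
    """Map a raw upload to the two .wav outputs named in this README."""
    stem = PurePosixPath(object_name).stem  # file name without extension
    return {
        "raw_wav": f"raw_audio_wav/{stem}.wav",
        "clean_wav": f"clean_audio_wav/{stem}.wav",
    }
```

Checking the event type first lets the handler ignore other notifications, such as deletions, before any audio is downloaded.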

3. Processing & Feedback

The Cloud Run Processor subscribes to the Audio process success Pub/Sub topic.
It performs:
- Spectral feature analysis
- Speech-to-text with Whisper
- Feedback generation with Gemini
It then writes the results to Firestore.
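
As an illustration of what one spectral feature could look like, the sketch below computes the spectral centroid of a single audio frame with a naive discrete Fourier transform from the standard library. This README does not say which features the Processor extracts, so the choice of feature and the function name are assumptions; a production pipeline would more likely use an FFT from a library such as NumPy or librosa.

```python
import cmath
import math


def spectral_centroid(frame: list[float], sample_rate: int) -> float:
    """Return the magnitude-weighted mean frequency (Hz) of one frame."""
    n = len(frame)
    weighted = 0.0
    total = 0.0
    # Only bins 0..n/2 are needed for a real-valued signal.
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    # A silent frame has no energy, so report 0 Hz instead of dividing by zero.
    return weighted / total if total else 0.0
```

A higher centroid roughly corresponds to a brighter sound, so a feature like this could be passed along with the Whisper transcript when asking Gemini for delivery feedback.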

4. Real-time Updates

Firestore provides near real-time updates to the frontend, restricted by Firestore rules so that each user can only access their own audio and feedback results.

Infrastructure

The project runs fully on Google Cloud Platform, with services provisioned via Terraform:

Cloud Storage
Cloud Run
Pub/Sub
Firestore
BigQuery
IAM & Service Accounts

Key Learnings

✔️ Designing an event-driven architecture with GCP services.

✔️ Handling large file uploads securely with signed URLs.

✔️ Integrating speech and language models (open-source Whisper and Google's Gemini) into a production pipeline.

✔️ Delivering real-time client updates through Firestore, with access controlled by Firestore rules.