---
title: "Yemeni Music Classification"
period: "Nov 2024 - Jan 2025"
slug: "yemeni-music-classification"
roles:
  - "Fullstack Developer"
stacks:
  - "Flask"
  - "Python"
  - "Next.js"
  - "React"
  - "TypeScript"
  - "Tailwind CSS"
  - "Shadcn UI"
links:
  - label: "Live"
    url: "https://team1-deeplearning-sic.vercel.app"
  - label: "Repository"
    url: "https://github.com/hibatillah/deep-learning"
headline: "Deep learning audio classification."
description: "Full-stack deep learning solution that replaces manual archiving with high-precision AI analysis, streamlining the preservation of Yemeni musical traditions."
---

<ImageFrameGrid className="mb-8 w-fit mx-auto">
	<ImageFrame offset="c" transform="rotate" className="col-span-full sm:h-80">
		<Image
			src={thumbnail}
			alt="Yemeni Music Classification"
			placeholder="blur"
			className="object-top"
		/>
	</ImageFrame>
</ImageFrameGrid>

## AI for Cultural Preservation

Traditional music is a vital part of Yemen's cultural identity, but much of it remains undocumented and unorganized. Manual classification is slow and requires expert knowledge.

This project addresses that gap. It is an automated tool that uses `Deep Learning` to analyze audio files and classify each one into one of three regional styles: Adeni, Hadrami, or Lahji.

## Architecture

The project pairs a modern _JavaScript_ frontend with a _Python_ backend, and the two have to exchange uploaded audio and predictions reliably. I set up a `Monorepo` so both could be developed and run side by side.

- Frontend - A clean, responsive interface for users to upload `.mp3` files and view results.

- Backend - A robust <abbr title="Application Programming Interface">API</abbr> that receives the audio, processes it, and serves the _Deep Learning_ model.

- The Engine - A Convolutional Neural Network (CNN) trained to recognize audio patterns.

## The Engineering Challenge

### 1. From Audio to Image (Preprocessing)

Machines cannot _hear_ music, but they can _see_ it. The core of our system converts raw audio waveforms into Mel spectrograms: images that show how much energy a sound carries at each frequency over time, with frequencies spaced on the perceptual Mel scale.

We built the _Python_ pipeline that:

1.  Accepts the user's uploaded file.

2.  Normalizes the audio duration.

3.  Generates the Spectrogram image.

4.  Feeds it into the CNN model for prediction.
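
Steps 2 and 3 above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the project's pipeline: the sample rate, clip length, FFT size, hop size, and number of Mel bands are example values, and the audio is assumed to be already decoded into a mono float array. In practice a library such as `librosa` would usually handle decoding and the spectrogram itself.

```python
import numpy as np

SAMPLE_RATE = 22050  # assumed sample rate in Hz
CLIP_SECONDS = 30    # assumed fixed clip length
N_FFT = 2048         # assumed FFT window size
HOP = 512            # assumed hop between frames
N_MELS = 128         # assumed number of Mel bands


def fix_duration(samples, sr=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Trim long clips and zero-pad short ones to a fixed length."""
    target = int(sr * seconds)
    if len(samples) >= target:
        return samples[:target]
    return np.pad(samples, (0, target - len(samples)))


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are evenly spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(samples, sr=SAMPLE_RATE, n_fft=N_FFT, hop=HOP,
                        n_mels=N_MELS):
    """Return a (n_mels, n_frames) array of Mel-band power in decibels."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack([
        samples[i * hop : i * hop + n_fft] * window for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))
```

The resulting 2D array can be saved as an image or passed straight to a CNN as a single-channel input. Fixing the duration first matters because it gives every spectrogram the same width, which a CNN with a fixed input shape needs.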

### 2. Improving Accuracy

Initially, our model reached only 60% accuracy because the dataset was small.
My teammate, who focused on the model architecture, and I then implemented Data Augmentation techniques. By adding noise, changing pitch, and stretching time in our training data, we made the model more robust to poor recording quality.
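
As a rough illustration of the idea, the sketch below augments a waveform with NumPy. It is a simplification under stated assumptions: the parameter ranges are example values, and `change_speed` resamples the clip, which changes pitch and duration together. Shifting pitch or stretching time independently needs a phase-vocoder style method, such as `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`.

```python
import numpy as np


def add_noise(samples, snr_db=20.0, rng=None):
    """Add white Gaussian noise at the given signal-to-noise ratio (dB)."""
    if rng is None:
        rng = np.random.default_rng()
    signal_power = np.mean(samples ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=samples.shape)
    return samples + noise


def change_speed(samples, rate):
    """Resample by linear interpolation.

    rate > 1 shortens the clip and raises its pitch; rate < 1 does the
    opposite. This is a simplified stand-in for separate pitch and time
    transforms.
    """
    n_out = int(round(len(samples) / rate))
    old_idx = np.arange(len(samples))
    new_idx = np.linspace(0, len(samples) - 1, n_out)
    return np.interp(new_idx, old_idx, samples)


def augment(samples, rng=None):
    """Produce one randomly perturbed copy of a training clip."""
    if rng is None:
        rng = np.random.default_rng()
    rate = rng.uniform(0.9, 1.1)      # assumed speed range
    snr_db = rng.uniform(15.0, 30.0)  # assumed noise range
    return add_noise(change_speed(samples, rate), snr_db=snr_db, rng=rng)
```

Each call to `augment` yields a slightly different version of the same recording, so a small dataset can be expanded several times over while the labels stay unchanged.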

> This iterative tuning pushed our final accuracy to 93.84%.

## User Experience

While the backend was complex, I ensured the frontend remained simple. Users don't need to know about _Spectrograms_ or _CNNs_. They simply upload a song and get an instant result.

---

- Model Serving - I learned that serving an AI model requires different considerations than a standard CRUD API, specifically regarding request timeouts and processing power.

- Cross-Language Integration - Bridging `Next.js` and `Flask` taught me how to design APIs that pass uploaded files between the two efficiently.

- Cultural Impact - The project showed that modern technology can be a powerful tool for archiving and preserving traditional art forms.