Add audio spectrogram processing utilities for YOLO audio classification #70
Draft
Delete 41 development artifact markdown files
- Add audio_processing.py module with complete workflow functions
- Implement audio chunking with sliding windows
- Add spectrogram generation and batch processing
- Implement video creation from spectrogram sequences
- Add image annotation with classification results
- Include audio-video synchronization support
- Add comprehensive test suite (9 tests, all passing)
- Create demo script with usage examples
- Add detailed documentation guide
Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>

- Update README.md with audio processing requirements and documentation links
- Add simple_audio_spectrogram_example.py demonstrating the workflow
- Create examples/ directory for code samples
- Example creates 3-second audio with A-C-E notes and processes it
- All functionality tested and working
Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>

- Create comprehensive implementation summary document
- Document all features, technical details, and testing results
- Include usage examples and performance characteristics
- Document known limitations and future integration plans
- Complete project documentation
Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Utilise code for developing spectrogram feature" to "Add audio spectrogram processing utilities for YOLO audio classification" on Nov 8, 2025
d9f4029 to f5ce349
Implements the audio-to-spectrogram-to-video workflow from the provided Colab notebook for ESC-50 and custom audio classification tasks.
Core Module
node/InputNode/audio_processing.py (436 lines)
- chunk_audio_wav_or_mp3() - Sliding window audio chunking (configurable duration/step)
- fourier_transformation() - STFT with windowing using stride tricks
- make_logscale() - Logarithmic frequency binning for better low-frequency resolution
- plot_spectrogram() - Audio → spectrogram image with configurable colormaps
- process_chunks_to_spectrograms() - Batch processing for audio folders
- create_video_from_spectrograms() - Spectrogram sequence → MP4 with temporal alignment
- create_video_with_audio_sync() - Optional ffmpeg audio track merging
- annotate_image_with_classification() - Overlay top-N predictions with styled text
Testing
- tests/test_audio_processing.py - 9 tests covering all functions + full workflow integration
Documentation
- AUDIO_SPECTROGRAM_GUIDE.md - API reference, workflow examples (ESC-50, YOLO, custom)
- examples/simple_audio_spectrogram_example.py - Self-contained working demo
Usage
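A minimal sketch of the sliding-window chunking idea listed under Core Module, assuming a mono signal that is already loaded as a NumPy array. The helper name `chunk_signal` and its parameters are hypothetical and chosen for illustration; they are not the actual signature of `chunk_audio_wav_or_mp3()`, which may handle file loading, padding, and trailing samples differently.

```python
import numpy as np


def chunk_signal(samples, sample_rate, chunk_duration=1.0, step_duration=0.5):
    """Split a 1-D signal into fixed-length, possibly overlapping chunks.

    Hypothetical helper for illustration only (not the PR's API).
    Trailing samples that do not fill a whole chunk are dropped.
    Returns a list of (start_time_in_seconds, chunk_array) tuples.
    """
    chunk_len = int(round(chunk_duration * sample_rate))
    step_len = int(round(step_duration * sample_rate))
    if chunk_len <= 0 or step_len <= 0:
        raise ValueError("chunk_duration and step_duration must be positive")

    chunks = []
    # Slide the window by step_len samples until a full chunk no longer fits.
    for start in range(0, len(samples) - chunk_len + 1, step_len):
        chunks.append((start / sample_rate, samples[start:start + chunk_len]))
    return chunks


# 3 seconds of a 440 Hz tone at 8 kHz, cut into 1 s windows every 0.5 s.
sr = 8000
t = np.arange(3 * sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
chunks = chunk_signal(tone, sr, chunk_duration=1.0, step_duration=0.5)
print(len(chunks), [start for start, _ in chunks])
```

With a step shorter than the chunk duration, consecutive chunks overlap, so each spectrogram frame in the resulting video shares part of its audio with its neighbours.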
Technical Details
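As a rough illustration of the "STFT with windowing using stride tricks" step, the sketch below frames a signal with NumPy's `sliding_window_view`, applies a Hann window, and converts magnitudes to decibels. The hop-size formula and the `10e-6` reference are taken from the notebook code in the original prompt; the function names `stft_magnitude` and `to_decibels`, the absence of padding, and the clipping floor are assumptions made here, so `fourier_transformation()` in the module may behave differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft_magnitude(signal, frame_size=1024, overlap=0.5):
    """Return |STFT| with shape (n_frames, frame_size // 2 + 1).

    Sketch only: no padding is applied, so the signal must be at least
    frame_size samples long.
    """
    # Same hop-size formula as the notebook's fourier_transformation().
    hop = int(frame_size - np.floor(overlap * frame_size))
    window = np.hanning(frame_size)
    # View of every length-frame_size window; keep every hop-th one.
    frames = sliding_window_view(np.asarray(signal, dtype=float), frame_size)[::hop]
    # Window each frame, then take the real-input FFT along the frame axis.
    return np.abs(np.fft.rfft(frames * window, axis=1))


def to_decibels(magnitude, reference=10e-6):
    """Amplitude to decibels, using the reference value from the notebook."""
    # Clip to avoid log10(0) on silent bins (an assumption, not in the notebook).
    return 20.0 * np.log10(np.maximum(magnitude, 1e-12) / reference)


# 1 second of a 1000 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 1000.0 * t), frame_size=1024)
peak_bin = int(np.argmax(spec.mean(axis=0)))
print(spec.shape, peak_bin * sr / 1024)  # frame count x bins, peak frequency in Hz
```

Rows of the returned array are time frames and columns are linear frequency bins; a logarithmic rebinning such as `make_logscale()` would be applied to the columns afterwards.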
Dependencies
Already in requirements.txt: librosa, matplotlib, soundfile
Original prompt
Use this code to develop the spectrogram:
# -*- coding: utf-8 -*-
"""AudioTrain_LAMAAZ_1M.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1AgWLLSACNAYYiBZyu414xvq2ri3q2IcW
DATA DOWNLOAD
"""
! wget https://github.com/karoldvl/ESC-50/archive/master.zip
"""https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-09-23--yolo8-listen/2023-09-23/"""
! unzip master.zip
"""## IMPORT"""
import numpy as np
from matplotlib import pyplot as plt
from numpy.lib import stride_tricks
import os
import pandas as pd
import scipy.io.wavfile as wav
esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')
esc50_df.head()
esc50_df['category'].value_counts()
def fourier_transformation(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))
def make_logscale(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)
import os
import pandas as pd
import scipy.io.wavfile as wav
import numpy as np
import matplotlib.pyplot as plt
# Your plot_spectrogram function
def plot_spectrogram(location, plotpath=None, binsize=2**10, colormap="jet"):
    samplerate, samples = wav.read(location)
    s = fourier_transformation(samples, binsize)
    sshow, freq = make_logscale(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel
# Load the CSV
esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')
# Create the folders
spectrogram_root = '/content/ESC-50-master/spectrogram'
os.makedirs(spectrogram_root, exist_ok=True)
for cat in esc50_df['category'].unique():
    os.makedirs(os.path.join(spectrogram_root, cat), exist_ok=True)
# Generate all the spectrograms
for i, row in esc50_df.iterrows():
    filename = row['filename']
    category = row['category']
    audio_path = os.path.join('/content/ESC-50-master/audio', filename)
    save_path = os.path.join(spectrogram_root, category, filename.replace('.wav', '.jpg'))
plot = plot_spectrogram('/content/ESC-50-master/audio/' + esc50_df[esc50_df['category'] == 'crow']['filename'].iloc[0])
conversion = []
for i in range(len(esc50_df.index)):
    filename = esc50_df['filename'].iloc[i]
    # Trailing slash added so the directory and file name join into a valid path
    location = '/content/ESC-50-master/audio/' + filename
    category = esc50_df['category'].iloc[i]
    catpath = '/content/ESC-50-master/spectrogram/' + category
    filepath = catpath + '/' + filename[:-4] + '.jpg'
conversion[0]
!pip install split-folders
import splitfolders
input_folder = '/content/ESC-50-master/spectrogram'
output = 'data'
splitfolders.ratio(input_folder, output=output, seed=42, ratio=(.8, .2))
testing = [
'data/test/helicopter.wav',
'data/test/cat.wav'
]
! pip...