Prerequisites
Exception report
Last 5 Keys:
& Space " C :
Exception:
System.ArgumentOutOfRangeException: The value must be greater than or equal to zero and less than the console's buffer size in that dimension.
Parameter name: left
Actual value was -2.
at System.Console.SetCursorPosition(Int32 left, Int32 top)
at Microsoft.PowerShell.Internal.VirtualTerminal.set_CursorLeft(Int32 value)
at Microsoft.PowerShell.PSConsoleReadLine.ReallyRender(RenderData renderData, String defaultColor)
at Microsoft.PowerShell.PSConsoleReadLine.ForceRender()
at Microsoft.PowerShell.PSConsoleReadLine.Insert(Char c)
at Microsoft.PowerShell.PSConsoleReadLine.SelfInsert(Nullable`1 key, Object arg)
at Microsoft.PowerShell.PSConsoleReadLine.ProcessOneKey(ConsoleKeyInfo key, Dictionary`2 dispatchTable, Boolean ignoreIfNoAction, Object arg)
at Microsoft.PowerShell.PSConsoleReadLine.InputLoop()
at Microsoft.PowerShell.PSConsoleReadLine.ReadLine(Runspace runspace, EngineIntrinsics engineIntrinsics)
Screenshot

Environment data
PS HostName: Visual Studio Code Host
PSReadLine EditMode: Windows
Steps to reproduce
import os
import librosa
import numpy as np
import sidekit

# Set your dataset directory
audio_dir = 'path-to-dataset'

# Initialize lists to store MFCC features and session identifiers
mfcc_features_list = []
session_list = []

# Step 1: Extract MFCC Features
print("Extracting MFCC features...")
for file_name in os.listdir(audio_dir):
    if file_name.endswith('.WAV'):
        file_path = os.path.join(audio_dir, file_name)
        # Load the audio file
        audio_signal, sample_rate = librosa.load(file_path, sr=16000)
        # Extract MFCC features
        mfcc_features = librosa.feature.mfcc(y=audio_signal, sr=sample_rate, n_mfcc=13)
        # Append the MFCC features to the list
        mfcc_features_list.append(mfcc_features)
        session_list.append(file_name)  # Use the file name as the session identifier

# Determine the maximum number of frames (time steps)
max_frames = max(mfcc.shape[1] for mfcc in mfcc_features_list)

# Pad the MFCC features so that they all have the same shape
padded_mfcc_features_list = [np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])), mode='constant')
                             for mfcc in mfcc_features_list]

# Convert the list to a numpy array
features = np.array(padded_mfcc_features_list)

# Concatenate features along the time axis
concatenated_features = np.concatenate(features, axis=1)
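The padding step above can be sketched in isolation (with made-up shapes, not the actual dataset) to show how `np.pad` brings variable-length MFCC matrices to a common frame count before stacking:

```python
import numpy as np

# Two fake "MFCC" matrices: 13 coefficients, different frame counts (5 and 8)
mfcc_list = [np.ones((13, 5)), np.ones((13, 8))]

# Pad each matrix on the right of the time axis up to the longest one
max_frames = max(m.shape[1] for m in mfcc_list)
padded = [np.pad(m, ((0, 0), (0, max_frames - m.shape[1])), mode='constant')
          for m in mfcc_list]

# Stacking now works because every matrix is (13, max_frames)
features = np.array(padded)
print(features.shape)                          # (2, 13, 8)
print(np.concatenate(features, axis=1).shape)  # (13, 16)
```

The padded region is zeros, which is worth remembering: those zero frames are fed to the UBM training below along with the real frames.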
# Step 2: Train the UBM Model
print("Training the UBM model...")
ubm = sidekit.Mixture()
distrib_nb = 512  # Number of distributions (mixtures)
llr = []  # Initialize log-likelihood ratio list
ubm.EM_split(concatenated_features, distrib_nb, session_list, llr)  # Train the UBM model using the EM algorithm
np.save('ubm_model.npy', ubm)  # Save the trained UBM model to a file

# Step 3: Train the TV Matrix
print("Training the TV Matrix...")
tv_matrix = sidekit.TotalVariability()
tv_matrix.factor_analyze(features, ubm)
np.save('tv_matrix.npy', tv_matrix)  # Save the trained TV matrix to a file

# Step 4: Extract and Save I-Vectors
print("Extracting and saving I-vectors...")
enrolled_ivectors = []
labels = ['Alice', 'Bob', 'Charlie']  # Replace with actual names
for feature in features:  # Iterate over each speaker's features
    ivector = sidekit.IVector()
    ivector.compute(feature, ubm, tv_matrix)
    enrolled_ivectors.append(ivector)

np.save('enrolled_ivectors.npy', np.array(enrolled_ivectors))
np.save('labels.npy', labels)

print("All models and vectors have been generated and saved.")
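One aside on the `np.save` calls above, for when the saved files are loaded back to test the model: `np.save` will happily pickle arbitrary Python objects (such as the trained model objects here), but `np.load` refuses pickled data unless `allow_pickle=True` is passed. A minimal sketch with stand-in data (the dict below is hypothetical, not sidekit's actual model layout):

```python
import os
import tempfile
import numpy as np

tmp = tempfile.mkdtemp()

# A plain string array (like the labels) loads without any extra flags
labels_path = os.path.join(tmp, 'labels.npy')
np.save(labels_path, ['Alice', 'Bob', 'Charlie'])
labels = np.load(labels_path)
print(labels.tolist())  # ['Alice', 'Bob', 'Charlie']

# An arbitrary Python object (a stand-in for a trained model) is pickled
# on save, so loading it back requires allow_pickle=True
model_path = os.path.join(tmp, 'model.npy')
np.save(model_path, {'means': np.zeros(2)})
model = np.load(model_path, allow_pickle=True).item()  # .item() unwraps the 0-d object array
print(sorted(model.keys()))  # ['means']
```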
Expected behavior
This code should train my model so I can run it and test it on my dataset.
Actual behavior
It does not work and gave the above error instead.