@singalsu
Collaborator

WIP

The rename avoids possible conflicts with other math libraries.
It also prepares for adding a 32-bit square root function.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds a higher-precision 32-bit fractional integer
square root function to the SOF math library. The algorithm takes
its initial value from a lookup table and refines it with two
Newton-Raphson iterations. Both the input and the output are in
Q2.30 format. The format was chosen to match the number range of
the complex to polar conversion of Q1.31 complex values.
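
For illustration only, the approach described above could be sketched as
follows. This is not the SOF implementation: the function name
sqrt_q30_sketch(), the six-entry table, and the input normalization step
are all assumptions made to keep the example short and self-checking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical initial-guess table (not the SOF table): approximate
 * sqrt() of the midpoint of each 0.25-wide bin covering [0.5, 2.0),
 * stored as Q2.30.
 */
static const int32_t sqrt_init_q30[6] = {
	848867446,	/* ~sqrt(0.625) */
	1004393507,	/* ~sqrt(0.875) */
	1138875188,	/* ~sqrt(1.125) */
	1259073893,	/* ~sqrt(1.375) */
	1368757628,	/* ~sqrt(1.625) */
	1470281545,	/* ~sqrt(1.875) */
};

/* Q2.30 in, Q2.30 out; zero and negative inputs return 0 */
int32_t sqrt_q30_sketch(int32_t x)
{
	uint32_t norm;
	int64_t y;
	int shift = 0;
	int i;

	if (x <= 0)
		return 0;

	/* Assumed normalization step: scale into [0.5, 2.0) with an even
	 * shift, since sqrt(x * 4^k) = sqrt(x) * 2^k.
	 */
	norm = (uint32_t)x;
	while (norm < (1u << 29)) {
		norm <<= 2;
		shift++;
	}
	assert((norm >> 28) >= 2 && (norm >> 28) <= 7);

	/* Initial value from the lookup table */
	y = sqrt_init_q30[(norm >> 28) - 2];

	/* Two Newton-Raphson iterations: y = (y + x / y) / 2 */
	for (i = 0; i < 2; i++)
		y = (y + (((int64_t)norm << 30) / y)) >> 1;

	/* Undo the normalization */
	return (int32_t)(y >> shift);
}
```

With this coarse table the relative error after two iterations is in the
order of 1e-5, so a real implementation would likely use a finer table.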

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch helps with more generic use of complex numbers that is
not directly tied to the FFT domain. It prepares for adding the
polar complex number format that is commonly used in frequency
domain signal processing.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds the functions sofm_icomplex32_to_polar() and
sofm_ipolar32_to_complex(). The conversion to polar format turns
a Q1.31 (real, imag) pair into a (magnitude, angle) pair. The
magnitude is in Q2.30 format and the angle is in radians from
-pi to +pi in Q3.29 format.

The conversion to polar and back loses some precision, so
icomplex16 is currently not supported.
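
As a rough check of the format choice, the sketch below shows that the
extreme values fit: the magnitude of a Q1.31 complex value can approach
sqrt(2), which overflows Q1.31 but fits Q2.30, and pi overflows Q2.30 but
fits Q3.29. The helper names and the PV_ constants are hypothetical and
exist only for this example. They are not part of the SOF math library.

```c
#include <assert.h>
#include <stdint.h>

#define PV_PI		3.14159265358979323846
#define PV_SQRT2	1.41421356237309504880

/* Scale factor 2^frac_bits of a signed 32-bit Q format */
#define Q_SCALE(frac_bits) ((double)(1LL << (frac_bits)))

/* Returns 1 when v is representable with the given fractional bits */
int fits_q(double v, int frac_bits)
{
	double scaled = v * Q_SCALE(frac_bits);

	return scaled >= (double)INT32_MIN && scaled <= (double)INT32_MAX;
}

/* Convert a double to Q format with round-to-nearest */
int32_t double_to_q(double v, int frac_bits)
{
	double scaled = v * Q_SCALE(frac_bits);

	scaled += scaled >= 0 ? 0.5 : -0.5;
	assert(scaled >= (double)INT32_MIN && scaled <= (double)INT32_MAX);
	return (int32_t)scaled;
}

/* Convert a Q format value back to double */
double q_to_double(int32_t q, int frac_bits)
{
	return (double)q / Q_SCALE(frac_bits);
}
```

For example, double_to_q(PV_PI, 29) gives 1686629713, the Q3.29 value of
the largest angle.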

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The testbench quits after three file module copies that write no
data. The value is too low for components that accumulate more
than one LL period of data before producing output, or that can't
produce output at every copy.

The value 10 should better ensure that the testbench run does not
end too early. The testbench currently lacks the DP scheduler, so
this workaround allows modules designed for DP to be run.
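
The stop condition can be pictured with the minimal sketch below. The
names TB_MAX_IDLE_COPIES, tb_idle_tracker and tb_copy_done() are
hypothetical and not taken from the testbench sources. The sketch also
assumes that the counter restarts whenever a copy writes data, which the
text above does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name; the patch raises the limit from 3 to 10 */
#define TB_MAX_IDLE_COPIES 10

struct tb_idle_tracker {
	int idle_copies;	/* copies in a row with no data written */
};

/* Call after every file module copy; returns true when the run should end */
bool tb_copy_done(struct tb_idle_tracker *t, size_t bytes_written)
{
	assert(t);

	if (bytes_written > 0)
		t->idle_copies = 0;	/* assumed: any output resets the count */
	else
		t->idle_copies++;

	return t->idle_copies >= TB_MAX_IDLE_COPIES;
}
```

A module that needs several copies before its first output then survives
up to nine empty copies in a row.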

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch adds the Phase Vocoder SOF module. It provides render
speed control in the range 0.5-2.0x. The pitch is preserved when
the audio waveform is stretched or shortened. The module uses a
frequency domain algorithm in the STFT domain to interpolate the
magnitude and phase of the output IFFT frames from the input FFT
frames.
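
To make the interpolation idea concrete, here is a minimal sketch of the
textbook phase vocoder time-scaling step in floating point. It is an
assumption that the module follows this scheme. The function names are
hypothetical, and the actual code works with fixed-point polar data.

```c
#include <assert.h>
#include <stddef.h>

#define PV_TWO_PI 6.283185307179586

/* Wrap a phase value to [-pi, pi) */
double pv_wrap_phase(double p)
{
	while (p >= PV_TWO_PI / 2)
		p -= PV_TWO_PI;
	while (p < -PV_TWO_PI / 2)
		p += PV_TWO_PI;
	return p;
}

/* Input position, in hops, of output frame n at the given speed.
 * Returns the index of the earlier input frame and stores the
 * fractional distance to the next one in *frac.
 */
size_t pv_input_position(size_t n, double speed, double *frac)
{
	double t = (double)n * speed;
	size_t idx = (size_t)t;

	assert(speed > 0.0);
	*frac = t - (double)idx;
	return idx;
}

/* Build one output frame in polar form from adjacent input frames a and b.
 * phase_acc is the running output phase per bin. Initialize it with the
 * phase of the first input frame. It is advanced in place.
 */
void pv_interpolate_frame(const double *mag_a, const double *phase_a,
			  const double *mag_b, const double *phase_b,
			  double frac, double *mag_out, double *phase_out,
			  double *phase_acc, size_t bins)
{
	size_t k;

	assert(frac >= 0.0 && frac < 1.0);
	for (k = 0; k < bins; k++) {
		/* Linear interpolation of the magnitude */
		mag_out[k] = (1.0 - frac) * mag_a[k] + frac * mag_b[k];
		/* Use the accumulated phase for this output frame */
		phase_out[k] = phase_acc[k];
		/* Advance by the wrapped phase increment between a and b */
		phase_acc[k] = pv_wrap_phase(phase_acc[k] +
					     pv_wrap_phase(phase_b[k] - phase_a[k]));
	}
}
```

At speed 1.5, for example, output frame 3 falls halfway between input
frames 4 and 5, so pv_input_position(3, 1.5, &frac) returns 4 with
frac = 0.5.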

The render speed can be controlled with an enable/disable switch
and an enum control in steps of 0.1, or with finer precision with
a bytes control. (WIP)

The STFT parameters are configured with a bytes control blob. The
default is a 1024-point FFT with a hop of 256 and a Hann window.
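
The framing arithmetic that follows from these defaults can be sketched
as below. The struct and function names are hypothetical. The real blob
layout is not shown here and will differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter holder, not the actual blob layout */
struct stft_params_sketch {
	size_t fft_size;	/* window and FFT length in samples */
	size_t hop;		/* step between successive frames in samples */
};

/* Number of complete analysis frames available from n buffered samples */
size_t stft_frames_available(const struct stft_params_sketch *p, size_t n)
{
	assert(p->hop > 0 && p->fft_size >= p->hop);

	if (n < p->fft_size)
		return 0;

	return 1 + (n - p->fft_size) / p->hop;
}

/* Number of windows that cover each sample in steady state */
size_t stft_overlap_factor(const struct stft_params_sketch *p)
{
	assert(p->hop > 0);
	return p->fft_size / p->hop;
}
```

With the defaults (1024, 256) each sample is covered by four windows, and
the first frame is available once 1024 samples have been buffered.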

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
@singalsu
Collaborator Author

Fixed s16 processing and made some improvements. On the MTL platform the load is 175 - 313 MCPS, depending on the selected speed.

Collaborator

@lyakh left a comment

Do I understand it correctly that "vocoder" is a family of algorithms / audio processing methods, and this module implements one of them - speed control? Maybe call it vocoder_speed? Or does "Phase Vocoder" actually mean the same - only the speed processing part?

# SPDX-License-Identifier: BSD-3-Clause

config COMP_PHASE_VOCODER
	tristate "Phase Vocoder component"
Collaborator

do you want to make it default m if LIBRARY_DEFAULT_MODULAR=

# tests it can't use extra CONFIGs. See #9410, #8722 and #9386
CONFIG_COMP_GOOGLE_RTC_AUDIO_PROCESSING=m
CONFIG_GOOGLE_RTC_AUDIO_PROCESSING_MOCK=y
CONFIG_COMP_PHASE_VOCODER=y
Collaborator

if you make it modular by default, then please drop =y from all ACE 3.0+ platforms

fft_buf_ptr = fft->fft_buf;
for (j = 0; j < prev_data_size; j++) {
	fft_buf_ptr->real = prev_data[j];
	fft_buf_ptr->imag = 0;
Collaborator

I'm wondering if using *fft_buf_ptr = (struct icomplex32){.real = prev_data[j], .imag = 0} would give the compiler a better chance to optimise 64-bit writes, but maybe not.
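
For reference, the two forms under discussion can be put side by side as
in the sketch below. The struct is a local stand-in that only mirrors the
field names in the snippet, and the function names are hypothetical. Both
loops store the same values. Whether the compiler merges the two 32-bit
stores into one 64-bit store has to be checked from the generated code or
with a profiler.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in with the field names used in the reviewed snippet */
struct icomplex32 {
	int32_t real;
	int32_t imag;
};

/* Field-by-field stores, as in the patch */
void fill_real_fieldwise(struct icomplex32 *dst, const int32_t *src, int n)
{
	int j;

	assert(dst && src);
	for (j = 0; j < n; j++) {
		dst[j].real = src[j];
		dst[j].imag = 0;
	}
}

/* Whole-struct store with a C99 compound literal, as suggested above */
void fill_real_compound(struct icomplex32 *dst, const int32_t *src, int n)
{
	int j;

	assert(dst && src);
	for (j = 0; j < n; j++)
		dst[j] = (struct icomplex32){ .real = src[j], .imag = 0 };
}
```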

Collaborator Author

Thanks, I'll check with a profiler run! As the heaviest open-source component so far, this will need a lot of optimization.

FILE *stft_debug_ifft_out_fh;
#endif

__cold static void phase_vocoder_reset_parameters(struct processing_module *mod)
Collaborator

it is called from .reset(), so it shouldn't be __cold?

LOG_MODULE_REGISTER(phase_vocoder_common, CONFIG_SOF_LOG_LEVEL);

/*
* The main processing function for PHASE_VOCODER
Collaborator

to which function is this comment referring?

@singalsu
Collaborator Author

singalsu commented Feb 12, 2026

> Do I understand it correctly that "vocoder" is a family of algorithms / audio processing methods, and this module implements one of them - speed control? Maybe call it vocoder_speed? Or does "Phase Vocoder" actually mean the same - only the speed processing part?

I need to think about the name. Phase vocoder is more generic than this; the same technique also works for pitch shift. The history of the algorithm, as described by this Google AI response:

_"The phase vocoder algorithm, implemented in the Short-Time Fourier
Transform (STFT) domain, was first presented by James L. Flanagan and
Robert M. Golden in their 1966 paper, "Phase Vocoder," published in
The Journal of the Acoustical Society of America.

Original Paper: J. L. Flanagan and R. M. Golden, "Phase Vocoder,"
Journal of the Acoustical Society of America, vol. 40, no. 6,
pp. 1488, Nov. 1966.

Key Contribution: Flanagan and Golden proposed the phase vocoder as an
analysis-synthesis system that uses a filter bank (interpreted as a
sliding Short-Time Fourier Transform) to analyze speech by determining
the instantaneous phase and amplitude of signals within spectral
bands.

Significance: While earlier vocoders (Dudley, 1930s) used analog
hardware, the 1966 phase vocoder provided a digital, frequency-domain
approach to speech coding and processing, which later became the
foundation for time-stretching and pitch-shifting in audio
engineering."_

Name suggestions are welcome. Later in SOF we could also have a more generic architecture for frequency domain modules, with the STFT half spectrum streamed from module to module. Instead of samples we could stream spectral coefficients, e.g. in the 32-bit (magnitude, phase) format used here, which is quite generic for all frequency domain processing.
