bug: utils.load_utils.extract_fbank 函數當遇到 雙聲道音檔時會錯誤

Notice: In order to resolve issues more efficiently, please raise issue following the template.
（注意：为了更加高效率解决您遇到的问题，请按照模板提问，补充细节）

## 🐛 Bug



https://github.com/modelscope/FunASR/blob/main/funasr/utils/load_utils.py#L198 extract_fbank 函數當遇到 雙聲道音檔時會錯誤。這是因為這個時候 `data.shape=[2, 11520]`，但是 `data_len=[11520]`，`data_len`應該要跟 `batch_size`一樣所以應該是 `data_len=[11520, 11520]`。

### To Reproduce

Steps to reproduce the behavior (**always include the command you ran**):

1. 下載這個 雙聲道音檔

[shuiqian1004_90.mp3](https://github.com/user-attachments/files/24855507/shuiqian1004_90.mp3)

2. 修改這個腳本使它執行上面的音檔 https://github.com/FunAudioLLM/Fun-ASR/blob/main/demo2.py
3. 看到錯誤：

```
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 609, in inference
    return self.inference_llm(
           ~~~~~~~~~~~~~~~~~~^
        data_in,
        ^^^^^^^^
    ...<4 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 627, in inference_llm
    inputs_embeds, contents, batch, source_ids, meta_data = self.inference_prepare(
                                                            ~~~~~~~~~~~~~~~~~~~~~~^
        data_in, data_lengths, key, tokenizer, frontend, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 483, in inference_prepare
    output = self.data_load_speech(
        contents, tokenizer, frontend, meta_data=meta_data, **kwargs
    )
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 391, in data_load_speech
    speech, speech_lengths = extract_fbank(
                             ~~~~~~~~~~~~~^
        data_src,
        ^^^^^^^^^
    ...<2 lines>...
        is_final=True,
        ^^^^^^^^^^^^^^
    )  # speech: [b, T, d]
    ^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/funasr/utils/load_utils.py", line 218, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
                     ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.13/site-packages/funasr/frontends/wav_frontend.py", line 128, in forward
    waveform_length = input_lengths[i]
                      ~~~~~~~~~~~~~^^^
IndexError: list index out of range
```




#### Code sample


```
import numpy as np
import soundfile as sf
import torch

from model import FunASRNano
from tools.utils import load_audio


def main():    
    wav_path = "shuiqian1004_90.mp3"

    model_dir = "FunAudioLLM/Fun-ASR-Nano-2512"
    device = (
        "cuda:0"
        if torch.cuda.is_available()
        else "mps"
        if torch.backends.mps.is_available()
        else "cpu"
    )
    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device=device)
    tokenizer = kwargs.get("tokenizer", None)
    m.eval()

    chunk_size = 0.72
    duration = sf.info(wav_path).duration
    cum_durations = np.arange(chunk_size, duration + chunk_size, chunk_size)
    prev_text = ""
    for idx, cum_duration in enumerate(cum_durations):
        audio, rate = load_audio(wav_path, 16000, duration=round(cum_duration, 3))
        prev_text = m.inference([torch.tensor(audio)], prev_text=prev_text, **kwargs)[0][0]["text"]
        if idx != len(cum_durations) - 1:
            prev_text = tokenizer.decode(tokenizer.encode(prev_text)[:-5]).replace("�", "")
        if prev_text:
            print(prev_text)


if __name__ == "__main__":
    main()
```

### Expected behavior


聽寫執行成功。

### Environment

 - OS (e.g., Linux): `Linux funasr 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun  5 18:30:46 UTC 2025 x86_64 GNU/Linux`
 - FunASR Version (e.g., 1.0.0): 1.3.0
 - ModelScope Version (e.g., 1.11.0): 1.34.0
 - PyTorch Version (e.g., 2.0.0): 2.10.0+cu128
 - How you installed funasr (`pip`, source): pip
 - Python version: 3.13.7
 - GPU (e.g., V100M32): 無，使用 CPU
 - CUDA/cuDNN version (e.g., cuda11.7): 無，使用 CPU
 - Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1): 無，使用 ubuntu
 - Any other relevant information:

### Additional context

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: utils.load_utils.extract_fbank 函數當遇到雙聲道音檔時會錯誤 #2793

🐛 Bug

To Reproduce

Code sample

Expected behavior

Environment

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

bug: utils.load_utils.extract_fbank 函數當遇到 雙聲道音檔時會錯誤 #2793

Description

🐛 Bug

To Reproduce

Code sample

Expected behavior

Environment

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

bug: utils.load_utils.extract_fbank 函數當遇到雙聲道音檔時會錯誤 #2793