Codec-SUPERB: Sound Codec Speech Processing Universal Performance Benchmark

Overview

Codec-SUPERB is a comprehensive benchmark designed to evaluate audio codec models across a variety of speech tasks. Our goal is to facilitate community collaboration and accelerate advancements in the field of speech processing by preserving and enhancing speech information quality.

Introduction

Codec-SUPERB sets a new benchmark in evaluating sound codec models, providing a rigorous and transparent framework for assessing performance across a range of speech processing tasks. Our goal is to foster innovation and set new standards in audio quality and processing efficiency.

Key Features

Out-of-the-Box Codec Interface

Codec-SUPERB offers an intuitive, out-of-the-box codec interface that allows for easy integration and testing of various codec models, facilitating quick iterations and experiments.

Multi-Perspective Leaderboard

Codec-SUPERB's unique blend of multi-perspective evaluation and an online leaderboard drives innovation in sound codec research by providing a comprehensive assessment and fostering competitive transparency among developers.

Standardized Environment

We ensure a standardized testing environment to guarantee fair and consistent comparison across all models. This uniformity brings reliability to benchmark results, making them universally interpretable.

Unified Datasets

We provide a collection of unified datasets, curated to test a wide range of speech processing scenarios. This ensures that models are evaluated under diverse conditions, reflecting real-world applications.

Batch Processing

🚀 NEW: Efficient Batch Processing Support

Codec-SUPERB now supports efficient batch processing for encoding and decoding multiple audio samples simultaneously, eliminating the need for per-sample loops and providing significant performance improvements.

✅ Key Benefits

  • 3-5x faster processing for multiple audio samples
  • GPU optimization through vectorized operations
  • Automatic padding for variable-length audio samples (see the sketch after this list)
  • Memory efficient batch operations
  • Backward compatible - existing code continues to work
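
The automatic padding is conceptually simple; a minimal sketch of how variable-length waveforms can be zero-padded into a single batch tensor (an illustration of the idea, not necessarily the library's internal implementation):

import torch
from torch.nn.utils.rnn import pad_sequence

# three mono waveforms of different lengths (dummy data)
waveforms = [torch.randn(16000), torch.randn(24000), torch.randn(8000)]
lengths = [w.shape[0] for w in waveforms]

# zero-pad to the longest sample -> shape (batch, max_length)
batch = pad_sequence(waveforms, batch_first=True, padding_value=0.0)

print(batch.shape)  # torch.Size([3, 24000])
print(lengths)      # original lengths, needed to trim padding after decoding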

✅ Supported Operations

  • batch_extract_unit(): Extract units from multiple audio samples at once
  • batch_decode_unit(): Decode multiple units back to audio at once
  • batch_synth(): Complete synthesis pipeline for multiple samples

✅ All Codecs Supported

Every codec in Codec-SUPERB includes optimized batch processing:

  • EnCodec (all variants): True tensor batching with automatic padding
  • SpeechTokenizer: RVQ-aware batch processing
  • AudioDec: Quantizer-optimized batch operations
  • HuggingFace EnCodec: Native transformer batch processing
  • Descript Audio Codec: Batch compression/decompression
  • SQCodec: Feature-aware batch encoding
  • FunCodec: AudioSignal batch handling
  • WavTokenizer: Bandwidth-aware batch processing
  • AcademicCodec: Acoustic token batch generation

Installation

git clone https://github.com/voidful/Codec-SUPERB.git
cd Codec-SUPERB
pip install -r requirements.txt

Usage

Single Audio Processing

Traditional single audio processing (still fully supported):

from SoundCodec import codec
import torchaudio

# list all available codecs
print(codec.list_codec())
# load a codec by name, using encodec as an example
encodec_24k_6bps = codec.load_codec('encodec_24k_6bps')

# load audio and take one channel as a mono numpy array
waveform, sample_rate = torchaudio.load('sample_audio.wav')
waveform_array = waveform.numpy()[-1]
data_item = {'audio': {'array': waveform_array,
                       'sampling_rate': sample_rate}}

# extract the discrete unit representation
sound_unit = encodec_24k_6bps.extract_unit(data_item).unit

# synthesize (decode) audio from the codec representation
decoded_waveform = encodec_24k_6bps.synth(data_item, local_save=False)['audio']['array']
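
To listen to the reconstruction, the decoded array can be written back to disk; a minimal sketch, assuming decoded_waveform is a 1-D float numpy array at the codec's 24 kHz sampling rate:

import torch
import torchaudio

# torchaudio.save expects a (channels, frames) tensor
decoded_tensor = torch.from_numpy(decoded_waveform).float().unsqueeze(0)
torchaudio.save('reconstructed_audio.wav', decoded_tensor, 24000)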

Batch Audio Processing

🚀 NEW: Process multiple audio samples efficiently:

from SoundCodec import codec
import torchaudio

# load codec
encodec_24k_6bps = codec.load_codec('encodec_24k_6bps')

# prepare multiple audio samples
audio_files = ['audio1.wav', 'audio2.wav', 'audio3.wav']
data_list = []

for audio_file in audio_files:
    waveform, sample_rate = torchaudio.load(audio_file)
    data_item = {
        'id': audio_file,
        'audio': {
            'array': waveform.numpy()[0],  # take first channel
            'sampling_rate': sample_rate
        }
    }
    data_list.append(data_item)

# OPTION 1: Batch extraction and decoding (recommended)
batch_extracted = encodec_24k_6bps.batch_extract_unit(data_list)
print(f"Extracted {batch_extracted.batch_size} samples")
print(f"Unit shapes: {[unit.shape for unit in batch_extracted.units]}")

batch_decoded = encodec_24k_6bps.batch_decode_unit(batch_extracted)
print(f"Decoded audio shapes: {[audio.shape for audio in batch_decoded]}")

# OPTION 2: Complete batch synthesis pipeline
results = encodec_24k_6bps.batch_synth(data_list, local_save=False)
for i, result in enumerate(results):
    print(f"Sample {i}: unit shape {result['unit'].shape}, "
          f"audio shape {result['audio']['array'].shape}")

Performance Comparison

Compare single vs batch processing performance:

import time

# Single processing (old approach)
start_time = time.time()
single_results = []
for data in data_list:
    extracted = encodec_24k_6bps.extract_unit(data)
    decoded = encodec_24k_6bps.decode_unit(extracted.stuff_for_synth)
    single_results.append(decoded)
single_time = time.time() - start_time

# Batch processing (new approach)  
start_time = time.time()
batch_extracted = encodec_24k_6bps.batch_extract_unit(data_list)
batch_results = encodec_24k_6bps.batch_decode_unit(batch_extracted)
batch_time = time.time() - start_time

print(f"Single processing: {single_time:.3f}s")
print(f"Batch processing: {batch_time:.3f}s") 
print(f"Speedup: {single_time/batch_time:.2f}x")

Advanced Batch Processing Tips

Group samples by length for optimal performance:

# Group samples by similar lengths (48000 samples = 2 seconds at 24 kHz)
short_samples = [data for data in data_list if len(data['audio']['array']) < 48000]
long_samples = [data for data in data_list if len(data['audio']['array']) >= 48000]

# Process each group separately for better efficiency
if short_samples:
    short_results = encodec_24k_6bps.batch_extract_unit(short_samples)
if long_samples:
    long_results = encodec_24k_6bps.batch_extract_unit(long_samples)
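
An alternative to a hard length threshold is to sort by length before batching, so each batch only pads up to its longest member; a one-line sketch:

# sort so that samples of similar length land in the same batch
data_list_sorted = sorted(data_list, key=lambda d: len(d['audio']['array']))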

Process large datasets in chunks:

def process_large_dataset(codec, data_list, batch_size=8):
    all_results = []
    for i in range(0, len(data_list), batch_size):
        batch = data_list[i:i+batch_size]
        batch_results = codec.batch_synth(batch, local_save=False)
        all_results.extend(batch_results)
    return all_results

# Process large dataset efficiently
large_results = process_large_dataset(encodec_24k_6bps, large_data_list)
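
Here large_data_list is whatever list of data items you have prepared; one way to build it from a folder of WAV files, sketched with a hypothetical audio_dir/ directory:

from pathlib import Path
import torchaudio

def load_directory(audio_dir):
    data_list = []
    for wav_path in sorted(Path(audio_dir).glob('*.wav')):
        waveform, sample_rate = torchaudio.load(str(wav_path))
        data_list.append({
            'id': wav_path.stem,
            'audio': {
                'array': waveform.numpy()[0],  # take the first channel
                'sampling_rate': sample_rate
            }
        })
    return data_list

large_data_list = load_directory('audio_dir')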

Testing

Run the test suite to verify codec functionality:

# Run all tests
python -m pytest SoundCodec/test/

# Run batch processing tests specifically
python -m pytest SoundCodec/test/test_batch_processing.py -v

# Run performance benchmarks
python SoundCodec/test/benchmark_batch_performance.py

Citation

If you use this code or the results in your paper, please cite our work as:

@article{wu2024codec,
  title={Codec-superb: An in-depth analysis of sound codec models},
  author={Wu, Haibin and Chung, Ho-Lam and Lin, Yi-Cheng and Wu, Yuan-Kuei and Chen, Xuanjun and Pai, Yu-Chi and Wang, Hsiu-Hsuan and Chang, Kai-Wei and Liu, Alexander H and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2402.13071},
  year={2024}
}
@article{wu2024towards,
  title={Towards audio language modeling-an overview},
  author={Wu, Haibin and Chen, Xuanjun and Lin, Yi-Cheng and Chang, Kai-wei and Chung, Ho-Lam and Liu, Alexander H and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2402.13236},
  year={2024}
}
@inproceedings{wu-etal-2024-codec,
    title = "Codec-{SUPERB}: An In-Depth Analysis of Sound Codec Models",
    author = "Wu, Haibin  and
      Chung, Ho-Lam  and
      Lin, Yi-Cheng  and
      Wu, Yuan-Kuei  and
      Chen, Xuanjun  and
      Pai, Yu-Chi  and
      Wang, Hsiu-Hsuan  and
      Chang, Kai-Wei  and
      Liu, Alexander  and
      Lee, Hung-yi",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.616",
    doi = "10.18653/v1/2024.findings-acl.616",
    pages = "10330--10348",
}

Contribution

Contributions are highly encouraged, whether it's through adding new codec models, expanding the dataset collection, or enhancing the benchmarking framework. Please see CONTRIBUTING.md for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Reference Sound Codec Repositories:
