FENNEC: ultra-low-power bionic speech processing

FENNEC (Feature Extractor with Neural Network for Efficient speech Comprehension) is an ultra-low-power bionic system-on-chip (SoC) that enables always-on voice user interface for extreme edge devices.

ISSCC'25 | JSSC'25 | Project Page | Demo | Citation


This repo contains the source code of the behavioral model of FENNEC's mixed-signal feature extractor (FEx) and the hardware-aware training (HAT) pipeline. Please see the project page and our paper for more information on the role of behavioral modeling and HAT.

Model training and evaluation are based on the spoken language understanding dataset FSCD (Fluent Speech Commands Dataset). Data augmentation uses the (negative) speech samples, noise samples, and room impulse responses from DNS5.
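
For intuition only, the sketch below shows one common way such augmentation is done: convolve a clean utterance with a room impulse response and add noise scaled to a target SNR. It is not the pipeline used in this repo; the function name, array names, and the 10 dB SNR default are illustrative assumptions.

    import numpy as np
    from scipy.signal import fftconvolve

    def augment(speech, noise, rir, snr_db=10.0):
        # Illustrative augmentation sketch (not this repo's pipeline):
        # reverberate the clean utterance, then add noise at a target SNR.
        # `speech`, `noise`, and `rir` are 1-D float arrays at the same rate.
        reverberant = fftconvolve(speech, rir)[: len(speech)]

        # Loop/crop the noise clip to match the utterance length.
        reps = int(np.ceil(len(reverberant) / len(noise)))
        noise = np.tile(noise, reps)[: len(reverberant)]

        # Scale the noise so the mixture reaches the requested SNR.
        speech_power = np.mean(reverberant ** 2) + 1e-12
        noise_power = np.mean(noise ** 2) + 1e-12
        gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
        return reverberant + gain * noise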

Instructions for using the repo: Setup | FEx model | HAT

Getting started

Setup environment and datasets

  • Clone the repo:

    git clone [email protected]:SensorsINI/fennec.git
    cd fennec/python

    The working directory for all following commands is fennec/python unless stated otherwise.

  • Prepare the Python environment using conda:

    conda create -n fennec python=3.12
    conda activate fennec
    pip install -r requirements.txt

    The following commands assume that the conda environment fennec has been activated.

  • Download FSCD using the Kaggle API (2.2GB after extraction):

    mkdir -p ../dataset
    kaggle datasets download -d tommyngx/fluent-speech-corpus -p ../dataset --unzip
  • Download DNS5 (31GB after extraction):

    ./prepare_DNS5.sh

Feature visualization

Visualize the simulated features generated by the behavioral model:

python -m fennec.visualize

The feature visualization app runs at http://127.0.0.1:8050. The app allows you to play with different configurations of the FEx behavioral model and visualize the generated features from FSCD samples. The interface is shown in the figure below.

(figure: feature visualization app interface)

Model training

  • Generate features and (pre)train a floating point network from scratch:

    python -m fennec.train fennec/config/train-afex.yaml \
      --name <pretrain_name>

    Replace <pretrain_name> with a name of your choice; the experiment data will be saved at results/<pretrain_name> under the repo root. The experiment data include the training log log.txt, the hyperparameter settings hyperparams.yaml, the model checkpoints ckpt/, and the generated features cache/. The saved features can be reused in other experiments by specifying the --cache_folder argument, as shown in the command for the next step.

  • Quantize the floating point network with quantization-aware training (QAT) and apply Δ-GRU:

    python -m fennec.train fennec/config/retrain-afex.yaml \
      --name <quantize_name> \
      --pretrain_name <pretrain_name> \
      --cache_folder ../results/<pretrain_name>/seed/<seed>/cache \
      --thres_x 0.125 --thres_h 0.125

    Replace <quantize_name> with a name of your choice, and use the same <pretrain_name> as in the previous step. <seed> is the experiment seed (default: 42). The threshold for Δ-GRU (Δth) is set to 0.125. A minimal sketch of the Δ-GRU thresholding rule follows below.
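
    The sketch below illustrates only the Δ-GRU thresholding rule: an input or hidden-state element is propagated to the matrix-vector products only when it has changed by more than Δth since the value last propagated, which creates the temporal sparsity the SoC exploits. It is a conceptual sketch, not this repo's training code, and all names in it are illustrative.

    import numpy as np

    def delta_encode(x, x_ref, thres=0.125):
        # Sketch of Δ-GRU delta encoding: forward only elements whose change
        # since the last forwarded value exceeds the threshold Δth (thres).
        changed = np.abs(x - x_ref) > thres
        delta = np.where(changed, x - x_ref, 0.0)   # sparse update vector
        x_ref = np.where(changed, x, x_ref)         # updated reference state
        return delta, x_ref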

  • Launch tensorboard to visualize the training progress:

    tensorboard --logdir=../results

    TensorBoard runs at http://127.0.0.1:6006.

  • Extract test set accuracy from the training log in CSV format:

    ./parse_results_SSLU.py ../results/<exp_name>/seed/<seed>/log.txt

    Replace <exp_name> with the name of the training experiment. The script prints the test set accuracy in CSV format with four columns: intent_error_rate, match_exact, match_last, match_any. These metrics are defined below and illustrated by the sketch after the list.

    • intent_error_rate is the edit distance between the predicted and reference intent sequences divided by the length of the reference intent sequence (i.e., 1), averaged over the test set. The name comes from the commonly used word error rate metric for speech recognition.
    • match_exact is the percentage of test samples where the predicted intent sequence matches the reference exactly.
    • match_last is the percentage of test samples where the last predicted intent matches the reference.
    • match_any is the percentage of test samples where at least one of the predicted intents matches the reference.
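
    The sketch below spells out how these four metrics could be computed from per-utterance predicted and reference intent sequences. It is an illustration, not the actual parse_results_SSLU.py; it assumes single-intent reference sequences as described above, and it returns fractions rather than percentages.

    def edit_distance(a, b):
        # Dynamic-programming Levenshtein distance between two sequences.
        d = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            prev, d[0] = d[0], i
            for j, y in enumerate(b, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
        return d[len(b)]

    def slu_metrics(predictions, references):
        # predictions/references: lists of intent-label sequences, one per utterance.
        n = len(references)
        ier = sum(edit_distance(p, r) / len(r)
                  for p, r in zip(predictions, references)) / n
        match_exact = sum(p == r for p, r in zip(predictions, references)) / n
        match_last = sum(bool(p) and p[-1] == r[-1]
                         for p, r in zip(predictions, references)) / n
        match_any = sum(any(x in r for x in p)
                        for p, r in zip(predictions, references)) / n
        return ier, match_exact, match_last, match_any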
  • Export the trained and quantized model as a C source file:

    python -m fennec.export fennec/config/export.yaml \
      --name <export_name> \
      --pretrain_name <quantize_name> \
      --cache_folder ../results/<pretrain_name>/seed/<seed>/cache \
      --thres_x 0.125 --thres_h 0.125

    Replace <export_name> with a name of your choice. Use the same <pretrain_name>, <quantize_name>, and Δth as in the previous steps. The exported file ../results/<export_name>/seed/<seed>/model.c contains the fixed-point representation of the model parameters (stim_wmem), as well as the per-channel offset (CH_OFFSET) and scale (CH_SCALE) for feature normalization. Everything FENNEC needs from HAT is contained in the exported model.c.
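
    As a rough picture of how the exported normalization constants would be used (an assumption for illustration, not the SoC's exact fixed-point arithmetic), each feature channel is shifted by its CH_OFFSET entry and scaled by its CH_SCALE entry before entering the network:

    import numpy as np

    # Hypothetical placeholder constants; the real values come from model.c.
    CH_OFFSET = np.zeros(16)
    CH_SCALE = np.ones(16)

    def normalize(features):
        # features: (num_frames, num_channels) array of raw FEx outputs.
        # Subtract the per-channel offset, then apply the per-channel scale.
        return (features - CH_OFFSET) * CH_SCALE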

Citation

If you find our work useful, please consider citing our papers:

Conference publication in ISSCC'25:

@inproceedings{2025-ISSCC-Zhou-fennec,
    author={Zhou, Sheng and Li, Zixiao and Delbruck, Tobi and Kim, Kwantae and Liu, Shih-Chii},
    booktitle={2025 IEEE International Solid-State Circuits Conference (ISSCC)},
    title={An 8.62{μW} {75dB-DR\textsubscript{SoC}} End-to-End Spoken-Language-Understanding {SoC} With Channel-Level {AGC} and Temporal-Sparsity-Aware Streaming-Mode {RNN}},
    year={2025},
    volume={68},
    number={},
    pages={238-240},
    doi={10.1109/ISSCC49661.2025.10904788}
}

Invited journal extension in JSSC'25:

@article{2025-JSSC-Zhou-fennec,
    author={Zhou, Sheng and Li, Zixiao and Cheng, Longbiao and Hadorn, Jérôme and Gao, Chang and Chen, Qinyu and Delbruck, Tobi and Kim, Kwantae and Liu, Shih-Chii},
    journal={IEEE Journal of Solid-State Circuits},
    title={An {8.62-μW} {75-dB} {DR\textsubscript{SoC}} Fully Integrated {SoC} for Spoken Language Understanding},
    year={2025},
    volume={},
    number={},
    pages={1-16},
    doi={10.1109/JSSC.2025.3602936}
}
