FENNEC: ultra-low-power bionic speech processing

FENNEC (Feature Extractor with Neural Network for Efficient speech Comprehension) is an ultra-low-power bionic system-on-chip (SoC) that enables always-on voice user interface for extreme edge devices.

ISSCC'25 | JSSC'25 | Project Page | Demo | Citation


This repo contains the source code of the behavioral model of FENNEC's mixed-signal feature extractor (FEx) and the hardware-aware training (HAT) pipeline. Please see the project page and our paper for more information on the role of behavioral modeling and HAT.

Model training and evaluation are based on the spoken language understanding dataset FSCD (Fluent Speech Commands Dataset). Data augmentation uses the (negative) speech samples, noise samples, and room impulse responses from DNS5.
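
For intuition only, the sketch below shows one common way such augmentation is done: convolve a clean utterance with a room impulse response and add noise scaled to a target SNR. It is not the pipeline used in this repo; the function name, array names, and the 10 dB SNR default are illustrative assumptions.

    import numpy as np
    from scipy.signal import fftconvolve

    def augment(speech, noise, rir, snr_db=10.0):
        # Illustrative augmentation sketch (not this repo's pipeline):
        # reverberate the clean utterance, then add noise at a target SNR.
        # `speech`, `noise`, and `rir` are 1-D float arrays at the same rate.
        reverberant = fftconvolve(speech, rir)[: len(speech)]

        # Loop/crop the noise clip to match the utterance length.
        reps = int(np.ceil(len(reverberant) / len(noise)))
        noise = np.tile(noise, reps)[: len(reverberant)]

        # Scale the noise so the mixture reaches the requested SNR.
        speech_power = np.mean(reverberant ** 2) + 1e-12
        noise_power = np.mean(noise ** 2) + 1e-12
        gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
        return reverberant + gain * noise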

Instructions for using the repo: Setup | FEx model | HAT

Getting started

Setup environment and datasets

  • Clone the repo:

    git clone [email protected]:SensorsINI/fennec.git
    cd fennec/python

    The working directory for all following commands is fennec/python unless stated otherwise.

  • Prepare the Python environment using conda:

    conda create -n fennec python=3.12
    conda activate fennec
    pip install -r requirements.txt

    The following commands assume that the conda environment fennec has been activated.

  • Download FSCD using the Kaggle API (2.2GB after extraction):

    mkdir -p ../dataset
    kaggle datasets download -d tommyngx/fluent-speech-corpus -p ../dataset --unzip
  • Download DNS5 (31GB after extraction):

    ./prepare_DNS5.sh

Feature visualization

Visualize the simulated features generated by the behavioral model:

python -m fennec.visualize

The feature visualization app runs at http://127.0.0.1:8050. The app allows you to play with different configurations of the FEx behavioral model and visualize the generated features from FSCD samples. The interface is shown in the figure below.

(figure: feature visualization app interface)

Model training

  • Generate features and (pre)train a floating point network from scratch:

    python -m fennec.train fennec/config/train-afex.yaml \
      --name <pretrain_name>

    Replace <pretrain_name> with a name of your choice; the experiment data will be saved at results/<pretrain_name> under the repo root. The experiment data include the training log log.txt, the hyperparameter settings hyperparams.yaml, the model checkpoints ckpt/, and the generated features cache/. The saved features can be reused in other experiments by specifying the --cache_folder argument, as shown in the command for the next step.

  • Quantize the floating point network with quantization-aware training (QAT) and apply Δ-GRU:

    python -m fennec.train fennec/config/retrain-afex.yaml \
      --name <quantize_name> \
      --pretrain_name <pretrain_name> \
      --cache_folder ../results/<pretrain_name>/seed/<seed>/cache \
      --thres_x 0.125 --thres_h 0.125

    Replace <quantize_name> with a name of your choice, and use the same <pretrain_name> as in the previous step. <seed> is the experiment seed (default: 42). The threshold for Δ-GRU (Δth) is set to 0.125. A minimal sketch of the Δ-GRU thresholding rule follows below.
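
    The sketch below illustrates only the Δ-GRU thresholding rule: an input or hidden-state element is propagated to the matrix-vector products only when it has changed by more than Δth since the value last propagated, which creates the temporal sparsity the SoC exploits. It is a conceptual sketch, not this repo's training code, and all names in it are illustrative.

    import numpy as np

    def delta_encode(x, x_ref, thres=0.125):
        # Sketch of Δ-GRU delta encoding: forward only elements whose change
        # since the last forwarded value exceeds the threshold Δth (thres).
        changed = np.abs(x - x_ref) > thres
        delta = np.where(changed, x - x_ref, 0.0)   # sparse update vector
        x_ref = np.where(changed, x, x_ref)         # updated reference state
        return delta, x_ref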

  • Launch tensorboard to visualize the training progress:

    tensorboard --logdir=../results

    TensorBoard runs at http://127.0.0.1:6006.

  • Extract test set accuracy from the training log in CSV format:

    ./parse_results_SSLU.py ../results/<exp_name>/seed/<seed>/log.txt

    Replace <exp_name> with the name of the training experiment. The script prints the test set accuracy in CSV format with four columns: intent_error_rate, match_exact, match_last, match_any. These metrics are defined below and illustrated by the sketch after the list.

    • intent_error_rate is the edit distance between the predicted and reference intent sequences divided by the length of the reference intent sequence (i.e., 1), averaged over the test set. The name comes from the commonly used word error rate metric for speech recognition.
    • match_exact is the percentage of test samples where the predicted intent sequence matches the reference exactly.
    • match_last is the percentage of test samples where the last predicted intent matches the reference.
    • match_any is the percentage of test samples where at least one of the predicted intents matches the reference.
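
    The sketch below spells out how these four metrics could be computed from per-utterance predicted and reference intent sequences. It is an illustration, not the actual parse_results_SSLU.py; it assumes single-intent reference sequences as described above, and it returns fractions rather than percentages.

    def edit_distance(a, b):
        # Dynamic-programming Levenshtein distance between two sequences.
        d = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            prev, d[0] = d[0], i
            for j, y in enumerate(b, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
        return d[len(b)]

    def slu_metrics(predictions, references):
        # predictions/references: lists of intent-label sequences, one per utterance.
        n = len(references)
        ier = sum(edit_distance(p, r) / len(r)
                  for p, r in zip(predictions, references)) / n
        match_exact = sum(p == r for p, r in zip(predictions, references)) / n
        match_last = sum(bool(p) and p[-1] == r[-1]
                         for p, r in zip(predictions, references)) / n
        match_any = sum(any(x in r for x in p)
                        for p, r in zip(predictions, references)) / n
        return ier, match_exact, match_last, match_any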
  • Export the trained and quantized model as a C source file:

    python -m fennec.export fennec/config/export.yaml \
      --name <export_name> \
      --pretrain_name <quantize_name> \
      --cache_folder ../results/<pretrain_name>/seed/<seed>/cache \
      --thres_x 0.125 --thres_h 0.125

    Replace <export_name> with a name of your choice. Use the same <pretrain_name>, <quantize_name>, and Δth as in the previous steps. The exported file ../results/<export_name>/seed/<seed>/model.c contains the fixed-point representation of the model parameters (stim_wmem), as well as the per-channel offset (CH_OFFSET) and scale (CH_SCALE) for feature normalization. Everything FENNEC needs from HAT is contained in the exported model.c.
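
    As a rough picture of how the exported normalization constants would be used (an assumption for illustration, not the SoC's exact fixed-point arithmetic), each feature channel is shifted by its CH_OFFSET entry and scaled by its CH_SCALE entry before entering the network:

    import numpy as np

    # Hypothetical placeholder constants; the real values come from model.c.
    CH_OFFSET = np.zeros(16)
    CH_SCALE = np.ones(16)

    def normalize(features):
        # features: (num_frames, num_channels) array of raw FEx outputs.
        # Subtract the per-channel offset, then apply the per-channel scale.
        return (features - CH_OFFSET) * CH_SCALE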

Citation

If you find our work useful, please consider citing our papers:

Conference publication in ISSCC'25:

@inproceedings{2025-ISSCC-Zhou-fennec,
    author={Zhou, Sheng and Li, Zixiao and Delbruck, Tobi and Kim, Kwantae and Liu, Shih-Chii},
    booktitle={2025 IEEE International Solid-State Circuits Conference (ISSCC)},
    title={An 8.62{μW} {75dB-DR\textsubscript{SoC}} End-to-End Spoken-Language-Understanding {SoC} With Channel-Level {AGC} and Temporal-Sparsity-Aware Streaming-Mode {RNN}},
    year={2025},
    volume={68},
    number={},
    pages={238-240},
    doi={10.1109/ISSCC49661.2025.10904788}
}

Invited journal extension in JSSC'25:

@article{2025-JSSC-Zhou-fennec,
    author={Zhou, Sheng and Li, Zixiao and Cheng, Longbiao and Hadorn, Jérôme and Gao, Chang and Chen, Qinyu and Delbruck, Tobi and Kim, Kwantae and Liu, Shih-Chii},
    journal={IEEE Journal of Solid-State Circuits},
    title={An {8.62-μW} {75-dB} {DR\textsubscript{SoC}} Fully Integrated {SoC} for Spoken Language Understanding},
    year={2025},
    volume={},
    number={},
    pages={1-16},
    doi={10.1109/JSSC.2025.3602936}
}
