FENNEC (Feature Extractor with Neural Network for Efficient speech Comprehension) is an ultra-low-power bionic system-on-chip (SoC) that enables an always-on voice user interface for extreme edge devices.
ISSCC'25 | JSSC'25 | Project Page | Demo | Citation
This repo contains the source code of the behavioral model of FENNEC's mixed-signal feature extractor (FEx) and the hardware-aware training (HAT) pipeline. Please see the project page and our paper for more information on the role of behavioral modeling and HAT.
Model training and evaluation are based on the spoken language understanding dataset FSCD (Fluent Speech Commands Dataset). Data augmentation uses the (negative) speech samples, noise samples, and room impulse responses from DNS5.
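The augmentation itself is handled by the training pipeline; the snippet below is only a minimal sketch of the usual recipe (reverberate the speech with a room impulse response, then mix in noise at a target SNR). The function name, the `snr_db` parameter, and the mixing convention are illustrative assumptions, not the repo's actual API.

```python
import numpy as np
from scipy.signal import fftconvolve

def augment(speech, rir, noise, snr_db=10.0):
    """Reverberate `speech` with `rir` and add `noise` at `snr_db` dB SNR.

    All arguments are 1-D float arrays at the same sample rate. This is an
    illustrative sketch, not the augmentation code used for training.
    """
    reverbed = fftconvolve(speech, rir)[: len(speech)]   # apply reverberation
    noise = np.resize(noise, len(speech))                # loop/trim the noise
    # Scale the noise so that the mixture reaches the requested SNR.
    speech_power = np.mean(reverbed ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverbed + gain * noise
```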
Instructions for using the repo: Setup | FEx model | HAT
## Setup

- Clone the repo:

  ```bash
  git clone git@github.com:SensorsINI/fennec.git
  cd fennec/python
  ```

  The working directory of all following commands is `fennec/python` unless stated otherwise.
- Prepare the Python environment using conda:

  ```bash
  conda create -n fennec python=3.12
  conda activate fennec
  pip install -r requirements.txt
  ```

  The following commands assume that the conda environment `fennec` has been activated.
- Download FSCD using the Kaggle API (2.2GB after extraction):

  ```bash
  mkdir -p ../dataset
  kaggle datasets download -d tommyngx/fluent-speech-corpus -p ../dataset --unzip
  ```

- Download DNS5 (31GB after extraction):

  ```bash
  ./prepare_DNS5.sh
  ```
## FEx model

Visualize the simulated features generated by the behavioral model:

```bash
python -m fennec.visualize
```

The feature visualization app runs at http://127.0.0.1:8050. The app allows you to play with different configurations of the FEx behavioral model and visualize the generated features from FSCD samples. The interface is shown in the figure below.

## HAT

- Generate features and (pre)train a floating-point network from scratch:

  ```bash
  python -m fennec.train fennec/config/train-afex.yaml \
      --name <pretrain_name>
  ```

  Replace `<pretrain_name>` with a name of your choice, and the experiment data will be saved at `results/<pretrain_name>` under the repo root. The experiment data include the training log `log.txt`, hyperparameter settings `hyperparams.yaml`, model checkpoints `ckpt/`, and generated features `cache/`. The saved features can be reused in other experiments by specifying the `--cache_folder` argument, as shown in the command for the next step.
- Quantize the floating-point network with quantization-aware training (QAT) and apply Δ-GRU:

  ```bash
  python -m fennec.train fennec/config/retrain-afex.yaml \
      --name <quantize_name> \
      --pretrain_name <pretrain_name> \
      --cache_folder ../results/<pretrain_name>/seed/<seed>/cache \
      --thres_x 0.125 --thres_h 0.125
  ```

  Replace `<quantize_name>` with a name of your choice, and use the same `<pretrain_name>` as in the last step. `<seed>` is the experiment seed (default: 42). The Δ-GRU threshold (Δth) is set to `0.125` via `--thres_x` and `--thres_h`; a minimal sketch of the Δ-GRU thresholding idea is given after this list.
- Launch TensorBoard to visualize the training progress:

  ```bash
  tensorboard --logdir=../results
  ```

  TensorBoard runs at http://127.0.0.1:6006.
- Extract the test set accuracy from the training log in CSV format:

  ```bash
  ./parse_results_SSLU.py ../results/<exp_name>/seed/<seed>/log.txt
  ```

  Replace `<exp_name>` with the name of the training experiment. The script prints the test set accuracy in CSV format with four columns (a minimal sketch of how they can be computed is given after this list):

  - `intent_error_rate`: the edit distance between the predicted and reference intent sequences divided by the length of the reference intent sequence (i.e., 1), averaged over the test set. The name comes from the word error rate metric commonly used in speech recognition.
  - `match_exact`: the percentage of test samples where the predicted intent sequence matches the reference exactly.
  - `match_last`: the percentage of test samples where the last predicted intent matches the reference.
  - `match_any`: the percentage of test samples where at least one of the predicted intents matches the reference.
- Export the trained and quantized model as a C source file:

  ```bash
  python -m fennec.export fennec/config/export.yaml \
      --name <export_name> \
      --pretrain_name <quantize_name> \
      --cache_folder ../results/<pretrain_name>/seed/<seed>/cache \
      --thres_x 0.125 --thres_h 0.125
  ```

  Replace `<export_name>` with a name of your choice, and use the same `<pretrain_name>`, `<quantize_name>`, and Δth as in the previous steps. The exported file `../results/<export_name>/seed/<seed>/model.c` contains the fixed-point representation of the model parameters `stim_wmem`, as well as the per-channel offset `CH_OFFSET` and scale `CH_SCALE` for feature normalization. Everything FENNEC needs from HAT is contained in the exported `model.c`.
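As referenced in the QAT step above, the sketch below illustrates the delta-thresholding idea behind Δ-GRU and the `--thres_x`/`--thres_h` flags: a feature or hidden-state component contributes an update only when it has changed by more than Δth since the value it last propagated, which produces the temporal sparsity the RNN exploits. This is a conceptual NumPy sketch under that reading, not the implementation used in `fennec.train`.

```python
import numpy as np

def delta_encode(x_t, x_prev, threshold=0.125):
    """One step of delta thresholding (conceptual sketch of the Δ-GRU input path).

    Components that moved by less than `threshold` since the last propagated
    value produce a zero delta and can be skipped; only the components that
    "fire" update the memory `x_prev`.
    """
    delta = x_t - x_prev
    fire = np.abs(delta) >= threshold    # which components changed enough
    delta = np.where(fire, delta, 0.0)   # suppress sub-threshold changes
    return delta, x_prev + delta         # memory tracks only propagated values

# Example: a slowly drifting input yields mostly zero deltas (temporal sparsity).
rng = np.random.default_rng(0)
x_prev = np.zeros(8)
for t in range(5):
    x_t = 0.05 * t + 0.02 * rng.standard_normal(8)
    delta, x_prev = delta_encode(x_t, x_prev)
    print(t, int(np.count_nonzero(delta)), "non-zero deltas out of 8")
```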
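Similarly, the hypothetical helper below shows how the four CSV columns described in the result-parsing step can be computed from predicted and reference intent sequences. It is not `parse_results_SSLU.py`; the function names and example labels are made up, and `intent_error_rate` is computed with a plain Levenshtein edit distance.

```python
def edit_distance(pred, ref):
    """Levenshtein distance between two intent sequences (lists of labels)."""
    prev = list(range(len(ref) + 1))
    for i, p in enumerate(pred, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # delete p
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (p != r)))  # substitute
        prev = cur
    return prev[-1]

def sslu_metrics(predictions, references):
    """Compute (intent_error_rate, match_exact, match_last, match_any).

    Each element of `predictions`/`references` is one sample's intent sequence;
    for FSCD the reference sequence has length 1.
    """
    n = len(references)
    ier = sum(edit_distance(p, r) / len(r)
              for p, r in zip(predictions, references)) / n
    match_exact = 100 * sum(p == r for p, r in zip(predictions, references)) / n
    match_last = 100 * sum(bool(p) and p[-1] == r[0]
                           for p, r in zip(predictions, references)) / n
    match_any = 100 * sum(r[0] in p for p, r in zip(predictions, references)) / n
    return ier, match_exact, match_last, match_any

preds = [["A"], ["B", "A"], []]
refs = [["A"], ["A"], ["C"]]
print(sslu_metrics(preds, refs))  # approximately (0.67, 33.3, 66.7, 66.7)
```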
If you find our work useful, please consider citing our papers:
Conference publication in ISSCC'25:

```bibtex
@inproceedings{2025-ISSCC-Zhou-fennec,
author={Zhou, Sheng and Li, Zixiao and Delbruck, Tobi and Kim, Kwantae and Liu, Shih-Chii},
booktitle={2025 IEEE International Solid-State Circuits Conference (ISSCC)},
title={An 8.62{μW} {75dB-DR\textsubscript{SoC}} End-to-End Spoken-Language-Understanding {SoC} With Channel-Level {AGC} and Temporal-Sparsity-Aware Streaming-Mode {RNN}},
year={2025},
volume={68},
number={},
pages={238-240},
doi={10.1109/ISSCC49661.2025.10904788}
}
```

Invited journal extension in JSSC'25:

```bibtex
@article{2025-JSSC-Zhou-fennec,
author={Zhou, Sheng and Li, Zixiao and Cheng, Longbiao and Hadorn, Jérôme and Gao, Chang and Chen, Qinyu and Delbruck, Tobi and Kim, Kwantae and Liu, Shih-Chii},
journal={IEEE Journal of Solid-State Circuits},
title={An {8.62-μW} {75-dB} {DR\textsubscript{SoC}} Fully Integrated {SoC} for Spoken Language Understanding},
year={2025},
volume={},
number={},
pages={1-16},
doi={10.1109/JSSC.2025.3602936}
}
```