Voice Wake-Up Feature
Overview
py-xiaozhi integrates Sherpa-ONNX-based voice wake-up with support for custom wake words and real-time detection. It uses a lightweight keyword-spotting model designed for low-latency response.
Wake Word Model
Built-in Models (Ready to Use)
The repository already includes Sherpa-ONNX keyword detection models in the models/zh and models/en directories, ready to use:
- models/zh: Chinese model (the default MODEL_PATH). Its keywords.txt comes pre-configured with the 「小爱同学」 wake word, works with WAKE_WORD_LANG: "zh", and supports adding extra pinyin lines.
- models/en: English model (BPE units). Its keywords.txt defaults to the MOSS activation word and works with WAKE_WORD_LANG: "en".
- Each directory already contains encoder.onnx / decoder.onnx / joiner.onnx / tokens.txt / keywords.txt; no additional downloads are needed to run.
- To customize wake words, edit keywords.txt in the corresponding language directory and switch MODEL_PATH and WAKE_WORD_LANG in WAKE_WORD_OPTIONS.
| Language | MODEL_PATH | WAKE_WORD_LANG | Default keywords.txt Activation Word | Example WAKE_WORD Value |
|---|---|---|---|---|
| Chinese | models/zh | zh | 小爱同学 | 小爱同学 |
| English | models/en | en | MOSS | MOSS |
When switching languages, you must simultaneously:
- Update WAKE_WORD_OPTIONS.MODEL_PATH to point to the target language directory;
- Set WAKE_WORD_OPTIONS.WAKE_WORD_LANG to zh or en;
- Update WAKE_WORD_OPTIONS.WAKE_WORD to a phrase present in the current keywords.txt (e.g., MOSS or 小爱同学);
- If you edit keywords.txt, ensure the new wake word's spelling/pinyin matches the configuration, otherwise it will not be detected.
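The rules above can be checked mechanically before launching the app. The following is a minimal sketch, not part of py-xiaozhi: the function names `read_keyword_labels` and `check_wake_word_options` are hypothetical, and it assumes that each keyword line ends with `@` followed by the display text, as in the shipped keywords.txt files.

```python
from pathlib import Path

# File names taken from the "Required files" list in this document.
REQUIRED_FILES = ("encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt", "keywords.txt")


def read_keyword_labels(keywords_file):
    """Return the text after '@' on each non-empty, non-comment line."""
    labels = []
    for line in Path(keywords_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "@" not in line:
            continue
        labels.append(line.rsplit("@", 1)[1].strip())
    return labels


def check_wake_word_options(options, project_root="."):
    """Return a list of problems; an empty list means the options are consistent."""
    problems = []
    model_dir = Path(project_root) / options["MODEL_PATH"]
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    if missing:
        problems.append(f"missing files in {model_dir}: {', '.join(missing)}")
    if options["WAKE_WORD_LANG"] not in ("zh", "en"):
        problems.append("WAKE_WORD_LANG must be 'zh' or 'en'")
    keywords_file = model_dir / "keywords.txt"
    if keywords_file.is_file():
        if options["WAKE_WORD"] not in read_keyword_labels(keywords_file):
            problems.append(f"WAKE_WORD {options['WAKE_WORD']!r} not found in {keywords_file}")
    return problems
```

Passing the WAKE_WORD_OPTIONS dictionary loaded from config/config.json should return an empty list when the three fields and the model directory agree.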
Only follow the steps below if you want to update the model version or replace it with another training set.
Model Download (Optional)
Important: If you replace the model, download and configure the new files before enabling wake-up.
Official Model Download Links
- Official Model List: https://csukuangfj.github.io/sherpa/onnx/kws/pretrained_models/index.html
- Recommended Model: sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01
Download and Configuration Steps
1. Download the Model Package
# Method 1: Direct download (recommended)
cd /path/to/py-xiaozhi  # project root; adjust to your checkout
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
# Extract
tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
# Method 2: Use ModelScope
pip install modelscope
python -c "
from modelscope import snapshot_download
snapshot_download('pkufool/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01', cache_dir='./models')
"
2. Configure Model Files
After downloading, the model package contains the following files:
sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/
├── encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx # Speed priority
├── encoder-epoch-12-avg-2-chunk-16-left-64.onnx #
├── encoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx # Speed priority
├── encoder-epoch-99-avg-1-chunk-16-left-64.onnx # Accuracy priority
├── decoder-epoch-12-avg-2-chunk-16-left-64.onnx #
├── decoder-epoch-99-avg-1-chunk-16-left-64.onnx # Accuracy priority
├── joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx # Speed priority
├── joiner-epoch-12-avg-2-chunk-16-left-64.onnx #
├── joiner-epoch-99-avg-1-chunk-16-left-64.int8.onnx # Speed priority
├── joiner-epoch-99-avg-1-chunk-16-left-64.onnx # Accuracy priority
├── tokens.txt # Token mapping table (required)
├── keywords_raw.txt # Raw keywords (optional, for generation)
├── keywords.txt # Ready-to-use
├── test_wavs/ # Test audio (optional)
├── configuration.json # Model metadata (optional)
└── README.md # Documentation (optional)
3. Choose Configuration Plan
Option 1: Accuracy Priority (Recommended)
cd sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01
# Copy accuracy-priority epoch-99 fp32 trio
cp encoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/encoder.onnx
cp decoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/decoder.onnx
cp joiner-epoch-99-avg-1-chunk-16-left-64.onnx ../models/joiner.onnx
# Copy supporting files
cp tokens.txt ../models/tokens.txt
cp keywords_raw.txt ../models/keywords_raw.txt # Optional
Option 2: Speed Priority
cd sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01
# Copy speed-priority epoch-99 int8 trio
cp encoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx ../models/encoder.onnx
cp decoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/decoder.onnx
cp joiner-epoch-99-avg-1-chunk-16-left-64.int8.onnx ../models/joiner.onnx
# Copy supporting files
cp tokens.txt ../models/tokens.txt
Important Notes:
- Do NOT mix fp32 and int8 between the encoder and joiner: both must use the same precision. The package listing above includes only an fp32 decoder, so both options reuse it
- Prefer epoch-99: More thoroughly trained than epoch-12, with higher accuracy
- Required files: encoder.onnx + decoder.onnx + joiner.onnx + tokens.txt + keywords.txt
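The copy-and-rename steps in Options 1 and 2 can also be scripted. The sketch below is an illustration only: `select_model_files` and `install_model` are hypothetical helper names, and the file-name pattern is inferred from the package listing above (epoch-99 uses avg-1, epoch-12 uses avg-2).

```python
import shutil
from pathlib import Path

CHUNK_SUFFIX = "chunk-16-left-64"


def select_model_files(epoch=99, avg=1, int8=False):
    """Map target file names to source file names for one epoch/precision choice."""
    stem = f"epoch-{epoch}-avg-{avg}-{CHUNK_SUFFIX}"
    quant = ".int8" if int8 else ""
    return {
        "encoder.onnx": f"encoder-{stem}{quant}.onnx",
        # The listing above shows only an fp32 decoder, so it is used in both cases.
        "decoder.onnx": f"decoder-{stem}.onnx",
        "joiner.onnx": f"joiner-{stem}{quant}.onnx",
        "tokens.txt": "tokens.txt",
    }


def install_model(package_dir, target_dir, epoch=99, avg=1, int8=False):
    """Copy the selected files into target_dir under the fixed names."""
    package_dir, target_dir = Path(package_dir), Path(target_dir)
    mapping = select_model_files(epoch, avg, int8)
    missing = [src for src in mapping.values() if not (package_dir / src).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {package_dir}: {', '.join(missing)}")
    target_dir.mkdir(parents=True, exist_ok=True)
    for dst, src in mapping.items():
        shutil.copyfile(package_dir / src, target_dir / dst)
    return sorted(mapping)
```

Checking for missing files before copying avoids leaving a half-populated models directory; keywords.txt is deliberately not copied here because it is maintained separately.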
Final Model File Structure
After configuration, your models directory should contain:
models/
├── encoder.onnx # Encoder model (renamed)
├── decoder.onnx # Decoder model (renamed)
├── joiner.onnx # Joiner model (renamed)
├── tokens.txt # Pinyin token mapping table (228-line version)
├── keywords.txt # Keyword configuration file (needs creation)
└── keywords_raw.txt # Raw keyword file (optional)
Model Performance Comparison
| Model Version | File Size | Inference Speed | Accuracy | Resource Usage | Recommended Scenario |
|---|---|---|---|---|---|
| epoch-99 fp32 | ~13MB | Medium | Highest | Medium | Desktop (Recommended) |
| epoch-99 int8 | ~4MB | Fast | High | Low | Mobile / resource-constrained |
| epoch-12 fp32 | ~13MB | Medium | Medium-High | Medium | General use |
| epoch-12 int8 | ~4MB | Fastest | Medium | Lowest | Extreme speed requirements |
Enabling Voice Wake-Up
Configuration File Settings
Edit config/config.json:
{
"WAKE_WORD_OPTIONS": {
"USE_WAKE_WORD": true,
"MODEL_PATH": "models/zh",
"NUM_THREADS": 4,
"PROVIDER": "cpu",
"MAX_ACTIVE_PATHS": 2,
"KEYWORDS_SCORE": 1.8,
"KEYWORDS_THRESHOLD": 0.2,
"NUM_TRAILING_BLANKS": 1,
"WAKE_WORD": "小爱同学",
"WAKE_WORD_LANG": "zh"
}
}
To switch to English wake words, simply change MODEL_PATH to models/en, WAKE_WORD to MOSS, and set WAKE_WORD_LANG to en. Regardless of the language chosen, WAKE_WORD must match one of the lines in the corresponding keywords.txt.
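If you prefer to script this change rather than edit the file by hand, a small helper can merge new values into WAKE_WORD_OPTIONS. This is a sketch under assumptions: `apply_wake_word_overrides` and `ENGLISH_OVERRIDES` are hypothetical names, and it assumes config/config.json is plain JSON that is not being rewritten by a running instance of the app.

```python
import json
from pathlib import Path

# Values for the English model, as given in the language table of this document.
ENGLISH_OVERRIDES = {"MODEL_PATH": "models/en", "WAKE_WORD": "MOSS", "WAKE_WORD_LANG": "en"}


def apply_wake_word_overrides(config_path, overrides):
    """Merge overrides into WAKE_WORD_OPTIONS and write the file back."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    options = config.setdefault("WAKE_WORD_OPTIONS", {})
    options.update(overrides)
    # ensure_ascii=False keeps Chinese wake words readable in the saved file.
    path.write_text(json.dumps(config, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
    return options
```

For example, `apply_wake_word_overrides("config/config.json", ENGLISH_OVERRIDES)` switches all three fields in one step. The same helper can apply the tuning presets shown later in this document, provided the `//` annotations are removed first, since `json.loads` rejects comments.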
Configuration Parameter Details
| Parameter | Default Value | Description | Tuning Advice |
|---|---|---|---|
| USE_WAKE_WORD | true | Enable voice wake-up | - |
| MODEL_PATH | "models/zh" | Model file directory | Chinese: models/zh, English: models/en |
| NUM_THREADS | 4 | Number of processing threads | Set 6-8 on powerful machines |
| PROVIDER | "cpu" | Inference engine | Options: cpu, cuda, coreml |
| MAX_ACTIVE_PATHS | 2 | Number of search paths | Lower = faster, higher = more accurate |
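| KEYWORDS_SCORE | 1.8 | Keyword boost score | In Sherpa-ONNX a higher boost makes keyword paths easier to keep during search (more sensitive); raise KEYWORDS_THRESHOLD alongside it to limit false positives |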
| KEYWORDS_THRESHOLD | 0.2 | Detection threshold | Lower = more sensitive, higher = fewer false positives |
| NUM_TRAILING_BLANKS | 1 | Number of trailing blanks | Usually keep at 1 |
| WAKE_WORD | "小爱同学" | Wake word text (for UI display and logs) | Must exist in keywords.txt |
| WAKE_WORD_LANG | "zh" | Wake word language (zh/en) | Must match MODEL_PATH |
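To make the table more concrete, the sketch below shows one plausible way these options could map onto the keyword arguments of `sherpa_onnx.KeywordSpotter`. The mapping is an assumption for illustration, not taken from py-xiaozhi's source, and `build_spotter_kwargs` and `create_spotter` are hypothetical names. USE_WAKE_WORD, WAKE_WORD, and WAKE_WORD_LANG are not passed to the spotter in this sketch.

```python
from pathlib import Path


def build_spotter_kwargs(options, project_root="."):
    """Translate WAKE_WORD_OPTIONS into keyword arguments (assumed mapping)."""
    model_dir = Path(project_root) / options.get("MODEL_PATH", "models/zh")
    return {
        "tokens": str(model_dir / "tokens.txt"),
        "encoder": str(model_dir / "encoder.onnx"),
        "decoder": str(model_dir / "decoder.onnx"),
        "joiner": str(model_dir / "joiner.onnx"),
        "keywords_file": str(model_dir / "keywords.txt"),
        "num_threads": int(options.get("NUM_THREADS", 4)),
        "provider": options.get("PROVIDER", "cpu"),
        "max_active_paths": int(options.get("MAX_ACTIVE_PATHS", 2)),
        "keywords_score": float(options.get("KEYWORDS_SCORE", 1.8)),
        "keywords_threshold": float(options.get("KEYWORDS_THRESHOLD", 0.2)),
        "num_trailing_blanks": int(options.get("NUM_TRAILING_BLANKS", 1)),
    }


def create_spotter(options, project_root="."):
    """Build a KeywordSpotter; requires the sherpa-onnx package and model files."""
    import sherpa_onnx  # imported lazily so the mapping can be inspected without it

    return sherpa_onnx.KeywordSpotter(**build_spotter_kwargs(options, project_root))
```

Keeping the mapping in a separate function makes it easy to print and review the effective settings before any model is loaded.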
Custom Wake Words
Recommended Method: Settings Window One-Click Save
- Open the desktop UI and go to Settings → Wake Words (src/ui/gui/qml/windows/settings/WakeWordTab.qml).
- Check "Enable Wake Word" and type a Chinese or English phrase directly into the input field (e.g., "你好小智" or "Hey Moss").
- A "conversion preview" is displayed in real time during input; Chinese is automatically converted to pinyin, English uses BPE tokens, and language tags are shown.
- Click "Save Wake Word" and the system calls convert_wake_word() (src/ui/shared/models/settings_model.py:314-365), which:
  - Automatically detects the language and updates WAKE_WORD, WAKE_WORD_LANG, and MODEL_PATH;
  - Writes the generated keyword to the corresponding directory (models/zh/keywords.txt or models/en/keywords.txt);
  - Keeps the configuration file and UI state in sync.
This method is suitable for a single primary wake word scenario; saving will overwrite the content of the target keywords.txt. If you need to keep multiple candidates or perform batch management, use "Manual Addition" below instead.
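The project's convert_wake_word() is not reproduced here. As a rough illustration of the "detect the language, then update three fields" step, the sketch below treats any text containing a CJK ideograph as Chinese and everything else as English. Both function names and the detection rule are assumptions, not the project's implementation.

```python
def detect_wake_word_lang(text):
    """Return 'zh' if the text contains a CJK ideograph, otherwise 'en'."""
    for ch in text:
        # Basic CJK Unified Ideographs block; a simplification for illustration.
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"


def options_for_wake_word(text):
    """Return the three WAKE_WORD_OPTIONS fields that must change together."""
    text = text.strip()
    if not text:
        raise ValueError("wake word must not be empty")
    lang = detect_wake_word_lang(text)
    return {"WAKE_WORD": text, "WAKE_WORD_LANG": lang, "MODEL_PATH": f"models/{lang}"}
```

Deriving MODEL_PATH from the detected language keeps the three fields from drifting apart, which is the same consistency requirement described for manual language switching.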
Currently Supported Wake Words
The repository ships with a set of Chinese and English wake words that work immediately after launch:
Chinese (models/zh/keywords.txt)
x iǎo ài t óng x ué @小爱同学
Set WAKE_WORD to 小爱同学 and WAKE_WORD_LANG to zh to wake directly; if you add custom words like "中国好助手", update these two fields as well.
English (models/en/keywords.txt)
▁MO S S @MOSS
English keywords use Sherpa-ONNX's SentencePiece/BPE tokens, with the initial ▁ indicating word start. Set MODEL_PATH: "models/en", WAKE_WORD: "MOSS", and WAKE_WORD_LANG: "en" to use it.
Manually Adding New Wake Words (Advanced)
Method 1: Directly Edit the Keyword File
Edit the corresponding file based on the chosen language:
- Chinese: Edit models/zh/keywords.txt, one wake word per line, using "pinyin + @original text" format;
- English: Edit models/en/keywords.txt, one wake word per line, using SentencePiece tokens (example: ▁HE L LO ▁X I AO ▁Z H I @HELLO XIAOZHI). It is recommended to use the conversion scripts provided by Sherpa-ONNX or refer to existing examples to maintain capitalization and spacing.
# Format: pinyin breakdown @Chinese original text
x iǎo zh ì @小智
n ǐ h ǎo x iǎo zh ì @你好小智
j iǎ w éi s ī @贾维斯
k āi sh ǐ g ōng z uò @开始工作
After editing, remember to update WAKE_WORD to the new wake word text, and confirm that WAKE_WORD_LANG matches the directory, otherwise wake-up will not trigger. If you later use the Settings UI's "Save Wake Word" feature, the file will be overwritten with a single line.
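When maintaining several candidates by hand, it helps to append new lines without creating duplicates. The helper below is a minimal sketch with a hypothetical name, `add_keyword_line`; it assumes the "tokens @label" line format shown above and compares entries by the label after `@`.

```python
from pathlib import Path


def add_keyword_line(keywords_file, line):
    """Append a keyword line unless its '@label' is already present.

    Returns True if the line was added, False if it was a duplicate.
    """
    path = Path(keywords_file)
    line = line.strip()
    if "@" not in line:
        raise ValueError("expected a line of the form 'tokens @label'")
    label = line.rsplit("@", 1)[1].strip()
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    for old in existing:
        if "@" in old and old.rsplit("@", 1)[1].strip() == label:
            return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n", encoding="utf-8")
    return True
```

For example, `add_keyword_line("models/zh/keywords.txt", "x iǎo zh ì @小智")` adds the entry once and leaves the file unchanged on a second call.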
Method 2: Use the Pinyin Conversion Tool
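from pypinyin import lazy_pinyin, Style

def generate_keyword_line(text):
    # Split each syllable into initial + toned final (e.g. "x iǎo"),
    # matching the keywords.txt format shown above.
    initials = lazy_pinyin(text, style=Style.INITIALS, strict=False)
    finals = lazy_pinyin(text, style=Style.FINALS_TONE, strict=False)
    tokens = []
    for initial, final in zip(initials, finals):
        # Syllables without an initial (e.g. "ài") contribute only the final.
        tokens.extend(t for t in (initial, final) if t)
    return f"{' '.join(tokens)} @{text}"

# Generate new wake words
wake_words = ['小助手', '开始工作', '星期五']
for word in wake_words:
    print(generate_keyword_line(word))
Wake Word Selection Tips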
Recommended Wake Word Characteristics
- Moderate length: 2-4 characters
- Clear pronunciation: Avoid similar-sounding words
- Strong uniqueness: Avoid common daily conversation words
- Easy to say: Easy to remember and pronounce
Examples of Good Wake Words
- 你好小智 # 4 characters, unique, clear
- 贾维斯 # 3 characters, unique, tech-savvy
- 开始工作 # 4 characters, clear intent
- 小助手 # 3 characters, simple and easy to remember
Avoid Using
- 嗯 # Too short, easily triggered accidentally
- 你好 # Too common
- 请帮我做一个计划 # Too long
- 谢谢 # Everyday phrase
Usage
Startup Process
Start the program:
cd /path/to/py-xiaozhi  # project root; adjust to your checkout
python main.py
Model loading:
- The system automatically loads the Sherpa-ONNX model
- Initializes the keyword detector
- Enters wake word listening state
Voice wake-up:
- Clearly speak the configured wake word
- The system automatically switches to the LISTENING state
- Begins voice conversation
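The listening phase described above boils down to a loop that feeds audio into the keyword spotter and reacts when it reports a hit. The following is a minimal sketch, assuming a `sherpa_onnx.KeywordSpotter`-style object and audio chunks supplied as float sample arrays; `detect_keywords` is a hypothetical name, and the switch to the LISTENING state is left to the caller because that logic is specific to py-xiaozhi.

```python
def detect_keywords(spotter, audio_chunks, sample_rate=16000):
    """Yield each detected keyword while feeding audio chunks to the spotter."""
    stream = spotter.create_stream()
    for samples in audio_chunks:
        stream.accept_waveform(sample_rate, samples)
        # Decode as long as the stream has enough buffered audio.
        while spotter.is_ready(stream):
            spotter.decode_stream(stream)
            keyword = spotter.get_result(stream)
            if keyword:
                # Reset right after a hit so the next utterance starts clean.
                spotter.reset_stream(stream)
                yield keyword
```

A caller would iterate over the generator and trigger the state change on the first yielded keyword. Writing the loop as a generator keeps audio capture, detection, and application state separate.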
Usage Tips
Best Wake-Up Practices
- Moderate volume: Normal speaking volume
- Natural speed: Not too fast or too slow
- Clear articulation: Pay special attention to tones
- Quiet environment: Minimize background noise
Performance Optimization
Speed Optimization Configuration
{
"WAKE_WORD_OPTIONS": {
"NUM_THREADS": 6, // Increase thread count
"MAX_ACTIVE_PATHS": 1, // Reduce search paths
"KEYWORDS_THRESHOLD": 0.15, // Lower threshold for higher sensitivity
"KEYWORDS_SCORE": 1.5 // Lower score for faster speed
}
}
Accuracy Optimization Configuration
{
"WAKE_WORD_OPTIONS": {
"NUM_THREADS": 4, // Moderate thread count
"MAX_ACTIVE_PATHS": 3, // Increase search paths
"KEYWORDS_THRESHOLD": 0.25, // Higher threshold to reduce false positives
"KEYWORDS_SCORE": 2.2 // Higher score for better accuracy
}
}
Performance Monitoring
Check current performance:
# View statistics in the application
stats = wake_word_detector.get_performance_stats()
print(f"Engine: {stats['engine']}")
print(f"Threads: {stats['num_threads']}")
print(f"Detection threshold: {stats['keywords_threshold']}")
print(f"Running: {stats['is_running']}")
Troubleshooting
Common Issues
1. Wake Word Not Responding
Symptom: No response when speaking the wake word
Solutions:
# Check configuration
grep -A 10 "WAKE_WORD_OPTIONS" config/config.json
# Check model files
ls -la models/
# Test functionality
python test_new_keywords.py
2. Slow Response
Symptom: Large delay in wake word recognition
Solutions:
{
"WAKE_WORD_OPTIONS": {
"KEYWORDS_THRESHOLD": 0.15, // Lower threshold
"NUM_THREADS": 6, // Increase threads
"MAX_ACTIVE_PATHS": 1 // Reduce search paths
}
}
3. Frequent False Triggers
Symptom: Frequently triggers wake-up by mistake
Solutions:
{
"WAKE_WORD_OPTIONS": {
"KEYWORDS_THRESHOLD": 0.3, // Raise threshold
"KEYWORDS_SCORE": 2.5, // Raise score
"MAX_ACTIVE_PATHS": 3 // Increase search paths
}
}
4. Model Loading Failure
Symptom: Model file errors on startup
Solutions:
# Check file integrity
ls -la models/
file models/*.onnx
file models/tokens.txt
# Re-validate model
python test_new_keywords.py
Debug Commands
# View system logs
tail -f logs/app.log | grep -i kws
# Monitor performance
top -p $(pgrep -f "python main.py")
# Test audio device
python -c "import sounddevice as sd; print(sd.query_devices())"
Advanced Configuration
Environment Adaptation
Quiet Environment (Office)
{
"WAKE_WORD_OPTIONS": {
"KEYWORDS_THRESHOLD": 0.15,
"KEYWORDS_SCORE": 1.5,
"MAX_ACTIVE_PATHS": 1
}
}
Noisy Environment (Open Space)
{
"WAKE_WORD_OPTIONS": {
"KEYWORDS_THRESHOLD": 0.25,
"KEYWORDS_SCORE": 2.5,
"MAX_ACTIVE_PATHS": 3
}
}
Integration with AEC
Voice wake-up works together with echo cancellation (AEC): the wake word detector receives AEC-processed audio.
{
"AEC_OPTIONS": {
"ENABLED": true, // AEC provides clean audio for wake word
"ENABLE_PREPROCESS": true // Noise suppression improves detection accuracy
},
"WAKE_WORD_OPTIONS": {
"USE_WAKE_WORD": true // Use AEC-processed audio
}
}
Performance Benchmarks
Expected performance under standard configuration:
| Metric | Target | Description |
|---|---|---|
| Response Latency | < 1s | From speech to detection complete |
| Detection Accuracy | > 95% | Correctly recognizes set wake word |
| False Trigger Rate | < 5% | Frequency of erroneous triggers |
| CPU Usage | < 30% | Resource consumption during continuous operation |
| Memory Usage | < 100MB | Model and buffer memory usage |
Summary
Sherpa-ONNX Voice Wake-Up Features:
- High Accuracy: Deep learning-based end-to-end detection
- Low Latency: Sub-second response (target under 1 s from speech to detection)
- Low Resource: Lightweight model suitable for PC operation
- Customizable: Supports custom wake words
- Easy Integration: Perfect integration with existing audio processing
Now you can enjoy an intelligent, fast, and accurate voice wake-up experience!