Voice Wake-Up Feature

Overview

py-xiaozhi integrates high-precision voice wake-up based on Sherpa-ONNX, supporting custom wake words and real-time detection. It uses a lightweight keyword detection model, delivering millisecond-level response speeds.

Wake Word Model

Built-in Models (Ready to Use)

The repository already includes Sherpa-ONNX keyword detection models in the models/zh and models/en directories, ready to use:

models/zh: Chinese model (default MODEL_PATH), keywords.txt comes pre-configured with the 「小爱同学」 wake word, usable with WAKE_WORD_LANG: "zh", and supports adding extra pinyin lines.
models/en: English model (BPE units), keywords.txt defaults to the MOSS activation word, usable with WAKE_WORD_LANG: "en".
Each directory already contains encoder.onnx/decoder.onnx/joiner.onnx/tokens.txt/keywords.txt; no additional downloads are needed to run.
To customize wake words, simply edit keywords.txt in the corresponding language directory and switch MODEL_PATH and WAKE_WORD_LANG in WAKE_WORD_OPTIONS.

Language	`MODEL_PATH`	`WAKE_WORD_LANG`	Default `keywords.txt` Activation Word	Example `WAKE_WORD` Value
Chinese	`models/zh`	`zh`	`小爱同学`	`小爱同学`
English	`models/en`	`en`	`MOSS`	`MOSS`

When switching languages, you must simultaneously:

Update WAKE_WORD_OPTIONS.MODEL_PATH to point to the target language directory;
Set WAKE_WORD_OPTIONS.WAKE_WORD_LANG to zh or en;
Update WAKE_WORD_OPTIONS.WAKE_WORD to a phrase present in the current keywords.txt (e.g., MOSS or 小爱同学);
If you edit keywords.txt, ensure the new wake word's spelling/pinyin matches the configuration, otherwise it will not be detected.
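
The checklist above can also be verified mechanically before launching the program. The following is a minimal sketch, not part of py-xiaozhi: the helper name `check_wake_word_options` is hypothetical, and it assumes that each keywords.txt line ends with `@` followed by the display text (as in the bundled files) and that the model directory is named after the language code.

```python
from pathlib import Path


def check_wake_word_options(options: dict, project_root: str = ".") -> list[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    model_dir = Path(project_root) / options.get("MODEL_PATH", "")
    lang = options.get("WAKE_WORD_LANG")
    wake_word = options.get("WAKE_WORD", "")

    if lang not in ("zh", "en"):
        problems.append(f"WAKE_WORD_LANG must be 'zh' or 'en', got {lang!r}")
    elif model_dir.name != lang:
        # Assumption: the language directory is named after the language code.
        problems.append(f"MODEL_PATH {model_dir} does not look like the {lang} model")

    keywords_file = model_dir / "keywords.txt"
    if not keywords_file.is_file():
        problems.append(f"missing {keywords_file}")
        return problems

    # The display text is whatever follows '@' on each non-comment line.
    phrases = set()
    for line in keywords_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "@" in line:
            phrases.add(line.rsplit("@", 1)[1].strip())
    if wake_word not in phrases:
        problems.append(f"WAKE_WORD {wake_word!r} not found in {keywords_file}")
    return problems
```

Calling it with the `WAKE_WORD_OPTIONS` dictionary loaded from config/config.json returns an empty list when the three fields agree with the keyword file.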

Only follow the steps below if you want to update the model version or replace it with another training set.

Model Download (Optional)

Important: If you replace the model, download and configure it before starting the application.

Official Model Download Links

Official Model List: https://csukuangfj.github.io/sherpa/onnx/kws/pretrained_models/index.html
Recommended Model: sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01

Download and Configuration Steps

1. Download the Model Package

bash

# Method 1: Direct download (recommended)
cd /path/to/py-xiaozhi  # replace with the path of your local checkout
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2

# Extract
tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2

# Method 2: Use ModelScope
pip install modelscope
python -c "
from modelscope import snapshot_download
snapshot_download('pkufool/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01', cache_dir='./models')
"

2. Configure Model Files

After downloading, the model package contains the following files:

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/
├── encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx    # Speed priority
├── encoder-epoch-12-avg-2-chunk-16-left-64.onnx         #
├── encoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx    # Speed priority
├── encoder-epoch-99-avg-1-chunk-16-left-64.onnx         # Accuracy priority
├── decoder-epoch-12-avg-2-chunk-16-left-64.onnx         #
├── decoder-epoch-99-avg-1-chunk-16-left-64.onnx         # Accuracy priority
├── joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx     # Speed priority
├── joiner-epoch-12-avg-2-chunk-16-left-64.onnx          #
├── joiner-epoch-99-avg-1-chunk-16-left-64.int8.onnx     # Speed priority
├── joiner-epoch-99-avg-1-chunk-16-left-64.onnx          # Accuracy priority
├── tokens.txt                    # Token mapping table (required)
├── keywords_raw.txt              # Raw keywords (optional, for generation)
├── keywords.txt                  # Ready-to-use
├── test_wavs/                    # Test audio (optional)
├── configuration.json            # Model metadata (optional)
└── README.md                     # Documentation (optional)

3. Choose Configuration Plan

Option 1: Accuracy Priority (Recommended)

bash

cd sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01

# Copy accuracy-priority epoch-99 fp32 trio
cp encoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/encoder.onnx
cp decoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/decoder.onnx
cp joiner-epoch-99-avg-1-chunk-16-left-64.onnx ../models/joiner.onnx

# Copy supporting files
cp tokens.txt ../models/tokens.txt
cp keywords_raw.txt ../models/keywords_raw.txt  # Optional

Option 2: Speed Priority

bash

cd sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01

# Copy speed-priority epoch-99 set: int8 encoder and joiner, fp32 decoder (the package has no int8 decoder)
cp encoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx ../models/encoder.onnx
cp decoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/decoder.onnx
cp joiner-epoch-99-avg-1-chunk-16-left-64.int8.onnx ../models/joiner.onnx

# Copy supporting files
cp tokens.txt ../models/tokens.txt

Important Notes:

Do NOT mix fp32 and int8 between the encoder and the joiner: both must use the same precision. The decoder ships only as fp32 and is used with either set.
Prefer epoch-99: More thoroughly trained than epoch-12, with higher accuracy
Required files: encoder.onnx + decoder.onnx + joiner.onnx + tokens.txt + keywords.txt
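
The copy-and-rename step in both options can be scripted. The following is a rough sketch under stated assumptions: `install_model` is a hypothetical helper, not a py-xiaozhi or Sherpa-ONNX function, and it relies on the file names exactly as they appear in the package listing above. It does not create keywords.txt.

```python
import shutil
from pathlib import Path


def install_model(package_dir, target_dir, precision="fp32", epoch=99):
    """Copy one encoder/decoder/joiner set plus tokens.txt under fixed names."""
    if precision not in ("fp32", "int8"):
        raise ValueError("precision must be 'fp32' or 'int8'")
    package_dir, target_dir = Path(package_dir), Path(target_dir)
    avg = {99: 1, 12: 2}[epoch]  # matches the file names in the package listing
    stem = f"epoch-{epoch}-avg-{avg}-chunk-16-left-64"
    suffix = ".int8.onnx" if precision == "int8" else ".onnx"

    sources = {
        "encoder.onnx": f"encoder-{stem}{suffix}",
        # The package lists no int8 decoder, so the decoder is always fp32.
        "decoder.onnx": f"decoder-{stem}.onnx",
        "joiner.onnx": f"joiner-{stem}{suffix}",
        "tokens.txt": "tokens.txt",
    }
    missing = [name for name in sources.values() if not (package_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {package_dir}: {missing}")

    target_dir.mkdir(parents=True, exist_ok=True)
    for dest_name, src_name in sources.items():
        shutil.copyfile(package_dir / src_name, target_dir / dest_name)
    return sorted(sources)
```

For example, `install_model("sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01", "models", precision="int8")` mirrors Option 2. Building the file names from one `precision` argument keeps the encoder and joiner at the same precision by construction.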

Final Model File Structure

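After configuration, the target directory should contain the files below. The commands above copy into models/; if MODEL_PATH points elsewhere (for example models/zh), copy the files there instead or update MODEL_PATH accordingly.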

models/
├── encoder.onnx      # Encoder model (renamed)
├── decoder.onnx      # Decoder model (renamed)
├── joiner.onnx       # Joiner model (renamed)
├── tokens.txt        # Pinyin token mapping table (228-line version)
├── keywords.txt      # Keyword configuration file (needs creation)
└── keywords_raw.txt  # Raw keyword file (optional)
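
A quick way to confirm the layout is to check for the required files programmatically. This is a small sketch with a hypothetical helper name (`missing_model_files`); the file list comes from the "Required files" note above, and treating an empty file as missing is an extra assumption meant to catch interrupted copies.

```python
from pathlib import Path

REQUIRED_FILES = (
    "encoder.onnx",
    "decoder.onnx",
    "joiner.onnx",
    "tokens.txt",
    "keywords.txt",
)


def missing_model_files(model_dir):
    """Return the required files that are absent or empty in model_dir."""
    model_dir = Path(model_dir)
    return [
        name
        for name in REQUIRED_FILES
        if not (model_dir / name).is_file() or (model_dir / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    # Prints an empty list when the directory is complete.
    print(missing_model_files("models"))
```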

Model Performance Comparison

Model Version	File Size	Inference Speed	Accuracy	Resource Usage	Recommended Scenario
epoch-99 fp32	~13MB	Medium	Highest	Medium	Desktop (Recommended)
epoch-99 int8	~4MB	Fast	High	Low	Mobile / resource-constrained
epoch-12 fp32	~13MB	Medium	Medium-High	Medium	General use
epoch-12 int8	~4MB	Fastest	Medium	Lowest	Extreme speed requirements

Enabling Voice Wake-Up

Configuration File Settings

Edit config/config.json:

json

{
  "WAKE_WORD_OPTIONS": {
    "USE_WAKE_WORD": true,
    "MODEL_PATH": "models/zh",
    "NUM_THREADS": 4,
    "PROVIDER": "cpu",
    "MAX_ACTIVE_PATHS": 2,
    "KEYWORDS_SCORE": 1.8,
    "KEYWORDS_THRESHOLD": 0.2,
    "NUM_TRAILING_BLANKS": 1,
    "WAKE_WORD": "小爱同学",
    "WAKE_WORD_LANG": "zh"
  }
}

To switch to English wake words, simply change MODEL_PATH to models/en, WAKE_WORD to MOSS, and set WAKE_WORD_LANG to en. Regardless of the language chosen, WAKE_WORD must match one of the lines in the corresponding keywords.txt.

Configuration Parameter Details

Parameter	Default Value	Description	Tuning Advice
`USE_WAKE_WORD`	`true`	Enable voice wake-up	-
`MODEL_PATH`	`"models/zh"`	Model file directory	Chinese: `models/zh`, English: `models/en`
`NUM_THREADS`	`4`	Number of processing threads	Set 6-8 on powerful machines
`PROVIDER`	`"cpu"`	Inference engine	Options: cpu, cuda, coreml
`MAX_ACTIVE_PATHS`	`2`	Number of search paths	Lower = faster, higher = more accurate
`KEYWORDS_SCORE`	`1.8`	Keyword boost score	Higher = keyword triggers more easily (more sensitive), lower = fewer false positives
`KEYWORDS_THRESHOLD`	`0.2`	Detection threshold	Lower = more sensitive, higher = fewer false positives
`NUM_TRAILING_BLANKS`	`1`	Number of trailing blanks	Usually keep at 1
`WAKE_WORD`	`"小爱同学"`	Wake word text (for UI display and logs)	Must exist in `keywords.txt`
`WAKE_WORD_LANG`	`"zh"`	Wake word language (`zh`/`en`)	Must match `MODEL_PATH`
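
To make the role of each parameter concrete, the sketch below maps `WAKE_WORD_OPTIONS` onto keyword arguments of the kind a Sherpa-ONNX keyword spotter accepts. It is an illustration, not py-xiaozhi's actual loading code: the function name `keyword_spotter_kwargs` is hypothetical, and the argument names are assumed to match the `sherpa_onnx.KeywordSpotter` constructor, so check them against the installed version before relying on them.

```python
from pathlib import Path


def keyword_spotter_kwargs(options: dict) -> dict:
    """Translate WAKE_WORD_OPTIONS into keyword arguments for a spotter."""
    model_dir = Path(options.get("MODEL_PATH", "models/zh"))
    return {
        # File locations derived from MODEL_PATH
        "tokens": str(model_dir / "tokens.txt"),
        "encoder": str(model_dir / "encoder.onnx"),
        "decoder": str(model_dir / "decoder.onnx"),
        "joiner": str(model_dir / "joiner.onnx"),
        "keywords_file": str(model_dir / "keywords.txt"),
        # Decoding and runtime parameters, defaults taken from the table above
        "num_threads": int(options.get("NUM_THREADS", 4)),
        "max_active_paths": int(options.get("MAX_ACTIVE_PATHS", 2)),
        "keywords_score": float(options.get("KEYWORDS_SCORE", 1.8)),
        "keywords_threshold": float(options.get("KEYWORDS_THRESHOLD", 0.2)),
        "num_trailing_blanks": int(options.get("NUM_TRAILING_BLANKS", 1)),
        "provider": options.get("PROVIDER", "cpu"),
    }


# Assumed usage (requires the sherpa-onnx package):
#   import sherpa_onnx
#   spotter = sherpa_onnx.KeywordSpotter(**keyword_spotter_kwargs(options))
```

`WAKE_WORD`, `WAKE_WORD_LANG`, and `USE_WAKE_WORD` are deliberately left out of the mapping, since they describe application behavior rather than decoder settings.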

Custom Wake Words

Recommended Method: Settings Window One-Click Save

Open the desktop UI and go to Settings → Wake Words (src/ui/gui/qml/windows/settings/WakeWordTab.qml).
Check "Enable Wake Word" and type a Chinese or English phrase directly into the input field (e.g., "你好小智" or "Hey Moss").
A "conversion preview" is displayed in real time during input; Chinese is automatically converted to pinyin, English uses BPE tokens, and language tags are shown.
Click "Save Wake Word" and the system calls convert_wake_word() (src/ui/shared/models/settings_model.py:314-365):
- Automatically detects language, updates WAKE_WORD, WAKE_WORD_LANG, and MODEL_PATH;
- Writes the generated keyword to the corresponding directory (models/zh/keywords.txt or models/en/keywords.txt);
- Keeps the configuration file and UI state in sync.

This method is suitable for a single primary wake word scenario; saving will overwrite the content of the target keywords.txt. If you need to keep multiple candidates or perform batch management, use "Manual Addition" below instead.
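
The language detection and field updates described above can be pictured roughly as follows. This is a guess at the general idea, not the code in settings_model.py: both function names are hypothetical, "contains a CJK ideograph" is an assumed detection rule, and upper-casing English phrases is an assumption based on the bundled `MOSS` entry.

```python
def detect_wake_word_lang(text: str) -> str:
    """Return 'zh' if the phrase contains any CJK ideograph, otherwise 'en'."""
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    return "zh" if has_cjk else "en"


def options_for_wake_word(text: str) -> dict:
    """Build the three WAKE_WORD_OPTIONS fields that depend on the phrase."""
    phrase = text.strip()
    lang = detect_wake_word_lang(phrase)
    return {
        # Assumption: English keywords are stored upper-case, like MOSS.
        "WAKE_WORD": phrase if lang == "zh" else phrase.upper(),
        "WAKE_WORD_LANG": lang,
        "MODEL_PATH": f"models/{lang}",
    }


if __name__ == "__main__":
    print(options_for_wake_word("你好小智"))
    print(options_for_wake_word("Hey Moss"))
```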

Currently Supported Wake Words

The repository ships with a set of Chinese and English wake words that work immediately after launch:

Chinese (models/zh/keywords.txt)

x iǎo ài t óng x ué @小爱同学

Set WAKE_WORD to 小爱同学 and WAKE_WORD_LANG to zh to wake directly; if you add custom words like "中国好助手", update these two fields as well.

English (models/en/keywords.txt)

▁MO S S @MOSS

English keywords use Sherpa-ONNX's SentencePiece/BPE tokens, with the initial ▁ indicating word start. Set MODEL_PATH: "models/en", WAKE_WORD: "MOSS", and WAKE_WORD_LANG: "en" to use.

Manually Adding New Wake Words (Advanced)

Method 1: Directly Edit the Keyword File

Edit the corresponding file based on the chosen language:

Chinese: Edit models/zh/keywords.txt, one wake word per line, using "pinyin + @original text" format;
English: Edit models/en/keywords.txt, one wake word per line, using SentencePiece tokens (example: ▁HE L LO ▁X I AO ▁Z H I @HELLO XIAOZHI). It is recommended to use the conversion scripts provided by Sherpa-ONNX or refer to existing examples to maintain capitalization and spacing.

# Format: pinyin breakdown @Chinese original text
x iǎo zh ì @小智
n ǐ h ǎo x iǎo zh ì @你好小智
j iā w éi s ī @贾维斯
k āi sh ǐ g ōng z uò @开始工作

After editing, remember to update WAKE_WORD to the new wake word text, and confirm that WAKE_WORD_LANG matches the directory, otherwise wake-up will not trigger. If you later use the Settings UI's "Save Wake Word" feature, the file will be overwritten with a single line.
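
When managing several wake words by hand, it helps to treat each line as "tokens, then `@`, then display text". The sketch below parses that format and appends a new line only if its phrase is not already present. The helper names are hypothetical, and skipping lines that start with `#` is an assumption made so that annotated examples like the one above can be pasted in safely.

```python
def parse_keyword_line(line: str):
    """Split 'tokens @text' into (tokens, text); return None for other lines."""
    line = line.strip()
    if not line or line.startswith("#") or "@" not in line:
        return None
    token_part, text = line.rsplit("@", 1)
    return token_part.split(), text.strip()


def add_keyword(existing: str, new_line: str) -> str:
    """Return keywords.txt content with new_line appended unless already present."""
    parsed = parse_keyword_line(new_line)
    if parsed is None:
        raise ValueError(f"not a keyword line: {new_line!r}")
    known = {p[1] for p in map(parse_keyword_line, existing.splitlines()) if p}
    if parsed[1] in known:
        return existing  # phrase already configured, leave the file unchanged
    # Make sure the existing content ends with a newline before appending
    body = existing if existing.endswith("\n") or not existing else existing + "\n"
    return body + new_line.strip() + "\n"
```

Reading models/zh/keywords.txt, passing its content through `add_keyword`, and writing the result back keeps existing entries intact, unlike the Settings UI save, which overwrites the file.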

Method 2: Use the Pinyin Conversion Tool

python

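from pypinyin import pinyin
from pypinyin.contrib.tone_convert import to_finals_tone, to_initials

def generate_keyword_line(text):
    """Build one keywords.txt line: initials and toned finals, then @text."""
    tokens = []
    for syllable_variants in pinyin(text):  # e.g. [['xiǎo'], ['zhù'], ['shǒu']]
        syllable = syllable_variants[0]
        initial = to_initials(syllable, strict=False)    # 'x'; empty for 'ài'
        final = to_finals_tone(syllable, strict=False)   # 'iǎo', tone mark kept
        # Keep input that pypinyin cannot split (non-Chinese text) unchanged
        tokens.extend([p for p in (initial, final) if p] or [syllable])
    return f"{' '.join(tokens)} @{text}"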

# Generate new wake words
wake_words = ['小助手', '开始工作', '星期五']
for word in wake_words:
    print(generate_keyword_line(word))

Wake Word Selection Tips

Recommended Wake Word Characteristics

Moderate length: 2-4 characters
Clear pronunciation: Avoid similar-sounding words
Strong uniqueness: Avoid common daily conversation words
Easy to say: Easy to remember and pronounce

Examples of Good Wake Words

- 你好小智    # 4 characters, unique, clear
- 贾维斯      # 3 characters, unique, tech-savvy
- 开始工作    # 4 characters, clear intent
- 小助手      # 3 characters, simple and easy to remember

Avoid Using

- 嗯         # Too short, easily triggered accidentally
- 你好       # Too common
- 请帮我做一个计划 # Too long
- 谢谢       # Everyday phrase
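
The selection tips can be turned into a simple pre-check. This is only a heuristic sketch for Chinese phrases: `wake_word_warnings` is a hypothetical helper, the 2-4 character range comes from the recommendations above, and the set of everyday phrases is limited to the short examples listed under "Avoid Using".

```python
COMMON_PHRASES = {"嗯", "你好", "谢谢"}  # taken from the "Avoid Using" examples


def wake_word_warnings(text: str, min_len: int = 2, max_len: int = 4) -> list[str]:
    """Return warnings for a candidate wake word; an empty list means no objection."""
    phrase = text.strip()
    warnings = []
    if len(phrase) < min_len:
        warnings.append("too short")
    elif len(phrase) > max_len:
        warnings.append("too long")
    if phrase in COMMON_PHRASES:
        warnings.append("common everyday phrase")
    return warnings


if __name__ == "__main__":
    for candidate in ["你好小智", "嗯", "请帮我做一个计划"]:
        print(candidate, wake_word_warnings(candidate))
```

The check cannot judge pronunciation clarity or similar-sounding words, so it complements listening tests rather than replacing them.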

Usage

Startup Process

Start the program:

bash

cd /path/to/py-xiaozhi  # replace with the path of your local checkout
python main.py

Model loading:
- The system automatically loads the Sherpa-ONNX model
- Initializes the keyword detector
- Enters wake word listening state
Voice wake-up:
- Clearly speak the configured wake word
- The system automatically switches to the LISTENING state
- Begins voice conversation
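
The listening phase of the process above can be sketched as a loop that feeds audio chunks to the detector until a keyword is reported. This is an illustration rather than py-xiaozhi's implementation: `wait_for_wake_word` is a hypothetical name, and the methods called on `spotter` and `stream` (`create_stream`, `accept_waveform`, `is_ready`, `decode_stream`, `get_result`) are assumed to follow the Sherpa-ONNX Python keyword-spotter interface.

```python
def wait_for_wake_word(spotter, audio_chunks, sample_rate=16000):
    """Feed chunks to the spotter and return the first detected keyword, or None.

    `spotter` is any object exposing the assumed keyword-spotter methods;
    `audio_chunks` is an iterable of sample buffers (e.g. float32 arrays).
    """
    stream = spotter.create_stream()
    for samples in audio_chunks:
        stream.accept_waveform(sample_rate, samples)
        # Decode for as long as the stream holds enough audio for another step
        while spotter.is_ready(stream):
            spotter.decode_stream(stream)
            result = spotter.get_result(stream)
            if result:
                return result  # caller would now switch to the LISTENING state
    return None
```

Passing the spotter in as a parameter keeps the loop independent of the audio source, so it can be exercised with recorded chunks as well as a live microphone.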

Usage Tips

Best Wake-Up Practices

Moderate volume: Normal speaking volume
Natural speed: Not too fast or too slow
Clear articulation: Pay special attention to tones
Quiet environment: Minimize background noise

Performance Optimization

Speed Optimization Configuration

json

{
  "WAKE_WORD_OPTIONS": {
    "NUM_THREADS": 6,           // Increase thread count
    "MAX_ACTIVE_PATHS": 1,      // Reduce search paths
    "KEYWORDS_THRESHOLD": 0.15, // Lower threshold for higher sensitivity
    "KEYWORDS_SCORE": 1.5       // Lower boost score (changes sensitivity, not speed)
  }
}

Accuracy Optimization Configuration

json

{
  "WAKE_WORD_OPTIONS": {
    "NUM_THREADS": 4,           // Moderate thread count
    "MAX_ACTIVE_PATHS": 3,      // Increase search paths
    "KEYWORDS_THRESHOLD": 0.25, // Higher threshold to reduce false positives
    "KEYWORDS_SCORE": 2.2       // Higher boost so the wake word is less likely to be missed
  }
}
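
Note that the `//` annotations in these snippets are explanatory only; standard JSON does not allow comments, so they must not be pasted into config/config.json. One way to apply a preset without hand-editing is sketched below. The helper `apply_preset` and the two preset constants are hypothetical names, and the values are copied from the two configurations above.

```python
import json
from pathlib import Path

SPEED_PRESET = {
    "NUM_THREADS": 6,
    "MAX_ACTIVE_PATHS": 1,
    "KEYWORDS_THRESHOLD": 0.15,
    "KEYWORDS_SCORE": 1.5,
}
ACCURACY_PRESET = {
    "NUM_THREADS": 4,
    "MAX_ACTIVE_PATHS": 3,
    "KEYWORDS_THRESHOLD": 0.25,
    "KEYWORDS_SCORE": 2.2,
}


def apply_preset(config_path, preset):
    """Merge preset values into WAKE_WORD_OPTIONS and rewrite the config file."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Only the keys named in the preset change; all other options are preserved
    config.setdefault("WAKE_WORD_OPTIONS", {}).update(preset)
    path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2) + "\n", encoding="utf-8"
    )
    return config["WAKE_WORD_OPTIONS"]
```

For example, `apply_preset("config/config.json", SPEED_PRESET)` updates the four tuning fields and leaves `WAKE_WORD`, `MODEL_PATH`, and the rest of the file untouched.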

Performance Monitoring

Check current performance:

python

# View statistics in the application
stats = wake_word_detector.get_performance_stats()
print(f"Engine: {stats['engine']}")
print(f"Threads: {stats['num_threads']}")
print(f"Detection threshold: {stats['keywords_threshold']}")
print(f"Running: {stats['is_running']}")

Troubleshooting

Common Issues

1. Wake Word Not Responding

Symptom: No response when speaking the wake word

Solutions:

bash

# Check configuration
grep -A 10 "WAKE_WORD_OPTIONS" config/config.json

# Check model files
ls -la models/

# Test functionality
python test_new_keywords.py

2. Slow Response

Symptom: Large delay in wake word recognition

Solutions:

json

{
  "WAKE_WORD_OPTIONS": {
    "KEYWORDS_THRESHOLD": 0.15,  // Lower threshold
    "NUM_THREADS": 6,            // Increase threads
    "MAX_ACTIVE_PATHS": 1        // Reduce search paths
  }
}

3. Frequent False Triggers

Symptom: Frequently triggers wake-up by mistake

Solutions:

json

{
  "WAKE_WORD_OPTIONS": {
    "KEYWORDS_THRESHOLD": 0.3,   // Raise threshold
    "KEYWORDS_SCORE": 1.5,       // Lower the boost score so the keyword is harder to trigger
    "MAX_ACTIVE_PATHS": 3        // Increase search paths
  }
}

4. Model Loading Failure

Symptom: Model file errors on startup

Solutions:

bash

# Check file integrity
ls -la models/
file models/*.onnx
file models/tokens.txt

# Re-validate model
python test_new_keywords.py

Debug Commands

bash

# View system logs
tail -f logs/app.log | grep -i kws

# Monitor performance (Linux syntax; on macOS use: top -pid <PID>)
top -p $(pgrep -f "python main.py")

# Test audio device
python -c "import sounddevice as sd; print(sd.query_devices())"

Advanced Configuration

Environment Adaptation

Quiet Environment (Office)

json

{
  "WAKE_WORD_OPTIONS": {
    "KEYWORDS_THRESHOLD": 0.15,
    "KEYWORDS_SCORE": 1.5,
    "MAX_ACTIVE_PATHS": 1
  }
}

Noisy Environment (Open Space)

json

{
  "WAKE_WORD_OPTIONS": {
    "KEYWORDS_THRESHOLD": 0.25,
    "KEYWORDS_SCORE": 2.5,
    "MAX_ACTIVE_PATHS": 3
  }
}

Integration with AEC

Voice wake-up works together with echo cancellation (AEC), which supplies cleaner audio to the detector:

json

{
  "AEC_OPTIONS": {
    "ENABLED": true,              // AEC provides clean audio for wake word
    "ENABLE_PREPROCESS": true     // Noise suppression improves detection accuracy
  },
  "WAKE_WORD_OPTIONS": {
    "USE_WAKE_WORD": true         // Use AEC-processed audio
  }
}

Performance Benchmarks

Expected performance under standard configuration:

Metric	Target	Description
Response Latency	< 1s	From speech to detection complete
Detection Accuracy	> 95%	Correctly recognizes set wake word
False Trigger Rate	< 5%	Frequency of erroneous triggers
CPU Usage	< 30%	Resource consumption during continuous operation
Memory Usage	< 100MB	Model and buffer memory usage

Summary

Sherpa-ONNX Voice Wake-Up Features:

High Accuracy: Deep learning-based end-to-end detection
Low Latency: Millisecond-level response speed
Low Resource: Lightweight model suitable for PC operation
Customizable: Supports custom wake words
Easy Integration: Perfect integration with existing audio processing

Now you can enjoy an intelligent, fast, and accurate voice wake-up experience!
