
Voice Wake-Up Feature

Overview

py-xiaozhi integrates high-precision voice wake-up based on Sherpa-ONNX, supporting custom wake words and real-time detection. It uses a lightweight keyword detection model, delivering millisecond-level response speeds.

Wake Word Model

Built-in Models (Ready to Use)

The repository already includes Sherpa-ONNX keyword detection models in the models/zh and models/en directories, ready to use:

  • models/zh: Chinese model (default MODEL_PATH), keywords.txt comes pre-configured with the 「小爱同学」 wake word, usable with WAKE_WORD_LANG: "zh", and supports adding extra pinyin lines.
  • models/en: English model (BPE units), keywords.txt defaults to the MOSS activation word, usable with WAKE_WORD_LANG: "en".
  • Each directory already contains encoder.onnx/decoder.onnx/joiner.onnx/tokens.txt/keywords.txt; no additional downloads are needed to run.
  • To customize wake words, simply edit keywords.txt in the corresponding language directory and switch MODEL_PATH and WAKE_WORD_LANG in WAKE_WORD_OPTIONS.

| Language | MODEL_PATH | WAKE_WORD_LANG | Default keywords.txt Activation Word | Example WAKE_WORD Value |
|---|---|---|---|---|
| Chinese | models/zh | zh | 小爱同学 | 小爱同学 |
| English | models/en | en | MOSS | MOSS |

When switching languages, you must simultaneously:

  1. Update WAKE_WORD_OPTIONS.MODEL_PATH to point to the target language directory;
  2. Set WAKE_WORD_OPTIONS.WAKE_WORD_LANG to zh or en;
  3. Update WAKE_WORD_OPTIONS.WAKE_WORD to a phrase present in the current keywords.txt (e.g., MOSS or 小爱同学);
  4. If you edit keywords.txt, ensure the new wake word's spelling/pinyin matches the configuration, otherwise it will not be detected.
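
The checklist above can also be verified mechanically. The following is a minimal sketch, not part of py-xiaozhi: `check_wake_word_config` and `read_phrases` are hypothetical helpers. It assumes the key names shown in this document, that each keywords.txt line ends with `@` followed by the phrase, and that the model directory is named after the language (zh or en).

```python
from pathlib import Path


def read_phrases(keywords_file):
    """Return the phrase after '@' on every line that has one."""
    phrases = []
    for line in Path(keywords_file).read_text(encoding="utf-8").splitlines():
        if "@" in line:
            phrases.append(line.rsplit("@", 1)[1].strip())
    return phrases


def check_wake_word_config(options):
    """Return a list of problems found in a WAKE_WORD_OPTIONS dict (hypothetical helper)."""
    problems = []
    model_dir = Path(options["MODEL_PATH"])
    lang = options["WAKE_WORD_LANG"]

    # Steps 1 and 2: language must be zh/en and agree with the model directory.
    if lang not in ("zh", "en"):
        problems.append(f"WAKE_WORD_LANG must be 'zh' or 'en', got {lang!r}")
    elif model_dir.name != lang:
        # Assumption: directories follow the models/zh and models/en layout.
        problems.append(f"MODEL_PATH {model_dir} does not match language {lang!r}")

    # Steps 3 and 4: WAKE_WORD must appear verbatim in keywords.txt.
    keywords_file = model_dir / "keywords.txt"
    if not keywords_file.is_file():
        problems.append(f"missing {keywords_file}")
    elif options["WAKE_WORD"] not in read_phrases(keywords_file):
        problems.append(f"{options['WAKE_WORD']!r} is not listed in {keywords_file}")
    return problems
```

An empty list means the three fields and keywords.txt agree; each string in a non-empty list names one mismatch to fix.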

Only follow the steps below if you want to update the model version or replace it with another training set.

Model Download (Optional)

Important: If you are replacing the model, download and configure it in advance.

Download and Configuration Steps

1. Download the Model Package

bash
# Method 1: Direct download (recommended)
cd /path/to/py-xiaozhi  # replace with the path to your local checkout
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2

# Extract
tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2

# Method 2: Use ModelScope
pip install modelscope
python -c "
from modelscope import snapshot_download
snapshot_download('pkufool/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01', cache_dir='./models')
"

2. Configure Model Files

After downloading, the model package contains the following files:

sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/
├── encoder-epoch-12-avg-2-chunk-16-left-64.int8.onnx    # Speed priority
├── encoder-epoch-12-avg-2-chunk-16-left-64.onnx         #
├── encoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx    # Speed priority
├── encoder-epoch-99-avg-1-chunk-16-left-64.onnx         # Accuracy priority
├── decoder-epoch-12-avg-2-chunk-16-left-64.onnx         #
├── decoder-epoch-99-avg-1-chunk-16-left-64.onnx         # Accuracy priority
├── joiner-epoch-12-avg-2-chunk-16-left-64.int8.onnx     # Speed priority
├── joiner-epoch-12-avg-2-chunk-16-left-64.onnx          #
├── joiner-epoch-99-avg-1-chunk-16-left-64.int8.onnx     # Speed priority
├── joiner-epoch-99-avg-1-chunk-16-left-64.onnx          # Accuracy priority
├── tokens.txt                    # Token mapping table (required)
├── keywords_raw.txt              # Raw keywords (optional, for generation)
├── keywords.txt                  # Ready-to-use
├── test_wavs/                    # Test audio (optional)
├── configuration.json            # Model metadata (optional)
└── README.md                     # Documentation (optional)

3. Choose Configuration Plan

Option 1: Accuracy Priority (Recommended)

bash
cd sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01

# Copy accuracy-priority epoch-99 fp32 trio
cp encoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/encoder.onnx
cp decoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/decoder.onnx
cp joiner-epoch-99-avg-1-chunk-16-left-64.onnx ../models/joiner.onnx

# Copy supporting files
cp tokens.txt ../models/tokens.txt
cp keywords_raw.txt ../models/keywords_raw.txt  # Optional

Option 2: Speed Priority

bash
cd sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01

# Copy the speed-priority epoch-99 int8 encoder and joiner (the package has no int8 decoder, so the fp32 decoder is used)
cp encoder-epoch-99-avg-1-chunk-16-left-64.int8.onnx ../models/encoder.onnx
cp decoder-epoch-99-avg-1-chunk-16-left-64.onnx ../models/decoder.onnx
cp joiner-epoch-99-avg-1-chunk-16-left-64.int8.onnx ../models/joiner.onnx

# Copy supporting files
cp tokens.txt ../models/tokens.txt

Important Notes:

  • Do NOT mix fp32 and int8 between the encoder and joiner: use both as fp32 or both as int8. The package ships the decoder only as fp32, so both options share it
  • Prefer epoch-99: More thoroughly trained than epoch-12, with higher accuracy
  • Required files: encoder.onnx + decoder.onnx + joiner.onnx + tokens.txt + keywords.txt
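
The file-selection rules can be written down as a small lookup. This is only a sketch based on the package listing earlier in this section (which contains no int8 decoder); `select_model_files` is a hypothetical helper, not something shipped with py-xiaozhi or Sherpa-ONNX.

```python
def select_model_files(epoch=99, int8=False):
    """Map the target file names to the source file names in the downloaded package."""
    if epoch == 99:
        stem = "epoch-99-avg-1-chunk-16-left-64"
    elif epoch == 12:
        stem = "epoch-12-avg-2-chunk-16-left-64"
    else:
        raise ValueError("the package listing only contains epoch 12 and epoch 99")

    # Encoder and joiner always share the same precision.
    suffix = ".int8.onnx" if int8 else ".onnx"
    return {
        "encoder.onnx": f"encoder-{stem}{suffix}",
        # The listing shows no int8 decoder, so the fp32 decoder is used in both cases.
        "decoder.onnx": f"decoder-{stem}.onnx",
        "joiner.onnx": f"joiner-{stem}{suffix}",
        "tokens.txt": "tokens.txt",
        "keywords.txt": "keywords.txt",
    }


if __name__ == "__main__":
    for target, source in select_model_files(epoch=99, int8=True).items():
        print(f"cp {source} ../models/{target}")
```

Running the module prints the copy commands for the chosen variant, which can be compared against the options above before executing them.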

Final Model File Structure

After configuration, your models directory should contain:

models/
├── encoder.onnx      # Encoder model (renamed)
├── decoder.onnx      # Decoder model (renamed)
├── joiner.onnx       # Joiner model (renamed)
├── tokens.txt        # Pinyin token mapping table (228-line version)
├── keywords.txt      # Keyword configuration file (needs creation)
└── keywords_raw.txt  # Raw keyword file (optional)
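
Before starting the application, it can help to confirm that the directory really contains the required files. A minimal sketch follows; `missing_model_files` is a hypothetical helper, and treating an empty file as missing is an extra assumption made here to catch interrupted copies.

```python
from pathlib import Path

# Required files as listed in this document; keywords_raw.txt is optional.
REQUIRED_FILES = ("encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt", "keywords.txt")


def missing_model_files(model_dir):
    """Return the required file names that are absent or empty in model_dir."""
    model_dir = Path(model_dir)
    return [
        name
        for name in REQUIRED_FILES
        if not (model_dir / name).is_file() or (model_dir / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_model_files("models")
    print("OK" if not missing else f"Missing: {', '.join(missing)}")
```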

Model Performance Comparison

| Model Version | File Size | Inference Speed | Accuracy | Resource Usage | Recommended Scenario |
|---|---|---|---|---|---|
| epoch-99 fp32 | ~13MB | Medium | Highest | Medium | Desktop (Recommended) |
| epoch-99 int8 | ~4MB | Fast | High | Low | Mobile / resource-constrained |
| epoch-12 fp32 | ~13MB | Medium | Medium-High | Medium | General use |
| epoch-12 int8 | ~4MB | Fastest | Medium | Lowest | Extreme speed requirements |

Enabling Voice Wake-Up

Configuration File Settings

Edit config/config.json:

json
{
  "WAKE_WORD_OPTIONS": {
    "USE_WAKE_WORD": true,
    "MODEL_PATH": "models/zh",
    "NUM_THREADS": 4,
    "PROVIDER": "cpu",
    "MAX_ACTIVE_PATHS": 2,
    "KEYWORDS_SCORE": 1.8,
    "KEYWORDS_THRESHOLD": 0.2,
    "NUM_TRAILING_BLANKS": 1,
    "WAKE_WORD": "小爱同学",
    "WAKE_WORD_LANG": "zh"
  }
}

To switch to English wake words, simply change MODEL_PATH to models/en, WAKE_WORD to MOSS, and set WAKE_WORD_LANG to en. Regardless of the language chosen, WAKE_WORD must match one of the lines in the corresponding keywords.txt.

Configuration Parameter Details

| Parameter | Default Value | Description | Tuning Advice |
|---|---|---|---|
| USE_WAKE_WORD | true | Enable voice wake-up | - |
| MODEL_PATH | "models/zh" | Model file directory | Chinese: models/zh, English: models/en |
| NUM_THREADS | 4 | Number of processing threads | Set 6-8 on powerful machines |
| PROVIDER | "cpu" | Inference engine | Options: cpu, cuda, coreml |
| MAX_ACTIVE_PATHS | 2 | Number of search paths | Lower = faster, higher = more accurate |
| KEYWORDS_SCORE | 1.8 | Keyword boost score | Higher = fewer false positives, lower = more sensitive |
| KEYWORDS_THRESHOLD | 0.2 | Detection threshold | Lower = more sensitive, higher = fewer false positives |
| NUM_TRAILING_BLANKS | 1 | Number of trailing blanks | Usually keep at 1 |
| WAKE_WORD | "小爱同学" | Wake word text (for UI display and logs) | Must exist in keywords.txt |
| WAKE_WORD_LANG | "zh" | Wake word language (zh/en) | Must match MODEL_PATH |
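
To make the parameter table concrete, the sketch below translates a WAKE_WORD_OPTIONS dict into keyword arguments for a Sherpa-ONNX keyword spotter. `build_spotter_kwargs` is a hypothetical helper. The keyword names are believed to match the `KeywordSpotter` constructor in the sherpa-onnx Python package, but they should be checked against the installed version, and how py-xiaozhi itself wires these values is an assumption.

```python
from pathlib import Path


def build_spotter_kwargs(options):
    """Translate WAKE_WORD_OPTIONS into keyword arguments for a keyword spotter."""
    model_dir = Path(options["MODEL_PATH"])
    return {
        # File layout as described in this document.
        "tokens": str(model_dir / "tokens.txt"),
        "encoder": str(model_dir / "encoder.onnx"),
        "decoder": str(model_dir / "decoder.onnx"),
        "joiner": str(model_dir / "joiner.onnx"),
        "keywords_file": str(model_dir / "keywords.txt"),
        # Defaults mirror the table above.
        "num_threads": options.get("NUM_THREADS", 4),
        "provider": options.get("PROVIDER", "cpu"),
        "max_active_paths": options.get("MAX_ACTIVE_PATHS", 2),
        "keywords_score": options.get("KEYWORDS_SCORE", 1.8),
        "keywords_threshold": options.get("KEYWORDS_THRESHOLD", 0.2),
        "num_trailing_blanks": options.get("NUM_TRAILING_BLANKS", 1),
    }


# Usage (requires the sherpa-onnx package and the model files on disk):
#   import sherpa_onnx
#   spotter = sherpa_onnx.KeywordSpotter(**build_spotter_kwargs(options))
```

WAKE_WORD and WAKE_WORD_LANG are deliberately not passed through: according to the table they are used for display, logging, and consistency with MODEL_PATH, while detection itself is driven by keywords.txt.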

Custom Wake Words

  1. Open the desktop UI and go to Settings → Wake Words (src/ui/gui/qml/windows/settings/WakeWordTab.qml).
  2. Check "Enable Wake Word" and type a Chinese or English phrase directly into the input field (e.g., "你好小智" or "Hey Moss").
  3. A "conversion preview" is displayed in real time during input; Chinese is automatically converted to pinyin, English uses BPE tokens, and language tags are shown.
  4. Click "Save Wake Word" and the system calls convert_wake_word() (src/ui/shared/models/settings_model.py:314-365):
    • Automatically detects language, updates WAKE_WORD, WAKE_WORD_LANG, and MODEL_PATH;
    • Writes the generated keyword to the corresponding directory (models/zh/keywords.txt or models/en/keywords.txt);
    • Keeps the configuration file and UI state in sync.

This method is suitable for a single primary wake word scenario; saving will overwrite the content of the target keywords.txt. If you need to keep multiple candidates or perform batch management, use "Manual Addition" below instead.
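
The automatic language detection performed on save could work along the following lines. This is a hedged sketch only: the body of convert_wake_word() is not shown in this document, so the CJK-range heuristic and both function names here are assumptions for illustration.

```python
def detect_wake_word_lang(text):
    """Return 'zh' if text contains any CJK Unified Ideograph, otherwise 'en'."""
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"


def options_for_wake_word(text):
    """Build the three fields that must stay in sync for a given wake word."""
    lang = detect_wake_word_lang(text)
    return {
        "WAKE_WORD": text,
        "WAKE_WORD_LANG": lang,
        # Assumption: model directories follow the models/zh and models/en layout.
        "MODEL_PATH": f"models/{lang}",
    }


if __name__ == "__main__":
    print(options_for_wake_word("你好小智"))
    print(options_for_wake_word("Hey Moss"))
```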

Currently Supported Wake Words

The repository ships with a set of Chinese and English wake words that work immediately after launch:

Chinese (models/zh/keywords.txt)

x iǎo ài t óng x ué @小爱同学

Set WAKE_WORD to 小爱同学 and WAKE_WORD_LANG to zh to wake directly; if you add custom words like "中国好助手", update these two fields as well.

English (models/en/keywords.txt)

▁MO S S @MOSS

English keywords use Sherpa-ONNX's SentencePiece/BPE tokens, with the leading ▁ indicating the start of a word. Set MODEL_PATH: "models/en", WAKE_WORD: "MOSS", and WAKE_WORD_LANG: "en" to use.

Manually Adding New Wake Words (Advanced)

Method 1: Directly Edit the Keyword File

Edit the corresponding file based on the chosen language:

  • Chinese: Edit models/zh/keywords.txt, one wake word per line, using "pinyin + @original text" format;
  • English: Edit models/en/keywords.txt, one wake word per line, using SentencePiece tokens (example: ▁HE L LO ▁X I AO ▁Z H I @HELLO XIAOZHI). It is recommended to use the conversion scripts provided by Sherpa-ONNX or refer to existing examples to maintain capitalization and spacing.
# Format: pinyin breakdown @Chinese original text
x iǎo zh ì @小智
n ǐ h ǎo x iǎo zh ì @你好小智
j iā w éi s ī @贾维斯
k āi sh ǐ g ōng z uò @开始工作

After editing, remember to update WAKE_WORD to the new wake word text, and confirm that WAKE_WORD_LANG matches the directory, otherwise wake-up will not trigger. If you later use the Settings UI's "Save Wake Word" feature, the file will be overwritten with a single line.
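
When maintaining several wake words by hand, appending lines with a script avoids accidental duplicates. The following is a minimal sketch, assuming the "tokens @phrase" line format shown above; `add_keyword_line` is a hypothetical helper and is not part of py-xiaozhi.

```python
from pathlib import Path


def add_keyword_line(keywords_file, line):
    """Append line to keywords_file unless the same '@phrase' is already present.

    Returns True if the file was changed, False if the phrase already existed.
    """
    if "@" not in line:
        raise ValueError("keyword line must end with '@' followed by the phrase")
    path = Path(keywords_file)
    phrase = line.rsplit("@", 1)[1].strip()

    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    for old in existing:
        if "@" in old and old.rsplit("@", 1)[1].strip() == phrase:
            return False

    existing.append(line.strip())
    path.write_text("\n".join(existing) + "\n", encoding="utf-8")
    return True
```

For example, `add_keyword_line("models/zh/keywords.txt", "n ǐ h ǎo x iǎo zh ì @你好小智")` adds the line once and leaves the file untouched on later calls.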

Method 2: Use the Pinyin Conversion Tool

python
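from pypinyin import pinyin
from pypinyin.contrib.tone_convert import to_finals_tone, to_initials

def generate_keyword_line(text):
    """Build a keywords.txt line: initials and tone-marked finals, then '@' + the original text."""
    tokens = []
    for item in pinyin(text):  # default style keeps tone marks, e.g. 'xiǎo'
        syllable = item[0]
        # strict=False treats y/w as initials, matching lines such as "w éi" above
        initial = to_initials(syllable, strict=False)
        final = to_finals_tone(syllable, strict=False)
        tokens.extend(part for part in (initial, final) if part)
    return f"{' '.join(tokens)} @{text}"

# Generate new wake words; every emitted token must also exist in tokens.txt
wake_words = ['小助手', '开始工作', '星期五']
for word in wake_words:
    print(generate_keyword_line(word))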

Wake Word Selection Tips

  • Moderate length: 2-4 characters
  • Clear pronunciation: Avoid similar-sounding words
  • Strong uniqueness: Avoid common daily conversation words
  • Easy to say: Easy to remember and pronounce

Examples of Good Wake Words

- 你好小智    # 4 characters, unique, clear
- 贾维斯      # 3 characters, unique, tech-savvy
- 开始工作    # 4 characters, clear intent
- 小助手      # 3 characters, simple and easy to remember

Avoid Using

- 嗯         # Too short, easily triggered accidentally
- 你好       # Too common
- 请帮我做一个计划 # Too long
- 谢谢       # Everyday phrase
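
The selection tips can be turned into a quick pre-check for candidate wake words. This is a rough sketch under stated assumptions: `wake_word_warnings` is a hypothetical helper, the length rule counts characters and so only makes sense for Chinese phrases, and the list of common phrases is just the examples from this section rather than a real dictionary.

```python
# Illustrative only: taken from the "Avoid Using" examples above.
COMMON_PHRASES = {"嗯", "你好", "谢谢"}


def wake_word_warnings(word):
    """Return a list of warnings for a candidate Chinese wake word."""
    warnings = []
    if not 2 <= len(word) <= 4:
        warnings.append("length should be 2-4 characters")
    if word in COMMON_PHRASES:
        warnings.append("common everyday phrase")
    return warnings


if __name__ == "__main__":
    for candidate in ["你好小智", "嗯", "请帮我做一个计划"]:
        print(candidate, wake_word_warnings(candidate))
```

Pronunciation clarity and similarity to other words are not covered by this check and still need to be judged by ear.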

Usage

Startup Process

  1. Start the program:

    bash
    cd /path/to/py-xiaozhi  # replace with the path to your local checkout
    python main.py
  2. Model loading:

    • The system automatically loads the Sherpa-ONNX model
    • Initializes the keyword detector
    • Enters wake word listening state
  3. Voice wake-up:

    • Clearly speak the configured wake word
    • The system automatically switches to the LISTENING state
    • Begins voice conversation

Usage Tips

Best Wake-Up Practices

  • Moderate volume: Normal speaking volume
  • Natural speed: Not too fast or too slow
  • Clear articulation: Pay special attention to tones
  • Quiet environment: Minimize background noise

Performance Optimization

Speed Optimization Configuration

json
{
  "WAKE_WORD_OPTIONS": {
    "NUM_THREADS": 6,           // Increase thread count
    "MAX_ACTIVE_PATHS": 1,      // Reduce search paths
    "KEYWORDS_THRESHOLD": 0.15, // Lower threshold for higher sensitivity
    "KEYWORDS_SCORE": 1.5       // Lower score for faster speed
  }
}

Accuracy Optimization Configuration

json
{
  "WAKE_WORD_OPTIONS": {
    "NUM_THREADS": 4,           // Moderate thread count
    "MAX_ACTIVE_PATHS": 3,      // Increase search paths
    "KEYWORDS_THRESHOLD": 0.25, // Higher threshold to reduce false positives
    "KEYWORDS_SCORE": 2.2       // Higher score for better accuracy
  }
}
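
Switching between the two profiles above can be scripted. The sketch below is an illustration, not a py-xiaozhi feature: `apply_preset` and `PRESETS` are hypothetical names, and the values are copied from the two snippets above. Note that Python's standard json module rejects `//` comments, so the inline annotations in those snippets should not be copied into a file parsed that way.

```python
import json
from pathlib import Path

# Values copied from the speed and accuracy snippets in this section.
PRESETS = {
    "speed": {"NUM_THREADS": 6, "MAX_ACTIVE_PATHS": 1, "KEYWORDS_THRESHOLD": 0.15, "KEYWORDS_SCORE": 1.5},
    "accuracy": {"NUM_THREADS": 4, "MAX_ACTIVE_PATHS": 3, "KEYWORDS_THRESHOLD": 0.25, "KEYWORDS_SCORE": 2.2},
}


def apply_preset(config_path, preset):
    """Merge a preset into WAKE_WORD_OPTIONS, leaving all other keys untouched."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config.setdefault("WAKE_WORD_OPTIONS", {}).update(PRESETS[preset])
    # ensure_ascii=False keeps wake words such as 小爱同学 readable in the file.
    path.write_text(json.dumps(config, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
    return config["WAKE_WORD_OPTIONS"]
```

Calling `apply_preset("config/config.json", "speed")` rewrites only the four tuned fields; restart the application afterwards so the detector is re-initialized with the new values.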

Performance Monitoring

Check current performance:

python
# View statistics in the application
stats = wake_word_detector.get_performance_stats()
print(f"Engine: {stats['engine']}")
print(f"Threads: {stats['num_threads']}")
print(f"Detection threshold: {stats['keywords_threshold']}")
print(f"Running: {stats['is_running']}")

Troubleshooting

Common Issues

1. Wake Word Not Responding

Symptom: No response when speaking the wake word

Solutions:

bash
# Check configuration
grep -A 10 "WAKE_WORD_OPTIONS" config/config.json

# Check model files
ls -la models/

# Test functionality
python test_new_keywords.py

2. Slow Response

Symptom: Large delay in wake word recognition

Solutions:

json
{
  "WAKE_WORD_OPTIONS": {
    "KEYWORDS_THRESHOLD": 0.15,  // Lower threshold
    "NUM_THREADS": 6,            // Increase threads
    "MAX_ACTIVE_PATHS": 1        // Reduce search paths
  }
}

3. Frequent False Triggers

Symptom: Frequently triggers wake-up by mistake

Solutions:

json
{
  "WAKE_WORD_OPTIONS": {
    "KEYWORDS_THRESHOLD": 0.3,   // Raise threshold
    "KEYWORDS_SCORE": 2.5,       // Raise score
    "MAX_ACTIVE_PATHS": 3        // Increase search paths
  }
}

4. Model Loading Failure

Symptom: Model file errors on startup

Solutions:

bash
# Check file integrity
ls -la models/
file models/*.onnx
file models/tokens.txt

# Re-validate model
python test_new_keywords.py

Debug Commands

bash
# View system logs
tail -f logs/app.log | grep -i kws

# Monitor performance
top -p $(pgrep -f "python main.py")  # Linux syntax; on macOS use: top -pid <PID>

# Test audio device
python -c "import sounddevice as sd; print(sd.query_devices())"

Advanced Configuration

Environment Adaptation

Quiet Environment (Office)

json
{
  "WAKE_WORD_OPTIONS": {
    "KEYWORDS_THRESHOLD": 0.15,
    "KEYWORDS_SCORE": 1.5,
    "MAX_ACTIVE_PATHS": 1
  }
}

Noisy Environment (Open Space)

json
{
  "WAKE_WORD_OPTIONS": {
    "KEYWORDS_THRESHOLD": 0.25,
    "KEYWORDS_SCORE": 2.5,
    "MAX_ACTIVE_PATHS": 3
  }
}

Integration with AEC

Voice wake-up works together with echo cancellation (AEC): when AEC is enabled, the wake word detector receives the AEC-processed audio:

json
{
  "AEC_OPTIONS": {
    "ENABLED": true,              // AEC provides clean audio for wake word
    "ENABLE_PREPROCESS": true     // Noise suppression improves detection accuracy
  },
  "WAKE_WORD_OPTIONS": {
    "USE_WAKE_WORD": true         // Use AEC-processed audio
  }
}

Performance Benchmarks

Expected performance under standard configuration:

| Metric | Target | Description |
|---|---|---|
| Response Latency | < 1s | From speech to detection complete |
| Detection Accuracy | > 95% | Correctly recognizes set wake word |
| False Trigger Rate | < 5% | Frequency of erroneous triggers |
| CPU Usage | < 30% | Resource consumption during continuous operation |
| Memory Usage | < 100MB | Model and buffer memory usage |
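
To compare a setup against the accuracy and false-trigger targets, the two rates can be computed from a log of manual test trials. The definitions used here are assumptions, since this document does not define the metrics precisely: detection accuracy is taken as the share of spoken wake words that triggered, and false trigger rate as the share of non-wake-word utterances that triggered. `wake_word_metrics` is a hypothetical helper.

```python
def wake_word_metrics(trials):
    """Compute rates from (wake_word_spoken, triggered) boolean pairs.

    Returns None for a rate when there are no trials of that kind.
    """
    trials = list(trials)
    positives = [triggered for spoken, triggered in trials if spoken]
    negatives = [triggered for spoken, triggered in trials if not spoken]
    return {
        "detection_accuracy": sum(positives) / len(positives) if positives else None,
        "false_trigger_rate": sum(negatives) / len(negatives) if negatives else None,
    }


if __name__ == "__main__":
    # Example log: 20 wake word attempts (1 missed), 10 other utterances (1 false trigger).
    log = [(True, True)] * 19 + [(True, False)] + [(False, False)] * 9 + [(False, True)]
    print(wake_word_metrics(log))
```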

Summary

Sherpa-ONNX Voice Wake-Up Features:

  • High Accuracy: Deep learning-based end-to-end detection
  • Low Latency: Millisecond-level response speed
  • Low Resource: Lightweight model suitable for PC operation
  • Customizable: Supports custom wake words
  • Easy Integration: Perfect integration with existing audio processing

Now you can enjoy an intelligent, fast, and accurate voice wake-up experience!