Configuration Guide
Configuration System Overview
Configuration File Location
py-xiaozhi's configuration files are stored in the user data directory (not the project root), following each platform's conventions:
| Platform | Configuration Directory |
|---|---|
| Windows | C:\Users\<username>\AppData\Local\py-xiaozhi\config\ |
| macOS | ~/Library/Application Support/py-xiaozhi/config/ |
| Linux | ~/.local/share/py-xiaozhi/config/ |
<user data directory>/
├── config/
│ ├── config.json # Main configuration file (runtime config)
│ └── efuse.json # Device identity file (auto-generated)
├── logs/ # Log files
└── cache/ # Cache files
Tip: Developers who need to customize the app name can modify APP_NAME in src/constants/system.py to change the data directory name.
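The platform table above amounts to a small lookup. The following is a minimal sketch of that lookup, not the project's actual implementation: the function name `config_dir` is hypothetical, and for simplicity it ignores `XDG_DATA_HOME` on Linux.

```python
import os
import sys
from pathlib import Path
from typing import Optional

APP_NAME = "py-xiaozhi"  # mirrors the APP_NAME constant mentioned in the tip


def config_dir(platform: str = sys.platform, home: Optional[Path] = None) -> Path:
    """Return the configuration directory for the given platform (hypothetical helper)."""
    home = home or Path.home()
    if platform.startswith("win"):
        # Prefer %LOCALAPPDATA%; fall back to the conventional location.
        base = Path(os.environ.get("LOCALAPPDATA", str(home / "AppData" / "Local")))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Linux and other Unix-like systems.
        base = home / ".local" / "share"
    return base / APP_NAME / "config"


if __name__ == "__main__":
    print(config_dir())
```

Changing `APP_NAME` in this sketch changes the directory name in the same way the tip describes for the real constant.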
Configuration Hierarchy
Default Configuration Template
- Location: DEFAULT_CONFIG in src/utils/config_manager.py
- Purpose: Provides system default configuration values
- Usage: Template for auto-generating config files on first run
Runtime Configuration File
- Location: <user data directory>/config/config.json
- Purpose: Stores user-customized configuration
- Usage: The actual configuration read at runtime
Device Identity File
- Location: <user data directory>/config/efuse.json
- Purpose: Stores the device's unique identifier and activation status
- Usage: Device activation and identity verification
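The relationship between the default template and the runtime file can be sketched as follows. This is an illustration under stated assumptions, not the code in `config_manager.py`: the helper name `load_or_create` is hypothetical, and the sketch does not merge newly added default keys into an existing file.

```python
import copy
import json
from pathlib import Path


def load_or_create(path: Path, defaults: dict) -> dict:
    """Return the runtime config, writing the defaults to disk on first run."""
    if path.exists():
        # An existing runtime file always wins over the template.
        return json.loads(path.read_text(encoding="utf-8"))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(defaults, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    # Return a copy so callers cannot mutate the template by accident.
    return copy.deepcopy(defaults)
```

Deleting `config.json` and calling the function again would regenerate the file from the template, which matches the first-run behavior described above.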
Configuration Access Methods
The configuration system supports dot-separated path access for convenient retrieval and modification of nested configurations:
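To make the path semantics concrete, here is a minimal sketch of dot-separated lookup and update on a nested dictionary. The helper names `get_by_path` and `set_by_path` are hypothetical and only illustrate the idea; they are not ConfigManager methods.

```python
from typing import Any


def get_by_path(data: dict, path: str, default: Any = None) -> Any:
    """Walk a nested dict along a dot-separated path."""
    node: Any = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def set_by_path(data: dict, path: str, value: Any) -> None:
    """Set a value at a dot-separated path, creating missing parent dicts."""
    *parents, leaf = path.split(".")
    node = data
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


config = {"SYSTEM_OPTIONS": {"NETWORK": {"WEBSOCKET_URL": "wss://example.invalid/"}}}
set_by_path(config, "WAKE_WORD_OPTIONS.USE_WAKE_WORD", True)
print(get_by_path(config, "SYSTEM_OPTIONS.NETWORK.WEBSOCKET_URL"))
```

The project's own calls follow.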
# Configuration access examples
from src.utils.config_manager import ConfigManager
config = ConfigManager.get_instance()
# Get network configuration
websocket_url = config.get_config("SYSTEM_OPTIONS.NETWORK.WEBSOCKET_URL")
# Get wake word configuration
wake_words = config.get_config("WAKE_WORD_OPTIONS.WAKE_WORDS")
# Update configuration
config.update_config("WAKE_WORD_OPTIONS.USE_WAKE_WORD", True)
config.update_config("CAMERA.VLapi_key", "your_api_key_here")
# Reload configuration
config.reload_config()
System Configuration (SYSTEM_OPTIONS)
Basic System Configuration
{
"SYSTEM_OPTIONS": {
"CLIENT_ID": "auto-generated client ID",
"DEVICE_ID": "device MAC address",
"WINDOW_SIZE_MODE": "screen_100",
"NETWORK": {
"OTA_VERSION_URL": "https://api.tenclass.net/xiaozhi/ota/",
"WEBSOCKET_URL": "wss://api.tenclass.net/xiaozhi/v1/",
"WEBSOCKET_ACCESS_TOKEN": "access token",
"MQTT_INFO": {
"endpoint": "mqtt.server.com",
"client_id": "xiaozhi_client",
"username": "your_username",
"password": "your_password",
"publish_topic": "xiaozhi/commands",
"subscribe_topic": "xiaozhi/responses"
},
"ACTIVATION_VERSION": "v2",
"AUTHORIZATION_URL": "https://xiaozhi.me/"
}
}
}
Configuration Item Descriptions
| Configuration Item | Type | Default Value | Description |
|---|---|---|---|
| CLIENT_ID | String | Auto-generated | Unique client identifier |
| DEVICE_ID | String | MAC address | Unique device identifier |
| WINDOW_SIZE_MODE | String | "screen_100" | Window size mode |
| OTA_VERSION_URL | String | Official OTA URL | OTA configuration retrieval URL |
| WEBSOCKET_URL | String | Delivered by OTA | WebSocket server address |
| WEBSOCKET_ACCESS_TOKEN | String | Delivered by OTA | WebSocket access token |
| ACTIVATION_VERSION | String | "v2" | Activation protocol version (v1/v2) |
| AUTHORIZATION_URL | String | "https://xiaozhi.me/" | Device authorization URL |
Switching Server Configuration
Switching to a Self-Hosted Server
- To use your own server, change only the OTA API address. The system then obtains the WebSocket connection information from the OTA server automatically.
- Set AUTHORIZATION_URL to your device binding backend (e.g., xiaozhi.me). Clicking it opens this web page automatically.
{
"SYSTEM_OPTIONS": {
"NETWORK": {
"OTA_VERSION_URL": "https://your-server.com/xiaozhi/ota/",
"AUTHORIZATION_URL": "https://xiaozhi.me/"
}
}
}
Configuration Auto-Update Mechanism
On startup, the system automatically updates configuration through the following process:
- OTA Configuration Retrieval: Sends a POST request to OTA_VERSION_URL
- Auto Configuration Update: System automatically updates MQTT and WebSocket configuration
- Connection Establishment: Establishes connection using updated configuration
Related code locations:
- OTA configuration retrieval: fetch_and_update_config() method in src/core/ota.py
- Configuration update: update_websocket_config() and update_mqtt_config() methods in src/core/ota.py
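The update step can be pictured as copying connection details from the OTA response into the NETWORK section. The sketch below is an assumption-laden illustration, not the code in `src/core/ota.py`: the function name `apply_ota_response` is hypothetical, and the response shape (`websocket.url`, `websocket.token`, `mqtt`) is assumed for the example rather than taken from this guide.

```python
import copy


def apply_ota_response(config: dict, response: dict) -> dict:
    """Return a copy of config with WebSocket/MQTT settings taken from response.

    Assumed (hypothetical) response shape:
        {"websocket": {"url": ..., "token": ...}, "mqtt": {...}}
    """
    updated = copy.deepcopy(config)
    network = updated.setdefault("SYSTEM_OPTIONS", {}).setdefault("NETWORK", {})

    websocket = response.get("websocket") or {}
    if "url" in websocket:
        network["WEBSOCKET_URL"] = websocket["url"]
    if "token" in websocket:
        network["WEBSOCKET_ACCESS_TOKEN"] = websocket["token"]

    if response.get("mqtt"):
        # Replace the whole MQTT block with what the server delivered.
        network["MQTT_INFO"] = dict(response["mqtt"])
    return updated
```

Because the function returns a copy, the caller decides when to persist the result, which keeps the original configuration available if the connection attempt fails.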
Disabling Configuration Auto-Update
If you don't need automatic configuration updates, comment out the relevant code in these locations:
1. Disable OTA Configuration Retrieval
Comment out the third phase in src/core/system_initializer.py:
# async def stage_3_ota_config(self):
#     """
#     Phase 3: OTA configuration retrieval.
#     """
#     # Comment out the entire method body
2. Disable WebSocket Configuration Update
Comment out the update method in src/core/ota.py:
async def update_websocket_config(self, response_data):
    """
    Update WebSocket configuration.
    """
    # Comment out the configuration update logic
    return None
3. Manually Configure WebSocket Connection
Directly configure fixed connection info in the configuration file:
{
"SYSTEM_OPTIONS": {
"NETWORK": {
"WEBSOCKET_URL": "wss://your-server.com/xiaozhi/v1/",
"WEBSOCKET_ACCESS_TOKEN": "your_fixed_token"
}
}
}
Wake Word Configuration (WAKE_WORD_OPTIONS)
Sherpa-ONNX Voice Wake-Up Settings
{
"WAKE_WORD_OPTIONS": {
"USE_WAKE_WORD": true,
"MODEL_PATH": "models/zh",
"NUM_THREADS": 4,
"PROVIDER": "cpu",
"MAX_ACTIVE_PATHS": 2,
"KEYWORDS_SCORE": 1.8,
"KEYWORDS_THRESHOLD": 0.2,
"NUM_TRAILING_BLANKS": 1,
"WAKE_WORD": "你好小智",
"WAKE_WORD_LANG": "zh"
}
}
Configuration Item Descriptions
| Configuration Item | Type | Default Value | Description |
|---|---|---|---|
| USE_WAKE_WORD | Boolean | true | Enable voice wake-up |
| MODEL_PATH | String | "models" | Sherpa-ONNX model file directory |
| NUM_THREADS | Integer | 4 | Model inference threads (affects response speed) |
| PROVIDER | String | "cpu" | Inference engine (cpu/cuda/coreml) |
| MAX_ACTIVE_PATHS | Integer | 2 | Search path count (affects accuracy and speed) |
| KEYWORDS_SCORE | Float | 1.8 | Keyword boost score (affects detection sensitivity) |
| KEYWORDS_THRESHOLD | Float | 0.2 | Detection threshold (lower = more sensitive) |
| NUM_TRAILING_BLANKS | Integer | 1 | Number of trailing blank tokens |
| WAKE_WORD | String | "你好小智" | Wake word text |
| WAKE_WORD_LANG | String | "zh" | Wake word language (zh/en) |
Model File Structure
models/
├── encoder.onnx # Encoder model (high-precision version included)
├── decoder.onnx # Decoder model
├── joiner.onnx # Joiner model
├── tokens.txt # Pinyin token mapping table
└── keywords.txt # Keyword configuration file
Custom Wake Words
Edit models/keywords.txt to add wake words:
# Format: pinyin breakdown @Chinese original
n ǐ h ǎo x iǎo zh ì @你好小智
j iā w éi s ī @贾维斯
x iǎo zh ù sh ǒu @小助手
k āi sh ǐ g ōng z uò @开始工作
Performance Tuning
Speed-Priority Configuration
{
"WAKE_WORD_OPTIONS": {
"NUM_THREADS": 6,
"MAX_ACTIVE_PATHS": 1,
"KEYWORDS_THRESHOLD": 0.15,
"KEYWORDS_SCORE": 1.5
}
}
Accuracy-Priority Configuration
{
"WAKE_WORD_OPTIONS": {
"NUM_THREADS": 4,
"MAX_ACTIVE_PATHS": 3,
"KEYWORDS_THRESHOLD": 0.25,
"KEYWORDS_SCORE": 2.2
}
}
Camera Configuration (CAMERA)
Visual Recognition Settings
{
"CAMERA": {
"camera_index": 0,
"frame_width": 640,
"frame_height": 480,
"fps": 30,
"Local_VL_url": "https://open.bigmodel.cn/api/paas/v4/",
"VLapi_key": "your_zhipu_api_key",
"models": "glm-4v-plus"
}
}
Configuration Item Descriptions
| Configuration Item | Type | Default Value | Description |
|---|---|---|---|
| camera_index | Integer | 0 | Camera device index |
| frame_width | Integer | 640 | Frame width |
| frame_height | Integer | 480 | Frame height |
| fps | Integer | 30 | Frame rate |
| Local_VL_url | String | Zhipu API URL | Visual model API URL |
| VLapi_key | String | "" | Zhipu API key |
| models | String | "glm-4v-plus" | Visual model name |
Camera Testing
# Test camera functionality
python scripts/camera_scanner.py
# Test visual recognition in the running program by asking aloud:
# "What's in front of the camera?"
Shortcut Configuration (SHORTCUTS)
Global Shortcut Settings
{
"SHORTCUTS": {
"ENABLED": true,
"MANUAL_PRESS": {
"modifier": "ctrl",
"key": "j",
"description": "Push-to-talk"
},
"AUTO_TOGGLE": {
"modifier": "ctrl",
"key": "k",
"description": "Auto conversation"
},
"ABORT": {
"modifier": "ctrl",
"key": "q",
"description": "Abort conversation"
},
"MODE_TOGGLE": {
"modifier": "ctrl",
"key": "m",
"description": "Switch mode"
},
"WINDOW_TOGGLE": {
"modifier": "ctrl",
"key": "w",
"description": "Show/hide window"
}
}
}
Shortcut Descriptions
| Shortcut | Function | Description |
|---|---|---|
| Ctrl+J | Push-to-Talk | Records while held, sends on release |
| Ctrl+K | Auto Conversation | Toggle auto conversation mode on/off |
| Ctrl+Q | Abort Conversation | Interrupt current conversation |
| Ctrl+M | Switch Mode | Toggle between conversation modes |
| Ctrl+W | Show/Hide Window | Show or hide the main window |
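When customizing shortcuts, it is easy to assign the same key combination twice. The following sketch, which is not part of py-xiaozhi, shows one way to derive the labels used in the table from a SHORTCUTS section and to spot duplicates; `shortcut_label` and `find_conflicts` are hypothetical helper names.

```python
def shortcut_label(entry: dict) -> str:
    """Format a shortcut entry such as {"modifier": "ctrl", "key": "j"} as "Ctrl+J"."""
    return f"{entry['modifier'].capitalize()}+{entry['key'].upper()}"


def find_conflicts(shortcuts: dict) -> dict:
    """Map each key combination used more than once to the actions that use it."""
    seen: dict = {}
    for name, entry in shortcuts.items():
        if not isinstance(entry, dict):
            continue  # skip flags such as "ENABLED"
        seen.setdefault(shortcut_label(entry), []).append(name)
    return {combo: names for combo, names in seen.items() if len(names) > 1}


shortcuts = {
    "ENABLED": True,
    "MANUAL_PRESS": {"modifier": "ctrl", "key": "j"},
    "AUTO_TOGGLE": {"modifier": "ctrl", "key": "k"},
}
print(find_conflicts(shortcuts))  # empty dict when every combination is unique
```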
Acoustic Echo Cancellation Configuration (AEC_OPTIONS)
AEC Audio Processing Settings
{
"AEC_OPTIONS": {
"ENABLED": false,
"BUFFER_MAX_LENGTH": 200,
"FRAME_DELAY": 3,
"FILTER_LENGTH_RATIO": 0.4,
"ENABLE_PREPROCESS": true,
"MODE": "voice_processing"
}
}
Configuration Item Descriptions
| Configuration Item | Type | Default Value | Description |
|---|---|---|---|
| ENABLED | Boolean | false | Enable AEC echo cancellation |
| BUFFER_MAX_LENGTH | Integer | 200 | Reference signal buffer size (in frames) |
| FRAME_DELAY | Integer | 3 | Delay compensation frames (currently unused) |
| FILTER_LENGTH_RATIO | Float | 0.4 | Filter length ratio (seconds), affects echo cancellation strength |
| ENABLE_PREPROCESS | Boolean | true | Enable noise suppression preprocessing |
| MODE | String | "voice_processing" | AEC processing mode |
AEC Feature Description
Echo Cancellation
- Eliminates echo from speaker playback picked up by the microphone
- Supports real-time bidirectional conversation without echo interference
Noise Suppression
- Suppresses background noise and environmental interference
- Improves speech recognition accuracy
Conversation Mode Impact
{
"AEC_OPTIONS": {
"ENABLED": true // When enabled: Real-time conversation mode (ListeningMode.REALTIME)
// When disabled: Turn-based conversation mode (ListeningMode.AUTO_STOP)
}
}
Environment Optimization Suggestions
Small Room / Office Environment
{
"AEC_OPTIONS": {
"FILTER_LENGTH_RATIO": 0.2,
"BUFFER_MAX_LENGTH": 150
}
}
Large Room / Meeting Room Environment
{
"AEC_OPTIONS": {
"FILTER_LENGTH_RATIO": 0.6,
"BUFFER_MAX_LENGTH": 300
}
}
Noisy Environment
{
"AEC_OPTIONS": {
"FILTER_LENGTH_RATIO": 0.8,
"ENABLE_PREPROCESS": true,
"BUFFER_MAX_LENGTH": 400
}
}
AEC Feature Testing
# Test echo cancellation effect
# 1. Enable AEC, speak while playing music
python main.py # Should have no echo when AEC is enabled
# 2. Disable AEC for comparison
# Set "ENABLED": false in config file
python main.py # May have echo when AEC is disabled
Performance Parameter Details
Filter Length Calculation
Actual filter length = sample rate (16000 Hz) × FILTER_LENGTH_RATIO
Example: FILTER_LENGTH_RATIO = 0.4 → filter length = 6400 samples (0.4 seconds)
Parameter Effects
- Filter length ↑: Echo cancellation effectiveness ↑, CPU usage ↑
- Buffer size ↑: Stability ↑, memory usage ↑
- Preprocessing enabled: Noise suppression ↑, slight latency ↑
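The filter length formula can be checked with a few lines of arithmetic. This sketch applies it to the ratios used in this guide; the function name `filter_length_samples` is only illustrative.

```python
SAMPLE_RATE_HZ = 16000  # sample rate stated in the formula above


def filter_length_samples(ratio: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Filter length in samples = sample rate x FILTER_LENGTH_RATIO."""
    return int(round(sample_rate * ratio))


# Ratios taken from the default and the environment presets in this guide.
for label, ratio in [("small room", 0.2), ("default", 0.4),
                     ("large room", 0.6), ("noisy", 0.8)]:
    print(f"{label}: ratio {ratio} -> {filter_length_samples(ratio)} samples")
```

A longer filter covers a longer echo tail, which is why the larger presets cost more CPU.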
Audio Device Configuration (AUDIO_DEVICES)
Audio Input/Output Settings
{
"AUDIO_DEVICES": {
"input_device_id": null,
"input_device_name": null,
"output_device_id": null,
"output_device_name": null,
"input_sample_rate": null,
"output_sample_rate": null,
"input_channels": 1,
"output_channels": 2,
"opus_output_sample_rate": 24000,
"frame_duration": 20
}
}
Configuration Item Descriptions
| Configuration Item | Type | Default | Description |
|---|---|---|---|
| input_device_id | Integer | null | Input device ID (auto-detected) |
| input_device_name | String | null | Input device name |
| output_device_id | Integer | null | Output device ID (auto-detected) |
| output_device_name | String | null | Output device name |
| input_sample_rate | Integer | null | Input sample rate (auto-detected) |
| output_sample_rate | Integer | null | Output sample rate (auto-detected) |
| input_channels | Integer | 1 | Input channel count |
| output_channels | Integer | 2 | Output channel count |
| opus_output_sample_rate | Integer | 24000 | Opus decode sample rate: 24000 (official) or 16000 (third-party) |
| frame_duration | Integer | 20 | Audio frame duration (ms): 20 (low latency) / 40 (balanced) / 60 (low CPU) |
Sample Rate and Frame Duration Details
Opus Decode Sample Rate
- 24000: Used by the official server, better audio quality
- 16000: Used by third-party servers, better compatibility
Frame Duration Selection
Frame duration primarily affects input encoding and device callback frequency. The audio frame duration returned by the server is auto-detected by the client (via Opus TOC byte parsing) and does not need manual matching.
- 20ms: Low latency, suitable for real-time conversation (x86 default)
- 40ms: Balanced mode, a compromise between latency and performance
- 60ms: Low CPU usage, suitable for Raspberry Pi and other performance-constrained devices
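Two small calculations clarify these settings: how many samples one frame holds, and how a frame duration can be read from an Opus packet's TOC byte. The sketch below follows the TOC layout in RFC 6716 (the top five bits select the configuration); it is an independent illustration, not the client's detection code, and both function names are hypothetical.

```python
def samples_per_frame(sample_rate: int, frame_ms: int) -> int:
    """Number of samples per channel in one frame."""
    return sample_rate * frame_ms // 1000


def opus_frame_duration_ms(toc: int) -> float:
    """Duration of each frame in an Opus packet, read from its TOC byte."""
    config = (toc >> 3) & 0x1F  # top five bits of the TOC byte
    if config < 12:
        # SILK-only modes: 10, 20, 40, or 60 ms
        return (10.0, 20.0, 40.0, 60.0)[config % 4]
    if config < 16:
        # Hybrid modes: 10 or 20 ms
        return (10.0, 20.0)[config % 2]
    # CELT-only modes: 2.5, 5, 10, or 20 ms
    return (2.5, 5.0, 10.0, 20.0)[config % 4]


print(samples_per_frame(24000, 20))   # samples in a 20 ms frame at 24 kHz
print(opus_frame_duration_ms(0xF8))   # config 31: CELT fullband
```

A packet can carry more than one frame (indicated by the low two bits of the TOC byte), so the value returned here is the per-frame duration rather than the whole packet's.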
Audio Device Auto-Detection
The system automatically detects available audio devices on startup. To manually specify devices, configure the device ID and name:
{
"AUDIO_DEVICES": {
"input_device_id": 2,
"input_device_name": "MacBook Air Microphone",
"output_device_id": 1,
"output_device_name": "MacBook Air Speakers"
}
}
Logging Configuration (LOGGING)
Logging System Settings
{
"LOGGING": {
"LEVEL": "INFO",
"FORMAT_TYPE": "colored",
"ENABLE_CONSOLE": true,
"ENABLE_FILE": true,
"ENABLE_ERROR_FILE": true,
"ENABLE_JSON_FILE": false,
"ENABLE_ASYNC": false,
"ENABLE_SENSITIVE_FILTER": true,
"MAX_BYTES": 10485760,
"BACKUP_COUNT": 30,
"ROTATION_WHEN": "midnight",
"THIRD_PARTY_LEVELS": {
"urllib3": "WARNING",
"websockets": "WARNING",
"asyncio": "WARNING",
"paho": "WARNING",
"PIL": "WARNING"
}
}
}
Configuration Item Descriptions
| Configuration Item | Type | Default Value | Description |
|---|---|---|---|
| LEVEL | String | "INFO" | Log level: DEBUG/INFO/WARNING/ERROR/CRITICAL |
| FORMAT_TYPE | String | "colored" | Output format: colored/json/simple |
| ENABLE_CONSOLE | Boolean | true | Enable console output |
| ENABLE_FILE | Boolean | true | Enable file logging |
| ENABLE_ERROR_FILE | Boolean | true | Enable separate error log file |
| ENABLE_JSON_FILE | Boolean | false | Enable JSON format log file |
| ENABLE_ASYNC | Boolean | false | Enable async logging |
| ENABLE_SENSITIVE_FILTER | Boolean | true | Enable sensitive information filtering |
| MAX_BYTES | Integer | 10485760 | Maximum bytes per log file (10MB) |
| BACKUP_COUNT | Integer | 30 | Number of log backups to retain |
| ROTATION_WHEN | String | "midnight" | Log rotation timing: midnight/H/D |
| THIRD_PARTY_LEVELS | Object | See example above | Third-party library log level configuration |
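To show how LEVEL and THIRD_PARTY_LEVELS relate to Python's standard `logging` module, here is a minimal sketch. It is not the project's logging setup; the function name `apply_log_levels` is hypothetical, and it handles only these two keys.

```python
import logging


def apply_log_levels(logging_cfg: dict) -> None:
    """Set the root level and per-library levels from a LOGGING section."""
    # Logger.setLevel accepts level names such as "INFO" or "WARNING"
    # and raises ValueError for unknown names.
    logging.getLogger().setLevel(logging_cfg.get("LEVEL", "INFO"))
    for name, level in logging_cfg.get("THIRD_PARTY_LEVELS", {}).items():
        logging.getLogger(name).setLevel(level)


apply_log_levels({
    "LEVEL": "INFO",
    "THIRD_PARTY_LEVELS": {"urllib3": "WARNING", "websockets": "WARNING"},
})
```

Raising third-party loggers to WARNING keeps DEBUG output focused on the application's own messages.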
Log Level Descriptions
| Level | Description |
|---|---|
| DEBUG | Detailed debug info, for development |
| INFO | General runtime info (default) |
| WARNING | Warning messages |
| ERROR | Error messages |
| CRITICAL | Critical errors |
Development Debug Configuration
{
"LOGGING": {
"LEVEL": "DEBUG",
"FORMAT_TYPE": "colored",
"ENABLE_CONSOLE": true,
"ENABLE_FILE": true
}
}
Production Environment Configuration
{
"LOGGING": {
"LEVEL": "WARNING",
"FORMAT_TYPE": "json",
"ENABLE_CONSOLE": false,
"ENABLE_FILE": true,
"ENABLE_JSON_FILE": true,
"ENABLE_ASYNC": true
}
}
Protocol Configuration Details
WebSocket Protocol Configuration
WebSocket connection information is typically delivered automatically by the OTA server and does not require manual configuration:
{
"SYSTEM_OPTIONS": {
"NETWORK": {
"WEBSOCKET_URL": "wss://your-server.com/xiaozhi/v1/",
"WEBSOCKET_ACCESS_TOKEN": "your_access_token"
}
}
}
Configuration Notes:
- URL must start with ws:// or wss://
- Supports IP addresses or domain names
- Default port is 8000, adjustable based on server configuration
- Access token is used for authentication
- Usually auto-configured by the OTA server, no manual setup required
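If you do set the URL by hand, the first two notes can be checked before the client tries to connect. The following is a small sketch using only the standard library; `check_websocket_url` is a hypothetical helper, not a py-xiaozhi function.

```python
from urllib.parse import urlsplit


def check_websocket_url(url: str) -> tuple:
    """Validate a WebSocket URL and return (host, port); port is None if omitted."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"URL must start with ws:// or wss://, got {url!r}")
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    return parts.hostname, parts.port


print(check_websocket_url("ws://192.168.1.10:8000/xiaozhi/v1/"))
```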
MQTT Protocol Configuration
{
"SYSTEM_OPTIONS": {
"NETWORK": {
"MQTT_INFO": {
"endpoint": "mqtt.server.com",
"port": 1883,
"client_id": "xiaozhi_client_001",
"username": "your_username",
"password": "your_password",
"publish_topic": "xiaozhi/commands",
"subscribe_topic": "xiaozhi/responses",
"qos": 1,
"keep_alive": 60
}
}
}
}
Configuration Notes:
- endpoint: MQTT server address
- port: Typically 1883 (unencrypted) or 8883 (TLS encrypted)
- client_id: Unique client identifier
- qos: Quality of Service level (0-2)
- keep_alive: Heartbeat interval (seconds)
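A hand-edited MQTT_INFO block can be sanity-checked against the ranges listed above. The sketch below is illustrative only: `check_mqtt_info` is a hypothetical helper, and the choice of which fields are treated as required is an assumption, not something the client is known to enforce.

```python
def check_mqtt_info(info: dict) -> list:
    """Return a list of problems found in an MQTT_INFO section (empty if none)."""
    problems = []
    # Assumption: these fields are needed to connect, publish, and subscribe.
    for field in ("endpoint", "client_id", "publish_topic", "subscribe_topic"):
        if not info.get(field):
            problems.append(f"missing {field}")

    port = info.get("port", 1883)
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append("port must be 1-65535")

    if info.get("qos", 0) not in (0, 1, 2):
        problems.append("qos must be 0, 1, or 2")

    keep_alive = info.get("keep_alive", 60)
    if not isinstance(keep_alive, int) or keep_alive < 0:
        problems.append("keep_alive must be a non-negative number of seconds")
    return problems
```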
Device Activation Configuration
Activation Version Details
{
"SYSTEM_OPTIONS": {
"NETWORK": {
"ACTIVATION_VERSION": "v2",
"AUTHORIZATION_URL": "https://xiaozhi.me/"
}
}
}
Version Differences:
- v1: Simplified activation process, no verification code required
- v2: Full activation process with verification code
Device Identity File (efuse.json)
{
"serial_number": "SN-E3E1F618-902e16dbe116",
"hmac_key": "b5bf012dd518080532f928b70ed958799f34f9224e80dd4128795a70a5baca24",
"activation_status": false,
"mac_address": "00:11:22:33:44:55",
"device_fingerprint": {
"cpu_info": "...",
"memory_info": "...",
"disk_info": "..."
}
}
Field Descriptions:
- serial_number: Device serial number
- hmac_key: Device verification key
- activation_status: Activation status
- mac_address: Device MAC address
- device_fingerprint: Device fingerprint information
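This guide does not specify how hmac_key is used during activation. As a general illustration of what a device verification key enables, the sketch below signs a server-provided challenge with HMAC-SHA256. Everything here is an assumption made for the example: the challenge-response flow, the hash algorithm, the key encoding, and the function names.

```python
import hashlib
import hmac


def sign_challenge(hmac_key: str, challenge: str) -> str:
    """Return a hex HMAC-SHA256 signature of challenge (assumed scheme)."""
    return hmac.new(
        hmac_key.encode("utf-8"), challenge.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def verify_signature(hmac_key: str, challenge: str, signature: str) -> bool:
    """Check a signature in constant time, as a server holding the same key might."""
    return hmac.compare_digest(sign_challenge(hmac_key, challenge), signature)
```

Whatever the real scheme is, the key should be treated as a secret: anyone who holds it could produce valid signatures for that device.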
Configuration Management Tips
1. Find Configuration File Location
# macOS
open ~/Library/Application\ Support/py-xiaozhi/config/
# Linux
xdg-open ~/.local/share/py-xiaozhi/config/
# Windows (enter in File Explorer address bar)
%LOCALAPPDATA%\py-xiaozhi\config
2. Configuration File Generation
# First run auto-generates configuration
python main.py
# Regenerate default configuration (delete config file in user data directory)
# macOS example (on Linux the file is ~/.local/share/py-xiaozhi/config/config.json):
rm ~/Library/Application\ Support/py-xiaozhi/config/config.json
python main.py
3. Configuration Backup and Restore
# macOS example (on Linux, set CONFIG_DIR=~/.local/share/py-xiaozhi/config):
CONFIG_DIR=~/Library/Application\ Support/py-xiaozhi/config
# Backup configuration
cp "$CONFIG_DIR/config.json" "$CONFIG_DIR/config.json.bak"
# Restore configuration
cp "$CONFIG_DIR/config.json.bak" "$CONFIG_DIR/config.json"
Configuration File Template
Complete Configuration Example
{
"SYSTEM_OPTIONS": {
"CLIENT_ID": "12345678-1234-1234-1234-123456789012",
"DEVICE_ID": "00:11:22:33:44:55",
"WINDOW_SIZE_MODE": "screen_100",
"NETWORK": {
"OTA_VERSION_URL": "https://api.tenclass.net/xiaozhi/ota/",
"WEBSOCKET_URL": "wss://api.tenclass.net/xiaozhi/v1/",
"WEBSOCKET_ACCESS_TOKEN": "your_access_token",
"MQTT_INFO": {
"endpoint": "mqtt.server.com",
"client_id": "xiaozhi_client",
"username": "your_username",
"password": "your_password",
"publish_topic": "xiaozhi/commands",
"subscribe_topic": "xiaozhi/responses"
},
"ACTIVATION_VERSION": "v2",
"AUTHORIZATION_URL": "https://xiaozhi.me/"
}
},
"WAKE_WORD_OPTIONS": {
"USE_WAKE_WORD": true,
"MODEL_PATH": "models/zh",
"NUM_THREADS": 4,
"PROVIDER": "cpu",
"MAX_ACTIVE_PATHS": 2,
"KEYWORDS_SCORE": 1.8,
"KEYWORDS_THRESHOLD": 0.2,
"NUM_TRAILING_BLANKS": 1,
"WAKE_WORD": "你好小智",
"WAKE_WORD_LANG": "zh"
},
"CAMERA": {
"camera_index": 0,
"frame_width": 640,
"frame_height": 480,
"fps": 30,
"Local_VL_url": "https://open.bigmodel.cn/api/paas/v4/",
"VLapi_key": "your_zhipu_api_key",
"models": "glm-4v-plus"
},
"SHORTCUTS": {
"ENABLED": true,
"MANUAL_PRESS": {
"modifier": "ctrl",
"key": "j",
"description": "Push-to-talk"
},
"AUTO_TOGGLE": {
"modifier": "ctrl",
"key": "k",
"description": "Auto conversation"
},
"ABORT": {
"modifier": "ctrl",
"key": "q",
"description": "Abort conversation"
},
"MODE_TOGGLE": {
"modifier": "ctrl",
"key": "m",
"description": "Switch mode"
},
"WINDOW_TOGGLE": {
"modifier": "ctrl",
"key": "w",
"description": "Show/hide window"
}
},
"AEC_OPTIONS": {
"ENABLED": false,
"BUFFER_MAX_LENGTH": 200,
"FRAME_DELAY": 3,
"FILTER_LENGTH_RATIO": 0.4,
"ENABLE_PREPROCESS": true,
"MODE": "voice_processing"
},
"AUDIO_DEVICES": {
"input_device_id": null,
"input_device_name": null,
"output_device_id": null,
"output_device_name": null,
"input_sample_rate": null,
"output_sample_rate": null,
"input_channels": 1,
"output_channels": 2,
"opus_output_sample_rate": 24000,
"frame_duration": 20
},
"LOGGING": {
"LEVEL": "INFO",
"FORMAT_TYPE": "colored",
"ENABLE_CONSOLE": true,
"ENABLE_FILE": true,
"ENABLE_ERROR_FILE": true,
"ENABLE_JSON_FILE": false,
"ENABLE_ASYNC": false,
"ENABLE_SENSITIVE_FILTER": true,
"MAX_BYTES": 10485760,
"BACKUP_COUNT": 30,
"ROTATION_WHEN": "midnight",
"THIRD_PARTY_LEVELS": {
"urllib3": "WARNING",
"websockets": "WARNING",
"asyncio": "WARNING",
"paho": "WARNING",
"PIL": "WARNING"
}
}
}