Configuration Guide

Configuration System Overview

Configuration File Location

py-xiaozhi's configuration files are stored in the user data directory (not the project root), following each platform's conventions:

| Platform | Configuration Directory |
| --- | --- |
| Windows | C:\Users\<username>\AppData\Local\py-xiaozhi\config\ |
| macOS | ~/Library/Application Support/py-xiaozhi/config/ |
| Linux | ~/.local/share/py-xiaozhi/config/ |

<user data directory>/
├── config/
│   ├── config.json      # Main configuration file (runtime config)
│   └── efuse.json       # Device identity file (auto-generated)
├── logs/                # Log files
└── cache/               # Cache files

Tip: Developers who need to customize the app name can modify APP_NAME in src/constants/system.py to change the data directory name.
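
The platform lookup in the table above can be sketched in Python as follows. This is a minimal illustration, not the project's own code: the helper name user_config_dir is hypothetical, and the Windows branch assumes the LOCALAPPDATA environment variable points at AppData\Local.

```python
import os
import sys
from pathlib import Path

APP_NAME = "py-xiaozhi"  # same directory name as in the table above


def user_config_dir(app_name: str = APP_NAME, platform: str = sys.platform) -> Path:
    """Return the per-user config directory for the given platform (hypothetical helper)."""
    home = Path.home()
    if platform.startswith("win"):
        # Assumption: LOCALAPPDATA resolves to C:\Users\<username>\AppData\Local
        base = Path(os.environ.get("LOCALAPPDATA", home / "AppData" / "Local"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        base = home / ".local" / "share"
    return base / app_name / "config"


print(user_config_dir() / "config.json")
```

Passing a different app_name mirrors the effect of changing APP_NAME: only the directory name under the platform base changes.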

Configuration Hierarchy

  1. Default Configuration Template

    • Location: DEFAULT_CONFIG in src/utils/config_manager.py
    • Purpose: Provides system default configuration values
    • Usage: Template for auto-generating config files on first run
  2. Runtime Configuration File

    • Location: <user data directory>/config/config.json
    • Purpose: Stores user-customized configuration
    • Usage: The actual configuration read at runtime
  3. Device Identity File

    • Location: <user data directory>/config/efuse.json
    • Purpose: Stores device unique identifier and activation status
    • Usage: Device activation and identity verification
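
The relationship between the default template and the runtime file can be illustrated with a short sketch. It assumes the simplest possible behavior (write the defaults on first run, otherwise read the file as-is); the real ConfigManager may also merge in missing keys. The DEFAULT_CONFIG shown here is a trimmed, hypothetical stand-in, and load_or_create is not a project function.

```python
import copy
import json
import tempfile
from pathlib import Path

# Hypothetical, trimmed-down stand-in for DEFAULT_CONFIG.
DEFAULT_CONFIG = {
    "WAKE_WORD_OPTIONS": {"USE_WAKE_WORD": True, "NUM_THREADS": 4},
    "LOGGING": {"LEVEL": "INFO"},
}


def load_or_create(path: Path, defaults: dict) -> dict:
    """Read config.json, generating it from the defaults if it does not exist yet."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(defaults, indent=2, ensure_ascii=False), encoding="utf-8")
        return copy.deepcopy(defaults)
    return json.loads(path.read_text(encoding="utf-8"))


# Demonstrate against a throwaway directory rather than the real data directory.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config" / "config.json"
    first = load_or_create(config_path, DEFAULT_CONFIG)  # file is generated from the template
    config_path.write_text(json.dumps({"LOGGING": {"LEVEL": "DEBUG"}}), encoding="utf-8")
    second = load_or_create(config_path, DEFAULT_CONFIG)  # existing file is read as-is
```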

Configuration Access Methods

The configuration system supports dot-separated path access for convenient retrieval and modification of nested configurations:

python
# Configuration access examples
from src.utils.config_manager import ConfigManager
config = ConfigManager.get_instance()

# Get network configuration
websocket_url = config.get_config("SYSTEM_OPTIONS.NETWORK.WEBSOCKET_URL")

# Get wake word configuration
wake_words = config.get_config("WAKE_WORD_OPTIONS.WAKE_WORDS")

# Update configuration
config.update_config("WAKE_WORD_OPTIONS.USE_WAKE_WORD", True)
config.update_config("CAMERA.VLapi_key", "your_api_key_here")

# Reload configuration
config.reload_config()
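
To make the dot-separated path idea concrete, here is a minimal sketch of how such lookups and updates can work on a nested dictionary. The functions get_by_path and set_by_path are illustrative names, not part of ConfigManager, and the real implementation may differ (for example, in how it persists changes).

```python
from typing import Any


def get_by_path(config: dict, path: str, default: Any = None) -> Any:
    """Walk a nested dict along a dot-separated path, returning default if a key is missing."""
    node: Any = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def set_by_path(config: dict, path: str, value: Any) -> None:
    """Set a value at a dot-separated path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


config = {"SYSTEM_OPTIONS": {"NETWORK": {"WEBSOCKET_URL": "wss://api.tenclass.net/xiaozhi/v1/"}}}
url = get_by_path(config, "SYSTEM_OPTIONS.NETWORK.WEBSOCKET_URL")
set_by_path(config, "WAKE_WORD_OPTIONS.USE_WAKE_WORD", True)
```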

System Configuration (SYSTEM_OPTIONS)

Basic System Configuration

json
{
  "SYSTEM_OPTIONS": {
    "CLIENT_ID": "auto-generated client ID",
    "DEVICE_ID": "device MAC address",
    "WINDOW_SIZE_MODE": "screen_100",
    "NETWORK": {
      "OTA_VERSION_URL": "https://api.tenclass.net/xiaozhi/ota/",
      "WEBSOCKET_URL": "wss://api.tenclass.net/xiaozhi/v1/",
      "WEBSOCKET_ACCESS_TOKEN": "access token",
      "MQTT_INFO": {
        "endpoint": "mqtt.server.com",
        "client_id": "xiaozhi_client",
        "username": "your_username",
        "password": "your_password",
        "publish_topic": "xiaozhi/commands",
        "subscribe_topic": "xiaozhi/responses"
      },
      "ACTIVATION_VERSION": "v2",
      "AUTHORIZATION_URL": "https://xiaozhi.me/"
    }
  }
}

Configuration Item Descriptions

| Configuration Item | Type | Default Value | Description |
| --- | --- | --- | --- |
| CLIENT_ID | String | Auto-generated | Unique client identifier |
| DEVICE_ID | String | MAC address | Unique device identifier |
| WINDOW_SIZE_MODE | String | "screen_100" | Window size mode |
| OTA_VERSION_URL | String | Official OTA URL | OTA configuration retrieval URL |
| WEBSOCKET_URL | String | Delivered by OTA | WebSocket server address |
| WEBSOCKET_ACCESS_TOKEN | String | Delivered by OTA | WebSocket access token |
| ACTIVATION_VERSION | String | "v2" | Activation protocol version (v1/v2) |
| AUTHORIZATION_URL | String | "https://xiaozhi.me/" | Device authorization URL |

Switching Server Configuration

Switching to a Self-Hosted Server

  • To use your own server, change only the OTA address (OTA_VERSION_URL). The client then obtains the WebSocket connection information from that OTA server automatically.
  • Set AUTHORIZATION_URL to the web page of your device-binding backend (e.g., https://xiaozhi.me/). This is the page that opens when you click the authorization link in the client.
json
{
  "SYSTEM_OPTIONS": {
    "NETWORK": {
      "OTA_VERSION_URL": "https://your-server.com/xiaozhi/ota/",
      "AUTHORIZATION_URL": "https://xiaozhi.me/"
    }
  }
}

Configuration Auto-Update Mechanism

On startup, the system automatically updates configuration through the following process:

  1. OTA Configuration Retrieval: Sends a POST request to OTA_VERSION_URL
  2. Auto Configuration Update: System automatically updates MQTT and WebSocket configuration
  3. Connection Establishment: Establishes connection using updated configuration

Related code locations:

  • OTA configuration retrieval: fetch_and_update_config() method in src/core/ota.py
  • Configuration update: update_websocket_config() and update_mqtt_config() methods in src/core/ota.py

Disabling Configuration Auto-Update

If you don't need automatic configuration updates, comment out the relevant code in these locations:

1. Disable OTA Configuration Retrieval

Comment out the third phase in src/core/system_initializer.py:

python
# async def stage_3_ota_config(self):
#     """
#     Phase 3: OTA configuration retrieval.
#     """
#     # Comment out the entire method body

2. Disable WebSocket Configuration Update

Comment out the update method in src/core/ota.py:

python
async def update_websocket_config(self, response_data):
    """
    Update WebSocket configuration.
    """
    # Comment out configuration update logic
    return None

3. Manually Configure WebSocket Connection

Directly configure fixed connection info in the configuration file:

json
{
  "SYSTEM_OPTIONS": {
    "NETWORK": {
      "WEBSOCKET_URL": "wss://your-server.com/xiaozhi/v1/",
      "WEBSOCKET_ACCESS_TOKEN": "your_fixed_token"
    }
  }
}

Wake Word Configuration (WAKE_WORD_OPTIONS)

Sherpa-ONNX Voice Wake-Up Settings

json
{
  "WAKE_WORD_OPTIONS": {
    "USE_WAKE_WORD": true,
    "MODEL_PATH": "models/zh",
    "NUM_THREADS": 4,
    "PROVIDER": "cpu",
    "MAX_ACTIVE_PATHS": 2,
    "KEYWORDS_SCORE": 1.8,
    "KEYWORDS_THRESHOLD": 0.2,
    "NUM_TRAILING_BLANKS": 1,
    "WAKE_WORD": "你好小智",
    "WAKE_WORD_LANG": "zh"
  }
}

Configuration Item Descriptions

| Configuration Item | Type | Default Value | Description |
| --- | --- | --- | --- |
| USE_WAKE_WORD | Boolean | true | Enable voice wake-up |
| MODEL_PATH | String | "models" | Sherpa-ONNX model file directory |
| NUM_THREADS | Integer | 4 | Model inference threads (affects response speed) |
| PROVIDER | String | "cpu" | Inference engine (cpu/cuda/coreml) |
| MAX_ACTIVE_PATHS | Integer | 2 | Search path count (affects accuracy and speed) |
| KEYWORDS_SCORE | Float | 1.8 | Keyword boost score (affects detection sensitivity) |
| KEYWORDS_THRESHOLD | Float | 0.2 | Detection threshold (lower = more sensitive) |
| NUM_TRAILING_BLANKS | Integer | 1 | Number of trailing blank tokens |
| WAKE_WORD | String | "你好小智" | Wake word text |
| WAKE_WORD_LANG | String | "zh" | Wake word language (zh/en) |

Model File Structure

bash
models/
├── encoder.onnx      # Encoder model (high-precision version included)
├── decoder.onnx      # Decoder model
├── joiner.onnx       # Joiner model
├── tokens.txt        # Pinyin token mapping table
└── keywords.txt      # Keyword configuration file

Custom Wake Words

Edit models/keywords.txt to add wake words:

# Format: pinyin breakdown @Chinese original
n ǐ h ǎo x iǎo zh ì @你好小智
j iā w éi s ī @贾维斯
x iǎo zh ù sh ǒu @小助手
k āi sh ǐ g ōng z uò @开始工作
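
The line format above (space-separated tokens, then @ and the original text) can be checked with a few lines of Python before restarting the app. This is a sketch only: parse_keyword_line is a hypothetical helper, and treating lines that start with # as comments is an assumption based on the excerpt above.

```python
def parse_keyword_line(line: str):
    """Split 'tokens @text' into (token list, original text); return None for comment or blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    tokens_part, sep, text = line.rpartition("@")
    if not sep or not text.strip():
        raise ValueError(f"missing '@<wake word>' suffix: {line!r}")
    return tokens_part.split(), text.strip()


sample = """\
# Format: pinyin breakdown @Chinese original
n ǐ h ǎo x iǎo zh ì @你好小智
j iā w éi s ī @贾维斯
"""
entries = [entry for entry in map(parse_keyword_line, sample.splitlines()) if entry]
for tokens, text in entries:
    print(text, tokens)
```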

Performance Tuning

Speed-Priority Configuration

json
{
  "WAKE_WORD_OPTIONS": {
    "NUM_THREADS": 6,
    "MAX_ACTIVE_PATHS": 1,
    "KEYWORDS_THRESHOLD": 0.15,
    "KEYWORDS_SCORE": 1.5
  }
}

Accuracy-Priority Configuration

json
{
  "WAKE_WORD_OPTIONS": {
    "NUM_THREADS": 4,
    "MAX_ACTIVE_PATHS": 3,
    "KEYWORDS_THRESHOLD": 0.25,
    "KEYWORDS_SCORE": 2.2
  }
}

Camera Configuration (CAMERA)

Visual Recognition Settings

json
{
  "CAMERA": {
    "camera_index": 0,
    "frame_width": 640,
    "frame_height": 480,
    "fps": 30,
    "Local_VL_url": "https://open.bigmodel.cn/api/paas/v4/",
    "VLapi_key": "your_zhipu_api_key",
    "models": "glm-4v-plus"
  }
}

Configuration Item Descriptions

| Configuration Item | Type | Default Value | Description |
| --- | --- | --- | --- |
| camera_index | Integer | 0 | Camera device index |
| frame_width | Integer | 640 | Frame width |
| frame_height | Integer | 480 | Frame height |
| fps | Integer | 30 | Frame rate |
| Local_VL_url | String | Zhipu API URL | Visual model API URL |
| VLapi_key | String | "" | Zhipu API key |
| models | String | "glm-4v-plus" | Visual model name |

Camera Testing

bash
# Test camera functionality
python scripts/camera_scanner.py

# Test visual recognition in the running program:
# ask aloud, for example, "What's in front of the camera?"

Shortcut Configuration (SHORTCUTS)

Global Shortcut Settings

json
{
  "SHORTCUTS": {
    "ENABLED": true,
    "MANUAL_PRESS": {
      "modifier": "ctrl",
      "key": "j",
      "description": "Push-to-talk"
    },
    "AUTO_TOGGLE": {
      "modifier": "ctrl",
      "key": "k",
      "description": "Auto conversation"
    },
    "ABORT": {
      "modifier": "ctrl",
      "key": "q",
      "description": "Abort conversation"
    },
    "MODE_TOGGLE": {
      "modifier": "ctrl",
      "key": "m",
      "description": "Switch mode"
    },
    "WINDOW_TOGGLE": {
      "modifier": "ctrl",
      "key": "w",
      "description": "Show/hide window"
    }
  }
}

Shortcut Descriptions

| Shortcut | Function | Description |
| --- | --- | --- |
| Ctrl+J | Push-to-Talk | Records while held, sends on release |
| Ctrl+K | Auto Conversation | Toggle auto conversation mode on/off |
| Ctrl+Q | Abort Conversation | Interrupt current conversation |
| Ctrl+M | Switch Mode | Toggle between conversation modes |
| Ctrl+W | Show/Hide Window | Show or hide the main window |
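
When customizing shortcuts, it is easy to assign the same key combination twice. The sketch below derives the labels used in the table from a SHORTCUTS section and reports duplicates. Both helper names are hypothetical, and the app itself may or may not perform such a check.

```python
from collections import defaultdict

shortcuts = {
    "ENABLED": True,
    "MANUAL_PRESS": {"modifier": "ctrl", "key": "j", "description": "Push-to-talk"},
    "AUTO_TOGGLE": {"modifier": "ctrl", "key": "k", "description": "Auto conversation"},
    "ABORT": {"modifier": "ctrl", "key": "q", "description": "Abort conversation"},
    "MODE_TOGGLE": {"modifier": "ctrl", "key": "m", "description": "Switch mode"},
    "WINDOW_TOGGLE": {"modifier": "ctrl", "key": "w", "description": "Show/hide window"},
}


def binding_labels(section: dict) -> dict:
    """Map each shortcut name to a label such as 'Ctrl+J', skipping non-dict entries like ENABLED."""
    return {
        name: f"{entry['modifier'].capitalize()}+{entry['key'].upper()}"
        for name, entry in section.items()
        if isinstance(entry, dict)
    }


def find_conflicts(labels: dict) -> dict:
    """Return the labels that are bound to more than one shortcut name."""
    by_label = defaultdict(list)
    for name, label in labels.items():
        by_label[label].append(name)
    return {label: names for label, names in by_label.items() if len(names) > 1}


labels = binding_labels(shortcuts)
conflicts = find_conflicts(labels)
```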

Acoustic Echo Cancellation Configuration (AEC_OPTIONS)

AEC Audio Processing Settings

json
{
  "AEC_OPTIONS": {
    "ENABLED": false,
    "BUFFER_MAX_LENGTH": 200,
    "FRAME_DELAY": 3,
    "FILTER_LENGTH_RATIO": 0.4,
    "ENABLE_PREPROCESS": true,
    "MODE": "voice_processing"
  }
}

Configuration Item Descriptions

| Configuration Item | Type | Default Value | Description |
| --- | --- | --- | --- |
| ENABLED | Boolean | false | Enable AEC echo cancellation |
| BUFFER_MAX_LENGTH | Integer | 200 | Reference signal buffer size (in frames) |
| FRAME_DELAY | Integer | 3 | Delay compensation frames (currently unused) |
| FILTER_LENGTH_RATIO | Float | 0.4 | Filter length ratio (seconds), affects echo cancellation strength |
| ENABLE_PREPROCESS | Boolean | true | Enable noise suppression preprocessing |
| MODE | String | "voice_processing" | AEC processing mode |

AEC Feature Description

Echo Cancellation

  • Eliminates echo from speaker playback picked up by the microphone
  • Supports real-time bidirectional conversation without echo interference

Noise Suppression

  • Suppresses background noise and environmental interference
  • Improves speech recognition accuracy

Conversation Mode Impact

When ENABLED is true, the client uses real-time conversation mode (ListeningMode.REALTIME). When it is false, the client uses turn-based conversation mode (ListeningMode.AUTO_STOP).

json
{
  "AEC_OPTIONS": {
    "ENABLED": true
  }
}

Environment Optimization Suggestions

Small Room / Office Environment

json
{
  "AEC_OPTIONS": {
    "FILTER_LENGTH_RATIO": 0.2,
    "BUFFER_MAX_LENGTH": 150
  }
}

Large Room / Meeting Room Environment

json
{
  "AEC_OPTIONS": {
    "FILTER_LENGTH_RATIO": 0.6,
    "BUFFER_MAX_LENGTH": 300
  }
}

Noisy Environment

json
{
  "AEC_OPTIONS": {
    "FILTER_LENGTH_RATIO": 0.8,
    "ENABLE_PREPROCESS": true,
    "BUFFER_MAX_LENGTH": 400
  }
}

AEC Feature Testing

bash
# Test echo cancellation effect
# 1. Enable AEC, speak while playing music
python main.py  # Should have no echo when AEC is enabled

# 2. Disable AEC for comparison
# Set "ENABLED": false in config file
python main.py  # May have echo when AEC is disabled

Performance Parameter Details

Filter Length Calculation

Actual filter length = sample rate (16000Hz) × FILTER_LENGTH_RATIO
Example: FILTER_LENGTH_RATIO = 0.4 → filter length = 6400 samples (0.4 seconds)
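
The same arithmetic, applied to the ratios used in the environment presets above, can be reproduced with a short sketch. The function name is illustrative, and rounding to the nearest sample is an assumption; the project may truncate instead.

```python
SAMPLE_RATE_HZ = 16000  # sample rate stated in the formula above


def filter_length_samples(ratio: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Filter length in samples = sample rate x FILTER_LENGTH_RATIO (rounded to the nearest sample)."""
    return int(round(sample_rate * ratio))


for ratio in (0.2, 0.4, 0.6, 0.8):
    print(f"FILTER_LENGTH_RATIO={ratio}: {filter_length_samples(ratio)} samples")
```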

Parameter Effects

  • Filter length ↑: Echo cancellation effectiveness ↑, CPU usage ↑
  • Buffer size ↑: Stability ↑, memory usage ↑
  • Preprocessing enabled: Noise suppression ↑, slight latency ↑

Audio Device Configuration (AUDIO_DEVICES)

Audio Input/Output Settings

json
{
  "AUDIO_DEVICES": {
    "input_device_id": null,
    "input_device_name": null,
    "output_device_id": null,
    "output_device_name": null,
    "input_sample_rate": null,
    "output_sample_rate": null,
    "input_channels": 1,
    "output_channels": 2,
    "opus_output_sample_rate": 24000,
    "frame_duration": 20
  }
}

Configuration Item Descriptions

| Configuration Item | Type | Default Value | Description |
| --- | --- | --- | --- |
| input_device_id | Integer | null | Input device ID (auto-detected) |
| input_device_name | String | null | Input device name |
| output_device_id | Integer | null | Output device ID (auto-detected) |
| output_device_name | String | null | Output device name |
| input_sample_rate | Integer | null | Input sample rate (auto-detected) |
| output_sample_rate | Integer | null | Output sample rate (auto-detected) |
| input_channels | Integer | 1 | Input channel count |
| output_channels | Integer | 2 | Output channel count |
| opus_output_sample_rate | Integer | 24000 | Opus decode sample rate: 24000 (official) or 16000 (third-party) |
| frame_duration | Integer | 20 | Audio frame duration (ms): 20 (low latency) / 40 (balanced) / 60 (low CPU) |

Sample Rate and Frame Duration Details

Opus Decode Sample Rate

  • 24000: Used by the official server, better audio quality
  • 16000: Used by third-party servers, better compatibility

Frame Duration Selection

Frame duration primarily affects input encoding and device callback frequency. The audio frame duration returned by the server is auto-detected by the client (via Opus TOC byte parsing) and does not need manual matching.

  • 20ms: Low latency, suitable for real-time conversation (x86 default)
  • 40ms: Balanced mode, a compromise between latency and CPU usage
  • 60ms: Low CPU usage, suitable for Raspberry Pi and other performance-constrained devices
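
For readers curious how frame duration can be read from an incoming packet, the sketch below decodes the Opus TOC byte as laid out in RFC 6716, section 3.1, and also shows how frame_duration translates into samples per input frame. The function names are illustrative; this is not the client's actual detection code.

```python
def opus_frame_duration_ms(packet: bytes) -> float:
    """Duration of a single frame, taken from the top five bits (config) of the TOC byte."""
    if not packet:
        raise ValueError("empty Opus packet")
    config = packet[0] >> 3
    if config < 12:  # SILK-only configs: 10, 20, 40 or 60 ms
        return (10.0, 20.0, 40.0, 60.0)[config & 0x3]
    if config < 16:  # Hybrid configs: 10 or 20 ms
        return (10.0, 20.0)[config & 0x1]
    return (2.5, 5.0, 10.0, 20.0)[config & 0x3]  # CELT-only configs


def opus_frame_count(packet: bytes) -> int:
    """Number of frames in the packet, from the low two bits (code) of the TOC byte."""
    code = packet[0] & 0x3
    if code == 0:
        return 1
    if code in (1, 2):
        return 2
    if len(packet) < 2:
        raise ValueError("code 3 packet is missing its frame count byte")
    return packet[1] & 0x3F  # low six bits of the second byte


def samples_per_frame(sample_rate: int, frame_duration_ms: int) -> int:
    """Samples in one input frame for a given frame_duration setting."""
    return sample_rate * frame_duration_ms // 1000


toc_only = bytes([9 << 3])  # config 9 (SILK wideband, 20 ms), mono, one frame
packet_ms = opus_frame_duration_ms(toc_only) * opus_frame_count(toc_only)
```

With a 16000 Hz input, the three frame_duration options correspond to 320, 640 and 960 samples per frame.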

Audio Device Auto-Detection

The system automatically detects available audio devices on startup. To manually specify devices, configure the device ID and name:

json
{
  "AUDIO_DEVICES": {
    "input_device_id": 2,
    "input_device_name": "MacBook Air Microphone",
    "output_device_id": 1,
    "output_device_name": "MacBook Air Speakers"
  }
}

Logging Configuration (LOGGING)

Logging System Settings

json
{
  "LOGGING": {
    "LEVEL": "INFO",
    "FORMAT_TYPE": "colored",
    "ENABLE_CONSOLE": true,
    "ENABLE_FILE": true,
    "ENABLE_ERROR_FILE": true,
    "ENABLE_JSON_FILE": false,
    "ENABLE_ASYNC": false,
    "ENABLE_SENSITIVE_FILTER": true,
    "MAX_BYTES": 10485760,
    "BACKUP_COUNT": 30,
    "ROTATION_WHEN": "midnight",
    "THIRD_PARTY_LEVELS": {
      "urllib3": "WARNING",
      "websockets": "WARNING",
      "asyncio": "WARNING",
      "paho": "WARNING",
      "PIL": "WARNING"
    }
  }
}

Configuration Item Descriptions

| Configuration Item | Type | Default Value | Description |
| --- | --- | --- | --- |
| LEVEL | String | "INFO" | Log level: DEBUG/INFO/WARNING/ERROR/CRITICAL |
| FORMAT_TYPE | String | "colored" | Output format: colored/json/simple |
| ENABLE_CONSOLE | Boolean | true | Enable console output |
| ENABLE_FILE | Boolean | true | Enable file logging |
| ENABLE_ERROR_FILE | Boolean | true | Enable separate error log file |
| ENABLE_JSON_FILE | Boolean | false | Enable JSON format log file |
| ENABLE_ASYNC | Boolean | false | Enable async logging |
| ENABLE_SENSITIVE_FILTER | Boolean | true | Enable sensitive information filtering |
| MAX_BYTES | Integer | 10485760 | Maximum bytes per log file (10MB) |
| BACKUP_COUNT | Integer | 30 | Number of log backups to retain |
| ROTATION_WHEN | String | "midnight" | Log rotation timing: midnight/H/D |
| THIRD_PARTY_LEVELS | Object | See example above | Third-party library log level configuration |

Log Level Descriptions

| Level | Description |
| --- | --- |
| DEBUG | Detailed debug info, for development |
| INFO | General runtime info (default) |
| WARNING | Warning messages |
| ERROR | Error messages |
| CRITICAL | Critical errors |
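
As a rough illustration of how these settings map onto Python's standard logging module, the sketch below applies a subset of the LOGGING section. It is not the project's logging setup: configure_logging and the logger name are hypothetical, only size-based rotation (MAX_BYTES, BACKUP_COUNT) is shown, and FORMAT_TYPE, ROTATION_WHEN and the other switches are ignored.

```python
import logging
from logging.handlers import RotatingFileHandler
from typing import Optional


def configure_logging(cfg: dict, log_file: Optional[str] = None) -> logging.Logger:
    """Apply LEVEL, console/file switches and THIRD_PARTY_LEVELS to stdlib logging."""
    logger = logging.getLogger("py-xiaozhi-demo")  # hypothetical logger name
    logger.setLevel(cfg.get("LEVEL", "INFO"))  # setLevel accepts level names as strings
    logger.handlers.clear()
    if cfg.get("ENABLE_CONSOLE", True):
        logger.addHandler(logging.StreamHandler())
    if cfg.get("ENABLE_FILE", True) and log_file:
        logger.addHandler(
            RotatingFileHandler(
                log_file,
                maxBytes=cfg.get("MAX_BYTES", 10485760),
                backupCount=cfg.get("BACKUP_COUNT", 30),
                encoding="utf-8",
            )
        )
    # Quiet down noisy third-party libraries independently of the app level.
    for name, level in cfg.get("THIRD_PARTY_LEVELS", {}).items():
        logging.getLogger(name).setLevel(level)
    return logger


demo_logger = configure_logging(
    {"LEVEL": "DEBUG", "ENABLE_CONSOLE": True, "THIRD_PARTY_LEVELS": {"urllib3": "WARNING"}}
)
demo_logger.debug("debug output is visible at LEVEL=DEBUG")
```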

Development Debug Configuration

json
{
  "LOGGING": {
    "LEVEL": "DEBUG",
    "FORMAT_TYPE": "colored",
    "ENABLE_CONSOLE": true,
    "ENABLE_FILE": true
  }
}

Production Environment Configuration

json
{
  "LOGGING": {
    "LEVEL": "WARNING",
    "FORMAT_TYPE": "json",
    "ENABLE_CONSOLE": false,
    "ENABLE_FILE": true,
    "ENABLE_JSON_FILE": true,
    "ENABLE_ASYNC": true
  }
}

Protocol Configuration Details

WebSocket Protocol Configuration

WebSocket connection information is typically delivered automatically by the OTA server and does not require manual configuration. When set by hand, the fields look like this:

json
{
  "SYSTEM_OPTIONS": {
    "NETWORK": {
      "WEBSOCKET_URL": "wss://your-server.com/xiaozhi/v1/",
      "WEBSOCKET_ACCESS_TOKEN": "your_access_token"
    }
  }
}

Configuration Notes:

  • URL must start with ws:// or wss://
  • Supports IP addresses or domain names
  • Default port is 8000, adjustable based on server configuration
  • Access token is used for authentication
  • Usually auto-configured by the OTA server, no manual setup required
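
If you do set WEBSOCKET_URL by hand, the first two notes can be checked up front with the standard library. This is a small sketch with hypothetical helper names; it validates only the scheme and host, not reachability or the token.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def websocket_endpoint(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (scheme, host, port) for a ws:// or wss:// URL; port is None if not given."""
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss"):
        raise ValueError(f"WEBSOCKET_URL must start with ws:// or wss://, got {url!r}")
    if not parsed.hostname:
        raise ValueError(f"WEBSOCKET_URL has no host: {url!r}")
    return parsed.scheme, parsed.hostname, parsed.port


def is_websocket_url(url: str) -> bool:
    """Boolean convenience wrapper around websocket_endpoint."""
    try:
        websocket_endpoint(url)
    except ValueError:
        return False
    return True


print(websocket_endpoint("ws://192.168.1.10:8000/xiaozhi/v1/"))
print(is_websocket_url("https://your-server.com/xiaozhi/v1/"))
```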

MQTT Protocol Configuration

json
{
  "SYSTEM_OPTIONS": {
    "NETWORK": {
      "MQTT_INFO": {
        "endpoint": "mqtt.server.com",
        "port": 1883,
        "client_id": "xiaozhi_client_001",
        "username": "your_username",
        "password": "your_password",
        "publish_topic": "xiaozhi/commands",
        "subscribe_topic": "xiaozhi/responses",
        "qos": 1,
        "keep_alive": 60
      }
    }
  }
}

Configuration Notes:

  • endpoint: MQTT server address
  • port: Typically 1883 (unencrypted) or 8883 (TLS encrypted)
  • client_id: Unique client identifier
  • qos: Quality of Service level (0-2)
  • keep_alive: Heartbeat interval (seconds)
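
The notes above translate naturally into a small validated structure. The sketch below is an illustration only: the MqttInfo class is hypothetical, and the defaults for port, qos and keep_alive are simply the values from the example, not confirmed project defaults.

```python
from dataclasses import dataclass


@dataclass
class MqttInfo:
    """Typed view of the MQTT_INFO block with basic range checks (hypothetical class)."""

    endpoint: str
    client_id: str
    username: str
    password: str
    publish_topic: str
    subscribe_topic: str
    port: int = 1883      # 1883 unencrypted, 8883 with TLS
    qos: int = 1          # Quality of Service level, 0 to 2
    keep_alive: int = 60  # heartbeat interval in seconds

    def __post_init__(self) -> None:
        if not self.endpoint:
            raise ValueError("endpoint must not be empty")
        if self.qos not in (0, 1, 2):
            raise ValueError(f"qos must be 0, 1 or 2, got {self.qos}")
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
        if self.keep_alive < 0:
            raise ValueError("keep_alive must not be negative")


mqtt_info = MqttInfo(
    endpoint="mqtt.server.com",
    client_id="xiaozhi_client_001",
    username="your_username",
    password="your_password",
    publish_topic="xiaozhi/commands",
    subscribe_topic="xiaozhi/responses",
)
```

Because the field names match the JSON keys, a parsed MQTT_INFO object can be passed directly as MqttInfo(**mqtt_info_dict).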

Device Activation Configuration

Activation Version Details

json
{
  "SYSTEM_OPTIONS": {
    "NETWORK": {
      "ACTIVATION_VERSION": "v2",
      "AUTHORIZATION_URL": "https://xiaozhi.me/"
    }
  }
}

Version Differences:

  • v1: Simplified activation process, no verification code required
  • v2: Full activation process with verification code

Device Identity File (efuse.json)

json
{
  "serial_number": "SN-E3E1F618-902e16dbe116",
  "hmac_key": "b5bf012dd518080532f928b70ed958799f34f9224e80dd4128795a70a5baca24",
  "activation_status": false,
  "mac_address": "00:11:22:33:44:55",
  "device_fingerprint": {
    "cpu_info": "...",
    "memory_info": "...",
    "disk_info": "..."
  }
}

Field Descriptions:

  • serial_number: Device serial number
  • hmac_key: Device verification key
  • activation_status: Activation status
  • mac_address: Device MAC address
  • device_fingerprint: Device fingerprint information
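
To show what a verification key of this kind is typically used for, here is a generic HMAC-SHA256 signing sketch using Python's standard library. Everything specific in it is an assumption: the challenge string, the use of SHA-256, and feeding the hex string to HMAC as UTF-8 bytes are illustrative choices, not a description of the actual activation protocol.

```python
import hashlib
import hmac


def sign_challenge(hmac_key: str, challenge: str) -> str:
    """Return the hex HMAC-SHA256 of a challenge string (generic illustration)."""
    return hmac.new(
        hmac_key.encode("utf-8"),   # assumption: key string used as UTF-8 bytes
        challenge.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


# Example key copied from the efuse.json sample above; the challenge is made up.
key = "b5bf012dd518080532f928b70ed958799f34f9224e80dd4128795a70a5baca24"
signature = sign_challenge(key, "example-challenge")

# A verifier holding the same key recomputes the signature and compares in constant time.
is_valid = hmac.compare_digest(signature, sign_challenge(key, "example-challenge"))
```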

Configuration Management Tips

1. Find Configuration File Location

bash
# macOS
open ~/Library/Application\ Support/py-xiaozhi/config/

# Linux
xdg-open ~/.local/share/py-xiaozhi/config/

# Windows (enter in File Explorer address bar)
%LOCALAPPDATA%\py-xiaozhi\config

2. Configuration File Generation

bash
# First run auto-generates configuration
python main.py

# Regenerate default configuration (delete config file in user data directory)
# macOS example (on Linux the file is ~/.local/share/py-xiaozhi/config/config.json):
rm ~/Library/Application\ Support/py-xiaozhi/config/config.json
python main.py

3. Configuration Backup and Restore

bash
# macOS example (on Linux use CONFIG_DIR=~/.local/share/py-xiaozhi/config):
CONFIG_DIR=~/Library/Application\ Support/py-xiaozhi/config

# Backup configuration
cp "$CONFIG_DIR/config.json" "$CONFIG_DIR/config.json.bak"

# Restore configuration
cp "$CONFIG_DIR/config.json.bak" "$CONFIG_DIR/config.json"
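
The same backup and restore steps can be written in Python, which also covers Windows where cp is not available. This is a sketch with hypothetical helper names; it runs against a temporary directory here, so point config_dir at your real configuration directory when using it.

```python
import shutil
import tempfile
from pathlib import Path


def backup_config(config_dir: Path) -> Path:
    """Copy config.json to config.json.bak and return the backup path."""
    backup = config_dir / "config.json.bak"
    shutil.copy2(config_dir / "config.json", backup)
    return backup


def restore_config(config_dir: Path) -> Path:
    """Overwrite config.json with the contents of config.json.bak."""
    target = config_dir / "config.json"
    shutil.copy2(config_dir / "config.json.bak", target)
    return target


# Exercise both helpers in a throwaway directory instead of the real one.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "config.json").write_text('{"LOGGING": {"LEVEL": "INFO"}}', encoding="utf-8")
    backup_config(demo_dir)
    (demo_dir / "config.json").write_text("{}", encoding="utf-8")  # simulate a bad edit
    restore_config(demo_dir)
    restored = (demo_dir / "config.json").read_text(encoding="utf-8")
```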

Configuration File Template

Complete Configuration Example

json
{
  "SYSTEM_OPTIONS": {
    "CLIENT_ID": "12345678-1234-1234-1234-123456789012",
    "DEVICE_ID": "00:11:22:33:44:55",
    "WINDOW_SIZE_MODE": "screen_100",
    "NETWORK": {
      "OTA_VERSION_URL": "https://api.tenclass.net/xiaozhi/ota/",
      "WEBSOCKET_URL": "wss://api.tenclass.net/xiaozhi/v1/",
      "WEBSOCKET_ACCESS_TOKEN": "your_access_token",
      "MQTT_INFO": {
        "endpoint": "mqtt.server.com",
        "client_id": "xiaozhi_client",
        "username": "your_username",
        "password": "your_password",
        "publish_topic": "xiaozhi/commands",
        "subscribe_topic": "xiaozhi/responses"
      },
      "ACTIVATION_VERSION": "v2",
      "AUTHORIZATION_URL": "https://xiaozhi.me/"
    }
  },
  "WAKE_WORD_OPTIONS": {
    "USE_WAKE_WORD": true,
    "MODEL_PATH": "models/zh",
    "NUM_THREADS": 4,
    "PROVIDER": "cpu",
    "MAX_ACTIVE_PATHS": 2,
    "KEYWORDS_SCORE": 1.8,
    "KEYWORDS_THRESHOLD": 0.2,
    "NUM_TRAILING_BLANKS": 1,
    "WAKE_WORD": "你好小智",
    "WAKE_WORD_LANG": "zh"
  },
  "CAMERA": {
    "camera_index": 0,
    "frame_width": 640,
    "frame_height": 480,
    "fps": 30,
    "Local_VL_url": "https://open.bigmodel.cn/api/paas/v4/",
    "VLapi_key": "your_zhipu_api_key",
    "models": "glm-4v-plus"
  },
  "SHORTCUTS": {
    "ENABLED": true,
    "MANUAL_PRESS": {
      "modifier": "ctrl",
      "key": "j",
      "description": "Push-to-talk"
    },
    "AUTO_TOGGLE": {
      "modifier": "ctrl",
      "key": "k",
      "description": "Auto conversation"
    },
    "ABORT": {
      "modifier": "ctrl",
      "key": "q",
      "description": "Abort conversation"
    },
    "MODE_TOGGLE": {
      "modifier": "ctrl",
      "key": "m",
      "description": "Switch mode"
    },
    "WINDOW_TOGGLE": {
      "modifier": "ctrl",
      "key": "w",
      "description": "Show/hide window"
    }
  },
  "AEC_OPTIONS": {
    "ENABLED": false,
    "BUFFER_MAX_LENGTH": 200,
    "FRAME_DELAY": 3,
    "FILTER_LENGTH_RATIO": 0.4,
    "ENABLE_PREPROCESS": true,
    "MODE": "voice_processing"
  },
  "AUDIO_DEVICES": {
    "input_device_id": null,
    "input_device_name": null,
    "output_device_id": null,
    "output_device_name": null,
    "input_sample_rate": null,
    "output_sample_rate": null,
    "input_channels": 1,
    "output_channels": 2,
    "opus_output_sample_rate": 24000,
    "frame_duration": 20
  },
  "LOGGING": {
    "LEVEL": "INFO",
    "FORMAT_TYPE": "colored",
    "ENABLE_CONSOLE": true,
    "ENABLE_FILE": true,
    "ENABLE_ERROR_FILE": true,
    "ENABLE_JSON_FILE": false,
    "ENABLE_ASYNC": false,
    "ENABLE_SENSITIVE_FILTER": true,
    "MAX_BYTES": 10485760,
    "BACKUP_COUNT": 30,
    "ROTATION_WHEN": "midnight",
    "THIRD_PARTY_LEVELS": {
      "urllib3": "WARNING",
      "websockets": "WARNING",
      "asyncio": "WARNING",
      "paho": "WARNING",
      "PIL": "WARNING"
    }
  }
}