py-xiaozhi
Project Introduction
py-xiaozhi is a Python implementation of the Xiaozhi voice client, designed for learning to code and for trying AI voice interaction without dedicated hardware. This repository is a port of xiaozhi-esp32.
Features
Core AI Capabilities
- AI Voice Interaction: Supports voice input and recognition, enabling intelligent human-computer interaction with natural conversation flow
- Visual Multimodal: Supports image recognition and processing, providing multimodal interaction capabilities and image content understanding
- Intelligent Wake-up: Supports multiple configurable wake words for hands-free activation
- Continuous Dialogue Mode: Supports continuous multi-turn conversation for smoother interaction
MCP Tools Ecosystem
- System Control Tools: System status monitoring, application management, volume control, device management
- Calendar Management Tools: Full-featured calendar with event creation, query, update, and deletion, plus intelligent categorization and reminders
- Timer Tools: Countdown timer functionality with delayed MCP tool execution and parallel task management
- Music Player Tools: Online music search and playback with playback controls, lyrics display, and local cache management
- 12306 Query Tools: Queries the 12306 railway ticketing service for train tickets, transfers, and route information
- Search Tools: Web search and content retrieval with Bing search integration and intelligent content parsing
- Recipe Tools: Rich recipe database with search, category browsing, and intelligent recommendations
- Map Tools: Amap services with geocoding, route planning, nearby search, and weather queries
- Bazi Fortune Tools: Traditional Chinese fortune-telling with Bazi calculation, marriage analysis, and lunar calendar queries
- Camera Tools: Image capture and AI analysis with photo recognition and intelligent Q&A
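The Timer Tools entry above describes delayed tool execution with several countdowns running in parallel. A minimal asyncio sketch of that idea follows. `run_after` and the two stand-in actions are hypothetical names chosen for illustration; they are not the project's actual tool API.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_after(delay_s: float, action: Callable[[], Awaitable[str]]) -> str:
    """Wait delay_s seconds, then run the deferred action (hypothetical helper)."""
    await asyncio.sleep(delay_s)
    return await action()


async def demo() -> List[str]:
    async def set_volume() -> str:  # stand-in for a real tool call
        return "volume set"

    async def play_music() -> str:  # stand-in for a real tool call
        return "music started"

    # Both countdowns run concurrently; gather returns results in argument order.
    return await asyncio.gather(
        run_after(0.02, set_volume),
        run_after(0.01, play_music),
    )


results = asyncio.run(demo())
```

Because the timers are plain coroutines, any number of them can wait at the same time without blocking the event loop.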
IoT Device Integration
- Device Management Architecture: Unified device management based on Thing pattern with asynchronous property and method calls
- Smart Home Control: Supports control of lighting, volume, temperature sensors, and other devices
- State Synchronization: Real-time status monitoring with incremental updates and concurrent state retrieval
- Extensible Design: Modular device drivers, easy to add new device types
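To make the Thing pattern above more concrete, here is a rough sketch of a device base class with asynchronous property reads and method calls. All names apart from `Thing` itself (`add_property`, `add_method`, `get_state`, `invoke`, `Lamp`) are assumptions made for this example; the real base class in src/iot/thing.py may look quite different.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class Thing:
    """Hypothetical device base class: named properties plus async methods."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._properties: Dict[str, Callable[[], Awaitable[Any]]] = {}
        self._methods: Dict[str, Callable[..., Awaitable[Any]]] = {}

    def add_property(self, key: str, getter: Callable[[], Awaitable[Any]]) -> None:
        self._properties[key] = getter

    def add_method(self, key: str, handler: Callable[..., Awaitable[Any]]) -> None:
        self._methods[key] = handler

    async def get_state(self) -> Dict[str, Any]:
        # Read every property concurrently and pair the values with their keys.
        keys = list(self._properties)
        values = await asyncio.gather(*(self._properties[k]() for k in keys))
        return dict(zip(keys, values))

    async def invoke(self, key: str, **params: Any) -> Any:
        return await self._methods[key](**params)


class Lamp(Thing):
    """Example device with one property and one method."""

    def __init__(self) -> None:
        super().__init__("Lamp")
        self.power = False
        self.add_property("power", self._get_power)
        self.add_method("TurnOn", self._turn_on)

    async def _get_power(self) -> bool:
        return self.power

    async def _turn_on(self) -> str:
        self.power = True
        return "ok"


async def demo() -> Dict[str, Any]:
    lamp = Lamp()
    await lamp.invoke("TurnOn")
    return await lamp.get_state()


lamp_state = asyncio.run(demo())
```

A new device type then only needs to subclass the base class and register its own properties and methods.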
Advanced Audio Processing
- Multi-level Audio Processing: Supports Opus codec and real-time resampling
- Voice Activity Detection: VAD detector for intelligent interruption with real-time voice activity monitoring
- Wake Word Detection: Sherpa-ONNX-based offline speech recognition with multiple wake words and pinyin matching
- Audio Stream Management: Independent input/output streams with stream rebuild and error recovery
- Audio Echo Cancellation: Integrated WebRTC audio processing module providing high-quality echo cancellation
- System Audio Recording: Supports system audio recording with audio loopback processing
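As a rough illustration of voice activity detection, the sketch below flags frames by RMS energy. This is a deliberately simple stand-in technique, not the detector the project ships; the 20 ms frame length and the threshold of 500 are assumptions for the example, while the 16 kHz rate matches the sampling rate listed in the requirements.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Hz
FRAME_MS = 20         # assumed frame length for this sketch


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples: Sequence[int], threshold: float = 500.0) -> List[bool]:
    """Flag each full frame as active (True) or silent (False)."""
    frame_len = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per frame
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        flags.append(frame_rms(samples[start:start + frame_len]) > threshold)
    return flags


# One silent frame followed by one frame of a 440 Hz tone.
silence = [0] * 320
tone = [int(3000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)) for n in range(320)]
flags = detect_speech(silence + tone)
```

A production detector would typically add smoothing over several frames so that short pauses do not end an utterance.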
User Interface
- Graphical Interface: Modern PyQt5-based GUI with Xiaozhi expressions and text display for enhanced visual experience
- Command Line Mode: CLI support suitable for embedded devices or GUI-less environments
- System Tray: Background operation support with integrated system tray functionality
- Global Hotkeys: Global hotkey support for improved usability
- Settings Interface: Complete settings management interface with configuration customization
Security & Stability
- Encrypted Audio Transmission: Supports the WSS protocol (WebSocket over TLS) so that audio data is encrypted in transit
- Device Activation System: Dual v1/v2 protocol activation with automatic verification code and device fingerprint handling
- Error Recovery: Complete error handling and recovery mechanisms with reconnection support
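Reconnection support of the kind mentioned above is often built as a retry loop with exponential backoff. The sketch below shows one way this could look; `connect_with_retry`, its parameters, and the `flaky_connect` stand-in are hypothetical and only illustrate the pattern.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_with_retry(
    connect: Callable[[], Awaitable[str]],
    max_attempts: int = 5,
    base_delay_s: float = 0.01,
) -> str:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


attempts = 0


async def flaky_connect() -> str:
    """Stand-in connection that fails twice before succeeding."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("server unreachable")
    return "connected"


status = asyncio.run(connect_with_retry(flaky_connect))
```

Doubling the delay keeps a client from hammering a server that is temporarily down while still reconnecting quickly after a brief glitch.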
Cross-platform Support
- System Compatibility: Compatible with Windows 10+, macOS 10.15+, and Linux systems
- Protocol Support: WebSocket and MQTT dual protocol communication support
- Multi-environment Deployment: GUI and CLI dual modes adapting to different deployment environments
- Platform Optimization: Audio and system control optimization for different platforms
Developer Friendly
- Modular Architecture: Clean code structure with clear separation of responsibilities, making the project easy to extend
- Async First: Event-driven architecture based on asyncio for high-performance concurrent processing
- Configuration Management: Hierarchical configuration system with dot notation access and dynamic updates
- Logging System: Complete logging and debugging support
- API Documentation: Detailed code documentation and usage guides
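The Configuration Management entry above mentions dot notation access and dynamic updates. A minimal sketch of both operations on a nested dictionary is shown here. The helper names `get_config` and `update_config` and the sample keys are assumptions for illustration and do not describe the project's own configuration class.

```python
from typing import Any, Dict


def get_config(config: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Look up a nested value using a dotted path such as 'NETWORK.PROTOCOL'."""
    node: Any = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def update_config(config: Dict[str, Any], path: str, value: Any) -> None:
    """Set a nested value, creating intermediate dictionaries as needed."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


settings = {"NETWORK": {"PROTOCOL": "websocket"}}  # hypothetical keys
update_config(settings, "AUDIO.SAMPLE_RATE", 16000)
protocol = get_config(settings, "NETWORK.PROTOCOL")
```

Returning a default for missing paths keeps callers free of nested `KeyError` handling.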
System Requirements
Basic Requirements
- Python Version: 3.9 - 3.12
- Operating System: Windows 10+, macOS 10.15+, Linux
- Audio Devices: Microphone and speaker devices
- Network Connection: Stable internet connection (for AI services and online features)
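The supported interpreter range above can be checked before launching. This is a small sketch under the assumption that both ends of the 3.9 - 3.12 range are inclusive; `is_supported` is a hypothetical helper, not part of the project.

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def is_supported(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    return MIN_VERSION <= version <= MAX_VERSION


current = sys.version_info[:2]
if not is_supported(current):
    print(f"Python {current[0]}.{current[1]} is outside the supported 3.9 - 3.12 range")
```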
Recommended Configuration
- Memory: At least 4GB RAM (8GB+ recommended)
- Processor: Modern CPU with AVX instruction set support
- Storage: At least 2GB available disk space (for model files and cache)
- Audio: Audio devices supporting 16kHz sampling rate
Optional Feature Requirements
- Voice Wake-up: Requires downloading Sherpa-ONNX speech recognition models
- Camera Features: Requires camera device and OpenCV support
Read This First
- Carefully read the project documentation for startup tutorials and file descriptions
- The main branch has the latest code; after each update, manually reinstall the pip dependencies so that any newly added packages are installed
Zero to Xiaozhi Client (Video Tutorial)
Technical Architecture
Core Architecture Design
- Event-Driven Architecture: Based on asyncio asynchronous event loop, supporting high-concurrency processing
- Layered Design: Clear separation of application layer, protocol layer, device layer, and UI layer
- Singleton Pattern: Core components use singleton pattern to ensure unified resource management
- Plugin System: MCP tool system and IoT devices support plugin-based extension
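One common way to implement the singleton pattern mentioned above is a metaclass that caches the first instance of each class. The sketch below illustrates that approach; `SingletonMeta` and `AudioManager` are hypothetical names, and the project may implement its singletons differently.

```python
import threading
from typing import Any, Dict, List


class SingletonMeta(type):
    """Metaclass that creates each class's instance once and then reuses it."""

    _instances: Dict[type, Any] = {}
    _lock = threading.RLock()  # re-entrant, so one singleton may create another

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class AudioManager(metaclass=SingletonMeta):  # hypothetical core component
    def __init__(self) -> None:
        self.streams: List[str] = []


first = AudioManager()
second = AudioManager()
first.streams.append("input")
```

Both names refer to the same object, so state added through one is visible through the other.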
Key Technical Components
- Audio Processing: Opus codec, WebRTC echo cancellation, real-time resampling, system audio recording
- Speech Recognition: Sherpa-ONNX offline models, voice activity detection, wake word recognition
- Protocol Communication: WebSocket/MQTT dual protocol support, encrypted transmission, auto-reconnection
- Configuration System: Hierarchical configuration, dot notation access, dynamic updates, JSON/YAML support
Performance Optimization
- Async First: Full system asynchronous architecture, avoiding blocking operations
- Memory Management: Smart caching, garbage collection
- Audio Optimization: 5ms low-latency processing, queue management, streaming transmission
- Concurrency Control: Task pool management, semaphore control, thread safety
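Semaphore control, as listed above, caps how many tasks run at the same time. The following sketch measures the peak concurrency of ten short tasks under a limit of three; `run_limited` and its parameters are hypothetical and exist only to demonstrate the mechanism.

```python
import asyncio


async def run_limited(n_tasks: int, limit: int) -> int:
    """Run n_tasks workers with at most `limit` active at once; return the peak."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def worker() -> None:
        nonlocal running, peak
        async with semaphore:  # blocks while `limit` workers are already inside
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real work
            running -= 1

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return peak


peak_concurrency = asyncio.run(run_limited(10, 3))
```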
Security Mechanisms
- Encrypted Communication: WSS/TLS encryption, certificate verification
- Device Authentication: Dual protocol activation, device fingerprint recognition
- Access Control: Tool permission management, API access control
- Error Isolation: Exception isolation, fault recovery, graceful degradation
Development Guide
Project Structure
py-xiaozhi/
├── main.py                          # Application main entry (CLI argument handling)
├── src/
│   ├── application.py               # Application core logic
│   ├── audio_codecs/                # Audio codecs
│   │   ├── aec_processor.py         # Audio echo cancellation processor
│   │   ├── audio_codec.py           # Audio codec base class
│   │   └── system_audio_recorder.py # System audio recorder
│   ├── audio_processing/            # Audio processing modules
│   │   ├── vad_detector.py          # Voice activity detection
│   │   └── wake_word_detect.py      # Wake word detection
│   ├── core/                        # Core components
│   │   ├── ota.py                   # Over-the-air update module
│   │   └── system_initializer.py    # System initializer
│   ├── display/                     # Display interface abstraction layer
│   ├── iot/                         # IoT device management
│   │   ├── thing.py                 # Device base class
│   │   ├── thing_manager.py         # Device manager
│   │   └── things/                  # Concrete device implementations
│   ├── mcp/                         # MCP tool system
│   │   ├── mcp_server.py            # MCP server
│   │   └── tools/                   # Various tool modules
│   ├── protocols/                   # Communication protocols
│   ├── utils/                       # Utility functions
│   └── views/                       # UI view components
├── libs/                            # Third-party native libraries
│   ├── libopus/                     # Opus audio codec library
│   ├── webrtc_apm/                  # WebRTC audio processing module
│   └── SystemAudioRecorder/         # System audio recording tool
├── config/                          # Configuration file directory
├── models/                          # Speech model files
├── assets/                          # Static resource files
├── scripts/                         # Auxiliary scripts
├── requirements.txt                 # Python dependency package list
└── build.json                       # Build configuration file
Development Environment Setup
# Clone project
git clone https://github.com/huangjunsen0406/py-xiaozhi.git
cd py-xiaozhi
# Install dependencies
pip install -r requirements.txt
# Code formatting
./format_code.sh
# Run program - GUI mode (default)
python main.py
# Run program - CLI mode
python main.py --mode cli
# Specify communication protocol
python main.py --protocol websocket # WebSocket (default)
python main.py --protocol mqtt # MQTT protocol
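The command-line flags shown above could be parsed with argparse along the following lines. This is only a sketch: the flag names and defaults follow the commands above, but the `gui` choice value and the `build_parser` helper are assumptions, and the actual main.py may handle arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="py-xiaozhi launcher (sketch)")
    parser.add_argument(
        "--mode",
        choices=["gui", "cli"],  # "gui" as a value is assumed; GUI is the default mode
        default="gui",
        help="run with the graphical interface or in the terminal",
    )
    parser.add_argument(
        "--protocol",
        choices=["websocket", "mqtt"],
        default="websocket",
        help="communication protocol to use",
    )
    return parser


args = build_parser().parse_args(["--mode", "cli"])
```

Restricting each flag to a fixed set of choices lets argparse reject typos before the application starts.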
Core Development Patterns
- Async First: Use async/await syntax, avoid blocking operations
- Error Handling: Complete exception handling and logging
- Configuration Management: Use ConfigManager for unified configuration access
- Test-Driven: Write unit tests to ensure code quality
Extension Development
- Add MCP Tools: Create new tool modules in the src/mcp/tools/ directory
- Add IoT Devices: Inherit from the Thing base class to implement new devices
- Add Protocols: Implement the Protocol abstract base class
- Add Interfaces: Extend BaseDisplay to implement new UI components
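As an illustration of the "Add Protocols" item, the sketch below defines an abstract base class and a toy implementation. The method names `connect` and `send_text` and the `LoopbackProtocol` class are assumptions made for this example; check the real base class under src/protocols/ for the methods it actually requires.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List


class Protocol(ABC):
    """Hypothetical transport interface; the real base class may differ."""

    @abstractmethod
    async def connect(self) -> bool:
        """Open the channel and report whether it succeeded."""

    @abstractmethod
    async def send_text(self, message: str) -> None:
        """Send one text message over the open channel."""


class LoopbackProtocol(Protocol):
    """Toy transport that records messages instead of using a network."""

    def __init__(self) -> None:
        self.connected = False
        self.sent: List[str] = []

    async def connect(self) -> bool:
        self.connected = True
        return True

    async def send_text(self, message: str) -> None:
        if not self.connected:
            raise ConnectionError("connect() must be called first")
        self.sent.append(message)


async def demo() -> List[str]:
    transport = LoopbackProtocol()
    await transport.connect()
    await transport.send_text('{"type": "hello"}')
    return transport.sent


sent_messages = asyncio.run(demo())
```

Because the base class is abstract, a subclass that forgets one of the required methods fails at instantiation rather than at first use.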
State Transition Diagram
                        +----------------+
                        |                |
                        v                |
+------+  Wake/Button  +------------+    |   +------------+
| IDLE | ----------->  | CONNECTING |  --+-> | LISTENING  |
+------+               +------------+        +------------+
   ^                                           |
   |                                           | Voice Recognition Complete
   |          +------------+                   v
   +--------- |  SPEAKING  | <-----------------+
     Playback +------------+
     Complete
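The labeled transitions in the diagram can be expressed as a small lookup table. The sketch below covers only the main cycle (IDLE, CONNECTING, LISTENING, SPEAKING, back to IDLE) and leaves out the unlabeled loop at the top of the diagram. The event names and the `next_state` helper are assumptions chosen for this example.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECTING = auto()
    LISTENING = auto()
    SPEAKING = auto()


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "wake"): State.CONNECTING,
    (State.CONNECTING, "connected"): State.LISTENING,
    (State.LISTENING, "recognition_complete"): State.SPEAKING,
    (State.SPEAKING, "playback_complete"): State.IDLE,
}


def next_state(state: State, event: str) -> State:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


# Walk one full conversation cycle.
state = State.IDLE
for event in ["wake", "connected", "recognition_complete", "playback_complete"]:
    state = next_state(state, event)
```

Keeping the transitions in a table makes illegal moves, such as jumping from IDLE straight to SPEAKING, fail loudly instead of silently corrupting the state.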
Contribution Guidelines
We welcome issue reports and code contributions. Please ensure you follow these specifications:
- Code style complies with PEP 8
- PR submissions include appropriate tests
- Update relevant documentation
Community and Support
Thanks to the Following Open Source Contributors
In no particular order
Xiaoxia
zhh827
SmartArduino-Li Honggang
HonestQiao
vonweller
Sun Weigong
isamu2025
Rain120
kejily
Radio bilibili Jun
Cyber Intelligence
Sponsorship Support
Thanks to All Sponsors ❤️
Whether it's API resources, device compatibility testing, or financial support, every contribution makes the project more complete.
License
MIT License