Sin descripción

1 Ramas

liuq ce5b2cf93a 点击视频不会退出窗口		hace 3 meses
assets	95d491e160 更新	hace 3 meses
documents	95d491e160 更新	hace 3 meses
libs	95d491e160 更新	hace 3 meses
scripts	95d491e160 更新	hace 3 meses
src	ce5b2cf93a 点击视频不会退出窗口	hace 3 meses
.flake8	95d491e160 更新	hace 3 meses
.gitignore	95d491e160 更新	hace 3 meses
LICENSE	95d491e160 更新	hace 3 meses
README.en.md	95d491e160 更新	hace 3 meses
README.md	95d491e160 更新	hace 3 meses
authorize_python_access.sh	95d491e160 更新	hace 3 meses
build.json	95d491e160 更新	hace 3 meses
checke_opus.sh	95d491e160 更新	hace 3 meses
entitlements.plist	95d491e160 更新	hace 3 meses
format_code.bat	95d491e160 更新	hace 3 meses
format_code.sh	95d491e160 更新	hace 3 meses
main.py	95d491e160 更新	hace 3 meses
pyproject.toml	95d491e160 更新	hace 3 meses
requirements.txt	6505ae2748 更新库	hace 3 meses
requirements_mac.txt	95d491e160 更新	hace 3 meses
run.bat	d0c111a8b3 更新启动脚本	hace 3 meses
小智.spec	6505ae2748 更新库	hace 3 meses

py-xiaozhi

English | 简体中文

Project Introduction

py-xiaozhi is a Python-based Xiaozhi voice client, designed to learn coding and experience AI voice interaction without hardware requirements. This repository is ported from xiaozhi-esp32.

Demo

Bilibili Demo Video

Features

🎯 Core AI Capabilities

AI Voice Interaction: Supports voice input and recognition, enabling intelligent human-computer interaction with natural conversation flow
Visual Multimodal: Supports image recognition and processing, providing multimodal interaction capabilities and image content understanding
Intelligent Wake-up: Supports multiple wake word activation for hands-free interaction (configurable)
Continuous Dialogue Mode: Implements seamless conversation experience, enhancing user interaction fluidity

🔧 MCP Tools Ecosystem

System Control Tools: System status monitoring, application management, volume control, device management
Calendar Management Tools: Full-featured calendar system with create, query, update, delete events, intelligent categorization and reminders
Timer Tools: Countdown timer functionality with delayed MCP tool execution and parallel task management
Music Player Tools: Online music search and playback with playback controls, lyrics display, and local cache management
12306 Query Tools: 12306 railway ticket query with train tickets, transfer queries, and route information
Search Tools: Web search and content retrieval with Bing search integration and intelligent content parsing
Recipe Tools: Rich recipe database with search, category browsing, and intelligent recommendations
Map Tools: Amap services with geocoding, route planning, nearby search, and weather queries
Bazi Fortune Tools: Traditional Chinese fortune-telling with Bazi calculation, marriage analysis, and lunar calendar queries
Camera Tools: Image capture and AI analysis with photo recognition and intelligent Q&A

🏠 IoT Device Integration

Device Management Architecture: Unified device management based on Thing pattern with asynchronous property and method calls
Smart Home Control: Supports lighting, volume, temperature sensors, and other device control
State Synchronization: Real-time status monitoring with incremental updates and concurrent state retrieval
Extensible Design: Modular device drivers, easy to add new device types

🎵 Advanced Audio Processing

Multi-level Audio Processing: Supports Opus codec and real-time resampling
Voice Activity Detection: VAD detector for intelligent interruption with real-time voice activity monitoring
Wake Word Detection: Sherpa-ONNX-based offline speech recognition with multiple wake words and pinyin matching
Audio Stream Management: Independent input/output streams with stream rebuild and error recovery
Audio Echo Cancellation: Integrated WebRTC audio processing module providing high-quality echo cancellation
System Audio Recording: Supports system audio recording with audio loopback processing

🖥️ User Interface

Graphical Interface: Modern PyQt5-based GUI with Xiaozhi expressions and text display for enhanced visual experience
Command Line Mode: CLI support suitable for embedded devices or GUI-less environments
System Tray: Background operation support with integrated system tray functionality
Global Hotkeys: Global hotkey support for improved usability
Settings Interface: Complete settings management interface with configuration customization

🔒 Security & Stability

Encrypted Audio Transmission: WSS protocol support ensuring audio data security and preventing information leakage
Device Activation System: Dual v1/v2 protocol activation with automatic verification code and device fingerprint handling
Error Recovery: Complete error handling and recovery mechanisms with reconnection support

🌐 Cross-platform Support

System Compatibility: Compatible with Windows 10+, macOS 10.15+, and Linux systems
Protocol Support: WebSocket and MQTT dual protocol communication support
Multi-environment Deployment: GUI and CLI dual modes adapting to different deployment environments
Platform Optimization: Audio and system control optimization for different platforms

🔧 Developer Friendly

Modular Architecture: Clean code structure with clear responsibility separation for secondary development
Async First: Event-driven architecture based on asyncio for high-performance concurrent processing
Configuration Management: Hierarchical configuration system with dot notation access and dynamic updates
Logging System: Complete logging and debugging support
API Documentation: Detailed code documentation and usage guides

System Requirements

Basic Requirements

Python Version: 3.9 - 3.12
Operating System: Windows 10+, macOS 10.15+, Linux
Audio Devices: Microphone and speaker devices
Network Connection: Stable internet connection (for AI services and online features)

Recommended Configuration

Memory: At least 4GB RAM (8GB+ recommended)
Processor: Modern CPU with AVX instruction set support
Storage: At least 2GB available disk space (for model files and cache)
Audio: Audio devices supporting 16kHz sampling rate

Optional Feature Requirements

Voice Wake-up: Requires downloading Sherpa-ONNX speech recognition models
Camera Features: Requires camera device and OpenCV support

Read This First

Carefully read 项目文档 for startup tutorials and file descriptions
The main branch has the latest code; manually reinstall pip dependencies after each update to ensure you have new dependencies

Zero to Xiaozhi Client (Video Tutorial)

Technical Architecture

Core Architecture Design

Event-Driven Architecture: Based on asyncio asynchronous event loop, supporting high-concurrency processing
Layered Design: Clear separation of application layer, protocol layer, device layer, and UI layer
Singleton Pattern: Core components use singleton pattern to ensure unified resource management
Plugin System: MCP tool system and IoT devices support plugin-based extension

Key Technical Components

Audio Processing: Opus codec, WebRTC echo cancellation, real-time resampling, system audio recording
Speech Recognition: Sherpa-ONNX offline models, voice activity detection, wake word recognition
Protocol Communication: WebSocket/MQTT dual protocol support, encrypted transmission, auto-reconnection
Configuration System: Hierarchical configuration, dot notation access, dynamic updates, JSON/YAML support

Performance Optimization

Async First: Full system asynchronous architecture, avoiding blocking operations
Memory Management: Smart caching, garbage collection
Audio Optimization: 5ms low-latency processing, queue management, streaming transmission
Concurrency Control: Task pool management, semaphore control, thread safety

Security Mechanisms

Encrypted Communication: WSS/TLS encryption, certificate verification
Device Authentication: Dual protocol activation, device fingerprint recognition
Access Control: Tool permission management, API access control
Error Isolation: Exception isolation, fault recovery, graceful degradation

Development Guide

Project Structure

py-xiaozhi/
├── main.py                     # Application main entry (CLI argument handling)
├── src/
│   ├── application.py          # Application core logic
│   ├── audio_codecs/           # Audio codecs
│   │   ├── aec_processor.py    # Audio echo cancellation processor
│   │   ├── audio_codec.py      # Audio codec base class
│   │   └── system_audio_recorder.py  # System audio recorder
│   ├── audio_processing/       # Audio processing modules
│   │   ├── vad_detector.py     # Voice activity detection
│   │   └── wake_word_detect.py # Wake word detection
│   ├── core/                   # Core components
│   │   ├── ota.py             # Over-the-air update module
│   │   └── system_initializer.py # System initializer
│   ├── display/                # Display interface abstraction layer
│   ├── iot/                    # IoT device management
│   │   ├── thing.py           # Device base class
│   │   ├── thing_manager.py   # Device manager
│   │   └── things/            # Concrete device implementations
│   ├── mcp/                    # MCP tool system
│   │   ├── mcp_server.py      # MCP server
│   │   └── tools/             # Various tool modules
│   ├── protocols/              # Communication protocols
│   ├── utils/                  # Utility functions
│   └── views/                  # UI view components
├── libs/                       # Third-party native libraries
│   ├── libopus/               # Opus audio codec library
│   ├── webrtc_apm/            # WebRTC audio processing module
│   └── SystemAudioRecorder/   # System audio recording tool
├── config/                     # Configuration file directory
├── models/                     # Speech model files
├── assets/                     # Static resource files
├── scripts/                    # Auxiliary scripts
├── requirements.txt            # Python dependency package list
└── build.json                  # Build configuration file

Development Environment Setup

# Clone project
git clone https://github.com/huangjunsen0406/py-xiaozhi.git
cd py-xiaozhi

# Install dependencies
pip install -r requirements.txt

# Code formatting
./format_code.sh

# Run program - GUI mode (default)
python main.py

# Run program - CLI mode
python main.py --mode cli

# Specify communication protocol
python main.py --protocol websocket  # WebSocket (default)
python main.py --protocol mqtt       # MQTT protocol

Core Development Patterns

Async First: Use async/await syntax, avoid blocking operations
Error Handling: Complete exception handling and logging
Configuration Management: Use ConfigManager for unified configuration access
Test-Driven: Write unit tests to ensure code quality

Extension Development

Add MCP Tools: Create new tool modules in src/mcp/tools/ directory
Add IoT Devices: Inherit from Thing base class to implement new devices
Add Protocols: Implement Protocol abstract base class
Add Interfaces: Extend BaseDisplay to implement new UI components

State Transition Diagram

                        +----------------+
                        |                |
                        v                |
+------+  Wake/Button  +------------+   |   +------------+
| IDLE | -----------> | CONNECTING | --+-> | LISTENING  |
+------+              +------------+       +------------+
   ^                                            |
   |                                            | Voice Recognition Complete
   |          +------------+                    v
   +--------- |  SPEAKING  | <-----------------+
     Playback +------------+
     Complete

Contribution Guidelines

We welcome issue reports and code contributions. Please ensure you follow these specifications:

Code style complies with PEP8 standards
PR submissions include appropriate tests
Update relevant documentation

Community and Support

Thanks to the Following Open Source Contributors

In no particular order

Xiaoxia zhh827 SmartArduino-Li Honggang HonestQiao vonweller Sun Weigong isamu2025 Rain120 kejily Radio bilibili Jun Cyber Intelligence

Sponsorship Support

Thanks to All Sponsors ❤️

Whether it's API resources, device compatibility testing, or financial support, every contribution makes the project more complete

Project Statistics

License

MIT License

README.en.md