
WebRTC Audio Processing Python Wrapper

Python ctypes bindings for the WebRTC Audio Processing Module (APM), providing echo cancellation, noise suppression, automatic gain control, and other audio processing features.

Features

  • Echo Cancellation (AEC): Removes acoustic echo from audio signals
  • Noise Suppression: Reduces background noise
  • Automatic Gain Control (AGC): Maintains consistent audio levels
  • High-pass Filter: Removes low-frequency noise
  • Transient Suppression: Reduces keyboard clicks and other transient sounds
  • Platform Support: Linux, macOS, and Windows with automatic library loading

Installation

The wrapper automatically detects your platform and loads the appropriate native library:

  • Linux: linux/{arch}/libwebrtc_apm.so
  • macOS: macos/{arch}/libwebrtc_apm.dylib
  • Windows: windows/{arch}/libwebrtc_apm.dll

Supported architectures: x64, arm64, x86
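
The lookup described above can be sketched roughly as follows. This is an illustration rather than the wrapper's actual loader: the helper name `native_library_path` and the machine-name aliases are assumptions, and only the directory layout is taken from the list above.

```python
import platform
from pathlib import Path

# Assumed mapping from platform.system() to (subdirectory, library file name).
_LIBRARIES = {
    "Linux": ("linux", "libwebrtc_apm.so"),
    "Darwin": ("macos", "libwebrtc_apm.dylib"),
    "Windows": ("windows", "libwebrtc_apm.dll"),
}

# Assumed aliases from platform.machine() to the architecture folder names.
_ARCHITECTURES = {
    "x86_64": "x64", "amd64": "x64",
    "arm64": "arm64", "aarch64": "arm64",
    "i386": "x86", "i686": "x86", "x86": "x86",
}


def native_library_path(base_dir, system=None, machine=None):
    """Return the expected native library path (hypothetical helper)."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system not in _LIBRARIES:
        raise OSError(f"Unsupported platform: {system}")
    if machine not in _ARCHITECTURES:
        raise OSError(f"Unsupported architecture: {machine}")
    subdir, filename = _LIBRARIES[system]
    return Path(base_dir) / subdir / _ARCHITECTURES[machine] / filename
```

A path built this way could then be handed to `ctypes.CDLL(str(path))`; checking `path.exists()` first gives a clearer message than the loader's own error.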

Quick Start

from webrtc_apm import WebRTCAudioProcessing, create_default_config
import ctypes
import numpy as np

# Initialize audio processing
apm = WebRTCAudioProcessing()

# Configure with echo cancellation and noise suppression
config = create_default_config()
config.echo.enabled = True
config.noise_suppress.enabled = True
config.high_pass.enabled = True

# Apply configuration
apm.apply_config(config)

# Create stream configurations for 16kHz mono audio
sample_rate = 16000
num_channels = 1
capture_config = apm.create_stream_config(sample_rate, num_channels)
render_config = apm.create_stream_config(sample_rate, num_channels)

# Set echo delay
apm.set_stream_delay_ms(50)

# Process audio frames
frame_size = 160  # 10ms at 16kHz
capture_audio = np.random.randint(-1000, 1000, frame_size, dtype=np.int16)
render_audio = np.random.randint(-500, 500, frame_size, dtype=np.int16)

# Convert to ctypes arrays
capture_buffer = (ctypes.c_short * frame_size)(*capture_audio)
render_buffer = (ctypes.c_short * frame_size)(*render_audio)
processed_capture = (ctypes.c_short * frame_size)()
processed_render = (ctypes.c_short * frame_size)()

# Process render stream (echo reference)
apm.process_reverse_stream(render_buffer, render_config, render_config, processed_render)

# Process capture stream (apply processing)
apm.process_stream(capture_buffer, capture_config, capture_config, processed_capture)

# Clean up
apm.destroy_stream_config(capture_config)
apm.destroy_stream_config(render_config)
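
To continue working with the processed audio in NumPy, the output buffer can be viewed as an array without copying each sample. The sketch below uses a stand-in buffer in place of real `process_stream()` output, so it runs without the native library; the variable names mirror the Quick Start but are otherwise illustrative.

```python
import ctypes

import numpy as np

frame_size = 160  # 10 ms at 16 kHz, mono

# Stand-in for a buffer that process_stream() would have filled.
processed_capture = (ctypes.c_short * frame_size)(*range(frame_size))

# View the ctypes buffer as an int16 array; the view shares memory with it.
view = np.ctypeslib.as_array(processed_capture)

# Take a copy when the samples must stay valid after the buffer is reused.
samples = view.copy()
```

Because `view` aliases the ctypes buffer, its contents change on the next processing call that writes into `processed_capture`; `samples` does not.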

Configuration Options

Echo Cancellation

config.echo.enabled = True
config.echo.mobile_mode = False  # Use full AEC (not mobile version)
config.echo.enforce_high_pass_filtering = True

Noise Suppression

config.noise_suppress.enabled = True
config.noise_suppress.noise_level = 2  # 0=Low, 1=Moderate, 2=High, 3=VeryHigh

Automatic Gain Control (AGC1)

config.gain_control1.enabled = True
config.gain_control1.controller_mode = 0  # 0=AdaptiveAnalog, 1=AdaptiveDigital, 2=FixedDigital
config.gain_control1.target_level_dbfs = 3  # Upstream WebRTC uses positive values here: 3 means -3 dBFS
config.gain_control1.compression_gain_db = 9
config.gain_control1.enable_limiter = True

High-pass Filter

config.high_pass.enabled = True
config.high_pass.apply_in_full_band = True

API Reference

Classes

WebRTCAudioProcessing

Main audio processing class.

Methods:

  • create_stream_config(sample_rate, num_channels) - Create stream configuration
  • destroy_stream_config(config_handle) - Destroy stream configuration
  • apply_config(config) - Apply processing configuration
  • process_stream(src, src_config, dest_config, dest) - Process capture audio
  • process_reverse_stream(src, src_config, dest_config, dest) - Process render audio
  • set_stream_delay_ms(delay_ms) - Set echo delay in milliseconds

Config

Configuration structure with all processing options.

Functions

create_default_config()

Returns a Config object with sensible default values.

Enums

  • DownmixMethod: How to convert multi-channel to mono
  • NoiseSuppressionLevel: Noise suppression intensity
  • GainController1Mode: AGC operating mode
  • ClippingPredictorMode: Clipping prediction algorithm

Examples

See example.py for comprehensive usage examples including:

  • Basic echo cancellation and noise suppression
  • AGC configuration
  • Full processing pipeline setup
  • Audio frame processing loop

Audio Format Requirements

  • Sample Rate: 8000, 16000, 32000, or 48000 Hz
  • Channels: 1 (mono) or 2 (stereo)
  • Sample Format: 16-bit signed integer (int16)
  • Frame Size: 10ms worth of samples (e.g., 160 samples at 16kHz)
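
The frame size follows directly from the sample rate: 10 ms is one hundredth of a second. A small helper along these lines can validate a format and compute the buffer length. The name `samples_per_frame` is hypothetical, and the code assumes stereo samples are interleaved in a single buffer, which this README does not state explicitly.

```python
SUPPORTED_SAMPLE_RATES = (8000, 16000, 32000, 48000)
SUPPORTED_CHANNELS = (1, 2)


def samples_per_frame(sample_rate, num_channels=1):
    """Return the number of int16 values in one 10 ms frame.

    Assumption: multi-channel audio is interleaved, so the buffer holds
    samples-per-channel multiplied by the channel count.
    """
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"Unsupported sample rate: {sample_rate}")
    if num_channels not in SUPPORTED_CHANNELS:
        raise ValueError(f"Unsupported channel count: {num_channels}")
    # 10 ms is 1/100 of a second.
    return sample_rate // 100 * num_channels
```

For example, `samples_per_frame(16000)` gives 160 and `samples_per_frame(48000, 2)` gives 960.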

Error Handling

All processing functions return status codes:

  • 0: Success
  • Non-zero: Error occurred

result = apm.process_stream(src, src_config, dest_config, dest)
if result != 0:
    print(f"Processing failed with code: {result}")
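
In longer programs it can be more convenient to turn a non-zero status into an exception than to print it. The following is one possible pattern, not part of the wrapper: `AudioProcessingError` and `check_status` are hypothetical names introduced for this sketch.

```python
class AudioProcessingError(RuntimeError):
    """Raised when a processing call returns a non-zero status (hypothetical)."""

    def __init__(self, operation, code):
        super().__init__(f"{operation} failed with code: {code}")
        self.operation = operation
        self.code = code


def check_status(operation, code):
    """Raise AudioProcessingError unless the status code is 0."""
    if code != 0:
        raise AudioProcessingError(operation, code)
```

A call site would then read `check_status("process_stream", apm.process_stream(src, src_config, dest_config, dest))`.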

Performance Notes

  • Process audio in 10 ms frames, the frame size listed under Audio Format Requirements
  • Reuse stream configurations when possible
  • Call process_reverse_stream() before process_stream() for best echo cancellation
  • Set appropriate stream delay based on your audio system latency
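
The notes above can be combined into a frame loop like the one sketched here. It accepts any object exposing `process_stream()` and `process_reverse_stream()` with the signatures from the API reference; the function name `process_frames` and the choice to drop a trailing partial frame are assumptions made for this example.

```python
import ctypes


def process_frames(apm, capture, render, capture_config, render_config, frame_size):
    """Process matching capture/render sample sequences in 10 ms frames.

    Samples that do not fill a whole frame at the end are dropped in this
    sketch. Returns the processed capture samples as a list of ints.
    """
    FrameBuffer = ctypes.c_short * frame_size
    # Output buffers are allocated once and reused for every frame.
    processed_render = FrameBuffer()
    processed_capture = FrameBuffer()
    output = []
    usable = min(len(capture), len(render)) // frame_size * frame_size
    for start in range(0, usable, frame_size):
        render_frame = FrameBuffer(*render[start:start + frame_size])
        capture_frame = FrameBuffer(*capture[start:start + frame_size])
        # Feed the echo reference first, then the microphone signal.
        status = apm.process_reverse_stream(
            render_frame, render_config, render_config, processed_render)
        if status != 0:
            raise RuntimeError(f"process_reverse_stream failed with code: {status}")
        status = apm.process_stream(
            capture_frame, capture_config, capture_config, processed_capture)
        if status != 0:
            raise RuntimeError(f"process_stream failed with code: {status}")
        output.extend(processed_capture)
    return output
```

The stream configurations are created once by the caller and passed in, so they are reused across all frames rather than rebuilt per call.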

Platform-Specific Notes

Linux

  • Requires the shared library libwebrtc_apm.so
  • Works on x64 and ARM64 architectures

macOS

  • Uses dynamic library libwebrtc_apm.dylib
  • Supports both Intel (x64) and Apple Silicon (arm64)

Windows

  • Requires libwebrtc_apm.dll
  • Supports x64 and x86 architectures
  • May need Visual C++ Redistributable

Troubleshooting

Library not found: Ensure the native library is in the correct platform subdirectory.

Processing errors: Check that audio format matches stream configuration (sample rate, channels).

Poor echo cancellation: Verify render stream is processed before capture stream and stream delay is set correctly.

High CPU usage: Use 10ms frames and appropriate sample rates (16kHz recommended for voice).