WaveSync-shad

Purpose
Installation
Usage
      master/sender
      slave/receiver
            raspi "disp"
            android/termux
            windows 10
Options
      additional
      original
Code
      Terms
      Timing
      Code structure
            encountered locations
Termux issues
      time syncing
            python ntplib
            non-termux environments
      callback-based playback
            issues
            callback mode issue
FILES
TODO

Purpose

Sometimes audio has to be played back in sync on multiple computers; bringing sound to several rooms is a common application.

For this purpose there is a Python 3 script called WaveSync, available on GitHub (https://github.com/blaa/WaveSync) or, on most systems, via pip.

The script reads audio from the PulseAudio server through a unix socket, chops it into packets whose size is usually bound by the local network's MTU, attaches a header with the time at which each packet is to be played on the client, and sends it to each client by unicast, or to any interested listener by multicast.
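The framing can be illustrated with a minimal sketch. It is not libwavesync's actual wire format: the header here is a hypothetical single 8-byte big-endian float holding the play-at time, and `make_packet`/`parse_packet` are hypothetical names. Only the idea, a play-at timestamp (NOW + latency) prepended to a chunk of raw PCM, comes from the description above.

```python
import struct
import time
from typing import Optional, Tuple

# Hypothetical header: one big-endian double with the play-at time (seconds since the epoch).
HEADER = struct.Struct("!d")


def make_packet(chunk: bytes, latency_s: float, now: Optional[float] = None) -> bytes:
    """Prepend the time at which the receiver should play this chunk (NOW + latency)."""
    if now is None:
        now = time.time()
    return HEADER.pack(now + latency_s) + chunk


def parse_packet(packet: bytes) -> Tuple[float, bytes]:
    """Split a packet back into its play-at timestamp and the raw PCM chunk."""
    (play_at,) = HEADER.unpack_from(packet)
    return play_at, packet[HEADER.size:]
```

With a header like this, the audio payload is whatever remains of the UDP payload after the header bytes, which is why the chunk size is tied to the MTU.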

It requires tight clock synchronization between the master and the slave(s). This is usually done by NTP, but that requires a third-party daemon running as root, which is not always possible.

It uses PortAudio for audio output, via the PyAudio library.

Installation

Usually sufficient: pip3 install wavesync

It depends on pyaudio, a wrapper for the PortAudio library. If it complains about a missing library (.so or .dll), that library is PortAudio.

Usage

The examples below assume that the master and the NTP server are at 10.0.0.100.

master/sender

In the PulseAudio server configuration, /etc/pulse/default.pa, set up a null sink and a unix socket serving its monitor:

load-module module-null-sink sink_name=wsync

 load-module module-simple-protocol-unix rate=44100 format=s16le channels=2 record=true source=wsync.monitor socket=/tmp/music.source

Restart PulseAudio:

pulseaudio -k

pulseaudio -D

Start the master:

 wavesync --tx /tmp/music.source --channel laptop:1111 --channel disp:1111 --channel phone:1111 --latency 1000
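On the sender side, the PulseAudio module configured above serves raw s16le stereo PCM on /tmp/music.source. A minimal sketch of reading it in fixed-size chunks, assuming 1468-byte chunks (367 frames) as in the --payload-size default described under Options; `read_chunks` and `open_source` are hypothetical helpers, not part of libwavesync:

```python
import socket
from typing import Iterator

CHUNK_BYTES = 1468  # 367 frames x 2 channels x 2 bytes (s16le stereo)


def read_chunks(sock: socket.socket, size: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield complete chunks of `size` bytes from a stream socket until it closes."""
    buf = b""
    while True:
        data = sock.recv(65536)
        if not data:
            return  # peer closed; a trailing partial chunk is dropped
        buf += data
        while len(buf) >= size:
            yield buf[:size]
            buf = buf[size:]


def open_source(path: str = "/tmp/music.source") -> socket.socket:
    """Connect to the PulseAudio simple-protocol unix socket (record=true side)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock
```

Each yielded chunk would then be timestamped and sent to every --channel.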

slave/receiver

raspi "disp"

ordinary

wavesync --rx --channel 0.0.0.0:1111 --tolerance 200 --ntp=10.0.0.100

callback (test)

wavesync --rx --channel 0.0.0.0:1111 --tolerance 200 --ntp=10.0.0.100 --callback --device-index 0

android/termux

REQUIRES --callback. NTP jitter is unusually large here, so the NTP tolerance (--ntptol) is set higher; --sink-latency is tuned for simultaneous playback with the other devices.

 wavesync --rx --channel 0.0.0.0:1111 --buffer-size 2000 --tolerance=200 --sink-latency=90 --ntp=10.0.0.100 --ntptol 15 --callback

windows 10

 python C:\Users\user\AppData\Local\Programs\Python\Python38\Scripts\wavesync --rx --channel 0.0.0.0:1111 --tolerance=20 --buffer-size 2000 --ntp=10.0.0.100

Options

additional

--ntp <server> - NTP server to annoy with frequent queries
--ntpint <seconds> - NTP query interval, default 10 seconds
--ntptol <msec> - NTP tolerance: a new offset differing from the current average by less than this is smoothed into the average; a larger difference resets the averaging buffer, on the assumption that the base clock changed suddenly
--ntpdq - double-query NTP: issue two queries in quick succession and pick the result with the lower delay; the second query often gets a substantially faster response
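The --ntpdq idea can be sketched with ntplib, whose response object exposes `offset` and `delay` (both in seconds). `double_query` and `ntp_query` are hypothetical helpers, not the modified libwavesync code; the query function is passed in so the selection logic can be tried without a network:

```python
from typing import Any, Callable


def ntp_query(server: str) -> Any:
    """One NTP request via ntplib (imported lazily so the selection logic works without it)."""
    import ntplib
    return ntplib.NTPClient().request(server, version=3)


def double_query(query: Callable[[], Any]) -> Any:
    """Issue two queries in quick succession and keep the one with the lower round-trip delay."""
    first = query()
    second = query()
    return min(first, second, key=lambda r: r.delay)


# Usage against the LAN server from the examples:
# best = double_query(lambda: ntp_query("10.0.0.100"))
# print(best.offset, best.delay)
```

A lower round-trip delay leaves less room for asymmetric network paths, so its offset estimate is usually the more trustworthy of the two.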

--callback - use the callback playback interface; REQUIRED on Android/Termux, optional elsewhere, where it has so far given somewhat worse quality

All additional options are primarily for --rx mode. The NTP server should preferably run on the --tx machine, so that the sender is intrinsically synchronized with it.

--list - lists available sound devices (for the --device-index option), filtering the device data and trying to present it readably
--listv - lists available sound devices, verbosely, as-is in the dictionaries; should work if --list crashes
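A rough equivalent of what --list does can be written against PyAudio's device-info dictionaries (keys `index`, `name`, `maxOutputChannels`). The filtering criterion and the helper names are assumptions for illustration, not the actual --list code:

```python
from typing import Dict, List, Tuple


def output_devices(infos: List[Dict]) -> List[Tuple[int, str, int]]:
    """Keep only devices that can play audio, as (index, name, max output channels)."""
    return [
        (int(d["index"]), str(d["name"]), int(d["maxOutputChannels"]))
        for d in infos
        if d.get("maxOutputChannels", 0) > 0
    ]


def list_outputs() -> None:
    """Print output devices with the numbers that --device-index expects."""
    import pyaudio  # lazy import: needs PortAudio installed
    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
    for index, name, channels in output_devices(infos):
        print(f"{index:3d}  {name}  ({channels} ch)")
```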

original

--tx <socket> - transmitter/sender/master mode; read data from the given unix socket, e.g. /tmp/music.source (it must be created and fed with data by the PulseAudio server)

audio:
--latency <msec> - delay added to the current time to form each packet's playback timestamp ("NOW+latency"); default 1000
--rate <Hz> - sample rate, default 44100
--24bits - select 24-bit samples instead of 16-bit
--channels <chan> - number of audio channels, default 2 (ordinary stereo)
--local-play - also run --rx locally and feed it with the locally generated packets
network:
--channel <address:port> - where to send the UDP packet stream; can be repeated, can be unicast/multicast/broadcast; caution: multicast/broadcast work poorly over Wi-Fi
--payload-size <bytes> - UDP payload size, default 1472 (leaving 1468 bytes of audio payload, i.e. 367 16-bit stereo frames, 8.32 milliseconds at 44.1 kHz); make smaller if needed
--compress <level> - enable lossless compression, level 1-9; doesn't reduce size much, useful mainly with many unicast streams
--no-loop - don't loop multicast packets back to self
--ttl <TTL> - time-to-live for multicast packets, default 2
--broadcast - broadcast UDP packets (with a broadcast destination address?); warning: clogs the network

--rx - receiver/client/slave mode

--channel <address:port> - receiving channel, i.e. where to bind the listening port; use 0.0.0.0:port to listen on all local network interfaces
--device-index <number> - number of the local output device, as numbered by PyAudio/PortAudio (see --list)
--sink-latency <msec> - compensation for local output latency: packets are fed to the output device at NOW+latency-sinklatency, where NOW+latency is the packet's timestamp; use it to adjust sync between different devices, or to compensate for the speed of sound (roughly 3 ms per metre)
--tolerance <msec> - allowed playback timing error, default 15 msec; increase it if too many timing errors are reported
--buffer-size <frames> - local output buffer in frames, default 8192; decrease it if there are sync problems
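How --sink-latency and --tolerance interact can be sketched as a per-chunk decision. This is an assumption-laden illustration, not chunk_player.py: the play-at time is shifted earlier by the sink latency, chunks later than the tolerance are dropped, and early chunks wait. `schedule` and its return values are hypothetical.

```python
from typing import Tuple


def schedule(play_at: float, now: float, sink_latency_ms: float,
             tolerance_ms: float) -> Tuple[str, float]:
    """Return ("wait", seconds_early), ("play", 0.0) or ("drop", seconds_late) for one chunk."""
    target = play_at - sink_latency_ms / 1000.0  # hand over early to cover local output latency
    error = now - target                          # positive means the chunk is late
    if error > tolerance_ms / 1000.0:
        return "drop", error
    if error < 0:
        return "wait", -error
    return "play", 0.0
```

For example, with the Termux settings (--sink-latency=90, --tolerance=200) a chunk stamped 10.0 s is handed to the device from 9.91 s on and is only dropped after 10.11 s.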

Code

Terms

sample - a single number giving the instantaneous amplitude in one channel
frame - one sample per channel, all for the same instant
chunk - a set of frames that fits into one packet

Timing

Intrinsic delays (at 44.1 kHz, 2 channels, 16 bits):

sample - 22.7 microseconds
chunk (367 frames) - 8.32 milliseconds, a sizable amount of time
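These figures follow from simple arithmetic; a small hypothetical helper reproduces them and can be reused for other payload, channel or rate settings:

```python
from typing import Tuple


def chunk_timing(payload_bytes: int = 1468, channels: int = 2,
                 sample_bytes: int = 2, rate: int = 44100) -> Tuple[int, float, float]:
    """Frames per chunk, chunk duration in ms, and one frame period in microseconds."""
    frame_bytes = channels * sample_bytes
    frames = payload_bytes // frame_bytes
    return frames, frames / rate * 1000.0, 1e6 / rate


frames, chunk_ms, frame_us = chunk_timing()
print(frames, round(chunk_ms, 2), round(frame_us, 1))  # 367 8.32 22.7
```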

Code structure

WaveSync is a Python program built on the libwavesync package, which usually lives in the /usr/lib/python<version>/site-packages/libwavesync/ directory.

It consists of several files:

audio_config.py
chunk_player.py - the important one; takes the individual chunks from the chunk_queue and feeds them to the sound device at their timestamped times; the blocking vs. callback modifications go here
chunk_queue.py - deque-based chunk queue
cli.py - command-line configuration, process loop initialization
cli_args.py - command-line argument parsing
lib.py - empty file
packetizer.py - transmits packets from the input queue
receiver.py - receives packets from the UDP socket, decodes headers, puts them into the chunk queue
sample_reader.py - reads packets from the network, puts them into the receive queue
tests.py
time_machine.py - handles the local time queries; the ntplib enhancements go here
version.py - only version number, 2.1.0 here

encountered locations

/usr/miniconda3/lib/python<version>/site-packages/libwavesync
/usr/lib/python<version>/site-packages/libwavesync/
/data/data/com.termux/files/usr/lib/python<version>/site-packages/libwavesync/
C:\Users\<username>\AppData\Local\Programs\Python\Python<version>\Lib\site-packages\libwavesync
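To find which of these locations a given interpreter actually uses (e.g. before applying the patch), the standard importlib machinery can be asked directly; `package_dir` is a hypothetical helper:

```python
import importlib.util
from typing import Optional


def package_dir(name: str = "libwavesync") -> Optional[str]:
    """Return the directory a package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


print(package_dir())
```

Run it with the same interpreter that runs wavesync, since each Python installation has its own site-packages.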

Termux issues

The software was never intended to run on Android phones. It can be run with the Python interpreter in Termux, but this has several issues.

time syncing

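The packet timestamps refer to absolute time on the local clock. Time is handled in the time_machine.py module, by calling datetime.utcnow().timestamp(), which returns a float. (Note that utcnow() returns a naive datetime, which .timestamp() interprets as local time, so the result is shifted by the local UTC offset; this cancels out only when all machines use the same time zone.)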

The local clock syncing has to be done externally, usually by NTP or PTP.

This introduces a dependency on write access to the local clock, which is not always feasible, e.g. on unrooted Android.

On Android there is no direct write access to the system clock. Without root, an NTP query cannot change the clock; it can only measure the offset. Fortunately, that is all that's needed: the known offset can be added to the local clock reading in the time_machine.now() call.
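A minimal sketch of applying a measured offset instead of setting the clock. The class and method names are hypothetical (the modified time_machine.now() is not reproduced here); it uses time.time(), which gives seconds since the epoch independent of the time zone, and assumes the offset is ntplib's `resp.offset`, i.e. server time minus local time:

```python
import threading
import time


class OffsetClock:
    """Local clock corrected by an externally measured offset (e.g. an NTP offset)."""

    def __init__(self) -> None:
        self._offset = 0.0
        self._lock = threading.Lock()

    def set_offset(self, offset_s: float) -> None:
        # Called from the NTP thread; a positive offset means the local clock is behind.
        with self._lock:
            self._offset = offset_s

    def now(self) -> float:
        """Seconds since the epoch, on the NTP server's time base."""
        with self._lock:
            return time.time() + self._offset
```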

The workaround is to run the process's own NTP client in its own thread. NTP queries can be done in userspace, e.g. with the Python ntplib library. Ideally they go to a local server on the LAN.

Due to the appalling drift of the clocks and jitter of the queries, fairly frequent queries are needed; once every 10 seconds was chosen, to make the drifts easy to observe and to allow rapid resynchronization in case another clock-syncing mechanism changes the clock by too much in too short a time.

The query jitter was addressed by keeping the results of the last ten queries locally and returning their average.

A step change of the clock, e.g. caused by ntpd kicking in, desynchronizes the client. It is detected by the next ntplib query: a difference from the average above the limit (--ntptol) is taken as such a step, and the buffer is discarded.
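The averaging and step detection could look like this; a sketch, assuming the new reading is compared against the current average and that the --ntptol value is the threshold. The class name and the 5 ms default are hypothetical:

```python
from collections import deque
from typing import Deque


class OffsetFilter:
    """Average the last N NTP offsets; restart averaging when a reading jumps by more than the tolerance."""

    def __init__(self, size: int = 10, tolerance_ms: float = 5.0) -> None:
        self.window: Deque[float] = deque(maxlen=size)
        self.tolerance = tolerance_ms / 1000.0

    def add(self, offset_s: float) -> float:
        """Record one offset (seconds) and return the offset to use from now on."""
        if self.window:
            average = sum(self.window) / len(self.window)
            if abs(offset_s - average) > self.tolerance:
                self.window.clear()  # assume the base clock stepped; forget old readings
        self.window.append(offset_s)
        return sum(self.window) / len(self.window)
```

The deque's maxlen keeps only the ten most recent readings without any explicit trimming.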

If an NTP server is specified on the command line, the queries run in a separate thread, using a JavaScript-like recursive threading.Timer call (each run schedules the next one, like setTimeout).
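The setTimeout-like scheduling can be sketched with the standard library alone; `run_periodically` is a hypothetical name, and the task passed in would be the NTP query feeding the averaging buffer:

```python
import threading
from typing import Callable


def run_periodically(interval_s: float, task: Callable[[], None], stop: threading.Event) -> None:
    """Run task now, then re-arm a threading.Timer for the next run (setTimeout-style recursion)."""
    if stop.is_set():
        return
    try:
        task()
    finally:
        # Re-arm even if the task raised, so one failed NTP query doesn't end the polling.
        if not stop.is_set():
            timer = threading.Timer(interval_s, run_periodically, args=(interval_s, task, stop))
            timer.daemon = True  # don't keep the process alive just for NTP polling
            timer.start()


# Usage, e.g. with --ntpint 10:
# stop = threading.Event()
# run_periodically(10, lambda: print("query NTP here"), stop)
```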

python ntplib

https://pypi.org/project/ntplib/ - userspace library for NTP queries; minimal usage:

import ntplib
c = ntplib.NTPClient()
resp = c.request('0.pool.ntp.org', version=3)
print(resp.offset)  # server time minus local time, in seconds

non-termux environments

The NTP queries work on other platforms too. They were tested successfully on Windows 10 and on Raspberry Pi (Raspbian), both over Wi-Fi.

callback-based playback

To achieve precisely synced playback, a constant output delay has to be maintained. This is achieved by managing the size of the output queue/buffer.

The original software uses blocking writes, with pulseaudio as a backend.

The playback loop that sends data to the output device relies on stream.get_write_available(). This call returns the number of frames that can be written to the device buffer immediately, without blocking. This makes it possible to manage the output delay and to drop chunks that arrive faster than the device can play them. However, in the Termux build of the PortAudio library this call always returns, drumroll please, zero. So the loop got stuck, dropping all the packets and complaining, quote, "Hey, the output is STUCK!"
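The gating logic amounts to something like the following; a simplified sketch, not the original chunk_player.py loop (`feed_blocking` is hypothetical, and `stream` is assumed to be a PyAudio output stream in blocking mode). It also shows why a build that always reports zero free frames drops everything:

```python
def feed_blocking(stream, chunk: bytes, frame_bytes: int = 4) -> bool:
    """Write one chunk only if the device buffer has room for it; otherwise drop it.

    frame_bytes = 4 assumes 16-bit stereo. Returns True if the chunk was written.
    """
    frames = len(chunk) // frame_bytes
    if stream.get_write_available() < frames:
        return False  # buffer full (or, on Termux, always reported as 0): drop the chunk
    stream.write(chunk)
    return True
```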

The first attempt was simply removing the condition. Voilà, playback started. Aaaaaand the packets, no longer dropped, made the stream lag behind the other devices. Ooooops. Back to the drawing board.

The next attempt was rewriting the code from blocking writes to callbacks, with its own queue whose depth can be monitored easily.

The first sub-attempt failed: the callback fired once, then the stream died. Adding a conditional stream.start_stream() did not help. Adding stream.stop_stream() before start_stream() made it work, but AWFULLY choppy.

After the first callback, both stream.is_active() and stream.is_stopped() gave False. Hint hint...

Net searches turned up nothing, and the documentation was silent about this symptom. It turned out that the length of the data block returned from the callback has to match the frames_per_buffer value given when the stream is opened. OOPS. The constant was changed from self.buffer_size (8192), used for the blocking-write buffer, to 367, the number of frames per packet (1468 data bytes per packet / (2 channels × 2 bytes per sample)).
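A minimal sketch of a callback that respects this constraint, assuming PyAudio's callback interface (`callback(in_data, frame_count, time_info, status)` returning `(data, flag)`). The PyAudio documentation notes that if the returned data holds fewer than frame_count frames, paComplete is assumed, which would match the "fires once, then dies" symptom. The queue, names and silence padding are illustrative choices, not the modified chunk_player.py:

```python
import queue

try:
    from pyaudio import paContinue
except ImportError:  # PortAudio's paContinue value, for trying this without PyAudio installed
    paContinue = 0

FRAMES_PER_BUFFER = 367  # must equal the frame count of each chunk
FRAME_BYTES = 4          # 2 channels x 2 bytes (16-bit)

chunks: "queue.Queue[bytes]" = queue.Queue()


def play_callback(in_data, frame_count, time_info, status):
    """PyAudio output callback: always return exactly frame_count frames."""
    needed = frame_count * FRAME_BYTES
    try:
        data = chunks.get_nowait()
    except queue.Empty:
        data = b""
    # A short return would end the stream, so pad underruns with silence and trim overlong data.
    data = data[:needed].ljust(needed, b"\x00")
    return data, paContinue


# stream = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=2, rate=44100,
#                                 output=True, frames_per_buffer=FRAMES_PER_BUFFER,
#                                 stream_callback=play_callback)
```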

A FIFO queue had to be added; the raw sending was jittery, and the callback sometimes picked up the same block twice. As each packet holds about 8 milliseconds of sound, even a few queued packets quickly add noticeable delay. The Python queue module was chosen for this task; the level was set to 3 packets, with aggressive probabilistic dropping beyond that, and the playback callback throws out every other chunk if the queue length exceeds a limit. This got the playback to manageable performance.
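The two dropping mechanisms can be sketched as follows. The 3-packet level comes from the text; the drop-probability curve and the consumer-side limit of 6 are hypothetical choices, as are the function names:

```python
import queue
import random
from typing import Optional

TARGET_DEPTH = 3  # from the text: about 3 packets (~25 ms) queued
DROP_LIMIT = 6    # hypothetical: above this depth the consumer skips every other chunk


def enqueue(q: "queue.Queue[bytes]", chunk: bytes, rng: Optional[random.Random] = None) -> bool:
    """Producer side: above TARGET_DEPTH, refuse new chunks with rising probability."""
    excess = q.qsize() - TARGET_DEPTH
    p_drop = min(1.0, excess / TARGET_DEPTH) if excess > 0 else 0.0
    if (rng or random).random() < p_drop:
        return False
    q.put_nowait(chunk)
    return True


def dequeue(q: "queue.Queue[bytes]") -> Optional[bytes]:
    """Consumer side (e.g. the play callback): discard one chunk when the queue is over the limit."""
    try:
        if q.qsize() > DROP_LIMIT:
            q.get_nowait()  # throw this one away, play the next
        return q.get_nowait()
    except queue.Empty:
        return None
```

With these numbers the drop probability is 1/3 at depth 4, 2/3 at depth 5 and certain at depth 6, so the queue cannot grow past 6 packets (about 50 ms).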

Frequent short drop-outs persist. Uneven packet delivery and some out-of-order packets are suspected. TODO: sort the queue by timestamp.

During longer playback, a slight delay tends to accumulate; a restart helps. The cause is suspected to be somewhere in PortAudio or Termux's PulseAudio. A possible workaround is taking code from play-audio, or (better) writing a minimal streamer with callbacks.

Disconnecting and reconnecting the stream at the start of a detected silence period seems to help.
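Detecting the start of a silence period could look like this; a sketch assuming s16le chunks, with a hypothetical amplitude threshold and a hypothetical run length (12 chunks, about 100 ms at 8.32 ms per chunk). The class signals once, on the chunk that completes the silent run, which is where a reconnect would be triggered:

```python
import sys
from array import array


def is_silent(chunk: bytes, threshold: int = 64) -> bool:
    """True if every signed 16-bit little-endian sample is within +/-threshold of zero."""
    samples = array("h")
    samples.frombytes(chunk[: len(chunk) - len(chunk) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is s16le; array uses native byte order
    return all(-threshold <= s <= threshold for s in samples)


class SilenceStart:
    """Signal once when `needed` consecutive silent chunks have been seen."""

    def __init__(self, needed: int = 12, threshold: int = 64) -> None:
        self.needed = needed
        self.threshold = threshold
        self.run = 0

    def feed(self, chunk: bytes) -> bool:
        self.run = self.run + 1 if is_silent(chunk, self.threshold) else 0
        return self.run == self.needed  # True only on the chunk that completes the run
```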

The callback output can be selected with the --callback argument, instead of the default blocking-write output.

issues

The data sometimes arrive in bursts, and are also sometimes consumed in bursts. This requires aggressive management of the local queue length, even at the cost of some data loss and artefacts; at the usual settings every packet/chunk holds over 8 milliseconds of audio.

The write-to-local-queue call, which replaces the write-to-buffer call, increases the queue length.

As the data are consumed at roughly the rate they come in, a queue increase tends to persist for a long time. If an over-the-limit length is detected, every other chunk is discarded instead of being output.

Sometimes packets arrive out of order. Such packets in an unsorted queue would cause drop-outs; for now, they are detected and discarded.

On Termux, the blocking call doesn't work at all; the callback was written to address exactly this.

Callback mode can suffer from underruns.

callback mode issue

The playback can be a little choppy/uneven. This happens on both Android and the Raspberry Pi (less so on the latter).

On the Raspberry Pi, with the headphone jack (not HDMI), callback playback is uselessly choppy on the default output device; --device-index 0 helps greatly.

On Windows 10, the callback mode behaves surprisingly well. This may be due to good Wi-Fi on the test machine.

FILES

original version: https://github.com/blaa/WaveSync

libwavesync-shad.tar.gz - modified source code, without the "wavesync" command (a short Python script from the default installation that calls the library anyway)
wavesync-shad.patch - patch file with the differences against the original WaveSync v2.1.0
libwavesync/ - the individual files, in a directory

for review:

for case:

wavesync - the caller command

TODO

PTP, a more accurate option; should be simple to add to the time_machine code, alongside the already running NTP
callback mode queue - find out why it is so uneven/choppy/lossy; get more familiar with the callback calling scheme
packet reordering - simplest form: check the last item in the deque when the next one is to be added; if the timestamps are not in order, swap the two
more testing, on both wired and wireless platforms
on the tx side, find out how to select or filter input streams for clients, e.g. left channel to one client, right to another, stereo to yet another, combined mono low-pass filtered for a subwoofer...
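The simplest form of the packet reordering item can be sketched like this; a hypothetical helper over a deque of (timestamp, data) tuples. It fixes only a single adjacent inversion; a fully sorted queue would need something like bisect.insort instead:

```python
from collections import deque
from typing import Deque, Tuple

Chunk = Tuple[float, bytes]  # (play-at timestamp, PCM data)


def append_with_swap(q: Deque[Chunk], chunk: Chunk) -> None:
    """Append a chunk, swapping it with the last queued one if their timestamps are out of order."""
    if q and q[-1][0] > chunk[0]:
        last = q.pop()
        q.append(chunk)
        q.append(last)
    else:
        q.append(chunk)
```

Since out-of-order packets are usually off by just one position, this cheap check would catch most cases without sorting the whole queue.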
