Sometimes it is necessary to perform audio playback from multiple computers. Bringing sound to several rooms is a common application.
For this purpose, there is a Python 3 script called Wavesync.
It is available on GitHub (https://github.com/blaa/WaveSync) or, in most distributions, via pip.
The script takes in audio via a unix socket from the PulseAudio server and chops it into packets, with size usually bound by the local network's MTU. To each packet it attaches a header with the time when the samples are supposed to be played on the client, then sends it to each client via unicast and to everyone who wants it via multicast.
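The packetizing step described above can be sketched roughly as follows. This is only an illustration, not taken from Wavesync's source: the 8-byte header layout and the helpers `packetize` and `unpack` are assumptions made for the example, and the 1468-byte payload is the figure quoted later in these notes.

```python
import struct

# Assumed values for illustration; the header layout is hypothetical.
PAYLOAD_SIZE = 1468           # audio bytes per packet
BYTES_PER_FRAME = 2 * 2       # 2 channels, 16-bit samples
SAMPLE_RATE = 44100           # Hz
HEADER = struct.Struct("!d")  # play time as a network-order double


def packetize(pcm: bytes, start_time: float, latency: float):
    """Split raw PCM into packets, each stamped with its intended play time."""
    packets = []
    for offset in range(0, len(pcm), PAYLOAD_SIZE):
        chunk = pcm[offset:offset + PAYLOAD_SIZE]
        # Play time = stream start + configured latency + position in the stream.
        play_at = start_time + latency + (offset / BYTES_PER_FRAME) / SAMPLE_RATE
        packets.append(HEADER.pack(play_at) + chunk)
    return packets


def unpack(packet: bytes):
    """Return (play_at, audio_bytes) for one packet."""
    (play_at,) = HEADER.unpack_from(packet)
    return play_at, packet[HEADER.size:]
```

Each receiver then only has to compare `play_at` against its own (synchronized) clock to decide when the chunk goes to the sound card.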
It requires tight clock synchronization between master and slave(s). This is usually done by NTP, but that needs a third-party daemon running as root, which is not always possible.
It uses PortAudio for audio output, via the PyAudio library.
Usually sufficient: pip3 install wavesync
Depends on pyaudio, which is a wrapper for the PortAudio library. If it complains about a missing library (.so or .dll), PortAudio is the one that is missing.
assuming master and NTP server are at 10.0.0.100
in pulseaudio server, in /etc/pulse/default.pa, set the unix socket sink:
load-module module-null-sink sink_name=wsync
load-module module-simple-protocol-unix rate=44100 format=s16le channels=2 record=true source=wsync.monitor socket=/tmp/music.source
restart pulseaudio
pulseaudio -k
pulseaudio -D
start the master
wavesync --tx /tmp/music.source --channel laptop:1111 --channel disp:1111 --channel phone:1111 --latency 1000
ordinary
wavesync --rx --channel 0.0.0.0:1111 --tolerance 200 --ntp=10.0.0.100
callback (test)
wavesync --rx --channel 0.0.0.0:1111 --tolerance 200 --ntp=10.0.0.100 --callback --device-index 0
REQUIRES callback, has weirdly large NTP jitter, so NTP tolerance --ntptol is higher; sink-latency tuned for simultaneous play
wavesync --rx --channel 0.0.0.0:1111 --buffer-size 2000 --tolerance=200 --sink-latency=90 --ntp=10.0.0.100 --ntptol 15 --callback
python C:\Users\user\AppData\Local\Programs\Python\Python38\Scripts\wavesync --rx --channel 0.0.0.0:1111 --tolerance=20 --buffer-size 2000 --ntp=10.0.0.100
All additional options are primarily for --rx mode. The NTP server should preferably run on the --tx machine, so that the master is intrinsically synchronized.
Intrinsic delays (at 44.1 kHz, 2 channels, 16 bits):
Wavesync is a Python program, based on libwavesync, which usually lives in the /usr/lib/python<version>/site-packages/libwavesync/ directory.
It consists of several files (modified files in bold):
The software was never intended to be run on Android phones. It can be done via the Python interpreter in Termux. It has, however, several issues.
The packet timestamps are expressed as absolute time taken from the local clock. The timekeeping is done in the time_machine.py module, by calling datetime.utcnow().timestamp() (which returns a float).
The local clock syncing has to be done externally, usually by NTP or PTP.
This introduces a dependency on write access to the local clock, which e.g. on unrooted Android is not always feasible.
On Android, there is no direct write access to the system clock. Without root, an NTP query cannot change the clock; it can only report the offset. Fortunately, that is all that's needed: a known offset can be added to the local clock result in the time_machine.now() call.
As a workaround, the process runs its own NTP client in its own thread. An NTP query in userspace can be done via e.g. the Python ntplib library. Ideally this is done against a server on the local LAN.
Due to the appalling drift of clocks and jitter of queries, fairly frequent queries have to be done; once per 10 seconds was chosen to make the drifts easy to observe, and to resynchronize rapidly in case another clock-syncing mechanism acts and changes the clock by too much in too short a time.
The jitter of the queries was addressed by keeping a local store of the last ten query results and returning their average.
The step change caused by ntpd kicking in desynchronizes the client. This is detected by the next ntplib NTP query; an above-limit difference against the average is taken as such a step change and the buffer is forgotten.
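The offset handling described above (adding a known offset to the local clock, averaging the last ten queries, detecting a step) might look roughly like the sketch below. It is not the actual time_machine.py code: the class name `OffsetTracker`, the 15 ms `step_limit` default and the use of time.time() in place of datetime.utcnow().timestamp() are assumptions for the example, and "the buffer is forgotten" is interpreted here as clearing the stored offsets.

```python
import time
from collections import deque


class OffsetTracker:
    """Keeps the last few NTP offsets and exposes a corrected clock."""

    def __init__(self, history=10, step_limit=0.015):
        self.offsets = deque(maxlen=history)  # last N measured offsets, seconds
        self.step_limit = step_limit          # assumed threshold for a clock step

    def average(self):
        return sum(self.offsets) / len(self.offsets) if self.offsets else 0.0

    def add(self, offset):
        """Record a new offset; return True if a clock step was detected."""
        stepped = bool(self.offsets) and abs(offset - self.average()) > self.step_limit
        if stepped:
            # Old measurements describe the clock before the jump: forget them.
            self.offsets.clear()
        self.offsets.append(offset)
        return stepped

    def now(self):
        """Local clock plus averaged offset; the system clock is never written."""
        return time.time() + self.average()
```

A caller would feed every new `resp.offset` from ntplib into `add()` and, when it returns True, also flush whatever audio is queued, since its timing was computed against the old clock.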
If the NTP server is specified on the command line, the queries are run in a separate thread, using a JavaScript-like recursive (self-rescheduling) threading.Timer call.
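A self-rescheduling threading.Timer of this kind can be sketched as below. The class name `PeriodicQuery` and its methods are made up for the illustration and are not taken from the Wavesync code.

```python
import threading


class PeriodicQuery:
    """Runs `query` every `interval` seconds by re-arming a threading.Timer."""

    def __init__(self, interval, query):
        self.interval = interval
        self.query = query
        self._timer = None
        self._stopped = threading.Event()

    def _tick(self):
        if self._stopped.is_set():
            return
        try:
            self.query()
        finally:
            # Re-arm from inside the callback, like a JavaScript setTimeout
            # chain; a failed query must not end the chain.
            self._schedule()

    def _schedule(self):
        if self._stopped.is_set():
            return
        self._timer = threading.Timer(self.interval, self._tick)
        self._timer.daemon = True  # do not keep the process alive on exit
        self._timer.start()

    def start(self):
        self._schedule()

    def stop(self):
        self._stopped.set()
        if self._timer is not None:
            self._timer.cancel()
```

With an interval of 10 and a `query` function that performs the ntplib request, this gives the once-per-10-seconds polling without blocking the playback thread.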
https://pypi.org/project/ntplib/ - userspace library for NTP queries
import ntplib
client = ntplib.NTPClient()
# one query; resp.offset is the offset between server and local clock, in seconds
resp = client.request('0.pool.ntp.org', version=3)
print(resp.offset)
The NTP queries work on other platforms as well. Tested successfully on Windows 10 and on a Raspberry Pi with Raspbian, both over Wi-Fi.
To facilitate the precision-synced playback, a constant output delay has to be maintained. This is achieved by management of the size of the output queue/buffer.
The original software uses blocking writes, with pulseaudio as a backend.
The playback loop that sends data to the output device relies on stream.get_write_available(). This call returns the space in the device buffer that can be written to immediately, without blocking. This allows controlled delaying of the output and dropping of chunks that arrive faster than the device can play them. However, in the Termux variant of the PortAudio library this call always returns, drumroll please, zero. So the loop was stuck, dropping all the packets and complaining, quote, "Hey, the output is STUCK!".
The first attempt was just removing the condition. Voila, the playback started. Aaaaaand, the now non-dropped packets caused the stream to lag behind the rest of the devices. Ooooops. Back to the drawing board.
The next attempt was rewriting the code from blocking writes to callbacks, with its own queue whose depth can be monitored easily.
The first sub-attempt failed. The callback happened once, then it died. Adding a conditional stream.start_stream() did not work. Adding stream.stop_stream() before the stream.start_stream() made it work, but AWFULLY choppy.
After the first callback, both stream.is_active() and stream.is_stopped() gave False. Hint hint...
A net search turned up nothing. The documentation was silent about this symptom. It turned out that the length of the data block returned from the callback has to match the size specified in the frames_per_buffer parameter. OOPS. The constant was changed from self.buffer_size (8192), used for the blocking-writes buffer, to 367 (the number of frames per packet: 2 channels, 2 bytes per sample, 1468 data bytes per packet).
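The 367 figure, and the "about 8 milliseconds per packet" that follows from it, can be checked with a few lines of arithmetic. The constant names are chosen for this example only.

```python
CHANNELS = 2
BYTES_PER_SAMPLE = 2     # 16-bit samples
SAMPLE_RATE = 44100      # Hz
PAYLOAD_BYTES = 1468     # audio data bytes per packet

bytes_per_frame = CHANNELS * BYTES_PER_SAMPLE          # 4 bytes per frame
frames_per_packet = PAYLOAD_BYTES // bytes_per_frame   # 367 frames
packet_ms = 1000 * frames_per_packet / SAMPLE_RATE     # about 8.32 ms

print(frames_per_packet, round(packet_ms, 2))  # 367 8.32
```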
A FIFO queue had to be added; the raw sending was jittery and the callback was sometimes picking up the same block twice. As each packet is about 8 milliseconds of sound, even a few levels of queue quickly add up to a noticeable delay. The Python "queue" library was chosen for this task; the level was set to 3 packets in the queue with aggressive probabilistic dropping beyond that, and the play callback throws out every other chunk if the queue length exceeds a limit. This got the playback to a manageable performance.
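The queue handling described above could be sketched as follows. This is a simplified stand-in, not the modified Wavesync code: the names `enqueue` and `play_callback`, the 0.5 drop probability and the injectable `rng` argument are assumptions for the example. The callback signature and the (data, flag) return value follow PyAudio's callback convention, with the flag written as a plain constant so the sketch runs without PyAudio installed.

```python
import queue
import random

BYTES_PER_FRAME = 4        # 2 channels x 16-bit samples
PA_CONTINUE = 0            # same value as pyaudio.paContinue
QUEUE_LIMIT = 3            # packets tolerated in the queue
DROP_PROBABILITY = 0.5     # assumed value, for illustration only

chunks = queue.Queue()


def enqueue(chunk, rng=random.random):
    """Network side: once the queue is over the limit, drop new chunks at random."""
    if chunks.qsize() > QUEUE_LIMIT and rng() < DROP_PROBABILITY:
        return False
    chunks.put_nowait(chunk)
    return True


def play_callback(in_data, frame_count, time_info, status):
    """PyAudio-style output callback: returns (data, flag)."""
    try:
        chunk = chunks.get_nowait()
        if chunks.qsize() > QUEUE_LIMIT:
            # Still too far behind: skip this chunk and play the next one,
            # so every other chunk is thrown out until the queue shrinks.
            chunk = chunks.get_nowait()
    except queue.Empty:
        chunk = b""  # underrun: play silence
    # The block must be exactly frame_count frames long, so pad or trim.
    wanted = frame_count * BYTES_PER_FRAME
    return chunk[:wanted].ljust(wanted, b"\x00"), PA_CONTINUE
```

Padding to exactly `frame_count` frames is what keeps the stream alive; returning a shorter block is the failure mode described in the previous paragraph.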
Frequent short drop-outs persist. Uneven packet delivery and some out-of-order packets are suspected. TODO: queue sorting by timestamp.
Over longer-term playback, a slight delay tends to accumulate. A restart helps. The suspected cause is something within PortAudio or Termux's PulseAudio. A possible workaround involves taking code from play-audio, or (better) a minimal streamer with callbacks.
Disconnecting and reconnecting the stream at the start of a detected silence period seems to help.
The callback output option can be selected by argument --callback, instead of the default blocking-write output.
The data sometimes come in bursts. The data are also sometimes consumed in bursts. This requires aggressively managing the local queue length, even at the cost of some data loss and artefacts; at the usual settings every packet/chunk takes over 8 milliseconds.
The write-to-local-queue call that replaces the write-to-buffer call increases the queue length.
As the data are consumed at roughly the rate they come in, any increase in queue length tends to stick for a long time. If an over-the-limit length is detected, every other chunk gets discarded instead of output.
Sometimes packets arrive out of order. Such packets in an unsorted queue would cause drop-outs. As of now they are detected and discarded.
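Detecting and discarding late packets can be as simple as remembering the newest play time seen so far. The sketch below assumes each packet carries a numeric play time; the class `OrderFilter` is hypothetical and only illustrates the idea.

```python
class OrderFilter:
    """Passes packets whose play time moves forward; rejects late stragglers."""

    def __init__(self):
        self.last_play_at = float("-inf")

    def accept(self, play_at):
        if play_at <= self.last_play_at:
            return False  # arrived after a newer packet (or a duplicate): discard
        self.last_play_at = play_at
        return True

    def reset(self):
        """Forget the history, e.g. after a stream restart or a clock step."""
        self.last_play_at = float("-inf")
```

Sorting the queue by timestamp, as noted in the TODO, would let such packets be played instead of dropped, at the price of a slightly deeper queue.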
On Termux, the blocking call doesn't work at all. The callback was written to address exactly this.
Callback mode can suffer from underruns.
The playback can be a little choppy/uneven. This happens on both Android and raspi (less on the latter).
On raspi, on the jack connector (no HDMI), the callback playback is unusably choppy with the default output. --device-index 0 helps greatly.
On Windows 10, the callback mode behaves surprisingly well. This may be thanks to good Wi-Fi on the test machine.
original version: https://github.com/blaa/WaveSync
for review:
for case: