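StreamWriter Advanced Usage

This tutorial shows how to use torchaudio.io.StreamWriter to play audio and video.

This tutorial uses hardware devices, so it is not portable across operating systems.

It was written and tested on a MacBook Pro (M1, 2020).

This tutorial requires FFmpeg libraries. See FFmpeg dependency for details.

TorchAudio dynamically loads compatible FFmpeg libraries installed on the system. The supported formats (media formats, encoders, encoder options, etc.) depend on those libraries.

To check the available muxers, encoders, devices, and protocols, you can use the following commands: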
ffmpeg -muxers
ffmpeg -encoders
ffmpeg -devices
ffmpeg -protocols

Preparation

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

from torchaudio.io import StreamWriter

from torchaudio.utils import download_asset

AUDIO_PATH = download_asset("tutorial-assets/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav")
VIDEO_PATH = download_asset(
    "tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4"
)

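Device Availability

StreamWriter takes advantage of FFmpeg's I/O abstraction layer to write data to media devices such as speakers and GUI windows.

To write to a device, pass the format option to the StreamWriter constructor.

Different operating systems offer different devices, and their availability depends on how FFmpeg was built and installed.

To check which devices are available, use the ffmpeg -devices command.

In the output below, "audiotoolbox" (speaker) and "sdl" (video GUI) are available.

$ ffmpeg -devices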
...
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
  E audiotoolbox    AudioToolbox output device
 D  avfoundation    AVFoundation input device
 D  lavfi           Libavfilter virtual input device
  E opengl          OpenGL output
  E sdl,sdl2        SDL2 output device

For details on which devices are available on which operating systems, see the official FFmpeg documentation: https://ffmpeg.org/ffmpeg-devices.html
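
The check can also be done from Python. The following is a minimal sketch that parses a device listing and returns the devices that support muxing (output). It assumes the listing follows the column layout shown above; the helper names parse_ffmpeg_devices, output_devices, and query_ffmpeg_devices are hypothetical and are not part of TorchAudio.

```python
import re
import subprocess

# One table row: a leading space, the "D" (demux) flag, the "E" (mux) flag,
# then the device name(s) and a description. Assumes the layout shown above.
_DEVICE_LINE = re.compile(r"^ ([D ])([E ])\s+(\S+)\s+(.+)$")


def parse_ffmpeg_devices(listing):
    """Map each device name to its demux/mux support flags."""
    devices = {}
    for line in listing.splitlines():
        match = _DEVICE_LINE.match(line)
        if match is None:
            continue  # header, legend, or separator line
        demux_flag, mux_flag, names, description = match.groups()
        if demux_flag == " " and mux_flag == " ":
            continue  # not a device row
        # Aliases such as "sdl,sdl2" share one row.
        for name in names.split(","):
            devices[name] = {
                "demux": demux_flag == "D",
                "mux": mux_flag == "E",
                "description": description.strip(),
            }
    return devices


def output_devices(listing):
    """Return the sorted names of devices that can be written to."""
    parsed = parse_ffmpeg_devices(listing)
    return sorted(name for name, info in parsed.items() if info["mux"])


def query_ffmpeg_devices():
    """Run the ffmpeg CLI (must be on PATH) and return its device listing."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-devices"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling output_devices(query_ffmpeg_devices()) then gives candidate values for the format option. Note that the ffmpeg command-line tool and the libraries TorchAudio loads may come from different installations, so treat the result as a hint only.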

Playing Audio

With the format="audiotoolbox" option, StreamWriter writes data to the speaker device.

# Prepare sample audio
waveform, sample_rate = torchaudio.load(AUDIO_PATH, channels_first=False, normalize=False)
num_frames, num_channels = waveform.shape

# Configure StreamWriter to write to speaker device
s = StreamWriter(dst="-", format="audiotoolbox")
s.add_audio_stream(sample_rate, num_channels, format="s16")

# Write audio to the device
with s.open():
    for i in range(0, num_frames, 256):
        s.write_audio_chunk(0, waveform[i : i + 256])

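Writing to "audiotoolbox" is a blocking operation, but it does not wait for the audio playback to finish. The device must stay open while the audio is playing.

The following code closes the device as soon as the audio is written, before playback has completed. Adding time.sleep() helps keep the device open until playback is completed.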
with s.open():
    s.write_audio_chunk(0, waveform)
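
One way to apply the time.sleep() suggestion is sketched below. It estimates the clip duration from the number of frames and the sample rate, then sleeps for whatever part of that duration has not already elapsed during the write. The names playback_seconds and write_and_wait and the margin parameter are hypothetical, and the sketch assumes the waveform is laid out as (frames, channels) and that playback starts roughly when the write starts.

```python
import time


def playback_seconds(num_frames, sample_rate):
    """Duration of the clip in seconds."""
    return num_frames / sample_rate


def write_and_wait(writer, waveform, sample_rate, margin=0.5, sleep=time.sleep):
    """Write the whole waveform, then keep the device open until playback ends.

    `writer` is a configured StreamWriter (or any object with the same
    open() / write_audio_chunk() interface). `margin` is extra time in
    seconds, an arbitrary safety buffer chosen for this sketch.
    """
    with writer.open():
        start = time.monotonic()
        writer.write_audio_chunk(0, waveform)
        elapsed = time.monotonic() - start
        # len(waveform) is the number of frames for a (frames, channels) layout.
        remaining = playback_seconds(len(waveform), sample_rate) - elapsed
        # Sleep inside the `with` block so the device is still open.
        sleep(max(0.0, remaining) + margin)
```

With the objects defined earlier, the call would be write_and_wait(s, waveform, sample_rate). The sleep parameter exists only so that the waiting behavior can be replaced in tests.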

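Playing Video

To play video, you can use format="sdl" or format="opengl". Again, you need a build of FFmpeg with the corresponding integration enabled. The available devices can be checked with ffmpeg -devices.

Here, we use the SDL device (https://ffmpeg.org/ffmpeg-devices.html#sdl).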

# note:
#  SDL device does not support specifying frame rate, and it has to
#  match the refresh rate of display.
frame_rate = 120
width, height = 640, 360

For this purpose, we define a helper function that delegates video loading to a background thread and yields the video in chunks.

running = True


def video_streamer(path, frames_per_chunk):
    import queue
    import threading

    from torchaudio.io import StreamReader

    q = queue.Queue()

    # Streaming process that runs in background thread
    def _streamer():
        streamer = StreamReader(path)
        streamer.add_basic_video_stream(
            frames_per_chunk, format="rgb24", frame_rate=frame_rate, width=width, height=height
        )
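        for (chunk_,) in streamer.stream():
            q.put(chunk_)
            if not running:
                break
        # Signal the end of the stream so that the consumer does not block forever
        q.put(None)

    # Start the background thread and fetch chunks
    t = threading.Thread(target=_streamer)
    t.start()
    while running:
        # q.get() blocks until a chunk (or the end-of-stream marker) arrives
        chunk = q.get()
        if chunk is None:
            break
        yield chunk
    t.join()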

Now we start streaming the video. Pressing "Q" stops the video.

The write_video_chunk call blocks until SDL finishes playing the video.

# Set output device to SDL
s = StreamWriter("-", format="sdl")

# Configure video stream (RGB24)
s.add_video_stream(frame_rate, width, height, format="rgb24", encoder_format="rgb24")

# Play the video
with s.open():
    for chunk in video_streamer(VIDEO_PATH, frames_per_chunk=256):
        try:
            s.write_video_chunk(0, chunk)
        except RuntimeError:
            running = False
            break


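Streaming Video

So far, we have looked at how to write to hardware devices. There are also some alternative methods for streaming video.

RTMP (Real-Time Messaging Protocol)

Using RTMP, you can stream media (video and/or audio) to a single client. This does not require a hardware device, but it does require a separate player.

To use RTMP, specify the protocol and route in the dst argument of the StreamWriter constructor, then pass the {"listen": "1"} option when opening the destination.

StreamWriter listens on the port and waits for a client to request the video. The call to open blocks until a request is received.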

s = StreamWriter(dst="rtmp://localhost:1935/live/app", format="flv")
s.add_audio_stream(sample_rate=sample_rate, num_channels=num_channels, encoder="aac")
s.add_video_stream(frame_rate=frame_rate, width=width, height=height)

with s.open(option={"listen": "1"}):
    for video_chunk, audio_chunk in generator():
        s.write_audio_chunk(0, audio_chunk)
        s.write_video_chunk(1, video_chunk)
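
The generator() call above is not defined in this tutorial. As an illustration only, the following sketch shows one way such a generator could pair video and audio chunks that cover the same time span. The name paired_chunks and its parameters are hypothetical; it assumes video is indexed by frame and audio by sample along the first dimension, which holds for tensors shaped (frames, ...) as well as for plain lists.

```python
def paired_chunks(video, audio, frame_rate, sample_rate, frames_per_chunk=30):
    """Yield (video_chunk, audio_chunk) pairs covering the same time span.

    `video` and `audio` only need to support len() and slicing along the
    first dimension, so tensors and lists both work.
    """
    num_video_frames = len(video)
    for v_start in range(0, num_video_frames, frames_per_chunk):
        v_end = min(v_start + frames_per_chunk, num_video_frames)
        # Convert the video frame range into the matching audio sample range.
        a_start = round(v_start * sample_rate / frame_rate)
        a_end = round(v_end * sample_rate / frame_rate)
        yield video[v_start:v_end], audio[a_start:a_end]
```

In the loop above, this could be used as `for video_chunk, audio_chunk in paired_chunks(video, waveform, frame_rate, sample_rate):`. If the audio is shorter than the video, the trailing audio chunks are empty, so a caller may want to skip writing them.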


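UDP (User Datagram Protocol)

Using UDP, you can stream media (video and/or audio) to a socket. This does not require a hardware device, but it does require a separate player.

Unlike RTMP, the streaming process and the client process are decoupled. The streaming process is not aware of the client process.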

s = StreamWriter(dst="udp://localhost:48550", format="mpegts")
s.add_audio_stream(sample_rate=sample_rate, num_channels=num_channels, encoder="aac")
s.add_video_stream(frame_rate=frame_rate, width=width, height=height)

with s.open():
    for video_chunk, audio_chunk in generator():
        s.write_audio_chunk(0, audio_chunk)
        s.write_video_chunk(1, video_chunk)
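
Both streaming methods need a separate player on the receiving side. The sketch below launches FFmpeg's ffplay tool as that player from Python. It assumes ffplay is installed and on PATH; the helper names player_command and launch_player are hypothetical.

```python
import shutil
import subprocess


def player_command(url, player="ffplay"):
    """Build the command line that opens `url` in the given player."""
    return [player, url]


def launch_player(url, player="ffplay"):
    """Start the player as a separate process and return its handle."""
    if shutil.which(player) is None:
        raise FileNotFoundError(f"{player} was not found on PATH")
    return subprocess.Popen(player_command(url, player))
```

For the UDP example, launch_player("udp://localhost:48550") can be started independently of the streaming process. For the RTMP example, the client connects to rtmp://localhost:1935/live/app once StreamWriter is listening, because the call to open blocks until that request arrives.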
