---
title: "My Trials with WebRTC and Gstreamer"
draft: false
---
# But why?
The most important part of any software is its purpose, or at least a goal. If you don't have a goal, just write a
```rust
fn main() {
    // Spin forever: the computer is now officially "doing something".
    loop {}
}
```
and call it a day. You made the computer do something!
So what is the goal here? I have a Rust program that uses [Tauri](https://tauri.app) as the front end,
and a Raspberry Pi with a Hailo8 accelerator attached, and I need that Pi's camera input.
The Tauri controller needs to display the video feed from the Raspberry Pi so that the end user can
see what the ML model is seeing.
## But why WebRTC?
Well, I'm really just using a web browser for my frontend, so I need a web-ready video streaming technology
that actually does smart scaling and all the hard stuff I don't want to deal with.
And I thought it would be easier than rolling my own. ~~and I'm unsure about that now, but sunk cost fallacy + learning something
new is pretty compelling~~
## What is WebRTC?
A browser-standardized, browser-implemented data communication layer, used primarily for peer-to-peer (p2p) video and audio connections.
What does this mean? You pass some information to the browser with a JavaScript API, and your video element magically starts receiving
video and audio! It even compensates for network conditions, prioritizing real-time video over consistent video. Sounds great!
## An early warning
I thought it would be as easy as "there's my destination browser, try to start a connection", and I was wrong.
WebRTC is an incredibly flexible system. Here's a quote from the [Mozilla documentation](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/Session_lifetime#information_exchanged_during_signaling)
(just above the section this link takes you to):
> It's also worth noting that the channel for performing signaling doesn't even need to be over the network. One peer can output a data object that can be printed out, physically carried (on foot or by carrier pigeon) to another device, entered into that device, and a response then output by that device to be returned on foot, and so forth, until the WebRTC peer connection is open. It'd be very high latency but it could be done.

And when you combine this with Gstreamer, well, too many hours were lost in the making of this blog post.
# Signaling
I'm sure most readers know that the internet is a very large, untamed landscape of legacy systems and petabytes of information transfer.
I'm sure most readers are also familiar with certain issues caused by trying to get computer A to talk to computer B.
WebRTC basically said "that's a can of worms we aren't going to try to standardize", and gave us all the interfaces to implement that
part ourselves.
What this means is that you need a 'signaling' server to connect two WebRTC endpoints. This server handles most of the setup communication
for the WebRTC clients until they have established a connection. Because this is p2p, no computer is 'in charge' of the WebRTC connection
itself.
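
A minimal, std-only Rust sketch can make the server's job concrete. It is a toy, not any real signaling server. The `Relay` type, the peer IDs, and the `SignalMessage` shape are all hypothetical, and a real server would sit behind a network transport such as a websocket. The point it illustrates is that the relay never interprets the payload. It only forwards opaque blobs, such as SDP offers and answers or ICE candidates, to the right peer.

```rust
use std::collections::{HashMap, VecDeque};

/// A hypothetical signaling message: the relay only needs to know who it is for.
/// The payload (an SDP offer/answer or ICE candidate) is opaque to the server.
#[derive(Debug, Clone, PartialEq)]
struct SignalMessage {
    from: String,
    to: String,
    payload: String,
}

/// A toy relay with one mailbox per registered peer.
#[derive(Default)]
struct Relay {
    mailboxes: HashMap<String, VecDeque<SignalMessage>>,
}

impl Relay {
    /// Register a peer so it can receive messages.
    fn register(&mut self, peer: &str) {
        self.mailboxes.entry(peer.to_string()).or_default();
    }

    /// Queue a message for its destination, or hand it back if the peer is unknown.
    fn send(&mut self, msg: SignalMessage) -> Result<(), SignalMessage> {
        match self.mailboxes.get_mut(&msg.to) {
            Some(queue) => {
                queue.push_back(msg);
                Ok(())
            }
            None => Err(msg),
        }
    }

    /// Pop the oldest pending message for `peer`, if any.
    fn receive(&mut self, peer: &str) -> Option<SignalMessage> {
        self.mailboxes.get_mut(peer)?.pop_front()
    }
}

fn main() {
    let mut relay = Relay::default();
    relay.register("pi");
    relay.register("tauri");

    // Client A (the Pi) hands its offer to the server, addressed to client B.
    relay
        .send(SignalMessage {
            from: "pi".into(),
            to: "tauri".into(),
            payload: "<sdp offer>".into(),
        })
        .expect("tauri is registered");

    // Client B picks it up; the relay never looked inside the payload.
    let msg = relay.receive("tauri").expect("an offer is waiting");
    println!("{} -> {}: {}", msg.from, msg.to, msg.payload);
}
```

Returning the message in the `Err` case lets the caller decide whether to retry, drop it, or report that the other peer isn't connected yet.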
What I'm about to detail is just a summary of the [Mozilla documentation](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/Connectivity), which is
definitely worth a read if you want to do this yourself.
Long story short, one client (call it client A) gets prepped for starting a connection by:
1. enumerating the data streams it wants to transmit
2. creating a description of itself
3. handing the message to the server and saying "send this to the other person please"

"But wait!" you might say. "This is *what* the internet does!" And you would be correct. Except, in this case, setting up this communication-enabling server is an
exercise left to the reader.
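
Client A's three steps can be sketched in std-only Rust. The `Stream` struct, the codec values, and the `build_offer` helper are hypothetical. The SDP this produces is also heavily simplified. A real WebRTC offer also carries ICE credentials (`a=ice-ufrag`, `a=ice-pwd`), a DTLS fingerprint, and more, and the WebRTC stack normally generates it for you rather than you writing it by hand.

```rust
/// Step 1: one stream client A wants to transmit (a hypothetical, minimal shape).
struct Stream {
    kind: &'static str,  // "video" or "audio"
    payload_type: u8,    // dynamic RTP payload type number
    codec: &'static str, // encoding name and clock rate, e.g. "VP8/90000"
}

/// Step 2: describe ourselves with a heavily simplified SDP offer.
fn build_offer(streams: &[Stream]) -> String {
    // Session-level lines: version, origin, session name, connection, timing.
    let mut sdp =
        String::from("v=0\r\no=- 0 0 IN IP4 0.0.0.0\r\ns=-\r\nc=IN IP4 0.0.0.0\r\nt=0 0\r\n");
    for (mid, s) in streams.iter().enumerate() {
        // One media section per stream, advertising a single codec we will only send.
        sdp.push_str(&format!("m={} 9 UDP/TLS/RTP/SAVPF {}\r\n", s.kind, s.payload_type));
        sdp.push_str(&format!("a=mid:{mid}\r\n"));
        sdp.push_str(&format!("a=rtpmap:{} {}\r\n", s.payload_type, s.codec));
        sdp.push_str("a=sendonly\r\n");
    }
    sdp
}

fn main() {
    let streams = [Stream { kind: "video", payload_type: 96, codec: "VP8/90000" }];
    let offer = build_offer(&streams);
    // Step 3: this text is what client A hands to the signaling server for client B.
    print!("{offer}");
}
```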
The server then sends the message to the other client, client B, who takes it, reads it, and:
1. enumerates its own data streams
2. creates a description of itself
3. sends a "yes, I would like to start a WebRTC connection" back to the server

And this continues as the two clients nail down specifics like "What's your IP address?" and "What media formats can you provide? I'll let you know which of those I want",
and the networking classic: "Well, shoot. You're behind a NAT. Let's figure this out."
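
Two small std-only helpers can sketch that back-and-forth. Both are simplified assumptions rather than part of any real WebRTC stack.

- `choose_codec` plays client B answering "which of those I want". It picks the first offered codec that B also supports.
- `candidate_type` reads the `typ` field of an ICE candidate line. In ICE, `host` is a local address, `srflx` is the public address a STUN server observed (the "you're behind a NAT" case), and `relay` means traffic goes through a TURN server.

```rust
/// Client B's side of format negotiation: walk the offered codecs in the
/// offerer's preference order and pick the first one we also support.
/// Codec names are compared case-insensitively.
fn choose_codec<'a>(offered: &[&'a str], supported: &[&str]) -> Option<&'a str> {
    offered
        .iter()
        .copied()
        .find(|codec| supported.iter().any(|s| s.eq_ignore_ascii_case(codec)))
}

/// Extract the candidate type ("host", "srflx", "prflx", "relay") from an ICE
/// candidate line such as
/// "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host".
fn candidate_type(line: &str) -> Option<&str> {
    let mut fields = line.split_whitespace();
    // Skip ahead to the literal "typ"; the type is the very next token.
    fields.find(|f| *f == "typ")?;
    fields.next()
}

fn main() {
    let offered = ["H264", "VP8"];
    let supported = ["vp8", "opus"];
    println!("answer with: {:?}", choose_codec(&offered, &supported));

    let cand = "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.1.10 rport 54321";
    println!("candidate type: {:?}", candidate_type(cand));
}
```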
So as you might guess, this is where most of my time is going to get spent!
# Gstreamer's Gst-WebRTCSink
For those unfamiliar with Gstreamer, all you really need to understand is that it's basically a framework for connecting Gstreamer elements.
You take a bunch of Gstreamer elements and tell Gstreamer to connect them together into a "data pipeline" that happens to carry audio and video data.
(This ignores the pipeline management, clock timing, and event buses it also handles.)
There are components for capturing audio and video from webcams. There are components for changing the framerate and resolution. You can apply audio and visual effects in real time!
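
Here is a toy model of that idea in std-only Rust. It is emphatically not the gstreamer-rs API. Each hypothetical `Element` either passes a (possibly modified) frame downstream or drops it, and a `Pipeline` chains the elements in order. Real Gstreamer elements negotiate formats through pads and caps, run on streaming threads, and share a pipeline clock, and none of that is modeled here.

```rust
/// A stand-in for a raw video frame; only the metadata matters here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Frame {
    index: u64,
    width: u32,
    height: u32,
}

/// A toy "element": transform a frame, or return None to drop it.
trait Element {
    fn process(&mut self, frame: Frame) -> Option<Frame>;
}

/// Keeps every `keep_every`-th frame (must be non-zero), loosely like lowering the framerate.
struct DropFrames {
    keep_every: u64,
}

impl Element for DropFrames {
    fn process(&mut self, frame: Frame) -> Option<Frame> {
        (frame.index % self.keep_every == 0).then_some(frame)
    }
}

/// Halves the resolution, loosely like a scaler.
struct HalfScale;

impl Element for HalfScale {
    fn process(&mut self, frame: Frame) -> Option<Frame> {
        Some(Frame { width: frame.width / 2, height: frame.height / 2, ..frame })
    }
}

/// Elements linked in order: the output of one is the input of the next.
struct Pipeline {
    elements: Vec<Box<dyn Element>>,
}

impl Pipeline {
    /// Push one frame through every element, stopping early if one drops it.
    fn push(&mut self, frame: Frame) -> Option<Frame> {
        self.elements
            .iter_mut()
            .try_fold(frame, |f, element| element.process(f))
    }
}

fn main() {
    let mut pipeline = Pipeline {
        elements: vec![Box::new(DropFrames { keep_every: 2 }), Box::new(HalfScale)],
    };
    // A fake "camera source" producing six 640x480 frames.
    let out: Vec<Frame> = (0..6)
        .filter_map(|i| pipeline.push(Frame { index: i, width: 640, height: 480 }))
        .collect();
    println!("{out:?}");
}
```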
But most importantly (for this post), there is a plugin for "plug and play" WebRTC serving. It's part of the repository of Rust Gstreamer elements over at [gst-plugins-rs](https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs),
specifically in the `net/webrtc` folder.
It comes with a simple usage example that requires `cargo`, `npm`, `npm install webpack`, and, obviously, the Gstreamer `gst-launch-1.0` utility,
plus three terminal windows. This was really my first "wait, I thought this would be easy" moment.
Some of you probably want to leave a comment on this post just so you can say "skill issue". That may be true! I've never
implemented an internet protocol, so most of this seems rather complicated when I just wanted a Raspberry Pi to stream video to a computer
screen. But it's also a great learning experience, and I will take it as such. Finding as many how-to guides as I can will get me mostly working.
I learn best with something that I can iterate on and learn the fundamentals of over time, instead of front-loading the spec into my brain.
The Gstreamer `webrtcsink` repo page has its own signaling server available, with examples and everything! Except! Well... You need to know
the IP of said signaling server when the Gstreamer pipeline starts up, which doesn't allow for any of the "you handle message passing" that WebRTC
was designed to provide... For this, you need to create a `Signaller` object that implements an interface as defined in [the example](https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/blob/main/net/webrtc/examples/webrtcsink-custom-signaller/signaller/imp.rs).
So I'm going to start there. Can't be too hard, right? ... right?
The idea is that by implementing it yourself, you can integrate it with existing message-passing software, like that websocket I know you are already using (don't worry,
I'm using one too).
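
A minimal std-only sketch shows that integration idea. Every name in it is hypothetical, and it is not the interface from gst-plugins-rs. The app already has one message connection, and signaling traffic just becomes another variant on it. Here an `mpsc` channel stands in for the websocket, and a router splits incoming messages between the app's own handling and the WebRTC side.

```rust
use std::sync::mpsc;

/// Hypothetical signaling payloads the WebRTC side cares about.
#[derive(Debug, Clone, PartialEq)]
enum Signal {
    /// An SDP offer or answer.
    SessionDescription(String),
    /// One ICE candidate line.
    IceCandidate(String),
}

/// Everything that travels over the one existing connection.
#[derive(Debug, Clone, PartialEq)]
enum WireMessage {
    /// The app's normal traffic, e.g. ML detections from the Pi.
    App(String),
    /// WebRTC signaling, piggybacking on the same connection.
    Signaling(Signal),
}

/// Split incoming wire messages: signaling goes to the WebRTC side,
/// everything else stays with the app. Order is preserved within each side.
fn route(incoming: Vec<WireMessage>) -> (Vec<String>, Vec<Signal>) {
    let mut app = Vec::new();
    let mut signaling = Vec::new();
    for msg in incoming {
        match msg {
            WireMessage::App(text) => app.push(text),
            WireMessage::Signaling(signal) => signaling.push(signal),
        }
    }
    (app, signaling)
}

fn main() {
    // The channel stands in for the websocket both ends already share.
    let (socket_tx, socket_rx) = mpsc::channel::<WireMessage>();

    socket_tx.send(WireMessage::App("detections: 2 people".into())).unwrap();
    socket_tx
        .send(WireMessage::Signaling(Signal::SessionDescription("<sdp offer>".into())))
        .unwrap();
    socket_tx
        .send(WireMessage::Signaling(Signal::IceCandidate("candidate:...".into())))
        .unwrap();
    drop(socket_tx); // close the "socket" so the receive iterator ends

    let (app, signaling) = route(socket_rx.iter().collect());
    println!("app: {app:?}");
    println!("signaling: {signaling:?}");
}
```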
There's just one hiccup. That example I linked? It uses relative paths to the rest of the workspace it lives in. Google, DuckDuckGo, GitHub, and Sourcegraph could all find
no implementation of this example in the wild (at least with most of the keywords I was using; I did eventually find [this GitHub repo](https://github.com/Eyevinn/srt-whep) that uses it).
I spent at least 3 hours only to realize I couldn't use a direct GitHub import in Cargo, and instead needed the `gst-plugin-webrtc` crate that Google couldn't find,
because its docs.rs page is broken and doesn't build...
#### Gstreamer crate tangent
As a quick aside, I would like to mention that the Gstreamer crate system is kinda neat, but mostly a pain to work with.
There is a module for just about everything that could be modularized. Just for my "take video input, output it over WebRTC", I need these four crates:
```toml
[dependencies]
gst-plugin-webrtc = "0.13.0"
gstreamer = { version = "0.23.0", features = ["v1_22"] }
gstreamer-sdp = { version = "0.23.0", features = ["v1_22"] }
gstreamer-webrtc = { version = "0.23.0", features = ["v1_22"] }
```
That's not my issue, though. My issue is that it took me 2-3 hours to discover most of these crates existed!
Because there is no centralized list of "these are the plugins, their crates, and their included features", when Google fails, it becomes
almost impossible to unearth them.
I'm not saying this as a critique of the Gstreamer ecosystem, but as someone with trauma.
# Now for the meaty part
So I've finally gotten a compiling version of the example signaller [(see here)](https://github.com/Nickiel12/gst-webrtc-example-signaller). Now to actually implement the signaling server, right?
Well... Uhm, so. Here's the thing. I just need to be able to point a `<video>` tag in Tauri at a video source, right? Wouldn't it be nice if there were a nice and easy standard for this, so I didn't
have to implement my own WebRTC signaling server? Enter WHIP (the WebRTC-HTTP Ingestion Protocol).
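
WHIP reduces the whole signaling dance to one HTTP exchange:

1. The client POSTs its SDP offer with `Content-Type: application/sdp`.
2. The server replies `201 Created`. The body holds the SDP answer, and a `Location` header names the session resource, which the client can later `DELETE` to hang up.

A std-only sketch of building that request and reading the response follows. The host, the endpoint path, and the helper names are hypothetical. A real client would use a proper HTTP library and would also deal with trickle ICE and authentication.

```rust
/// Build a raw HTTP/1.1 WHIP request: POST the SDP offer to the endpoint.
fn whip_request(host: &str, path: &str, offer_sdp: &str) -> String {
    format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Type: application/sdp\r\nContent-Length: {}\r\n\r\n{}",
        path,
        host,
        offer_sdp.len(), // byte length, as Content-Length requires
        offer_sdp
    )
}

/// What a successful WHIP response gives back.
#[derive(Debug, PartialEq)]
struct WhipSession {
    /// Resource URL for this session; DELETE it to end the stream.
    location: String,
    /// The server's SDP answer.
    answer_sdp: String,
}

/// Parse a raw HTTP response, expecting 201 Created plus a Location header.
fn parse_whip_response(raw: &str) -> Option<WhipSession> {
    let (head, body) = raw.split_once("\r\n\r\n")?;
    let mut lines = head.split("\r\n");
    // Status line, e.g. "HTTP/1.1 201 Created".
    let status = lines.next()?.split_whitespace().nth(1)?;
    if status != "201" {
        return None;
    }
    // HTTP header names are case-insensitive.
    let location = lines.find_map(|line| {
        let (name, value) = line.split_once(':')?;
        name.trim()
            .eq_ignore_ascii_case("location")
            .then(|| value.trim().to_string())
    })?;
    Some(WhipSession { location, answer_sdp: body.to_string() })
}

fn main() {
    // Placeholder offer; a real one comes from the WebRTC stack.
    print!("{}", whip_request("example.local", "/stream/whip", "v=0\r\n"));

    let response = "HTTP/1.1 201 Created\r\nContent-Type: application/sdp\r\nLocation: /stream/whip/session-1\r\n\r\nv=0\r\n";
    println!("{:?}", parse_whip_response(response));
}
```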
# The rug pull
So, yeah. I'm just going to use [MediaMTX](https://github.com/bluenviron/mediamtx), with a Gstreamer RTSP source, until that fails.
Bye!