NUBOMEDIA: FP7/2007-2013 GA-610576

NUBOMEDIA and WebRTC Data Channels: multi-sensory multimedia

  • WebRTC Data Channels

One of the objectives of the NUBOMEDIA project is to generalize the notion of multimedia beyond audio and video. This means that we want applications to be able to exchange data captured from sensors other than cameras and microphones. Since these sensors may be of any nature, this data needs to be opaque, in the sense that we cannot assume anything about its format or semantics.

To provide this type of feature, we need some kind of transport mechanism suitable for communicating that opaque data with the same properties (i.e. low latency, privacy, security, etc.) as the audiovisual data itself. Creating such a transport mechanism is tricky and requires dealing with a number of complex details. Fortunately, recent trends in RTC multimedia provide a solution which may be suitable for the NUBOMEDIA objectives: WebRTC Data Channels.

WebRTC is a framework promoted by the World Wide Web Consortium (W3C) whose main target is to provide browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. It specifies the protocols and APIs that support direct, interactive, rich communication using audio, video, and data between two peer web browsers without the need to set up intermediate media servers. WebRTC comprises three main APIs: getUserMedia, PeerConnection and DataChannels. The first allows browsers to access the camera and microphone, the second focuses on setting up audio and video calls, and the last deals with how arbitrary data can be shared between two peers.
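
As an illustration, the following minimal browser-side sketch (in TypeScript, using the modern API surface; the STUN server URL and channel label are placeholders) shows the three APIs working together:

    async function startCall(): Promise<void> {
      // getUserMedia: access the local camera and microphone.
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });

      // RTCPeerConnection: set up the audio/video call (STUN URL is a placeholder).
      const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.example.org" }] });
      stream.getTracks().forEach(track => pc.addTrack(track, stream));

      // DataChannel: exchange arbitrary application data with the remote peer.
      const channel = pc.createDataChannel("sensors");
      channel.onopen = () => channel.send("arbitrary sensor payload");
    }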

WebRTC Data Channels are built on a set of standards that define the mechanisms required for peers to connect to each other in order to exchange arbitrary opaque data. They make use of the Stream Control Transmission Protocol (SCTP). SCTP is a reliable, general-purpose transport layer protocol for use on IP networks. It provides stable, ordered delivery of data between two endpoints (much like TCP) and also preserves data message boundaries (like UDP). However, unlike TCP and UDP, SCTP offers advantages such as multi-homing and multi-streaming capabilities. WebRTC Data Channels mandate the use of SCTP encapsulated in DTLS; the details of this encapsulation are defined in the IETF specification on DTLS encapsulation of SCTP packets (draft-ietf-tsvwg-sctp-dtls-encaps).
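
These SCTP capabilities surface directly in the browser API through the data channel options. A minimal sketch (TypeScript) configuring an unordered, non-retransmitting channel, suited to frequent sensor samples, might look as follows:

    const pc = new RTCPeerConnection();

    // ordered: false relaxes SCTP's ordered delivery for lower latency;
    // maxRetransmits: 0 disables retransmission (partial reliability).
    const sensorChannel = pc.createDataChannel("sensors", {
      ordered: false,
      maxRetransmits: 0
    });

    // SCTP preserves message boundaries: each send() is delivered
    // (if at all) as exactly one message on the remote side.
    sensorChannel.onopen = () => {
      sensorChannel.send(JSON.stringify({ sensor: "temperature", value: 21.5 }));
    };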

SCTP was originally defined to run on top of the IPv4 or IPv6 network protocols. Because here SCTP runs over DTLS, it is agnostic about the protocols used beneath DTLS, so explicit IP addresses cannot be used in the SCTP control chunks; as a result, the SCTP association is single-homed. In order to provide a NAT traversal solution together with confidentiality, SCTP over DTLS is in turn carried over ICE. In the WebRTC context, the SCTP association is set up when the two endpoints of the WebRTC PeerConnection agree on opening it, typically by means of an exchange of SDPs. Details of the signaling plane are out of the scope of the WebRTC standards, so a signaling server to coordinate the establishment of the data channel between the peers is required.
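
A minimal signaling sketch (TypeScript, assuming a hypothetical WebSocket signaling server at wss://signaling.example.org; ICE candidate exchange omitted for brevity) could coordinate that SDP exchange as follows:

    const signaling = new WebSocket("wss://signaling.example.org");
    const pc = new RTCPeerConnection();

    // Creating the channel before the offer makes the generated SDP
    // negotiate an SCTP association.
    const channel = pc.createDataChannel("sensors");

    // Offerer side: create an SDP offer and relay it through the server.
    async function sendOffer(): Promise<void> {
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);
      signaling.send(JSON.stringify(pc.localDescription));
    }

    // Apply the remote SDP (offer or answer) when the server relays it.
    signaling.onmessage = async (event: MessageEvent) => {
      await pc.setRemoteDescription(JSON.parse(event.data));
    };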

The document "Stream Control Transmission Protocol (SCTP)-Based Media Transport in the Session Description Protocol" describes how SCTP associations can be set up using the Session Description Protocol (SDP); it also defines the new SDP Media Description protocol identifiers (proto values) to be used. At the time of this writing, WebRTC Data Channels are supported by Chrome 25, Opera 18 and Firefox 22 and above.
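
For illustration, the SDP media description negotiating a data channel in the drafts of that era looked roughly as follows (the port, address and SCTP stream values are indicative only; later draft revisions moved to the UDP/DTLS/SCTP proto value with an a=sctp-port attribute):

    m=application 54111 DTLS/SCTP 5000
    c=IN IP4 192.0.2.1
    a=sctpmap:5000 webrtc-datachannel 1024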

As can be observed from the discussion above, WebRTC Data Channels provide a suitable transport mechanism for implementing the NUBOMEDIA multi-sensory mechanism. However, there is a small subtlety that requires further attention: synchronization. At the time of this writing, no initiatives at the IETF RTCWEB WG are foreseen for creating a mechanism capable of synchronizing the audiovisual RTP packets with the Data Channel opaque data. Audio and video are commonly synchronized using RTP timestamps, as specified in RFC 3550. However, the Data Channel data format does not contain any kind of timestamp in its current definition, and only limited efforts were invested in trying to define a mechanism for dynamically mapping Data Channel information to a specific RTP stream identified through an SSRC. As a consequence, the only possible way of achieving approximate synchronization is to use reception timestamps directly, but this is far from useful in many applications (e.g. augmented reality) where sensor data might need to be synchronized with the video at frame level.
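
To make this limitation concrete, a receiver-side sketch (TypeScript, with hypothetical helper names) can do no better than stamping each message with the local clock on arrival and pairing it with the nearest video frame:

    const pc = new RTCPeerConnection();
    const channel = pc.createDataChannel("sensors");

    // Stamp every message with the local clock on arrival; network jitter
    // makes this only an approximation, never frame-accurate.
    interface StampedMessage { receivedAt: number; payload: string; }
    const received: StampedMessage[] = [];

    channel.onmessage = (event: MessageEvent) => {
      received.push({ receivedAt: performance.now(), payload: event.data });
    };

    // Hypothetical consumer: pick the message whose arrival time is
    // closest to the local render time of a video frame.
    function closestTo(renderTime: number): StampedMessage | undefined {
      let best: StampedMessage | undefined;
      for (const m of received) {
        if (!best || Math.abs(m.receivedAt - renderTime) < Math.abs(best.receivedAt - renderTime)) {
          best = m;
        }
      }
      return best;
    }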

For this reason, a mechanism providing synchronization between WebRTC Data Channel data and RTP media would be highly desirable. However, this is a complex issue due to the uncertain nature of the Data Channel opaque data. In principle, one may think that adding a timestamp to the Data Channel data protocol, playing the role of RTP timestamps, might be enough. However, this may not be the case, given that the sensor data might not be continuous (as audio and video are), but only generated at specific instants, which may be separated in time. In that case, applications involving data synchronized with the audio and video media may be very sensitive to the differing delays of the transport protocols (e.g. Data Channel information arriving later than the video information). This may lead to the need to add further latency at the application level, so as to be able to guarantee that no Data Channel information is still pending for a given timestamp, or to add specific flags to the RTP packets indicating the presence of data information associated with them.
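
One conceivable (purely illustrative, non-standardized) approach along these lines embeds a capture timestamp in every Data Channel message and accepts a guard delay on the receiver before deciding whether sensor data exists for a given frame; this sketch assumes both peers share a common clock reference, itself a non-trivial assumption:

    // Sender side: embed a capture timestamp in every sensor sample.
    function sendSample(channel: RTCDataChannel, value: number): void {
      channel.send(JSON.stringify({ capturedAt: Date.now(), value }));
    }

    // Receiver side: buffer samples and wait out a guard delay before
    // deciding that no sample accompanies a given video timestamp.
    const GUARD_MS = 100; // extra latency accepted to absorb transport skew
    const samples: Array<{ capturedAt: number; value: number }> = [];

    function onSensorMessage(event: MessageEvent): void {
      samples.push(JSON.parse(event.data));
    }

    // Only query frames older than GUARD_MS, so that late Data Channel
    // messages have had a chance to arrive; absence then genuinely means
    // "no data for this frame".
    function sampleForFrame(frameTs: number, toleranceMs: number):
        { capturedAt: number; value: number } | undefined {
      return samples.find(s => Math.abs(s.capturedAt - frameTs) <= toleranceMs);
    }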

In any case, what is certain is that synchronizing WebRTC Data Channels with RTP media information is not trivial, but it would enable a next generation of applications combining audiovisual content with sensor data. We look forward to the evolution of the state of the art at the IETF RTCWEB WG.