How does live streaming work?


In this article, we explain the various aspects of live streaming and the technology behind it. 

Table of contents: 

  • What is live streaming and how it works 
  • Recording the event
  • Encoder
  • Protocols – RTMP, RTSP, SDP and HLS 
  • Ingesting to the media server 
  • Adaptive bitrate streaming
  • Delivering content to the viewers
  • Content delivery network 
  • Decoding 

Live streaming is the delivery of video to an audience in real time, without first recording and storing it.

1. Livestreaming workflow 

Step 1: 

The workflow begins with compressing the raw video and audio captured at the event. An encoder does this using a codec.

An encoder can be built into the camera, a dedicated hardware device such as TriCaster, ClearCaster, or Teradek, computer software such as OBS Studio or Wirecast, or even a mobile app.

An encoder does two jobs. The first is compression with the codec mentioned above; the second is packaging the compressed data into a container for transfer to the media server.

Once the codec has compressed the video and audio, the encoder packages them into a video container format for delivery across the internet. The container holds all components of the stream (compressed video, compressed audio, and metadata) in a single file format.

Step 2:

The packaged stream is sent to a media server, which is either an on-premises device or cloud-based software. The transfer usually uses a streaming protocol such as RTMP, HLS, or MPEG-DASH.

Step 3:

Using streaming server software or a cloud streaming service, the media server transcodes, transrates, and transizes the stream so that the video plays on a wide range of devices.

The encoder could do this itself, but the extra load would slow it down. Instead, the encoder sends a single compressed stream to the media server, and the media server creates the additional renditions from it.

Creating renditions means transcoding the video to lower resolutions, transrating it to different bitrates, or transmuxing it into different container formats. These renditions matter because they let the video play on many kinds of devices.

Step 4: 

Once the stream is transcoded, the media server has to deliver it to viewers' devices. Distance is a major factor here: the farther away the viewer, the higher the latency, and serving every viewer directly puts extra load on the media server. A content delivery network (CDN) solves both problems.

A CDN is a network of servers placed strategically around the globe to provide low-latency streaming everywhere.

Step 5: 

The CDN delivers the video stream to all devices with minimal latency and buffering. The browser or application on the viewer's device then decodes the stream for playback.

Along with this workflow, it is important to employ the right tools at each stage. Let’s look deeper into that. 

2. Recording the event

Capturing video and audio at the event is the first step in streaming.

Capturing Video

Most digital cameras today capture at resolutions from full HD up to 4K and 8K, at high bitrates. They output the captured signal over HDMI or SDI ports, both of which can carry large amounts of data. In professional video production, SDI is often preferred over HDMI for its high bandwidth and its support for long cable runs.

For high-end productions, several cameras can record the event, with switching between cameras done manually, by software, or by a combination of both.

Individual live streamers, however, often prefer flagship smartphones, which can record 4K at up to 60 frames per second.

If low latency matters more than production quality, IP cameras are the best choice. These cameras use the RTSP protocol, which supports low-latency streaming.

IP cameras are typically used for streaming conferences or for surveillance. They do not need a separate encoding device, because they encode the video themselves and can stream it directly.

Capturing Audio

The audio captured during the stream is fed directly to the encoder over XLR cables. Their balanced connections give a high signal-to-noise ratio, which results in clear, crisp audio.

3. Encoder

As mentioned above, the encoder is the second stage of the stream; it receives the raw captured audio and video signals directly.

The encoder has two jobs: encoding the video with a codec, and packaging the result.

Video encoding using a codec

Video encoding converts the raw recorded audio and video into compressed digital content that a wide range of devices can play.

The compression needs to be as efficient as possible: the goal is the smallest file size with the least loss of data.
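To get a feel for how much work the codec has to do, the sketch below compares the bitrate of uncompressed video with that of an encoded stream. The 12 bits per pixel (8-bit video with 4:2:0 chroma subsampling) and the 6 Mbps encoder target are illustrative assumptions, not figures from any particular encoder.

```python
def raw_bitrate_bps(width: int, height: int, fps: float, bits_per_pixel: int = 12) -> float:
    """Bitrate of uncompressed video in bits per second.

    bits_per_pixel=12 assumes 8-bit samples with 4:2:0 chroma subsampling
    (an assumption made for this example).
    """
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps: float, encoded_bps: float) -> float:
    """How many times smaller the encoded stream is than the raw one."""
    return raw_bps / encoded_bps


raw = raw_bitrate_bps(1920, 1080, 30)   # full HD at 30 frames per second
encoded = 6_000_000                     # assumed 6 Mbps encoder target
print(f"raw: {raw / 1e6:.0f} Mbps, ratio: {compression_ratio(raw, encoded):.0f}:1")
```

Under these assumptions the raw signal is roughly 746 Mbps, so the codec has to shrink it by a factor of more than 100 before it can travel over an ordinary internet connection.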

Codec

Codec is a portmanteau of coder-decoder (or compressor-decompressor). The first half, compression, happens in the encoder; the second half, decompression, is done by the application on the viewer's device.

A codec is software that applies algorithms to compress the data tightly, making it suitable for transmission. Audio and video use different codecs.

Recommended video codec

Video codecs compress the data so heavily that some of it is lost in the process. This is true of every lossy video codec.

However, codecs have improved steadily to preserve video quality. In practice, a sufficiently high bitrate compensates for the loss, giving high-quality playback at a reasonable file size.

Different codecs prioritize different things: some favor high quality, some smoother playback, and some low latency.

Currently, the most widely supported codec is H.264, also known as Advanced Video Coding (AVC). Its successor, H.265, is gaining popularity but is not yet as widely supported.

Here are a few video codecs with their benefits and limitations. 

| Video codec | Benefits | Limitations |
| --- | --- | --- |
| H.264/AVC | widely supported; high-quality video | not the most efficient compression |
| H.265/HEVC | supports 8K resolution; about half the bitrate of H.264 at similar quality | longer encoding time; not yet widely supported |
| VP9 | royalty-free | less widely supported than H.264; being succeeded by AV1 |
| VVC | improves upon H.265 | royalty issues |

 

Audio codecs

Like video codecs, audio codecs have been improving steadily, delivering higher quality at lower bitrates and smaller file sizes.

AAC (Advanced Audio Coding) is currently the most widely adopted audio codec standard, having overtaken MP3.

 

| Audio codec | Benefits | Limitations |
| --- | --- | --- |
| AAC | most widely adopted; high quality with less noise | not the best-performing standard |
| MP3 | widely supported | not as advanced as AAC |
| Opus | open-source; better quality than AAC | not yet widely adopted |
| Vorbis | open-source alternative to MP3 and AAC | less advanced |

 

While compressing the video, it’s important to consider the frame rate, keyframe interval and bit rates for better streaming quality. 
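These settings are tied together by simple arithmetic. The sketch below converts a keyframe interval given in seconds into a number of frames, and estimates how much data a stream produces at a given bitrate. The 2-second interval and the 4500 + 128 kbps bitrates are illustrative assumptions.

```python
def gop_size_frames(fps: float, keyframe_interval_s: float) -> int:
    """Number of frames from one keyframe to the next."""
    return round(fps * keyframe_interval_s)


def stream_size_mb(video_kbps: int, audio_kbps: int, duration_s: float) -> float:
    """Approximate data volume in megabytes for a stream of the given length."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1_000_000


print(gop_size_frames(30, 2))            # frames between keyframes at 30 fps
print(stream_size_mb(4500, 128, 3600))   # megabytes for one hour of streaming
```

With these numbers the encoder inserts a keyframe every 60 frames, and one hour of streaming produces a little over 2 GB of data.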

The stream is later transcoded on the media server, using either self-hosted software or a managed cloud service.

Once compression is done, the next stages are packaging the content and ingesting it into the media server over a streaming protocol.

Packaging

Once the codec has compressed the video and audio, the encoder packages them into a video container format, also called a wrapper.

Packaging brings the video and audio data together in a single file format that is easy to transfer to the media server.

The container holds all components of the compressed data (audio and video) along with the metadata. The most common container formats are .mp4, .wmv, and .mov.

4. Live streaming protocols

The data packaged in the container is transferred to the media server using a protocol: a set of rules governing how data travels between devices.

RTMP and RTSP protocols

These are the traditional video streaming protocols, valued for their low latency. However, they require a dedicated streaming server.

Although these protocols have a latency of 5 seconds or less, they are being overtaken for delivery by the HTTP-based protocol HLS, which downloads video in chunks and supports adaptive streaming.

Their low latency and reliability make them a better choice than HLS for transferring data from the encoder to the media server. For delivery to viewers, however, they require dedicated media servers and can lead to buffering.

Playing an RTMP stream in a browser also required the Adobe Flash Player plugin. This limited the audience, because the plugin was not built into every browser.

SDP Protocol

SDP, the Session Description Protocol, is mainly used with WebRTC, where real-time communication such as a video call or an IP telephony call has to be established.

SDP works together with the Session Initiation Protocol (SIP), which carries SDP in its messages. SIP initiates the communication between browsers or applications, while SDP describes the session being set up, such as the media types, codecs, and network addresses the devices will use.

SDP itself carries only a small text description rather than large volumes of media data, and the real-time sessions it sets up have very low latency.
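A session description is plain text made up of type=value lines, where each m= line opens a media section. The sketch below parses a small hand-written description; the addresses, ports, and payload numbers in SAMPLE_SDP are made up for illustration, and the parser handles only the m= and a=rtpmap lines, not the full format.

```python
SAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 127.0.0.1
s=Example session
c=IN IP4 127.0.0.1
t=0 0
m=audio 49170 RTP/AVP 111
a=rtpmap:111 opus/48000/2
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
"""


def parse_media(sdp: str) -> list[dict]:
    """Return one dict per m= section with its kind, port, and codecs."""
    media = []
    for line in sdp.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "m":
            # m=<media> <port> <proto> <payload formats...>
            kind, port, proto, *formats = value.split()
            media.append({"kind": kind, "port": int(port), "proto": proto,
                          "formats": formats, "codecs": {}})
        elif key == "a" and value.startswith("rtpmap:") and media:
            # a=rtpmap:<payload type> <codec name>/<clock rate>[/<channels>]
            payload, encoding = value[len("rtpmap:"):].split(" ", 1)
            media[-1]["codecs"][int(payload)] = encoding
    return media


for section in parse_media(SAMPLE_SDP):
    print(section["kind"], section["port"], section["codecs"])
```

Note that the description only announces what will be sent and where; the audio and video themselves travel separately.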

HTTP based protocols (HLS)

Developed by Apple, HLS (HTTP Live Streaming) transfers video as a series of chunks downloaded over HTTP. Each chunk holds a few seconds of video. While one chunk is playing, the next chunks are already downloaded and buffered, which minimizes buffering delays.

However, the latency created in HLS ranges from 10-45 seconds.
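Much of this latency follows from the chunk length, because a player usually buffers several chunks before it starts playing. The rough estimate below assumes the player holds three chunks and adds a fixed 2-second allowance for encoding and network transfer; both numbers are assumptions chosen for illustration, and real players differ.

```python
def estimated_hls_latency_s(chunk_s: float, buffered_chunks: int = 3,
                            overhead_s: float = 2.0) -> float:
    """Rough glass-to-glass latency: buffered chunks plus a fixed overhead."""
    return chunk_s * buffered_chunks + overhead_s


for chunk_s in (2, 6, 10):
    print(f"{chunk_s} s chunks -> about {estimated_hls_latency_s(chunk_s):.0f} s behind live")
```

Shorter chunks reduce latency but leave the player less buffered video to ride out a slow moment on the network.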

HLS also supports adaptive bitrate streaming: the video stream adapts to the bandwidth of the viewer's connection, giving the best viewing experience that bandwidth allows.

This requires transcoding, which the media server handles. The encoder could do the transcoding itself, but the added load would slow it down.

Moreover, sending multiple streams all the way from the encoder to the media server and on to the CDN costs more than sending a single stream from the encoder to the media server, where it can be transcoded and streamed in multiple renditions.
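In HLS, the player learns about the available renditions from a master playlist that lists each one with its bandwidth and resolution. The sketch below generates such a playlist; the three renditions and the directory layout (name/index.m3u8) are hypothetical examples, not a recommended ladder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str            # hypothetical directory holding this rendition's chunks
    width: int
    height: int
    bandwidth_bps: int   # bitrate advertised to the player


def master_playlist(renditions: list[Rendition]) -> str:
    """Build an HLS master playlist listing every rendition, lowest bitrate first."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/index.m3u8")
    return "\n".join(lines) + "\n"


LADDER = [
    Rendition("1080p", 1920, 1080, 6_000_000),
    Rendition("360p", 640, 360, 800_000),
    Rendition("720p", 1280, 720, 3_000_000),
]
playlist = master_playlist(LADDER)
print(playlist)
```

The encoder still sends only one stream; the media server produces the renditions and publishes a playlist like this one for players to choose from.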

Most preferred Protocols

Given the benefits of each protocol, most software and hardware encoders use RTMP to send data from the encoder to the media server, and HLS to deliver data to the CDN (content delivery network) and viewers.

For IP cameras, RTSP is the preferred protocol. For WebRTC in the browser, SDP is used along with SIP.
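As an illustration of the encoder-to-server leg, the sketch below builds an ffmpeg command line that encodes a file to H.264 and AAC and pushes it over RTMP. ffmpeg is used here only as a convenient example encoder; the input file name, the ingest URL, and the default bitrates are placeholders. The function only builds the argument list and does not run anything.

```python
def rtmp_push_command(
    source: str,
    ingest_url: str,
    video_kbps: int = 4500,
    audio_kbps: int = 128,
    fps: int = 30,
    keyframe_interval_s: int = 2,
) -> list[str]:
    """Build (but do not run) an ffmpeg command that pushes a file over RTMP."""
    return [
        "ffmpeg",
        "-re",                                  # read the input at its native frame rate
        "-i", source,
        "-c:v", "libx264",                      # H.264 video
        "-b:v", f"{video_kbps}k",
        "-g", str(fps * keyframe_interval_s),   # keyframe every N frames
        "-c:a", "aac",                          # AAC audio
        "-b:a", f"{audio_kbps}k",
        "-f", "flv",                            # RTMP carries an FLV container
        ingest_url,
    ]


# Placeholder file name and ingest URL, for illustration only.
cmd = rtmp_push_command("event.mp4", "rtmp://ingest.example.com/live/STREAM_KEY")
print(" ".join(cmd))
```

The list can be passed to subprocess.run on a machine that has ffmpeg installed and a real ingest URL.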

5. Ingesting to the media server

The packaged data is ingested into the media server using the protocols described above.

The media server can do a lot with the ingested video, as described below.

 

Transcoding

 

Transcoding is the process in which the media server re-encodes the ingested, already-encoded stream using different codecs.

Transcoding is needed because not every device, browser, or application can decode the codec the encoder used. Changing the codec lets the stream reach a wide range of devices and browsers.

The media server decompresses the compressed video, recompresses it with another codec, and packages it into containers. Transcoding on the media server has the following sub-categories:

  • Transrating

When an encoder compresses a video, the bitrate, and with it the file size, is chosen as well. A high bitrate at a reasonably small file size is usually preferred in order to maintain video quality.

However, variations in the viewer's internet bandwidth can disrupt the stream. Transrating solves this by generating versions of the video at multiple bitrates, so that the stream can follow the available bandwidth.

This prevents buffering and gives an uninterrupted viewing experience.

 

  • Transizing

 

Because devices differ in screen size and resolution, the media server transizes the video into multiple resolutions to fit the stream to different screens.
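Resizing has to preserve the aspect ratio of the source. The sketch below computes the output dimensions for a few target heights; rounding the width down to an even number is included because many encoders require even dimensions. The target heights are example values.

```python
def scaled_size(src_w: int, src_h: int, target_h: int) -> tuple[int, int]:
    """Scale to target_h, keeping the aspect ratio and an even width."""
    width = round(src_w * target_h / src_h)
    width -= width % 2   # drop to the nearest even width
    return width, target_h


for height in (720, 480, 360):
    print(height, scaled_size(1920, 1080, height))
```

For a 1920x1080 source this gives 1280x720, 852x480, and 640x360.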

 

Transmuxing

 

In transmuxing, only the container format of the compressed video is converted to another; repackaging from .mp4 to .fmp4 (fragmented MP4) is one example.

Not every device or player supports every container format. Transmuxing lets those devices play the stream as well.

Because transmuxing only converts the container format and does not re-encode the video, it needs little processing power and the media server handles it easily.

However, since most devices now support mp4, the most common format, transmuxing adds a smaller share of the audience than it used to.
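The difference from transcoding shows up clearly on a command line. The sketch below builds an ffmpeg command that repackages an MP4 file as fragmented MP4 while copying the audio and video streams untouched. ffmpeg is used only as an example tool, the file names are placeholders, and the function builds the argument list without running it.

```python
def transmux_command(src: str, dst: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that repackages without re-encoding."""
    return [
        "ffmpeg",
        "-i", src,
        "-c", "copy",                              # copy streams as-is: no decode, no encode
        "-movflags", "frag_keyframe+empty_moov",   # write fragmented MP4
        dst,
    ]


# Placeholder file names, for illustration only.
print(" ".join(transmux_command("event.mp4", "event_fragmented.mp4")))
```

Because the streams are copied rather than re-encoded, the picture quality is unchanged and the job is limited mostly by how fast the data can be read and written.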

Together, transcoding, transrating, transizing, and transmuxing make possible a major advantage in video streaming: adaptive bitrate streaming (ABS). Let's look at it in detail.

6. Adaptive Bitrate Streaming

The renditions that the media server creates through transcoding and transmuxing are what make smooth streaming possible. Turning the single encoded stream it receives into multiple renditions is essential for uninterrupted, buffer-free streaming.

The bandwidth of a viewer's internet connection may not stay constant throughout the stream.

If the stream is available at only one quality and bitrate, any sudden drop in bandwidth causes buffering and interruptions for the viewer. Likewise, when bandwidth rises, a single-bitrate stream cannot take advantage of it to deliver better quality.

Adaptive bitrate streaming offers a smart solution for this.

With ABS, the stream adapts to the varying bandwidth of the viewer's internet connection, delivering the best video quality that connection can sustain.

The video is streamed in chunks, each lasting 2 to 10 seconds. During a live stream, the server keeps creating every rendition as fresh video is captured and processed. For each new chunk, the device checks the bandwidth currently available and downloads the chunk from the rendition that fits.

When bandwidth is low, a lower-bitrate video is streamed at decent quality; when more bandwidth is available, high-quality video at a high bitrate is streamed.

Switching between the available renditions in this way gives uninterrupted, buffer-free live video streaming.
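The per-chunk decision can be sketched in a few lines. The example below picks, for each chunk, the highest rendition whose bitrate fits within a safety margin of the measured throughput. The bitrate ladder, the 0.8 safety factor, and the throughput samples are all hypothetical; real players use more elaborate rules that also consider how full the buffer is.

```python
LADDER_KBPS = [400, 800, 1600, 3000, 6000]   # hypothetical rendition bitrates


def pick_rendition(measured_kbps: float, ladder=LADDER_KBPS, safety: float = 0.8) -> int:
    """Highest bitrate that fits within a safety fraction of measured throughput."""
    budget = measured_kbps * safety
    candidates = [bitrate for bitrate in ladder if bitrate <= budget]
    # If nothing fits, fall back to the lowest rendition rather than stalling.
    return max(candidates) if candidates else min(ladder)


# Made-up throughput measurements, one per downloaded chunk.
throughput_per_chunk = [5000, 2500, 900, 300, 8000]
choices = [pick_rendition(t) for t in throughput_per_chunk]
print(choices)
```

As the measured throughput falls and recovers, the selection steps down to 400 kbps and back up to 6000 kbps without ever stopping playback.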

7. Delivering content to the viewers

Delivering content around the globe with a near-real-time viewing experience is essential for reaching a wide audience.

A single media server handling all content delivery would suffer high latency and buffering, on top of the extra load on the server.

The solution is a content delivery network (CDN): a strategically placed network of servers around the globe.

8. Content Delivery Network

By addressing internet traffic, latency, and buffering problems, the CDN streams the content, using all the renditions created on the media server, to audiences all over the world.

The advantages CDN offers are: 

 

  • Speed: CDNs deliver content to the viewer's device very quickly, because the serving server is close to the device and already holds the renditions.
  • Security: CDNs help protect against DDoS (distributed denial-of-service) attacks and also help guard against data breaches.
  • Viewership: Faster, more reliable delivery from a server close to the viewer helps attract and retain more viewers.
  • Minimal buffering: Because the distance to the viewer is shorter, a CDN keeps buffering to a minimum.
  • Quality: Delivering data from closer to the audience also allows better video and audio quality.
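Part of the reason a CDN takes load off the media server is caching: a server near the viewers fetches each chunk from the media server once and then serves its own copy. The toy model below illustrates the idea; the EdgeServer class, the chunk name, and the viewer count are invented for the example and do not represent any real CDN's API.

```python
class EdgeServer:
    """Toy CDN edge: fetches each chunk from the origin once, then serves its copy."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}

    def get(self, chunk_name: str) -> bytes:
        if chunk_name not in self._cache:
            self._cache[chunk_name] = self._fetch(chunk_name)   # cache miss
        return self._cache[chunk_name]


origin_requests = []


def origin(chunk_name: str) -> bytes:
    """Stand-in for the media server; records every request it receives."""
    origin_requests.append(chunk_name)
    return f"data for {chunk_name}".encode()


edge = EdgeServer(origin)
for _viewer in range(1000):
    edge.get("720p/segment_42.ts")

print(len(origin_requests))   # requests that reached the media server
```

A thousand viewers request the same chunk, but the media server is asked for it only once.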

9. Decoding on the device

Depending on the available internet bandwidth, the best-fitting rendition is streamed to the device using a specific protocol. HLS is the most widely adopted protocol, thanks to features such as ABS and high-quality streaming.

MPEG-DASH is a promising but not yet as widely adopted protocol with some advantages over HLS, such as being codec-agnostic and being able to keep up with second-to-second bandwidth fluctuations. It has the features to overtake HLS.

Once the stream reaches the application or browser on the viewer's device, that application decodes it and displays it on the device's screen.

Every stage, from capturing the best possible video and audio through to ABS and the CDN, matters for achieving a stream with minimal latency and buffering.

If you take care of each of these stages, your stream can be good enough to attract a large audience around the globe.

 

Krishna Kishore
