Google Cast Protocol: Overview
By Romain Picard on Monday 4 July 2016, 10:00
Since the Chromecast became available in 2013 and its SDK was publicly released in 2014, little information has been published about the protocol used by this device: Google Cast. This article is the first of a series that will explain the technical aspects of this protocol. So let's dive into the wonderful and dark world of Google Cast.
As we will see throughout these articles, google-cast is both a very well designed product on the technical side and a completely locked technology. Google wants to keep this technology as closed as possible so that only receivers blessed by Google can be implemented. This is probably because they consider google-cast one of the most promising ways to get control of the TV screen. All the security features used in google-cast are probably one of the reasons why so little information about it is available on the internet. These articles should be considered an attempt to document google-cast.
Chromecast And Google Cast
Before going into the details, let's start with a reminder of what Google Cast is, along with some terminology. The Chromecast is a device that runs an application cast from another device: it is a receiver device. As of today only a few receiver devices are available: the Chromecast 1, the Chromecast 2, the Chromecast Audio, the Android TV certified devices, and a couple of Google Cast audio speakers. The device that casts the content is a sender device. A sender device can be any Android or iOS device, or a PC running Chrome.
Casting is the action of running or viewing an application of the sender device on the receiver device. This means that when you cast an application, two applications are running: the sender application on the sender device, and the receiver application on the receiver device.
This is quite different from other protocols of the same family. In AirPlay, only media content can be sent to the receiver. The content is either sent directly to the receiver or provided as a URI that the receiver retrieves. In both cases no application runs on the receiver. DIAL is more similar to google-cast (in fact the first version of google-cast was just DIAL), but it only allows discovering devices and starting applications.
With google-cast, there is no restriction on the type of information that flows between the sender and the receiver applications. Moreover, this communication is bi-directional (the receiver can send information to the sender). Google Cast is the name of the protocol used to communicate between the sender and the receiver applications.
Consequently, a google-cast application is split into a sender part and a receiver part. The sender part can run on several operating systems, while the receiver part is always an HTML application running in a Chrome browser.
Since the application is distributed between two devices, both parts must communicate with each other. The following figure shows the high level design of this communication:
The sender and receiver devices communicate over the LAN for two purposes: receiver device discovery, and receiver device control. The discovery part allows the sender devices to discover all receiver devices present on the LAN. The control part allows the sender to control the receiver and get feedback from it.
The sender and receiver applications are identified by an application id, which is common to both. The application id is unique to each google-cast application and is provided by Google when the application is registered. The list of all registered applications and their information is available in an application registry in the cloud. The receiver device uses this registry to know how to run an application.
To start a receiver application, the sender device sends a start request to the receiver device with a given application id. Once the sender and receiver applications are running, they can communicate via the control channel. However, google-cast does not restrict them to this control channel. If they want, the applications can also communicate by any other means, such as cloud services specific to the application, or a LAN protocol such as HTTP or WebSocket.
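The start request and the control channel described above can be sketched as follows. This is only a toy model under stated assumptions, not the real protocol: the `ToyReceiver` class, the `REGISTERED_APP_IDS` set standing in for the cloud registry, the application id `"ABCD1234"`, and the message fields are all hypothetical.

```python
# Hypothetical stand-in for the cloud application registry.
REGISTERED_APP_IDS = {"ABCD1234"}


class ToyReceiver:
    """Toy receiver: starts applications and answers control messages."""

    def __init__(self):
        self.running = set()

    def start(self, app_id: str) -> None:
        # The receiver only starts applications known to the registry.
        if app_id not in REGISTERED_APP_IDS:
            raise KeyError(f"unknown application id: {app_id}")
        self.running.add(app_id)

    def send(self, app_id: str, message: dict) -> dict:
        # Control messages are only accepted once the application runs.
        if app_id not in self.running:
            raise RuntimeError(f"application {app_id} is not running")
        # The channel is bi-directional: the receiver application answers.
        return {"reply_to": message["type"], "status": "ok"}


receiver = ToyReceiver()
receiver.start("ABCD1234")          # start request carrying the application id
reply = receiver.send("ABCD1234", {"type": "PLAY"})
print(reply)
```

The point of the sketch is the ordering: the application id selects what to run, and application messages only flow once the receiver application has been started.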
Many protocols are involved in all these communications between the devices and the applications. The following figure is a more detailed view of the whole stack:
Most of the communication between the sender and receiver devices goes through a single persistent TLS connection. This connection acts like a funnel to transport messages: several logical communication channels go through this single connection. The messages on this connection are serialized with protobuf.
The sender device contains a google-cast sender service. As of today, there are three implementations of it, one for each of the three supported hosts: Android, iOS, and Chrome. On Android this service is implemented in Google Play services, partially on top of the media router framework. On iOS it is implemented in a library embedded in the sender application. On Chrome, it is part of Chrome's code. The sender service is responsible for:
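To illustrate how several logical channels can share one connection, here is a minimal sketch. It rests on assumptions that are not taken from this article: each message is prefixed with a 4-byte big-endian length, JSON is used in place of protobuf to keep the example self-contained, and the channel names `heartbeat` and `media` are made up.

```python
import json
import struct


def encode_frame(channel: str, payload: dict) -> bytes:
    """Serialize one message and prefix it with its length."""
    body = json.dumps({"channel": channel, "payload": payload}).encode("utf-8")
    return struct.pack(">I", len(body)) + body


def decode_frames(stream: bytes):
    """Yield (channel, payload) for every complete frame in the stream."""
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        end = offset + 4 + length
        if end > len(stream):
            break  # incomplete frame: more bytes are needed
        message = json.loads(stream[offset + 4:end].decode("utf-8"))
        yield message["channel"], message["payload"]
        offset = end


# Two logical channels interleaved on the same byte stream.
wire = encode_frame("heartbeat", {"type": "PING"}) + encode_frame("media", {"type": "PLAY"})
frames = list(decode_frames(wire))
print(frames)
```

Tagging every message with its channel is what lets a single connection carry unrelated conversations; the receiving side simply dispatches each decoded frame on the channel name.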
- Discovering the receiver devices.
- Connecting to the receiver device selected by the user.
- Authenticating the receiver device.
- Sending the application start request to the receiver device.
- Maintaining communication between the sender application and the receiver device.
The receiver device contains a google-cast receiver service. The receiver service is responsible for:
- Exposing the presence of the device on the LAN.
- Accepting connection requests from sender devices.
- Responding to authentication challenges sent by the sender device.
- Starting the receiver application, identified by an application id.
- Maintaining communication between the receiver application and the sender device.
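The authentication step that appears in both lists is a challenge and response based on a public key. The sketch below shows the general idea only, and everything in it is an assumption: it uses textbook RSA with tiny, insecure numbers so that it runs with the standard library alone, and the function names `sign_challenge` and `verify_response` are hypothetical. It says nothing about the keys or message formats really used by google-cast.

```python
import hashlib
import secrets

# Toy RSA key pair (insecure, for illustration only): n = 61 * 53.
N = 3233
PUBLIC_EXPONENT = 17      # known to the sender
PRIVATE_EXPONENT = 2753   # held only by the receiver


def _digest(challenge: bytes) -> int:
    # Reduce the hash modulo N so that it fits the toy key size.
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % N


def sign_challenge(challenge: bytes) -> int:
    """Receiver side: sign the challenge with the private key."""
    return pow(_digest(challenge), PRIVATE_EXPONENT, N)


def verify_response(challenge: bytes, signature: int) -> bool:
    """Sender side: check the signature with the public key."""
    return pow(signature, PUBLIC_EXPONENT, N) == _digest(challenge)


challenge = secrets.token_bytes(16)     # sender picks a fresh random challenge
signature = sign_challenge(challenge)   # receiver answers
print(verify_response(challenge, signature))
```

Because the challenge is random for each connection, a receiver cannot replay an old answer: it has to hold the private key to produce a valid response.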
The receiver service retrieves technical information about the receiver application from the application registry. This information is stored in the application's manifest. The manifest contains the URL of the application if it is a web application, or its path if it is a native application. Web applications run in a Chrome browser and communicate with the google-cast receiver service via a WebSocket connection.
As seen above, the whole stack relies on many protocols. Most of them are standards; others are proprietary. Here is a summary of the different protocols used:
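The distinction between web and native applications in the manifest could be modeled as follows. This is a sketch under assumptions: the `Manifest` fields, the `launch_command` helper, the `chrome` command name, and the example values are all hypothetical, since the real manifest format is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manifest:
    app_id: str
    url: Optional[str] = None   # set for web applications
    path: Optional[str] = None  # set for native applications


def launch_command(manifest: Manifest) -> List[str]:
    """Return a hypothetical command line used to start the application."""
    if manifest.url is not None:
        # Web application: load its URL in the browser.
        return ["chrome", manifest.url]
    if manifest.path is not None:
        # Native application: run the executable directly.
        return [manifest.path]
    raise ValueError("manifest has neither a URL nor a path")


web_app = Manifest(app_id="ABCD1234", url="https://example.com/receiver.html")
native_app = Manifest(app_id="EFGH5678", path="/usr/bin/native_receiver")
print(launch_command(web_app))
print(launch_command(native_app))
```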
- Discovery: mDNS.
- Persistent connection: TLS.
- Receiver Authentication: public key based challenge.
- Message serialization: Protobuf.
- Receiver application: HTML application running in Chrome.
- Receiver application to service communication: WebSocket.
The usage of each of these protocols will be detailed later. But this already shows some complexity that is typical of "modern" communication frameworks: a heavy use of existing standards with some proprietary add-ons, all cooperating to implement a higher level service. AllJoyn, an IoT communication bus, is very similar in its design.
To Be Continued
That is all for today. This introductory article should already provide a good overview of how the google-cast protocol works. I will continue with each step of the communication between the sender and the receiver device. Go to the next article to see how discovery is done.