Why XMPP is the most relevant messaging protocol for IoT and home automation platforms

Choosing a protocol to communicate between a platform and some clients is not an easy task. Depending on the type of services that are exposed, the features of the clients, and the operating system running on the clients, there are many choices available. Still, when you look at all the existing protocols and standards, you will probably conclude that none of them exactly matches your needs. Then you either have to develop your own solution or use the one that has most of the features you need.

Let me start right now with the conclusion of this article: developing your own solution is the wrong choice, unless a messaging protocol is what you are building. Otherwise you probably want to spend as much time as possible developing your service, and as little as possible writing tools and utility code. This is my current situation with my smart home solution: I have a lot of ideas that I want to prototype, and I cannot spend weeks developing yet another messaging protocol. So how can we make the right choice? As usual, by answering each of the following questions:

What do you want to do?
What are the features you need?
What are the available solutions?
What is the most appropriate one?

This article goes through these steps for two use cases: a home automation system, and edLeak as a distributed system.

Use Cases

The smart home ecosystem. I have already described it: it is composed of 4 types of devices, 3 of which are connected to the internet. The mobile device and the concentrator are connected to the cloud platform. The protocol that links them must be able to connect several mobile devices to several concentrators with which they were previously associated. It should also be possible to connect the mobile device directly to the concentrators. This is required when the cloud platform is not available, or when the mobile device is on the LAN and there is no need to go through the platform.

Distributed edLeak. This is something I have wanted to do for some time. Monitoring several processes with edLeak is currently feasible but not convenient, since you have to run the HTTP server on a dedicated port for each process. A broker-based architecture would be much better.

With this architecture, there is a single server on the system, and each monitored process registers an instance of an edKit service with it. The broker could run either on the device itself or on a dedicated server. This would make it possible to remotely debug a device located anywhere in the world.

Requirements

In order to implement these use cases, we need the network protocol to provide the following features:

Almost real time communication. The communication between the devices (e.g. a smartphone sending a command to the concentrator) must be fast. The term "almost real time" is used here because strict real time is not needed. However, from the user's perspective it must look like it happens immediately. This probably means that actions should complete within 200 ms, and at most within 500 ms. Anything longer will be perceived as unresponsive by the user.

User authentication. Unless the system is running in a trusted network (which is usually the case for edLeak), each user must be authenticated before accessing any service. By extension, this means that a user provisioning system is required, probably via another protocol. In the case of IoT and home automation, a user is not necessarily a human: it can be a machine such as the concentrator.
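As an illustration of machine authentication, XMPP clients authenticate with SASL (RFC 6120). The Python sketch below builds the payload of the simple SASL PLAIN mechanism (RFC 4616): the base64 encoding of an optional authorization identity, the user name and the password, separated by NUL bytes. The account name and password are made-up examples. PLAIN exposes the password to anyone who can read the stream, so it is only acceptable over an encrypted connection; SCRAM mechanisms are usually preferred.

```python
import base64
import xml.etree.ElementTree as ET

SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"  # namespace defined by RFC 6120


def sasl_plain_payload(username: str, password: str, authzid: str = "") -> str:
    """Base64 of 'authzid NUL authcid NUL password' (SASL PLAIN, RFC 4616)."""
    raw = f"{authzid}\0{username}\0{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def sasl_auth_element(username: str, password: str) -> str:
    """The <auth/> element a client sends to start PLAIN authentication."""
    auth = ET.Element("auth", {"xmlns": SASL_NS, "mechanism": "PLAIN"})
    auth.text = sasl_plain_payload(username, password)
    return ET.tostring(auth, encoding="unicode")


# Hypothetical machine account: the concentrator authenticates like any user.
print(sasl_auth_element("concentrator1", "s3cret"))
```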

Communication encryption. Communication must be encrypted to prevent third parties from monitoring it. This is especially important in the smart home use case, where personal information is continuously transmitted.

Multi-user. If the service allows several users to communicate with each other, they must first be linked together. This allows "friends" to talk to each other while preventing other people from interacting with them.
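Linking users can be sketched as a mutual approval that must happen before any message is delivered. The toy Python router below is only an illustration, loosely modeled on XMPP presence subscriptions but not a real roster implementation; the user names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class FriendRouter:
    """Toy router that only delivers messages between linked users.

    Linking is a request followed by an explicit acceptance, loosely modeled
    on XMPP presence subscriptions (this is not a real XMPP roster).
    """

    def __init__(self) -> None:
        self.pending: set[tuple[str, str]] = set()  # (requester, target)
        self.friends: defaultdict[str, set[str]] = defaultdict(set)
        self.inboxes: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def request(self, requester: str, target: str) -> None:
        self.pending.add((requester, target))

    def accept(self, target: str, requester: str) -> bool:
        """The target approves a pending request; the link is then mutual."""
        if (requester, target) not in self.pending:
            return False
        self.pending.remove((requester, target))
        self.friends[requester].add(target)
        self.friends[target].add(requester)
        return True

    def send(self, sender: str, recipient: str, body: str) -> bool:
        """Deliver only if the two users are linked; otherwise drop the message."""
        if recipient not in self.friends[sender]:
            return False
        self.inboxes[recipient].append((sender, body))
        return True


router = FriendRouter()
print(router.send("phone", "concentrator", "status?"))  # False: not linked yet
router.request("phone", "concentrator")
router.accept("concentrator", "phone")
print(router.send("phone", "concentrator", "status?"))  # True
```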

Remote procedure call (RPC). This is the first communication pattern needed for machine-to-machine communication. An RPC allows a caller to send a request to a callee. The callee handles the request and sends back a response to the caller. This is typically needed to turn a light on and confirm that the light is actually switched on.
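The essential mechanism of RPC is correlation: the caller tags each request with an identifier, and the callee echoes it in its answer so that concurrent calls can be matched. A minimal in-process Python sketch, assuming a hypothetical `set_light` method and using a queue as a stand-in for the network:

```python
from __future__ import annotations

import itertools
import queue


class LightService:
    """Hypothetical callee: switches a light and reports its actual state."""

    def __init__(self) -> None:
        self.on = False

    def handle(self, request: dict) -> dict:
        if request["method"] != "set_light":
            return {"id": request["id"], "error": "unknown method"}
        self.on = bool(request["params"]["on"])
        # The answer echoes the request id so the caller can match it.
        return {"id": request["id"], "result": {"on": self.on}}


class RpcCaller:
    """Caller side: tags each request with a unique id and matches answers."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: set[int] = set()
        self.outbox: queue.Queue[dict] = queue.Queue()  # stands in for the network

    def call(self, method: str, params: dict) -> int:
        request_id = next(self._ids)
        self._pending.add(request_id)
        self.outbox.put({"id": request_id, "method": method, "params": params})
        return request_id

    def on_answer(self, answer: dict) -> dict:
        if answer["id"] not in self._pending:
            raise ValueError(f"unexpected answer id {answer['id']}")
        self._pending.remove(answer["id"])
        return answer


caller, light = RpcCaller(), LightService()
caller.call("set_light", {"on": True})
answer = caller.on_answer(light.handle(caller.outbox.get()))
print(answer)  # {'id': 1, 'result': {'on': True}}
```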

Publish subscribe. This is the second communication pattern that is needed. It allows a user to publish an event on a given topic. This event is then delivered to all other users that previously subscribed to this topic. This is typically needed when the temperature measured by a sensor changes: the connected clients need to update their UI, while the heat regulation system also monitors these changes. Another use case for this pattern is notifications that must be sent to several clients.
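A minimal in-process publish/subscribe broker fits in a few lines of Python. This sketch has no network transport, persistence, or wildcard topics, and the topic name is a made-up example; it only shows how one published event reaches every subscriber of a topic.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

Callback = Callable[[str, Any], None]


class PubSubBroker:
    """Minimal in-process publish/subscribe broker (no network, no persistence)."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: Any) -> int:
        """Deliver the event to every subscriber of the topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, event)
        return len(callbacks)


broker = PubSubBroker()
ui_updates: list[float] = []
regulator_inputs: list[float] = []
broker.subscribe("livingroom/temperature", lambda topic, value: ui_updates.append(value))
broker.subscribe("livingroom/temperature", lambda topic, value: regulator_inputs.append(value))
delivered = broker.publish("livingroom/temperature", 21.5)
print(delivered, ui_updates, regulator_inputs)  # 2 [21.5] [21.5]
```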

Routing to a user. This means that a request can be sent not only to a service, but also to a specific user. In the case of home automation or edLeak, several users will implement the same services. However, we want to be able to execute a method of a service on a specific instance of this service: when I want to switch on the light in the sitting room, the request must be sent to exactly the user handling this device, not another one.
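Routing to a user amounts to addressing a specific (user, service) pair rather than a service name alone. In this illustrative Python sketch, two hypothetical users expose the same `light` service, and the caller chooses which instance handles the request. The addresses and method names are assumptions for the example.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[str, dict], dict]


class ServiceRouter:
    """Routes a request to one specific (user, service) instance.

    Several users may register the same service name; the caller picks the
    instance by user address, so the request never reaches the wrong device.
    """

    def __init__(self) -> None:
        self._instances: dict[tuple[str, str], Handler] = {}

    def register(self, user: str, service: str, handler: Handler) -> None:
        self._instances[(user, service)] = handler

    def call(self, user: str, service: str, method: str, params: dict) -> dict:
        handler = self._instances.get((user, service))
        if handler is None:
            raise LookupError(f"{user} does not provide {service}")
        return handler(method, params)


def make_light(room: str) -> Handler:
    """Hypothetical light service bound to one room."""
    def handle(method: str, params: dict) -> dict:
        if method != "switch":
            raise ValueError(f"unknown method {method}")
        return {"room": room, "on": bool(params["on"])}
    return handle


router = ServiceRouter()
router.register("sittingroom@home.example", "light", make_light("sitting room"))
router.register("kitchen@home.example", "light", make_light("kitchen"))
print(router.call("sittingroom@home.example", "light", "switch", {"on": True}))
# {'room': 'sitting room', 'on': True}
```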

Peer-to-peer connection (optional). A peer-to-peer connection means that two users are connected directly to each other. The other (and most common) way to connect them is through a "router" between them. This latter solution is usually easier to implement, because NAT, firewalls, and proxies make peer-to-peer connections more complex.

HTML5 friendly (optional). Whenever possible, I write the mobile client of the smart home system in HTML/JavaScript, so the messaging system should be usable directly from JavaScript running in a browser. In any case, it has to be usable from Android and iOS, i.e. in Java and Objective-C.

Now that the requirements are laid out, and before choosing a protocol, let's explore the existing solutions for two major technical aspects that help in selecting a messaging protocol: network transport and message serialization.

Network Transports

There are several ways to transport the messages between the clients and the services. Here are the most common ones.

HTTP(S). Until recently this was the obvious choice for a network platform. Ajax is based on HTTP requests and was at the origin of Web 2.0. However, HTTP follows a request/response model in which the client always initiates the exchange. While this is fine for applications that just query some information, it becomes problematic when an application can receive notifications at any time. Solutions like polling, long polling and BOSH exist for such use cases, but they are inefficient.

TCP. Using a direct TCP connection is the simplest option, yet a very efficient one. It provides a reliable stream connection between two peers and is supported by all operating systems. However, it cannot be used from an HTML application.
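Note that TCP delivers a byte stream, not messages: one send may arrive split across several reads, or merged with the next one, so any protocol on top of TCP must delimit its messages. XMPP relies on the XML stream itself for this; binary protocols commonly prefix each message with its length. A minimal Python sketch of length-prefixed framing, where the 4-byte big-endian prefix is an arbitrary choice:

```python
from __future__ import annotations

import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def frame(payload: bytes) -> bytes:
    """Prefix a message with its length so the receiver knows where it ends."""
    return HEADER.pack(len(payload)) + payload


class Deframer:
    """Rebuilds complete messages from arbitrarily split chunks of a TCP stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this message
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages


# Simulate a stream delivered in 3-byte pieces, as TCP is allowed to do.
stream = frame(b"light on") + frame(b"temp 21.5")
deframer = Deframer()
received = []
for i in range(0, len(stream), 3):
    received.extend(deframer.feed(stream[i:i + 3]))
print(received)  # [b'light on', b'temp 21.5']
```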

TLS. If encryption and/or authentication is needed, TLS can be added on top of TCP. Adding TLS support to existing TCP-based services can be very easy with proxy software such as stunnel.
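In application code, adding TLS on the client side mostly comes down to configuring certificate verification. A minimal sketch with Python's standard `ssl` module, where the host and port are placeholders; note that XMPP normally upgrades an existing TCP connection with STARTTLS rather than opening TLS directly as done here.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS settings: verify the server certificate and hostname."""
    context = ssl.create_default_context()            # loads the system CA store
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


def open_tls_connection(host: str, port: int, timeout: float = 5.0) -> ssl.SSLSocket:
    """Open a TCP connection and immediately wrap it in TLS."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return make_client_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


# Example (needs network access; host and port are placeholders):
# with open_tls_connection("example.org", 443) as tls:
#     print(tls.version())
```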

Unix sockets. If the communication is local to the machine, there is no need for a network transport. On a Unix-based operating system, Unix sockets are the way to go. Many IPC mechanisms are based on Unix sockets: DBUS and Wayland are two well-known ones. However, on an Android-based system you will most likely use the Binder instead of an IPC based on Unix sockets. Also, KDBUS may become a good choice on Linux once it is available.
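The peer-to-peer IPC setup mentioned at the end of this article (msgpack over Unix sockets) can be sketched with the Python standard library. Since msgpack is a third-party package, this example substitutes newline-delimited JSON; switching to msgpack would mean replacing the encode/decode calls and the newline framing. `socketpair()` is used so the example runs in a single process on a Unix-like system, whereas a real server would `bind()` to a filesystem path. The `handle` function is hypothetical.

```python
import json
import socket


def handle(request: dict) -> dict:
    """Hypothetical service logic running on the 'server' end."""
    return {"id": request["id"], "result": request["a"] + request["b"]}


def roundtrip(request: dict) -> dict:
    # socketpair() returns two already-connected AF_UNIX stream sockets.
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with client, server, client.makefile("rb") as client_in, server.makefile("rb") as server_in:
        # One JSON document per line keeps message boundaries on the stream.
        client.sendall(json.dumps(request).encode() + b"\n")
        incoming = json.loads(server_in.readline())
        server.sendall(json.dumps(handle(incoming)).encode() + b"\n")
        return json.loads(client_in.readline())


print(roundtrip({"id": 7, "a": 2, "b": 3}))  # {'id': 7, 'result': 5}
```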

Web sockets. Web sockets bring TCP-like connections to web browsers. For a long time they were quite hard to use, because the specification kept changing and they were supported almost only by Chrome. They are now supported by most web browsers, which makes them the only choice when you need efficient bidirectional communication in an HTML application.
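A web socket starts as an ordinary HTTP/1.1 request carrying `Upgrade: websocket` and a random `Sec-WebSocket-Key`; the server proves it speaks the protocol by returning a hash of that key, after which both sides exchange framed messages over the same TCP connection. The accept-key computation from RFC 6455 is small enough to show in Python; the key below is the sample value used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Value of the Sec-WebSocket-Accept header for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```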

Network Message serialization

Now let’s look at the main solutions for serializing messages on the wire.

XML. This was the format used by almost everybody in HTML apps 10 years ago; web services and SOAP are based on it. Its strength is the modularity it allows. However, being text-based and very verbose, it is inefficient to generate and parse.

JSON. JSON has replaced XML in many situations. It is also text-based but much lighter than XML, and most web APIs now use it.
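To make the verbosity difference concrete, the following Python sketch encodes the same hypothetical sensor event as XML elements and as compact JSON. The exact ratio depends on the schema (XML attributes would be shorter than child elements, for instance), so treat this as an illustration rather than a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sensor event used for the comparison.
event = {"sensor": "livingroom", "type": "temperature", "value": 21.5, "unit": "C"}


def to_xml(evt: dict) -> str:
    """One child element per field, every value serialized as text."""
    root = ET.Element("event")
    for key, value in evt.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_json(evt: dict) -> str:
    """Compact JSON without optional whitespace."""
    return json.dumps(evt, separators=(",", ":"))


print(len(to_xml(event)), to_xml(event))
print(len(to_json(event)), to_json(event))
```

Besides size, XML carries every value as text here, so the receiver needs a schema to know that `value` is a number, whereas JSON keeps numbers, strings and booleans distinct.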

Protobuf. I discovered protobuf while studying the Google Cast protocol. I found it really clean, not because of its binary format but because of the good tooling it provides. To me, the approach taken by protobuf is clearly the one to follow: define a good binary serialization format, and provide the tools needed to use it in as many environments as possible. A protobuf message is described in a dedicated IDL. A compiler then generates the code to serialize and parse such messages in C++, Java, or Python.
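To give an idea of what protobuf-generated code does on the wire, here is a hand-written Python sketch of its base-128 varint encoding and of a field key, which combines the field number with a wire type. In practice you would let the protobuf compiler generate this; the field below would correspond to something like `uint32 value = 1;` in a `.proto` file.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first,
    high bit set on every byte except the last (non-negative ints only)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    result = shift = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, index + 1
        shift += 7
    raise ValueError("truncated varint")


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Field key = (field_number << 3) | wire_type; wire type 0 means varint."""
    return encode_varint(field_number << 3 | 0) + encode_varint(value)


print(encode_varint(300).hex())         # ac02
print(encode_uint_field(1, 150).hex())  # 089601
```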

Msgpack. Last but not least, msgpack has all the benefits of protobuf but supports even more programming languages. It is the binary serialization format I currently use, because I know it will interoperate with software written in almost any programming language.
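Msgpack is compact because small values carry their type and length in a single byte. The Python sketch below hand-encodes a small subset of the msgpack specification to show this. It is not a replacement for a real library (in Python, the `msgpack` package's `packb` and `unpackb` handle the full format), and, unlike real encoders, it always uses 32 bits for integers above 127.

```python
import struct


def pack(obj) -> bytes:
    """Encode a small subset of MessagePack (format bytes from the msgpack spec)."""
    if obj is None:
        return b"\xc0"                                # nil
    if isinstance(obj, bool):                         # before int: bool subclasses int
        return b"\xc3" if obj else b"\xc2"            # true / false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                           # positive fixint: the value is the byte
    if isinstance(obj, int) and 0 <= obj <= 0xFFFFFFFF:
        return b"\xce" + struct.pack(">I", obj)       # uint 32 (real encoders pick the smallest width)
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)       # float 64
    if isinstance(obj, str) and len(obj.encode("utf-8")) <= 31:
        data = obj.encode("utf-8")
        return bytes([0xA0 | len(data)]) + data       # fixstr: 101xxxxx + bytes
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body        # fixmap
    raise TypeError(f"outside the subset handled by this sketch: {obj!r}")


print(pack({"t": 21}).hex())  # 81a17415
```

For comparison, the compact JSON form of the same map, `{"t":21}`, takes 8 bytes instead of 4.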

Binary message serialization is a hot topic these days. Databases now allow storing semi-structured data through binary JSON variants, and new binary message specifications appear regularly. Almost all of them compare themselves to protobuf, often claiming to be faster. For me the right choice is not the fastest one, but the one that lets me talk with as much other software as possible. If you reach a point where one of these binary formats is your main performance bottleneck, consider yourself lucky: you are probably processing billions of events per second on a very successful platform.

Existing protocols

XMPP. Originally known as Jabber, this protocol evolved from a chat application into a very generic, decentralized messaging protocol standardized by the IETF. The protocol is composed of a core that defines the base elements of communication, and many extensions (XEPs) that specify other features such as multi-user chat or peer-to-peer communication. XMPP is based on an XML stream flowing over a persistent connection. The transport is TCP, preferably secured with TLS. On top of that, BOSH can be used so that HTTP clients can connect to the server, and some servers also support web sockets. With XMPP, all communication happens between users, each identified by a Jabber ID (JID). Obviously, a user is not necessarily a human and can be a machine. Two users can communicate with each other only once they have agreed to be friends.
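A JID has the form `local@domain/resource`: the bare JID (`local@domain`) names an account, and the resource distinguishes each connected client or device of that account. The Python sketch below splits a JID without the normalization a real XMPP library applies, and builds a basic chat `<message/>` stanza; all addresses are made-up examples.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JID:
    local: Optional[str]
    domain: str
    resource: Optional[str]

    @classmethod
    def parse(cls, text: str) -> JID:
        """Split 'local@domain/resource' (no stringprep/PRECIS normalization)."""
        bare, slash, resource = text.partition("/")  # the resource may itself contain '/'
        local, at, domain = bare.partition("@")
        if not at:                                   # no localpart, e.g. a server JID
            local, domain = "", bare
        return cls(local or None, domain, resource if slash else None)

    @property
    def bare(self) -> str:
        return f"{self.local}@{self.domain}" if self.local else self.domain


def chat_message(sender: str, recipient: str, body: str) -> str:
    """Basic <message/> stanza. Its namespace (jabber:client) is inherited from the
    enclosing stream; 'from' is shown for clarity, the server stamps or checks it."""
    message = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    ET.SubElement(message, "body").text = body
    return ET.tostring(message, encoding="unicode")


jid = JID.parse("concentrator@home.example/livingroom")
print(jid.bare, jid.resource)  # concentrator@home.example livingroom
print(chat_message("phone@home.example/app", jid.bare + "/livingroom", "Temperature?"))
```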

zeromq (0mq). This protocol is more a toolbox than a full-featured messaging solution. It provides a set of APIs to associate message queues with a socket, and different distributed design patterns can be implemented by combining sockets and message queues. Zeromq is used in many projects, is very well documented, and has been ported to many programming languages. It can use TCP or Unix sockets as a transport layer, and some implementations also support web sockets.

WAMP. I discovered this protocol recently while looking for information about Msgpack. It implements RPC and publish/subscribe, with messages serialized either in JSON or in Msgpack. Client implementations are available for many languages, along with a few servers.
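WAMP messages are plain arrays whose first element is a message type code, which is why they map directly onto JSON or msgpack. A Python sketch building `CALL`, `PUBLISH` and `SUBSCRIBE` messages following the WAMP specification; the procedure and topic URIs are hypothetical, the request ids come from a simple counter, and session handling (HELLO/WELCOME) is omitted.

```python
import itertools
import json

# Message type codes defined by the WAMP specification.
PUBLISH, SUBSCRIBE, CALL = 16, 32, 48
_request_ids = itertools.count(1)  # simple per-session request counter


def call(procedure: str, *args) -> list:
    """[CALL, Request|id, Options|dict, Procedure|uri, Arguments|list]"""
    return [CALL, next(_request_ids), {}, procedure, list(args)]


def publish(topic: str, *args) -> list:
    """[PUBLISH, Request|id, Options|dict, Topic|uri, Arguments|list]"""
    return [PUBLISH, next(_request_ids), {}, topic, list(args)]


def subscribe(topic: str) -> list:
    """[SUBSCRIBE, Request|id, Options|dict, Topic|uri]"""
    return [SUBSCRIBE, next(_request_ids), {}, topic]


print(json.dumps(call("com.example.home.light.switch", True)))
# [48, 1, {}, "com.example.home.light.switch", [true]]
print(json.dumps(subscribe("com.example.home.temperature")))
print(json.dumps(publish("com.example.home.temperature", 21.5)))
```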

Thrift. Thrift was developed by Facebook and is now hosted by the Apache Software Foundation. It is a code generator that produces RPC code from an IDL. Several transport layers are supported, as well as many languages.

MQTT. MQTT was designed as a machine-to-machine publish/subscribe protocol. It is standardized by OASIS, and several server and client implementations are available for different languages. However, it is not suited to RPC.
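MQTT subscriptions use topic filters with two wildcards: `+` matches exactly one topic level and `#` matches any number of trailing levels, including the parent level itself. The Python sketch below follows the matching rules of the MQTT specification, including the rule that wildcards at the first level do not match topics starting with `$`; the topic names are made-up examples.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """True if an MQTT topic filter (with '+' and '#' wildcards) matches a topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Topics such as '$SYS/...' are not matched by filters starting with a wildcard.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for index, level in enumerate(filter_levels):
        if level == "#":
            return index == len(filter_levels) - 1  # '#' is only valid as the last level
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(filter_levels) == len(topic_levels)


print(topic_matches("home/+/temperature", "home/livingroom/temperature"))  # True
print(topic_matches("home/#", "home/kitchen/light"))                       # True
```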

The choice

Based on the requirements and the existing protocols, XMPP is the clear winner. It is the only protocol that provides all the required features. Moreover, there are many implementations (open source and commercial), so you can find one suited to your environment: Erlang, Java, C/C++, and even Lua. What made me look at alternatives is the fact that it is based on XML. Coming from an embedded development background, I find it quite hard to admit that an XML-based messaging protocol is the most suitable one! However, I cannot afford to spend time implementing my own home-made protocol with binary messages. For the moment I only need to route a bunch of messages per hour, so performance is certainly not an issue. Moreover, since big companies such as Google and Facebook have used it for several years, I still have some time before scalability becomes an issue. Another point to consider with XMPP is that, unless you use Openfire (edit: or ejabberd) as the server, the only way to use XMPP from an HTML application is via BOSH. BOSH is just not viable in real life on mobile connections: latency is way too high!

The key feature of XMPP that is lacking in all other available solutions is multi-user messaging. It makes it very easy to connect a mobile device to several concentrators without writing a dedicated routing service on the platform. Moreover, very few protocols support both RPC and pub/sub. Most protocols and software advertised for IoT only support pub/sub, which is not enough for my use cases.
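Both patterns map onto standard XMPP stanzas. RPC uses an `<iq type="set">` request that must be answered by an IQ of type `result` (or `error`) carrying the same `id`, and pub/sub is defined by XEP-0060. The Python sketch below only builds the stanzas; the payload namespaces, addresses and node names are hypothetical, and a real client would send them through an XMPP library.

```python
import itertools
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"  # namespace defined by XEP-0060
LIGHT_NS = "urn:example:light"                   # hypothetical payload namespace
SENSOR_NS = "urn:example:sensor"                 # hypothetical payload namespace
_ids = itertools.count(1)


def rpc_request(sender: str, to: str, on: bool) -> ET.Element:
    """IQ 'set' asking one specific device to switch its light."""
    iq = ET.Element("iq", {"type": "set", "id": f"rpc{next(_ids)}", "from": sender, "to": to})
    ET.SubElement(iq, "switch", {"xmlns": LIGHT_NS, "on": "true" if on else "false"})
    return iq


def rpc_result(request: ET.Element) -> ET.Element:
    """The answer reuses the request id and swaps sender and recipient."""
    return ET.Element("iq", {"type": "result", "id": request.get("id"),
                             "from": request.get("to"), "to": request.get("from")})


def publish_item(sender: str, service: str, node: str, value: float) -> ET.Element:
    """XEP-0060 publish request carrying one temperature reading."""
    iq = ET.Element("iq", {"type": "set", "id": f"pub{next(_ids)}", "from": sender, "to": service})
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    publish = ET.SubElement(pubsub, "publish", {"node": node})
    item = ET.SubElement(publish, "item")
    ET.SubElement(item, "temperature", {"xmlns": SENSOR_NS}).text = str(value)
    return iq


request = rpc_request("phone@home.example/app", "concentrator@home.example/hub", True)
print(ET.tostring(request, encoding="unicode"))
print(ET.tostring(rpc_result(request), encoding="unicode"))
print(ET.tostring(publish_item("concentrator@home.example/hub", "pubsub.home.example",
                               "livingroom/temperature", 21.5), encoding="unicode"))
```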

As a last note, keep in mind that this is my choice only for network communication between more than two users. When I need peer-to-peer IPC, I currently use msgpack over Unix sockets. When I need an IPC message bus, I use DBUS (on Linux) or the Binder (on Android). And if one day I want to implement a service running on a private network where only communication with the platform is needed, then zeromq or Thrift would probably be my choice.

I hope this tour of existing solutions helps other people select the protocol suited to their use case. Finding and evaluating them is time-consuming because there are so many technologies available... which is still better than having to implement everything yourself.