I have been using the mobile application of my smart home system for several months, and my fear about latency has come true: connecting to the XMPP server from a mobile network can be very slow. Here is a first attempt to understand why, and how this could be improved.
To measure latency, something more accurate than "it is slow" is needed. As a reminder, here is the architecture of the system:
There are three systems involved, each one running on a different kind of network. The aim is to measure two things:
- The XMPP server login duration.
- The IQ request/answer duration.
Beyond measuring these times, the interesting information is how much time is spent on each part of the network. The measurements are done with two devices:
- A Wiko cink+ running Android 4.1.1.
- An iPhone 5C running iOS 8.4.
The application runs in Chrome on the Android device and in Safari on the iOS device. Tests are done in two locations in France: near La Défense, in an urban environment where a 4G network is available, and in the countryside, where only 3G is available. Both devices use the Orange network.
The login action only requires network access up to the XMPP server, whereas the IQ action sends and receives data all the way to the concentrator located in the home. Ideally I would like to measure the time spent on each part, but this is quite complex to do with simple tools. The first, and main, issue is that capturing network packets on the mobile network cannot be done without hacks: Wireshark and tcpdump are available only on rooted devices, and they seem to work only on Wi-Fi connections. This rules out any hope of accurate measurements on the mobile network.
However, since the app is web based, network timings and exchanges can be measured with WebKit's web inspector. Details on the setup are available on the Chrome developer site. This feature is available on Android Chrome, Android WebView and iOS Safari, and it makes it possible to watch the communication between the mobile app and the XMPP server.
Another, lower-level tool is available in Chrome via an internal URL: chrome://net-internals/#sockets. This tool monitors all sockets opened by Chrome and shows all TCP exchanges. At the top of the page, select view live socket; this displays the list of all open sockets. Clicking on a socket displays the details of its TCP or UDP session.
Estimating the Round Trip Count
The connection to the XMPP server involves the following steps:
- Resolve the domain name via DNS
- Establish the TCP connection to the server
- Secure the connection via TLS
- Upgrade the connection to WebSocket
- Establish the XMPP connection
Each of these steps requires one or more round trips (RTT) between the client and the server.
DNS: Resolving the IP address of a domain name takes a query message answered by a query response, so the DNS RTT count is 1. Already resolved addresses are cached locally: typically, the local DNS cache duration is 24 hours with Chrome on Mac OS X, so only one DNS query per day is done for a given domain name. An overview of the DNS protocol is available here.
TCP: The TCP three-way handshake consists of SYN, SYN/ACK and ACK packets. The TCP RTT count is therefore 1, since no answer is expected to the last ACK and the client can send data right after it. An overview of the handshake is available here.
TLS: The TLS handshake is more complex; an overview of it is available here. Once the TCP handshake is done, 4 TLS message flights are exchanged: 2 hello flights and 2 cipher negotiation flights. This is 2 RTT. An abbreviated TLS handshake is also possible when the client has previously connected to the server: in this case the previous session ID is used to establish the connection in 1 RTT.
WebSocket: If the connection uses plain TCP, the XMPP handshake can start right away. However, if WebSockets are used, the connection established to the HTTP server must be upgraded to a WebSocket connection. To do this, the client sends an HTTP WebSocket upgrade request and receives an upgrade answer. More information is available here. This is 1 RTT.
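The effect of the local cache on the DNS round trip can be illustrated with a minimal TTL cache. This is a sketch, not Chrome's host cache: the `TtlDnsCache` class, its `resolver` and `clock` parameters are hypothetical, the 24-hour TTL mirrors the duration mentioned above, and `socket.getaddrinfo` stands in for the real resolver.

```python
import socket
import time
from typing import Callable, Dict, List, Optional, Tuple


class TtlDnsCache:
    """Hypothetical sketch of a local DNS cache with a fixed time-to-live."""

    def __init__(self, ttl_s: float = 24 * 3600,
                 resolver: Optional[Callable[[str], List[str]]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.resolver = resolver or self._system_resolve
        self.clock = clock
        self.queries = 0  # real lookups performed (1 RTT each in the model above)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    @staticmethod
    def _system_resolve(host: str) -> List[str]:
        # getaddrinfo may itself hit an OS-level cache; acceptable for a sketch.
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})

    def resolve(self, host: str) -> List[str]:
        now = self.clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[0] < self.ttl_s:
            return entry[1]  # cache hit: no network round trip
        self.queries += 1  # miss or expired entry: one query/response exchange
        addresses = self.resolver(host)
        self._entries[host] = (now, addresses)
        return addresses
```

Only the first `resolve` call for a name within the TTL pays the DNS round trip; later calls return the stored addresses immediately.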
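Since the TCP handshake costs exactly one RTT before the client can send data, timing `connect()` on the client gives a rough RTT estimate without any packet capture. This is a minimal sketch, assuming the script can run on the network being measured (for instance on a laptop tethered to the phone). `measure_dns_and_tcp` is a hypothetical helper, and the timings include a little local processing overhead.

```python
import socket
import time
from typing import Dict


def measure_dns_and_tcp(host: str, port: int, timeout_s: float = 10.0) -> Dict[str, float]:
    """Time DNS resolution and the TCP three-way handshake separately (in ms)."""
    start = time.perf_counter()
    # DNS step: resolve first so that the connect timing below excludes it.
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    dns_done = time.perf_counter()

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout_s)
    try:
        # connect() returns once the SYN/ACK has arrived: about 1 RTT.
        sock.connect(address)
        tcp_done = time.perf_counter()
    finally:
        sock.close()

    return {
        "dns_ms": (dns_done - start) * 1000.0,
        "tcp_ms": (tcp_done - dns_done) * 1000.0,
    }
```

Repeating the call several times and keeping the minimum can filter out part of the mobile network jitter.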
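A client script can check whether the abbreviated handshake is actually used. The sketch below uses Python's `ssl` module and caps the protocol at TLS 1.2 so that it matches the handshake described above; TLS 1.3 has a different, 1-RTT full handshake. `make_tls12_context` and `timed_tls_handshake` are hypothetical helper names. Resumption only happens if the server keeps session state or issues session tickets.

```python
import socket
import ssl
import time
from typing import Optional, Tuple


def make_tls12_context() -> ssl.SSLContext:
    """Certificate-verifying client context limited to TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def timed_tls_handshake(host: str, port: int, ctx: ssl.SSLContext,
                        session: Optional[ssl.SSLSession] = None
                        ) -> Tuple[float, Optional[ssl.SSLSession], bool]:
    """Return (handshake time in ms, session for later reuse, whether it was resumed)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        start = time.perf_counter()
        # On an already connected socket, wrap_socket performs the handshake immediately.
        with ctx.wrap_socket(sock, server_hostname=host, session=session) as tls:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            return elapsed_ms, tls.session, tls.session_reused
```

Call it twice with the same context, passing the session returned by the first call into the second. If the server supports resumption, the second call should report `session_reused` as true and a shorter handshake time.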
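The upgrade round trip can be sketched as follows. The request layout and the `Sec-WebSocket-Accept` computation come from RFC 6455, and `xmpp` is the WebSocket subprotocol registered for XMPP by RFC 7395. The function names are illustrative, and the host and path passed to them are placeholders.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_websocket_key() -> str:
    """Random 16-byte nonce, base64-encoded, as required by RFC 6455."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> bytes:
    """HTTP/1.1 request asking the server to switch to WebSocket (1 RTT)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "Sec-WebSocket-Protocol: xmpp",  # subprotocol name from RFC 7395
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def expected_accept(key: str) -> str:
    """Value the server must return in Sec-WebSocket-Accept for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

The XMPP handshake can only start once the `101 Switching Protocols` answer with the matching accept value has arrived, which is why this step adds a full RTT.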
XMPP: The XMPP handshake involves multiple steps:
- Client sends stream header
- Server sends stream header + features
- Client sends authentication credentials
- Server acknowledges credentials
- Server sends a new stream header
- Client requests a resource
- Server acknowledges the resource
So an XMPP handshake should require only 3 RTT. However, Chrome's network monitoring shows that the login process requires 6 RTT on my system. This seems to be due to the authentication, which requires several exchanges between the client and the server.
BOSH: BOSH was specified before WebSockets were available. It aims to provide a bidirectional connection between an HTTP client and a server by using 2 long-polling connections. When using BOSH, each request/answer requires a new HTTPS request. This is really not efficient, since TCP and TLS handshakes are needed for each request. Latency is however limited, since the connection is established before the client has to send a request. Still, the RTT count with BOSH is much higher than with TCP/TLS or WebSockets.
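The 3-RTT estimate can be reproduced mechanically. Consecutive messages from the same side are grouped into one flight, and an RTT is counted each time a client flight has to wait for a server flight. The same rule gives 1 RTT for the TCP handshake, because the final ACK is never answered. `count_rtts` and the trace labels are illustrative. The trace is the step list above, where the server sends its acknowledgement and the new stream header back to back.

```python
from itertools import groupby
from typing import List, Tuple

Message = Tuple[str, str]  # (sender: "client" or "server", description)

XMPP_LOGIN: List[Message] = [
    ("client", "stream header"),
    ("server", "stream header + features"),
    ("client", "authentication credentials"),
    ("server", "credentials acknowledged"),
    ("server", "new stream header"),
    ("client", "resource request"),
    ("server", "resource acknowledged"),
]


def count_rtts(trace: List[Message]) -> int:
    """Count client flights that must wait for a server flight before continuing."""
    # Collapse consecutive messages from the same sender into a single flight.
    flights = [sender for sender, _ in groupby(trace, key=lambda m: m[0])]
    return sum(
        1
        for current, following in zip(flights, flights[1:])
        if current == "client" and following == "server"
    )
```

Running the same helper on a trace read from Chrome's network monitor could help locate the extra authentication exchanges that bring the login to 6 RTT.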
Round Trip Count
Based on the previous information, the total RTT count for the login can be considered to be 1 DNS + 1 TCP + 2 TLS + 1 WebSocket + 6 XMPP = 11.
Since an IQ is just a request/answer, its RTT count is obviously 1. However, the IQ has to go through the XMPP server, which routes it to the concentrator. The concentrator then sends the answer back to the XMPP server, which routes it to the test application. So the IQ goes through both the mobile and wired networks, each with a different latency. Since I want to measure only the mobile network latency, I need to separate the time spent on each network.
The RTT between the server and the concentrator is estimated by running the test application on the concentrator's wired network. In this case the application and the concentrator are on the same network, so their RTTs to the server are considered equal, and the RTT between the server and the concentrator is estimated as half the duration of the IQ.
The wired network IQ duration is measured on a laptop running the test application in Chrome, connected to the same router as the concentrator. The average IQ duration on this setup is 120 ms, so the RTT of this network is 120 / 2 = 60 ms, and 60 ms must be subtracted from the IQ durations measured on the mobile network.
The measurements from the test application running on the mobile network are displayed in a scatter matrix, which should make it easier to detect correlations between times and other parameters.
Only 25 measurements were made. This is too few to draw accurate conclusions, but they already show interesting results. First, login times are between 2 and 6 seconds, except for a few outliers. This means that the login time is always long: more than 3 seconds is an eternity when you just want to close the blinds. Second, almost all outliers occurred in the countryside, which may be due to a less reliable network there. The average login time with WebSocket is just over 3 seconds (excluding the outliers).
The IQ times are much more evenly distributed. There is only one outlier, and BOSH seems as efficient as WebSockets. There seem to be two clusters: one with times below 1 s and another with times between 1 and 3 s. The average IQ time is 968 ms, which means that 968 - 60 = 908 ms is spent on the mobile network, on average, to process an IQ.
Since the login needs 11 RTT, the 3-second average login time means that the average RTT of the login process is 3000 / 11 ≈ 273 ms. On the other hand, since the IQ needs only 1 RTT, the average RTT of an IQ is 908 ms. So the measurements show that an IQ RTT is more than three times longer than a login RTT! More investigation is needed to fully explain this, but:
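Writing the budget as a small function makes the effect of each optimization easy to see. The function and parameter names are hypothetical. The per-step values are the ones estimated above, including the 6 XMPP RTT measured on this system.

```python
def login_rtt_count(dns_cached: bool = False,
                    tls_resumed: bool = False,
                    use_websocket: bool = True,
                    xmpp_rtts: int = 6) -> int:
    """Total round trips needed before the XMPP session is usable."""
    dns = 0 if dns_cached else 1           # query/response, skipped on a cache hit
    tcp = 1                                # SYN, SYN/ACK (final ACK is not awaited)
    tls = 1 if tls_resumed else 2          # abbreviated vs full TLS handshake
    websocket = 1 if use_websocket else 0  # HTTP upgrade request/answer
    return dns + tcp + tls + websocket + xmpp_rtts
```

With a warm DNS cache, a resumed TLS session and a native TCP connection, the count drops from 11 to 8. The XMPP exchanges then account for 6 of the 8 remaining round trips.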
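The correction amounts to two lines of arithmetic. This is a minimal sketch that assumes, as stated above, that the wired test laptop and the concentrator have the same RTT to the server. The function names are illustrative.

```python
def server_concentrator_rtt_ms(wired_iq_ms: float) -> float:
    """On the wired setup the IQ crosses two equal legs: app-server and server-concentrator."""
    return wired_iq_ms / 2


def mobile_iq_share_ms(mobile_iq_ms: float, wired_iq_ms: float) -> float:
    """Time spent on the mobile leg once the server-concentrator leg is removed."""
    return mobile_iq_ms - server_concentrator_rtt_ms(wired_iq_ms)
```

With the 120 ms wired average and a 968 ms average mobile IQ time, `mobile_iq_share_ms(968, 120)` returns 908.0.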
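The text does not say how the outliers were identified. One common option, shown here purely as an assumption, is Tukey's rule: values more than 1.5 interquartile ranges below the first quartile or above the third quartile are excluded before averaging. `mean_without_outliers` is a hypothetical helper.

```python
import statistics
from typing import List, Sequence, Tuple


def mean_without_outliers(values: Sequence[float],
                          k: float = 1.5) -> Tuple[float, List[float]]:
    """Return (mean of kept values, excluded values) using Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if not low <= v <= high]
    return statistics.mean(kept), outliers
```

Applying an explicit rule like this to the 25 login times would make "the average excluding the outliers" reproducible. Another rule, such as a fixed 6 s threshold, could give a slightly different average.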
- It is highly probable that latency is not constant on mobile networks, especially on 3G ones: the IQ/network plot of the matrix shows that values are much less spread on the 4G network than on HSDPA.
- The TCP congestion control algorithm may favor the login and penalize the IQ.
Now that a first set of measurements is available, investigations and experiments can be done to improve the situation. The most important point is to remove the outliers, since they completely break the user experience. Avoiding BOSH should help a lot. Then I will try a native XMPP client: this will remove the WebSocket RTT, and the socket and sending behavior can be tweaked more easily. Tests should also be done on the mobile networks of other carriers to see whether there are differences.
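Both points above can be quantified with a little code. Dividing each duration by its RTT count gives the per-RTT comparison. Grouping IQ times by network type and comparing standard deviations measures the difference in spread. This is a sketch: `per_rtt_ms`, `spread_by_network` and the (network, IQ time in ms) record layout are hypothetical. Only the 3000 ms, 11 RTT and 908 ms figures come from the text.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def per_rtt_ms(duration_ms: float, rtt_count: int) -> float:
    """Average time per round trip, assuming every RTT costs the same."""
    return duration_ms / rtt_count


def spread_by_network(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Population standard deviation of IQ times (ms) for each network type."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for network, iq_ms in samples:
        groups[network].append(iq_ms)
    return {network: statistics.pstdev(times) for network, times in groups.items()}


# Averages from the text: 3000 ms login (11 RTT), 908 ms mobile IQ share (1 RTT).
login_rtt = per_rtt_ms(3000, 11)  # about 272.7 ms
iq_rtt = per_rtt_ms(908, 1)       # 908.0 ms
ratio = iq_rtt / login_rtt        # about 3.3
```

A clearly larger standard deviation on HSDPA than on 4G would support the first hypothesis. With only 25 measurements, the comparison remains indicative.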