There is always a Hackathon on the Saturday and Sunday of an IETF meeting. This time, at IETF-98, members of the MAMI consortium worked on an implementation of the “connection identifier” in DTLS, a fairly recent proposal that Hannes Tschofenig, Nikos Mavrogiannopoulos, and Thomas Fossati brought to the TLS working group.
The problem this proposal addresses is that an end-to-end DTLS session may silently die because an on-path Network Address Translation (NAT) middlebox dropped its state after a (relatively) short period of quiescence. This is a known problem with UDP, the transport DTLS runs over: in-network state for UDP traffic tends to vanish much more quickly than it does for TCP. As an example, the default timeout settings in the latest Linux kernel are 5 days for TCP and only 3 minutes for UDP. That is a ratio of 2400 to 1, or more than three orders of magnitude!
Obviously, there is a good reason for that: since UDP is connectionless, layer 4 devices have no way to track a “connection” other than deep inspection of the flow, which is a pretty expensive activity. So it’s simpler for the middlebox to put the onus of proving that a given UDP 4-tuple has an associated connection on the endpoints, by forcing them to regularly move bytes across. This state of affairs is clearly far from ideal for DTLS because, when a timeout happens in the NAT box, the victim endpoints need to negotiate a new crypto session context.
While this is generally annoying, it becomes even nastier when the client is a resource-constrained, battery-operated IoT device that wakes up from its sleep cycle only to find its session doesn’t work anymore… A second nuisance is that the state dropped on the NAT box immediately becomes dead state on the server, consuming precious resources in vain. One popular workaround is to create synthetic “keep-alive” traffic, for example using the TLS heartbeat extension. This technique is a) not very robust (choosing the right keep-alive interval depends on many external factors), and b) certainly not affordable for small devices, where waking up the “thing” to keep the NAT binding happy has the potential to quickly drain its battery.
Our solution: provide a connection identifier! We propose to add a 32-bit blob that does one very simple thing: it decouples the DTLS session from the underlying 4-tuple, making it possible for the endpoints to dispatch incoming UDP traffic to the correct crypto session independently of any change in the underlying UDP address.
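To make the decoupling concrete, here is a minimal sketch of a receiver that looks up sessions by connection identifier instead of by source address. The names `Session` and `Dispatcher` are hypothetical and have nothing to do with the mbedTLS code we wrote at the Hackathon; the sketch also assumes the identifier has already been parsed out of the incoming record.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, UDP port) of the peer


@dataclass
class Session:
    cid: int    # hypothetical 32-bit connection identifier
    peer: Addr  # last address the peer was seen at


class Dispatcher:
    """Maps incoming records to sessions by identifier, not by 4-tuple."""

    def __init__(self) -> None:
        self.by_cid: Dict[int, Session] = {}

    def add(self, session: Session) -> None:
        self.by_cid[session.cid] = session

    def dispatch(self, cid: int, src: Addr) -> Optional[Session]:
        session = self.by_cid.get(cid)
        if session is None:
            return None  # unknown identifier: no session to resume
        # Assumption: a real stack would update the peer address only
        # after the record has passed its integrity check.
        session.peer = src
        return session
```

If the NAT rebinds the client to a new source port, `dispatch` still finds the same session because the key is the identifier, and the session simply records the new address.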
Sounds simple, right? Yes, conceptually, apart from the birthday paradox hitting hard at large scales (see https://tools.ietf.org/html/draft-mavrogiannopoulos-tls-cid-00#section-4). What we discovered at the Hackathon is that backporting the identifier to an existing stack (ARM’s mbedTLS) can be more difficult than expected if one needs to maintain API compatibility.
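To get a feel for the birthday problem, the following back-of-the-envelope sketch uses the standard approximation p ≈ 1 − exp(−n(n−1)/2N) for the chance that at least two of n sessions share an identifier. It assumes identifiers are drawn uniformly at random from the 2^32 possible values, which is our simplification here and not necessarily how the draft allocates them; the function name is made up for the example.

```python
import math

CID_BITS = 32  # identifier width proposed in the post


def collision_probability(n_sessions: int, bits: int = CID_BITS) -> float:
    """Approximate probability that at least two of n_sessions share an
    identifier, assuming uniformly random identifiers (an assumption)."""
    space = 2 ** bits
    exponent = n_sessions * (n_sessions - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for small x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (1_000, 10_000, 77_163, 1_000_000):
        print(f"{n:>9} sessions: p = {collision_probability(n):.6f}")
```

With these assumptions the probability crosses one half at roughly 77 thousand concurrent sessions, which is not much for a large server.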
Another practical complication is signaling the wire format change to the receiving endpoint so that it can parse the incoming frame correctly. This is easy when you can make breaking changes (for example, when transitioning from 1.2 to 1.3). It is much less so if you have to maintain backwards compatibility with existing, deployed versions of the protocol. We have been discussing a couple of possible solutions to this issue: using the Version field, or moving to an extensible record layer format. We are still undecided on what “the right” approach would be.
Note that there are interesting privacy implications related to using a visible and potentially long-term identifier, due to the obvious linkability properties of such a construct. Even if not all the use cases are problematic in this respect, some of them are, and thus we designed (but have not yet implemented) a privacy-friendly connection identifier based on HMAC-based One-time Password (HOTP), which can be rotated at the client’s will at any point in time.
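For readers who have not met HOTP before, here is a small sketch of the RFC 4226 primitive itself: HMAC-SHA-1 over an 8-byte counter, followed by dynamic truncation to a 31-bit value. This is not our design, which is not spelled out in this post; it only illustrates why two endpoints that share a secret and a counter can both compute the next value while an observer cannot easily link consecutive ones. The name `hotp_value` is made up for the example.

```python
import hashlib
import hmac
import struct


def hotp_value(secret: bytes, counter: int) -> int:
    """RFC 4226 HOTP before the decimal reduction: a 31-bit integer."""
    # HMAC-SHA-1 over the counter encoded as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset.
    offset = mac[-1] & 0x0F
    (word,) = struct.unpack(">I", mac[offset:offset + 4])
    return word & 0x7FFFFFFF  # clear the top bit, as the RFC specifies


if __name__ == "__main__":
    secret = b"12345678901234567890"  # test secret from RFC 4226, Appendix D
    for counter in range(3):
        print(counter, hex(hotp_value(secret, counter)))
```

How such a value would be mapped onto the identifier carried on the wire, and how the two sides keep their counters in sync, are exactly the design questions this sketch leaves open.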
Enough with the babbling! If you read up to this point you will be glad to know the hackathon judges rewarded our herculean effort with the first prize. Yay!