This tracing capability has been a major issue in India, due to several cases where misinformation content has led to brutal mob attacks. The ostensible goal of the new legislation is to make it possible for police to track down those who originate or disseminate this content. Put simply, what the authorities say they want is a means to identify a piece of content (for example, a video or a meme) that has gone to a large group of people, and then trace the content back to the WhatsApp account that originally sent it.
I don’t plan to weigh in on whether this policy is a good idea or viable on the merits, nor is it in my wheelhouse to say whether the Indian government is being forthright in their reasons for demanding this capability. (I will express a very grave degree of skepticism that this approach will catch any criminal who is remotely careful and takes steps to cover their tracks.) In this post I mostly want to talk about the technology implications for encrypted messaging services, and what tracing features might mean for end-to-end encrypted systems like WhatsApp.
This turns out to be a big problem for encrypted communication systems like WhatsApp, in which end-to-end encryption protects the confidentiality of content even from the gaze of the service provider. In WhatsApp, all messages (as well as file attachments) are encrypted directly from sender to recipient, using an encryption key that WhatsApp doesn’t possess. With a few engineering caveats,* tracing content in these systems is very difficult.
But difficult is not the same thing as impossible. A recent post by WhatsApp makes the case that tracing is fundamentally impossible to implement securely in an end-to-end encrypted system. While this claim seems intuitively correct, it’s also kind of unsatisfying. After all, “impossible” is a strong word, and it’s highly dependent on which assumptions you’re making. The problem with imprecise claims is that they invite arguments — and indeed WhatsApp’s claim has already been disputed by some in the field.
The confidentiality of “who sent what”: while the content itself may not be secret, the fact that a given user transmitted a piece of content is still quite sensitive. If I send you a political meme — perhaps the one at right, poking fun at the King of Sweden — then I might not care very much about the secrecy of the meme itself. But I sure might want to hide the fact that I sent it, to avoid retribution by a totalitarian Swedish government.** Proper end-to-end encryption is supposed to protect this sort of expression.
In short: traceability can really screw with the “who sent what” side of content confidentiality. It is a fairly harmless thing for, say, the tyrannical Swedish government to learn that specific memes about the King of Sweden exist. It is very different for them to know that I’ve been sending a lot of them to a specific group of friends.
There are at least two proposals that I’ve seen for adding traceability to E2EE communications schemes, and both start from similar assumptions. They both rely on making changes to the end-users’ client software to ensure that tracing information is stored in “escrow” at the provider every time content is sent from one user to another.
One proposal is academic, and it takes something like the following “strawman” approach (a code sketch of these steps follows the list):

1. Whenever a user sends a piece of content to another user, their client will generate a random encryption key K.
2. They will use this key to encrypt a record that contains (A) the content (or a hash of it) and (B) the sender and receiver identities. They will store the encrypted record on WhatsApp’s servers as a kind of “key escrow.” Critically, at this point they will not send WhatsApp the key K.
3. The sender will transmit the record encryption key K to its recipient, using end-to-end encryption.
4. When the next user forwards the same content on to another user, they will repeat steps (1-3) and will also send all the keys generated by previous users.
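To make this strawman concrete, here is a minimal sketch of the sender-side logic in Python. Everything in it is my own illustration rather than WhatsApp’s code or the academic construction: the ESCROW_SERVER dictionary stands in for the provider’s storage, and AES-GCM is just one plausible choice of cipher.

```python
# Minimal sketch of the strawman's sender side. All names (ESCROW_SERVER,
# escrow_and_send) and the choice of AES-GCM are illustrative assumptions.
import os, json, hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

ESCROW_SERVER = {}  # stands in for WhatsApp's escrow storage: record_id -> ciphertext

def escrow_and_send(sender, receiver, content, prior_keys):
    """Steps 1-3 (plus step 4's key accumulation) for one hop in the chain."""
    k = AESGCM.generate_key(bit_length=128)              # step 1: fresh record key K
    record = json.dumps({
        "content_hash": hashlib.sha256(content).hexdigest(),
        "sender": sender,
        "receiver": receiver,
    }).encode()
    nonce = os.urandom(12)
    ciphertext = nonce + AESGCM(k).encrypt(nonce, record, None)  # step 2: encrypt the record
    record_id = hashlib.sha256(ciphertext).hexdigest()
    ESCROW_SERVER[record_id] = ciphertext                # step 2: escrow it; K is withheld
    # steps 3-4: K (and every earlier hop's key) travels to the recipient over E2EE
    return prior_keys + [(record_id, k)]

# Example: Alice originates content to Bob; Bob forwards it to Carol.
alice_to_bob = escrow_and_send("alice", "bob", b"meme bytes", [])
bob_to_carol = escrow_and_send("bob", "carol", b"meme bytes", alice_to_bob)
```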
Now if the police receive a copy of some viral content on an account they control, they will have a list of encryption keys that correspond to everyone in the forwarding chain for that content. They can just go back to WhatsApp with a subpoena, obtain the encrypted records, and use the chain of keys to decrypt them. This will reveal the entire forwarding path back to the originator.
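Continuing that sketch, the police side of the traceback might look something like this; again, the function name and record format are my own assumptions, not part of any real system.

```python
import json
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def trace_with_key_list(escrow_server, keys):
    """Decrypt every escrowed record named in the key list the police received
    along with the viral content (the records themselves come via subpoena)."""
    chain = []
    for record_id, k in keys:
        blob = escrow_server[record_id]
        nonce, ct = blob[:12], blob[12:]
        record = json.loads(AESGCM(k).decrypt(nonce, ct, None))
        chain.append((record["sender"], record["receiver"]))
    return chain  # forwarding path, originator's hop first

# e.g. trace_with_key_list(ESCROW_SERVER, bob_to_carol)
#   -> [("alice", "bob"), ("bob", "carol")]
```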
Of course sending thousands of keys along with each forwarded message is kind of a drag, so there are some efficiency optimizations one can use to compress this information. For example, each time a user forwards a message they can store the previous user’s encryption key inside the encrypted record they escrow with WhatsApp. That means if police get one key — corresponding to the last record in a chain — they can decrypt the escrow record, and then they will obtain the key for the previous record in the chain. They can repeat this process until the entire forwarding chain is “unzipped”.
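Here is a sketch of that “unzipping” variant. It assumes, unlike the simpler sketch above, that each escrowed record also carries a prev_record_id and prev_key field pointing at the hop before it; those field names are mine, chosen purely for illustration.

```python
import json
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def trace_by_unzipping(escrow_server, last_record_id, last_key):
    """Walk the chain backwards from a single (record_id, key) pair: each record,
    once decrypted, reveals the key for the previous hop's record."""
    chain = []
    record_id, k = last_record_id, last_key
    while record_id is not None:
        blob = escrow_server[record_id]                   # obtained via subpoena
        nonce, ct = blob[:12], blob[12:]
        record = json.loads(AESGCM(k).decrypt(nonce, ct, None))
        chain.append((record["sender"], record["receiver"]))
        record_id = record.get("prev_record_id")          # None once we reach the originator
        k = bytes.fromhex(record["prev_key"]) if record_id else None
    return list(reversed(chain))                          # originator's hop first
```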
A lazy diagram (at right) shows how this process might work with three participants. Essentially the whole thing is a form of key escrow, with WhatsApp acting as the escrow authority. If the police get included in any chain at all, they (Eve in this diagram) can subpoena WhatsApp to trace the chain back to the originator.
Of course, this is a very simple strawman explanation of the ideas: for a more fully-specified (academic) proposal, you can see this paper by Tyagi, Miers and Ristenpart. Not only does it support path traceback, but it also lets you figure out who else the message was forwarded to! The cryptography is a bit more optimized, but the security guarantees are roughly the same.
A second proposal by Dr V. Kamakoti of IIT Madras is far simpler: it essentially requires each person who originates new content into the network (as opposed to forwarding it) to attach a “watermark” to the content that identifies the account ID of the sender. This also assumes a trustworthy WhatsApp client, of course. Presumably that watermark could be encrypted using a key stored at WhatsApp, so this tracing will at least require the provider’s involvement.
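For illustration, a watermark along these lines might look like the following sketch. The encoding, the choice of AES-GCM, and the idea of binding the watermark to the content via associated data are all my assumptions; the proposal itself doesn’t pin these details down.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def attach_origin_watermark(content, sender_id, provider_key):
    """Client side: tag fresh (non-forwarded) content with the originator's
    account ID, encrypted under a key that only the provider holds."""
    nonce = os.urandom(12)
    # Using the content as associated data ties the watermark to this exact file.
    watermark = nonce + AESGCM(provider_key).encrypt(nonce, sender_id.encode(), content)
    return {"content": content, "origin_watermark": watermark.hex()}

def recover_originator(tagged, provider_key):
    """Provider side (e.g. under legal process): decrypt the watermark to learn
    which account first introduced the content into the network."""
    blob = bytes.fromhex(tagged["origin_watermark"])
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(provider_key).decrypt(nonce, ct, tagged["content"]).decode()
```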
So what’s wrong with these proposals? Well, if you’re ok with the fact that police can determine the identity of every single person who forwarded a piece of viral content, regardless of whether they originated it, then, I guess, nothing.
That’s the essence of what the Tyagi, Miers and Ristenpart proposal offers, and frankly I’m not particularly ok with it. Even if I accepted the logic that we should have the means to trace “content originators” — the actual justification governments have offered for building systems like this one — I surely would not want to reveal every random user account that happened to forward the content. That seems like a recipe for persecuting innocent people.
The approaches I describe above rely on a critical assumption: that all participants in the system are going to behave honestly — that is, everyone will run the official WhatsApp client, which will contain logic designed to store an escrow record on WhatsApp’s servers. Nobody will try to bypass this by running an unofficial client, or by hacking their client to disable the escrow logic.
If you’re willing to make such a strong assumption, why bother with the complicated Tyagi, Miers and Ristenpart proposal? Why not just use the Kamakoti proposal: modify your WhatsApp client to add a small “watermark” to each fresh non-forwarded media file. After all: once you’ve assumed that everyone is running an honest client, you can assume that the content originator will be too — can’t you? This approach would still reveal a lot of information to police, but it wouldn’t reveal the identity of every random person who forwarded the content.
My guess is that Tyagi, Miers and Ristenpart have an answer to this that boils down to something like “maybe you can’t trust the originator to run the correct client, but loads of other people will be running it.” To me this invites a much more detailed discussion about what security assumptions you’re making, and how “bad” the bad guys really are.
Notes: * Some messaging systems implement attachment forwarding by passing a pointer to an existing file that is stored on their servers. This is a nice storage optimization, since it avoids the need to make and store a full duplicate copy of each object whenever the user hits “forward”. The downside of this approach is that it makes tracing relatively easy for providers, since they can see exactly which users accessed a given file. Such optimizations are inimical to private systems and really should be avoided.
** All claims about the Swedish government are fictionalized.