A first thing to keep in mind is that content tracing is not really a natural feature for messaging systems. Tracing content poses a challenge even for completely unencrypted messaging apps: the kind where service providers can see all data traveling through the system. Tracing back to a content originator requires that the provider be able to identify a file received at some end-user’s account, and then “chase” the content backwards through time, through each user account that forwarded it. Even this description doesn’t quite capture the difficulty of the problem: not every user literally hits a “forward” button to send content onward. Many will save and re-upload a file, which breaks the forwarding chain and can even produce a slightly different file, thanks to the magic of digital compression.
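To make that last point concrete, here’s a quick illustration of my own (not from any proposal): if a provider identifies files by their cryptographic hash, a single save-and-re-upload that re-encodes the image is enough to make the “same” meme look like a brand new file. The filename meme.jpg below is hypothetical.

```python
# A minimal sketch of why exact-match tracing breaks on re-upload:
# re-encoding an image changes its bytes, so any hash-based "same file"
# check fails. Assumes Pillow is installed; "meme.jpg" is hypothetical.
import hashlib
import io

from PIL import Image

original = open("meme.jpg", "rb").read()

# Simulate a save-and-re-upload: decode the JPEG, then re-encode it.
buf = io.BytesIO()
Image.open(io.BytesIO(original)).save(buf, format="JPEG", quality=85)
reuploaded = buf.getvalue()

print(hashlib.sha256(original).hexdigest())    # hash the provider saw first
print(hashlib.sha256(reuploaded).hexdigest())  # almost certainly different
```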
From the user’s side, E2EE systems are supposed to maintain the confidentiality of user communications. Confidentiality is a broad term and can mean a lot of things. In this case it has two specific flavors that are relevant, with names that I just made up now:
(Image caption: I wanted to illustrate this post with memes about the Swedish monarchy. Unfortunately, it turns out that Swedish monarchy memes basically suck.)
The confidentiality of the content itself: the secrecy of whatever is being shared. For the viral content we’re talking about here, a meme that has already passed through many hands, this flavor is mostly beside the point, since the content is hardly a secret anymore.
The confidentiality of “who sent what”: while the content itself may not be secret anymore, the fact that a given user transmitted a piece of content is still quite sensitive. If I send you a political meme (perhaps the one at right, poking fun at the King of Sweden) then I might not care very much about the secrecy of the meme itself. But I sure might want to hide the fact that I sent it to you, to avoid retribution by the pro-monarchy Swedish government. Proper end-to-end encryption is supposed to protect this sort of expression.
In short: traceability systems don’t really harm the confidentiality of the “content” itself, since viral memes themselves aren’t usually a big secret. What they do impact is the “who sent what” side of confidentiality. It is a fairly harmless thing for, say, the tyrannical Swedish government to learn that specific memes about the King of Sweden exist. It is very different for them to know that I’ve been sending a lot of them to a specific group of friends.
Finally I need to clarify one more thing, since discussions with colleagues have made me realize that it is not obvious. Information revealed about “who sent what” in an E2E system is not the same as metadata. I feel stupid having to point this out, but metadata (information about data that we can’t easily hide from providers, such as the list of contacts you’ve communicated with) is a very different thing. WhatsApp might inevitably learn that I texted 500 people last month because they delivered my (encrypted) messages. They still shouldn’t learn that any of my messages are making fun of the Swedish monarchy.
While confidentiality and traceability may seem like they’re in conflict, it’s important to point out that some forms of tracing can be implemented in a non-coercive way that does not inherently violate confidentiality. For example, imagine Alice originates a meme, and this meme subsequently makes its way to police officer Eve via the following forwarding path:
Alice → Bob → Charlie → Dave → Eve
Provided that Bob, Charlie and Dave are willing to cooperate with the police, then Eve can use shoe-leather detective work to trace the content backwards towards Alice. After all: each participant (1) is an authorized recipient of the data and (2) knows who they received the content from. Nobody is “breaking” E2E if they perform this sort of cooperative tracing: it’s just people sharing information they’re already entitled to have.
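To see how fragile this is, here’s a toy model of cooperative tracing (my own sketch, not any deployed system): the police walk the forwarding chain backwards, and the walk dies the moment it reaches someone who won’t, or can’t, say who sent them the content.

```python
# A toy model of cooperative tracing: each user may remember who
# forwarded them the content, and may or may not tell the police.
# All names and data structures here are illustrative assumptions.

# received_from[user] = who sent them the meme (None for the originator).
received_from = {"eve": "dave", "dave": "charlie",
                 "charlie": "bob", "bob": "alice", "alice": None}

# Whether each user will cooperate with a police request.
cooperative = {"eve": True, "dave": True, "charlie": True,
               "bob": True, "alice": True}

def trace_back(start: str) -> str | None:
    """Walk the chain backwards; return the originator, or None if blocked."""
    user = start
    while received_from.get(user) is not None:
        if not cooperative.get(user, False):
            return None  # dead end: this user won't (or can't) talk
        user = received_from[user]
    return user

print(trace_back("eve"))        # 'alice': everyone cooperated

cooperative["charlie"] = False  # a single uncooperative node...
print(trace_back("eve"))        # None: the trace dies at Charlie
```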
By passing these laws, police are tacitly admitting that voluntary content tracing is not sufficient to meet their investigative needs. This implies that when police try to follow a chain like the one shown above, they’re running into people who are either (1) unwilling to share this information in a timely way, or (2) simply don’t have the information anymore, perhaps because they deleted the messages or lost their phone.
(Image: a larger forwarding graph in which cooperative users appear as green circles and uncooperative users as red circles.)
The prevalence of uncooperative nodes in a graph like this makes it virtually impossible for cooperative tracing to find the originator. It seems obvious that real-world situations of this kind will make voluntary tracing very difficult to achieve.
This brings us to the central challenge of all content tracing proposals so far: to make tracing possible, a tracing system needs to turn every WhatsApp user into a cooperative green circle, regardless of whether users actually want to cooperate with police. Moreover, to guard against users going offline or losing their phones, the system will need to force users to place the necessary tracing information into escrow as soon as they receive content, so that it remains available even after users leave the network.
Both proposals take something like the following “strawman” approach:
Each time someone sends content to another user, they will generate some fresh encryption key K.
The sender will use K to encrypt a “tracing record” that identifies their own account and, if the content was forwarded, points back to the record for the message they received. This encrypted record is placed into escrow on WhatsApp’s servers as part of sending the message.
Later, when police bring a piece of content to WhatsApp, the provider can unlock the relevant escrow records and walk the chain backwards, recovering the identity of every account that forwarded the content, all the way back to the originator.
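Here is a minimal sketch of that flow in code. To be clear, this is my own illustration and not the actual construction from either proposal: the record format, the key handling, and the dictionary standing in for WhatsApp’s escrow database are all simplifying assumptions.

```python
# A hedged sketch of the strawman escrow flow described above; the real
# proposals differ in important details. Uses the "cryptography" package.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

ESCROW_DB = {}  # stand-in for WhatsApp's server-side escrow storage

def send_with_escrow(sender_id: str, prev_record_id: str, prev_key: bytes):
    """Sender side: escrow an encrypted tracing record, return (record_id, K).

    For original (non-forwarded) content, pass prev_record_id="" and
    prev_key=b"".
    """
    k = AESGCM.generate_key(bit_length=128)  # the fresh key K for this send
    nonce = os.urandom(12)

    # The tracing record: who sent this hop, plus a pointer to (and the
    # key for) the previous record, so a trace can walk backwards.
    record = f"{sender_id}|{prev_record_id}|{prev_key.hex()}".encode()

    record_id = os.urandom(8).hex()
    ESCROW_DB[record_id] = (nonce, AESGCM(k).encrypt(nonce, record, None))

    # In the strawman, (record_id, k) ride along inside the E2E message.
    return record_id, k

def trace(record_id: str, k: bytes) -> list:
    """Police side: given one hop's (record_id, K), walk back to the origin."""
    path = []
    while record_id:
        nonce, ct = ESCROW_DB[record_id]
        sender, record_id, key_hex = AESGCM(k).decrypt(nonce, ct, None).decode().split("|")
        path.append(sender)
        k = bytes.fromhex(key_hex)
    return path

# Alice originates a meme; Bob forwards it; the last hop is reported.
r1, k1 = send_with_escrow("alice", "", b"")
r2, k2 = send_with_escrow("bob", r1, k1)
print(trace(r2, k2))  # ['bob', 'alice']: every forwarder is exposed
```

Note what that final line reveals: not just the originator, but everyone in between.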
Frankly I’m not particularly ok with that. Even if I was willing to bend to the logic that we should have the means to trace “content originators” (the actual justification governments have offered for building systems like this one), I surely would not want to reveal every random user account that happened to forward the content. That information just seems like a recipe for oppression.
The approach I described above relies on a critical assumption: that all participants in the system are going to behave honestly. That is, everyone will run the official WhatsApp client, which will contain logic designed to store an escrow record on WhatsApp’s servers. Nobody will try to bypass this logic by running an unofficial client, or by hacking their client to disable the escrow behavior.
If you’re willing to make such a strong assumption, why bother building such a complicated system? Why not modify your WhatsApp client to add a small “watermark” to each fresh non-forwarded media file, a watermark that identifies the account ID of the original sender? (If you’re worried about confidentiality, you could encrypt this using a key held by WhatsApp.) After all: once you’ve assumed that everyone is running an honest client, you can assume that the content originator will be too — can’t you? This approach would still reveal a lot of information to police, but it wouldn’t reveal the identity of every random person who forwarded the content.
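For comparison, here’s a hedged sketch of that watermarking idea, again my own illustration rather than anyone’s actual design: the client tags fresh media with the originator’s account ID encrypted under a WhatsApp-held key, and forwarding simply carries the existing tag along unchanged.

```python
# A hedged sketch of the watermarking alternative: tag fresh
# (non-forwarded) media with the originator's account ID, encrypted
# under a provider-held key. The tag encoding is a toy assumption.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PROVIDER_KEY = AESGCM.generate_key(bit_length=128)  # held only by WhatsApp

def watermark_fresh_media(media: bytes, originator_id: str) -> bytes:
    """Client side: attach an encrypted originator tag to new media."""
    nonce = os.urandom(12)
    tag = AESGCM(PROVIDER_KEY).encrypt(nonce, originator_id.encode(), None)
    return media + b"||WM||" + nonce + tag

def identify_originator(watermarked: bytes) -> str:
    """Provider side: peel off the tag and decrypt it."""
    _, blob = watermarked.split(b"||WM||")
    nonce, tag = blob[:12], blob[12:]
    return AESGCM(PROVIDER_KEY).decrypt(nonce, tag, None).decode()

tagged = watermark_fresh_media(b"<meme bytes>", "alice")
# ...the tagged file is forwarded any number of times, unmodified...
print(identify_originator(tagged))  # 'alice', and nobody else
```

Unlike the escrow chain above, a trace here names the originator and no one who merely forwarded the content.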
I’m sure there is some convenient answer to this, but my suspicion is that once you try to explore the question deeply, the answer is going to make a hash of your very simple security assumptions. That’s why life is complicated.
If you read this far to answer the (rarified) question of how traceability could work and whether it breaks E2E encryption, then you can stop here. The rest of this post is not about that. It’s just me alienating a whole bunch of my academic peers.
Here is what I want to say to them.
The debate around key escrow and law enforcement surveillance is a very hard one. People have a lot of opinions about whether this work is “helping the good guys” or “helping the bad guys”, i.e., whether it’s about helping police find criminals, or whether it’s going to build the infrastructure for authoritarianism and spying. I don’t know the answer to this. I suppose the answer depends to some extent on the integrity of the government(s) implementing these systems. I have opinions, but I don’t expect all of my colleagues to share them.
What I would ask my colleagues to think hard about is the following:
When you propose (or review a paper that proposes) a new “lawful access” system, is it solving the hard problems, or is it punting on the hard problems and solving only the easy ones?
Because at the end of the day, building systems that violate the confidentiality of E2E encryption is a relatively easy problem from a scientific perspective. We’ve known how to build key escrow systems from the earliest days of encryption. Building these systems is not interesting, scientifically. It is useful from an engineering perspective, of course — to parties who want to deploy such systems. When respected academics write such papers, it is also politically useful to those same parties.
What is scientifically interesting is whether we can build systems that actually prevent abuse, either by governments that misuse the technology or by criminals who steal keys. We don’t really know how to do that very well right now. This is the actual scientific problem of designing law enforcement access systems, not the “access” part, which is relatively easy and mostly consists of engineering. In short, the scientific problem is figuring out how to prevent the wrong type of access.
When I read a paper that builds a sophisticated surveillance system, I expect it to address those abuse problems in a meaningful way. If the paper punts the important problems to subsequent work — if what I get is a paragraph like the one at right — my thinking is that you aren’t solving the right problem. You’re just laying the engineering groundwork for a world I don’t want my kids to live in. I would politely ask you all to stop doing that.
By Matthew Green in backdoors, August 1, 2021