The associated paper summarises the information revealed by Signal succinctly:
The Signal messenger is primarily focused on user privacy, and thus exposes almost no information about users through the contact discovery service. The only information available about registered users is their ability to receive voice and video calls. It is also possible to retrieve the encrypted profile picture of registered users through a separate API call, if they have set one. However, user name and avatar can only be decrypted if the user has consented to this explicitly for the user requesting the information and has exchanged at least one message with them.
So Signal comes out excellently from this, yet is mentioned in the title. However, the paper does find that Telegram reveals to the world, in real time, exactly how many Telegram users have a particular phone number in their address book...
Can we change the title from the (click-baiting) university press release to one which more accurately reflects the content of the paper?
For Telegram, the researchers found that its contact discovery service exposes sensitive information even about owners of phone numbers who are not registered with the service.
For Signal, TFA makes it clear that correlation defeats Signal's privacy measures:
Interestingly, 40% of Signal users, which can be assumed to be more privacy concerned in general, are also using WhatsApp, and every other of those Signal users has a public profile picture on WhatsApp. Tracking such data over time enables attackers to build accurate behavior models. When the data is matched across social networks and public data sources, third parties can also build detailed profiles, for example to scam users.
More privacy-concerned messengers like Signal transfer only short cryptographic hash values of phone numbers or rely on trusted hardware.
However, the research team shows that with new and optimized attack strategies, the low entropy of phone numbers enables attackers to deduce corresponding phone numbers from cryptographic hashes within milliseconds.
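The attack the paper describes can be sketched in a few lines. This is a toy (unsalted SHA-256 over a 5-digit suffix space is an assumption for illustration; the paper's attack covers full number plans with optimized hardware), but it shows why the low entropy of phone numbers is fatal:

```python
import hashlib

def hash_number(number):
    # Naive, unsalted SHA-256: the simplest thing a service might do
    return hashlib.sha256(number.encode()).hexdigest()

def reverse_hash(target, prefix, digits):
    # Enumerate every possible suffix under the prefix; the space is tiny
    for n in range(10 ** digits):
        candidate = prefix + str(n).zfill(digits)
        if hash_number(candidate) == target:
            return candidate
    return None

# "Attacker" reversing a leaked hash, knowing only the number prefix
leaked = hash_number("+4915101234")
recovered = reverse_hash(leaked, "+49151", 5)
print(recovered)  # +4915101234
```

Even in pure Python this sweep of 100,000 candidates finishes in well under a second; with GPUs and the real (still small) number space, the paper's millisecond figure is plausible.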
It is hard to say how Signal can improve upon these attacks other than to not use phone numbers at all.
I don't think I understand how this is not circular reasoning (can't use UUIDs in place of phone numbers because contact list is comprised of phone numbers instead of UUIDs.) If contacts are not phone numbers, then is there a problem with them living on Signal's servers? Are we back to the complaint about discovery being difficult?
Signal uses phone numbers because it makes discovery easy. Threema, for example, can use phone numbers for discovery but does not require it. Discovery without phone numbers is easy. I see my contacts and scan their Threema QR codes. If I need to contact a friend of a friend, my friend gives me the FoaF's Threema ID.
Because your contact list is something you should back up somewhere (CardDAV, Google, ...), and this is the expected place for all your contact information.
Signal would need to store a second contact list if it was not using the phone contacts. And suddenly you need to back up this second contact list. If every app does that you can forget about the user backing up everything; they simply won't do it and the feature becomes useless. The solution would be for Signal to store it on their server, obviously encrypted. But then you have different privacy issues to take care of: how can you retrieve a user's contacts without storing their identity? How do you hide the number of contacts they have...
so signal claims to protect my messages yet denies me privacy by insisting on making my contact list public where every other app can see it, just because they believe that most users are too dumb to back up their contacts?
every chat application that i have stores its own contact list. in fact i don't even have any contacts in my general phone contact list, because i don't call or send sms to people. and i don't want any chat contacts in my phone contact list.
i have not tried signal yet, mainly because it is not available on f-droid. but if signal insists on storing its contacts in my general phone list then i won't be able to use it. and that's ignoring the problem with using phone numbers.
there is no technical problem to store contacts locally. deltachat does that too. deltachat also provides a backup feature to export the local data including contacts and messages so you can restore them on another device.
there is no reason, signal couldn't do the same.
i don't know why this is so unusual. we are having this same argument every time signal's use of phone contacts is brought up. and every time the same claims are being made.
But if Signal only used the phone's contact list, and only stored it locally, and if a user independently backed up her contact list, wouldn't that mean in the case of phone loss, Signal could rebuild its contact list once the user restored her contacts to the new phone? Am I missing something?
I wonder, could it be something like how Diffie-Hellman allows a watertight TLS connection to form without a shared secret? In that case you could base your session on some random hash derived from some kind of passphrase, which could be provided later to identify the session.
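For what it's worth, the Diffie-Hellman trick being gestured at looks like this. A toy sketch with a demo-sized prime (real deployments use 2048-bit groups or elliptic curves; the P and G below are illustrative assumptions, not secure parameters):

```python
import secrets

# Demo parameters: a small public prime and generator. Far too small
# for real use; purely to show the mechanics.
P = 0xFFFFFFFB  # 2**32 - 5, a prime
G = 5

a = secrets.randbelow(P - 2) + 1   # Alice's private value
b = secrets.randbelow(P - 2) + 1   # Bob's private value

A = pow(G, a, P)  # Alice sends A over the open channel
B = pow(G, b, P)  # Bob sends B over the open channel

# Both sides compute the same secret, which was never transmitted
shared_alice = pow(B, a, P)
shared_bob = pow(A, b, P)
assert shared_alice == shared_bob
```

The key property: an eavesdropper sees G, P, A, and B but cannot feasibly derive the shared secret, so no prior shared secret is needed.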
Telegram treats every single person on the contact list as your buddy and advertises it when they sign up by default.
e.g. If you had stored a plumber's number 10 years ago, you'll receive a notification telling you that the plumber is on Telegram now. Of course likewise, if you start using Telegram today, everyone who has your contact and uses Telegram will receive the notification; be prepared for some awkward conversations with people whom you have forgotten.
•Telegram's latency seems to be low compared to WhatsApp (although part of that could be optimised code, data center proximity should account for more; if so, how can a supposed renegade group of techies with no revenue afford better data center facilities than their billion-dollar competitors?).
•Their feature update notifications seem to create a sense of a consumer-focused entity compared to the competitors.
•The bot API has made the platform more extensible than others (Messenger restricted several features of their API after the privacy fiasco).
That's all. I don't buy the argument of security being Telegram's USP, and marketing it as one seems disingenuous at best and malicious at worst IMO.
I have spent a great deal of time thinking about contact discovery and how to make it private or infeasible to do at scale.
If a service X knows the mapping between a user id and some useful info it can display (eg the name or photo) then whatever you do to get that user id, you can then display that useful info if it would be shown to any user of the service. Such as Facebook showing the profile pic and name (that’s why the real names policy is DUMB for privacy). So people resort to effectively usernames. This means you can id the user across sites and then later try to scrape info associated with that username across sites.
The solution is to remove all info, including usernames, unless the person has shared it with you (eg friended you and shared some info like a username with friends). Most of us on forums don't give a crap who is answering, just their reputation. For strangers, why have avatars or usernames at all? Why have anything?
Otherwise you will have to rate limit scrapers and stuff like that, playing a cat and mouse game against sybil accounts.
I think Signal's necessity of tying the user to the phone number is where it could be improved. Signal could take a lesson from Wireapp which allows setting up a pseudonym so when connecting to another user I do not automatically have to share my phone number with them. ... e.g. if I want to make my phone number not be a problem, then I need a burner (or get burned). which is another step for the user. Depends on the threat-model if this is an issue I suppose.
Telegram has so far never had an independent audit of its crypto, or maybe I'm wrong?
I'm pretty sure Moxie knew the upsides and downsides of using telephone numbers, and explicitly chose the alternative that maximised network effects over the one that maximised privacy. I suspect now there's (probably?) a large enough Signal network to self-sustain, and they can (and are) considering allowing non-phone-number users to exist on their platform.
I'm not sure they made the compromises and decisions the way I would have preferred them, but their e2e secure messenger platform is way more ubiquitous than mine (which I never wrote), so in spite of that, I reckon they've done more to "make the world a better place" than I have...
(I do still get mad every time Signal tells me "Some random or friend whose phone number you saved sometime in the last decade or so is now using Signal!" I'm 99% certain none of those people knew I was going to see that message when they installed/configured their "super private e2e encrypted messenger app!!!")
This is the exact sort of thing that allows people to think that things like Telegram are acceptable equivalents to Signal instead of disastrously poor imitators. It's a shame the discourse around secure messengers has become so polluted.
In the paper they were still able to cover 100% of US numbers for Signal and discover all of its users, but less than 0.02% for Telegram, discovering only 908 of its users due to simple rate limits, so how is Signal better at this exactly? On top of that, the paper purposely chose unrealistic threat models and assumptions about privacy, as if letting other people know your phone number is somehow acceptable for privacy in the first place (it isn't and never was).
How is user discovery of Telegram at 0.02% worse than Signal at 100%? It isn't like Signal's could possibly get any higher, and Telegram's couldn't get much lower. People who know what they are talking about have been critical of Signal's use of phone numbers since the start, but Signal has always brushed it off as irrelevant.
This is basically a question of "should e2ee services allow users to auto-discover/discover each other or not?"
WhatsApp just has a plaintext metadata mapping of the global social graph and each user's social graph. Signal has every user upload their address book into a secure enclave so that they can at least somewhat plausibly resist a subpoena for a user's social graph. This does not stop a determined attacker from making a list of all phone numbers/usernames on the service and discovering who is using the service (i.e. an individual's social graph is hidden, but the graph of all users is discoverable).
I don't think I've ever seen Signal say this, so this opinion is mine and not theirs, but I don't think Signal can actually protect who uses the service, only what they say on the service and who is in their social graph. A determined attacker, even if they didn't have this address book lookup tool, could correlate IP logs and learn a lot if they had an omniscient view of user traffic.
The core question is this: should e2ee systems have any user growth/discovery tools or not? On some level the real question is "does Signal need to grow at all?".
I think the answer is "yes" but that's not particularly grounded in any dogma other than people want to work on growing products.
In summary, I don't think Signal hides who uses the service, only what their users say on the service (and who is in whose address book). In this way Signal conceals each user's individual social graph but not the total social graph of who is using the service.
Hm. It's weird to dump growth and discovery into one pot, unless growth has a specific meaning there. You need to be able to add new users to a service... otherwise it's kind of empty, which defeats the point of a messaging service.
Discovery I agree is a trade-off of security vs usability / service attractiveness.
I'm fairly certain you could create something very hard to attack if you assume 2 users meet in person once to exchange long identifiers and keys via QR-code scans. Add in burner phones and/or public Wi-Fi, and tracking stops being feasible; you'd rather follow the human.
However, that'd be very inconvenient and maybe impossible for many - and you'd be back at empty. And possible to target entirely.
Signal has always asked for contact reading permission (I'm 99% certain), but it still does not do a good enough job of telling you "Once you grant this permission, we will notify every other Signal user in your contacts list that you are also now a Signal user".
I still get pissed when it does that to friends of mine (and less pissed, but still unhappy, when it does it to co-workers, ex-colleagues, work clients, government employees, taxi drivers, pizza deliverers, and all the other random numbers my phone has saved in its contact list over the last decade or so...)
When I first installed it however many years ago that was, I nearly stopped because the first thing it did was ask for access to my contacts. I can't see that request for contacts access changing, so you have to be not remembering it. Then again, I'm on iPhone, so I can't speak to what Android does/doesn't do.
The fact that users can become confused about something doesn't affect whether it's a fact or not. Insisting upon truths that contradict the facts works just fine, except for the practical implications, which are stubbornly unchanged.
During WW2 there was tremendous innovation in the field of electronics and radio. Some way through the war, both sides began fitting relatively small radio transmitters to aircraft, enabling an equipped aircraft to actively transmit. So one obvious idea is to transmit "Hey, I'm friendly," and then you know not to send up interceptors.
So there's a nice switch on your bomber aeroplane that activates this fancy new "I'm friendly" transmitter, you are trained to switch it on as you return to base, and the chap fitting it seems damn sure it's important to switch it off when leaving. Which is odd right? I mean, it prevents getting shot down, stands to reason you'd turn it on all the time. And so, despite the urging of those who understood how it works, leaving it switched on was indeed common practice, and commanders would defend their crews for doing this, arguing that the perceived safety of the "Don't shoot me down" transmitter allowed them to press home attacks in conditions where it might otherwise be prudent to withdraw.
Which is funny because of course the reason to switch off the transmitter is that it's a free homing beacon for enemy fighters and anti-aircraft weapons, so in choosing to do this they were actually significantly increasing their danger of death.
I don't suppose anybody knows if the Android version of Signal back in the 2014-2016 period asked for contact permission (in a non-Android-mandated way)? I.e., does the post-2016 Android Signal app running on pre-Marshmallow Android versions?
Even if it does ask for permission to my contacts, I don't like the use case of it importing all of my contacts. It's all-or-nothing, and if I choose no then the app may not install. Again, I don't really remember giving the app permission to my contacts, so maybe it's my bad, but I would have liked to have been able to choose which contacts to import and not just everyone. I remember this happened after an update, and to continue chatting with my friends, I'm not sure I had much of a choice but to give it access to my contacts.
I don't really trust Signal all that much, but my friends seem to. It was founded by "Moxie Marlinspike", a guy with a made-up name, whom Twitter hired to fix their security issues; it looks like they wasted their money on that, so I don't have the most confidence in "Moxie" to really keep my chats private.
OTR solves auto-discover without any leaky central database by adding small amounts of coded whitespace to plain text messages you send which the other side recognizes. The far side can then initiate the OTR handshake.
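The whitespace-tag idea can be sketched like so. Note the marker below is a made-up placeholder, not the actual byte sequence the OTR spec defines:

```python
# Placeholder marker; real OTR defines a specific sequence of spaces
# and tabs for this purpose, with per-version variants.
TAG = " \t" * 8

def tag_message(plaintext):
    # Append the invisible marker to an outgoing plaintext message
    return plaintext + TAG

def wants_otr(received):
    # An OTR-capable client notices the marker and can start the handshake;
    # a plain client just renders harmless trailing whitespace
    return received.endswith(TAG)

print(wants_otr(tag_message("see you at 6")))  # True
print(wants_otr("see you at 6"))               # False
```

The appeal is that capability discovery piggybacks on messages you were sending anyway, so no central directory ever learns who supports encryption.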
I think it's important to put this into context. They're stating that a malicious user could crawl public info of other users, thereby building (over time) a behavioral model of those users. The theory that you could protect users from that by hashing phone numbers and using the hash for contact discovery, turns out not to be accurate, because there are few enough phone numbers in existence that you can just brute force the hash.
I do think it's important for people using these kinds of services (and I'm one of them!) to understand their limitations, but I also kinda find this a bit self-evident, if you think about how contact discovery works. There's simply no way around it (unless you stop using phone numbers to exchange contacts). So in the sense that studies like these help educate non-technical users of the technical limitations of services, this is great!
However, to say they "threaten privacy"... that feels like a gross mischaracterization of what's going on here. Every social technology site, app, etc. has this problem, and it's something that could be, to an extent, mitigated (detection of scanning attempts, rate limiting, etc.). Meanwhile, these are the apps that are bringing E2EE to the masses. It feels like missing the forest for the trees.
«It should be possible for privacy-concerned users to provide another form of identifier (e.g. a username or email address, as is the standard for social networks) instead of their phone number. This increases the search space for an attacker and also improves resistance of hashes against reversal. Especially random or user-chosen identifiers with high entropy would offer better protection.»
Threema does this. By default users get an 8-character random identifier. Linking a phone number and/or e-mail address is optional. This way, users can choose their own balance between the usability of contact discovery and the privacy of random identifiers.
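A sketch of the Threema-style scheme (the 8-character uppercase-alphanumeric format matches Threema's public IDs, but treat the details as an assumption). The raw search space isn't hugely larger than the mobile number space; the real win is that assigned IDs are a sparse random subset of it, so an attacker can't cheaply enumerate the valid ones the way they can with densely allocated phone number plans:

```python
import math
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols

def random_id(length=8):
    # Draw each character uniformly at random from a CSPRNG
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

id_space = 36 ** 8        # ~2.8 trillion possible identifiers
mobile_numbers = 700e9    # the paper's estimate of plausible mobile numbers

print(random_id())  # e.g. an ID like "K7Q2M9XA"
print(f"{math.log2(id_space):.1f} bits vs {math.log2(mobile_numbers):.1f} bits")
```

Only a few million of those ~2.8 trillion IDs are ever assigned, and nothing about the allocation is guessable, which is exactly what phone numbers lack.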
All the other techniques are mainly making it harder for attackers, but not impossible. If a user on a 5 year old phone should be able to sync an address book of 2000 contacts in reasonable time, then the calculation of hashes cannot be made all too computationally intensive (e.g. by using intentionally expensive derivation functions like scrypt or argon2). The asymmetry between the weak hardware of a consumer phone and the abundant computation power of a cluster is what makes fighting brute force attacks so difficult.
Granted, the proposed incremental contact discovery using leaky buckets is quite an interesting form of rate limiting. It also has a cost though, namely increased complexity, and thus an increased chance for bugs / malfunction (hurting the user experience) and vulnerabilities (hurting security).
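A minimal sketch of what a leaky-bucket limiter on contact discovery could look like (the capacity and drain rate below are made-up parameters, not the paper's):

```python
import time

class LeakyBucket:
    """Each client gets a bucket that drains at a slow fixed rate;
    a lookup is only allowed while the bucket has room, so a normal
    address-book sync fits but bulk crawling stalls."""

    def __init__(self, capacity, leak_per_sec):
        self.capacity = capacity
        self.leak_per_sec = leak_per_sec
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # Drain whatever leaked out since the last request
        self.level = max(0.0, self.level - (now - self.last) * self.leak_per_sec)
        self.last = now
        if self.level + cost <= self.capacity:
            self.level += cost
            return True
        return False

# Assumed parameters: room for a 5000-contact sync, then a trickle
bucket = LeakyBucket(capacity=5000, leak_per_sec=0.1)
allowed = sum(bucket.allow() for _ in range(20000))
print(allowed)  # roughly the capacity; the rest of the crawl is refused
```

A crawler that wants millions of lookups now has to wait out the drain rate, which turns an hours-long enumeration into years.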
Contact discovery is a difficult balancing act.
(One last comment: While private contact discovery is a difficult problem, securing profile information isn't. The fact that I can grab the public profile picture / information and online status of almost any WhatsApp or Telegram user is inexcusable. Giving the users control over access permissions is easy. Signal does this by encrypting the profile and sharing the key. Threema does this by sharing profile information only using end-to-end encrypted messages, without servers being involved for storage.)
> It should be possible for privacy-concerned users to provide another form of identifier (e.g. a username or email address, as is the standard for social networks) instead of their phone number.
This expands the search space without actually solving the problem, I think. The problem exposed by the study is that phone numbers have a small enough search space to be readily enumerable. Adding email addresses and/or usernames just means the same attacker would need to move to well-understood John the Ripper/Hashcat-style dictionary attacks.
I think to thwart these types of attacks, every user identifier needs to be something very like a GUID (and a proper long one like 128 bits and a totally random one, not a hash of their phone number or email address).
If you attack email addresses with a dictionary attack instead of brute-forcing the entire possible email address space, they're actually a smaller search space. The paper claims 53 trillion possible (global) phone numbers, and 700 billion mobile numbers. Haveibeenpwned has just recently passed 10 billion email addresses; Troy probably doesn't have _every_ valid email in existence (yet), but I'd be surprised if it was as low as 1 in 70 (or 1 in 5,300). At least cutting phone numbers down to a specific country works better at eliminating unneeded searches than it does for email (all those @gmail.com, @outlook.com and, to a lesser extent, @company.com addresses aren't country specific).
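The back-of-envelope version of this comparison, using the numbers quoted above plus an assumed hash rate (10 GH/s is a rough single-GPU figure for an unsalted fast hash, not a measurement):

```python
# Search-space sizes from the discussion above; the hash rate is an
# assumption for illustration only.
PHONE_NUMBERS_GLOBAL = 53e12   # all possible phone numbers
MOBILE_NUMBERS = 700e9         # plausible mobile numbers
EMAILS_SEEN = 10e9             # breached addresses known to HIBP

HASH_RATE = 10e9  # assumed: ~10 GH/s on one modern GPU

for label, space in [("all phone numbers", PHONE_NUMBERS_GLOBAL),
                     ("mobile numbers", MOBILE_NUMBERS),
                     ("breached emails", EMAILS_SEEN)]:
    print(f"{label}: {space / HASH_RATE:.0f} s to hash the whole space")
```

Every one of those spaces is a matter of seconds to hours on commodity hardware, which is the point: unsalted hashing of any of these identifiers buys essentially nothing.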
I'm one of those WA + Signal users. There are only a few barriers (besides momentum) now that prevent me from converting friends to Signal (some of them are dumb, a few are big). But by far the number one feature is groups. We're at a time where people are concerned about privacy; this should be utilized.
Other than that, Signal needs to become feature rich. There are many features people want that are just pushed aside. Unfortunately Signal is making the shift from only crypto/privacy geeks to mainstream. In crossing that crevasse Signal needs to consider a different set of opinions that previously it could safely ignore. I would leave Signal if it left the "privacy above all else" mentality, but the forums suggest a high degree of groupthink about what is "a good feature" and what is "a dumb feature" (and how people are going to use them). If it is a highly requested feature, just add it to the list of things to add. You can't ignore it anymore.
(And can we just add a link to the third party sticker website? People seem to care about that and sticker discovery is needlessly difficult. I get asked this frequently and am constantly sending the sticker link. I'm sorry, but the default ones suck and I cannot understand any good reason it works this way)
> (And can we just add a link to the third party sticker website? People seem to care about that and sticker discovery is needlessly difficult. I get asked this frequently and am constantly sending the sticker link. I'm sorry, but the default ones suck and I cannot understand any good reason it works this way)
Presumably the reason is that your own sticker sets should be private by default, which makes it more work to allow optional public sharing of them (which supposedly was not worth delaying the feature for). For example, I have a sticker set of weird pictures of a friend of mine that I like to use, but only with mutual acquaintances.
Btw, I would argue that stickers are the prime example of Signal trying to become feature rich. But of course, there's only so much you can do at the same time. (Though new features appeared to be released more often recently, presumably as a result of their funding infusion a while back.)
> I would argue that stickers are the prime example of Signal trying to become feature rich.
I agree with that and think it is a good example. I know they got a lot of flak for it, even seeing Rachel get downvoted quite a bit for saying this is what Signal needs.
As to the sticker discovery, I am more referring to a link in app that leads you somewhere like here. It is nice that if a friend sends a sticker to you that you can download the entire set. It is nice that you can make your own (which is presumably what you are referring to). But if we can download these stickers without some warning (presumably no danger) then this would fix the endless comments I hear of "Signal doesn't have good stickers like Facebook does." Just seems like extremely low hanging fruit to me.
I mean, it does seem to me like Signal is deliberately courting the mainstream audience, and has been from the start. That's why it uses phone numbers as identity keys, that's why stickers and such, etc.
The phone numbers were an easy way to do identification and create a social graph. They are working on a more privacy-focused one, but that is technically much more difficult. I wouldn't call the phone numbers "courting the mainstream audience." I would call stickers that, but then that's just a single example that was already mentioned.
I'm not trying to say what is right or wrong. To be clear here.
I've seen many people ask for a bidirectional delete like in WA. The response always comes back along the lines of "just don't make typos." When I've seen arguments akin to a company nuking a company phone, the responses are "well, that's dumb, because someone can run a custom Signal and save all the data anyway" (as if that has anything to do with the threat model at hand, or changes that this is a probabilistic method of security which at least doesn't guarantee that the holder of a phone can read confidential messages).
I should note that a "compromise" was created where Signal is introducing the feature but with a time limit. It is not clear to me why this time limit is there, other than because some people think the feature will "trick" people into thinking that things like screenshots don't exist (I guarantee you every 13-year-old with TikTok knows how to screenshot or record). Basically the argument is that this feature will "trick" people into thinking that the message can no longer possibly exist anywhere (I don't think many would actually believe this, but yes, there are dumb people. Security is never foolproof).
Another example is the sticker thing. Just make them discoverable.
Additionally for Signal users: It is possible to turn the notification feature off, but if you newly join Signal, every Signal user in your address book will be notified unless they have switched it off.
Fascinating paper. It seems that if you were to have a number of VoIP phone numbers (say, four or five different Google Voice numbers) and used different numbers for different services, it would defeat the correlation attack, but you would also need to scramble other personal data (avatar, etc.) in order to prevent that from being used for correlation.
Of course if you have access to the telephone network real time localization service you could do correlation analysis that way.
Allegedly "LEO Access Only" but operated by people who think $50 is a lot of money.
Exactly. I completely detest how they use this as a way to 'securely verify' real users, but they argue that the only unique piece of information everyone is likely to know is their phone number. So why not use that to verify?
As you already pointed out, there's a gigantic downside, which is your privacy. I keep hearing from the occasional person who says they are not on Facebook (for privacy reasons) but uses WhatsApp. I later congratulated them for signing up to Facebook anyway, since they also allowed WhatsApp to upload their entire address book to find or search for anyone's number, including people who are not on WhatsApp!
Wire, Signal, and Telegram do the same thing but are just as bad for privacy and are disqualified.
As someone who has thought a good deal about contact discovery the mitigation techniques section is actually pretty interesting.
Quicksy.im, an XMPP client based on phone numbers with built-in contact discovery that I developed ~2 years ago, already does very strict rate limiting, but the paper mentions some other techniques as well that I should probably look at.
Like with websites and password managers, rate limiting works fine when going via the expected auth service. It doesn't help at all when the NSA/MSS/Mossad have popped the contact hash database off Whisper Systems' backend.
(Admittedly, if that's your threat model, I hope you have enough magic amulets in the submarine you now live in...)
For one thing hashing means Signal doesn't care that my telephone number is sixteen digits while yours probably isn't. All the hashes are the same size.
In 1975 other users would have cared because that's sixteen digits to painstakingly memorise or copy down somewhere, but that problem went away. Very few people today even notice because who needs telephone numbers?
And it's not true that any finite domain is tractable. The IPv6 address space is large enough, and thus sparse enough that it's basically pointless to try to connect to random addresses. If you pick random 32-bit IP addresses and connect to TCP port 22 a lot of them will answer. Some of them might have a bug you know to exploit. Maybe you can get one thousand answers per hour and one in every ten thousand is vulnerable to your attack, you are now successful twice every day. Whereas if you try this with IPv6 you'll die of old age before you connect successfully let alone find a vulnerable server.
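The arithmetic behind that: at an assumed million probes per second (a made-up but generous scan rate), the two address spaces are in entirely different universes:

```python
# Rough numbers behind the IPv4-vs-IPv6 scanning point.
# The probe rate is an assumption for illustration.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128
probes_per_sec = 1e6

ipv4_seconds = ipv4_space / probes_per_sec
ipv6_years = ipv6_space / probes_per_sec / (3600 * 24 * 365)

print(f"IPv4 full sweep: ~{ipv4_seconds / 3600:.1f} hours")
print(f"IPv6 full sweep: ~{ipv6_years:.2e} years")
```

An hour or so versus around 10^25 years: sparseness, not secrecy, is what makes random IPv6 probing pointless, and the same logic is why dense phone number spaces are enumerable while sparse random identifiers are not.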
Would you be happy if that's your only argument against a prosecutors case against you?
How much would you bet against there being someone in jail right now convicted on nothing more than having bought a sim/phone with a recycled number that'd previously been used by someone dealing/buying drugs?
No, I would not be happy. What I was thinking was: how does who I text help with in-person contact tracing? If I text my high school buddy in California, how does that help determine who I came in contact with if I were later diagnosed with Covid?
I don't get the relationship between numbers in my phone and the probability that I stood in line at a Starbucks with someone else who was infected.
Right. I didn’t think we’d been talking about contact tracing. The linked article was talking about “contact discovery” in the context of contact list scraping by messaging apps (like WhatsApp/Telegram/Signal), not about COVID exposure contact tracing...
Now that we are though, if I were a contact tracer, I'd totally ask to see your text messages. There's a reasonable chance that while not everybody you texted is someone you came into contact with, there's also probably a fairly high correlation the other way: if you had met up with someone, you quite likely messaged or called them to arrange it. If it was my job to help you remember all the people you'd spent time with in the last 2-3 weeks, I'd definitely like to go through your messages and call logs to remind you about anyone you might've forgotten.