Walgreens was just compromised and an attacker got the master list of email addresses that also included email addresses of people who had opted-out of receiving Walgreens emails. The attacker promptly sent phishing attacks to all of the email addresses in an attempt to elicit private user data.
This brings up an interesting design issue. On one hand, you need to store the email addresses of people who have opted out of email; otherwise you may inadvertently re-add them in the future from other sources or user actions. On the other hand, storing all of these email addresses increases the damage if the list is compromised.
Solution? Hashing. For every email address that is in your "opt-out" list, simply store the hash of the email address instead of the actual email address. When you get a new email address, compare its hash against your list of "opt-out" hashes. If you have a match, then it's an opt-out. Throw away that email address. For new opt-outs, simply add the hash of their email address to the "opt-out" hash store and discard the plain text email address.
This way you get the benefits of not inadvertently re-adding users that have previously opted-out while also ensuring that a compromise won't disclose the huge number of people who really don't want any of your emails anyway.
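A minimal sketch of the opt-out store described above, in Python. The names `opt_out_key`, `OptOutStore`, and `filter_new_addresses` are hypothetical, SHA-256 is just one reasonable choice of hash, and trimming and lowercasing the address before hashing is an assumption made here so that trivial variations of the same string still match.

```python
import hashlib


def opt_out_key(address: str) -> str:
    """Return the value stored in place of the plain text address."""
    # Assumption: trim whitespace and lowercase before hashing.
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class OptOutStore:
    """Keeps only hashes of opted-out addresses, never the addresses."""

    def __init__(self):
        self._hashes = set()

    def add(self, address: str) -> None:
        # Store the hash; the caller discards the plain text address.
        self._hashes.add(opt_out_key(address))

    def is_opted_out(self, address: str) -> bool:
        return opt_out_key(address) in self._hashes


def filter_new_addresses(store: OptOutStore, addresses: list) -> list:
    """Drop any incoming address whose hash is on the opt-out list."""
    return [a for a in addresses if not store.is_opted_out(a)]


store = OptOutStore()
store.add("john@example.com")
incoming = ["John@Example.com", "jane@example.com"]
print(filter_new_addresses(store, incoming))
```

One caveat with this sketch: a plain hash of an email address can be checked against guessed addresses by anyone who obtains the hash list, so a keyed hash (for example Python's `hmac` module with a secret kept outside the database) may be worth considering as a variation.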
Note: Don't hash your "opt-in" email addresses or your normal mail functionality won't work at all.
-Michael Coates - @_mwc
Hmm. I suppose that this is the best policy, but I do feel compelled to point out that the hash of an email address isn't really unique. That is, two different strings (which will therefore have different hashes) can still be the same email address.
For example, these addresses will all cause email to be delivered to the same mailbox:
john@example.com
john+walgreens@example.com
John Doe
John "Anonymous" Doe
There are plenty of other ways to vary the address in ways that won't change where it will be delivered. I suppose it's pretty unlikely that the spam software Walgreens was using would correctly use a stored address like 'john@example.com' to also avoid sending mail to 'john+walgreens@example.com' anyway.
This also falls apart if the "opt-out" refers to advertisements sent by Walgreens, but not to features like "Forgot your password".
@db48x:
"For example, these addresses will all cause email to be delivered to the same mailbox:"
Well, it doesn't work if you want to exclude "mailboxes". How should one know which email address is forwarded to which mailbox?
Now to the examples you gave:
First of all, only the first two are actual email addresses.
An address of pattern
"Display-name "
can be treated as follows:
Ignoring the display-name is no problem. It has no effect other than looking nice on the receiver's side.
The domain-part + tld are no problem either; you can bring them to a standard format (e.g. by lowercasing them). The local-part is a bit more problematic. Its form, and what is valid there, is per specification up to the receiving server. I'm not sure if you're allowed to generally lowercase the local-part and expect things to work. I guess for all practical purposes it is OK to do it. At least with email addresses of most (all?) big email services it should be OK.
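The normalization steps above could be sketched roughly like this before hashing. `normalize_address` and its `lowercase_local` flag are hypothetical names, and the example input form `John Doe <John@Example.COM>` is only an assumed illustration of a display-name address. `email.utils.parseaddr` from the Python standard library does the display-name splitting.

```python
from email.utils import parseaddr


def normalize_address(raw: str, lowercase_local: bool = False) -> str:
    """Drop the display-name and lowercase the domain part.

    Lowercasing the local-part is optional, since in principle the
    receiving server decides whether it is case-sensitive.
    """
    _display_name, addr = parseaddr(raw)
    local, at, domain = addr.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not a usable email address: {raw!r}")
    if lowercase_local:
        local = local.lower()
    return f"{local}@{domain.lower()}"


# The display-name is ignored; only the domain is lowercased by default.
print(normalize_address("John Doe <John@Example.COM>"))
# Opting in to lowercasing the local-part as well.
print(normalize_address("John Doe <John@Example.COM>", lowercase_local=True))
```

Whatever normalization is chosen has to be applied both when an opt-out is recorded and when a new address is checked, otherwise the hashes will not line up.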
@Colby Russell: I think this posting aims only at cases where email addresses are clearly saved unnecessarily while a viable replacement is available.
I [attempted to] contact the security team at Walgreens to inform them about this post.
Hopefully it can prove helpful.
@Colby Russell:
As Mic mentioned, the purpose of this post is to provide a mechanism for recording email addresses that wish to "opt-out" without actually storing that clear text email address.
Forgot my password is a different scenario that would not be included in this model. If the forgot my password functionality is designed to email a reset link to the user's email address, then the email address must be stored in a clear text or reversible format. In this case, hashing would not work.
@Caspy7 - Thanks. Hopefully this is helpful to Walgreens or others in similar situations. Walgreens will also need to address the root cause of their compromise and look at their security development lifecycle as a whole to understand what broke down in the process.
@Michael: Actually I use this technique for "lost my password" stuff too. I don't like storing any more data than I need to, and so what I do is I store the (sha256) hash of their email, and if somebody loses their login info then I ask them for it, get the user whose email hash matches that, and do all the lost-password trickery that way. Their email address never hits a disc (or even long-term ram, hopefully) but I can still reset their password with a *fairly* high degree of certainty that I'm not doing it for the wrong person.
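A rough sketch of that lookup, for anyone curious. This is not the commenter's actual code: `email_hash`, `register`, `find_user_for_reset`, and the in-memory `users_by_email_hash` table are all hypothetical stand-ins, and lowercasing the address before hashing is an assumption.

```python
import hashlib

# Hypothetical user table: email hash -> username. No address is stored.
users_by_email_hash = {}


def email_hash(address: str) -> str:
    # Assumption: trim and lowercase so the same address always matches.
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()


def register(username: str, address: str) -> None:
    """Record only the hash of the address at sign-up."""
    users_by_email_hash[email_hash(address)] = username


def find_user_for_reset(claimed_address: str):
    """Return the username whose stored hash matches, or None.

    On a match, the reset link would be sent to claimed_address,
    which the user has just typed in and which is not kept afterwards.
    """
    return users_by_email_hash.get(email_hash(claimed_address))


register("alice", "alice@example.com")
print(find_user_for_reset("Alice@Example.com"))
print(find_user_for_reset("mallory@example.com"))
```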
@bwmaister - Very nice. I like that design a lot.
Most organizations have bigger email list problems than this - they are keeping email lists in spreadsheets and other accessible places. In addition, those addresses may be needed for operational purposes, such as notifying some customers of a breach or of product recalls. Emailing someone in either of these scenarios is not viewed as "spam" by CAN-SPAM. In addition, you need a record of opt-outs to be CAN-SPAM compliant...using this technique actually makes you non-compliant.
In the end, the best thing to do is use a proper email provider or marketing automation vendor that lets you set up emailing preferences and opt-outs (and can enforce adding unsubscribe links, etc. to ALL your emails). These systems store the date/time and IP address of the computer that opted out for compliance reasons. Finally, these systems will generally SUPPRESS sending to unsubscribed users aggressively, even going so far as to prevent marketing users from accidentally sending the same email twice.
While opt-outs can be used if hacked into, I would assume this would be just as bad as getting the rest of your list. In the end, this is kind of security through obfuscation. The best remedy is to store ALL your email addresses in a secure, audited system that is protected properly.
"Note: Don't hash your "opt-in" email addresses or your normal mail functionality won't work at all. "
Oh.. so that's why the email keeps bouncing ;-)
>> For example, these addresses will all cause email to be
>> delivered to the same mailbox:
> Well, it doesn't work if you want to exclude "mailboxes".
> How should one know which email address is forwarded to
> which mailbox?
My point was only that you could end up obtaining two email addresses which have different hashes but deliver to the same person, and that person's opt-out message would end up being applied only to one of those addresses, defeating the purpose.
If the address isn't hashed, then at least you can determine with a high degree of confidence when two addresses are duplicates. Hash them and you can only determine that they are string matches. On the other hand, you could successfully argue that email addresses are far too complicated and that as a result nobody even bothers. Certainly allowing comments in addresses was a little over the top.
Aaron's comment is probably a bigger concern though. Even companies that manage not to fall into the trap of using Excel will often fail in hilarious ways. My Dad and I were discussing this last month, and it turns out that the fancy enterprise database software-as-a-service disaster that they use won't let them leave the email field blank in their customer records. It'll also throw exceptions if they try to store an address with no at-sign, or more than one at-sign, or a forward slash, or who knows what other monstrosity a customer might type in. I think I talked him out of replacing all of those "invalid" email addresses with "not@entered.com", but I suspect that someone somewhere has started receiving some extra email recently.