Don't do it: Authentication
Why you will fail building an authentication system, and what you should be doing instead.
Seriously, don’t.
I spent almost 3 full years caring for an authentication system. I don’t want to do that again.
How do Auth systems come 👶 into the world?
You have a great idea. You want to build it right now. You sit down and the first thing you think of is how to retain users. So you build that familiar email and password form. You add another password confirmation field for good measure too (terrible UX). You store the email and password in the database.
Don’t store the password. 🗃️ If I ask you right now “How do you store passwords in computers?” and your answer is not “hashed with cryptographic randomness as salt using the Argon2 password hashing function¹ running for at least 250ms, with at least 2 threads² and 512MB³ RAM”, step away from the password field.
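For the stubborn, here is roughly what the shape of that answer looks like in code. It is a minimal sketch, not a recommendation: Python’s standard library has no Argon2, so `hashlib.scrypt` stands in for it, and the cost parameters and function names are illustrative assumptions rather than tuned values.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Illustrative scrypt cost parameters (assumptions, far below the Argon2
# settings named in the text; scrypt is only a stand-in here).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Store both, never the password itself."""
    salt = secrets.token_bytes(16)  # cryptographically random, unique per password
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_derive(password, salt), expected)
```

Because the salt is random, hashing the same password twice gives two different digests, which is the point.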
Answered correctly? You can’t win. If you care about security or privacy 🥷, you might be thinking — just 2 threads and 512MB RAM — what is this, 2004!? We all love doing the right thing, but your cloud ⛈️ bill disagrees.
Say you have a constant stream of 8 user logins per second (be proud!) or 21 million monthly logins. You will need 16 cores doing only password hashing all day long, and about 6GB of RAM. AWS On-Demand price for two c4.2xlarge instances is $370 monthly. It’s not bad for 21 million monthly logins, but also not cheap.
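The arithmetic behind those numbers can be sketched in a few lines. One assumption is mine, not the post’s: each hash occupies its 2 threads for about 1 second, which is the figure that reproduces the 16 cores. The constant and function names are hypothetical.

```python
import math

LOGINS_PER_SECOND = 8
SECONDS_PER_HASH = 1.0   # assumption: roughly 1 second of hashing per login
THREADS_PER_HASH = 2     # from the Argon2 settings in the text
RAM_MB_PER_HASH = 512    # from the Argon2 settings in the text


def hashing_capacity(logins_per_second: float) -> dict:
    # Hashes in flight at any moment = arrival rate x time per hash.
    concurrent = logins_per_second * SECONDS_PER_HASH
    return {
        "monthly_logins": int(logins_per_second * 60 * 60 * 24 * 30),
        "cores": math.ceil(concurrent * THREADS_PER_HASH),
        "ram_gb": concurrent * RAM_MB_PER_HASH / 1024,
    }


print(hashing_capacity(LOGINS_PER_SECOND))
# {'monthly_logins': 20736000, 'cores': 16, 'ram_gb': 4.0}
```

The 4 GB is what the hashing alone needs under these assumptions. Anything above that is headroom for the operating system and the rest of the process.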
Users arrive 🐕 unpredictably, and they keep clicking 🖱️ the damn Log in button, which takes 1 second to actually log them in. So now instead of 8 logins per second, you have 8 logins per trigger-happy user⁴. Your autoscale nodes are burning hot, and you’re somehow supposed to find more at any cost out of thin air. Not sure if you’ve heard, but getting a new EC2 instance may sometimes take minutes, and those users keep clicking Log in. Log in. Log in. So you do the only thing you can: over-provision. Your bill is now $3,700 monthly for password hashing. That’s an expensive door 🚪.
Your app is so awesome 🎉 tho. So now people hate it⁵ and they send you bogus login attempts to see if Jeff Bezos will put your name on his little blue rocket 🚀, since your cloud bill says you’re paying for it. To the moon 🌘!
Do instead. 👍 First, just don’t implement that sign-up form by yourself. And no, I don’t mean get a library to do it for you. Pick your poison: Firebase Auth (it’s free!), AWS Cognito (almost free), Auth0 (definitely not free), Red Hat Keycloak (for the brave), Supabase Auth… or any other service or packaged authentication system. You’ll usually get someone else to pay for the password logins, and you’ll also get free admin pages, 2FA / MFA, OAuth, OIDC, SAML, social login and other goodies you won’t ever be able to properly implement yourself.
Make no mistake, Auth software kills businesses. It’s a constant drain 🚽 on the development, maintenance, operation and security budget without adding much or any business value. 📈
But I insist. You’ve been warned, and here are more warnings. ✨
First, just forget about passwords 🔑. Use social login, and for the antisocial 🤗 provide a passwordless login by sending a unique, hard-to-guess, single-use and expiring login link or code over email or, if you must, SMS. Add a thin layer of rate limiting to prevent sending out too many emails per address. Avoid rate limiting by IP, as many users today exit through the same IP address.
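Here is one way the passwordless flow above could look. It is a sketch under stated assumptions: the in-memory dictionaries stand in for a real database, and the 15-minute expiry, the 5-emails-per-hour limit and all function names are hypothetical choices, not something the post prescribes.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, List, Optional, Tuple

TOKEN_TTL_SECONDS = 15 * 60   # hypothetical expiry window
MAX_EMAILS_PER_HOUR = 5       # hypothetical per-address send limit

_pending: Dict[str, Tuple[bytes, float]] = {}  # email -> (token hash, expires at)
_sent: Dict[str, List[float]] = {}             # email -> recent send timestamps


def issue_login_token(email: str, now: Optional[float] = None) -> Optional[str]:
    """Return a fresh token to put in the login link, or None if rate limited."""
    now = time.time() if now is None else now
    recent = [t for t in _sent.get(email, []) if now - t < 3600]
    if len(recent) >= MAX_EMAILS_PER_HOUR:
        return None
    _sent[email] = recent + [now]
    token = secrets.token_urlsafe(32)  # 32 random bytes, hard to guess
    # Keep only a hash, so a leaked table does not hand out working links.
    _pending[email] = (hashlib.sha256(token.encode()).digest(),
                       now + TOKEN_TTL_SECONDS)
    return token


def redeem_login_token(email: str, token: str, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    entry = _pending.get(email)
    if entry is None:
        return False
    token_hash, expires_at = entry
    if now > expires_at:
        del _pending[email]            # expired tokens are discarded
        return False
    if hmac.compare_digest(hashlib.sha256(token.encode()).digest(), token_hash):
        del _pending[email]            # single use: gone after the first success
        return True
    return False
```

Issuing a new token replaces any pending one for the same address, so only the latest link works.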
JWT — I know you’re gonna use them. You’re gonna do it wrong, and one day you’ll fail so hard it’ll be on the news.
So I dare ask again — What is the expiration time ⏲️ of a JWT?
If you answered anything over 10 minutes, step away from the keyboard ✋⌨️. The whole point of JWT is to use cryptographic verification of the authentication token, which can be done anywhere and independently of a central authority. Since nobody checks back with that authority, a leaked token keeps working until it expires. This means you need to get new tokens every 10 minutes or less, and invalidation (logout) can be forgotten about⁶. Less is more… And now comes the funnest part.
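The 10-minute rule is easy to enforce on the verifying side too. A minimal sketch, assuming the token’s signature was already verified by a proper JWT library and you are holding the decoded claims as a dictionary. The function name is hypothetical; `iat` and `exp` are the standard issued-at and expiry claims.

```python
import time
from typing import Optional

MAX_LIFETIME_SECONDS = 10 * 60  # the 10-minute ceiling from the text


def claims_are_fresh(claims: dict, now: Optional[float] = None) -> bool:
    """Check already-verified JWT claims against the lifetime policy."""
    now = time.time() if now is None else now
    iat, exp = claims.get("iat"), claims.get("exp")
    if not isinstance(iat, (int, float)) or not isinstance(exp, (int, float)):
        return False              # no iat or exp means no deal
    if exp - iat > MAX_LIFETIME_SECONDS:
        return False              # issued with too long a lifetime
    return iat <= now < exp       # not expired, and not issued in the future
```

A real deployment would probably allow a few seconds of clock skew on the `iat` comparison. That is left out here to keep the policy visible.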
It’s getting old, I know — How do you verify 🔍 a JWT?
If you answered by using HS256 and its family of algorithms, really step away from the keyboard ✋⌨️. To use the HS* (HMAC-SHA) family of JWT verification algorithms, you need to define a shared secret (isn’t that an oxymoron 😱) which you will clumsily log, or commit in Git, or paste in Slack or in a mean Tweet… So you must use public-key cryptography through the RS* (RSA) and ES* (Elliptic Curves) algorithms to do it proper justice. And that’s really hard and quite expensive. So just don’t do it yourself⁷.
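If you end up verifying tokens anyway, one cheap guard is to refuse anything that is not on an allowlist of public-key algorithms before it reaches the verifier. The sketch below only inspects the unverified header and does not check any signature. That job still belongs to a vetted JWT library configured with the same allowlist. The function name and the exact allowlist are assumptions.

```python
import base64
import json

# Public-key algorithms only: no HS*, and no "none".
ALLOWED_ALGS = {"RS256", "RS384", "RS512", "ES256", "ES384", "ES512"}


def header_alg_allowed(token: str) -> bool:
    """Pre-filter a JWT by its header; this is NOT signature verification."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        # JWTs strip base64url padding, so put it back before decoding.
        padded = parts[0] + "=" * (-len(parts[0]) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return False
    if not isinstance(header, dict):
        return False
    alg = header.get("alg")
    return isinstance(alg, str) and alg in ALLOWED_ALGS
```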
I know you did everything wrong. I did too. We all did. 😇 So now what?
It always pays to have something to hold on to when everything comes crashing down. Find that thing for yourself 🍕, and nurture it. Then set a date to kill passwords in your system. Open your calendar and start scheduling 📅.
No new users with passwords after 2022.
Existing users can use an email link login or a password (intentionally second option) through 2023.
Halve the JWT expiration time in the next 6 months, and then halve it again.
Getting rid of HS256 is a nightmare, so implement a rotation 🚲 mechanism first. Issue new tokens with one key, but keep accepting tokens signed by 2 older keys. Commit to changing the signing key every 3 months (make a repeating calendar entry now), then every month, then every week.
Switch to an authentication service or packaged software 💾. It’ll hurt but it will be worth it.
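The rotation step in the schedule above (sign with one key, keep accepting the 2 older ones) fits in a small key ring. This is a sketch with a hypothetical `KeyRing` class. It uses HMAC-SHA256 because that is what HS256 is underneath, and the same shape would apply to a ring of public keys.

```python
import hashlib
import hmac
from collections import deque


class KeyRing:
    """Sign with the newest key; accept signatures from it and 2 older keys."""

    def __init__(self, first_key: bytes, accepted_old_keys: int = 2):
        # Newest key lives at index 0. maxlen drops the oldest key on rotation.
        self._keys = deque([first_key], maxlen=accepted_old_keys + 1)

    def rotate(self, new_key: bytes) -> None:
        self._keys.appendleft(new_key)

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self._keys[0], message, hashlib.sha256).digest()

    def verify(self, message: bytes, signature: bytes) -> bool:
        return any(
            hmac.compare_digest(
                hmac.new(key, message, hashlib.sha256).digest(), signature)
            for key in self._keys
        )
```

After three rotations, a signature made with the first key stops verifying, which is exactly the behaviour you want from the repeating calendar entry.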
Gimme more failures. Enjoy 🍸.
You suddenly realize that you can halve the password hashing bill if you implement a Turing test 🦾 a.k.a. CAPTCHA. If you care about humanity, please don’t. Not all of us have good or sometimes any vision 🙈 or hearing 🙉, and is that a truck 🛻 or boat 🛥️?
Rate limiting 🚥 is never a solution⁸. OK, it may be if you know who your user is, because they’ve gone through some authentication. But not before that. If you rate limit by email address / username, both of which are generally public information, how do you know whether someone purposefully triggered rate limiting for the user to prevent them from using your app? There’s no way to know. So don’t do rate limiting. You might be able to get away with adding a waiting line for logins, tho.
Proof-of-Work? Unlikely to solve the bigger issues, and the price is global pollution 🌳.
Single sign-on or login federation 🇺🇳 systems. Don’t implement them by yourself. And when you do, make sure you set up strict redirect check rules. It’s super easy to pass credentials to the bad guys if you don’t check redirect URL hosts properly. Avoid using relative URLs for such redirects. Don’t ask how I know.
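A strict redirect check can be as boring as the sketch below. The allowlist contents and the function name are hypothetical. The idea is to accept only absolute https URLs whose host matches exactly, and to reject everything else, relative URLs included.

```python
from urllib.parse import urlsplit

ALLOWED_REDIRECT_HOSTS = {"app.example.com"}  # hypothetical allowlist


def is_safe_redirect(url: str) -> bool:
    """Accept only absolute https URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme != "https":
        return False  # also rejects relative and scheme-relative URLs
    if parts.username or parts.password:
        return False  # rejects https://app.example.com@evil.test/ style tricks
    # Exact match only: no suffix, prefix or substring comparisons.
    return parts.hostname in ALLOWED_REDIRECT_HOSTS
```

Note that this ignores the port. If your redirect targets should also be pinned to a port, compare that too.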
Lowercase emails but also don’t. 🤯 So you think email addresses are case insensitive. That’s so insensitive of you. Some commercial email software may distinguish between differently cased variations of email addresses. So lowercase emails for unique identification, but send emails as the user entered the address. I feel your pain 💉.
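In code, that means carrying two values around instead of one. A tiny sketch with hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    as_entered: str  # use this when actually sending mail
    lookup_key: str  # use this for uniqueness checks and database lookups


def parse_email(raw: str) -> EmailAddress:
    entered = raw.strip()
    return EmailAddress(as_entered=entered, lookup_key=entered.lower())
```

So `parse_email("Jane.Doe@Example.com")` is looked up as `jane.doe@example.com` but still mailed as typed.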
Pockets burning 💸 and password hashing is killing my whole app. Help! 🆘 There’s only one way to ease this, and it’s not particularly pretty. If you do password hashing in any of these technologies: NodeJS, Ruby, Python, PHP, Java, Go, Rust or just about anything else, you need to refactor⁹.
If your web app runs on a thread-pool server, you have to do password hashing strictly before the request hits your thread-pool HTTP server. This is because you want to minimize any waiting in a thread-pool server, as it prevents other requests from making progress and could induce a latency cascade of death. Password hashing should usually take 1/4s or more, which is far too long to hold a pool thread. You can achieve this by writing a proxy in NodeJS, Go or another event-driven language / framework (Twisted for Python, EventMachine for Ruby, there are tons). This proxy looks up the email in the users database and submits the password to be hashed and compared by a queue-backed password hashing service. This is because event-driven systems can wait on infinitely many things, but they must not use the CPU a lot (which password hashing clearly does).
The queue-backed password hashing service takes as many passwords as there are available cores at once, and processes them as fast as possible, returning results to the waiting proxy. Sometimes you can use Lambda or equivalent serverless compute to do this on demand. Once the email and password combination matches, only then forward the login request to your thread-pool login service. For event-driven HTTP servers, like those built on NodeJS or Go, you can skip the proxy part and talk directly to the queue-backed password hasher. You can now set up an auto-scaling system based on the queue size instead of CPU / RAM utilization, which has the unfortunate effect of leaving you constantly over-provisioned and totally unpredictable. It’s long-winded, but cheaper this way.
Let me know if you have any more Failures Worth Spreading!
¹ It’s also somewhat OK to use bcrypt or better yet scrypt… but Argon2 is the best we’ve got.
² Argon2 is designed to be anti-parallel, so specifying the 2 threads means that it requires two cores to run simultaneously. If you don’t give it full access to two cores, hashing is going to take 2x more instructions on a single core, and you can’t optimize around it. When your passwords eventually leak, this will make it very expensive for attackers to do anything useful with your passwords.
³ Argon2 is designed to require high-speed RAM. If your system does not have 512MB of RAM free per password to hash, it won’t work. When your passwords eventually leak into the world, this will make it very expensive for attackers to try and use your passwords.
⁴ A solution appears: Rate limit on the frontend!
⁵ Solution above disappears.
⁶ I’ll talk more about authorization in another post.
⁷ Let me know if you want me to do a series about this.
⁸ I’ll also talk more about this in another post.
⁹ Only if it makes financial sense for you.