Class Notes CPEG 470/670 Fall 2024

Hashing

Here is a site where you can get a random dictionary word, hashed with a sophisticated hashing algorithm, and then recreate it in javascript:

See the Pen wMYbyw by Andy Novocin (@AndyNovo) on CodePen.

Play with that a bit, and let's collectively list properties of the hashing process.

Give yourself a challenge: fork and edit that pen. Replace https://websec.prof.ninja/scripts/sha512_with_answer.php with https://websec.prof.ninja/scripts/sha512.php. Now, using the language of your choice, find the word from https://websec.prof.ninja/dictionary.txt that created the hash challenge you came up with. If you're feeling lazy I have a cousin of what you need in python2 just below this.

Time how long it took you to reverse that hash.

If it costs $5/month to run a single core VM on AWS (EC2) which can hash at about the same rate as your timing, how much money would it take to reverse the hash of a password which is one random word? How about two random words? Three? Four? Five?

XKCD for anything

If your password were randomly chosen from a set of N options how large would N need to be to have it cost an opponent 10 million USD to crack your password?

Google the following hash: 4ce18085d50ac35207a3e167aafb7cfb42ada855a377bb3c19e0cc0ac55f0d72c862deb698ce7f68411ffb916489cb4563c1c4fbaab9c5fde46de9475713cb8a did it have any results? What word was hashed to make that digest?

If Google has indexed your hash digest before, even if it came from a set of N so large that it costs $10 Million to crack, does it still take $10 million to hack?

Take a password from the top 100 most common passwords, and take the sha256 hash of it, then google that. Now add some "random" bytes to the back of it, how much do you need to add before google doesn't recognize it?

That task shows you, by way of google, that hashing is not enough. Here I take a weak password and hash it using SHA256. Next I will add a keyboard cat salt to the end and hash that. Now the next task should convince you that salting helps protect your users from themselves (at least a little).

I'd like to take a second and talk algorithmic complexity, specifically the difference between $\mathcal{O}(n)$ and $\mathcal{O}(\log{}n)$ in terms of intuition:

Here's a table: http://cs.prof.ninja/slides/class3/#7 and the a Scale of the Universe

Rainbow Tables and Salts

Create a database where you have hashed every word in the dictionary using sha256 and sha512. If you use an index how does the search cost change vs no index in reverse hashing now? This idea is the rainbow table. Tools like John the Ripper use this kind of strategy.

SALTING: sha256(password + salt)
What does it gain you in terms of security?
Can you use one salt for the whole company?
Where do you store the salt?

Adapt the login SQL query/ies to now use a hash and a salt. It used to be something like select * from user_auth where userid=:userid and password=:password

Stretching

This is one of my favorite concepts. It is a simple trick that lets you make your hash function take arbitrarily long to compute. That means that you can change the math on your attackers. Instead of 2 million hashes per CPU-second it can be 1 hash per CPU second, or even a fraction of a hash per second. So now you are in control of setting how hard it is for an attacker to break you.

Now we will apply some asymptotic leverage to this situation by "stretching" the passwords. We will cobble together a more super-powered hashing scheme by repeating the hash over and over again.

Here is the pseudo-code:

$$ x_0 := "" $$

for $r$ iterations: $$ x_i := h(x_{i-1} \parallel p \parallel s) $$

$$ K := x_r $$

Here the $ \parallel $ notation means concatenate as strings.

That notation is particularly mathy, so it might seem too intense. That is a knee-jerk reaction from math-anxiety. It's just different and you can treat it like learning the rules to baseball. Here is my simpler explanation:

$x_0 := ""$ means set a variable $x_0$ to the empty string, the $:=$ notation is assignment, so if I wanted to set $w$ to 23 I could write $w := 23$ and you would read it more like a command than an observation.

Then the $r$ iterations. This is saying that we want to loop over many hashes. Let me show you by example:

If $x_0 :=$ "" then $x_1$ will be the hash of "$pass1234SALTHERE$". Suppose that the hash of that is "$abcd$" (it wouldn't be of course but we're explaining here). Then $x_2$ would be the hash of
"$abcdpass1234SALTHERE$" and so on and so on and so on.

Picking an r: The more often you iterate this process the more expensive it is to do a single hash. So for instance with $r$ around 1 million we just increased the brute-force reverse cost to 1 core year even with just 2 random words. The cost of doing the hashing should be as large as our user can stand, normally this about 1 second.

Kata: Salt and stretch a passphrase with specified stretch factor

Test this out: Write a hash function that consumes, password, salt, r where r is the number of iterations/stretches we do. Hand compute the result for a specific password and salt when r=3 then make sure your function's output matches (even the screenshot I have above this task).

In Summary

Passwords still live because they are the cheapest quickest way for developers to authenticate
You should presume that your database will be leaked at some point
Imagine you have a leaked database from a company, if passwords in the clear then all users hacked
So we don't store plaintext but use hashes that are recreatable at each auth session
Hash algorithms are one-way, impossible to reverse on randomized inputs
We can reverse them when humans act human no matter how good the crypto
On reflection this captures the idea of entropy, and reflections on the size of brute force searches
Back to the leaked DB. If everything was hashed in a straightforward way the best attack is a rainbow table, google some hashes
Best prevention of rainbow attacks is to make an individual, randomized, salt for each password
Back to the leaked DB. What about when your CEO reuses passwords and makes simple stuff like pass1234?
We can win the algorithmic complexity-theory battle using stretching to hash over and over in order to increase the computation time
Updating the depth of stretch over time is also important.
This strategy is standardized in bcrypt then scrypt
You should time things and contemplate the near-infinite
With all crypto systems you should analyze the practical cost in dollars of cracking it.

Hashes, Passwords, Salts, Stretching

Hashing

Rainbow Tables and Salts

Stretching

Kata: Salt and stretch a passphrase with specified stretch factor

In Summary

1 Flag

2 Flag