In this blog post I'm going to write about how neural networks can be used for password cracking.
It's possible to train a neural network on a dataset of cracked passwords and then use it to predict uncracked passwords.
Sometime during the last year I found a fascinating blog post about recurrent neural networks on Hacker News and Karpathy's char-rnn on GitHub. I had played around with neural networks before, so I started to think about ways to (ab)use them in a security context. People are already trying to use neural networks for intrusion detection systems or similar defensive security strategies.
But what about the attacker's perspective? Can neural networks somehow be used to facilitate attacks or offensive behaviour?
After finishing Karpathy's article, especially the section about generating new baby names, I wondered whether such a network could learn patterns from passwords and generate similar ones.
Imagine that you have a large dataset of hashed passwords from a specific website [insert recent hack here]. You run your standard hash cracking tool and wordlist against it and manage to crack a portion of the passwords. Depending on the website's niche, there may be patterns in the passwords that you don't know about. So let's see if an RNN can figure them out and generate new valid passwords for previously uncracked hashes.
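Such a first cracking pass could look like this with hashcat, for example (a sketch; -m 0 assumes unsalted MD5 hashes, and hashes.txt and wordlist.txt are placeholders):
$> hashcat -m 0 -a 0 hashes.txt wordlist.txt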
Disclaimer: I'm not a machine learning expert, so I may have done something wrong. I also didn't finish the model training, so this is more of a proof of concept.
I didn't want to waste my time and computing power on hashing and cracking passwords, so I skipped that step and downloaded the rockyou.txt wordlist. It is around 134MB in size and contains around 14M passwords. The notes describe it as "Best list available; huge, stolen unencrypted", but there was no particular reason for choosing this one. I always assumed that wordlists would be free of duplicates, but this assumption didn't hold here, so I removed the duplicates first:
$> sort rockyou.txt | uniq -u > rockyou.unique.txt
$> du -sb rockyou.*
139921497  rockyou.txt
139842217  rockyou.unique.txt
Note that uniq -u drops every line that occurs more than once entirely (sort -u would keep one copy of each), but that's good enough for this experiment. It's also unlikely that cracked passwords come out sorted, so I shuffled the unique wordlist:
$> sort -R rockyou.unique.txt > rockyou.random.txt
After that I split it into two parts:
- The training set with the first 250k passwords:
$> head -n 250000 rockyou.random.txt > train.txt
- The testing set with the remaining passwords (see the command sketched below)
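The corresponding command for the testing set would look like this (a sketch; the test.txt filename is taken from the checks below):
$> tail -n +250001 rockyou.random.txt > test.txt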
I verified that there are no overlapping entries between the training and testing datasets:
$> sort train.txt test.txt | uniq -d | wc -l
0
Training neural networks on a CPU isn't recommended anymore, because GPUs are much faster at this kind of parallel number crunching. Unfortunately I don't have a CUDA cluster at home, so I rented an AWS g2.2xlarge GPU spot instance for a few hours. Spot instances are really, really cheap and perfect for such experiments. Luckily someone created a community AMI (ami-c79b7eac) with all tools and CUDA drivers installed and configured. You just need to run the following commands to set up char-rnn:
$> cd /tmp/
$> sudo luarocks install nngraph
$> sudo luarocks install optim
$> git clone https://github.com/karpathy/char-rnn
$> cd char-rnn
The last step is to upload the train.txt file to the instance; char-rnn expects the training data as input.txt inside the data directory, i.e. data/pws/input.txt for the commands below.
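A sketch of that upload step (the user and instance address are placeholders; the paths assume the /tmp/char-rnn checkout from above):
$> ssh ubuntu@<instance-ip> "mkdir -p /tmp/char-rnn/data/pws"
$> scp train.txt ubuntu@<instance-ip>:/tmp/char-rnn/data/pws/input.txt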
I trained three different recurrent neural networks on the dataset for around 10 epochs:
$> th train.lua -rnn_size 128 -num_layers 3 -eval_val_every 500 -seq_length 50 -data_dir data/pws/ -savefile pws
$> th train.lua -rnn_size 512 -num_layers 3 -eval_val_every 500 -seq_length 100 -data_dir data/pws/ -savefile pws2
$> th train.lua -rnn_size 64 -num_layers 3 -eval_val_every 500 -seq_length 20 -data_dir data/pws/ -savefile pws3
Once per epoch, I let each network generate 25kB of output:
$> th sample.lua -length 25000 cv/lm_pws_epochXY > output/pws-epochXY
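To sample every saved checkpoint in one go, a small loop works as well (a sketch; char-rnn names its checkpoints lm_<savefile>_epoch<E>_<loss>.t7 inside cv/):
$> for cp in cv/lm_pws_epoch*.t7; do th sample.lua -length 25000 "$cp" > "output/$(basename "$cp" .t7)"; done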
Each sample is around 2500 lines (= passwords):
$> wc -l *.txt
2681 pws-0.54.txt
2534 pws1-1.62.txt
2599 pws1-2.16.txt
2571 pws1-3.24-12345.txt
2571 pws1-3.24-pass.txt
2570 pws1-3.24.txt
2498 pws1-4.86.txt
2583 pws1-5.40.txt
2564 pws1-6.48.txt
2566 pws1-7.02.txt
2638 pws1-8.10.txt
2523 pws1-9.18.txt
2416 pws1-10.26-t0.3.txt
2540 pws1-10.26-t0.5.txt
2628 pws1-10.26-t0.8.txt
2567 pws1-10.26.txt
2446 pws2-1.08.txt
2681 pws2-2.16.txt
2634 pws3-0.65.txt
2585 pws3-6.91.txt
2609 pws3-9.72.txt
2552 pws3-12.09.txt
2574 pws3-16.19.txt
2611 pws3-18.13.txt
The filenames follow the pattern pwsX-EPOCH[-tT].txt, where:
- X is the network id
- EPOCH is the epoch of the savefile that was used for sampling
- T is the temperature used for sampling (see the sketch below)
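The -t files were presumably created with char-rnn's -temperature option, which controls how conservative the sampling is (lower values produce more likely but less diverse output). Roughly like this, with the checkpoint name again being a placeholder:
$> th sample.lua -length 25000 -temperature 0.5 cv/lm_pws_epochXY > output/pws-epochXY-t0.5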
Again, our goal was to see if the networks can predict passwords from the testing dataset (test.txt) by learning from the passwords in the training set (train.txt). Now that we've gathered some (minimal amount of) data, let's evaluate it, starting with the first network:
- Neurons per layer: 128
- Layers: 3
- Sequence length: 50
This network generated 31k unique passwords of which only 92 are in the training set:
$> cat pws-* | sort | uniq -u > output-1.txt
$> wc -l output-1.txt
31441 output-1.txt
$> sort output-1.txt train.txt | uniq -d | wc -l
92
$> sort output-1.txt test.txt | uniq -d | wc -l
3180
YAY! The network created 3180 passwords which can also be found in the testing set. Let's have a look at some of them:
$> sort output-1.txt test.txt | uniq -d | sort -R | head -n 25
freddy12
ava2005
joseph2006
sexylexy
marcolo
jackson7
jackson1
ariela11
andrew12345
mercedes23
brittshit
19081984
jklover
boodoo
eliana19
725972
daliani
pacy99
121093a
celin1
Hottie.
andrew1010
tatty9
karina79
amanda213
080223
The second network:
- Neurons per layer: 512
- Layers: 3
- Sequence length: 100
I didn't train it for long, because it was relatively slow per batch. It generated only 5k passwords, of which 3 are in the training and 178 in the testing dataset.
$> wc -l output-2.txt
5046 output-2.txt
$> sort output-2.txt train.txt | uniq -d | wc -l
3
$> sort output-2.txt test.txt | uniq -d | wc -l
178
$> sort output-2.txt test.txt | uniq -d | sort -R | head -n 25
zhan23
hooda1
dobes
081087
69777
jestin
492007
25372
griss34
14711
172085
ka2006
ellva
10113
franz27
nissa09
33
012187
a
1113sd
ilove2
ford19
40309
kina13
kiku13
The third and smallest network:
- Neurons per layer: 64
- Layers: 3
- Sequence length: 20
The smallest network generated 15k passwords, with only 15 in the training and 726 in the testing set.
$> wc -l output-3.txt
14976 output-3.txt
$> sort output-3.txt train.txt | uniq -d | wc -l
15
$> sort output-3.txt test.txt | uniq -d | wc -l
726
$> sort output-3.txt test.txt | uniq -d | sort -R | head -n 25
dannica
hissss
lobe21
carlo14
ho
killer1
candice12
laiska
kaley2
pusher1
778658
7344501
leanng
bron55
GOLLITA
jaimi
490767
552655
71
12
cocker07
660244
nanina
hirann
love19
All networks together
Let's combine the output of all three networks. We get more than 50k unique passwords, of which 3.8k appear in the testing set and 105 in the training set.
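The combined file was presumably built the same way as the per-network ones, e.g.:
$> cat pws* | sort | uniq -u > output-all.txt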
$> wc -l output-all.txt
50691 output-all.txt
$> sort output-all.txt train.txt | uniq -d | wc -l
105
$> sort output-all.txt test.txt | uniq -d | wc -l
3881
$> sort output-all.txt test.txt | uniq -d | sort -R | head -n 25
newmon
chesar
pawita
opalla
buster6
reneeia
singers123
patito123
6312355
biggie1
297572
jairo01
bday1992
coolboy1
meeka1
blackie
laura25
neil10
mone10
12345sexy
dog143
193208
mash28
alexa87
793806
I'm sure one could predict more passwords by training the models longer, by using other parameters, or by using the -primetext parameter during sampling.
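The -primetext option seeds the network with some starting text before sampling; the pws1-3.24-pass.txt and pws1-3.24-12345.txt files above hint at what that looks like. A sketch (the checkpoint name is a placeholder):
$> th sample.lua -length 25000 -primetext "pass" cv/lm_pws_epochXY > output/pws-epochXY-pass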
If that caught your interest and you want to continue working on this and/or resume the recurrent neural networks from a savefile, I've uploaded the contents of the cv/ (savefiles) and output/ (samples) directories to GitHub: https://github.com/gehaxelt/RNN-Passwords.
As shown above, it is possible to let a computer crunch some numbers and predict unknown passwords. But given the size of the training/testing datasets and the small percentage of predicted passwords, I don't think that this approach has any practical relevance.