Customizable Word List #33

Open
ghost opened this issue Nov 10, 2014 · 6 comments

@ghost

ghost commented Nov 10, 2014

It would be nice if I could plug my own word list (+100k words) into this program, but I can't because the word list is hard coded into the program.
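A minimal sketch of what a pluggable word list could look like, written in Python for illustration rather than taken from the project's code. The one-word-per-line file format and the names `load_wordlist` and `generate_passphrase` are assumptions, not passgen's actual interface.

```python
import secrets
from pathlib import Path


def load_wordlist(path):
    """Read one word per line, skipping blank lines and duplicates.

    The original order of first occurrences is preserved.
    """
    words = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def generate_passphrase(words, count=5, separator="."):
    """Join `count` words chosen uniformly at random with a CSPRNG."""
    if not words:
        raise ValueError("word list is empty")
    return separator.join(secrets.choice(words) for _ in range(count))
```

With something like this, swapping dictionaries would only mean pointing `load_wordlist` at a different file.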

@defuse
Owner

defuse commented Nov 10, 2014

What dictionary do you want to use? The reason I use the 6k-word dictionary is that all the bigger dictionaries I could find contain unmemorable words; you'll get things like

box.reductionistic.pessimism.contorted.fly

which give little security for the time they take to type. This forces you to run passgen over and over again until you get words you like, which reduces security as much as using a shorter word list would. If you have a bigger dictionary that's still usable, I'll probably use that one instead.
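The re-rolling argument can be made concrete with a little arithmetic. This is a rough sketch under stated assumptions: words are drawn uniformly and independently, and an attacker knows which outputs the user would have accepted. The function names are hypothetical.

```python
import math


def passphrase_bits(list_size, word_count):
    """Entropy in bits of `word_count` words drawn uniformly from the list."""
    return word_count * math.log2(list_size)


def bits_after_rerolling(list_size, word_count, acceptable_fraction):
    """Worst-case entropy when only a fraction of outputs would ever be kept.

    Assumes the attacker knows the acceptable set, so the search space
    shrinks from list_size**word_count to acceptable_fraction times that.
    """
    if not 0 < acceptable_fraction <= 1:
        raise ValueError("acceptable_fraction must be in (0, 1]")
    return passphrase_bits(list_size, word_count) + math.log2(acceptable_fraction)


# Re-rolling a 100k-word list until all 5 words fall in a memorable
# 6k-word subset leaves the same entropy as using the 6k list directly.
fraction = (6_000 / 100_000) ** 5
print(round(bits_after_rerolling(100_000, 5, fraction), 2))
print(round(passphrase_bits(6_000, 5), 2))
```

Both lines print the same value, which is the point: a bigger list only helps if most of its words are ones you would actually keep.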

@defuse defuse closed this as completed Nov 10, 2014
@defuse defuse reopened this Nov 10, 2014
@defuse
Owner

defuse commented Nov 10, 2014

Oops, hit the close button by mistake.

@defuse
Owner

defuse commented Nov 10, 2014

(Oh yeah, and words that are hard to spell. Those are annoying.)

@ghost
Author

ghost commented Nov 10, 2014

Gutenberg.org has a pretty massive free +300k word list. I think it's public domain, but I would double check to make sure.

I wrote a little Python script to strip out words that are too long or too short. It's not perfect, but it helps. I sent the .zip file in an e-mail. With a little fiddling, it's surprisingly effective.

Max: 10, Min: 6 --> ~100k results. Some of the words are a wee bit esoteric, but it's tolerable.
Max: 8, Min: 4 --> ~50k results. If min >= 4, you strip out most of the junk words.
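The script itself isn't included in this thread, so the following is only a guess at what such a length filter might look like. The function name `filter_by_length`, the lowercasing, and the ASCII-letters-only check are assumptions added for illustration.

```python
def filter_by_length(words, min_len=4, max_len=8):
    """Keep words whose length is within [min_len, max_len].

    Words are lowercased first. Entries containing anything other than
    ASCII letters (apostrophes, digits, hyphens) are dropped, as are
    duplicates. Input order is preserved.
    """
    kept = []
    seen = set()
    for raw in words:
        word = raw.strip().lower()
        if not (word.isascii() and word.isalpha()):
            continue
        if min_len <= len(word) <= max_len and word not in seen:
            seen.add(word)
            kept.append(word)
    return kept


sample = ["cat", "Tree", "banana", "reductionistic", "don't", "tree"]
print(filter_by_length(sample))  # ['tree', 'banana']
```

The defaults mirror the "Max: 8, Min: 4" setting; passing `min_len=6, max_len=10` corresponds to the other one.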

@ghost
Author

ghost commented Nov 10, 2014

Update: I checked the license. It looks like it's public domain.

@ghost
Author

ghost commented Nov 10, 2014

Alternatively, http://correcthorsebatterystaple.net/ has a remarkably memorable word list that's free of junk words and chock-full of normal vocabulary. Unfortunately, it's only 2,228 words long.
https://github.com/jvdl/CorrectHorseBatteryStaple/blob/master/data/wordlist.txt

Their "jargon" wordlist has +9,000 words and still doesn't have too much junk in it.
https://github.com/jvdl/CorrectHorseBatteryStaple/blob/master/data/jargon.txt
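
To get a feel for what list size costs in passphrase length, here is a small sketch that counts how many uniformly chosen words are needed to reach a target strength. The 64-bit target is an arbitrary example, 9,000 is used as a round figure for the jargon list, and `words_needed` is a hypothetical helper.

```python
import math


def words_needed(list_size, target_bits):
    """Smallest number of uniformly chosen words giving at least target_bits."""
    return math.ceil(target_bits / math.log2(list_size))


for size in (2_228, 9_000, 100_000):
    print(size, words_needed(size, 64))
# 2228 6
# 9000 5
# 100000 4
```

Under these assumptions, a shorter list costs an extra word or two rather than making a strong passphrase impossible.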
