javirandor committed • 3c676b3
1 Parent(s): bde0882
Update README.md

README.md CHANGED
@@ -15,4 +15,42 @@ The model inherits the [GPT2LMHeadModel](https://huggingface.co/docs/transformer
Passwords can be sampled from the model using the [built-in generation methods](https://huggingface.co/docs/transformers/v4.30.0/en/main_classes/text_generation#transformers.GenerationMixin.generate) provided by HuggingFace, with the start-of-password token (i.e. `<s>`) as the seed. The following code generates one password with PassGPT:

```
import torch
from transformers import GPT2LMHeadModel
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("javirandor/passgpt-10characters",
                                                 max_len=12,
                                                 padding="max_length",
                                                 truncation=True,
                                                 do_lower_case=False,
                                                 strip_accents=False,
                                                 mask_token="<mask>",
                                                 unk_token="<unk>",
                                                 pad_token="<pad>",
                                                 truncation_side="right")

# Move the model to the GPU to match the generation input below
model = GPT2LMHeadModel.from_pretrained("javirandor/passgpt-10characters").eval().cuda()

NUM_GENERATIONS = 1

with torch.no_grad():
    # Generate passwords by sampling from the start-of-password token
    g = model.generate(torch.tensor([[tokenizer.bos_token_id]]).cuda(),
                       do_sample=True,
                       num_return_sequences=NUM_GENERATIONS,
                       max_length=12,
                       pad_token_id=tokenizer.pad_token_id,
                       bad_words_ids=[[tokenizer.bos_token_id]])

    # Remove the start-of-password token
    g = g[:, 1:]

    decoded = tokenizer.batch_decode(g.tolist())
    decoded_clean = [i.split("</s>")[0] for i in decoded]  # Keep the content before the end-of-password token

    # Print your sampled passwords!
    print(decoded_clean)
```
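
The standard HuggingFace sampling controls also apply to this call. As an illustrative variation (not part of the original snippet), arguments such as `temperature` and `top_p` can trade diversity against likelihood; the values below are arbitrary examples:

```
# Illustrative only: the same call as above with extra sampling controls
g = model.generate(torch.tensor([[tokenizer.bos_token_id]]).cuda(),
                   do_sample=True,
                   num_return_sequences=NUM_GENERATIONS,
                   max_length=12,
                   temperature=0.8,  # <1.0 concentrates probability on likelier passwords
                   top_p=0.95,       # nucleus sampling over the top 95% of probability mass
                   pad_token_id=tokenizer.pad_token_id,
                   bad_words_ids=[[tokenizer.bos_token_id]])
```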

You can find a more flexible script for sampling [here](https://github.com/javirandor/passgpt/blob/main/src/generate_passwords.py).
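
For a rough idea of what batched sampling looks like, here is a minimal sketch (an illustration only, not the repository's script). It reuses the `model` and `tokenizer` objects created above; the function name and batch sizes are arbitrary:

```
import torch

def sample_passwords(model, tokenizer, total=1000, batch_size=250, max_length=12):
    """Sample `total` passwords in batches and return the decoded strings."""
    passwords = []
    device = next(model.parameters()).device
    seed = torch.tensor([[tokenizer.bos_token_id]]).to(device)
    with torch.no_grad():
        while len(passwords) < total:
            g = model.generate(seed,
                               do_sample=True,
                               num_return_sequences=batch_size,
                               max_length=max_length,
                               pad_token_id=tokenizer.pad_token_id,
                               bad_words_ids=[[tokenizer.bos_token_id]])
            decoded = tokenizer.batch_decode(g[:, 1:].tolist())       # drop the <s> token
            passwords += [d.split("</s>")[0] for d in decoded]        # keep text before </s>
    return passwords[:total]
```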