{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "This Notebook is a Stable-diffusion tool which allows you to find similiar tokens from the SD 1.5 vocab.json that you can use for text-to-image generation. Try this Free online SD 1.5 generator with the results: https://perchance.org/fusion-ai-image-generator\n", "\n", "Scroll to the bottom of the notebook to see the guide for how this works." ], "metadata": { "id": "L7JTcbOdBPfh" } }, { "cell_type": "code", "source": [ "# @title ✳️ Load/initialize values\n", "# Load the tokens into the colab\n", "!git clone https://huggingface.co/datasets/codeShare/sd_tokens\n", "import torch\n", "from torch import linalg as LA\n", "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "%cd /content/sd_tokens\n", "token = torch.load('sd15_tensors.pt', map_location=device, weights_only=True)\n", "#-----#\n", "VOCAB_FILENAME = 'tokens_most_similiar_to_girl'\n", "ACTIVE_IMG = ''\n", "#-----#\n", "\n", "#Import the vocab.json\n", "import json\n", "import pandas as pd\n", "with open('vocab.json', 'r') as f:\n", " data = json.load(f)\n", "\n", "_df = pd.DataFrame({'count': data})['count']\n", "\n", "vocab = {\n", " value: key for key, value in _df.items()\n", "}\n", "#-----#\n", "\n", "# Define functions/constants\n", "NUM_TOKENS = 49407\n", "\n", "def absolute_value(x):\n", " return max(x, -x)\n", "\n", "\n", "def token_similarity(A, B):\n", "\n", " #Vector length#\n", " _A = LA.vector_norm(A, ord=2)\n", " _B = LA.vector_norm(B, ord=2)\n", "\n", " #----#\n", " result = torch.dot(A,B)/(_A*_B)\n", " #similarity_pcnt = absolute_value(result.item()*100)\n", " similarity_pcnt = result.item()*100\n", " similarity_pcnt_aprox = round(similarity_pcnt, 3)\n", " result = f'{similarity_pcnt_aprox} %'\n", " return result\n", "\n", "\n", "def 
similarity(id_A , id_B):\n", " #Tensors\n", " A = token[id_A]\n", " B = token[id_B]\n", " return token_similarity(A, B)\n", "#----#\n", "\n", "#print(vocab[8922]) #the vocab item for ID 8922\n", "#print(token[8922].shape) #dimension of the token\n", "\n", "mix_with = \"\"\n", "mix_method = \"None\"\n", "\n", "#-------------#\n", "# UNUSED\n", "\n", "# Get the 10 lowest values from a tensor as a string\n", "def get_valleys (A):\n", " sorted, indices = torch.sort(A,dim=0 , descending=False)\n", " result = \"{\"\n", " for index in range(10):\n", " id = indices[index].item()\n", " result = result + f\"{id}\"\n", " if(index<9):\n", " result = result + \",\"\n", " result = result + \"}\"\n", " return result\n", "\n", "# Get the 10 highest values from a tensor as a string\n", "def get_peaks (A):\n", " sorted, indices = torch.sort(A,dim=0 , descending=True)\n", " result = \"{\"\n", " for index in range(10):\n", " id = indices[index].item()\n", " result = result + f\"{id}\"\n", " if(index<9):\n", " result = result + \",\"\n", " result = result + \"}\"\n", " return result" ], "metadata": { "id": "Ch9puvwKH1s3", "collapsed": true, "outputId": "033c251a-2043-40e7-9500-4da870ffa7fd", "colab": { "base_uri": "https://localhost:8080/" }, "cellView": "form" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Cloning into 'sd_tokens'...\n", "remote: Enumerating objects: 20, done.\u001b[K\n", "remote: Counting objects: 100% (17/17), done.\u001b[K\n", "remote: Compressing objects: 100% (17/17), done.\u001b[K\n", "remote: Total 20 (delta 4), reused 0 (delta 0), pack-reused 3 (from 1)\u001b[K\n", "Unpacking objects: 100% (20/20), 310.37 KiB | 2.10 MiB/s, done.\n", "Filtering content: 100% (3/3), 160.82 MiB | 26.64 MiB/s, done.\n", "/content/sd_tokens\n" ] } ] }, { "cell_type": "code", "source": [ "# @title ⚡ Get similiar tokens\n", "import torch\n", "from transformers import AutoTokenizer\n", "tokenizer = 
AutoTokenizer.from_pretrained(\"openai/clip-vit-large-patch14\", clean_up_tokenization_spaces = False)\n", "\n", "# @markdown Write the name of the token to match against\n", "token_name = \" banana\" # @param {type:'string',\"placeholder\":\"leave empty for random value token\"}\n", "\n", "prompt = token_name\n", "# @markdown (optional) Mix the token with something else\n", "mix_with = \"\" # @param {\"type\":\"string\",\"placeholder\":\"leave empty for random value token\"}\n", "mix_method = \"None\" # @param [\"None\" , \"Average\", \"Subtract\"] {allow-input: true}\n", "w = 0.5 # @param {type:\"slider\", min:0, max:1, step:0.01}\n", "# @markdown Limit char size of included token\n", "\n", "min_char_size = 0 # param {type:\"slider\", min:0, max: 50, step:1}\n", "char_range = 50 # param {type:\"slider\", min:0, max: 50, step:1}\n", "\n", "tokenizer_output = tokenizer(text = prompt)\n", "input_ids = tokenizer_output['input_ids']\n", "id_A = input_ids[1]\n", "A = torch.tensor(token[id_A])\n", "A = A/A.norm(p=2, dim=-1, keepdim=True)\n", "#-----#\n", "tokenizer_output = tokenizer(text = mix_with)\n", "input_ids = tokenizer_output['input_ids']\n", "id_C = input_ids[1]\n", "C = torch.tensor(token[id_C])\n", "C = C/C.norm(p=2, dim=-1, keepdim=True)\n", "#-----#\n", "sim_AC = torch.dot(A,C)\n", "#-----#\n", "print(input_ids)\n", "#-----#\n", "\n", "#if no input exists we just randomize the entire thing\n", "if (prompt == \"\"):\n", " id_A = -1\n", " print(\"Tokenized prompt tensor A is a random valued tensor with no ID\")\n", " R = torch.rand(A.shape)\n", " R = R/R.norm(p=2, dim=-1, keepdim=True)\n", " A = R\n", " name_A = 'random_A'\n", "\n", "#if no input exists we just randomize the entire thing\n", "if (mix_with == \"\"):\n", " id_C = -1\n", " print(\"Tokenized prompt 'mix_with' tensor C is a random valued tensor with no ID\")\n", " R = torch.rand(A.shape)\n", " R = R/R.norm(p=2, dim=-1, keepdim=True)\n", " C = R\n", " name_C = 'random_C'\n", "\n", "name_A = \"A of random 
type\"\n", "if (id_A>-1):\n", " name_A = vocab[id_A]\n", "\n", "name_C = \"token C of random type\"\n", "if (id_C>-1):\n", " name_C = vocab[id_C]\n", "\n", "print(f\"The similarity between A '{name_A}' and C '{name_C}' is {round(sim_AC.item()*100,2)} %\")\n", "\n", "if (mix_method == \"None\"):\n", " print(\"No operation\")\n", "\n", "if (mix_method == \"Average\"):\n", " A = w*A + (1-w)*C\n", " _A = LA.vector_norm(A, ord=2)\n", " print(f\"Tokenized prompt tensor A '{name_A}' token has been recalculated as A = w*A + (1-w)*C , where C is '{name_C}' token , for w = {w} \")\n", "\n", "if (mix_method == \"Subtract\"):\n", " tmp = w*A - (1-w)*C\n", " tmp = tmp/tmp.norm(p=2, dim=-1, keepdim=True)\n", " A = tmp\n", " #//---//\n", " print(f\"Tokenized prompt tensor A '{name_A}' token has been recalculated as A = _A*norm(w*A - (1-w)*C) , where C is '{name_C}' token , for w = {w} \")\n", "\n", "#OPTIONAL : Add/subtract + normalize above result with another token. Leave field empty to get a random value tensor\n", "\n", "dots = torch.zeros(NUM_TOKENS)\n", "for index in range(NUM_TOKENS):\n", " id_B = index\n", " B = torch.tensor(token[id_B])\n", " B = B/B.norm(p=2, dim=-1, keepdim=True)\n", " sim_AB = torch.dot(A,B)\n", " dots[index] = sim_AB\n", "\n", "\n", "sorted, indices = torch.sort(dots,dim=0 , descending=True)\n", "#----#\n", "if (mix_method == \"Average\"):\n", " print(f'Calculated all cosine-similarities between the average of token {name_A} and {name_C} with Id_A = {id_A} and mixed Id_C = {id_C} as a 1x{sorted.shape[0]} tensor')\n", "if (mix_method == \"Subtract\"):\n", " print(f'Calculated all cosine-similarities between the subtract of token {name_A} and {name_C} with Id_A = {id_A} and mixed Id_C = {id_C} as a 1x{sorted.shape[0]} tensor')\n", "if (mix_method == \"None\"):\n", " print(f'Calculated all cosine-similarities between the token {name_A} with Id_A = {id_A} with the the rest of the {NUM_TOKENS} tokens as a 1x{sorted.shape[0]} tensor')\n", "\n", "#Produce a 
list of IDs that are most similar to the prompt ID at position 1 based on the above result\n", "\n", "# @markdown Set print options\n", "list_size = 100 # @param {type:'number'}\n", "print_ID = False # @param {type:\"boolean\"}\n", "print_Similarity = True # @param {type:\"boolean\"}\n", "print_Name = True # @param {type:\"boolean\"}\n", "print_Divider = True # @param {type:\"boolean\"}\n", "\n", "\n", "if (print_Divider):\n", " print('//---//')\n", "\n", "print('')\n", "print('Here is the result : ')\n", "print('')\n", "\n", "for index in range(list_size):\n", " id = indices[index].item()\n", " if (print_Name):\n", " print(f'{vocab[id]}') # vocab item\n", " if (print_ID):\n", " print(f'ID = {id}') # IDs\n", " if (print_Similarity):\n", " print(f'similarity = {round(sorted[index].item()*100,2)} %')\n", " if (print_Divider):\n", " print('--------')\n", "\n", "#Print the sorted list from above result\n", "\n", "#The prompt will be enclosed with the <|startoftext|> and <|endoftext|> tokens, which is why the output will be [49406, ... , 49407].\n", "\n", "#You can leave the 'prompt' field empty to get a random value tensor. 
Since the tensor has random values, it will not correspond to any token in the vocab.json list, and thus it will have no ID.\n", "\n", "# Save results as .db file\n", "import shelve\n", "VOCAB_FILENAME = 'tokens_most_similiar_to_' + name_A.replace('</w>','').strip()\n", "d = shelve.open(VOCAB_FILENAME)\n", "#NUM TOKENS == 49407\n", "for index in range(NUM_TOKENS):\n", " #print(d[f'{index}']) #<-----Use this to read values from the .db file\n", " d[f'{index}']= vocab[indices[index].item()] #<---- write values to .db file\n", "#----#\n", "d.close() #close the file\n", "# See this link for additional stuff to do with shelve: https://docs.python.org/3/library/shelve.html" ], "metadata": { "id": "iWeFnT1gAx6A", "cellView": "form" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The image interrogator below appends CLIP tokens to either end of the 'must_contain' text and seeks to maximize similarity with the image encoding.\n", "\n", "Checking all the tokens takes too long, so this cell only samples a range of the 49K available tokens.\n", "\n", "You can run this cell, paste the result into the 'must_contain' box, and then run the cell again.\n", "\n", "Check the sd_tokens folder for stored .db files from running the '⚡ Get similiar tokens' cell. 
These can be used in the ⚡+🖼️ -> 📝 Token-Sampling Image interrogator cell.\n" ], "metadata": { "id": "IUCuV9RtQpBn" } }, { "cell_type": "code", "source": [ "# @title ⚡+🖼️ -> 📝 Token-Sampling Image interrogator\n", "#-----#\n", "NUM_TOKENS = 49407\n", "import shelve\n", "db_vocab = shelve.open(VOCAB_FILENAME)\n", "print(f'using the tokens found in {VOCAB_FILENAME}.db as the vocab')\n", "# @markdown # What do you want to mimic?\n", "use = '🖼️image_encoding from image' # @param ['📝text_encoding from prompt', '🖼️image_encoding from image']\n", "# @markdown --------------------------\n", "use_token_padding = True # param {type:\"boolean\"} <---- Enabled by default\n", "prompt = \"photo of a banana\" # @param {\"type\":\"string\",\"placeholder\":\"Write a prompt\"}\n", "#-----#\n", "prompt_A = prompt\n", "#-----#\n", "from google.colab import files\n", "def upload_files():\n", " from google.colab import files\n", " uploaded = files.upload()\n", " for k, v in uploaded.items():\n", " open(k, 'wb').write(v)\n", " return list(uploaded.keys())\n", "#Get image\n", "# You can use \"http://images.cocodataset.org/val2017/000000039769.jpg\" for testing\n", "image_url = \"http://images.cocodataset.org/val2017/000000039769.jpg\" # @param {\"type\":\"string\",\"placeholder\":\"leave empty for local upload (scroll down to see it)\"}\n", "colab_image_path = \"\" # @param {\"type\":\"string\",\"placeholder\": \"eval. 
as '/content/sd_tokens/' + **your input**\"}\n", "# @markdown --------------------------\n", "from PIL import Image\n", "import requests\n", "image_A = \"\"\n", "#----#\n", "if(use == '🖼️image_encoding from image'):\n", " if image_url == \"\":\n", " import cv2\n", " from google.colab.patches import cv2_imshow\n", " # Open the image.\n", " if colab_image_path == \"\":\n", " keys = upload_files()\n", " for key in keys:\n", " image_A = cv2.imread(\"/content/sd_tokens/\" + key)\n", " colab_image_path = \"/content/sd_tokens/\" + key\n", " else:\n", " image_A = cv2.imread(\"/content/sd_tokens/\" + colab_image_path)\n", " else:\n", " image_A = Image.open(requests.get(image_url, stream=True).raw)\n", "#------#\n", "from transformers import AutoTokenizer\n", "tokenizer = AutoTokenizer.from_pretrained(\"openai/clip-vit-large-patch14\", clean_up_tokenization_spaces = False)\n", "from transformers import CLIPProcessor, CLIPModel\n", "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-large-patch14\" , clean_up_tokenization_spaces = True)\n", "model = CLIPModel.from_pretrained(\"openai/clip-vit-large-patch14\")\n", "#-----#\n", "if(use == '🖼️image_encoding from image'):\n", " # Get image features\n", " inputs = processor(images=image_A, return_tensors=\"pt\")\n", " image_features = model.get_image_features(**inputs)\n", " image_features = image_features / image_features.norm(p=2, dim=-1, keepdim=True)\n", " name_A = \"the image\"\n", "#-----#\n", "if(use == '📝text_encoding from prompt'):\n", " # Get text features\n", " inputs = tokenizer(text = prompt, padding=True, return_tensors=\"pt\")\n", " text_features_A = model.get_text_features(**inputs)\n", " name_A = prompt\n", "#-----#\n", "# @markdown # The output...\n", "must_start_with = \"\" # @param {\"type\":\"string\",\"placeholder\":\"write a text\"}\n", "must_contain = \"\" # @param {\"type\":\"string\",\"placeholder\":\"write a text\"}\n", "must_end_with = \"\" # @param {\"type\":\"string\",\"placeholder\":\"write 
a text\"}\n", "# @markdown -----\n", "# @markdown # Use a range of tokens from the vocab.json (slow method)\n", "start_search_at_index = 0 # @param {type:\"slider\", min:0, max: 49407, step:100}\n", "# @markdown The lower the start index, the more similar the sampled tokens will be to the target token assigned in the '⚡ Get similiar tokens' cell. If that cell was not run, it will use tokens ordered by similarity to the \"girl\" token.\n", "start_search_at_ID = start_search_at_index\n", "search_range = 1000 # @param {type:\"slider\", min:100, max:49407, step:100}\n", "\n", "samples_per_iter = 10 # @param {type:\"slider\", min:10, max: 100, step:10}\n", "\n", "iterations = 5 # @param {type:\"slider\", min:1, max: 20, step:1}\n", "restrictions = 'None' # @param [\"None\", \"Suffix only\", \"Prefix only\"]\n", "#markdown Limit char size of included token <----- Disabled\n", "min_char_size = 0 #param {type:\"slider\", min:0, max: 20, step:1}\n", "char_range = 50 #param {type:\"slider\", min:0, max: 20, step:1}\n", "# markdown # ...or paste prompt items\n", "# markdown Format must be {item1|item2|...}. 
You can acquire prompt items using the Randomizer in the fusion gen: https://perchance.org/fusion-ai-image-generator\n", "_enable = False # param {\"type\":\"boolean\"}\n", "prompt_items = \"\" # param {\"type\":\"string\",\"placeholder\":\"{item1|item2|...}\"}\n", "#-----#\n", "#-----#\n", "START = start_search_at_ID\n", "RANGE = min(search_range , max(1,NUM_TOKENS - start_search_at_ID))\n", "#-----#\n", "import math, random\n", "NUM_PERMUTATIONS = 6\n", "ITERS = iterations\n", "#-----#\n", "#LOOP START\n", "#-----#\n", "# Check if original solution is best\n", "best_sim = 0\n", "name = must_start_with + must_contain + must_end_with\n", "ids = processor.tokenizer(text=name, padding=use_token_padding, return_tensors=\"pt\")\n", "text_features = model.get_text_features(**ids)\n", "text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)\n", "#------#\n", "sim = 0\n", "if(use == '🖼️image_encoding from image'):\n", " logit_scale = model.logit_scale.exp()\n", " torch.matmul(text_features, image_features.t()) * logit_scale\n", " sim = torch.nn.functional.cosine_similarity(text_features, image_features) * logit_scale\n", "#-----#\n", "if(use == '📝text_encoding from prompt'):\n", " sim = torch.nn.functional.cosine_similarity(text_features, text_features_A)\n", "#-----#\n", "best_sim = sim\n", "best_name = name\n", "name_B = must_contain\n", "#------#\n", "results_sim = torch.zeros(ITERS*NUM_PERMUTATIONS)\n", "results_name_B = {}\n", "results_name = {}\n", "#-----#\n", "for iter in range(ITERS):\n", " dots = torch.zeros(min(list_size,RANGE))\n", " is_trail = torch.zeros(min(list_size,RANGE))\n", "\n", " #-----#\n", "\n", " for index in range(samples_per_iter):\n", " _start = START\n", " id_C = random.randint(_start , _start + RANGE - 1) # randint includes both endpoints\n", " name_C = db_vocab[f'{id_C}']\n", " is_Prefix = 0\n", " #Skip if non-AZ characters are found\n", " #???\n", " #-----#\n", " # Decide if we should process prefix/suffix tokens\n", " if name_C.find('</w>')<=-1:\n", " 
is_Prefix = 1\n", " if restrictions != \"Prefix only\":\n", " continue\n", " else:\n", " if restrictions == \"Prefix only\":\n", " continue\n", " #-----#\n", " # Decide if char-size is within range\n", " if len(name_C) < min_char_size:\n", " continue\n", " if len(name_C) > min_char_size + char_range:\n", " continue\n", " #-----#\n", " name_CB = must_start_with + name_C + name_B + must_end_with\n", " if is_Prefix>0:\n", " name_CB = must_start_with + ' ' + name_C + '-' + name_B + ' ' + must_end_with\n", " #-----#\n", " if(use == '🖼️image_encoding from image'):\n", " ids_CB = processor.tokenizer(text=name_CB, padding=use_token_padding, return_tensors=\"pt\")\n", " text_features = model.get_text_features(**ids_CB)\n", " text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)\n", " logit_scale = model.logit_scale.exp()\n", " torch.matmul(text_features, image_features.t()) * logit_scale\n", " sim_CB = torch.nn.functional.cosine_similarity(text_features, image_features) * logit_scale\n", " #-----#\n", " if(use == '📝text_encoding from prompt'):\n", " ids_CB = processor.tokenizer(text=name_CB, padding=use_token_padding, return_tensors=\"pt\")\n", " text_features = model.get_text_features(**ids_CB)\n", " text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)\n", " sim_CB = torch.nn.functional.cosine_similarity(text_features, text_features_A)\n", " #-----#\n", " #-----#\n", " if restrictions == \"Prefix only\":\n", " result = sim_CB\n", " result = result.item()\n", " dots[index] = result\n", " continue\n", " #-----#\n", " if(use == '🖼️image_encoding from image'):\n", " name_BC = must_start_with + name_B + name_C + must_end_with\n", " ids_BC = processor.tokenizer(text=name_BC, padding=use_token_padding, return_tensors=\"pt\")\n", " text_features = model.get_text_features(**ids_BC)\n", " text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)\n", " logit_scale = model.logit_scale.exp()\n", " 
torch.matmul(text_features, image_features.t()) * logit_scale\n", " sim_BC = torch.nn.functional.cosine_similarity(text_features, image_features) * logit_scale\n", " #-----#\n", " if(use == '📝text_encoding from prompt'):\n", " name_BC = must_start_with + name_B + name_C + must_end_with\n", " ids_BC = processor.tokenizer(text=name_BC, padding=use_token_padding, return_tensors=\"pt\")\n", " text_features = model.get_text_features(**ids_BC)\n", " text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)\n", " sim_BC = torch.nn.functional.cosine_similarity(text_features, text_features_A)\n", " #-----#\n", " result = sim_CB\n", " if(sim_BC > sim_CB):\n", " is_trail[index] = 1\n", " result = sim_BC\n", " #-----#\n", " #result = absolute_value(result.item())\n", " result = result.item()\n", " dots[index] = result\n", " #----#\n", " sorted, indices = torch.sort(dots,dim=0 , descending=True)\n", " # @markdown ----------\n", " # @markdown # Print options\n", " list_size = 100 # param {type:'number'}\n", " print_ID = False # @param {type:\"boolean\"}\n", " print_Similarity = True # @param {type:\"boolean\"}\n", " print_Name = True # @param {type:\"boolean\"}\n", " print_Divider = True # @param {type:\"boolean\"}\n", " print_Suggestions = False # @param {type:\"boolean\"}\n", " #----#\n", " if (print_Divider):\n", " print('//---//')\n", " #----#\n", " print('')\n", "\n", " used_reference = f'the text_encoding for {prompt_A}'\n", " if(use == '🖼️image_encoding from image'):\n", " used_reference = 'the image input'\n", " print(f'These token pairings within the range ID = {_start} to ID = {_start + RANGE} most closely match {used_reference}: ')\n", " print('')\n", " #----#\n", " aheads = \"{\"\n", " trails = \"{\"\n", " tmp = \"\"\n", " #----#\n", " max_sim_ahead = 0\n", " max_sim_trail = 0\n", " sim = 0\n", " max_name_ahead = ''\n", " max_name_trail = ''\n", " #----#\n", " for index in range(min(list_size,RANGE)):\n", " id = _start + indices[index].item()\n", 
" name = db_vocab[f'{id}']\n", " #-----#\n", " if (name.find('</w>')<=-1):\n", " name = name + '-'\n", " if(is_trail[index]>0):\n", " trails = trails + name + \"|\"\n", " else:\n", " aheads = aheads + name + \"|\"\n", " #----#\n", " sim = sorted[index].item()\n", " #----#\n", " if(is_trail[index]>0):\n", " if sim>max_sim_trail:\n", " max_sim_trail = sim\n", " max_name_trail = name\n", " max_name_trail = max_name_trail.strip()\n", "\n", " else:\n", " if sim>max_sim_ahead:\n", " max_sim_ahead = sim\n", " max_name_ahead = name\n", " #------#\n", " trails = (trails + \"&&&&\").replace(\"|&&&&\", \"}\").replace(\"</w>\", \" \").replace(\"{&&&&\", \"\")\n", " aheads = (aheads + \"&&&&\").replace(\"|&&&&\", \"}\").replace(\"</w>\", \" \").replace(\"{&&&&\", \"\")\n", " #-----#\n", "\n", " if(print_Suggestions):\n", " print(f\"place these items ahead of prompt : {aheads}\")\n", " print(\"\")\n", " print(f\"place these items behind the prompt : {trails}\")\n", " print(\"\")\n", "\n", " tmp = must_start_with + ' ' + max_name_ahead + name_B + ' ' + must_end_with\n", " tmp = tmp.strip().replace('</w>', ' ')\n", " print(f\"max_similarity_ahead = {round(max_sim_ahead,2)} % when using '{tmp}' \")\n", " print(\"\")\n", " tmp = must_start_with + ' ' + name_B + max_name_trail + ' ' + must_end_with\n", " tmp = tmp.strip().replace('</w>', ' ')\n", " print(f\"max_similarity_trail = {round(max_sim_trail,2)} % when using '{tmp}' \")\n", " #-----#\n", " #STEP 2\n", " import random\n", " #-----#\n", " for index in range(NUM_PERMUTATIONS):\n", " name_inner = ''\n", " if index == 0 : name_inner = name_B\n", " if index == 1: name_inner = max_name_ahead\n", " if index == 2: name_inner = max_name_trail\n", " if index == 3: name_inner = name_B + max_name_trail\n", " if index == 4: name_inner = max_name_ahead + name_B\n", " if index == 5: name_inner = max_name_ahead + name_B + max_name_trail\n", " if name_inner == '': name_inner = max_name_ahead + name_B + max_name_trail\n", "\n", " name = must_start_with + 
name_inner + must_end_with\n", " #----#\n", " ids = processor.tokenizer(text=name, padding=use_token_padding, return_tensors=\"pt\")\n", " #----#\n", " sim = 0\n", " if(use == '🖼️image_encoding from image'):\n", " text_features = model.get_text_features(**ids)\n", " text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)\n", " logit_scale = model.logit_scale.exp()\n", " torch.matmul(text_features, image_features.t()) * logit_scale\n", " sim = torch.nn.functional.cosine_similarity(text_features, image_features) * logit_scale\n", " #-----#\n", " if(use == '📝text_encoding from prompt'):\n", " text_features = model.get_text_features(**ids)\n", " text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)\n", " sim = torch.nn.functional.cosine_similarity(text_features, text_features_A)\n", " #-----#\n", " results_name[iter*NUM_PERMUTATIONS + index] = name\n", " results_sim[iter*NUM_PERMUTATIONS + index] = sim\n", " results_name_B[iter*NUM_PERMUTATIONS + index] = name_inner.replace('</w>',' ')\n", " #------#\n", " #name_B = results_name_B[iter*NUM_PERMUTATIONS + random.randint(0,3)]\n", " tmp = iter*NUM_PERMUTATIONS\n", " _name_B=''\n", " if results_sim[tmp+1]>results_sim[tmp+2]: _name_B = results_name_B[tmp + 3]\n", " if results_sim[tmp+2]>results_sim[tmp+1]: _name_B = results_name_B[tmp + 4]\n", "\n", " if _name_B != name_B:\n", " name_B=_name_B\n", " else:\n", " name_B = results_name_B[tmp + 5]\n", "\n", "#--------#\n", "print('')\n", "if(use == '🖼️image_encoding from image' and colab_image_path != \"\"):\n", " from google.colab.patches import cv2_imshow\n", " cv2_imshow(image_A)\n", "#-----#\n", "print('')\n", "sorted, indices = torch.sort(results_sim,dim=0 , descending=True)\n", "\n", "for index in range(ITERS*NUM_PERMUTATIONS):\n", " name = results_name[indices[index].item()] # already includes must_start_with and must_end_with\n", " print(name)\n", " print(f'similarity = {round(sorted[index].item(),2)} %')\n", " print('------')\n", 
"#------#\n", "db_vocab.close() #close the file" ], "metadata": { "collapsed": true, "id": "fi0jRruI0-tu", "cellView": "form" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# @title (Optional) ⚡Actively set which Vocab list to use for the interrogator\n", "token_name = \"\" # @param {\"type\":\"string\",\"placeholder\":\"Write a token_name used earlier\"}\n", "VOCAB_FILENAME = 'tokens_most_similiar_to_' + token_name.replace('</w>','').strip()\n", "print(f'Using a vocab ordered by similarity to the token {token_name}')" ], "metadata": { "id": "FYa96UCQuE1U" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# @title 💫 Compare Text encodings\n", "prompt_A = \"banana\" # @param {\"type\":\"string\",\"placeholder\":\"Write a prompt\"}\n", "prompt_B = \"bike \" # @param {\"type\":\"string\",\"placeholder\":\"Write a prompt\"}\n", "use_token_padding = True # param {type:\"boolean\"} <----- Enabled by default\n", "#-----#\n", "from transformers import AutoTokenizer\n", "tokenizer = AutoTokenizer.from_pretrained(\"openai/clip-vit-large-patch14\",\n", "clean_up_tokenization_spaces = False)\n", "#-----#\n", "from transformers import CLIPProcessor, CLIPModel\n", "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-large-patch14\" , clean_up_tokenization_spaces = True)\n", "model = CLIPModel.from_pretrained(\"openai/clip-vit-large-patch14\")\n", "#----#\n", "inputs = tokenizer(text = prompt_A, padding=True, return_tensors=\"pt\")\n", "text_features_A = model.get_text_features(**inputs)\n", "text_features_A = text_features_A / text_features_A.norm(p=2, dim=-1, keepdim=True)\n", "name_A = prompt_A\n", "#----#\n", "inputs = tokenizer(text = prompt_B, padding=True, return_tensors=\"pt\")\n", "text_features_B = model.get_text_features(**inputs)\n", "text_features_B = text_features_B / text_features_B.norm(p=2, dim=-1, keepdim=True)\n", "name_B = prompt_B\n", "#----#\n", "import torch\n", "sim_AB = 
torch.nn.functional.cosine_similarity(text_features_A, text_features_B)\n", "#----#\n", "print(f'The similarity between the text_encoding for A:\"{prompt_A}\" and B: \"{prompt_B}\" is {round(sim_AB.item()*100,2)} %')" ], "metadata": { "id": "QQOjh5BvnG8M", "collapsed": true, "cellView": "form" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "You can write a URL or upload a file from your device to use as a reference. The image will be saved in the 'sd_tokens' folder. Note that the 'sd_tokens' folder will be deleted upon exiting this runtime." ], "metadata": { "id": "hyK423TQCRup" } }, { "cell_type": "markdown", "source": [ "\n", "\n", "# How does this notebook work?\n", "\n", "Similar vectors = similar output in the SD 1.5 / SDXL / FLUX model\n", "\n", "CLIP converts the prompt text to vectors (“tensors”), with float32 values usually ranging from -1 to 1.\n", "\n", "Dimensions are \\[ 1x768 ] tensors for SD 1.5 , and a \\[ 1x768 , 1x1024 ] tensor for SDXL and FLUX.\n", "\n", "The SD models and FLUX convert these vectors to an image.\n", "\n", "This notebook takes an input string, tokenizes it, and matches the first token against the 49407 token vectors in the vocab.json: [https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer](https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer)\n", "\n", "It finds the “most similar tokens” in the list. Similarity is measured as the cosine of the angle theta between the token vectors.\n", "\n", "