|
| Truncation                           | Padding                           | Instruction                                                                                 |
|--------------------------------------|-----------------------------------|---------------------------------------------------------------------------------------------|
| no truncation                        | no padding                        | `tokenizer(batch_sentences)`                                                                |
|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True)` or                                               |
|                                      |                                   | `tokenizer(batch_sentences, padding='longest')`                                             |
|                                      | padding to max model input length | `tokenizer(batch_sentences, padding='max_length')`                                          |
|                                      | padding to specific length        | `tokenizer(batch_sentences, padding='max_length', max_length=42)`                           |
|                                      | padding to a multiple of a value  | `tokenizer(batch_sentences, padding=True, pad_to_multiple_of=8)`                            |
| truncation to max model input length | no padding                        | `tokenizer(batch_sentences, truncation=True)` or                                            |
|                                      |                                   | `tokenizer(batch_sentences, truncation=STRATEGY)`                                           |
|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True, truncation=True)` or                              |
|                                      |                                   | `tokenizer(batch_sentences, padding=True, truncation=STRATEGY)`                             |
|                                      | padding to max model input length | `tokenizer(batch_sentences, padding='max_length', truncation=True)` or                      |
|                                      |                                   | `tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY)`                     |
|                                      | padding to specific length        | Not possible                                                                                |
| truncation to specific length        | no padding                        | `tokenizer(batch_sentences, truncation=True, max_length=42)` or                             |
|                                      |                                   | `tokenizer(batch_sentences, truncation=STRATEGY, max_length=42)`                            |
|                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True, truncation=True, max_length=42)` or               |
|                                      |                                   | `tokenizer(batch_sentences, padding=True, truncation=STRATEGY, max_length=42)`              |
|                                      | padding to max model input length | Not possible                                                                                |
|                                      | padding to specific length        | `tokenizer(batch_sentences, padding='max_length', truncation=True, max_length=42)` or       |
|                                      |                                   | `tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY, max_length=42)`      |
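The sketch below exercises two rows of the table end to end: padding to the longest sequence in the batch, and truncating plus padding to a fixed `max_length`. The checkpoint name (`bert-base-uncased`) and the toy `batch_sentences` are illustrative choices, not part of the table; any pretrained tokenizer works the same way, and `STRATEGY` in the table stands for one of the explicit truncation strategies the tokenizer accepts (such as `'longest_first'`, `'only_first'`, or `'only_second'`).

```python
from transformers import AutoTokenizer

# Assumed checkpoint and example batch, chosen only for illustration.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch_sentences = [
    "But what about second breakfast?",
    "Don't think he knows about second breakfast, Pip.",
    "What about elevensies?",
]

# Row "no truncation / padding to max sequence in batch":
# every sequence is padded up to the length of the longest one.
padded = tokenizer(batch_sentences, padding=True)
print([len(ids) for ids in padded["input_ids"]])  # equal lengths, set by the longest sentence

# Row "truncation to specific length / padding to specific length":
# every sequence is truncated or padded to exactly max_length tokens.
fixed = tokenizer(batch_sentences, padding="max_length", truncation=True, max_length=16)
print([len(ids) for ids in fixed["input_ids"]])  # every entry is exactly 16 tokens
```

Printing the lengths of `input_ids` is a quick way to confirm which combination you actually got before batching the tensors for a model.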