How to use this model with vectordb
I am trying to use this model with a vector DB after embedding my documents with HuggingFaceEmbeddings. Here is the code:
from langchain_community.embeddings import HuggingFaceEmbeddings

model_name = "dangvantuan/vietnamese-embedding"
model_kwargs = {"device": "cuda"}

# try to access the sentence-transformers model from HuggingFace:
# https://huggingface.co/api/models/sentence-transformers/all-mpnet-base-v2
try:
    embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)
except Exception as ex:
    print("Exception: ", ex)
    # alternatively, access the embedding model locally
    local_model_path = "transformers/default/1/vietnamese-embedding"
    print(f"Use alternative (local) model: {local_model_path}\n")
    embeddings = HuggingFaceEmbeddings(model_name=local_model_path, model_kwargs=model_kwargs)
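Before building the vector store, a quick sanity check along these lines (the sample sentence is purely illustrative) confirms the model loads and encodes on the GPU:

# illustrative sanity check: embed one query and inspect the vector dimension
sample = "Hà Nội là thủ đô của Việt Nam."
vector = embeddings.embed_query(sample)
print(len(vector))  # should equal the model's embedding size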
Loading the embedding model was fine, but the vector DB part wasn't. Here are the code and the error:
Code:
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

import chromadb
import os
from langchain_community.vectorstores import Chroma

vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")
Error:
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [14,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [14,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed.
[... the same assertion is repeated for block [14,0,0], threads [98,0,0] through [127,0,0] ...]
RuntimeError Traceback (most recent call last)
Cell In[13], line 6
4 import chromadb
5 import os
----> 6 vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")
File /opt/conda/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:878, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
876 texts = [doc.page_content for doc in documents]
877 metadatas = [doc.metadata for doc in documents]
--> 878 return cls.from_texts(
879 texts=texts,
880 embedding=embedding,
881 metadatas=metadatas,
882 ids=ids,
883 collection_name=collection_name,
884 persist_directory=persist_directory,
885 client_settings=client_settings,
886 client=client,
887 collection_metadata=collection_metadata,
888 **kwargs,
889 )
File /opt/conda/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:842, in Chroma.from_texts(cls, texts, embedding, metadatas, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
836 chroma_collection.add_texts(
837 texts=batch[3] if batch[3] else [],
838 metadatas=batch[2] if batch[2] else None,
839 ids=batch[0],
840 )
841 else:
--> 842 chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
843 return chroma_collection
File /opt/conda/lib/python3.10/site-packages/langchain_community/vectorstores/chroma.py:277, in Chroma.add_texts(self, texts, metadatas, ids, **kwargs)
275 texts = list(texts)
276 if self._embedding_function is not None:
--> 277 embeddings = self._embedding_function.embed_documents(texts)
278 if metadatas:
279 # fill metadatas with empty dicts if somebody
280 # did not specify metadata for all texts
281 length_diff = len(texts) - len(metadatas)
File /opt/conda/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py:103, in HuggingFaceEmbeddings.embed_documents(self, texts)
101 sentence_transformers.SentenceTransformer.stop_multi_process_pool(pool)
102 else:
--> 103 embeddings = self.client.encode(
104 texts, show_progress_bar=self.show_progress, **self.encode_kwargs
105 )
107 return embeddings.tolist()
File /opt/conda/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py:517, in SentenceTransformer.encode(self, sentences, prompt_name, prompt, batch_size, show_progress_bar, output_value, precision, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
514 features.update(extra_features)
516 with torch.no_grad():
--> 517 out_features = self.forward(features)
518 if self.device.type == "hpu":
519 out_features = copy.deepcopy(out_features)
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py:219, in Sequential.forward(self, input)
217 def forward(self, input):
218 for module in self:
--> 219 input = module(input)
220 return input
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File /opt/conda/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py:118, in Transformer.forward(self, features)
115 if "token_type_ids" in features:
116 trans_features["token_type_ids"] = features["token_type_ids"]
--> 118 output_states = self.auto_model(**trans_features, return_dict=False)
119 output_tokens = output_states[0]
121 features.update({"token_embeddings": output_tokens, "attention_mask": features["attention_mask"]})
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File /opt/conda/lib/python3.10/site-packages/transformers/models/roberta/modeling_roberta.py:825, in RobertaModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
818 # Prepare head mask if needed
819 # 1.0 in head_mask indicate we keep the head
820 # attention_probs has shape bsz x n_heads x N x N
821 # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
822 # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
823 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
--> 825 embedding_output = self.embeddings(
826 input_ids=input_ids,
827 position_ids=position_ids,
828 token_type_ids=token_type_ids,
829 inputs_embeds=inputs_embeds,
830 past_key_values_length=past_key_values_length,
831 )
832 encoder_outputs = self.encoder(
833 embedding_output,
834 attention_mask=extended_attention_mask,
(...)
842 return_dict=return_dict,
843 )
844 sequence_output = encoder_outputs[0]
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File /opt/conda/lib/python3.10/site-packages/transformers/models/roberta/modeling_roberta.py:122, in RobertaEmbeddings.forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
120 position_embeddings = self.position_embeddings(position_ids)
121 embeddings += position_embeddings
--> 122 embeddings = self.LayerNorm(embeddings)
123 embeddings = self.dropout(embeddings)
124 return embeddings
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
1551 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1552 else:
-> 1553 return self._call_impl(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
1557 # If we don't have any hooks, we want to skip the rest of the logic in
1558 # this function, and just call forward.
1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1560 or _global_backward_pre_hooks or _global_backward_hooks
1561 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562 return forward_call(*args, **kwargs)
1564 try:
1565 result = None
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py:202, in LayerNorm.forward(self, input)
201 def forward(self, input: Tensor) -> Tensor:
--> 202 return F.layer_norm(
203 input, self.normalized_shape, self.weight, self.bias, self.eps)
File /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py:2576, in layer_norm(input, normalized_shape, weight, bias, eps)
2572 if has_torch_function_variadic(input, weight, bias):
2573 return handle_torch_function(
2574 layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
2575 )
-> 2576 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
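For reference, the debugging hint at the end of the error can be applied like this; it only makes the failing kernel report at its real call site (the variable must be set before PyTorch initializes CUDA, e.g. at the very top of the notebook), it is not a fix:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # report CUDA errors synchronously, at the failing op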
@dimacat
This model, "dangvantuan/vietnamese-embedding", likely has a maximum token limit (context length) that you are exceeding. When the number of tokens in your input text goes beyond this limit, the model cannot process the input correctly, which leads to the CUDA-related errors you observed.
To fix this problem, you can switch to the "dangvantuan/vietnamese-embedding-LongContext" model, which is designed to handle longer input sequences, with a context length of up to 8000 tokens. This should be sufficient to accommodate your input data without causing the same issue.
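A minimal sketch of that switch, assuming the same LangChain setup as in the question (the langchain_community wrapper exposes the underlying SentenceTransformer as .client, as the traceback above shows):

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="dangvantuan/vietnamese-embedding-LongContext",
    model_kwargs={"device": "cuda"},
)
# check the sequence-length limit actually in effect, so text chunks can be sized to fit under it
print(embeddings.client.max_seq_length)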
@dangvantuan
Thanks for your answer, but the model now runs into an out-of-memory problem. Is there any way to fix it?
OutOfMemoryError: CUDA out of memory. Tried to allocate 734.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 38.12 MiB is free. Process 3864 has 14.70 GiB memory in use. Of the allocated memory 14.44 GiB is allocated by PyTorch, and 137.07 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
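A hedged sketch of two common mitigations suggested by the error message itself: set PYTORCH_CUDA_ALLOC_CONF before CUDA is initialized, and encode in smaller batches (the batch_size value below is illustrative; tune it for your GPU):

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # must be set before CUDA is initialized

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="dangvantuan/vietnamese-embedding-LongContext",
    model_kwargs={"device": "cuda"},
    encode_kwargs={"batch_size": 8},  # smaller encode batches lower peak GPU memory
)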