Details of training dataset(s)

#1
by timesler - opened

Hi all, thanks for your great work on this model. I was wondering if you could share any information on the training datasets used for v2, and specifically on the license the data is released under?

Protect AI org

Hi @timesler, thanks for your kind words and interest in our model!

We maintain a balance between transparency and confidentiality regarding our datasets. The training data for the model includes public datasets and customized prompt injections derived from research and community feedback.

We comply with all necessary licensing, attributing datasets as required: https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2#dataset

We appreciate your understanding and are here for any further questions!

Best regards,
Oleksandr

Thank you for your response and for sharing that info! Looking forward to trying out the model.

timesler changed discussion status to closed

Would you also consider open-sourcing the training data? It would be very helpful for the community, who could use various techniques to extend your data and create more variations.
