DeepSoftwareAnalytics committed
Commit 37dbe65
1 Parent(s): 6650961

add readme

Files changed (1)
  1. README.md +29 -0
README.md ADDED
@@ -0,0 +1,29 @@
+ # CoCoSoDa: Effective Contrastive Learning for Code Search
+
+ CoCoSoDa adopts a pre-trained model as the base code/query encoder and optimizes it with multimodal contrastive learning and soft data augmentation.
+
+ CoCoSoDa consists of the following four components:
+ * **Pre-trained code/query encoder** captures the semantic information of a code snippet or a natural language query and maps it into a high-dimensional embedding space.
+   The pre-trained model serves as the code/query encoder.
+ * **Momentum code/query encoder** encodes the samples (code snippets or queries) of the current and previous mini-batches to enrich the negative samples.
+
+ * **Soft data augmentation** dynamically masks or replaces some tokens in a sample (code/query) to generate a similar sample as a form of data augmentation.
+
+ * **Multimodal contrastive learning loss function** is the optimization objective and consists of inter-modal and intra-modal contrastive learning losses. These losses minimize the distance between the representations of similar samples and maximize the distance between the representations of different samples in the embedding space.
+
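The momentum encoder component listed above can be illustrated with a small sketch. It assumes a MoCo-style design, which the README does not spell out: the momentum encoder's parameters track the online encoder through an exponential moving average, and its embeddings are kept in a fixed-size queue so that samples from previous mini-batches can serve as negatives. The names `momentum_update` and `NegativeQueue`, the flat parameter lists, and the coefficient `m=0.999` are illustrative assumptions, not the authors' implementation.

```python
from collections import deque


def momentum_update(online_params, momentum_params, m=0.999):
    """Move each momentum-encoder parameter slightly toward its online counterpart.

    Assumed rule (exponential moving average): k <- m * k + (1 - m) * q.
    """
    return [m * k + (1.0 - m) * q for q, k in zip(online_params, momentum_params)]


class NegativeQueue:
    """Fixed-size FIFO of embeddings produced by the momentum encoder."""

    def __init__(self, max_size):
        # A bounded deque drops its oldest entries once max_size is reached.
        self._items = deque(maxlen=max_size)

    def enqueue(self, embeddings):
        """Add the embeddings of the current mini-batch."""
        self._items.extend(embeddings)

    def negatives(self):
        """Return every stored embedding, oldest first."""
        return list(self._items)
```

With a coefficient close to 1 the momentum encoder changes slowly, so embeddings stored from earlier mini-batches would stay roughly comparable with those of the current one.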
+
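Soft data augmentation, as described in the component list, could look roughly like the sketch below. The function name `soft_augment`, the `<mask>` symbol, the 15% default ratio, and drawing replacements from a caller-supplied `vocab` are all assumptions made for illustration; the README only says that some tokens are dynamically masked or replaced. Here "dynamic" is taken to mean that positions are re-sampled on every call, so the same sample may be augmented differently each time it is seen.

```python
import random

MASK_TOKEN = "<mask>"  # assumed mask symbol, not taken from the README


def soft_augment(tokens, ratio=0.15, mode="mask", vocab=None, rng=None):
    """Return a copy of `tokens` with a random subset masked or replaced."""
    rng = rng or random.Random()
    if not tokens:
        return []
    # Number of positions to alter: at least one, at most all of them.
    count = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = rng.sample(range(len(tokens)), count)
    augmented = list(tokens)  # leave the caller's list untouched
    for pos in positions:
        if mode == "mask":
            augmented[pos] = MASK_TOKEN
        elif mode == "replace":
            if not vocab:
                raise ValueError("mode='replace' needs a non-empty vocab")
            augmented[pos] = rng.choice(vocab)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return augmented
```

Because only a small fraction of tokens changes, the augmented sample stays close to the original and can be treated as a similar sample during contrastive training.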
+
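The multimodal contrastive objective can be sketched with an InfoNCE-style loss, a common choice for contrastive learning; the README does not give the exact formula, so this is an assumption. In the sketch, the inter-modal terms pair a query with its code snippet (in both directions), and the intra-modal terms pair each sample with its augmented version. The function names, the unweighted sum, and the temperature of 0.07 are illustrative and may differ from the paper.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss: -log softmax of the positive pair among all candidates."""
    logits = [_dot(anchor, positive) / temperature]
    logits += [_dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


def multimodal_contrastive_loss(code, query, code_aug, query_aug,
                                neg_codes, neg_queries, temperature=0.07):
    """Sum of inter-modal and intra-modal terms (assumed equal weighting)."""
    inter = (info_nce(query, code, neg_codes, temperature)
             + info_nce(code, query, neg_queries, temperature))
    intra = (info_nce(code, code_aug, neg_codes, temperature)
             + info_nce(query, query_aug, neg_queries, temperature))
    return inter + intra
```

The loss falls when the anchor is closer to its positive than to the negatives, which matches the stated goal of pulling similar samples together and pushing different samples apart.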
+ ## Usage
+
+ ```
+ import torch
+ from transformers import RobertaTokenizer, RobertaModel
+
+ # Run on a GPU when one is available, otherwise on the CPU.
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ tokenizer = RobertaTokenizer.from_pretrained("DeepSoftwareAnalytics/CoCoSoDa")
+ # Move the model to the selected device so inputs and weights live together.
+ model = RobertaModel.from_pretrained("DeepSoftwareAnalytics/CoCoSoDa").to(device)
+ ```
+
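Once the tokenizer and model are loaded, a code search step might look like the following sketch. It assumes that embeddings are obtained by mean-pooling the last hidden states over non-padding tokens and that candidates are ranked by cosine similarity; the README does not state the pooling strategy, so treat both as assumptions. The helpers `rank_by_cosine`, `embed`, and `search` are hypothetical names introduced here, and `max_length=256` is an arbitrary choice.

```python
import math


def rank_by_cosine(query_vec, code_vecs):
    """Return indices of code_vecs ordered from most to least similar to query_vec."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scores = [cosine(query_vec, vec) for vec in code_vecs]
    return sorted(range(len(code_vecs)), key=lambda i: scores[i], reverse=True)


def embed(texts, tokenizer, model, device):
    """Mean-pool the last hidden states over non-padding tokens (assumed pooling)."""
    import torch  # imported lazily so rank_by_cosine works without PyTorch

    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt").to(device)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.cpu().tolist()


def search(query, snippets):
    """Hypothetical helper: load the checkpoint and rank snippets for one query."""
    import torch
    from transformers import RobertaModel, RobertaTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = RobertaTokenizer.from_pretrained("DeepSoftwareAnalytics/CoCoSoDa")
    model = RobertaModel.from_pretrained("DeepSoftwareAnalytics/CoCoSoDa").to(device)
    model.eval()
    query_vec = embed([query], tokenizer, model, device)[0]
    code_vecs = embed(snippets, tokenizer, model, device)
    return [snippets[i] for i in rank_by_cosine(query_vec, code_vecs)]
```

Calling `search("add two numbers", ["def add(a, b): return a + b", "def read(path): return open(path).read()"])` would download the checkpoint and return the snippets ordered by similarity to the query.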
+
+ ## Reference
+
+ Shi, E., Wang, Y., Gu, W., Du, L., Zhang, H., Han, S., ... & Sun, H. (2022). [CoCoSoDa: Effective Contrastive Learning for Code Search](https://arxiv.org/abs/2204.03293). ICSE 2023.