---
license: mit
pipeline_tag: graph-ml
tags:
- graphs
- ultra
- knowledge graph
---

## Description

ULTRA is a foundation model for knowledge graph (KG) reasoning. A single pre-trained ULTRA model performs link prediction on **any** multi-relational graph with any entity / relation vocabulary. Averaged over 50+ KGs, a single pre-trained ULTRA model in the **zero-shot** inference mode outperforms many SOTA models trained specifically on each graph. Following the pretrain-finetune paradigm of foundation models, you can run a pre-trained ULTRA checkpoint **immediately, in a zero-shot manner**, on any graph, or **fine-tune it further**.

ULTRA provides **unified, learnable, transferable** representations for any KG. Under the hood, ULTRA employs graph neural networks and a modified version of NBFNet. ULTRA does not learn any entity or relation embeddings specific to a downstream graph; instead, it obtains relative relation representations based on interactions between relations.

arXiv: https://arxiv.org/abs/2310.04562
GitHub: https://github.com/DeepGraphLearning/ULTRA

## Checkpoints

Here on Hugging Face, we provide 3 pre-trained ULTRA checkpoints (all ~169k params) varying by the amount of pre-training data.

| Model | Training KGs |
| ------| --------------|
| [ultra_3g](https://huggingface.co/mgalkin/ultra_3g) | 3 graphs |
| [ultra_4g](https://huggingface.co/mgalkin/ultra_4g) | 4 graphs |
| [ultra_50g](https://huggingface.co/mgalkin/ultra_50g) | 50 graphs |

* [ultra_3g](https://huggingface.co/mgalkin/ultra_3g) and [ultra_4g](https://huggingface.co/mgalkin/ultra_4g) are the PyG models reported in the GitHub repo;
* [ultra_50g](https://huggingface.co/mgalkin/ultra_50g) is a new ULTRA checkpoint pre-trained on 50 different KGs (transductive and inductive) for 1M steps to maximize the performance on any unseen downstream KG.

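All three checkpoints load through the same `UltraLinkPrediction` class shipped in this repo (see the setup steps in the next section), so switching the amount of pre-training data is a one-line change. A minimal sketch, assuming you have cloned this model repo as described below:

```python
from modeling import UltraLinkPrediction

# Any of the three checkpoints can be pulled from the Hub by its repo id.
model_3g = UltraLinkPrediction.from_pretrained("mgalkin/ultra_3g")
model_50g = UltraLinkPrediction.from_pretrained("mgalkin/ultra_50g")
```
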
## ⚡️ Your Superpowers

ULTRA performs **link prediction** (KG completion): given a query `(head, relation, ?)`, it ranks all nodes in the graph as potential `tails`.

1. Install the dependencies as listed in the Installation instructions on the [GitHub repo](https://github.com/DeepGraphLearning/ULTRA#installation).
2. Clone this model repo to find the `UltraLinkPrediction` class in `modeling.py` and load the checkpoint (all the necessary model code is in this model repo as well).

* Run **zero-shot inference** on any graph:

```python
from modeling import UltraLinkPrediction
from ultra.datasets import CoDExSmall
from ultra.eval import test

model = UltraLinkPrediction.from_pretrained("mgalkin/ultra_4g")
dataset = CoDExSmall(root="./datasets/")
test(model, mode="test", dataset=dataset, gpus=None)
# Expected results for ULTRA 4g
# mrr: 0.464
# hits@10: 0.666
```

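The same evaluation call works for any other dataset bundled with the model code. Below is a hedged sketch that loops a single checkpoint over a few transductive KGs; the `WN18RR` and `FB15k237` class names are assumptions based on the dataset list in the GitHub repo, so check `ultra/datasets.py` for the exact loaders available:

```python
from modeling import UltraLinkPrediction
# WN18RR and FB15k237 loader names are assumptions; see ultra/datasets.py.
from ultra.datasets import CoDExSmall, WN18RR, FB15k237
from ultra.eval import test

# One pre-trained model, evaluated zero-shot on several graphs.
model = UltraLinkPrediction.from_pretrained("mgalkin/ultra_50g")
for dataset_cls in (CoDExSmall, WN18RR, FB15k237):
    dataset = dataset_cls(root="./datasets/")
    test(model, mode="test", dataset=dataset, gpus=None)
```
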
* You can also **fine-tune** ULTRA on each graph; please refer to the [GitHub repo](https://github.com/DeepGraphLearning/ULTRA#run-inference-and-fine-tuning) for more details on training / fine-tuning.
* The model code contains 57 different KGs; please refer to the [GitHub repo](https://github.com/DeepGraphLearning/ULTRA#datasets) for more details on what's available.

## Performance

**Averaged zero-shot performance of ultra-3g and ultra-4g** (ultra-50g shown for reference)

<table>
<tr>
    <th rowspan=2 align="center">Model</th>
    <th colspan=2 align="center">Inductive (e) (18 graphs)</th>
    <th colspan=2 align="center">Inductive (e,r) (23 graphs)</th>
    <th colspan=2 align="center">Transductive (16 graphs)</th>
</tr>
<tr>
    <th align="center">Avg MRR</th>
    <th align="center">Avg Hits@10</th>
    <th align="center">Avg MRR</th>
    <th align="center">Avg Hits@10</th>
    <th align="center">Avg MRR</th>
    <th align="center">Avg Hits@10</th>
</tr>
<tr>
    <th>ULTRA (3g) PyG</th>
    <td align="center">0.420</td>
    <td align="center">0.562</td>
    <td align="center">0.344</td>
    <td align="center">0.511</td>
    <td align="center">0.329</td>
    <td align="center">0.479</td>
</tr>
<tr>
    <th>ULTRA (4g) PyG</th>
    <td align="center">0.444</td>
    <td align="center">0.588</td>
    <td align="center">0.344</td>
    <td align="center">0.513</td>
    <td align="center">WIP</td>
    <td align="center">WIP</td>
</tr>
<tr>
    <th>ULTRA (50g) PyG (pre-trained on 50 KGs)</th>
    <td align="center">0.444</td>
    <td align="center">0.580</td>
    <td align="center">0.395</td>
    <td align="center">0.554</td>
    <td align="center">0.389</td>
    <td align="center">0.549</td>
</tr>
</table>

Fine-tuning ULTRA on specific graphs brings, on average, a further 10% relative performance boost in both MRR and Hits@10. See the paper for more comparisons.

**ULTRA 50g Performance**

ULTRA 50g was pre-trained on 50 graphs, so we can't really apply the zero-shot evaluation protocol to those graphs.
However, we can compare it with supervised SOTA models trained from scratch on each dataset:

| Model | Avg MRR, Transductive graphs (16) | Avg Hits@10, Transductive graphs (16) |
| ----- | --------------------------------- | ------------------------------------- |
| Supervised SOTA models | 0.371 | 0.511 |
| ULTRA 50g (single model) | **0.389** | **0.549** |

That is, instead of training a big KG embedding model on your graph, you might want to consider running ULTRA (any of the checkpoints), as its performance might already be higher 🚀

## Useful links

Please report any issues in the [official GitHub repo of ULTRA](https://github.com/DeepGraphLearning/ULTRA).