File size: 1,589 Bytes
9855500
 
 
 
 
 
7f0f462
9855500
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
---
language: en
tags:
- tapex
license: apache-2.0
datasets:
- tab_fact
inference: false
---

TAPEX-large model fine-tuned on WTQ. This model was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. Original repo can be found [here](https://github.com/microsoft/Table-Pretraining).

To load it and run inference, you can do the following:

```
from transformers import BartTokenizer, BartForSequenceClassification
import pandas as pd

tokenizer = BartTokenizer.from_pretrained("nielsr/tapex-large-finetuned-tabfact")
model = BartForSequenceClassification.from_pretrained("nielsr/tapex-large-finetuned-tabfact")

# create table
data = {'Actors': ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], 'Number of movies': ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)

# turn into dict
table_dict = {"header": list(table.columns), "rows": [list(row.values) for i,row in table.iterrows()]}

# turn into format TAPEX expects
# define the linearizer based on this code: https://github.com/microsoft/Table-Pretraining/blob/main/tapex/processor/table_linearize.py
linearizer = IndexedRowTableLinearize()
linear_table = linearizer.process_table(table_dict)

# add sentence
sentence = "George Clooney has 69 movies"
joint_input = sentence + " " + linear_table

# encode 
encoding = tokenizer(joint_input, return_tensors="pt")

# forward pass
outputs = model(**encoding)

# print prediction
logits = outputs.logits
print(logits.argmax(-1))
```