title = "
In this step, the goal is to identify which tokens in the generated text were influenced by the preceding context.
First, a context-aware generation is produced using the model's inputs augmented with the available context. Then, the same generation is force-decoded using the contextless inputs. During both processes, a contrastive metric (KL-divergence is used as the default for the Context sensitivity metric parameter) is collected for every generated token. Intuitively, higher metric scores indicate that the current generation step was more influenced by the presence of context.
The generated tokens are ranked according to their metric scores, and the most salient tokens are selected for the next step. (This demo provides a Context sensitivity threshold parameter to select tokens above N standard deviations from the in-example metric average, and a Context sensitivity top-k parameter to pick the K most salient tokens.)
In the example shown in the figure, elle is selected as the only context-sensitive token by the procedure.
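As a minimal sketch of this selection logic (the helper names and scores below are invented for illustration and are not part of the Inseq implementation), per-token contrastive scores can be thresholded by standard deviations or truncated to the top-k:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between contextual (p) and contextless (q) next-token
    distributions -- the role played by the default metric in the CTI step."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def select_context_sensitive(scores, std_threshold=None, top_k=None):
    """Pick positions scoring above mean + N*std, or the K highest-scoring ones."""
    scores = np.asarray(scores)
    if top_k is not None:
        return list(np.argsort(scores)[::-1][:top_k])
    cutoff = scores.mean() + std_threshold * scores.std()
    return [i for i, s in enumerate(scores) if s > cutoff]

# One score per generated token; position 2 (e.g. "elle") stands out.
scores = [0.02, 0.01, 0.95, 0.03]
print(select_context_sensitive(scores, std_threshold=1.0))  # → [2]
```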
Once context-sensitive tokens are identified, the next step is to link every one of these tokens to specific contextual cues that justified its prediction.
This is achieved by means of contrastive feature attribution (Yin and Neubig, 2022). More specifically, for a given context-sensitive token, a contrastive alternative to it is generated in the absence of input context, and a function of the probabilities of the pair is used to identify salient parts of the context. (By default, this demo uses saliency, i.e. raw gradients, for the Attribution method parameter and contrast_prob_diff, i.e. the probability difference between the two options, for the Attributed function parameter.)
Gradients are collected and aggregated to obtain a single score per context token, which is then used to rank the tokens and select the most influential ones. (This demo provides an Attribution threshold parameter to select tokens above N standard deviations from the in-example metric average, and an Attribution top-k parameter to pick the K most salient tokens.)
In the example shown in the figure, the attribution process links elle to dishes and assiettes in the source and target contexts, respectively. This makes sense intuitively, as they in the original input is gender-neutral in English, and the presence of its gendered coreferent disambiguates the choice for the French pronoun in the translation.
This demo provides a convenient UI for the Inseq implementation of PECoRe (the inseq attribute-context CLI command).
In the demo tab, fill in the input and context fields with the text you want to analyze, and click the Run PECoRe button to produce an output where the tokens selected by PECoRe in the model generation and context are highlighted. For more details on the parameters and their meaning, check the Parameters tab.
Consider the following example, showing inputs and outputs of the CORA Multilingual QA model provided as default in the interface, using default settings.
The PECoRe CTI step identified two context-sensitive tokens in the generation (287 and ,), while the CCI step associated each of those with the most influential tokens in the context. It can be observed that in both cases the matching tokens stating the number of inhabitants are identified as salient (, and 287 for the generated 287, while 235 is also found salient for the generated ,). In this case, the influential context found by PECoRe is lexically identical to the generated output, but in principle better LMs might not use their inputs verbatim, hence the interest in using model internals with PECoRe.
"Why wasn't 235 found as context-sensitive, when it intuitively is?" you might ask. In this case, it is due to the generation being quite short, which makes its CTI score less salient than those of other tokens. The permissiveness of result selection is an adjustable parameter (see the points below).
The 📂 Download output button allows you to download the full JSON output produced by the Inseq CLI. It includes, among other things, the full set of CTI and CCI scores produced by PECoRe, tokenized versions of the input context and generated output, and the full arguments used for the CLI call.
The 🔍 Download HTML button allows you to download an HTML view of the output similar to the one visualized in the demo.
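If you want to post-process the downloaded JSON programmatically, a generic starting point is to load it and list its top-level fields. The keys in the stand-in below are placeholders, not the actual schema; inspect your own download for the real field names:

```python
import io
import json

# Stand-in for a downloaded file; the keys here are invented placeholders,
# not the actual schema produced by the Inseq CLI.
sample = io.StringIO(json.dumps({
    "cti_scores": [0.1, 0.9],         # placeholder: per-token CTI scores
    "context_tokens": ["▁235", ","],  # placeholder: tokenized input context
}))
output = json.load(sample)            # with a real file: json.load(open(path))
print(sorted(output))                 # list the fields actually present
```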
Adjust the Context sensitivity threshold parameter to ensure only very sensitive tokens are picked up in longer replies.
Adjust the Attribution threshold parameter to be more lenient in the selection for shorter contexts.
If part of the context appears in the model's generation, specify it with the Generation context parameter. This is a requirement for the demo because the split between output context and current generation cannot be reliably performed in an automatic way. However, the inseq attribute-context CLI command actually supports various strategies, including prompting users for a split and/or trying an automatic source-target alignment.
This demo is useful for testing out various models and methods for PECoRe attribution, but the inseq attribute-context CLI command is the way to go if you want to run experiments on several examples, or if you want to exploit the full customizability of the Inseq API.
The snippets provided below are updated based on the current parameter configuration of the demo, and allow you to use Python and Shell code to call the Inseq CLI. We recommend the Python version for repeated evaluation, since it allows model preloading.
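As a sketch of how such a Shell call could be assembled programmatically, the snippet below builds an invocation from a parameter dictionary. The flag names are assumptions for illustration only; the snippet generated by the demo reflects the exact arguments for your configuration:

```python
import shlex

# Hypothetical parameter configuration; the flag names are illustrative
# assumptions, not a definitive list of inseq attribute-context options.
params = {
    "model_name_or_path": "gpt2",
    "attribution_method": "saliency",
    "attributed_fn": "contrast_prob_diff",
}
cmd = ["inseq", "attribute-context"]
for key, value in params.items():
    cmd += [f"--{key}", shlex.quote(value)]
print(" ".join(cmd))
```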
""" faq = """Q: Why should I use PECoRe rather than lexical/semantic matching, NLI or citation prompting for attributing model generation?
A: The main difference concerns faithfulness: all these techniques rely on different forms of surface-level matching to produce plausible citations, but do not guarantee that the model is actually using such information during generation. PECoRe does provide a variable degree of faithfulness to the model's inner workings, depending on the CTI/CCI metrics used.
Q: Can PECoRe be used for my task?
A: PECoRe is designed to be task-agnostic, and can be used with any generative language model for tasks where a contextual component can clearly be identified in the input (e.g. retrieved paragraphs in RAG) or in the output (e.g. reasoning steps in chain-of-thought prompting). The current Inseq implementation supports only text as a modality, but conceptually the PECoRe framework can easily be extended to attribute multimodal context components.
Q: What are the main limitations of PECoRe?
A: PECoRe is limited by the need for a present/absent context (either in the input or in the output) for contrastive comparison, and by the choice of parameters (especially results selection ones) that can require specific tuning for different models and tasks.
Q: Why is it important to separate the {context} and {current} tags from other tokens with whitespace in input/output templates?
A: Taking the default CORA template <Q>: {current} <P>: {context} as an example, the whitespace after : for both tags ensures that, when tokenized in isolation, the same token will be used in both cases. If it weren't present, you might end up having e.g. Test for the full tokenization (as no whitespace precedes it) and ▁Test for the partial one (as initial tokens are always prefixed with ▁ in SentencePiece). This might succeed but produce unexpected results if both options are tokenized with the same number of tokens, or fail altogether if the number of tokens for the space-prefixed and the spaceless versions differs. Note that this is not necessary if the template consists simply of the tag itself (e.g. {current}).
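This failure mode can be illustrated with a toy SentencePiece-style tokenizer. The helper below is hypothetical (real tokenizers also split words into subword pieces), but it reproduces the word-start ▁ marker behavior:

```python
import re

def toy_tokenize(text):
    """Toy SentencePiece-style tokenizer: a piece gets the '▁' word-start
    marker iff it follows whitespace (or begins the string); pieces glued
    to punctuation stay bare. Real tokenizers split further into subwords."""
    pieces = []
    for word in re.finditer(r"\S+", text):
        chunks = re.findall(r"\w+|[^\w\s]+", word.group())
        pieces += [("▁" + c) if j == 0 else c for j, c in enumerate(chunks)]
    return pieces

template = "<Q>: {current} <P>: {context}"          # default CORA template
print(toy_tokenize(template.format(current="Test", context="Ctx")))
print(toy_tokenize("Test"))            # ['▁Test']: matches the in-template piece
print(toy_tokenize("<Q>:Test"))        # 'Test' is glued to ':' — mismatch
```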