# Neural end-to-end coreference resolution
Neural network approach to [[Coreference Resolution]].
Lee et al. 2017. End-to-end Neural Coreference Resolution. EMNLP.
- Mention-ranking paradigm, i.e. for each mention the model outputs a probability distribution over candidate antecedents
- considers all text spans up to a maximum length (e.g. unigrams, bigrams, trigrams, …) as possible mentions; see the enumeration sketch after this list
- coreference is resolved for all mention types (not only pronouns)
- end-to-end trainable neural architecture, based on a bidirectional LSTM sentence encoder
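A minimal sketch of that span enumeration, assuming a hypothetical `max_width` cap on span length (the function name and default value are illustrative):

```python
def enumerate_spans(num_tokens: int, max_width: int = 10):
    """Yield (start, end) token indices for every span of up to max_width tokens."""
    for start in range(num_tokens):
        for end in range(start, min(start + max_width, num_tokens)):
            yield start, end

# For a 5-token sentence with max_width=3:
# (0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (1, 3), ...
```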
Assign each span $i$ an antecedent $y_{i}$
- out of all possible spans in $Y(i)=\{1, \ldots, i-1, \epsilon\}$
- the dummy antecedent $\epsilon$ is included to indicate that span $i$ is non-referential or discourse-new
To do this, for each pair of spans $i$ and $j$
- the model assigns a score $s(i, j)$ for their coreference link
- and computes a distribution $P\left(y_{i}\right)$ over the antecedents of $i$
$
P\left(y_{i}\right)=\frac{e^{s\left(i, y_{i}\right)}}{\sum_{y^{\prime} \in Y(i)} e^{s\left(i, y^{\prime}\right)}}
$
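A minimal sketch of this softmax, where `pair_scores[j-1]` stands for $s(i, j)$ for each preceding span $j$; the dummy antecedent $\epsilon$ gets the fixed score $0$ introduced below:

```python
import torch

def antecedent_distribution(pair_scores: torch.Tensor) -> torch.Tensor:
    # Scores over Y(i): index 0 is the dummy antecedent eps with s(i, eps) = 0,
    # followed by s(i, 1), ..., s(i, i-1).
    scores = torch.cat([torch.zeros(1), pair_scores])
    return torch.softmax(scores, dim=0)  # P(y_i)
```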
The score $s(i, j)$ includes three factors:
- $m(i)$ : whether span $i$ is a mention
- $m(j)$ : whether span $j$ is a mention
- $c(i, j)$ : whether $j$ is the antecedent of $i$
$
s(i, j)=m(i)+m(j)+c(i, j)
$
$s(i, \epsilon)$ is set to $0$, i.e. the model predicts the antecedent with the highest positive score, or abstains if no candidate scores above zero.
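A sketch of inference under this decomposition, assuming the mention scores `m` and pairwise scores `c` are precomputed tensors (the scoring functions themselves are defined below); the indexing is illustrative:

```python
import torch

def predict_antecedent(i: int, m: torch.Tensor, c: torch.Tensor) -> int:
    # s(i, j) = m(i) + m(j) + c(i, j) for every candidate j < i
    s = m[i] + m[:i] + c[i, :i]
    s = torch.cat([torch.zeros(1), s])   # prepend s(i, eps) = 0
    best = int(torch.argmax(s))
    return best - 1 if best > 0 else -1  # -1 means abstain (eps wins)
```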
Compute $m(i)$, $m(j)$ and $c(i, j)$ from the vectors $g_{i}$ and $g_{j}$, which represent the spans $i$ and $j$.
Span representations are constructed from the hidden states of the LSTM encoder:
$
g_{i}=\left[h_{\mathrm{START}(i)}, h_{\mathrm{END}(i)}, h_{\mathrm{ATT}(i)}, \phi(i)\right]
$
where $h_{\mathrm{ATT}(i)}$ is an attention-weighted sum of the hidden states within the span (a soft head-word representation) and $\phi(i)$ is a single feature: the length of the span
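A sketch of assembling $g_i$ from the encoder outputs, assuming `h` holds the LSTM hidden states (tokens × dim), `alpha` holds per-token attention logits, and `width_emb` is a hypothetical embedding of the length feature $\phi(i)$:

```python
import torch
import torch.nn as nn

def span_representation(h: torch.Tensor, alpha: torch.Tensor,
                        start: int, end: int,
                        width_emb: nn.Embedding) -> torch.Tensor:
    weights = torch.softmax(alpha[start:end + 1], dim=0)  # attention over span tokens
    h_att = weights @ h[start:end + 1]                    # soft head word h_ATT(i)
    phi = width_emb(torch.tensor(end - start))            # span length feature phi(i)
    return torch.cat([h[start], h[end], h_att, phi])      # g_i
```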
$
\begin{aligned}
m(i) &= w_{m} \cdot \mathrm{FFNN}_{m}\left(g_{i}\right) \\
c(i, j) &= w_{c} \cdot \mathrm{FFNN}_{c}\left(\left[g_{i}, g_{j}, g_{i} \odot g_{j}, \phi(i, j)\right]\right)
\end{aligned}
$
where $\phi(i, j)$ is a feature encoding the distance between the two spans in the text
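A sketch of the two scorers as feed-forward modules; the hidden size and the dimensionality of the distance feature are illustrative, not the paper's hyperparameters:

```python
import torch
import torch.nn as nn

class MentionScorer(nn.Module):
    """m(i) = w_m . FFNN_m(g_i)"""
    def __init__(self, span_dim: int, hidden: int = 150):
        super().__init__()
        self.ffnn = nn.Sequential(nn.Linear(span_dim, hidden), nn.ReLU())
        self.w_m = nn.Linear(hidden, 1, bias=False)

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        return self.w_m(self.ffnn(g)).squeeze(-1)

class AntecedentScorer(nn.Module):
    """c(i, j) = w_c . FFNN_c([g_i, g_j, g_i * g_j, phi(i, j)])"""
    def __init__(self, span_dim: int, dist_dim: int, hidden: int = 150):
        super().__init__()
        self.ffnn = nn.Sequential(nn.Linear(3 * span_dim + dist_dim, hidden), nn.ReLU())
        self.w_c = nn.Linear(hidden, 1, bias=False)

    def forward(self, g_i: torch.Tensor, g_j: torch.Tensor,
                phi_ij: torch.Tensor) -> torch.Tensor:
        pair = torch.cat([g_i, g_j, g_i * g_j, phi_ij], dim=-1)
        return self.w_c(self.ffnn(pair)).squeeze(-1)
```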
![[neural-coreference-resolution.jpg]]