# Neural end-to-end coreference resolution

Neural network approach to [[Coreference Resolution]]. Lee et al. 2017. End-to-end Neural Coreference Resolution. EMNLP.

- Mention-ranking paradigm, i.e. outputs a probability distribution over candidate mentions
- considers all text spans up to a certain length (e.g. bigrams, trigrams) as possible mentions
- coreference of all mentions is considered (not only pronouns)
- end-to-end trainable neural architecture, based on an LSTM sentence encoder

Assign each span $i$ an antecedent $y_{i}$

- out of all possible spans in $Y(i)=\{1, \ldots, i-1, \epsilon\}$
- the empty token $\epsilon$ is included to indicate that span $i$ is non-referential or discourse-new

To do this, for each pair of spans $i$ and $j$

- the model assigns a score $s(i, j)$ to their coreference link
- and computes a distribution $P\left(y_{i}\right)$ over the antecedents of $i$:

$$
P\left(y_{i}\right)=\frac{e^{s\left(i, y_{i}\right)}}{\sum_{y^{\prime} \in Y(i)} e^{s\left(i, y^{\prime}\right)}}
$$

The score $s(i, j)$ combines three factors:

- $m(i)$: whether span $i$ is a mention
- $m(j)$: whether span $j$ is a mention
- $c(i, j)$: whether $j$ is the antecedent of $i$

$$
s(i, j)=m(i)+m(j)+c(i, j)
$$

$s(i, \epsilon)$ is fixed to $0$, i.e. the model predicts the antecedent with the highest positive score, or abstains.

$m(i)$, $m(j)$ and $c(i, j)$ are computed from the vectors $g_{i}$ and $g_{j}$ which represent the spans $i$ and $j$. Span representations are constructed from the hidden states of the LSTM encoder:

$$
g_{i}=\left[h_{\mathrm{START}(i)}, h_{\mathrm{END}(i)}, h_{\mathrm{AT}(i)}, \phi(i)\right]
$$

where $h_{\mathrm{AT}(i)}$ is an attention-weighted sum of the token representations inside the span (a soft head word) and $\phi(i)$ is a single feature: the length of the span.

$$
\begin{aligned}
m(i) &= w_{m} \cdot \mathrm{FFNN}_{m}\left(g_{i}\right) \\
c(i, j) &= w_{c} \cdot \mathrm{FFNN}_{c}\left(\left[g_{i}, g_{j}, g_{i} \odot g_{j}, \phi(i, j)\right]\right)
\end{aligned}
$$

where $\phi(i, j)$ encodes the distance between the two spans in the text (all three pieces are sketched in code below).

![[neural-coreference-resolution.jpg]]
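To make the span representation $g_i$ concrete, here is a minimal PyTorch sketch. The dimensions (`HIDDEN`, `FEAT`), the precomputed hidden-state matrix `h`, and the names `attn_scorer` / `width_embed` are hypothetical stand-ins, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

HIDDEN = 8   # toy size of each LSTM hidden state h_t (hypothetical)
FEAT = 4     # toy size of the span-width feature embedding (hypothetical)
T = 10       # number of tokens in the document

# Stand-in for the (bi)LSTM encoder output: one vector per token.
h = torch.randn(T, HIDDEN)

attn_scorer = nn.Linear(HIDDEN, 1)    # scores each token as a likely head word
width_embed = nn.Embedding(30, FEAT)  # phi(i): span length, embedded

def span_representation(start: int, end: int) -> torch.Tensor:
    """g_i = [h_START(i), h_END(i), h_AT(i), phi(i)], end inclusive."""
    tokens = h[start : end + 1]                        # (L, HIDDEN)
    alpha = torch.softmax(attn_scorer(tokens), dim=0)  # (L, 1) attention weights
    h_at = (alpha * tokens).sum(dim=0)                 # attention-weighted "soft head"
    phi = width_embed(torch.tensor(end - start + 1))   # (FEAT,)
    return torch.cat([h[start], h[end], h_at, phi])    # (3 * HIDDEN + FEAT,)

g = span_representation(2, 4)  # the span covering tokens 2..4
print(g.shape)                 # torch.Size([28])
```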
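The scoring functions can be sketched the same way. The tiny FFNNs below are hypothetical stand-ins for the paper's deeper networks, with the final `Linear(16, 1)` layer playing the role of the dot product with $w_m$ / $w_c$; the distance bucketing for $\phi(i, j)$ is likewise a simplification:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

D = 28          # dimension of a span vector g_i (matches the sketch above)
PAIR_FEAT = 4   # toy size of the distance-feature embedding (hypothetical)

def make_ffnn(in_dim: int) -> nn.Sequential:
    # Tiny stand-in for FFNN_m / FFNN_c; the last layer absorbs w_m / w_c.
    return nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(), nn.Linear(16, 1))

ffnn_m = make_ffnn(D)                     # mention scorer m(.)
ffnn_c = make_ffnn(3 * D + PAIR_FEAT)     # antecedent scorer c(., .)
dist_embed = nn.Embedding(10, PAIR_FEAT)  # phi(i, j): bucketed span distance

def m(g: torch.Tensor) -> torch.Tensor:
    return ffnn_m(g).squeeze(-1)

def c(g_i: torch.Tensor, g_j: torch.Tensor, dist_bucket: int) -> torch.Tensor:
    phi_ij = dist_embed(torch.tensor(dist_bucket))
    pair = torch.cat([g_i, g_j, g_i * g_j, phi_ij])  # g_i * g_j is the ⊙ term
    return ffnn_c(pair).squeeze(-1)

def s(g_i: torch.Tensor, g_j: torch.Tensor, dist_bucket: int) -> torch.Tensor:
    # s(i, j) = m(i) + m(j) + c(i, j)
    return m(g_i) + m(g_j) + c(g_i, g_j, dist_bucket)

g_i, g_j = torch.randn(D), torch.randn(D)
print(s(g_i, g_j, 3))  # scalar coreference-link score
```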
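Finally, the antecedent distribution $P(y_i)$: since $s(i, \epsilon)$ is fixed to $0$, one can simply prepend a zero score before taking the softmax over $Y(i)$. A sketch with made-up scores:

```python
import torch

# Hypothetical scores s(i, j) for the candidate antecedents j = 1 .. i-1.
scores = torch.tensor([1.3, -0.7, 2.1])

# s(i, epsilon) is fixed to 0: prepend it, then softmax over Y(i).
with_eps = torch.cat([torch.zeros(1), scores])
p = torch.softmax(with_eps, dim=0)

print(p)           # p[0] = P(epsilon): span i is non-referential or discourse-new
print(p.argmax())  # index 0 means "abstain"; otherwise the j-th candidate wins
```

Because the softmax is monotonic, the argmax here matches the rule in the note: pick the antecedent with the highest positive score, or abstain if no score beats $0$.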