# Tree LSTM
Three papers:
- Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ACL 2015. Proposes two variants:
    - Child-Sum Tree-LSTM
    - N-ary Tree-LSTM
- Phong Le and Willem Zuidema. Compositional distributional semantics with long short term memory. \*SEM 2015.
- Xiaodan Zhu, Parinaz Sobhani, and Hongyu Guo. Long short-term memory over recursive structures. ICML 2015.
In a Tree-LSTM, the gates and memory cell update of a node depend on the states of that node's children, rather than on the state of the previous word in the sequence.
Instead of a single forget gate, a Tree-LSTM unit has one forget gate per child, so it can selectively incorporate information from each child.
![[tree-lstm.jpg]]
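For comparison, the two cell updates can be written side by side. The first line is the standard sequential LSTM update (with $t$ indexing time steps), and the second is the tree version with one forget gate $f_{jk}$ per child. $C(j)$ denotes the set of children of node $j$; bias terms are left out, as in the rest of these notes.
$
\begin{aligned}
c_{t} &=i_{t} \odot \tilde{c}_{t}+f_{t} \odot c_{t-1} \\
c_{j} &=i_{j} \odot \tilde{c}_{j}+\sum_{k \in C(j)} f_{j k} \odot c_{k}
\end{aligned}
$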
## Child-Sum Tree-LSTM
- sum over the hidden representations of all children of a node (no children order)
- can be used for a variable number of children
- shares parameters between children
- suitable for dependency trees
$
\begin{aligned}
\tilde{h}_{j} &=\sum_{k \in C(j)} h_{k} \\
f_{j k} &=\sigma\left(U^{f} h_{k}+W^{f} x_{j}\right) \\
i_{j} &=\sigma\left(U^{i} \tilde{h}_{j}+W^{i} x_{j}\right) \\
o_{j} &=\sigma\left(U^{o} \tilde{h}_{j}+W^{o} x_{j}\right) \\
\tilde{c}_{j} &=\tanh \left(U^{c} \tilde{h}_{j}+W^{c} x_{j}\right) \\
c_{j} &=i_{j} \odot \tilde{c}_{j}+\sum_{k \in C(j)} f_{j k} \odot c_{k} \\
h_{j} &=o_{j} \odot \tanh \left(c_{j}\right)
\end{aligned}
$
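The Child-Sum update can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the helper names (`init_params`, `child_sum_cell`, `encode`), the random initialization, and the `(x, [children])` tuple format for trees are all assumptions made for the example, and biases are omitted to match the equations as written here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_dim, hidden_dim, seed=0):
    """Random W and U matrices for gates i, o, f and candidate c (hypothetical init)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "iofc":
        p["W" + gate] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p["U" + gate] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    return p

def child_sum_cell(x, child_h, child_c, p):
    """One Child-Sum Tree-LSTM step for node j.

    x:                 (input_dim,) input vector at the node
    child_h, child_c:  (num_children, hidden_dim) child states (0 rows for a leaf)
    """
    h_tilde = child_h.sum(axis=0)                       # order-independent sum
    i = sigmoid(p["Ui"] @ h_tilde + p["Wi"] @ x)
    o = sigmoid(p["Uo"] @ h_tilde + p["Wo"] @ x)
    c_tilde = np.tanh(p["Uc"] @ h_tilde + p["Wc"] @ x)
    # One forget gate per child, computed from that child's own hidden state.
    f = sigmoid(child_h @ p["Uf"].T + p["Wf"] @ x)      # (num_children, hidden_dim)
    c = i * c_tilde + (f * child_c).sum(axis=0)
    h = o * np.tanh(c)
    return h, c

def encode(node, p, hidden_dim):
    """Bottom-up recursion over a tree given as (x, [children]) tuples."""
    x, children = node
    states = [encode(child, p, hidden_dim) for child in children]
    child_h = np.zeros((len(states), hidden_dim))
    child_c = np.zeros((len(states), hidden_dim))
    for k, (h_k, c_k) in enumerate(states):
        child_h[k], child_c[k] = h_k, c_k
    return child_sum_cell(x, child_h, child_c, p)
```

Because the children only enter through sums, reordering them leaves the result unchanged, and the same parameters handle any number of children, including zero for leaves.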
## N-ary Tree-LSTM
- distinguishes children by position (each child's hidden state gets its own weight matrix in the sums)
- fixed maximum branching factor: can be used with N children at most
- different parameters for each child
- suitable for constituency trees
$
\begin{array}{l}
f_{j k}=\sigma\left(W^{f} x_{j}+\sum_{l=1}^{N} U_{k l}^{f} h_{j l}\right) \\
i_{j}=\sigma\left(W^{i} x_{j}+\sum_{l=1}^{N} U_{l}^{i} h_{j l}\right) \\
o_{j}=\sigma\left(W^{o} x_{j}+\sum_{l=1}^{N} U_{l}^{o} h_{j l}\right) \\
\tilde{c}_{j}=\tanh \left(W^{c} x_{j}+\sum_{l=1}^{N} U_{l}^{c} h_{j l}\right) \\
c_{j}=i_{j} \odot \tilde{c}_{j}+\sum_{l=1}^{N} f_{j l} \odot c_{j l} \\
h_{j}=o_{j} \odot \tanh \left(c_{j}\right)
\end{array}
$
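A corresponding NumPy sketch of the N-ary update is given below. It is illustrative only: the function names, the random initialization, and the convention of zero-padding missing children up to $N$ rows are assumptions made for the example, and biases are again omitted. The per-position matrices $U_{l}$ are stored as an array of shape `(N, hidden, hidden)` and the forget-gate matrices $U^{f}_{kl}$ as `(N, N, hidden, hidden)`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_nary_params(input_dim, hidden_dim, n, seed=0):
    """Random parameters for an N-ary cell (hypothetical init)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "iofc":
        p["W" + gate] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
    for gate in "ioc":
        # One matrix U_l per child position l.
        p["U" + gate] = rng.normal(scale=0.1, size=(n, hidden_dim, hidden_dim))
    # One matrix U^f_{kl} per (forget gate k, child position l) pair.
    p["Uf"] = rng.normal(scale=0.1, size=(n, n, hidden_dim, hidden_dim))
    return p

def nary_cell(x, child_h, child_c, p):
    """One N-ary Tree-LSTM step.

    x:                 (input_dim,) input vector at the node
    child_h, child_c:  (N, hidden_dim) child states in positional order,
                       with all-zero rows for missing children
    """
    def mix(U):
        # sum_l U_l h_{jl}
        return np.einsum("lab,lb->a", U, child_h)

    i = sigmoid(p["Wi"] @ x + mix(p["Ui"]))
    o = sigmoid(p["Wo"] @ x + mix(p["Uo"]))
    c_tilde = np.tanh(p["Wc"] @ x + mix(p["Uc"]))
    # Forget gate k sees every child through its own matrices U^f_{kl}.
    f = sigmoid(p["Wf"] @ x + np.einsum("klab,lb->ka", p["Uf"], child_h))
    c = i * c_tilde + (f * child_c).sum(axis=0)
    h = o * np.tanh(c)
    return h, c
```

Unlike the Child-Sum cell, swapping two children generally changes the output here, since each position is multiplied by different matrices. The parameter count of the forget gate grows with $N^2$, which is one reason this variant is typically paired with trees of small, fixed branching factor.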
---
## References