# Tree LSTM

Three papers:

- Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ACL 2015.
    - Child-Sum Tree-LSTM
    - N-ary Tree-LSTM
- Phong Le and Willem Zuidema. Compositional Distributional Semantics with Long Short Term Memory. *SEM 2015.
- Xiaodan Zhu, Parinaz Sobhani, and Hongyu Guo. Long Short-Term Memory Over Recursive Structures. ICML 2015.

Gates and memory cell updates depend on the states of a node's children, rather than on the states of the previous words in the sequence. Instead of a single forget gate, a Tree-LSTM unit contains one forget gate per child, so it can selectively incorporate information from each child.

![[tree-lstm.jpg]]

## Child-Sum Tree-LSTM

- sums the hidden representations of all children of a node (child order is ignored)
- can be used with a variable number of children
- shares parameters between children
- suitable for dependency trees

Here $C(j)$ is the set of children of node $j$, and $x_j$ is the input at node $j$.

$$
\begin{aligned}
\tilde{h}_{j} &=\sum_{k \in C(j)} h_{k} \\
f_{j k} &=\sigma\left(U^{f} h_{k}+W^{f} x_{j}\right) \\
i_{j} &=\sigma\left(U^{i} \tilde{h}_{j}+W^{i} x_{j}\right) \\
o_{j} &=\sigma\left(U^{o} \tilde{h}_{j}+W^{o} x_{j}\right) \\
\tilde{c}_{j} &=\tanh \left(U^{c} \tilde{h}_{j}+W^{c} x_{j}\right) \\
c_{j} &=i_{j} \odot \tilde{c}_{j}+\sum_{k \in C(j)} f_{j k} \odot c_{k} \\
h_{j} &=o_{j} \odot \tanh \left(c_{j}\right)
\end{aligned}
$$

## N-ary Tree-LSTM

- distinguishes between child positions (weighted sum with position-specific weights)
- fixed maximum branching factor: can be used with at most N children
- different parameters for each child
- suitable for constituency trees

Here $h_{jl}$ and $c_{jl}$ are the hidden state and memory cell of the $l$-th child of node $j$.

$$
\begin{array}{l}
f_{j k}=\sigma\left(W^{f} x_{j}+\sum_{l=1}^{N} U_{k l}^{f} h_{j l}\right) \\
i_{j}=\sigma\left(W^{i} x_{j}+\sum_{l=1}^{N} U_{l}^{i} h_{j l}\right) \\
o_{j}=\sigma\left(W^{o} x_{j}+\sum_{l=1}^{N} U_{l}^{o} h_{j l}\right) \\
\tilde{c}_{j}=\tanh \left(W^{c} x_{j}+\sum_{l=1}^{N} U_{l}^{c} h_{j l}\right) \\
c_{j}=i_{j} \odot \tilde{c}_{j}+\sum_{l=1}^{N} f_{j l} \odot c_{j l} \\
h_{j}=o_{j} \odot \tanh \left(c_{j}\right)
\end{array}
$$

---

## References