GitHub repository: github.com/JHart96/keras_elmo_embedding_layer.
Pre-trained word embeddings have been highly successful in numerous natural language processing tasks. Traditional word embeddings are essentially lookup tables that map each word to a fixed feature vector, so a word receives the same embedding regardless of the context it appears in. Allen AI recently released a paper introducing a method that overcomes this limitation: Embeddings from Language Models (ELMo). The paper can be found here: arxiv.org/abs/1802.05365.
ELMo embeddings work by first training a bidirectional LSTM on a large corpus for a general language modelling task. The original paper used the One Billion Word Benchmark dataset, but Allen AI has since released a version trained on 5.5 billion words. Once trained, word embeddings are generated by feeding sentences into the bidirectional LSTM and taking the internal states of its layers. Embeddings can be taken from the internal states of any one of the 3 layers, or as a weighted sum of all 3 layers, with the weights learned during task-specific training.
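To make the weighted-sum idea concrete, here is a rough NumPy sketch of the task-specific combination described in the paper: a scalar scaling factor multiplied by a softmax-weighted sum of the three layer states. This is illustrative only (not code from the paper or the repository), and the 1024 dimension is simply the size used by the released model.

```python
import numpy as np

# Internal states of the 3 ELMo layers for one word (shapes assumed: 3 x 1024)
layer_states = np.random.randn(3, 1024)

# Softmax-normalised layer weights and a scalar scaling factor, both of which
# are learned during task-specific training
s = np.exp([0.1, 0.5, 0.4])
s /= s.sum()
gamma = 1.0

# The ELMo embedding is the scaled, weighted sum over the layer states
elmo_embedding = gamma * (s[:, None] * layer_states).sum(axis=0)  # shape (1024,)
```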
By taking the internal states of the network, we get a context-dependent representation of each word. ELMo embeddings work extremely well in practice: in their evaluation on 6 common NLP tasks, Allen AI found that adding ELMo embeddings to existing models led to significant improvements over the state of the art in every task.
I’ve written a Keras layer that makes it easy to include ELMo embeddings in any existing Keras model. The layer is based on a TensorFlow Hub module (tensorflow.org/hub/modules/google/elmo/2), but wraps it in an interface that makes it completely interchangeable with a standard Keras embedding layer. The layer can output either a mean-pooled embedding of the sentence, or an embedding of each word.
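For context, this is roughly what the underlying TF Hub module call looks like on its own (a sketch of the TensorFlow 1.x hub.Module API, not the layer's internals); the layer hides this behind the familiar Keras embedding interface:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained ELMo module from TF Hub
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)

# The "default" signature accepts untokenised sentences as plain strings
outputs = elmo(["the cat sat on the mat"], signature="default", as_dict=True)
sentence_embedding = outputs["default"]  # mean-pooled sentence embedding, shape (1, 1024)
word_embeddings = outputs["elmo"]        # per-word embeddings, shape (1, num_words, 1024)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(sentence_embedding).shape)
```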
Full documentation and code for the layer are available in the repository on GitHub. Below, I’ll give a few examples of how it works. These examples are modified from the Keras examples directory (github.com/keras-team/keras/tree/master/examples).
Example: Sentiment Analysis I
To show how the layer works in practice, below is an example of a model for sentiment analysis on the IMDB dataset built into Keras. This first example uses sentence-level embeddings, which are a mean pooling of the word-level embeddings; this mode is called “default”.
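The full example lives in the repository; here is a minimal sketch of the idea. The ELMoEmbedding class name, its idx2word argument, and output_mode="default" are assumptions based on the repository's README, so check the repo for the exact interface:

```python
from keras.datasets import imdb
from keras.layers import Input, Dense
from keras.models import Model
from keras.preprocessing.sequence import pad_sequences
from elmo import ELMoEmbedding  # hypothetical import path; see the repository

max_len = 100
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=20000)
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)

# Map integer indices back to words so the layer can feed strings to TF Hub.
# Keras offsets the IMDB indices by 3 (0 = padding, 1 = start, 2 = OOV).
idx2word = {index + 3: word for word, index in imdb.get_word_index().items()}

input_layer = Input(shape=(max_len,), dtype='int32')
# output_mode="default" gives one mean-pooled 1024-d embedding per sentence
embedding = ELMoEmbedding(idx2word=idx2word, output_mode="default", trainable=True)(input_layer)
dense = Dense(256, activation='relu')(embedding)
output = Dense(1, activation='sigmoid')(dense)

model = Model(inputs=input_layer, outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=32, validation_data=(x_test, y_test))
```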
Example: Sentiment Analysis II
If you’re looking for word-level embeddings, here is an example that feeds ELMo embeddings into a convolutional neural network for the same task as above.
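Again as a sketch under the same assumptions about the layer's interface, with output_mode="elmo" returning one 1024-d vector per word; the CNN loosely mirrors Keras's imdb_cnn example, and max_len, idx2word, and the data are reused from the previous sketch:

```python
from keras.layers import Input, Dense, Dropout, Conv1D, GlobalMaxPooling1D
from keras.models import Model
from elmo import ELMoEmbedding  # hypothetical import path; see the repository

input_layer = Input(shape=(max_len,), dtype='int32')
# output_mode="elmo" gives a (max_len, 1024) tensor of contextual word embeddings
embedding = ELMoEmbedding(idx2word=idx2word, output_mode="elmo", trainable=True)(input_layer)
conv = Conv1D(filters=250, kernel_size=3, padding='valid', activation='relu')(embedding)
pooled = GlobalMaxPooling1D()(conv)
dense = Dense(250, activation='relu')(pooled)
dropout = Dropout(0.2)(dense)
output = Dense(1, activation='sigmoid')(dropout)

model = Model(inputs=input_layer, outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=32, validation_data=(x_test, y_test))
```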
ELMo embeddings are significantly more computationally expensive than traditional embeddings, but their ability to easily boost performance on NLP tasks makes them useful to have in the toolkit. I hope you found this post useful; be sure to check out the repository over on GitHub: github.com/JHart96/keras_elmo_embedding_layer.