The Extended Dependency Based Skip-gram model trains word embeddings using structural information from dependency graphs, as described in [1]. In addition to standard word embeddings, it produces embeddings of dependency context features (e.g. det_a, compound_programming, compound_inv_language), which were found to be useful features in several sentence classification tasks. Please cite the paper if you use the code or the pre-trained embeddings.
Code for training extended dependency skip-gram embeddings: ext_vec.tar.gz
Download embeddings trained on Wikipedia
August 2015 dump: wiki_extvec.gz (1.56 GB)
Includes embeddings of words and dependency contexts appearing more than 100 times in the corpus.
The dependency types used are Universal Dependencies.
Inverse relations are encoded with the string "_inv_" between the dependency type and the word.
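Given that encoding, a context feature can be split back into its dependency type, direction, and word. The sketch below assumes the format shown in the examples above (det_a, compound_inv_language) and that the dependency type itself contains no underscore; it is an illustration, not part of the released code.

```python
def parse_context(feature):
    """Split a dependency context feature into (dep_type, is_inverse, word).

    Format assumed from the examples in the text: "det_a" is a direct
    relation, "compound_inv_language" an inverse one, with "_inv_"
    separating the dependency type from the word.
    """
    if "_inv_" in feature:
        dep, word = feature.split("_inv_", 1)
        return dep, True, word
    dep, word = feature.split("_", 1)
    return dep, False, word

# Examples from the text:
# parse_context("det_a")                -> ("det", False, "a")
# parse_context("compound_inv_language") -> ("compound", True, "language")
```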
Training corpus size: ~2 billion words
Word vocabulary size: 222,496
Dependency context vocabulary size: 1,253,524
Embedding dimensionality: 300