The Extended Dependency Based Skip-gram model trains word embeddings using structural information from dependency graphs, as described in [1]. In addition to standard word embeddings, it produces embeddings of dependency context features (e.g. det_a, compound_programming, compound_inv_language), which were found to be useful features in several sentence classification tasks. Please cite the paper if you use the code or the pre-trained embeddings.
Code for training extended dependency skip-gram embeddings: ext_vec.tar.gz
Download embeddings trained on Wikipedia
August 2015 dump: wiki_extvec.gz (1.56 GB)
Includes embeddings of words and dependency contexts appearing more than 100 times in the corpus.
The dependency types used are Universal Dependencies.
Inverse relations are encoded with the string "_inv_" between the dependency type and the word.
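Given that encoding, a context feature can be split back into its dependency type, direction, and word. The sketch below assumes the format shown in the examples above (det_a, compound_inv_language) and that the dependency type itself contains no underscore; it is an illustration, not part of the released code.

```python
def parse_context(feature):
    """Split a dependency context feature into (dep_type, is_inverse, word).

    Format assumed from the examples in the text: "det_a" is a direct
    relation, "compound_inv_language" an inverse one, with "_inv_"
    separating the dependency type from the word.
    """
    if "_inv_" in feature:
        dep, word = feature.split("_inv_", 1)
        return dep, True, word
    dep, word = feature.split("_", 1)
    return dep, False, word

# Examples from the text:
# parse_context("det_a")                -> ("det", False, "a")
# parse_context("compound_inv_language") -> ("compound", True, "language")
```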
Training corpus size: ~2 billion words
Word vocabulary size: 222,496
Dependency context vocabulary size: 1,253,524
Embedding dimensionality: 300