Step 1. Add the JitPack repository to your build file
Add it in your root settings.gradle at the end of repositories:
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        mavenCentral()
        maven { url 'https://jitpack.io' }
    }
}
Add it in your settings.gradle.kts at the end of repositories:
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        mavenCentral()
        maven { url = uri("https://jitpack.io") }
    }
}
Add to pom.xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
Add it in your build.sbt at the end of resolvers:
resolvers += "jitpack" at "https://jitpack.io"
Add it in your project.clj at the end of repositories:
:repositories [["jitpack" "https://jitpack.io"]]
Step 2. Add the dependency
Gradle (build.gradle):
dependencies {
    implementation 'com.github.linguatools:disco:v3.0.0'
}
Gradle Kotlin DSL (build.gradle.kts):
dependencies {
    implementation("com.github.linguatools:disco:v3.0.0")
}
Maven (pom.xml):
<dependency>
    <groupId>com.github.linguatools</groupId>
    <artifactId>disco</artifactId>
    <version>v3.0.0</version>
</dependency>
sbt (build.sbt):
libraryDependencies += "com.github.linguatools" % "disco" % "v3.0.0"
Leiningen (project.clj):
:dependencies [[com.github.linguatools/disco "v3.0.0"]]
Java API for word embeddings
This is the source code repository for the linguatools DISCO API. For more information on DISCO visit http://www.linguatools.de/disco/disco_en.html.
Download the source code by cloning this repository:
git clone git@github.com:linguatools/disco.git
Go into the repository folder and build the executable jar with dependencies:
cd disco/
./gradlew shadowJar
For instructions on command-line usage, run the DISCO API jar without any parameters:
java -jar build/libs/disco-3.0.0-all.jar
or consult the web page.
Download a fastText vector file in text format and unpack it:
wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.de.300.vec.gz
gunzip cc.de.300.vec.gz
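For orientation, the unpacked .vec file is plain text: the first line holds the number of words and the vector dimension, and every following line holds one word followed by its vector components, separated by spaces. The sketch below reads that layout into a map. It is an illustration only and not part of DISCO or DISCO Builder; the class name VecFileSketch and the method readVectors are hypothetical, and a real 300-dimensional file with millions of words needs far more memory than this toy input.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the DISCO API.
public class VecFileSketch {

    /** Reads word vectors in the fastText text (.vec) layout from the given reader. */
    static Map<String, float[]> readVectors(Reader in) {
        Iterator<String> lines = new BufferedReader(in).lines().iterator();
        if (!lines.hasNext()) {
            throw new IllegalArgumentException("empty input");
        }
        // Header line: "<number of words> <dimension>"
        String[] header = lines.next().trim().split("\\s+");
        int dimension = Integer.parseInt(header[1]);

        Map<String, float[]> vectors = new LinkedHashMap<>();
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            // Data line: "<word> <v1> <v2> ... <vDimension>"
            String[] fields = line.split("\\s+");
            if (fields.length != dimension + 1) {
                throw new IllegalArgumentException("unexpected number of values for: " + fields[0]);
            }
            float[] vector = new float[dimension];
            for (int i = 0; i < dimension; i++) {
                vector[i] = Float.parseFloat(fields[i + 1]);
            }
            vectors.put(fields[0], vector);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Tiny made-up sample in the same layout as cc.de.300.vec
        String sample = "2 3\nHaus 0.1 0.2 0.3\nWohnung 0.2 0.1 0.4\n";
        Map<String, float[]> vectors = readVectors(new StringReader(sample));
        System.out.println(vectors.size() + " words read, dimension " + vectors.get("Haus").length);
    }
}
```

To read the real file, pass a java.io.FileReader for cc.de.300.vec instead of the StringReader.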
Download DISCO Builder:
wget http://www.linguatools.de/disco/DISCOBuilder-1.1.1.tar.bz2
tar jxf DISCOBuilder-1.1.1.tar.bz2
Convert the vector file into a DISCO DenseMatrix:
java -Xmx8g -cp DISCOBuilder-1.1.1/DISCOBuilder-1.1.1-all.jar de.linguatools.disco.builder.Import -in cc.de.300.vec -out cc.de.300.col.denseMatrix -wsType COL
Query the new DISCO word space from the command line with the DISCO API (adjust the path to the jar you built above):
java -Xmx4g -jar ~/repos-linguatools/disco/build/libs/disco-3.0.0-all.jar cc.de.300.col.denseMatrix/cc.de.300-COL.denseMatrix -s Haus Wohnung COSINE
0.64413786
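The number printed above is the similarity of the two words under the COSINE measure. As a reminder of what cosine similarity computes, here is a minimal stand-alone sketch on plain float arrays. It is not DISCO's implementation; the class name CosineSketch, the method cosine, and the example vectors are made up for illustration.

```java
// Hypothetical helper for illustration; not part of the DISCO API.
public class CosineSketch {

    /** Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|). */
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0f; // convention chosen for this sketch: zero vectors are similar to nothing
        }
        return (float) (dot / Math.sqrt(normA * normB));
    }

    public static void main(String[] args) {
        // Made-up three-dimensional vectors; real embeddings here have 300 components
        float[] haus = {1f, 2f, 0f};
        float[] wohnung = {2f, 1f, 1f};
        System.out.println(cosine(haus, wohnung));
    }
}
```

The result lies between -1 and 1, and vectors pointing in the same direction score 1 regardless of their length.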
To include DISCO in your Maven or Gradle project see below or visit the DISCO page on JitPack.
Add this to your build.gradle file:
repositories {
    maven { url 'https://jitpack.io' }
}
dependencies {
    implementation 'com.github.linguatools:disco:v3.0.0'
}
Add this to your pom.xml file:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
<dependency>
    <groupId>com.github.linguatools</groupId>
    <artifactId>disco</artifactId>
    <version>v3.0.0</version>
</dependency>
Example usage in Java:
// load the word space created above
DISCO disco = DISCO.load("cc.de.300-COL.denseMatrix");
// cosine similarity between two words
float sim = disco.semanticSimilarity("Haus", "Häuschen",
        DISCO.getVectorSimilarity(SimilarityMeasure.COSINE));
System.out.println("similarity between 'Haus' and 'Häuschen': " + sim);
// get word vector for "Haus" as map
Map<String,Float> wordVectorHaus = disco.getWordvector("Haus");
// get word embedding for "Haus" as float array
float[] wordEmbeddingHaus = ((DenseMatrix) disco).getWordEmbedding("Haus");
// solve analogy x is to "Frau" as "König" is to "Mann"
List<ReturnDataCol> result = Compositionality.solveAnalogy("Frau", "König", "Mann", disco);
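Analogies of this kind are commonly solved with vector arithmetic: for "x is to Frau as König is to Mann", form the vector König - Mann + Frau and return the word whose embedding is closest to it. The sketch below shows that idea on a toy vocabulary. It is an assumption about the general technique, not a description of what Compositionality.solveAnalogy does internally; the class AnalogySketch, its methods, and the two-dimensional vectors are hypothetical, and only the parameter order mirrors the call above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the DISCO API.
public class AnalogySketch {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Solves "x is to a as b is to c" by returning the word nearest to b - c + a. */
    static String solveAnalogy(String a, String b, String c, Map<String, float[]> vectors) {
        float[] va = vectors.get(a), vb = vectors.get(b), vc = vectors.get(c);
        if (va == null || vb == null || vc == null) {
            throw new IllegalArgumentException("unknown word");
        }
        float[] target = new float[va.length];
        for (int i = 0; i < target.length; i++) {
            target[i] = vb[i] - vc[i] + va[i];
        }
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, float[]> entry : vectors.entrySet()) {
            String word = entry.getKey();
            if (word.equals(a) || word.equals(b) || word.equals(c)) {
                continue; // the query words themselves are not valid answers
            }
            double sim = cosine(target, entry.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = word;
            }
        }
        return best;
    }

    /** Made-up two-dimensional vectors: first axis "royalty", second axis "female". */
    static Map<String, float[]> toyVectors() {
        Map<String, float[]> vectors = new LinkedHashMap<>();
        vectors.put("Mann", new float[]{1f, 0f});
        vectors.put("Frau", new float[]{1f, 1f});
        vectors.put("König", new float[]{3f, 0f});
        vectors.put("Königin", new float[]{3f, 1f});
        vectors.put("Haus", new float[]{0f, 3f});
        return vectors;
    }

    public static void main(String[] args) {
        System.out.println(solveAnalogy("Frau", "König", "Mann", toyVectors()));
    }
}
```

With these toy vectors the target is (3, 1), which coincides with the vector for Königin, so that word is returned. On real embeddings the match is only approximate, which is presumably why the DISCO call returns a list of candidates rather than a single word.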