scalanlp/nak


The Nak Machine Learning Library

Download


Step 1. Add the JitPack repository to your build file

Add it in your root settings.gradle at the end of repositories:

	dependencyResolutionManagement {
		repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
		repositories {
			mavenCentral()
			maven { url 'https://jitpack.io' }
		}
	}

Add it in your settings.gradle.kts at the end of repositories:

	dependencyResolutionManagement {
		repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
		repositories {
			mavenCentral()
			maven { url = uri("https://jitpack.io") }
		}
	}

Add it to your pom.xml:

	<repositories>
		<repository>
		    <id>jitpack.io</id>
		    <url>https://jitpack.io</url>
		</repository>
	</repositories>

Add it in your build.sbt at the end of resolvers:

	resolvers += "jitpack" at "https://jitpack.io"

Add it in your project.clj at the end of repositories:

	:repositories [["jitpack" "https://jitpack.io"]]

Step 2. Add the dependency

Gradle (build.gradle):

	dependencies {
		implementation 'com.github.scalanlp:nak:v1.2.1'
	}

Gradle Kotlin DSL (build.gradle.kts):

	dependencies {
		implementation("com.github.scalanlp:nak:v1.2.1")
	}

Maven (pom.xml):

	<dependency>
	    <groupId>com.github.scalanlp</groupId>
	    <artifactId>nak</artifactId>
	    <version>v1.2.1</version>
	</dependency>

sbt (build.sbt):

	libraryDependencies += "com.github.scalanlp" % "nak" % "v1.2.1"

Leiningen (project.clj):

	:dependencies [[com.github.scalanlp/nak "v1.2.1"]]

Readme


Nak

Nak is a Scala/Java library for machine learning and related tasks, with a focus on providing an easy-to-use API for some standard algorithms. It is formed from Breeze, Liblinear Java, and Scalabha. The library is currently undergoing a major overhaul, so expect significant API changes in this version and probably in several future versions.

We'd love to have some more contributors: if you are interested in helping out, please see the #helpwanted issues or suggest your own ideas.

What's inside

Nak currently provides implementations for k-means clustering and supervised learning with logistic regression and support vector machines. Other models and algorithms that were formerly in breeze.learn are now in Nak.
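
For readers unfamiliar with k-means, the idea can be sketched in plain Scala as Lloyd's algorithm. This is an illustration only and does not use or reproduce Nak's API: `KMeansSketch`, `kMeans`, and the helper names are hypothetical, the initial centroids are supplied by the caller, and the loop runs a fixed number of iterations instead of testing for convergence.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid closest to p.
  def nearest(p: Point, centroids: Seq[Point]): Int =
    centroids.indices.minBy(i => squaredDistance(p, centroids(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(points: Seq[Point]): Point =
    points.transpose.map(_.sum / points.size).toVector

  // Lloyd's algorithm: assign each point to its nearest centroid,
  // then move every centroid to the mean of its assigned points.
  def kMeans(points: Seq[Point], initial: Seq[Point], iterations: Int): Seq[Point] =
    (1 to iterations).foldLeft(initial) { (centroids, _) =>
      val clusters = points.groupBy(p => nearest(p, centroids))
      centroids.indices.map { i =>
        // A centroid with no assigned points stays where it is.
        clusters.get(i).map(mean).getOrElse(centroids(i))
      }
    }
}
```

Calling `KMeansSketch.kMeans(points, initial, 5)` with two well-separated groups of points and one initial centroid near each group returns one centroid per group.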

See the Nak wiki for (some preliminary and unfortunately sparse) documentation.

The latest stable release of Nak is 1.2.1. Changes from the previous release include:

  • breeze-learn pulled into Nak
  • K-means implementations from breeze-learn and Nak merged
  • Locality-sensitive hashing added

See the CHANGELOG for changes in previous versions.

Using Nak

In SBT:

libraryDependencies += "org.scalanlp" % "nak" % "1.2.1"

In Maven:

<dependency>
   <groupId>org.scalanlp</groupId>
   <artifactId>nak</artifactId>
   <version>1.2.1</version>
</dependency>

Example

Here's an example of how easy it is to train and evaluate a text classifier using Nak. See TwentyNewsGroups.scala for more details.

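import java.io.File
// fromLabeledDirs, trainClassifier, maxLabel, LiblinearConfig, BowFeaturizer and
// ConfusionMatrix come from Nak; see TwentyNewsGroups.scala for the exact imports.

def main(args: Array[String]): Unit = {
  // args(0) is the directory that contains the 20news-bydate-* folders.
  val newsgroupsDir = new File(args(0))
  // Codec made available implicitly for reading the corpus files.
  implicit val isoCodec = scala.io.Codec("ISO-8859-1")
  val stopwords = Set("the","a","an","of","in","for","by","on")

  // Train a Liblinear classifier on bag-of-words features.
  val trainDir = new File(newsgroupsDir, "20news-bydate-train")
  val trainingExamples = fromLabeledDirs(trainDir).toList
  val config = LiblinearConfig(cost=5.0)
  val featurizer = new BowFeaturizer(stopwords)
  val classifier = trainClassifier(config, featurizer, trainingExamples)

  // Evaluate: pair each gold label with the top-scoring predicted label.
  val evalDir = new File(newsgroupsDir, "20news-bydate-test")
  val maxLabelNews = maxLabel(classifier.labels) _
  val comparisons = for (ex <- fromLabeledDirs(evalDir).toList) yield
    (ex.label, maxLabelNews(classifier.evalRaw(ex.features)), ex.features)
  val (goldLabels, predictions, inputs) = comparisons.unzip3
  println(ConfusionMatrix(goldLabels, predictions, inputs))
}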
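
The two ideas the example relies on, bag-of-words features with a stopword list and a comparison of gold labels against predictions, can be sketched without Nak. The code below is a plain-Scala illustration under stated assumptions: `BowSketch`, `bagOfWords`, `confusionCounts`, and `accuracy` are hypothetical names, and the tokenization (lower-casing, splitting on non-alphanumeric characters) is an assumption, not a description of how Nak's `BowFeaturizer` or `ConfusionMatrix` work internally.

```scala
object BowSketch {
  // Counts lower-cased word tokens in a text, skipping stopwords.
  // Tokenization here is an assumption made for this sketch.
  def bagOfWords(text: String, stopwords: Set[String]): Map[String, Int] =
    text.toLowerCase
      .split("[^a-z0-9]+")
      .filter(tok => tok.nonEmpty && !stopwords.contains(tok))
      .groupBy(identity)
      .map { case (tok, occurrences) => tok -> occurrences.length }

  // Tallies how often each (gold, predicted) label pair occurs.
  def confusionCounts(gold: Seq[String], predicted: Seq[String]): Map[(String, String), Int] =
    gold.zip(predicted).groupBy(identity).map { case (pair, hits) => pair -> hits.size }

  // Fraction of examples whose predicted label matches the gold label.
  def accuracy(gold: Seq[String], predicted: Seq[String]): Double =
    if (gold.isEmpty) 0.0
    else gold.zip(predicted).count { case (g, p) => g == p }.toDouble / gold.size
}
```

For instance, `BowSketch.bagOfWords("The cat sat on the mat", Set("the", "on"))` keeps only `cat`, `sat`, and `mat`, each with a count of 1.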

Questions or suggestions?

Post a message to the scalanlp-discuss mailing list or create an issue.