Step 1. Add the JitPack repository to your build file
Add it in your root settings.gradle at the end of repositories:
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        mavenCentral()
        maven { url 'https://jitpack.io' }
    }
}
Add it in your settings.gradle.kts at the end of repositories:
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        mavenCentral()
        maven { url = uri("https://jitpack.io") }
    }
}
Add to pom.xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
Add it in your build.sbt at the end of resolvers:
resolvers += "jitpack" at "https://jitpack.io"
Add it in your project.clj at the end of repositories:
:repositories [["jitpack" "https://jitpack.io"]]
Step 2. Add the dependency
Gradle (Groovy DSL), in build.gradle:
dependencies {
    implementation 'com.github.haifengl:smile:2.6.0'
}
Gradle (Kotlin DSL), in build.gradle.kts:
dependencies {
    implementation("com.github.haifengl:smile:2.6.0")
}
Maven, in pom.xml:
<dependency>
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile</artifactId>
    <version>2.6.0</version>
</dependency>
sbt, in build.sbt:
libraryDependencies += "com.github.haifengl" % "smile" % "2.6.0"
Leiningen, in project.clj:
:dependencies [[com.github.haifengl/smile "2.6.0"]]
SMILE (Statistical Machine Intelligence & Learning Engine) is a comprehensive, high-performance machine learning framework for the JVM. SMILE v5+ requires Java 25; v4.x requires Java 21; all previous versions require Java 8. SMILE also provides idiomatic APIs for Scala and Kotlin. With advanced data structures and algorithms, SMILE delivers state-of-the-art performance across every aspect of machine learning.
| Area | Highlights |
|---|---|
| LLM | LLaMA-3 inference, tiktoken BPE tokenizer, OpenAI-compatible REST server, SSE chat streaming |
| Deep Learning | LibTorch/GPU backend, EfficientNet-V2 image classification, custom layer API |
| Classification | SVM, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, Logistic Regression, Neural Networks, RBF Networks, MaxEnt, KNN, Naïve Bayes, LDA/QDA/RDA |
| Regression | SVR, Gaussian Process, Regression Trees, GBDT, Random Forest, RBF, OLS, LASSO, ElasticNet, Ridge |
| Clustering | BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical, SIB, SOM, Spectral, Min-Entropy |
| Manifold Learning | IsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, ICA |
| Feature Engineering | Genetic Algorithm selection, Ensemble selection, TreeSHAP, SNR, Sum-Squares ratio, data transformations, formula API |
| NLP | Sentence / word tokenization, Bigram test, Phrase & Keyword extraction, Stemmer, POS tagging, Relevance ranking |
| Association Rules | FP-growth frequent itemset mining |
| Sequence Learning | Hidden Markov Model, Conditional Random Field |
| Nearest Neighbor | BK-Tree, Cover Tree, KD-Tree, SimHash, LSH |
| Numerical Methods | Linear algebra, numerical optimization (BFGS, L-BFGS), interpolation, wavelets, RBF, distributions, hypothesis tests |
| Visualization | Swing plots (scatter, line, bar, box, histogram, surface, heatmap, contour, …) and declarative Vega-Lite charts |
Each module has its own detailed user guide. Click the README link for the module overview, or drill into individual topic guides.
base/ - Foundation
Data structures, math, linear algebra, statistical utilities, I/O

| Document | Topics |
|---|---|
| README | Module overview and dependency setup |
| DATA_FRAME.md | DataFrame API: creation, selection, transformation |
| DATA_IO.md | CSV, JSON, Parquet, Arrow, JDBC, Avro readers/writers |
| DATA_TRANSFORMATION.md | Scalers, encoders, imputers, feature transforms |
| DATASET.md | Built-in benchmark and real-world datasets |
| FORMULA.md | R-style formula language for model matrices |
| DISTRIBUTIONS.md | Probability distributions (Normal, Poisson, Beta, …) |
| HYPOTHESIS_TESTING.md | t-test, chi-squared, ANOVA, KS-test, … |
| DISTANCES.md | Euclidean, Mahalanobis, Hamming, edit distance, … |
| NEAREST_NEIGHBOR.md | KD-Tree, Cover Tree, BK-Tree, LSH |
| KERNELS.md | Gaussian, polynomial, Laplacian, and other kernel functions |
| RBF.md | Radial basis function networks |
| INTERPOLATION.md | Linear, cubic spline, bilinear, bicubic |
| GRAPH.md | Adjacency list/matrix graph, BFS/DFS, spanning trees |
| SORT.md | Quick sort, heap sort, counting sort, index sort |
| HASH.md | Locality-sensitive hashing, SimHash |
| RNG.md | Random number generators, sampling, permutations |
| BFGS.md | L-BFGS and BFGS numerical optimizers |
| ICA.md | Independent Component Analysis |
| TENSOR.md | N-dimensional array (CPU tensor without LibTorch) |
| WAVELET.md | DWT, CWT, and wavelet families |
| GAP.md | GAP statistic for optimal cluster count estimation |
| COMPRESSED_SENSING.md | Compressed sensing and basis pursuit |
core/ - Machine Learning Algorithms
Classification, regression, clustering, manifold learning, and more

| Document | Topics |
|---|---|
| README | Module overview |
| CLASSIFICATION.md | SVM, Random Forest, AdaBoost, GBDT, KNN, Naïve Bayes, LDA, … |
| REGRESSION.md | SVR, Gaussian Process, LASSO, Ridge, ElasticNet, GBDT, … |
| CLUSTERING.md | K-Means, DBSCAN, BIRCH, SOM, Spectral Clustering, … |
| FEATURE_ENGINEERING.md | Feature selection, PCA, ICA, projection, encoding |
| MANIFOLD.md | t-SNE, UMAP, IsoMap, LLE, Laplacian Eigenmap |
| ANOMALY_DETECTION.md | IsolationForest, one-class SVM, local outlier factor |
| ASSOCIATION_RULE_MINING.md | FP-growth, association rules, frequent itemsets |
| SEQUENCE.md | HMM (Baum-Welch, Viterbi), CRF |
| TIME_SERIES.md | ARIMA, box-plots, autocorrelation |
| REGRESSION.md | Full regression API reference |
| TRAINING.md | Cross-validation, bootstrap, hyper-parameter search |
| VALIDATION.md | Hold-out, k-fold, leave-one-out evaluation |
| VALIDATION_METRICS.md | Accuracy, AUC, F1, RMSE, MAE, confusion matrix |
| HYPER_PARAMETER_OPTIMIZATION.md | Grid search, random search, Bayesian optimization |
| VECTOR_QUANTIZATION.md | LVQ, Neural Gas, SOM as vector quantizers |
| ONNX.md | Exporting and importing models via ONNX |
deep/ - Deep Learning & LLMs
LibTorch-backed GPU/CPU tensor operations, neural network layers, LLaMA-3 inference, EfficientNet

| Document | Topics |
|---|---|
| README | Full deep-learning & LLM user guide (tensors, layers, loss, optimizer, EfficientNet, LLaMA) |
The deep/README.md covers:
- smile.deep.tensor: Tensor factory, indexing, arithmetic, AutoScope memory management, dtype/device
- smile.deep.layer: Linear, Conv2d, pooling, normalization (BN/GN/RMS), dropout, embedding, sequential blocks
- smile.deep.activation: ReLU, GELU, SiLU, Tanh, Sigmoid, Softmax, GLU, HardShrink, …
- smile.deep.Loss: MSE, cross-entropy, BCE, Huber, KL, hinge, and more
- smile.deep.Optimizer: SGD, Adam, AdamW, RMSprop
- smile.deep.Model: Abstract base class + training loop
- smile.deep.metric: Accuracy, Precision, Recall, F1Score with macro/micro/weighted averaging
- smile.llm: Message, Role, FinishReason, ChatCompletion records; sinusoidal & RoPE positional encodings
- smile.llm.tokenizer: Tokenizer interface, Tiktoken BPE implementation (LLaMA-3 compatible)
- smile.llm.llama: Full LLaMA-3 stack: Llama.build(), generate(), chat(), streaming via SubmissionPublisher
- smile.vision: VisionModel, ImageDataset, EfficientNet.V2S/M/L() pretrained models, ImageNet labels
- smile.vision.transform: Transform interface, ImageClassification pipeline, resize/crop/toTensor helpers

nlp/ - Natural Language Processing
Text normalization, tokenization, POS tagging, stemming, relevance ranking

| Document | Topics |
|---|---|
| README | Module overview |
| TOKENIZER.md | Sentence splitter, word tokenizer, regex tokenizer |
| POS.md | Part-of-speech tagging (Brill tagger, HMM tagger) |
| STEM.md | Porter, Lancaster, Lovins stemmers; lemmatization |
| COLLOCATION.md | Bigram/trigram statistical tests, phrase extraction |
| RELEVANCE.md | TF-IDF, BM25, keyword extraction |
| TAXONOMY.md | WordNet integration, synsets, hypernyms |
plot/ - Data Visualization
Swing-based interactive plots and declarative Vega-Lite charts

| Document | Topics |
|---|---|
| README | Swing plotting API — scatter, line, bar, box, histogram, heatmap, surface, contour, wireframe |
| VEGA.md | Declarative smile.plot.vega (Vega-Lite) — JSON spec generation, web/Jupyter rendering |
serve/ - Inference Server
Quarkus-based REST inference service with OpenAI-compatible API and SSE streaming

| Document | Topics |
|---|---|
| README | Building and running the server, /chat/completions endpoint, SSE streaming, configuration |
studio/ - Interactive Shell & Desktop IDE
REPL / notebook environment for Java, Scala, and Kotlin

| Document | Topics |
|-------------------------------|---|
| README.md | Desktop Studio notebook UI, cell types, output rendering |
| CLI | CLI entry points (smile, smile shell, smile scala, smile kotlin, smile server) |
scala/ - Scala API
Idiomatic Scala shim: concise wrappers, symbolic operators, Scala collections integration

| Document | Topics |
|---|---|
| README | API overview, smile.classification, smile.regression, smile.clustering, smile.plot in Scala |
kotlin/ - Kotlin API
Idiomatic Kotlin shim: extension functions, named parameters, builder DSLs

| Document | Topics |
|---|---|
| README | API overview, extension functions, Kotlin-style builders |
| packages.md | Full package-by-package listing of all Kotlin extension functions |
json/ - JSON Library (Scala)
Lightweight zero-dependency JSON library for Scala with a clean DSL

| Document | Topics |
|---|---|
| README | Parsing, building, pattern matching, path navigation, serialization |
spark/ - Apache Spark Integration
Use SMILE models inside Spark ML pipelines

| Document | Topics |
|---|---|
| README | SmileTransformer, SmileClassifier, SmileRegressor; training and scoring in Spark DataFrames |
<!-- Core ML algorithms -->
<dependency>
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile-core</artifactId>
    <version>6.1.0</version>
</dependency>
<!-- Deep learning + LLMs (requires LibTorch) -->
<dependency>
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile-deep</artifactId>
    <version>6.1.0</version>
</dependency>
<!-- Natural language processing -->
<dependency>
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile-nlp</artifactId>
    <version>6.1.0</version>
</dependency>
<!-- Data visualization -->
<dependency>
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile-plot</artifactId>
    <version>6.1.0</version>
</dependency>
libraryDependencies += "com.github.haifengl" %% "smile-scala" % "6.1.0"
dependencies {
    implementation("com.github.haifengl:smile-kotlin:6.1.0")
}
Several algorithms (manifold learning, Gaussian Process, MLP, some clustering) require BLAS and LAPACK.
Linux (Ubuntu / Debian)
sudo apt update
sudo apt install libopenblas-dev libarpack2-dev
macOS (Homebrew)
brew install arpack
# If macOS SIP strips DYLD_LIBRARY_PATH, copy the dylib to your working dir:
cp /opt/homebrew/lib/libarpack.dylib .
Windows — pre-built DLLs are included in the bin/ directory of the
release package.
Add that directory to PATH.
GPU (CUDA) — make sure the LibTorch CUDA native libraries are on
java.library.path and that your Bytedeco pytorch classifier matches
your CUDA version (e.g., linux-x86_64-gpu-cuda12.4).
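When native libraries fail to load, it can help to see which directories the JVM actually searches. The following is a minimal sketch using only the standard library; `LibraryPathCheck` and `entries` are hypothetical names chosen for illustration and are not part of SMILE. It lists each entry of `java.library.path` and reports whether the directory exists.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibraryPathCheck {
    /** Splits a path list such as java.library.path into its non-empty entries. */
    static List<String> entries(String pathList) {
        List<String> result = new ArrayList<>();
        if (pathList == null) {
            return result;
        }
        // File.pathSeparator is ":" on Linux/macOS and ";" on Windows
        for (String entry : pathList.split(File.pathSeparator)) {
            if (!entry.isBlank()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print every directory the JVM searches for native libraries
        for (String dir : entries(System.getProperty("java.library.path"))) {
            boolean exists = new File(dir).isDirectory();
            System.out.println((exists ? "ok      " : "missing ") + dir);
        }
    }
}
```

A directory reported as missing is not necessarily an error, but the directory holding the native libraries should appear in the output and be marked as present.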
import smile.classification.RandomForest;
import smile.data.formula.Formula;
import smile.io.Read;
// Load data
var data = Read.csv("src/test/resources/iris.csv");
// Train a random forest
var forest = RandomForest.fit(Formula.lhs("species"), data);
// Predict
int label = forest.predict(data.get(0));
System.out.println("Predicted class: " + label);
For deep learning and LLM examples, see deep/README.md. For visualization examples, see plot/README.md.
SMILE ships with an interactive desktop Studio (notebook-style) and a set of CLI shells. See studio/README.md for full documentation.
Download a pre-packaged release from the releases page, then:
cd bin
path/to/smile/bin/setup # install required native dependencies
path/to/smile/bin/smile # launch SMILE Studio from your project directory
Other entry points:
| Command | Description |
|-----------------|---|
| smile | Desktop notebook IDE |
| smile shell | Java REPL with all SMILE packages pre-imported |
| smile scala | Scala REPL |
| smile train | Train a supervised learning model |
| smile predict | Predict on a file using a saved model |
| smile serve | Start the LLM inference server |
To increase the JVM heap:
path/to/smile/bin/smile -J-Xmx30G
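To confirm that a heap setting took effect, the maximum heap can be read back from inside the JVM. This is a small sketch under the assumption that the `-J` prefix forwards `-Xmx30G` to the JVM; `HeapCheck` and `toMiB` are hypothetical names, not part of SMILE.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the largest amount of memory the JVM will attempt to use
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
    }
}
```

The reported value may be somewhat lower than the `-Xmx` figure, depending on the garbage collector in use.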
Most SMILE models implement java.io.Serializable. You can serialize a
trained model to disk and load it in a production environment or inside a
Spark job:
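import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Save the trained model (forest is the RandomForest from the example above)
try (var out = new ObjectOutputStream(new FileOutputStream("model.ser"))) {
    out.writeObject(forest);
}
// Load it back and cast to the concrete model type
try (var in = new ObjectInputStream(new FileInputStream("model.ser"))) {
    var loaded = (RandomForest) in.readObject();
}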
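The save and load steps above can be wrapped in a small reusable helper. The sketch below uses only the standard library; `ModelIO`, `save`, and `load` are hypothetical names and not part of SMILE. To stay runnable without SMILE on the classpath, `main` uses a plain `double[]` as a stand-in for a trained model; any `Serializable` object would work the same way.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class ModelIO {
    /** Writes any Serializable object (for example, a trained model) to the given path. */
    static void save(Serializable model, Path path) throws IOException {
        try (var out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            out.writeObject(model);
        }
    }

    /** Reads an object back and casts it to the expected type. */
    static <T> T load(Path path, Class<T> type) throws IOException, ClassNotFoundException {
        try (var in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            // cast() throws ClassCastException if the file holds a different type
            return type.cast(in.readObject());
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("model", ".ser");
        double[] weights = {0.5, 1.5, 2.5};   // stand-in for a trained model
        save(weights, path);
        double[] restored = load(path, double[].class);
        System.out.println("Restored " + restored.length + " values");
        Files.delete(path);
    }
}
```

Java deserialization executes code paths chosen by the data being read, so model files should only be loaded from trusted sources.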
SMILE provides two visualization layers:
- smile.plot.swing: Swing-based interactive 2D/3D plots. See plot/README.md.
- smile.plot.vega: Declarative Vega-Lite charts for browsers and Jupyter. See plot/VEGA.md.

<dependency>
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile-plot</artifactId>
    <version>6.1.0</version>
</dependency>
SMILE employs a dual license model designed to meet the development and distribution needs of both commercial distributors (OEMs, ISVs, VARs) and open source projects. For details, see LICENSE. To acquire a commercial license, contact smile.sales@outlook.com.
| Channel | Purpose |
|---|---|
| GitHub Discussions | Questions, ideas, show-and-tell |
| Stack Overflow [smile] | Technical Q&A |
| Issue Tracker | Bug reports and feature requests |
| Online Docs | Tutorials and programming guides |
| Java API · Scala API · Kotlin API · Clojure API | API Javadoc |
Please read CONTRIBUTING.md for build and test instructions.