Smile is a comprehensive JVM machine learning library and statistical computing toolkit. It provides a suite of algorithms for classification, regression, and clustering, implemented natively for Java, Scala, and Kotlin. The project also functions as a deep learning framework, a natural language processing library, and an inference engine for large language models.
The library distinguishes itself through GPU acceleration via LibTorch bindings and support for the ONNX model interchange format. It includes specialized capabilities for large language model inference, featuring Byte-Pair Encoding tokenization and an OpenAI-compatible REST API with server-sent event streaming. Trained models can also be wrapped as transformers for integration into Apache Spark pipelines.
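As a rough illustration of what server-sent event streaming looks like to a client, the sketch below reads a streamed response body in the style used by OpenAI's chat completions API: each chunk arrives on a `data:` line, and the stream ends with a `data: [DONE]` sentinel. The class name `SseLines` and the sample payloads are hypothetical, and whether Smile's server matches this framing in every detail is an assumption; the code does not call any Smile API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Smile: extracts the payloads from an
// OpenAI-style server-sent event stream.
public class SseLines {

    /** Collects the payload of each "data:" line until the "[DONE]" sentinel. */
    public static List<String> dataPayloads(String stream) {
        List<String> payloads = new ArrayList<>();
        for (String line : stream.split("\r?\n")) {
            if (!line.startsWith("data:")) {
                continue; // skip blank separators, comments, and other SSE fields
            }
            String payload = line.substring("data:".length()).strip();
            if (payload.equals("[DONE]")) {
                break; // end-of-stream marker (assumed OpenAI convention)
            }
            payloads.add(payload);
        }
        return payloads;
    }

    public static void main(String[] args) {
        // Sample stream with two JSON chunks followed by the sentinel.
        String sample = String.join("\n",
            "data: {\"choices\":[{\"delta\":{\"content\":\"Hel\"}}]}",
            "",
            "data: {\"choices\":[{\"delta\":{\"content\":\"lo\"}}]}",
            "",
            "data: [DONE]",
            "");
        dataPayloads(sample).forEach(System.out::println);
    }
}
```

This handles only the single-line `data:` events that streaming chat responses typically use. A general SSE client would also join multi-line `data:` fields and parse each JSON chunk with a proper JSON library.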
The toolkit covers a broad range of data science tasks, including linear algebra, numerical optimization, and statistical hypothesis testing. It provides tools for data preprocessing, dimensionality reduction, and signal processing, as well as interactive 2D and 3D visualization. For linguistic analysis, it supports part-of-speech tagging, stemming, and keyword extraction.
The project provides idiomatic JVM language APIs and includes a desktop environment with an interactive shell for exploratory data analysis and model training.