This project is a collection of utilities for machine learning experiment tracking, data versioning, and observability of large language model (LLM) applications. Its client records hyperparameters and metrics during training so that performance trends can be visualized and different model versions compared.
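To make the idea of recording hyperparameters and metrics concrete, here is a minimal sketch of a tracking client. It is not this project's actual API: the `Run` class, its `log` and `best` methods, and the `compare` helper are hypothetical names chosen for illustration, and everything is kept in memory rather than sent to a tracking server.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical in-memory run record (not the project's real client)."""
    name: str
    config: dict = field(default_factory=dict)   # hyperparameters
    history: list = field(default_factory=list)  # one dict of metrics per step

    def log(self, step: int, **metrics: float) -> None:
        # Append one row of metrics for the given training step.
        self.history.append({"step": step, **metrics})

    def best(self, metric: str, mode: str = "min") -> float:
        # Best recorded value of a metric across all logged steps.
        values = [row[metric] for row in self.history if metric in row]
        return min(values) if mode == "min" else max(values)


def compare(runs: list, metric: str, mode: str = "min") -> list:
    # Rank runs by their best value of the metric, best run first.
    return sorted(runs, key=lambda r: r.best(metric, mode), reverse=(mode == "max"))


# Two runs that differ only in learning rate (made-up loss values).
run_a = Run("lr-0.01", config={"lr": 0.01})
run_b = Run("lr-0.001", config={"lr": 0.001})
for step, (loss_a, loss_b) in enumerate([(0.9, 0.8), (0.6, 0.7), (0.5, 0.65)]):
    run_a.log(step, loss=loss_a)
    run_b.log(step, loss=loss_b)

ranking = [r.name for r in compare([run_a, run_b], "loss")]
```

Keeping the configuration next to the metric history is what makes runs comparable later: each point on a performance curve can be traced back to the hyperparameters that produced it.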
The project includes a model evaluation framework that uses custom scorers and automated judges to assess the quality of generated text. It also provides observability tools for monitoring and debugging the execution flow and runtime behavior of language model applications.
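As a rough illustration of evaluation with custom scorers, the sketch below treats a scorer as a plain function from a generated output and a reference to a number, then averages each scorer over a small set of examples. The names `Scorer`, `exact_match`, `token_overlap`, and `evaluate` are assumptions made for this example, not the project's interface. An automated judge would fit the same shape (a callable that returns a score), but calling a model is left out here.

```python
from statistics import mean
from typing import Callable

# Hypothetical scorer signature: (output, reference) -> score in [0, 1].
Scorer = Callable[[str, str], float]


def exact_match(output: str, reference: str) -> float:
    # 1.0 when the texts match after trimming and lowercasing, else 0.0.
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def token_overlap(output: str, reference: str) -> float:
    # Fraction of distinct reference tokens that also appear in the output.
    ref = set(reference.lower().split())
    out = set(output.lower().split())
    return len(ref & out) / len(ref) if ref else 0.0


def evaluate(examples: list, scorers: dict) -> dict:
    # Average every scorer over all (output, reference) pairs.
    return {
        name: mean(fn(output, reference) for output, reference in examples)
        for name, fn in scorers.items()
    }


# Made-up (generated output, reference answer) pairs.
examples = [
    ("Paris", "paris"),
    ("The capital is Rome", "Rome"),
    ("Berlin", "Madrid"),
]
report = evaluate(examples, {"exact_match": exact_match, "token_overlap": token_overlap})
```

The two scorers disagree on the second example, which is the point of running several: a strict scorer penalizes a correct but verbose answer that a more lenient one accepts.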
The project also supports the broader machine learning lifecycle: training, fine-tuning, and deploying models. This includes tracking dataset changes across iterations to maintain data lineage, and providing the infrastructure to host experiment tracking platforms in cloud or private environments.
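One common way to track dataset changes across iterations is to derive a version identifier from the data's content and record each new version together with its parent. The sketch below shows that general technique under stated assumptions; `dataset_version`, `commit`, and the `lineage` list are hypothetical names, and this is not necessarily how the project stores lineage.

```python
import hashlib
import json


def dataset_version(rows: list) -> str:
    # Hash a canonical JSON serialization so equal content gives the same id.
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Hypothetical lineage log: one entry per distinct dataset version, in order.
lineage = []


def commit(rows: list, note: str) -> str:
    # Record a new version only when the content actually changed.
    version = dataset_version(rows)
    parent = lineage[-1]["version"] if lineage else None
    if version != parent:
        lineage.append({"version": version, "parent": parent, "note": note})
    return version


# Made-up rows: the second commit adds one labeled example.
rows_v1 = [{"text": "good movie", "label": 1}, {"text": "bad movie", "label": 0}]
rows_v2 = rows_v1 + [{"text": "great plot", "label": 1}]

v1 = commit(rows_v1, "initial import")
v2 = commit(rows_v2, "added one positive example")
v2_again = commit(rows_v2, "no-op re-upload")
```

Because the identifier depends only on content, re-uploading unchanged data does not create a new version, and a model can be linked to the exact dataset version it was trained on.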