Docker Stacks

This project is a collection of pre-configured Docker images that provide ready-to-run environments for interactive computing and data science. The images bundle interpreters and libraries for Python, R, and Julia behind a polyglot notebook server, so that research environments are containerized and reproducible.

The collection uses a layered image hierarchy in which each image builds on a more minimal one and adds versioned software dependencies. Images are available for multiple CPU architectures, with hardware-accelerated variants. Users can build custom images on top of this foundation of pre-configured tools, for both single-machine and distributed data processing.
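To make the layered hierarchy concrete, here is a minimal sketch that models each image as pointing at its parent and resolves the full build chain for a given image. The image names and parent relationships in `PARENT` are assumptions chosen for illustration, and `build_chain` is a hypothetical helper, not part of the project.

```python
# Illustrative parent map: each image names the image it builds FROM.
# Names and relationships are assumptions for this sketch.
PARENT = {
    "docker-stacks-foundation": None,
    "base-notebook": "docker-stacks-foundation",
    "minimal-notebook": "base-notebook",
    "scipy-notebook": "minimal-notebook",
    "r-notebook": "minimal-notebook",
    "julia-notebook": "minimal-notebook",
    "datascience-notebook": "scipy-notebook",
    "pyspark-notebook": "scipy-notebook",
    "all-spark-notebook": "pyspark-notebook",
}


def build_chain(image: str) -> list:
    """Return the images to build, ordered from the root base to `image`."""
    chain = []
    current = image
    while current is not None:
        if current not in PARENT:
            raise KeyError(f"unknown image: {current}")
        chain.append(current)
        current = PARENT[current]
    chain.reverse()  # root first, requested image last
    return chain


if __name__ == "__main__":
    print(" -> ".join(build_chain("all-spark-notebook")))
```

Building in this order means a change to a lower layer only needs to be made once, and every image above it picks up the change on its next rebuild.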

Capabilities include deploying interactive workspaces through centralized hubs, integrating deep learning frameworks and scientific computing libraries, and running distributed workloads on Spark clusters. The project also includes utilities for managing volume permissions, synchronizing user identities with the host, and converting notebooks to PDF.
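As a rough illustration of how volume mapping and user identity synchronization fit together, the sketch below assembles a `docker run` argument list that mounts a host directory and asks the container to adopt a given user and group ID. It assumes the images read `NB_UID` and `NB_GID` environment variables when started as root, that the default home directory is `/home/jovyan`, and that `quay.io/jupyter/base-notebook` is a suitable image; `run_command` itself is a hypothetical helper.

```python
import shlex


def run_command(host_dir, uid, gid,
                image="quay.io/jupyter/base-notebook", port=8888):
    """Build (but do not execute) a docker run command as an argv list."""
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8888",          # publish the notebook server port
        "--user", "root",              # assumed to be required for ID remapping
        "-e", f"NB_UID={uid}",         # assumed variable: target user ID
        "-e", f"NB_GID={gid}",         # assumed variable: target group ID
        "-v", f"{host_dir}:/home/jovyan/work",  # keep notebooks on the host
        image,
    ]


if __name__ == "__main__":
    # On a Unix host the IDs would typically come from os.getuid()/os.getgid().
    print(shlex.join(run_command("/srv/notebooks", 1000, 100)))
```

Returning a list rather than a shell string keeps quoting out of the picture; the list can be passed directly to `subprocess.run` once the values have been checked against the image documentation.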

Features

Interactive Environment Deployment - Deploys ready-to-run servers and hubs using pre-configured images containing data science tools.

Interactive Computing Environments - Deploys ready-to-use Docker images that launch interactive Jupyter servers for data analysis and coding.

Polyglot Data Science Environments - Bundles Python, R, and Julia interpreters into a single image for multi-language data science.

Custom Image Builds - Allows users to build project-specific images using a foundation of pre-configured scientific tools.

Big Data Processing - Provides Spark binaries and language support for large-scale distributed data processing.

Environment Bootstrapping - Provides base images with package managers and entry point scripts to bootstrap custom data science containers.

Interactive Notebooks - Launches interactive notebook servers immediately using pre-configured image stacks.

Containerized Notebook Servers - Ships a containerized polyglot notebook server supporting multiple languages and distributed processing.

Containerized Workspaces - Provides isolated containerized workspaces for team collaboration and interactive computing.

Host-Synced User Identities - Configures the container's user ID and group settings to match the host system, so that files written to mounted volumes keep the expected ownership.

Ready-to-Run Docker Images - Provides ready-to-run Docker images that launch immediate interactive computing workspaces.

Containerized Server Deployments - Starts containerized interactive servers with a choice of available frontends.

Portable Application Launches - Launches servers from pre-configured images to provide an environment without manual installation.

Development Environment Reproducibility - Creates reproducible project environments from version-controlled configurations, producing images that can be distributed across networks.

Research Environment Reproducibility - Ensures consistent software dependencies and hardware acceleration across platforms for reproducible research.

Runtime Environment Extensions - Supports the creation of inherited images to permanently install specific packages and dependencies.

Scientific Computing - Bundles a comprehensive suite of data science libraries for technical analysis and visualization.

Layered Image Composition - Implements a layered image hierarchy to build specialized data science environments from a minimal base.

Spark Cluster Connectivity - Configures network settings and environment variables to establish stable connections with standalone Spark clusters.

Hardware Acceleration Support - Provides images compatible with multiple CPU architectures and hardware-accelerated variants.

Deep Learning - Provides pre-configured environments integrated with deep learning frameworks and hardware acceleration for neural network workloads.

Image Dependency Hierarchies - Defines dependencies between images and maps tags to appropriate images within a build chain.

Distributed Data Processing - Integrates Spark binaries and cluster connectivity into containers for large-scale data processing.

Docker Volume Persistence - Uses Docker volume mounting to ensure user files and notebooks persist across container restarts.

Persistent Volume Mapping - Uses bind mounts and volume mapping to decouple user notebooks and data from the ephemeral container.

Interactive Frontend Configurations - Allows specification of which interactive interface or startup command to use upon container execution.

Language-Specific Package Managers - Enables the installation of libraries into the default environment using language-specific package managers.

Server Orchestrators - Orchestrates the launch of interactive servers with specific frontends via startup scripts.

Package Dependency Management - Provides mechanisms to declare external packages via configuration to add required libraries to a session.

Runtime Version Customization - Enables building environments with specific software versions to meet project compatibility requirements.

Config-Driven Image Building - Uses configuration files to standardize the generation of multiple container images from a single source.

Global Server Settings - Passes custom arguments to startup scripts to modify server behavior and authentication.

Multi-User Hub Configurations - Configures centralized systems like JupyterHub to manage and launch user-specific container environments.

Container Entrypoint Execution - Provides startup scripts that execute at container launch to configure user identities and server settings.

Multi-Architecture Images - Produces image manifests tailored for different CPU architectures and hardware-accelerated computing.

Custom Container and VM Image Creation - Standardizes the production of custom image stacks using templates and orchestration tools.

Volume Management - Updates ownership and group settings of mounted volumes at runtime to ensure correct file access.

User Permission Mapping - Aligns the container's user and group IDs with those of the host system so that file ownership is consistent on both sides of a mount.

Distributed Compute Environments - Supports executing applications in both single-machine and distributed modes using specialized image stacks.

Container Startup Hooks - Executes shell scripts or binaries during the container startup sequence to customize the environment.

Secure Connection Handlers - Supports mounting certificates and keys into containers to enable encrypted HTTPS traffic.

Multi-User Hosting Environments - Provides multi-user hosting environments for groups to share interactive computing tools.
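Several of the features above (frontend selection, custom server arguments, and mounted certificates) come together at container launch. The following is a minimal sketch under stated assumptions: that the frontend is chosen through a `DOCKER_STACKS_JUPYTER_CMD` environment variable, that the startup script is named `start-notebook.py`, and that it forwards `--ServerApp.certfile` and `--ServerApp.keyfile` to the server. The allowed frontend names, the certificate file names, and the `launch_command` helper are all illustrative.

```python
# Frontend names assumed to be accepted by the startup script.
ALLOWED_FRONTENDS = {"lab", "notebook", "nbclassic", "server"}


def launch_command(frontend="lab", cert_dir=None,
                   image="quay.io/jupyter/base-notebook", port=8888):
    """Build (but do not execute) a docker run argv for a chosen frontend."""
    if frontend not in ALLOWED_FRONTENDS:
        raise ValueError(f"unsupported frontend: {frontend}")

    cmd = [
        "docker", "run", "--rm",
        "-p", f"{port}:8888",
        "-e", f"DOCKER_STACKS_JUPYTER_CMD={frontend}",  # assumed variable
    ]
    server_args = []
    if cert_dir is not None:
        # Mount the certificate directory read-only and point the server at it.
        # The file names notebook.crt / notebook.key are hypothetical.
        cmd += ["-v", f"{cert_dir}:/etc/ssl/notebook:ro"]
        server_args = [
            "--ServerApp.certfile=/etc/ssl/notebook/notebook.crt",
            "--ServerApp.keyfile=/etc/ssl/notebook/notebook.key",
        ]
    cmd.append(image)
    if server_args:
        # Extra arguments after the image name go to the startup script.
        cmd += ["start-notebook.py", *server_args]
    return cmd


if __name__ == "__main__":
    print(launch_command("notebook", cert_dir="/srv/certs"))
```

Validating the frontend name before building the command surfaces a typo immediately instead of at container startup.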
