# Columnar In-Memory Data Formats

> Search results for `columnar in-memory format for sharing data between tools` on awesome-repositories.com. 115 total matches; showing the first 50.

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/columnar-in-memory-format-for-sharing-data-between-tools).**

## Results

- [dask/dask](https://awesome-repositories.com/repository/dask-dask.md) (13,746 ⭐) — Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements.

The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
- [clickhouse/clickhouse](https://awesome-repositories.com/repository/clickhouse-clickhouse.md) (48,229 ⭐) — ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring.

The platform distinguishes itself through ad
- [duckdb/duckdb](https://awesome-repositories.com/repository/duckdb-duckdb.md) (38,805 ⭐) — DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation.

The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti
- [citusdata/citus](https://awesome-repositories.com/repository/citusdata-citus.md) (12,562 ⭐) — Citus is a PostgreSQL extension that transforms a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, enabling horizontal scaling by partitioning tables across a cluster of nodes. By utilizing a coordinator-worker topology, the system manages metadata and routes queries to the appropriate nodes, allowing for parallel execution of complex operations across distributed data shards.

The platform distinguishes itself through its specialized support for multi-tenant architectures and real-time analytical processing. It enables tenant-based
- [lance-format/lance](https://awesome-repositories.com/repository/lance-format-lance.md) (6,699 ⭐) — Lance is a columnar data format and storage layer designed for high-performance random access and the persistence of multimodal data. It functions as a vector database storage system, a multimodal data store, and a versioned dataset manager.

The project distinguishes itself as a hybrid search engine that combines vector similarity search and full-text indexing on a single dataset. It provides unified storage for diverse data types including images, audio, and video, utilizing a system that lazy-loads large binary objects only when requested.

The system manages dataset evolution through schem
- [makflwana/iocs-in-csv-format](https://awesome-repositories.com/repository/makflwana-iocs-in-csv-format.md) (12 ⭐) — This repository contains IOCs in CSV format for APTs, cyber crime, malware, and trojans, along with anything else the author found during hunting and research.
- [deepfakes/faceswap](https://awesome-repositories.com/repository/deepfakes-faceswap.md) (55,289 ⭐) — Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames.

The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated process
- [anthropics/claude-code](https://awesome-repositories.com/repository/anthropics-claude-code.md) (132,728 ⭐) — Anthropic's terminal-native AI coding agent.
- [facebook/react](https://awesome-repositories.com/repository/facebook-react.md) (245,669 ⭐) — React is a JavaScript library for building user interfaces based on a component-driven architecture and unidirectional data flow.
- [perspective-dev/perspective](https://awesome-repositories.com/repository/perspective-dev-perspective.md) (10,981 ⭐) — Perspective is a columnar data analytics engine and high-performance visualization component powered by WebAssembly. It provides a system for analyzing and visualizing large or streaming datasets through interactive data grids and charts, utilizing a compiled binary to achieve near-native performance within the browser.

The project distinguishes itself through a WebSocket-based data streaming interface and deep Apache Arrow integration, which minimize memory overhead when synchronizing tables between servers and clients. It acts as a remote query proxy capable of translating visualization con
- [rahix/shared-bus](https://awesome-repositories.com/repository/rahix-shared-bus.md) (135 ⭐) — Crate for sharing buses between multiple devices
- [randomfractals/pro-data-tools](https://awesome-repositories.com/repository/randomfractals-pro-data-tools.md) (41 ⭐) — Random Fractals Inc. Data Tools 🛠️ is a collection of public data visualization extensions, data viewers, VS Code Notebook renderers, and code snippets for devs and data scientists using VS Code IDE, published under our Random Fractals Inc. ☂️ org.
- [dragonflydb/dragonfly](https://awesome-repositories.com/repository/dragonflydb-dragonfly.md) (30,688 ⭐) — Dragonfly is a high-performance, multi-model in-memory data store designed to serve as a drop-in replacement for existing database infrastructures. By utilizing a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining full compatibility with standard industry wire protocols and client libraries.

What distinguishes Dragonfly is its focus on effic
- [pola-rs/polars](https://awesome-repositories.com/repository/pola-rs-polars.md) (38,855 ⭐) — Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters.

The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
- [apache/fory](https://awesome-repositories.com/repository/apache-fory.md) (4,234 ⭐) — Fory is a cross-language serialization framework and binary data serializer designed to convert complex object graphs into a compact binary format for high-performance data exchange. It includes an IDL-based schema compiler to transform interface definition language files into type-safe native data models and a schema evolution manager to maintain forward and backward compatibility.

The project features a zero-copy data access layer that allows reading specific fields from binary rows without deserializing the entire object. It supports dual-mode serialization, enabling a toggle between a por
- [haxefoundation/format](https://awesome-repositories.com/repository/haxefoundation-format.md) (139 ⭐) — The format library contains support for different file-formats for the Haxe programming language.
- [ant-design/ant-design](https://awesome-repositories.com/repository/ant-design-ant-design.md) (98,362 ⭐) — Ant Design is an enterprise-grade component library and design system framework built for developing complex, data-heavy web applications. It provides a comprehensive collection of pre-built, state-driven interface elements that map data properties to rendered components, ensuring consistent interaction patterns and visual language across large-scale projects.

The library distinguishes itself through a robust styling architecture that utilizes design tokens and hierarchical configuration providers to propagate global settings like themes, locale, and layout direction. By employing component-l
- [apache/arrow](https://awesome-repositories.com/repository/apache-arrow.md) (16,529 ⭐) — Arrow is a cross-language development platform for in-memory data. It provides a standardized, language-independent columnar memory format designed to accelerate analytical operations and improve memory efficiency on modern computing hardware. By utilizing a schema-driven approach, the framework enables the efficient organization of both flat and nested data structures.

The project functions as an analytical data processing engine that facilitates high-performance computation directly on memory-resident datasets. It distinguishes itself through a zero-copy architecture, which allows multiple
- [parvardegr/sharing](https://awesome-repositories.com/repository/parvardegr-sharing.md) (1,834 ⭐) — Sharing is a command-line tool to share directories and files from the CLI to iOS and Android devices without the need of an extra client app
- [google-gemini/gemini-cli](https://awesome-repositories.com/repository/google-gemini-gemini-cli.md) (105,341 ⭐) — This project provides a command-line interface for managing autonomous agent workflows, task orchestration, and system-level automation. It includes a comprehensive framework for defining agent skills, managing persistent memory, and delegating tasks to specialized subagents. Users can configure complex planning modes, execute shell commands with safety constraints, and integrate external tools through standardized protocols.

The platform supports non-interactive execution via a headless mode and provides an event-driven hook framework for custom lifecycle automation. It features centralized
- [finos/perspective](https://awesome-repositories.com/repository/finos-perspective.md) (10,967 ⭐) — Perspective is a columnar data analytics library and streaming data visualization engine. It provides an interactive data grid component and notebook analytics widgets designed for processing high-volume data and rendering interactive charts and grids.

The system utilizes a high-performance query engine to enable real-time data analysis and streaming dataset visualization. It supports the creation of customizable dashboards and reports that update automatically as new data arrives without requiring full dataset reloads.

The project covers large-scale dataset analytics through a schema-driven
- [denoland/fresh](https://awesome-repositories.com/repository/denoland-fresh.md) (13,776 ⭐) — Fresh is a full-stack, type-safe web framework built for TypeScript that prioritizes server-side rendering and edge-ready deployment. It generates full HTML content on the server for every request, ensuring immediate page visibility and search engine accessibility while utilizing streaming response generation to reduce latency.

The framework distinguishes itself through an islands-based architecture that performs partial hydration, sending minimal JavaScript to the client by only activating interactive components. It manages state across these components using a reactive signals system, which
- [universaldatatool/universal-data-tool](https://awesome-repositories.com/repository/universaldatatool-universal-data-tool.md) (2,068 ⭐) — Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.
- [soybeanjs/soybean-admin](https://awesome-repositories.com/repository/soybeanjs-soybean-admin.md) (14,503 ⭐) — Soybean Admin is a type-safe frontend management boilerplate and dashboard template built with Vue 3, Vite, and TypeScript. It provides a pre-configured foundation for creating enterprise administrative interfaces, utilizing the NaiveUI component framework and UnoCSS for utility-first styling.

The project distinguishes itself through automated workflow tools, including file-system-based route generation and a command-line interface for automating git commits and project deployments. It implements a comprehensive security model featuring both static and dynamic role-based access control to res
- [dotnet/format](https://awesome-repositories.com/repository/dotnet-format.md) (1,947 ⭐) — Home for the dotnet-format command
- [casey/just](https://awesome-repositories.com/repository/casey-just.md) (34,302 ⭐) — This project is a command-line task runner designed to manage project-specific workflows through a centralized, configuration-driven interface. It functions as a declarative tool for organizing build logic, environment variables, and task dependencies into a structured format, enabling the automation of complex development pipelines.

The tool distinguishes itself by providing a shell-agnostic execution layer that ensures consistent behavior across Windows, macOS, and Linux. It supports advanced workflow orchestration by constructing directed acyclic graphs to manage task prerequisites, while
- [krayin/laravel-crm](https://awesome-repositories.com/repository/krayin-laravel-crm.md) (21,404 ⭐) — This project is a modular, open-source customer relationship management platform built on the Laravel framework. It serves as a comprehensive business application framework designed for tracking sales pipelines, managing business entities, and automating marketing workflows. By providing a self-hosted solution, it enables organizations to maintain full control over their contact data, sales leads, and communication history.

The platform distinguishes itself through a highly extensible architecture that allows developers to modify core behavior without altering the underlying source code. It u
- [benyamindsmith/ig.degree.betweenness](https://awesome-repositories.com/repository/benyamindsmith-ig-degree-betweenness.md) (40 ⭐) — Implementation of the "Node Degree+Edge" Betweenness Community Detection Algorithm for 'igraph' Objects with R
- [donnemartin/data-science-ipython-notebooks](https://awesome-repositories.com/repository/donnemartin-data-science-ipython-notebooks.md) (29,166 ⭐) — This project is a collection of interactive Python notebooks and educational resources designed for mastering data science, machine learning, and numerical computing. It provides a series of practical guides and tutorials covering deep learning, big data processing, and statistical analysis.

The repository features specialized instructional suites for implementing classical machine learning algorithms, building deep learning model architectures, and managing AWS cloud infrastructure. It includes dedicated notebooks for data visualization and numerical computing exercises.

The project covers
- [pulsejet/memories](https://awesome-repositories.com/repository/pulsejet-memories.md) (3,697 ⭐) — Memories is a self-hosted photo and video management system designed for organizing, indexing, and sharing media libraries from a private server. It functions as an AI-powered media organizer that uses artificial intelligence for face recognition and object tagging to automatically categorize large collections.

The system distinguishes itself through deep metadata integration and specialized processing, featuring a geographic photo viewer that plots media on a map using GPS data and reverse geocoding. It also includes a self-hosted video transcoder that converts files into adaptive HLS stream
- [flutter/flutter](https://awesome-repositories.com/repository/flutter-flutter.md) (177,056 ⭐) — This project is a multi-platform UI framework designed for building applications that target mobile, web, and desktop environments from a single codebase. It utilizes a declarative paradigm where the user interface is defined as a function of application state, supported by a layered architecture that includes a high-performance rendering engine and a multi-platform compilation model.

The framework provides a comprehensive suite of developer tools, including hot reloading for real-time code injection and diagnostic utilities for monitoring application state and performance. It features a modu
- [encoredev/encore](https://awesome-repositories.com/repository/encoredev-encore.md) (12,049 ⭐) — Encore is a distributed systems framework designed to unify backend development, infrastructure provisioning, and observability. It functions as an infrastructure-as-code platform that allows developers to define cloud resources, databases, and messaging topics directly within their application code. By analyzing these declarations at compile-time, the system automatically manages the deployment of cloud resources and security policies, ensuring parity between local development and production environments.

The platform distinguishes itself through its integrated development experience, which
- [magento/data-migration-tool](https://awesome-repositories.com/repository/magento-data-migration-tool.md) (339 ⭐) — We're pleased you're considering moving from the world's #1 eCommerce platform—Magento 1.x—to the eCommerce platform for the future, Magento 2. We're also excited to share the details about this process, which we refer to as migration.
- [apexcharts/apexcharts.js](https://awesome-repositories.com/repository/apexcharts-apexcharts-js.md) (15,096 ⭐) — ApexCharts is a comprehensive JavaScript charting library designed for building interactive, responsive, and data-driven visualizations within web applications. It functions as a versatile data visualization framework that supports a wide range of chart types, including categorical, statistical, and financial plots, enabling developers to construct complex dashboards and real-time monitoring interfaces.

The library distinguishes itself through a deep commitment to accessibility and high-performance interactivity. It provides built-in support for keyboard navigation, screen readers, and high-c
- [beavailable/share](https://awesome-repositories.com/repository/beavailable-share.md) (49 ⭐) — Share and receive files effortlessly over HTTP
- [pandas-dev/pandas](https://awesome-repositories.com/repository/pandas-dev-pandas.md) (49,039 ⭐) — Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations.

The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
- [bvaughn/react-virtualized](https://awesome-repositories.com/repository/bvaughn-react-virtualized.md) (27,072 ⭐) — react-virtualized is a library of components for rendering massive lists and tables by drawing only the elements visible in the viewport. It provides specialized layout managers including a windowed grid component and a dynamic height list manager.

The project includes a masonry layout engine for packing items of varying heights and widths, as well as an infinite scroll interface for incrementally fetching and appending data.

The library covers a broad range of virtualization capabilities, including frozen grid elements, reverse list rendering, and synchronized viewport scrolling. It also su
- [ammar64/sharing](https://awesome-repositories.com/repository/ammar64-sharing.md) (137 ⭐) — Share files and apps over HTTP. The other device must be connected to the same network. Just toggle on the server and scan the QR code on the other device, and you're good to go. Files sent from the browser to the app can be found in the Sharing/ folder in your internal storage. You can always disable…
- [appsmithorg/appsmith](https://awesome-repositories.com/repository/appsmithorg-appsmith.md) (40,051 ⭐) — Appsmith is a low-code platform designed for building internal business tools, such as operational dashboards and administrative panels. It enables developers to construct dynamic user interfaces by dragging and dropping modular widgets onto a canvas and binding them directly to backend data sources. The platform utilizes a reactive framework that automatically updates interface elements and triggers functions whenever underlying data or widget properties change, eliminating the need for manual event handling.

The platform distinguishes itself through a server-side proxy architecture that exe
- [mikefarah/yq](https://awesome-repositories.com/repository/mikefarah-yq.md) (14,913 ⭐) — This tool is a command-line processor designed for querying, updating, and transforming structured data files. It functions as a versatile engine for manipulating YAML, JSON, TOML, and XML documents, allowing users to perform complex operations directly from the terminal. By utilizing a path-based expression language, it enables precise navigation and modification of data structures within configuration files and infrastructure-as-code workflows.

What distinguishes this tool is its ability to perform in-place document mutations while preserving original formatting, comments, and metadata. It
- [mtrebi/memory-allocators](https://awesome-repositories.com/repository/mtrebi-memory-allocators.md) (1,977 ⭐) — Custom memory allocators in C++ to improve the performance of dynamic memory allocation
- [datahub-project/datahub](https://awesome-repositories.com/repository/datahub-project-datahub.md) (12,141 ⭐) — DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations.

The platform distinguishes itself through its focus on grounding artificial intelligence and autono
- [cinnamon/kotaemon](https://awesome-repositories.com/repository/cinnamon-kotaemon.md) (25,139 ⭐) — Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines.

The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex q
- [esri/gis-tools-for-hadoop](https://awesome-repositories.com/repository/esri-gis-tools-for-hadoop.md) (524 ⭐) — The GIS Tools for Hadoop are a collection of GIS tools that leverage the Spatial Framework for Hadoop for spatial analysis of big data. The tools make use of the Geoprocessing Tools for Hadoop toolbox to provide access to the Hadoop system from the ArcGIS Geoprocessing…
- [wesm/pydata-book](https://awesome-repositories.com/repository/wesm-pydata-book.md) (24,668 ⭐) — This project serves as a comprehensive textbook and educational resource for data analysis using the Python ecosystem. It provides a structured guide to manipulating, cleaning, and processing datasets, focusing on the core tools required for numerical computing and statistical analysis.

The repository distinguishes itself by offering a collection of practical code examples and workflows that demonstrate how to perform complex data tasks. It covers the application of vectorized numerical computations, the management of time-indexed data, and the creation of statistical visualizations to commun
- [camel-ai/camel](https://awesome-repositories.com/repository/camel-ai-camel.md) (17,253 ⭐) — This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer.

The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
- [rustcrypto/formats](https://awesome-repositories.com/repository/rustcrypto-formats.md) (322 ⭐) — Cryptography-related format encoders/decoders: DER, PEM, PKCS, PKIX
- [memorilabs/memori](https://awesome-repositories.com/repository/memorilabs-memori.md) (15,358 ⭐) — Memori is an AI agent memory middleware platform designed to provide persistent, context-aware recall for language models. It functions as a non-intrusive layer that intercepts outbound model requests to automatically capture interaction history and execution traces, ensuring that agents maintain continuity across sessions without requiring modifications to existing application logic.

The platform distinguishes itself through a dual-model storage architecture that maintains information as both structured relational primitives for precise fact retrieval and rolling narrative summaries for situ
- [cwida/duckdb](https://awesome-repositories.com/repository/cwida-duckdb.md) (38,822 ⭐) — DuckDB is an embedded, in-process analytical SQL database and OLAP database management system. It functions as a data engine for Parquet and CSV files, allowing users to execute complex SQL queries on large datasets without requiring a separate server process.

The system is designed for local analytical processing and embedded data science workflows. It enables the direct querying and analysis of Parquet and CSV files from disk, bypassing the need to load data into a permanent database.

The engine provides high-performance analytical SQL execution, including support for window functions and
- [crewaiinc/crewai](https://awesome-repositories.com/repository/crewaiinc-crewai.md) (53,687 ⭐) — CrewAI is a multi-agent orchestration framework designed for building autonomous systems that execute complex, multi-step workflows. It provides a development platform where specialized agents are defined with specific roles, goals, and tool sets to perform tasks collaboratively. By leveraging a declarative workflow engine, the system manages task dependencies, state transitions, and execution logic, allowing for the creation of structured, stateful sequences of operations.

The framework distinguishes itself through its hierarchical management capabilities, which utilize manager agents to coo
