15 repositories
Systems for managing relational data with indexed schemas.
Distinguishing note: Focuses on application-level data modeling, not general database management.
Explore 15 awesome GitHub repositories matching data & databases · Structured Data Management.
Claude-mem is an agentic memory persistence system designed to provide AI assistants with long-term context across multiple development sessions. It functions as a background orchestrator that captures, summarizes, and indexes interaction history, allowing models to maintain continuity and recall technical decisions from past tasks. By utilizing a vector-augmented context engine, the system injects relevant historical observations into active sessions, ensuring that AI agents remain informed without exceeding finite token budgets. The project distinguishes itself through an endless memory arc
Manages session data and observations using structured database tables with indexed columns for efficient retrieval.
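As a rough illustration of what "structured database tables with indexed columns" can mean in practice, the sketch below uses Python's built-in sqlite3 module. The table name `observations`, its columns, the index name, and the helper functions are all hypothetical; they are not taken from Claude-mem's actual schema or code.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical observations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations ("
        " id INTEGER PRIMARY KEY,"
        " session_id TEXT NOT NULL,"
        " created_at TEXT NOT NULL,"  # ISO 8601 text sorts chronologically
        " summary TEXT NOT NULL)"
    )
    # Composite index so per-session, time-ordered lookups avoid a full scan.
    conn.execute(
        "CREATE INDEX idx_observations_session"
        " ON observations (session_id, created_at)"
    )
    return conn


def add_observation(conn, session_id, created_at, summary):
    """Insert one observation row and commit it."""
    with conn:
        conn.execute(
            "INSERT INTO observations (session_id, created_at, summary)"
            " VALUES (?, ?, ?)",
            (session_id, created_at, summary),
        )


def recent_observations(conn, session_id, limit=5):
    """Return the newest summaries for one session, newest first."""
    rows = conn.execute(
        "SELECT summary FROM observations"
        " WHERE session_id = ? ORDER BY created_at DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Indexing on `(session_id, created_at)` is one plausible choice for this access pattern; a real system would pick indexes based on its own queries.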
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
Organizes composite data types that store multiple fields within a single column.
This project is a web-based platform designed for creating, managing, and sharing professional resumes. It functions as a structured document builder that integrates artificial intelligence to assist with content generation, editing, and analysis. Users can maintain a collection of resumes, customize their visual presentation through various templates, and export them into multiple formats for job applications. The platform distinguishes itself through its autonomous AI agent capabilities, which can perform research, suggest incremental edits, and apply data patches directly to documents. It
Organizes and imports resume data to maintain a structured collection of documents.
This project is a collection of interactive Python notebooks and educational resources designed for mastering data science, machine learning, and numerical computing. It provides a series of practical guides and tutorials covering deep learning, big data processing, and statistical analysis. The repository features specialized instructional suites for implementing classical machine learning algorithms, building deep learning model architectures, and managing AWS cloud infrastructure. It includes dedicated notebooks for data visualization and numerical computing exercises. The project covers
Explains how to organize heterogeneous data into structured arrays to manage complex records.
This project is a community-driven directory of developer portfolios designed to serve as a resource for professional identity development and design inspiration. It functions as a structured data repository that collects and organizes personal website metadata, enabling users to discover and share examples of professional online presence. The platform operates through a collaborative model where content is managed via version control workflows. By utilizing pull requests, the project facilitates community-driven growth, allowing contributors to submit and maintain portfolio entries within a
Organizes and maintains collections of developer profiles in structured, machine-readable formats.
PostgreSQL is an object-relational database management system designed for the persistent storage and retrieval of structured information. It functions as an ACID-compliant database server, utilizing standard query language protocols to maintain data consistency and reliability across large-scale application datasets. The system distinguishes itself through an extensible architecture that allows for the definition of custom data types, operators, and indexing methods. It employs multi-version concurrency control to enable simultaneous read and write operations without blocking, supported by a
Organizes complex information into interconnected sets for reliable, high-performance access.
Grav is a flat-file content management system that eliminates the need for a traditional database by storing site content and configuration in human-readable Markdown and YAML files. Built as a modular PHP web framework, it uses a hierarchical page routing system where the physical directory structure directly determines the site's URL paths. The platform is distinguished by its event-driven plugin architecture and a command-line interface that prioritizes system administration, deployment, and maintenance tasks. It utilizes a blueprint-driven system to generate administrative forms from stru
Integrates flexible data storage and management systems to handle complex content types.
The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane. The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
Organizes tabular data with automated maintenance for analytics integration.
Grist is a relational spreadsheet platform that combines the flexibility of a spreadsheet with the power of a relational database. At its core, it manages structured data across multiple linked tables, using a relational database engine to organize information while providing a familiar grid interface. The platform supports Python-based formulas for complex calculations and data transformations, with automatic recalculation when referenced cells change. The system is designed for self-hosted deployment, storing data in either portable SQLite files or enterprise-grade PostgreSQL databases. It
Combines a relational database with a spreadsheet interface to organize data and calculate values using Python formulas.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Provides systems for managing multimodal data with indexed schemas and structured layout.
This project is a comprehensive library of practical Python code examples and patterns. It provides a collection of scripts and snippets designed to demonstrate a wide range of programming tasks, from basic syntax to advanced implementation patterns. The repository focuses on several core domains, including the implementation of concurrency and multithreading examples, data analysis snippets for cleaning and manipulating tabular data, and various data visualization examples. It also covers automation scripts for file system management and a variety of general programming patterns. Additional
Implements JSON object serialization and database connection management for local storage.
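A minimal sketch of the two ideas named above, JSON object serialization and database connection management for local storage, using only the Python standard library. The `Note` class, the `notes` table, and both helper functions are hypothetical examples and do not come from the repository itself.

```python
import json
import sqlite3
from contextlib import closing
from dataclasses import asdict, dataclass


@dataclass
class Note:
    title: str
    tags: list


def save_note(db_path: str, note: Note) -> int:
    """Serialize a Note to JSON and store it; returns the new row id."""
    # closing() guarantees the connection is closed even if an error occurs.
    with closing(sqlite3.connect(db_path)) as conn:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS notes ("
                " id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
            )
            cur = conn.execute(
                "INSERT INTO notes (body) VALUES (?)",
                (json.dumps(asdict(note)),),
            )
            return cur.lastrowid


def load_note(db_path: str, note_id: int) -> Note:
    """Read a stored JSON document back into a Note object."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT body FROM notes WHERE id = ?", (note_id,)
        ).fetchone()
    if row is None:
        raise KeyError(note_id)
    return Note(**json.loads(row[0]))
```

Opening a short-lived connection per operation keeps the example simple; a longer-running program might reuse one connection instead.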
TaskWeaver is an LLM agent framework that interprets natural language requests and executes them as Python code, SQL queries, or shell commands. It functions as a conversational code interpreter that maintains stateful data structures across turns, generating executable code from user prompts within a session-based environment. The system is designed as a self-hosted AI agent platform that can be deployed in Docker, managing sessions and providing a web UI for data analytics and automation tasks. The framework distinguishes itself through a role-based multi-agent architecture that divides the
TaskWeaver maintains rich data structures like pandas DataFrames across conversation turns for iterative analysis.
This project is a collection of Python scripts and source code examples designed for learning programming fundamentals through practical application. It functions as a toolkit for web scraping and browser automation, together with a library of data processing utilities. The repository includes scripts that simulate human interactions to automate repetitive web tasks and online booking processes. It also provides a structured database of administrative divisions, including provinces, cities, and districts, for geographic data management and address validation. The collection covers extracting structured data and images from websites using both browser drivers and network requests. Other utilities handle spreadsheet file manipulation and compressed archive management.
Organizes geographic data for provinces, cities, and districts using structured data formats.
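One way to picture a province, city, and district hierarchy used for address validation is a nested mapping, sketched below in Python. The `Divisions` type, the placeholder names in `SAMPLE`, and both functions are assumptions made for illustration; the repository's real data format and contents are not shown in this listing.

```python
from typing import Dict, List

# province -> city -> list of districts (hypothetical layout)
Divisions = Dict[str, Dict[str, List[str]]]

# Placeholder names only; not real administrative data.
SAMPLE: Divisions = {
    "Province A": {
        "City A1": ["District A1-1", "District A1-2"],
        "City A2": ["District A2-1"],
    },
    "Province B": {
        "City B1": ["District B1-1"],
    },
}


def districts_of(divisions: Divisions, province: str, city: str) -> List[str]:
    """Return the districts of a city, or an empty list if it is unknown."""
    return list(divisions.get(province, {}).get(city, []))


def is_valid_address(
    divisions: Divisions, province: str, city: str, district: str
) -> bool:
    """Check that the district belongs to the city within the province."""
    return district in districts_of(divisions, province, city)
```

The nested mapping makes each validation a chain of dictionary lookups, which is usually enough for a static reference dataset of this kind.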
dtale is an interactive web-based grid and visualizer for pandas DataFrames, designed as an exploratory data analysis tool. It provides a browser-based interface for analyzing tabular data structures, allowing users to compute statistics, detect outliers, and calculate correlations without writing code by hand. The project functions as an embeddable data viewer that can be integrated into web applications through iframes or custom routes, with specific support for Django, Flask, and Streamlit. It supports dataset exploration through a combination of an interactive data grid and a data visualization library capable of generating histograms, box plots, and 3D scatter plots. The platform covers a wide range of data management and analysis capabilities, including tabular data cleaning, reshaping, and interactive filtering. It includes observability tools for missing-data analysis, correlation calculation, and predictive power scoring. For session management, it supports tracking multiple instances and persisting state across concurrent worker processes. The interface is protected by username and password authentication and supports data ingestion from delimited files, spreadsheets, and ArcticDB data stores.
Persists and shares the state of analyzed pandas DataFrames across multiple sessions or worker processes.
Colanode is a local-first collaboration platform designed for shared documents, chat, and databases. It provides a self-hosted suite for team collaboration and knowledge management, allowing users to keep full control over their data and privacy on their own infrastructure. The platform is distinguished by a synchronization engine that uses WebSockets for real-time data transmission and a local-first approach that lets work continue offline. It incorporates AI-driven retrieval through vector-based semantic search, allowing users to find information by meaning across documents and messages. The system covers a wide range of capabilities, including collaborative rich text editing, hierarchical content modeling, and structured database management with views such as kanban boards and calendars. It manages media through an S3-compatible document store and organizes entities in flexible parent-child relationships. The software can be deployed through Docker Compose or Kubernetes using Helm charts and supports integration with external reverse proxies for routing production traffic.
Organizes information using custom fields and dynamic views such as tables, kanban boards, and calendars.