29 repositories
Capabilities for querying and loading data directly from flat files like CSV and Parquet.
Distinct from File-to-Table Importers: Focuses on treating files as database tables via SQL, rather than on schema validation or generic file import.
Explore 29 awesome GitHub repositories matching data & databases · File-Based Data Import.
DuckDB is an in-process analytical SQL database management system and OLAP system. It works as a data engine for Parquet and CSV files, letting users run complex SQL queries on large datasets without a separate server process. The system is designed for local analytical processing and embedded data science workflows. It enables direct querying and analysis of Parquet and CSV files on disk, bypassing the need to load data into a persistent database. The engine provides high-performance analytical SQL execution, including support for window functions and nested subqueries. It combines a columnar storage layout with vectorized query execution to handle large-scale data processing and exploration. The database is accessible through a standalone command-line interface and bindings for Python, R, Java, and Wasm.
Enables direct loading and querying of CSV and Parquet files by referencing them within SQL queries.
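The idea of running SQL against a flat file can be sketched with Python's standard `csv` and `sqlite3` modules. This is only a stand-in for the concept, not DuckDB's API or implementation: the helper `query_csv`, the table name `data`, and the sample file `sales.csv` are all hypothetical, and the sketch copies the file into a throwaway in-memory table rather than scanning it in place.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def query_csv(path, sql, table="data"):
    """Expose one CSV file as a temporary SQL table and run a query on it.

    Hypothetical helper: the whole file is copied into an in-memory
    SQLite table, so nothing persists after the call returns.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)   # first row supplies the column names
        rows = list(reader)     # remaining rows are the data

    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def demo():
    # Hypothetical sample data written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sales.csv"
        path.write_text("region,amount\nnorth,10\nsouth,25\nnorth,5\n", encoding="utf-8")
        return query_csv(
            path,
            "SELECT region, SUM(CAST(amount AS INTEGER)) "
            "FROM data GROUP BY region ORDER BY region",
        )
```

CSV values arrive as text, so the query casts `amount` before summing. An engine built for this workload would typically infer column types and avoid the intermediate copy.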
This project is a collection of educational resources and reference implementations for the Apache Flink stream processing framework. It provides a learning resource focused on mastering distributed stream processing through implementation guides, performance tuning tutorials, and practical examples. The repository features detailed walkthroughs for building real-time data pipelines using the DataStream and Table APIs. It includes specific integration examples for connecting Apache Flink with Kafka brokers and Elasticsearch indices, as well as reference implementations for real-time deduplica
Reads text or formatted files from a path to process data once or continuously as a stream.
This project is an automated machine learning framework and toolkit designed for training and tuning custom models for classification, regression, and recommendations. It functions as a multimodal machine learning toolkit capable of processing and training models using a combination of text, image, audio, and sensor data. The framework distinguishes itself as a multimodal data processor that can handle and visualize large datasets on a single machine using column-oriented disk storage. It includes a core machine learning model generator that converts trained models into formats compatible wit
Loads information from local or remote files into a scalable tabular structure.
Akaunting is a modular business enterprise resource planning system and self-hosted accounting software. It provides a comprehensive platform for small business financial management, centering on a double-entry bookkeeping system with a general ledger and chart of accounts. The platform is designed for extensibility through a module-based architecture and a dedicated marketplace for procuring third-party applications. It supports multi-tenant data isolation and utilizes role-based access control to manage granular user permissions. Its capability surface covers a wide range of business opera
Provides a way to save custom field mappings for specific file structures to reuse for future data uploads.
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
Allows opening datasets from local CSV and Excel files or notebook variables for analysis.
AI Town is a TypeScript-based simulation engine used to create virtual environments where autonomous characters interact and socialize. It functions as a framework for orchestrating multiple AI agents within a persistent digital world, utilizing language models and a game engine to drive character behavior and social interactions. The project differentiates itself through a dedicated agent sandbox and a vector database agent store, which allow for the management of agent memories and world state. It integrates generative AI for background music and provides tools for simulation world design,
Provides capabilities to load data from files into specified database tables.
Apache JMeter is a Java-based performance testing tool and multi-protocol traffic simulator used to analyze the stability and scalability of servers and networks. It functions as a distributed load testing framework that coordinates remote worker nodes from a single controller to generate high volumes of concurrent traffic. The project is distinguished by its ability to simulate traffic across diverse backend systems, including HTTP, JDBC, LDAP, JMS, FTP, and TCP. It provides a headless command-line interface for automated execution and a reporting system that transforms raw sample logs into
Loads comma-separated values from external files to provide unique data for test iterations.
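As a rough illustration of feeding each test iteration its own row of CSV data, the following sketch uses only the Python standard library. It is not JMeter's implementation: the function `iteration_data`, the sample credentials, and the choice to restart from the first row once the file is exhausted are assumptions made for the example.

```python
import csv
import io
from itertools import cycle


def iteration_data(csv_text, iterations):
    """Yield one dict of CSV values per iteration, restarting at end of data."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV contains no data rows")
    source = cycle(rows)  # wrap around when the rows run out
    for _ in range(iterations):
        yield next(source)


# Hypothetical credentials data; a real test plan would read it from a file.
SAMPLE = "username,password\nalice,pw1\nbob,pw2\n"
users = [row["username"] for row in iteration_data(SAMPLE, 5)]
```

With two data rows and five iterations, the rows are reused in order, so every iteration still receives a complete set of values.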
RedisInsight is a graphical user interface and management tool for browsing, analyzing, and administering Redis databases. It provides a visual environment for exploring key-value data structures, managing database instances, and performing data analysis across different operating systems and deployments. The tool distinguishes itself by providing dedicated visual managers for complex operations, including a vector database manager for configuring embeddings and similarity searches, a query workbench for executing raw commands and Lua scripts, and a performance monitoring dashboard for tracki
Reads content from standard input or text files to set key values and execute commands.
Dejavu is a containerized administration panel and web interface for managing data within Elasticsearch and OpenSearch clusters. It serves as a search index management tool for browsing, editing, and deleting records through a visual explorer rather than raw API queries. The project distinguishes itself by providing a search interface prototyping tool. This allows users to visually design search screens to test result relevancy and export the final layout configuration as usable code. The tool covers broad data management capabilities, including structured data import from CSV or JSON files
Supports loading structured data from CSV or JSON files into search clusters via guided mapping.
JimuReport is an open-source reporting and dashboard engine designed to be embedded directly into Spring Boot applications. Its core identity centers on generating data reports and full-screen dashboards from natural language descriptions, eliminating the need for manual design. The platform also provides a conversational query interface that translates plain-language questions into database queries, returning results as tables and charts without requiring SQL knowledge. What distinguishes JimuReport is its integration of AI skills that can be installed with a single command, enabling report
Loads data from Excel, CSV, and JSON files as a data source for report generation.
Dawarich is a self-hosted location history manager and travel journaling platform. It functions as a personal travel archive that collects GPS coordinates and movement data, providing a private alternative to proprietary tracking services. The system utilizes a PostgreSQL geospatial database to store coordinates, visits, and custom geofence boundaries. The project distinguishes itself as a geospatial data converter and visualization tool, capable of transforming location history between formats such as GPX, KML, and GeoJSON. It allows users to organize GPS tracks and geotagged photos into nam
Combines multiple GPS exchange files into a single unified track within the browser.
Azure Data Studio is a cross-platform SQL database management IDE used for writing queries, managing schemas, and administering relational databases. It functions as a comprehensive environment for relational database management, providing a structured interface for executing SQL queries and browsing database objects. The platform is distinguished by its interactive data notebooks, which combine executable code cells, narrative text, and visualizations for data analysis. It also includes specialized tools for database migration, allowing users to assess and transfer schemas and data from on-p
Includes a guided wizard to load data from CSV, text, and JSON files into database tables.
AlaSQL is a JavaScript SQL database engine that allows for the filtering, grouping, and joining of in-memory object arrays and JSON data. It functions as an in-memory SQL database and client-side data processor, enabling the execution of SQL statements against JavaScript arrays and external data sources in both browser and server environments. The project serves as a universal data query tool capable of performing relational joins across diverse sources, such as merging Google Spreadsheets, SQLite files, and remote APIs into a single result set. It also acts as an IndexedDB SQL wrapper, allow
Imports data from legacy .xls files directly into the database for SQL querying.
IPFS Desktop is a graphical client for storing, retrieving, and managing content on a peer-to-peer network. It works as a distributed file system manager and node management tool, letting users run a local node and manage content-addressed data without using the command line. The application includes a network visualizer for monitoring connected peers and analyzing the global network topology. It also acts as a system protocol handler capable of intercepting and handling ipfs and ipns addresses. The software covers decentralized file storage and distributed content hosting, with tools for importing data via drag-and-drop or binary archives. It provides tools for organizing files, remote content pinning, and garbage collection to manage disk space. Network management capabilities include configuring persistent peering, PubSub, and IPNS settings.
Enables loading specific data sets into a node using Content Addressable Archive files.
Speedscope is a web-based performance profiler that visualizes profiling data through interactive flamegraphs and timeline views. It ingests performance profiles from a wide range of sources, including Chrome, Firefox, Safari, Node.js, .NET Core, Instruments, Hermes, GHC, and Ruby, normalizing them into a common schema for unified analysis. The tool distinguishes itself with a canvas-based rendering engine that draws flamegraphs without DOM nodes for each frame, and a WebAssembly-based rendering pipeline for high-performance drawing. It offers left-heavy stack sorting to surface the most time
Captures JavaScript performance data from Hermes engine in development mode via the React Native Dev Menu and saves it as a .cpuprofile file for analysis.
SQLiteStudio is an open-source graphical tool for browsing, editing, and managing SQLite database files. It combines a full-featured SQL editor with syntax highlighting, a visual database schema designer for creating entity-relationship diagrams, and a plugin-based extensibility platform that allows adding custom functionality through C/C++, JavaScript, Tcl, or Python. The application distinguishes itself through its multi-language scripting engine, which embeds JavaScript, Tcl, and Python interpreters to enable user-defined functions and scripts within SQL queries. It supports encrypted data
Imports data from files into database tables, creating the table if it does not exist.
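The "create the table if it does not exist" behavior can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not SQLiteStudio's importer: the function `import_csv`, the table name `parts`, and the sample rows are hypothetical, and every column is created without a declared type.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_text):
    """Append CSV rows to `table`, creating it from the header row if missing."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)

    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # IF NOT EXISTS makes the first import create the table and later
    # imports reuse it unchanged.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
first = import_csv(conn, "parts", "name,qty\nbolt,4\nnut,9\n")   # creates the table
second = import_csv(conn, "parts", "name,qty\nwasher,2\n")       # appends to it
total = conn.execute('SELECT COUNT(*) FROM "parts"').fetchone()[0]
```

The second call finds the table already present and only inserts rows. A production importer would also need to handle type inference and header mismatches, which this sketch leaves out.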
InvenTree is an open-source inventory management platform built on Django, designed for tracking parts, stock levels, and supply chain operations through a web interface and REST API. The system uses barcodes—including QR codes, 1D barcodes, and Data Matrix codes—as primary identifiers for scanning, linking, and triggering inventory actions, and extends core functionality through a Python plugin framework supporting custom actions, UI panels, barcode handlers, and scheduled tasks. The platform distinguishes itself through a comprehensive plugin-based extensibility system that allows custom in
InvenTree uses a data import wizard to step through selecting a file and mapping its data to create parts.
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interfac
Imports data from HDFS files into Hive tables using external table definitions or load commands.
This project is a machine learning curriculum and educational platform delivered through interactive Jupyter notebooks. It serves as a comprehensive guide to mastering the Python data science toolkit, providing structured lessons on numerical computing, tabular data manipulation, and statistical visualization. The curriculum includes dedicated implementation guides for Scikit-Learn and a hands-on course on TensorFlow for building, training, and deploying neural networks and computer vision models. It covers the end-to-end process of building predictive models, from initial problem framing and task classification to deploying models through interactive web interfaces. The project covers a broad capability surface, including numerical computing with multidimensional arrays, exploratory data analysis, and data preprocessing routines. It provides detailed workflows for supervised and unsupervised learning, automated machine learning pipelines, hyperparameter optimization, and model evaluation using classification metrics and cross-validation. The educational content is organized as a series of notebooks that interleave Python code with narrative explanations to document data science workflows.
Provides capabilities for querying and loading data directly from flat files like CSV and Excel.
Reth is a modular, high-performance Ethereum execution layer client written in Rust. It serves as a full Ethereum node that syncs, validates, and serves blockchain data, functioning as an archive node implementation, a high-throughput RPC node server, and a snapshot sync tool. The project is built around a modular component architecture that allows assembling custom node behavior by swapping independent Rust crates for consensus, execution, mempool, and networking. The client distinguishes itself through a staged sync pipeline that downloads headers and bodies online before processing the res
Reads previously exported chain data files and writes their blocks and state into the local database.