29 repositories

Awesome GitHub Repositories · File-Based Data Import

Capabilities for querying and loading data directly from flat files like CSV and Parquet.

Distinct from File-to-Table Importers: this category focuses on treating files as database tables via SQL, rather than on schema validation or generic file importing.

Explore 29 awesome GitHub repositories matching data & databases · File-Based Data Import. Refine with filters or upvote what's useful.


cwida/duckdb
38,822 · View on GitHub
DuckDB is an embedded, in-process analytical SQL database management system and OLAP system. It works as a data engine for Parquet and CSV files, allowing users to run complex SQL queries on large datasets without a separate server process. The system is designed for local analytical processing and embedded data science workflows. It enables direct querying and analysis of Parquet and CSV files from disk, bypassing the need to load data into a persistent database. The engine provides high-performance analytical SQL execution, including support for window functions and nested subqueries. It combines a columnar storage layout with vectorized query execution to handle large-scale data processing and exploration. The database is accessible through a standalone command-line interface and bindings for Python, R, Java, and Wasm.
Enables direct loading and querying of CSV and Parquet files by referencing them within SQL queries.
C++
zhisheng17/flink-learning
15,071 · View on GitHub
This project is a collection of educational resources and reference implementations for the Apache Flink stream processing framework. It provides a learning resource focused on mastering distributed stream processing through implementation guides, performance tuning tutorials, and practical examples. The repository features detailed walkthroughs for building real-time data pipelines using the DataStream and Table APIs. It includes specific integration examples for connecting Apache Flink with Kafka brokers and Elasticsearch indices, as well as reference implementations for real-time deduplica
Reads text or formatted files from a path to process data once or continuously as a stream.
Java · clickhouse, elasticsearch, flink
apple/turicreate
11,171 · View on GitHub
This project is an automated machine learning framework and toolkit designed for training and tuning custom models for classification, regression, and recommendations. It functions as a multimodal machine learning toolkit capable of processing and training models using a combination of text, image, audio, and sensor data. The framework distinguishes itself as a multimodal data processor that can handle and visualize large datasets on a single machine using column-oriented disk storage. It includes a core machine learning model generator that converts trained models into formats compatible wit
Loads information from local or remote files into a scalable tabular structure.
C++
akaunting/akaunting
9,604 · View on GitHub
Akaunting is a modular business enterprise resource planning system and self-hosted accounting software. It provides a comprehensive platform for small business financial management, centering on a double-entry bookkeeping system with a general ledger and chart of accounts. The platform is designed for extensibility through a module-based architecture and a dedicated marketplace for procuring third-party applications. It supports multi-tenant data isolation and utilizes role-based access control to manage granular user permissions. Its capability surface covers a wide range of business opera
Provides a way to save custom field mappings for specific file structures to reuse for future data uploads.
PHP · accounting, akaunting, balance
microsoft/vscode-copilot-chat
9,493 · View on GitHub
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employ
Allows opening datasets from local CSV and Excel files or notebook variables for analysis.
TypeScript
a16z-infra/ai-town
9,285 · View on GitHub
AI Town is a TypeScript-based simulation engine used to create virtual environments where autonomous characters interact and socialize. It functions as a framework for orchestrating multiple AI agents within a persistent digital world, utilizing language models and a game engine to drive character behavior and social interactions. The project differentiates itself through a dedicated agent sandbox and a vector database agent store, which allow for the management of agent memories and world state. It integrates generative AI for background music and provides tools for simulation world design,
Provides capabilities to load data from files into specified database tables.
TypeScript
apache/jmeter
9,233 · View on GitHub
Apache JMeter is a Java-based performance testing tool and multi-protocol traffic simulator used to analyze the stability and scalability of servers and networks. It functions as a distributed load testing framework that coordinates remote worker nodes from a single controller to generate high volumes of concurrent traffic. The project is distinguished by its ability to simulate traffic across diverse backend systems, including HTTP, JDBC, LDAP, JMS, FTP, and TCP. It provides a headless command-line interface for automated execution and a reporting system that transforms raw sample logs into
Loads comma-separated values from external files to provide unique data for test iterations.
Java · java, performance, test
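The JMeter entry above describes feeding each test iteration its own row from a CSV file. A minimal Python sketch of that idea follows. It is an illustration only, not JMeter's implementation: the helper name `iteration_data`, the sample columns, and the choice to wrap around to the first row when the data runs out are all assumptions.

```python
import csv
import io
from itertools import cycle


def iteration_data(csv_text):
    """Return an endless iterator that hands out one CSV row (as a dict) per call.

    Hypothetical helper: wrapping around at the end of the data is an
    assumption of this sketch.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV source has no data rows")
    return cycle(rows)


# Hypothetical sample data standing in for an external CSV file.
users = iteration_data("username,password\nalice,pw1\nbob,pw2\n")

# Each simulated iteration pulls the next row; the third wraps to the first.
first_three = [next(users)["username"] for _ in range(3)]
```

Reading the whole file up front keeps the sketch short; a very large data file would call for streaming rows instead.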
redis/RedisInsight
8,556 · View on GitHub
RedisInsight is a graphical user interface and management tool for browsing, analyzing, and administering Redis databases. It provides a visual environment for exploring key-value data structures, managing database instances, and performing data analysis across different operating systems and deployments. The tool distinguishes itself by providing dedicated visual managers for complex operations, including a vector database manager for configuring embeddings and similarity searches, a query workbench for executing raw commands and Lua scripts, and a performance monitoring dashboard for tracki
Reads content from standard input or text files to set key values and execute commands.
TypeScript · database-gui, redis, redis-gui
appbaseio/dejavu
8,465 · View on GitHub
Dejavu is a containerized administration panel and web interface for managing data within Elasticsearch and OpenSearch clusters. It serves as a search index management tool for browsing, editing, and deleting records through a visual explorer rather than raw API queries. The project distinguishes itself by providing a search interface prototyping tool. This allows users to visually design search screens to test result relevancy and export the final layout configuration as usable code. The tool covers broad data management capabilities, including structured data import from CSV or JSON files
Supports loading structured data from CSV or JSON files into search clusters via guided mapping.
JavaScript
jeecgboot/jimureport
8,059 · View on GitHub
JimuReport is an open-source reporting and dashboard engine designed to be embedded directly into Spring Boot applications. Its core identity centers on generating data reports and full-screen dashboards from natural language descriptions, eliminating the need for manual design. The platform also provides a conversational query interface that translates plain-language questions into database queries, returning results as tables and charts without requiring SQL knowledge. What distinguishes JimuReport is its integration of AI skills that can be installed with a single command, enabling report
Loads data from Excel, CSV, and JSON files as a data source for report generation.
Java · ai, bi, bigscreen
Freika/dawarich
8,030 · View on GitHub
Dawarich is a self-hosted location history manager and travel journaling platform. It functions as a personal travel archive that collects GPS coordinates and movement data, providing a private alternative to proprietary tracking services. The system utilizes a PostgreSQL geospatial database to store coordinates, visits, and custom geofence boundaries. The project distinguishes itself as a geospatial data converter and visualization tool, capable of transforming location history between formats such as GPX, KML, and GeoJSON. It allows users to organize GPS tracks and geotagged photos into nam
Combines multiple GPS exchange files into a single unified track within the browser.
Ruby · google-maps, gpslogger, hacktoberfest
microsoft/azuredatastudio
7,694 · View on GitHub
Azure Data Studio is a cross-platform SQL database management IDE used for writing queries, managing schemas, and administering relational databases. It functions as a comprehensive environment for relational database management, providing a structured interface for executing SQL queries and browsing database objects. The platform is distinguished by its interactive data notebooks, which combine executable code cells, narrative text, and visualizations for data analysis. It also includes specialized tools for database migration, allowing users to assess and transfer schemas and data from on-p
Includes a guided wizard to load data from CSV, text, and JSON files into database tables.
TypeScript · azure, azure-data-studio, electron
AlaSQL/alasql
7,278 · View on GitHub
AlaSQL is a JavaScript SQL database engine that allows for the filtering, grouping, and joining of in-memory object arrays and JSON data. It functions as an in-memory SQL database and client-side data processor, enabling the execution of SQL statements against JavaScript arrays and external data sources in both browser and server environments. The project serves as a universal data query tool capable of performing relational joins across diverse sources, such as merging Google Spreadsheets, SQLite files, and remote APIs into a single result set. It also acts as an IndexedDB SQL wrapper, allow
Imports data from legacy .xls files directly into the database for SQL querying.
JavaScript
ipfs/ipfs-desktop
6,539 · View on GitHub
IPFS Desktop is a graphical client for storing, retrieving, and managing content on a peer-to-peer network. It works as a distributed file system manager and node management tool, allowing users to run a local node and manage content-addressed data without using the command line. The application includes a network visualizer for monitoring connected peers and analyzing the global network topology. It also acts as a system protocol handler capable of intercepting and processing ipfs and ipns addresses. The software covers decentralized file storage and distributed content hosting, with tools for importing data via drag and drop or binary archives. It provides tools for organizing files, remote content pinning, and garbage collection to manage disk space. Network management capabilities include configuring persistent peering, PubSub, and IPNS settings.
Enables loading specific data sets into a node using Content Addressable Archive files.
JavaScript
jlfwong/speedscope
6,501 · View on GitHub
Speedscope is a web-based performance profiler that visualizes profiling data through interactive flamegraphs and timeline views. It ingests performance profiles from a wide range of sources, including Chrome, Firefox, Safari, Node.js, .NET Core, Instruments, Hermes, GHC, and Ruby, normalizing them into a common schema for unified analysis. The tool distinguishes itself with a canvas-based rendering engine that draws flamegraphs without DOM nodes for each frame, and a WebAssembly-based rendering pipeline for high-performance drawing. It offers left-heavy stack sorting to surface the most time
Captures JavaScript performance data from Hermes engine in development mode via the React Native Dev Menu and saves it as a .cpuprofile file for analysis.
TypeScript · flamegraph, flamegraphs, performance-profiling
pawelsalawa/sqlitestudio
6,428 · View on GitHub
SQLiteStudio is an open-source graphical tool for browsing, editing, and managing SQLite database files. It combines a full-featured SQL editor with syntax highlighting, a visual database schema designer for creating entity-relationship diagrams, and a plugin-based extensibility platform that allows adding custom functionality through C/C++, JavaScript, Tcl, or Python. The application distinguishes itself through its multi-language scripting engine, which embeds JavaScript, Tcl, and Python interpreters to enable user-defined functions and scripts within SQL queries. It supports encrypted data
Imports data from files into database tables, creating the table if it does not exist.
C · cpp, database, database-management
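The import behavior described for SQLiteStudio above (load a file into a table, creating the table when it is missing) can be sketched with Python's standard library. This is an illustrative sketch under stated assumptions, not SQLiteStudio's implementation: the helper name `import_csv`, the `people` table, the sample data, and the decision to store every column as TEXT are all hypothetical.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so headers with spaces or keywords stay valid."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn, table, csv_text):
    """Load CSV text into `table`, creating the table if it does not exist.

    Hypothetical helper for illustration; every column is stored as TEXT.
    Returns the number of rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f"{quote_ident(h)} TEXT" for h in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


# Hypothetical sample data standing in for a file on disk.
conn = sqlite3.connect(":memory:")
inserted = import_csv(conn, "people", "id,name\n1,alice\n2,bob\n")
names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY id")]
```

Calling `import_csv` a second time with the same table name appends rows to the existing table instead of failing, which is the point of `CREATE TABLE IF NOT EXISTS`.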
inventree/InvenTree
6,350 · View on GitHub
InvenTree is an open-source inventory management platform built on Django, designed for tracking parts, stock levels, and supply chain operations through a web interface and REST API. The system uses barcodes—including QR codes, 1D barcodes, and Data Matrix codes—as primary identifiers for scanning, linking, and triggering inventory actions, and extends core functionality through a Python plugin framework supporting custom actions, UI panels, barcode handlers, and scheduled tasks. The platform distinguishes itself through a comprehensive plugin-based extensibility system that allows custom in
InvenTree uses a data import wizard to step through selecting a file and mapping its data to create parts.
Python · django, hacktoberfest, python
apache/hive
6,012 · View on GitHub
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interfac
Imports data from HDFS files into Hive tables using external table definitions or load commands.
Java · apache, big-data, database
mrdbourke/zero-to-mastery-ml
5,839 · View on GitHub
This project is a machine learning curriculum and educational platform delivered through interactive Jupyter notebooks. It serves as a comprehensive guide to mastering the Python data science toolkit, providing structured lessons on numerical computing, tabular data processing, and statistical visualization. The curriculum includes specific implementation guides for Scikit-Learn and a hands-on course on TensorFlow for building, training, and deploying neural networks and computer vision models. It covers the end-to-end process of building predictive models, from initial problem framing and task classification to deploying models through interactive web interfaces. The project covers a broad capability surface, including numerical computing with multidimensional arrays, exploratory data analysis, and data preprocessing routines. It provides detailed workflows for supervised and unsupervised learning, automated machine learning pipelines, hyperparameter tuning, and model evaluation using classification metrics and cross-validation. The educational content is organized as a series of notebooks that interleave Python code with narrative explanations to document data science workflows.
Provides capabilities for querying and loading data directly from flat files like CSV and Excel.
Jupyter Notebook · data-science, deep-learning, machine-learning
paradigmxyz/reth
5,652 · View on GitHub
Reth is a modular, high-performance Ethereum execution layer client written in Rust. It serves as a full Ethereum node that syncs, validates, and serves blockchain data, functioning as an archive node implementation, a high-throughput RPC node server, and a snapshot sync tool. The project is built around a modular component architecture that allows assembling custom node behavior by swapping independent Rust crates for consensus, execution, mempool, and networking. The client distinguishes itself through a staged sync pipeline that downloads headers and bodies online before processing the res
Reads previously exported chain data files and writes their blocks and state into the local database.
Rust
