Why is ray-project/ray a recommended GitHub repository for In-Memory Data Loading?

Creates datasets from local Python objects or arrays to integrate existing workflows with distributed computing tasks.

Why is dask/dask a recommended GitHub repository for In-Memory Data Loading?

Reads datasets directly into the cluster to avoid network overhead and memory issues caused by embedding large local objects.

Why is apache/datafusion a recommended GitHub repository for In-Memory Data Loading?

Creates a DataFrame from programmatically defined rows or Arrow record batches without external storage.

Why is deepchem/deepchem a recommended GitHub repository for In-Memory Data Loading?

Featurizes data already held in memory, such as lists or pandas DataFrames, and checkpoints results to disk.

Why is petyosi/react-virtuoso a recommended GitHub repository for In-Memory Data Loading?

Provides scroll-triggered data loading for endless scrolling and bidirectional fetching in virtualized lists.

5 repositories

Awesome GitHub Repositories: In-Memory Data Loading

Methods for creating datasets from local objects or arrays.

Distinguishing note: Focuses on integrating local Python objects into distributed workflows.

Explore 5 awesome GitHub repositories matching data & databases · In-Memory Data Loading. Refine with filters or upvote what's useful.


ray-project/ray
42,895 · View on GitHub
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f
Creates datasets from local Python objects or arrays to integrate existing workflows with distributed computing tasks.
Python · data-science · deep-learning · deployment
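To make the idea of turning local Python objects into a dataset for parallel work more concrete, here is a minimal standard-library sketch. It is an analogue of the concept only and does not use Ray's API; `split_into_blocks` and `map_blocks` are hypothetical helper names introduced for illustration, and the thread pool merely stands in for a distributed executor.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(items, num_blocks):
    """Split a local list into at most `num_blocks` contiguous blocks."""
    size, extra = divmod(len(items), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            blocks.append(items[start:end])
        start = end
    return blocks


def map_blocks(blocks, func):
    """Apply `func` to every item, processing one block per worker task."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda block: [func(x) for x in block], blocks))


# Local in-memory rows (illustrative data, not taken from any real dataset).
rows = [{"id": i, "value": i * i} for i in range(10)]
blocks = split_into_blocks(rows, 3)
doubled = map_blocks(blocks, lambda row: row["value"] * 2)
```

Keeping the data in contiguous blocks means each worker receives one slice of the local list, so block order and item order are both preserved in the result.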
dask/dask
13,746 · View on GitHub
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It acts as a cluster resource manager that coordinates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture lets the system automate the distribution of workloads across the available machines while managing complex execution requirements. The project features a lazy evaluation engine that defers data operations until they are explicitly requested, which enables global graph optimization and efficient resource allocation. It includes memory-aware data spilling to prevent crashes when processing datasets that exceed available memory, and it uses task graph fusion to merge sequences of operations into single execution steps, reducing scheduling overhead and inter-node communication. The platform offers a comprehensive capability surface for large-scale data analytics, including support for distributed machine learning, high-performance computing integration, and parallel data processing. It provides extensive tooling for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments across diverse infrastructure, including local machines, cloud providers, containerized systems, and high-performance computing clusters.
Reads datasets directly into the cluster to avoid network overhead and memory issues caused by embedding large local objects.
Python · dask · numpy · pandas
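The Dask description mentions tasks represented as a directed acyclic graph and evaluated lazily. The following is a minimal standard-library sketch of that general idea, assuming a toy graph format; it is not Dask's scheduler, and `graph` and `evaluate` are hypothetical names introduced for illustration.

```python
from operator import add, mul

# Hypothetical toy graph: each key maps either to a literal value or to a
# (function, *arguments) tuple. Nothing is computed when the graph is built.
graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),
    "result": (mul, "sum", 10),
}


def evaluate(graph, key, cache=None):
    """Compute `key` on demand, resolving its dependencies recursively."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        # String arguments that name graph keys are dependencies;
        # everything else is passed through as a literal.
        resolved = [
            evaluate(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        value = func(*resolved)
    else:
        value = task
    cache[key] = value
    return value


# Work happens only here, when a result is explicitly requested.
result = evaluate(graph, "result")
```

The cache ensures a shared dependency is computed once even if several tasks depend on it, which is one reason a graph representation can help reduce redundant work.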
apache/datafusion
8,908 · View on GitHub
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Creates a DataFrame from programmatically defined rows or Arrow record batches without external storage.
Rust · arrow · big-data · dataframe
deepchem/deepchem
6,545 · View on GitHub
DeepChem is an open-source Python framework for applying deep learning to molecular, chemical, and biological data, serving as a comprehensive toolkit for drug discovery and materials science. At its core, it provides a featurizer-pipeline abstraction that converts raw molecular data into numerical representations, including graph-based molecular structures, SMILES tokenization vocabularies, and disk-sharded dataset persistence for handling large-scale data that exceeds RAM capacity. The framework distinguishes itself through integrated molecular docking workflows that automate pocket detecti
Featurizes data already held in memory, such as lists or pandas DataFrames, and checkpoints results to disk.
Python · biology · deep-learning · drug-discovery
petyosi/react-virtuoso
6,348 · View on GitHub
React Virtuoso is a React component library for rendering large datasets efficiently through virtualized lists, grids, tables, and chat interfaces. It automatically measures variable-height items at runtime, computes accurate scroll offsets without requiring fixed sizes, and renders only the items within the visible viewport plus a configurable buffer zone. The library manages scroll position through a state machine that tracks direction, position, and anchor items to handle auto-scroll, sticky headers, and bidirectional loading. The library distinguishes itself with specialized components fo
Provides scroll-triggered data loading for endless scrolling and bidirectional fetching in virtualized lists.
TypeScript · chat · component-library · feed
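The React Virtuoso description refers to rendering only the items inside the visible viewport plus a buffer zone when item heights vary. A minimal sketch of that windowing calculation is given below in Python for consistency with the other examples; it is not the library's code, and `visible_range` and its `overscan` parameter are hypothetical names chosen for illustration.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def visible_range(heights, scroll_top, viewport_height, overscan=0):
    """Return (start, end) item indices to render, with `end` exclusive."""
    n = len(heights)
    # offsets[i] is the top edge of item i; offsets[n] is the total height.
    offsets = [0, *accumulate(heights)]
    top = max(0, scroll_top - overscan)
    bottom = scroll_top + viewport_height + overscan
    # First item whose bottom edge lies strictly below the window top.
    start = min(bisect_right(offsets, top) - 1, n)
    # Count of items whose top edge lies strictly above the window bottom.
    end = max(start, bisect_left(offsets, bottom, 0, n))
    return start, end


heights = [50, 30, 80, 40, 60]  # illustrative measured item heights in pixels
window = visible_range(heights, scroll_top=60, viewport_height=100)
buffered = visible_range(heights, scroll_top=60, viewport_height=100, overscan=20)
```

Using prefix sums with binary search keeps each lookup logarithmic in the number of items, which matters when the list is long and the scroll position changes frequently.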
