5 repositories
Collections of software frameworks for data science and analytics.
Distinguishing note: Focuses on general data science tooling.
Explore 5 GitHub repositories matching Artificial Intelligence & ML · Data Science Frameworks.
This project is a comprehensive, community-driven knowledge repository that serves as a centralized hub for data science resources. It provides a structured index of educational materials, software packages, and professional development tools designed to support both students and practitioners in navigating the data science landscape. The repository distinguishes itself through a hierarchical taxonomy that organizes a vast collection of external links into a human-readable, markdown-based document. By relying on distributed contributions, the project maintains an up-to-date snapshot of the fi
Provides a curated list of frameworks for data science and analytics.
This project is a Python-based framework that functions as a generative AI agent for programmatic data analysis. It enables users to interact with structured data sources through natural language prompts, translating these requests into executable code to perform analysis, data cleaning, and visualization. By maintaining conversational context across multi-turn interactions, the system allows for iterative exploration and the building of complex data narratives. The framework distinguishes itself through a robust semantic layer and secure execution model. It maps raw datasets to descriptive m
Provides a programmatic interface for integrating artificial intelligence into data workflows to automate reporting and analysis.
Kedro is a data science pipeline framework and production toolbox designed to build reproducible, modular workflows using software engineering best practices. It functions as a data engineering orchestrator and catalog manager, bridging the gap between interactive analysis and maintainable production pipelines. The framework distinguishes itself by using a data catalog to decouple data access from processing logic and providing tools to transition analysis from interactive notebooks into structured workflows. It includes a workflow visualization tool that generates visual maps of data pipelin
Builds reproducible and maintainable data science workflows using software engineering patterns for environmental consistency.
Kedro is a data science pipeline framework and orchestration tool designed to build reproducible and modular data engineering workflows. It functions as an MLOps project template and Python data workflow tool that enforces software engineering best practices to move projects from prototype to production. The system distinguishes itself through a centralized data catalog manager that abstracts data access and versioning across various file formats and cloud storage systems. It further separates processing logic from data access via a lazy-loading data registry and provides a standardized proje
Provides a modular framework for building reproducible data engineering and data science workflows using software engineering best practices.
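The catalog idea described above, where processing logic refers to datasets by name and the catalog decides how and when they are loaded, can be sketched in plain Python. This is a minimal illustration under stated assumptions and does not use Kedro's API: `TinyCatalog`, `run_node`, and the dataset names `sales` and `avg_price` are hypothetical names invented for this example.

```python
from typing import Any, Callable, Dict, List


class TinyCatalog:
    """Hypothetical stand-in for a data catalog (not Kedro's API).

    Dataset names map to loader callables, so data is only read
    when a processing step actually asks for it (lazy loading).
    """

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._cache: Dict[str, Any] = {}
        self.load_count = 0  # how many times a loader was really invoked

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        self._loaders[name] = loader

    def load(self, name: str) -> Any:
        if name not in self._cache:
            self.load_count += 1
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def save(self, name: str, data: Any) -> None:
        self._cache[name] = data


def mean_price(rows: List[Dict[str, float]]) -> float:
    """Pure processing logic: it knows nothing about files or storage."""
    return sum(r["price"] for r in rows) / len(rows)


def run_node(catalog: TinyCatalog, func: Callable[..., Any],
             inputs: List[str], output: str) -> None:
    """Resolve inputs by name, call the function, store the result by name."""
    args = [catalog.load(name) for name in inputs]
    catalog.save(output, func(*args))


catalog = TinyCatalog()
# An in-memory loader stands in for a CSV file, a database table, or cloud storage.
catalog.register("sales", lambda: [{"price": 10.0}, {"price": 20.0}, {"price": 30.0}])

run_node(catalog, mean_price, inputs=["sales"], output="avg_price")
result = catalog.load("avg_price")
```

Swapping the `sales` loader for one that reads from a different source would leave `mean_price` untouched, which is the decoupling the descriptions refer to.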
This repository provides a collection of Python implementations for causal inference, designed to estimate the impact of specific interventions from observational data. It serves as a statistical toolkit for researchers, allowing them to isolate causal signals from complex confounders in datasets that lack experimental control. The framework supports rigorous methodologies for studying health determinants and evaluating public policy interventions. Using structural causal modeling and directed acyclic graphs, the library lets users map causal dependencies and identify the variables needed for unbiased estimation. It supports simulating counterfactual outcomes to compare potential results under different treatment scenarios, offering a structured approach to understanding cause-and-effect relationships. The toolkit covers a broad range of statistical estimation techniques, including inverse probability weighting, the g-formula, and parametric regression analysis. These computational tools are organized to facilitate the analysis of observational data in epidemiological and social research. The project is distributed as a collection of Jupyter Notebooks containing these frameworks and statistical implementations.
Provides a framework for applying rigorous statistical techniques to determine cause-and-effect relationships in research data.
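Inverse probability weighting, one of the estimation techniques named above, can be illustrated with a small standard-library sketch. This is not code from the repository: the toy records, the single binary confounder `z`, and the function names are all assumptions made for illustration. The outcome is constructed as `y = 2*a + 3*z`, so the true treatment effect is 2, and the propensity score is estimated by simple frequencies within each confounder stratum.

```python
from collections import defaultdict

# Toy observational records (z, a, y): confounder, treatment, outcome.
# Assumed data: units with z=1 are treated more often AND have higher outcomes.
data = (
    [(0, 1, 2.0)] * 2 + [(0, 0, 0.0)] * 6 +
    [(1, 1, 5.0)] * 6 + [(1, 0, 3.0)] * 2
)


def propensity_by_stratum(rows):
    """Estimate P(A=1 | Z=z) as the treated fraction within each stratum."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for z, a, _ in rows:
        total[z] += 1
        treated[z] += a
    return {z: treated[z] / total[z] for z in total}


def naive_effect(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    y1 = [y for _, a, y in rows if a == 1]
    y0 = [y for _, a, y in rows if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def ipw_effect(rows):
    """Weighted difference in means, with weights 1 / P(received own treatment | z).

    Assumes every stratum contains both treated and untreated units
    (positivity); otherwise a weight would divide by zero.
    """
    p = propensity_by_stratum(rows)
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for z, a, y in rows:
        w = 1.0 / (p[z] if a == 1 else 1.0 - p[z])
        num[a] += w * y
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]


naive = naive_effect(data)
adjusted = ipw_effect(data)
```

On this toy data the unadjusted difference is 3.5, inflated by the confounder, while the weighted estimate recovers an effect of about 2.0.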