74 repositories
Hardware-level and algorithmic strategies to maximize processor throughput and minimize execution time.
Explore 74 awesome GitHub repositories matching Software Engineering & Architecture · Computational Efficiency.
Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a high-performance processing pipeline, the application enables live face swapping and interactive video modifications during active streaming sessions or on pre-recorded media. The system distinguishes itself through a hardware-abstraction execution layer that dynamically routes co
Leverages GPU acceleration to power compute-intensive real-time media rendering.
This project is a general-purpose command-line filter that provides an interactive interface for processing standard input streams. It enables real-time fuzzy searching, data selection, and transformation, allowing users to navigate complex information or file systems directly within their terminal. By utilizing a pipe-oriented architecture, it integrates into existing shell pipelines and workflows to facilitate efficient data exploration. What distinguishes this tool is its highly extensible, event-driven design that allows for deep integration with external processes. It supports asynchrono
Work queues distribute search tasks across multiple CPU cores to maximize computational throughput.
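As a rough illustration of the work-queue pattern named above, the sketch below feeds lines into a shared queue that a fixed pool of workers drains. It is not the project's code: `search_worker`, `parallel_search`, and the plain substring match are hypothetical stand-ins, and CPython threads are used only to show the queue mechanics (in a default CPython build they do not run Python bytecode on several cores at once).

```python
import queue
import threading


def search_worker(tasks, results, needle):
    """Pull (index, line) items off the shared queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            break
        index, line = item
        if needle in line:  # assumption: substring match stands in for fuzzy matching
            results.put((index, line))


def parallel_search(lines, needle, workers=4):
    """Distribute lines across workers through a queue and collect the matches."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=search_worker, args=(tasks, results, needle))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for item in enumerate(lines):
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    matches = []
    while not results.empty():
        matches.append(results.get())
    return sorted(matches)  # restore input order, since workers finish in any order
```

Calling `parallel_search(["alpha", "beta", "alphabet"], "alpha")` returns the matching lines with their original positions. A shared queue lets idle workers pick up the next item on their own, which balances load better than splitting the input into fixed chunks up front.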
Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into desktop, mobile, or server-side applications. By utilizing long short-term memory networks, the engine provides robust text extraction across more than one hundred languages and dozens of scripts. The project distinguishes itself through a sophisticated document layout analysis f
Distributes recognition workloads across multiple CPU cores using multi-threading to accelerate large-scale document processing.
This project is a comprehensive, curated directory of high-quality libraries, tools, and educational resources for C and C++ development. It serves as an ecosystem discovery index, helping developers navigate the vast landscape of third-party components, frameworks, and technical documentation available for the language. The collection is distinguished by its focus on high-performance systems programming and technical mastery. It provides deep coverage of specialized domains including SIMD-accelerated data processing, compile-time template metaprogramming, and asynchronous event-driven archit
Highlights performance-critical libraries that leverage processor-level instructions to execute parallel operations on data.
Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade hardware. The platform distinguishes itself through hand-optimized kernels and automated computational graph techniques that maximize hardware throughput. It supports advanced training methodologies, including reinforcement learning for reasoning and efficient adapter-based fin
Executes low-level mathematical operations using hand-optimized kernels to maximize hardware throughput and minimize memory overhead.
ripgrep is a command-line utility designed for searching through large file trees and source code repositories. It functions as a recursive text processor that traverses directories to locate and display matching patterns, serving as a high-performance alternative to traditional search tools. The tool distinguishes itself through a focus on execution speed and intelligent file handling. It utilizes a finite automata-based regular expression engine to ensure linear time complexity and employs hardware-level acceleration for literal byte sequence scanning. By integrating with version control sy
Distributes search workloads across multiple CPU cores to maximize throughput during intensive text processing tasks.
This project serves as a comprehensive knowledge base and reference for distributed systems engineering and enterprise software architecture. It provides a structured collection of technical resources, design patterns, and methodologies intended to assist in the design, maintenance, and scaling of complex, high-performance software environments. The repository distinguishes itself by offering deep dives into core architectural concepts such as actor-based concurrency, aspect-oriented interception, and inversion-of-control containers. It emphasizes the practical application of distributed syst
Covers ways to refine computational tasks so that they maximize processor utilization and avoid performance bottlenecks.
This is a Python facial recognition library designed to detect, encode, and identify human faces in images and video. It functions as a biometric identification tool that converts facial features into numerical encodings to compare and match identities. The library provides a computer vision command line interface for batch processing face detection and recognition tasks across image directories. It also supports a GPU accelerated vision API that utilizes CUDA and NVIDIA hardware to increase the speed of facial analysis and identification. Its capabilities cover human face detection and faci
Distributes image processing tasks across multiple CPU cores or GPU hardware to increase total throughput.
This project is a Chinese text segmentation library and tokenizer designed to split Chinese sentences into individual words. It serves as a natural language processing tool for splitting characters into words, tagging parts of speech, and extracting keywords using statistical analysis. The library distinguishes itself through support for custom dictionary configuration and vocabulary file management, allowing users to override default segmentation rules for domain-specific accuracy. It also includes a TF-IDF keyword extractor to identify significant words and core topics within documents. Th
Implements multi-process parallel execution to distribute heavy text segmentation workloads across multiple CPU cores.
Facefusion is a modular framework designed for automated image and video manipulation, specializing in tasks such as face swapping, enhancement, and restoration. It functions as a computer vision processing pipeline that chains independent machine learning modules to perform complex transformations, including facial animation, age modification, and lip synchronization. The system is built to handle both real-time interactive feeds and large-scale batch processing tasks. The platform distinguishes itself through a highly extensible architecture that supports custom processing modules and inter
Leverages hardware acceleration backends to optimize intensive machine learning inference for visual content.
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
Compiles functions to merge operations and fuse kernels, reducing memory usage and increasing execution speed for complex workflows.
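The idea of merging operations can be sketched in plain Python, independent of any particular framework. The functions `unfused` and `fused` below are hypothetical names chosen for illustration; they show only why fusing elementwise steps avoids intermediate buffers, not how the library's compiler actually does it.

```python
import math


def unfused(xs):
    """Three separate passes, each materializing an intermediate list."""
    scaled = [x * 2.0 for x in xs]
    shifted = [s + 1.0 for s in scaled]
    return [math.tanh(v) for v in shifted]


def fused(xs):
    """One pass over the data with no intermediate lists."""
    return [math.tanh(x * 2.0 + 1.0) for x in xs]


data = [-1.0, 0.0, 0.5, 2.0]
same = unfused(data) == fused(data)  # identical arithmetic, fewer temporaries
```

Both versions perform the same arithmetic per element. The fused one reads each input once and writes each output once, which is the memory saving that kernel fusion targets on real hardware.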
Anime4K is a collection of graphics shaders and image processing algorithms designed to enhance the visual quality of animated media. It functions as a real-time upscaling engine that increases the resolution of video content during playback, allowing for higher fidelity viewing without the need to permanently re-encode source files. The project distinguishes itself by utilizing hardware-accelerated rendering to perform complex image reconstruction directly on the graphics card. By employing a pass-based pipeline, it chains multiple processing stages to refine frames iteratively, ensuring tha
Leverages graphics hardware to enhance visual clarity and resolution of animated media during rendering.
Backtrader is a Python framework designed for the development, backtesting, and live execution of algorithmic trading strategies. It provides a comprehensive environment for quantitative finance, allowing users to simulate trading logic against historical market data or connect directly to brokerage platforms for automated real-time trading. The project distinguishes itself through a unified event-driven architecture that treats backtesting and live trading with the same API. This consistency is supported by a flexible data-feed abstraction layer that normalizes diverse financial sources, ena
Runs multiple strategy iterations in parallel across CPU cores, keeping memory usage low by returning only essential performance metrics.
Crystal is a statically typed, compiled programming language designed for high performance and memory safety. It leverages an LLVM-based compiler to translate source code into optimized machine-executable binaries, while its type-inference-based static analysis enforces strict safety rules during the build process. The language distinguishes itself through a fiber-based concurrent runtime that manages lightweight execution units for asynchronous input and output without blocking the main process. It also features a powerful compile-time macro system that allows for the inspection and transfor
Targets specific hardware instructions or processor models for individual functions to achieve maximum performance gains.
Mamba is a deep learning framework designed for building and training sequence models that process long-range data dependencies with linear-time computational efficiency. By utilizing selective state space modeling, the library enables the construction of neural network architectures that replace traditional attention mechanisms with high-performance state space operations. The framework distinguishes itself through the use of data-dependent state gating, which allows the model to dynamically filter information flow based on the input sequence. To ensure high throughput, it incorporates hardw
Provides hardware-optimized custom kernels to maximize throughput for complex state space calculations.
This project is an open source Linux GPU kernel driver implemented as a loadable kernel module. It functions as a GPU firmware loader, providing the low-level driver services necessary to enable direct communication between the operating system and graphics processing units. The driver utilizes a dual-module architecture that separates GPL-licensed kernel code from proprietary firmware blobs. This system extracts and links signed binary firmware images into the kernel modules at driver load time. The project provides driver support for Turing-architecture GPUs and all subsequent newer hardwa
Employs compile-time directives to generate driver binaries optimized for specific graphics processor architectures.
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
Analyzes compute graphs to determine and insert efficient data layouts for optimized hardware performance.
Lean is an algorithmic trading engine and quantitative finance platform designed for the development, backtesting, and live execution of automated trading strategies. It provides a comprehensive framework for processing time-series market data, managing multi-asset portfolios, and conducting quantitative research across diverse financial markets. The platform distinguishes itself through a modular, event-driven architecture that decouples strategy logic from data ingestion and brokerage connectivity. By utilizing standardized interfaces for data providers and brokerage abstractions, it enable
Runs multiple strategy iterations in parallel to identify optimal parameters while managing resource consumption.
Abu is an algorithmic trading framework designed for the development, backtesting, and optimization of automated trading strategies. It functions as a quantitative financial analysis library that processes time-series data to identify market trends, volatility patterns, and key price levels. The platform distinguishes itself through a modular architecture that integrates diverse financial data sources and a rule-based engine for automated risk management. It enables users to construct complex trading signals by layering technical indicators and machine learning models, while simultaneously en
Optimizes strategy parameters using grid search and custom scoring to improve performance and stability.
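Grid search with a custom scoring function can be sketched generically as follows. This is a minimal illustration, not the framework's API: `grid_search`, `toy_score`, and the parameter names `fast` and `slow` are all hypothetical, and the score is a made-up function rather than a real backtest result.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and keep the highest-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score that peaks at fast=10, slow=50 (stands in for a backtest)."""
    return -abs(params["fast"] - 10) - abs(params["slow"] - 50) / 10


grid = {"fast": [5, 10, 20], "slow": [30, 50, 100]}
best, score = grid_search(grid, toy_score)
```

In practice the scoring function would run a backtest and combine return and stability measures into one number. Because the combinations are independent, this loop is also a natural candidate for parallel execution.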
ImageMagick is a comprehensive software suite for the creation, editing, composition, and conversion of digital images. It functions as both a command-line utility for batch processing and automation, and as a programming library that allows developers to integrate advanced image manipulation capabilities into external applications. The project is distinguished by its modular architecture, which supports hundreds of image formats through a pluggable coder system and external delegate libraries. It is designed for high-performance environments, utilizing memory-mapped pixel caching, stream-ori
Supports large-scale processing by offloading storage to remote servers and parallelizing tasks across hardware.