30 open-source projects similar to xai-org/x-algorithm, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best X Algorithm alternative.
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
algorithm-ml is a machine learning ranking engine designed to personalize content feeds by calculating relevance scores for items based on user interests and historical interaction data. It functions as a recommendation system that processes user behavior and item metadata to determine the optimal order of content for individual users. The system utilizes a multi-stage ranking architecture that filters large pools of candidate items into smaller sets before applying computationally expensive scoring models. It employs gradient-boosted decision tree ensembles to capture non-linear relation
The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver a unified, ranked experience. The system utilizes a high-performance machine learning serving infrastructure to execute deep learning models that predict engagement probabilities in real-time. It distinguishes itself through a hybrid retrieval strategy that combines graph-traver
Deepface is a comprehensive deep learning library for facial recognition and demographic analysis. It provides a modular pipeline that handles the entire lifecycle of facial processing, including detection, geometric alignment, and the transformation of facial images into high-dimensional numerical vector embeddings for identity verification and similarity comparison. The library distinguishes itself through a model ensemble approach, which combines predictions from multiple pre-trained neural networks to improve classification accuracy and reduce bias. It also integrates advanced security fe
Gorse is a personalized recommendation engine server and machine learning pipeline designed to suggest items to users based on their behavior and preferences. It operates as a distributed system that separates training, candidate generation, and serving nodes to support high-throughput workloads. The system utilizes a multi-stage recommendation pipeline to refine results through retrieval, scoring, and reranking. It generates personalized suggestions using collaborative filtering, matrix factorization, and item-to-item similarity models, while also providing non-personalized and fallback reco
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
fun-rec is a learning guide and framework for building personalized recommendation systems, covering everything from deep learning ranking to generative recommendation paradigms. It provides instructional content on constructing industrial-grade architectures that span offline data processing and real-time online serving. The project distinguishes itself by focusing on generative recommendation, treating the suggestion process as a sequence-to-sequence task using large language models and transformer models to generate item identifiers rather than traditional ranking lists. It also emphasizes
Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines. The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex q
This project is a C++ vector similarity engine and implementation of the Hierarchical Navigable Small World algorithm. It provides a header-only library for performing approximate nearest neighbor searches in high-dimensional spaces, alongside Python bindings that expose these indexing and search capabilities to data science environments. The engine enables real-time embedding retrieval and high-dimensional similarity search using a multi-layered graph structure to balance search speed and accuracy. It supports custom distance metrics to calculate similarity between vectors in various mathema
USearch is a high-performance vector similarity search engine and approximate nearest neighbor index designed for dense embeddings. It functions as a low-level vector database core and high-dimensional vector indexer, providing the primitives necessary to store and retrieve vectors across massive datasets. The engine distinguishes itself through hardware-level SIMD acceleration for distance kernels and a proximity-graph indexing system that enables fast retrieval across billions of vectors. It supports multi-precision vector quantization to balance memory usage and accuracy, and utilizes memo
Annoy is a C++ library designed for approximate nearest neighbor search in high-dimensional vector spaces. It functions as a vector similarity search engine that constructs static, disk-based data structures to facilitate fast lookups. By mapping identifiers to vector data and persisting these structures to disk, the library enables efficient, memory-mapped access to large datasets. The project distinguishes itself through the use of random projection trees and distance-metric-based partitioning, which organize data into hierarchical binary trees to balance search precision against computatio
PaddleRec is a deep learning recommendation library and distributed model training framework built on PaddlePaddle. It provides a suite of industrial-scale algorithms and models for user matching and personalized content ranking. The project includes a recommendation inference engine for exporting and serving trained models to production environments for real-time online requests. It enables the implementation of deep learning recommendation algorithms for processing massive behavioral datasets. The framework covers large-scale model training across distributed computing cluste
hnswlib is a header-only C++ library and vector indexing engine designed for high-dimensional approximate nearest neighbor search. It organizes large collections of embeddings into a searchable graph structure to enable rapid proximity queries and distance calculations. The system utilizes Hierarchical Navigable Small World graphs to achieve fast vector similarity search. It distinguishes itself by allowing the definition of custom distance metrics and similarity functions to adapt calculations to specific data requirements. The engine covers the full indexing lifecycle, including incrementa
TensorFlow Similarity is a Python framework designed for training neural networks to learn high-dimensional vector representations and perform similarity-based retrieval. It provides a comprehensive toolkit for metric learning, enabling the development of systems that group similar items together in vector space and identify them through distance-based comparisons. The library distinguishes itself by integrating specialized training techniques, such as contrastive and triplet-based learning, with robust data management tools that ensure stable model convergence. It supports self-supervised re
DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
Gensim is a natural language processing toolkit designed for large-scale text analysis and the training of semantic vector embeddings. It provides a framework for identifying latent thematic structures within document collections and calculating semantic similarity between text segments using unsupervised statistical algorithms. The project is distinguished by its ability to handle datasets that exceed available system memory through incremental corpus streaming, which processes documents one at a time from disk. It utilizes sparse vector representations and dictionary-based token mapping to
Feast is a machine learning feature store and MLOps data infrastructure layer. It provides a centralized system for managing and serving features across offline training and online production environments, utilizing an online feature serving layer for low-latency retrieval. The project centers on a feature registry that acts as a central catalog for defining, governing, and discovering feature services. It employs a unified data access layer to decouple feature retrieval from physical storage and includes a point-in-time data generator to create historically accurate training datasets that pr
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
RediSearch is a Redis module that adds secondary indexing, full-text search, aggregation, and vector similarity search directly into the in-memory data store. It operates as an in-process search engine, extending the core key-value store with capabilities for indexing hash and JSON documents, enabling fast field-level lookups beyond primary key access. The module provides a full-text search engine built on inverted indexes, supporting stemming, fuzzy matching, and relevance scoring via tf-idf. It also includes a vector similarity search engine using a Hierarchical Navigable Small World graph
Vald is a distributed, cloud-native search engine designed for high-dimensional vector data. It functions as an approximate nearest neighbor search platform, enabling the identification of similar data points across massive datasets through horizontal scaling and distributed indexing. The system is built for container orchestration environments, utilizing custom resource controllers to automate cluster lifecycle management and infrastructure state. It employs graph-based indexing to perform rapid similarity lookups and supports zero-downtime operations by decoupling index construction from qu
zvec is an embedded vector database engine and indexing library designed for high-dimensional similarity search. It functions as a hybrid search engine and a retrieval-augmented generation knowledge base, allowing for the storage and retrieval of dense and sparse vectors. The system is distinguished by its hybrid retrieval pipeline, which fuses vector similarity, full-text keyword matching, and scalar metadata filtering into single query operations. It supports a plugin-based model integration system for registering custom embedding models and rerankers, as well as language bindings for nativ
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Forem is an open-source platform designed for building and managing technical communities. It functions as a social publishing engine that enables members to share long-form content, participate in threaded discussions, and engage through social interactions. The platform provides tools for organizations to maintain branded profiles, host community hackathons, and facilitate collaborative learning through structured educational tracks. Beyond its social features, Forem integrates advanced capabilities for AI agent workflow orchestration and codebase knowledge graphing. It allows developers to
This project is an edge computing development toolkit and serverless command line interface used to develop, test, and deploy serverless functions to a global edge network. It serves as an edge runtime bundler and resource orchestrator, managing the entire lifecycle of edge projects from local development to worldwide distribution. The toolkit distinguishes itself through distributed workflow management, coordinating stateful instances and the durable execution of long-running processes across the edge. It also provides specialized integrations for edge AI, including the management of vector
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
This project is a community-curated directory of open-source software designed for deployment in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to maintain full data ownership and control over their digital infrastructure. The directory is structured through a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analytics to private communication and team productivity tools. It dis
TelegramGroup is a comprehensive automation framework designed for managing multiple messaging accounts and orchestrating complex administrative workflows. It functions by emulating client-side sessions to interact with platform APIs, enabling centralized control over user profiles, persistent session data, and distributed network routing through proxy infrastructure. The platform distinguishes itself through its modular architecture, which supports independent plugins for tasks such as artificial intelligence integration, content mirroring, and automated community moderation. It provides a s
This project is a community-driven knowledge repository and software resource directory focused on artificial intelligence and professional productivity tools. It functions as a markdown-based knowledge base that organizes information into a hierarchical taxonomy, allowing users to discover, compare, and evaluate software solutions based on specific business and technical requirements. The platform distinguishes itself through a decentralized peer-review model, where the directory is maintained and updated by the community via a pull-request workflow. This collaborative approach ensures that
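Several of the entries above (algorithm-ml and Gorse, for example) describe a multi-stage pipeline that narrows a large candidate pool with a cheap filter before applying a more expensive scorer and a final rerank. The Python sketch below illustrates that general shape only. It is not taken from any listed project: the `Item` fields, the scoring weights, and the per-topic cap are all hypothetical, and the hand-written `score` function merely stands in for a learned model.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    topic: str
    popularity: float  # hypothetical cheap signal in [0, 1]
    freshness: float   # hypothetical signal in [0, 1]


def retrieve(items, user_topics, limit):
    """Stage 1: cheap filter. Keep items on topics the user cares about,
    most popular first, and cut the pool down to `limit` candidates."""
    matches = [it for it in items if it.topic in user_topics]
    matches.sort(key=lambda it: it.popularity, reverse=True)
    return matches[:limit]


def score(item, topic_affinity):
    """Stage 2: costlier scoring. A fixed weighted sum standing in for a
    learned model; the weights are illustrative assumptions."""
    return (0.6 * topic_affinity.get(item.topic, 0.0)
            + 0.3 * item.popularity
            + 0.1 * item.freshness)


def rerank(scored, max_per_topic):
    """Stage 3: walk the scored items best-first and apply a simple
    diversity rule that caps how many items one topic may contribute."""
    per_topic = {}
    result = []
    for item, _ in sorted(scored, key=lambda pair: pair[1], reverse=True):
        if per_topic.get(item.topic, 0) < max_per_topic:
            result.append(item.item_id)
            per_topic[item.topic] = per_topic.get(item.topic, 0) + 1
    return result


def recommend(items, topic_affinity, candidate_limit=100, max_per_topic=2):
    """Run the three stages in order and return ranked item ids."""
    candidates = retrieve(items, set(topic_affinity), candidate_limit)
    scored = [(it, score(it, topic_affinity)) for it in candidates]
    return rerank(scored, max_per_topic)
```

Calling `recommend(items, {"go": 1.0, "ml": 0.5})` with a list of `Item` objects returns item ids in ranked order. The expensive scorer only ever sees what the retrieval stage lets through, which is why production systems tend to spend most of their tuning effort on the size of that candidate cut.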