12 Repos
Infrastructure configurations designed to isolate sensitive data processing within secure, private network boundaries.
Distinguishing note: Focuses on the deployment of isolated processing services for privacy compliance rather than general encryption libraries.
12 GitHub repositories matching Security & Cryptography · Private Data Processing Environments.
Shannon is an integrated security platform designed for autonomous penetration testing, static and dynamic analysis, and automated vulnerability remediation within self-hosted, private infrastructure. It functions as a unified security suite that orchestrates the entire lifecycle of vulnerability management, from initial discovery and reachability prioritization to the generation and verification of code-level patches. The platform distinguishes itself through its agentic approach to security, deploying autonomous agents to execute both black-box and white-box exploits against running applica
Deploys security testing tools within isolated environments to keep sensitive source code and analysis data within the local perimeter.
This repository serves as a comprehensive research platform and toolkit for advancing machine learning, quantum computing, and large-scale scientific data analysis. It provides foundational frameworks for developing complex algorithmic systems, offering the necessary infrastructure for distributed training, computational graph execution, and high-performance model development. The project distinguishes itself by integrating specialized research domains with robust, privacy-preserving methodologies. It supports diverse scientific discovery through tools for quantum simulation, physics-informed
Collects and combines sensitive information from multiple devices using secure, privacy-preserving cryptographic protocols.
Marker is a comprehensive document processing platform designed to automate the conversion, extraction, and structuring of data from complex files. It functions as an orchestration engine that chains modular processing steps into versioned, reusable pipelines, allowing organizations to standardize document handling and automate repetitive business tasks at scale. The platform distinguishes itself through its support for secure, private infrastructure deployment, enabling users to run containerized services within their own environments to maintain strict data privacy. It features specialized
Deploys containerized processing services within private environments to maintain data privacy and control over sensitive document workflows.
Prefect is a workflow orchestration platform designed to define, schedule, and monitor complex data pipelines as Python code. It functions as a container-native engine that wraps individual tasks in isolated environments, ensuring consistent dependencies and resource allocation across diverse infrastructure. By utilizing a state-machine-based orchestration model, the system tracks execution progress through discrete transitions and persistent event logs to maintain reliable and observable task processing. The platform distinguishes itself through a decoupled worker-API architecture, which sep
Stores tokens, usernames, and passwords to enable secure access to private repositories.
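The state-machine orchestration model described above can be illustrated with a minimal sketch. This is not Prefect's API: the `State` enum, the `ALLOWED` transition table, `TaskRun`, and `run_task` are all hypothetical names chosen for illustration, and the event log here is an in-memory list, whereas a real orchestrator would persist it.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Hypothetical transition table; a real orchestrator defines more states.
ALLOWED = {
    State.PENDING: {State.RUNNING},
    State.RUNNING: {State.COMPLETED, State.FAILED},
    State.FAILED: {State.RUNNING},  # a failed run may be retried
    State.COMPLETED: set(),
}


@dataclass
class TaskRun:
    name: str
    state: State = State.PENDING
    # Append-only log of (from_state, to_state) pairs; in-memory for this sketch.
    events: list = field(default_factory=list)

    def transition(self, new_state: State) -> None:
        """Move to new_state if the table allows it, recording the event."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.events.append((self.state.name, new_state.name))
        self.state = new_state


def run_task(run: TaskRun, fn) -> None:
    """Execute fn and record each state change as a discrete transition."""
    run.transition(State.RUNNING)
    try:
        fn()
    except Exception:
        run.transition(State.FAILED)
    else:
        run.transition(State.COMPLETED)
```

Because every change goes through `transition`, the event log alone is enough to reconstruct how a run reached its current state.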
DocsGPT is a retrieval-augmented generation platform and private knowledge base used to build AI agents that perform grounded search and analysis. It functions as a multi-model AI orchestrator and enterprise agent builder, allowing for the integration of various local and cloud language models to customize reasoning and text generation. The project provides a visual environment for developing automated assistants using conditional logic and third-party API connectivity. It enables the creation of private AI agents capable of performing enterprise search and detailed document analysis using pr
Enables detailed analysis and insight extraction from private PDFs, office files, and images.
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Hosts data processing pipelines within dedicated or private cloud infrastructure to ensure data security, regulatory compliance, and environment isolation.
Petals is a decentralized framework and inference engine for running large language models across a peer-to-peer network. It enables the execution of models that exceed the memory of any single machine by splitting computations and model layers across a collaborative swarm of GPUs. The system functions as a collaborative compute network where participants share local GPU resources and host model weights. It supports distributed prompt-tuning to adapt massive models to specific tasks and allows for the establishment of private compute swarms to process sensitive data within restricted, trusted
Enables the creation of restricted networks of trusted hardware to process sensitive data in isolation.
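The idea of splitting model layers across a swarm can be sketched as follows. This is a simplified illustration, not Petals code: `partition_layers` and `forward` are hypothetical helpers, the "layers" are plain Python callables, and everything runs in one process rather than over a network.

```python
from typing import Callable, List


def partition_layers(num_layers: int, num_peers: int) -> List[range]:
    """Assign contiguous blocks of layer indices to peers, as evenly as possible."""
    if num_peers <= 0:
        raise ValueError("num_peers must be positive")
    base, extra = divmod(num_layers, num_peers)
    blocks = []
    start = 0
    for peer in range(num_peers):
        # The first `extra` peers each take one additional layer.
        size = base + (1 if peer < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def forward(x, layers: List[Callable], blocks: List[range]):
    """Pass the activation through each peer's block in order."""
    for block in blocks:  # in a real swarm, each block lives on a different machine
        for i in block:
            x = layers[i](x)
    return x
```

In a private swarm, the list of peers would be limited to trusted machines, so intermediate activations never leave that group.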
Kreuzberg is a document extraction engine that converts PDFs, Office files, images, and over 90 other formats into clean, structured text and metadata. It is built around a compiled Rust core that can be used as a native library, a command-line tool, a REST API server, or a WebAssembly module for browser-based processing. The system is designed to run entirely on self-hosted infrastructure, with no data leaving the user's environment. What distinguishes Kreuzberg is its breadth of integration surfaces and its pipeline architecture. It exposes extraction capabilities through native bindings fo
Processes documents entirely on self-hosted infrastructure with no data leaving the environment.
The Snyk CLI is a command-line security scanner that detects known vulnerabilities across open-source dependencies, proprietary application code, container images, and infrastructure-as-code configuration files. It also serves as a platform management tool, allowing users to configure organizations, users, SSO, and reporting from the terminal rather than the web dashboard. The CLI integrates directly into development workflows, enabling scanning within IDEs, build pipelines, and version control systems. It implements static analysis with interfile data flow analysis to find complex security f
Analyzes private Git repositories by deploying proxies that bridge scanning services and internal code.
Costrict is an AI software engineering agent and coding assistant designed for enterprise-grade development. It functions as a multi-model AI orchestrator that generates, completes, and reviews code, while serving as a remote development environment that bridges browser interfaces with remote directories for file management and terminal execution. The platform distinguishes itself through an AI code review system that utilizes multi-model verification and repository indexing to ensure code quality. It employs a structured agent approach that decomposes complex natural language requirements in
Implements isolated infrastructure configurations and end-to-end encryption to ensure data privacy.
Starred is a utility that automates the management and documentation of starred repositories. It functions by fetching repository metadata through the GitHub API and organizing these projects into structured, categorized lists based on programming language or topic. The tool distinguishes itself by maintaining these lists through automated, scheduled workflows that synchronize data directly to a dedicated repository. It supports the inclusion of private repositories in the generated output, ensuring that a user's complete collection is documented and backed up. The project provides a configu
Securely processes private codebases using temporary authentication tokens.
This project is a private document analysis tool that enables conversational interaction with PDF files by running all language model inference and processing entirely on the local machine. Running models directly in the browser or in the local environment ensures that sensitive user data stays offline and inaccessible to external servers or cloud providers. The system uses retrieval-augmented generation (RAG) to deliver context-aware answers, backed by local document text extraction and vector embedding indexing. This architecture enables semantic search and information retrieval without relying on external database services or internet connectivity. Beyond basic conversational capabilities, the tool includes observability features that log the internal steps of model reasoning and retrieval chains. This execution tracing supports debugging performance problems and optimizing answer quality during document analysis.
Processes sensitive PDF files locally to answer questions without sending data to external servers.
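The local retrieval step of such a pipeline can be sketched with the standard library alone. This is an assumption-laden toy, not the project's code: `embed` uses word counts in place of a real local embedding model, and `build_index` and `retrieve` are hypothetical helper names. The point is only that indexing and similarity search can run without any external service.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a local embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Pair each text chunk with its vector; everything stays in local memory."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

The retrieved chunks would then be placed in the prompt of a locally running language model, which generates the answer.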