30 open-source projects similar to princeton-nlp/tree-of-thought-llm, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Tree Of Thought Llm alternative.
This project is a reasoning framework and agent orchestrator that implements the Tree of Thoughts methodology to improve the logical output of large language models. It functions as a search-based problem solver, representing complex tasks as a state-space branching model where discrete thoughts serve as nodes and logical transitions serve as edges. The system coordinates multiple model agents to generate, evaluate, and prune candidate solutions. It employs depth-first search heuristics and recursive evaluation to explore multiple reasoning paths, filtering out low-quality branches to iterate
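The search idea above can be illustrated with a minimal sketch of depth-first Tree-of-Thoughts search with threshold pruning. It is not the repository's code. `propose`, `value`, the integer "thoughts", and the pruning threshold are hypothetical stand-ins for the LLM generator and evaluator calls.

```python
from typing import List, Optional

# Hypothetical stand-ins: a "thought" is an integer state here; in a real
# Tree-of-Thoughts system it would be LLM-generated text.

def propose(state: int) -> List[int]:
    """Generator: expand a node into candidate next thoughts (its children)."""
    return [state + 1, state * 2, state - 1]


def value(state: int, target: int) -> float:
    """Evaluator: score a thought; higher means more promising (closer to target)."""
    return -abs(target - state)


def tot_dfs(state: int, target: int, depth: int, max_depth: int,
            threshold: float, path: List[int]) -> Optional[List[int]]:
    """Depth-first search over thoughts, pruning children scored below `threshold`."""
    if state == target:
        return path
    if depth == max_depth:
        return None
    # Visit the most promising children first.
    children = sorted(propose(state), key=lambda s: value(s, target), reverse=True)
    for child in children:
        if value(child, target) < threshold:
            continue  # prune a low-quality branch without expanding it
        found = tot_dfs(child, target, depth + 1, max_depth, threshold, path + [child])
        if found is not None:
            return found
    return None  # every branch was pruned or hit the depth limit
```

With a start of 3 and a target of 10, `tot_dfs(3, 10, 0, 5, -20.0, [3])` returns `[3, 6, 12, 11, 10]`. That is the first path the depth-first order reaches, not necessarily the shortest one. A breadth-first variant that keeps the best few nodes per level is the usual alternative.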
JARVIS is a system for large language model task orchestration, deployment management, and automation benchmarking. It utilizes a task orchestrator to decompose complex requests into actionable steps and coordinates various expert models to synthesize final responses. The project includes an AI model deployment manager to handle the local deployment of expert models across different hardware scales. It further provides an AI workflow API consisting of web endpoints used to trigger automated task workflows and retrieve results from model selection stages. The framework incorporates an automat
[Website](http://craftjarvis-jarvis1.github.io/) [Paper](https://arxiv.org/abs/2311.05997) [Twitter](https://twitter.com/jeasinema/status/1723900032653643796)
This is the official implementation of Graph of Thoughts: Solving Elaborate Problems with Large Language Models. This framework gives you the ability to solve complex problems by modeling them as a Graph of Operations (GoO), which is automatically executed with a Large Language Model (LLM) as…
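A Graph of Operations can be pictured as a dependency graph executed in topological order, where each operation consumes the thoughts produced by its predecessors. In the real framework the operations prompt an LLM. Here plain Python functions stand in so only the control flow is visible. The operation names, the sorting task, and `execute` are hypothetical and are not the library's API. The sketch needs Python 3.9+ for `graphlib`.

```python
import heapq
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List

Thoughts = List[list]
Operation = Callable[[Thoughts], Thoughts]


def make_sort_graph(data: list):
    """Hypothetical Graph of Operations for sorting: split, sort halves, aggregate."""
    mid = len(data) // 2
    ops: Dict[str, Operation] = {
        "split": lambda _: [data[:mid], data[mid:]],            # one thought per half
        "sort_a": lambda ts: [sorted(ts[0])],                    # refine the first half
        "sort_b": lambda ts: [sorted(ts[1])],                    # refine the second half
        "merge": lambda ts: [list(heapq.merge(ts[0], ts[1]))],  # aggregate two thoughts
    }
    # Each operation lists its predecessors; their outputs become its inputs.
    deps = {"split": [], "sort_a": ["split"], "sort_b": ["split"],
            "merge": ["sort_a", "sort_b"]}
    return ops, deps


def execute(ops: Dict[str, Operation], deps: Dict[str, List[str]]) -> Dict[str, Thoughts]:
    """Run every operation once, in an order that respects the graph's edges."""
    results: Dict[str, Thoughts] = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = [t for pred in deps[name] for t in results[pred]]
        results[name] = ops[name](inputs)
    return results
```

For example, `execute(*make_sort_graph([5, 1, 4, 2, 3]))["merge"]` yields `[[1, 2, 3, 4, 5]]`. The aggregation step is what distinguishes a graph from a tree: one node combines thoughts from two branches.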
Code for our ACL 2023 Paper "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models".
OptiLLM is an AI reasoning and optimization framework that functions as an API proxy to enhance the response quality of large language models. It intercepts requests to apply inference-time reasoning logic and output refinement before returning results to the client. The project distinguishes itself through a combination of inference-time search trees for logical verification and an anonymization pipeline that removes personally identifiable information from prompts. It further extends model capabilities by orchestrating external tools, including real-time code execution and autonomous web re
This project is a comprehensive framework for building, evaluating, and connecting autonomous agent systems. It provides a library of standardized architectural patterns for implementing complex agent workflows, including multi-agent orchestration, iterative reasoning, and memory management. By offering a unified interface for model providers, the framework allows for consistent agent execution across different artificial intelligence services. The framework distinguishes itself through a focus on rigorous benchmarking and deterministic control. It includes a suite of tools for evaluating age
Criterion is a statistics-driven microbenchmarking library and performance regression tool for Rust. It provides a framework for isolating and measuring small code segments, using statistical analysis to eliminate noise and ensure reliable, repeatable measurements of execution speed. The tool distinguishes itself through a performance visualization suite that generates HTML reports and graphs to track performance trends and throughput. It includes a system for comparing current execution times against stored baselines to identify and prevent performance drops. The library covers asynchronous
This project is a framework for the iterative optimization and validation of LLM agent skills. It functions as an agent capability orchestrator and prompt optimizer, utilizing an evaluation framework to measure performance through weighted rubrics and automated rewriting. The system distinguishes itself through a closed-loop optimization cycle that employs independent reviewer agents to prevent anchoring effects and a ratchet-based version control mechanism that automatically reverts changes if they fail to improve baseline scores. It also features exploratory structural rewriting to overcome
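A ratchet of this kind can be sketched as "score every rewrite with a weighted rubric and keep it only if it strictly beats the best score so far." This is a minimal sketch under that assumption. The rubric criteria, weights, and function names are hypothetical, and the independent reviewer agents are not modeled.

```python
from typing import Callable, Dict, List, Tuple

# criterion name -> (weight, check returning a score in [0, 1])
Rubric = Dict[str, Tuple[float, Callable[[str], float]]]


def score(text: str, rubric: Rubric) -> float:
    """Weighted rubric score: sum of weight * criterion score."""
    return sum(weight * check(text) for weight, check in rubric.values())


def ratchet(baseline: str, candidates: List[str], rubric: Rubric) -> Tuple[str, List[float]]:
    """Keep a rewrite only if it beats the best score so far; otherwise revert."""
    best, best_score = baseline, score(baseline, rubric)
    history = [best_score]
    for candidate in candidates:
        s = score(candidate, rubric)
        if s > best_score:  # strict improvement: ratchet forward
            best, best_score = candidate, s
        # otherwise the candidate is discarded and `best` is kept (a revert)
        history.append(best_score)
    return best, history


# Hypothetical rubric for a skill prompt.
RUBRIC: Rubric = {
    "has_steps": (0.5, lambda t: 1.0 if "Steps:" in t else 0.0),
    "has_example": (0.3, lambda t: 1.0 if "Example:" in t else 0.0),
    "concise": (0.2, lambda t: 1.0 if len(t) <= 80 else 0.0),
}
```

Starting from `"Summarize the file."` and trying a version with steps, then a shorter `"Summarize."`, then a version with steps and an example, the history rounds to `[0.2, 0.7, 0.7, 1.0]`. The shorter rewrite scores below the accepted one, so it is reverted and the score never moves backward.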
mm-cot is a multimodal language model reasoning framework designed for training and evaluating models that perform chain-of-thought reasoning across text and image data. It provides core systems for implementing step-by-step logical rationales to improve the accuracy of predictions, including a vision-language model trainer and a multimodal benchmark evaluator. The framework distinguishes itself through a decoupled rationale generation process that separates the training of logical justifications from the inference of final answers. It utilizes vision-transformer feature extraction and image
MemGPT is a memory management framework and external memory layer for large language models. It functions as a platform for building stateful AI agents that maintain a persistent identity and continuous context across multiple sessions. The system enables agents to bypass fixed context window limitations by using a virtual context windowing approach. This allows models to manage their own memory through internal commands to search, update, and delete stored information within a hierarchical structure of short-term working context and long-term archival storage. The framework provides a local
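In MemGPT the model itself decides when to page information between tiers by issuing function calls. The sketch below only illustrates the two-tier idea and rests on several assumptions. The word-count budget, oldest-first eviction, substring search, and every name in it are hypothetical and are not MemGPT's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualContext:
    """Two-tier memory: a bounded working context plus unbounded archival storage."""
    budget: int  # max words kept in the working context (hypothetical unit)
    working: List[str] = field(default_factory=list)
    archive: List[str] = field(default_factory=list)

    def _size(self) -> int:
        return sum(len(m.split()) for m in self.working)

    def append(self, message: str) -> None:
        """Add a message, paging the oldest ones out to the archive when over budget."""
        self.working.append(message)
        while self._size() > self.budget and len(self.working) > 1:
            self.archive.append(self.working.pop(0))

    def search_archive(self, query: str) -> List[str]:
        """Naive case-insensitive search standing in for an archival-search command."""
        q = query.lower()
        return [m for m in self.archive if q in m.lower()]
```

With `budget=6`, appending `"my name is Ada"`, `"I like tea"`, and `"what is my name"` leaves only the last message in the working context. `search_archive("ada")` can still recover the evicted fact.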
LangChain is an orchestration framework designed for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model provider APIs into a consistent set of primitives, enabling developers to build complex, multi-step AI workflows that manage state, memory, and tool execution. The project distinguishes itself through a durable execution runtime that maintains persistent state across long-running processes by checkpointing progress to external storage. It models agent workflows as directed graphs, allowing
Cheer AI up with the "let's think step by step" prompt? More plz. Let’s think not just step by step, but also one by one.
1. STaR 2. Mesh Transformer JAX 1. Updates 3. Pretrained Models 1. GPT-J-6B 1. Links 2. Acknowledgments 3. License 4. Model Details 5. Zero-Shot Evaluations 4. Architecture and Usage 1. Fine-tuning 2. JAX Dependency 5. TODO
Decoupling Reasoning from Observations for Efficient Augmented Language Models
We instantiate Pangu on knowledge base question answering (KBQA), which is a representative testbed for grounded language understanding in a highly complex and heterogeneous environment.
MetaGPT is an agentic workflow orchestrator and multi-agent framework designed to transform natural language requirements into complete software deliverables. It functions as an AI software engineering suite that automates the creation of technical documentation, data structures, and source code by treating natural language as a programming environment. The system distinguishes itself by assigning professional roles to large language models, creating specialized agent teams that collaborate through a shared communication structure. It utilizes standard operating procedures to convert organiza
Zep is a long-term memory layer and persistent storage system for large language model applications. It functions as a memory service and vector database orchestrator that manages chat history, user preferences, and context retrieval to reduce hallucinations in AI agents. The system maintains a temporal knowledge graph that stores interaction data as dated facts to track how user preferences and environments evolve over time. It combines these knowledge graphs with a store for persisting unstructured message data at the user and session levels. The platform provides capabilities for AI conte
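One way to picture "dated facts" is a store where each fact carries a validity interval, and a newer fact for the same subject and predicate closes out the older one. This is a minimal sketch under that assumption. `Fact`, `TemporalFacts`, and the invalidation rule are hypothetical and are not Zep's API or schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still true"


class TemporalFacts:
    """Hypothetical temporal fact store: history is kept, not overwritten."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def add(self, subject: str, predicate: str, obj: str, when: date) -> None:
        """Record a new fact and close out any current fact it supersedes."""
        for f in self.facts:
            if f.subject == subject and f.predicate == predicate and f.valid_to is None:
                f.valid_to = when
        self.facts.append(Fact(subject, predicate, obj, when))

    def as_of(self, subject: str, predicate: str, when: date) -> Optional[str]:
        """Return the value that was true at `when`, if any."""
        for f in self.facts:
            if (f.subject == subject and f.predicate == predicate
                    and f.valid_from <= when
                    and (f.valid_to is None or when < f.valid_to)):
                return f.obj
        return None
```

If a user preferred tea from 2024-01-01 and switched to coffee on 2024-06-01, `as_of` returns `"tea"` for March and `"coffee"` for July. Keeping both facts lets an agent answer "what did the user prefer then?" as well as "what now?"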
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
CrewAI is a multi-agent orchestration framework and autonomous agent workflow engine. It provides a system for coordinating autonomous AI agents with specific roles and goals to solve complex tasks through collaborative intelligence. The framework distinguishes itself through a collaborative AI agent system that enables multiple language model instances to share intelligence and execute multi-step objectives via role-playing. It incorporates human-in-the-loop mechanisms, allowing for manual review checkpoints to validate decisions and refine outcomes within autonomous execution paths. The pl
Haystack is an orchestration framework designed for building complex search and generative AI pipelines. It functions as an agentic workflow engine, enabling the construction of automated sequences that allow AI agents to perform multi-step reasoning and data analysis. The framework utilizes a modular, component-based architecture that connects processing steps into directed acyclic graphs. By employing a provider-agnostic integration layer, it decouples core logic from specific external AI services and vector databases, allowing for the flexible exchange of underlying technologies. This desi
This repo contains the source code for generating plans from problems described in natural language.
LangGraph is a framework for building stateful, multi-step agentic workflows by modeling application logic as a directed graph. It provides a runtime environment where complex tasks are orchestrated through interconnected nodes and edges, allowing developers to manage state transitions, persistent memory, and control flow across long-running automated processes. The platform distinguishes itself through its native support for human-in-the-loop automation, enabling developers to define breakpoints that pause execution for manual review, modification, or approval. It also features checkpoint-ba
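The checkpoint and breakpoint ideas can be sketched without the library. `MiniGraph`, its `run` signature, and the thread-keyed checkpoint dictionary are hypothetical, and this is not LangGraph's API. State is a plain dict, and each node returns a new dict.

```python
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, object]
Node = Callable[[State], State]


class MiniGraph:
    """Hypothetical directed-graph runner with per-step checkpoints and breakpoints."""

    def __init__(self, nodes: Dict[str, Node], edges: Dict[str, Optional[str]], entry: str):
        self.nodes, self.edges, self.entry = nodes, edges, entry
        # thread_id -> (next node to run, state after the last completed step)
        self.checkpoints: Dict[str, Tuple[Optional[str], State]] = {}

    def run(self, thread_id: str, update: Optional[State] = None,
            interrupt_before: frozenset = frozenset()) -> State:
        """Start or resume a thread; pause before any node in `interrupt_before`."""
        if thread_id in self.checkpoints:
            node, saved = self.checkpoints[thread_id]
            state = {**saved, **(update or {})}  # apply human edits made while paused
            resuming = True
        else:
            node, state, resuming = self.entry, dict(update or {}), False
        while node is not None:
            if node in interrupt_before and not resuming:
                self.checkpoints[thread_id] = (node, state)
                return state  # paused for review; call run() again to continue
            resuming = False
            state = self.nodes[node](state)
            node = self.edges.get(node)
            self.checkpoints[thread_id] = (node, state)  # checkpoint after each step
        return state
```

With a `draft` node followed by a `send` node and `interrupt_before=frozenset({"send"})`, the first `run` stops after drafting. A reviewer can then call `run` again on the same thread with an edited `"draft"`, and execution resumes from the saved checkpoint.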