TaskWeaver is an LLM agent framework that interprets natural-language requests and executes them as Python code, SQL queries, or shell commands. It acts as a conversational code interpreter: within a session it generates executable code from user prompts and keeps stateful data structures alive across turns. It is designed as a self-hosted AI agent platform that can be deployed in Docker, manages sessions, and provides a web UI for data analytics and automation tasks.
The framework is distinguished by a role-based multi-agent architecture: specialized roles such as the Planner and the Code Interpreter collaborate through structured message passing to complete complex tasks. It also operates as a multi-model LLM gateway, connecting to OpenAI, Azure, Gemini, Ollama, and other LLM APIs so that models can be chosen per task and requests routed to a different model for each component. Custom algorithms can be wrapped as reusable plugins that the agent calls during code generation and execution, and embedding-based plugin selection loads only the plugins most relevant to each request.
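As a rough illustration of embedding-based plugin selection, the sketch below ranks plugins by cosine similarity between a request embedding and each plugin's description embedding, then keeps the top k. It is not TaskWeaver's actual implementation: the function names, the plugin names, and the three-dimensional toy vectors are all assumptions made for the example, and a real system would obtain the vectors from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_plugins(query_vec, plugin_vecs, top_k=2):
    """Return the names of the top_k plugins most similar to the query."""
    ranked = sorted(
        plugin_vecs,
        key=lambda name: cosine(query_vec, plugin_vecs[name]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical plugins with toy embeddings (assumption: real vectors
# would come from an embedding model, not be written by hand).
PLUGIN_VECS = {
    "anomaly_detection": [0.9, 0.1, 0.0],
    "sql_pull_data": [0.1, 0.9, 0.0],
    "send_email": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # A request whose embedding points mostly along the first axis.
    print(select_plugins([1.0, 0.2, 0.0], PLUGIN_VECS, top_k=2))
```

Loading only the highest-ranked plugins keeps the prompt short when many plugins are installed, at the cost of occasionally missing a plugin whose description is a poor match for the wording of the request.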
The system supports code-driven data analytics: it generates and executes Python code for data manipulation, analysis, and visualization, and session-based state management preserves context and data structures across interaction rounds. Generated code runs in a sandboxed environment, and a pre-execution verification step inspects it for potential issues and suggests fixes before it runs. For containerized deployment, Docker packages the agent and its dependencies for isolated, reproducible execution, with host filesystem access and web-based interaction.
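One plausible way to implement pre-execution verification is to parse the generated code with Python's standard ast module and walk the syntax tree before anything runs. The sketch below is illustrative only: the allow-list of modules, the block-list of calls, and the verify_code function are assumptions for this example, not TaskWeaver's actual rules or API.

```python
import ast

# Assumed policy for the example: which imports are allowed and
# which built-in calls are rejected.
ALLOWED_MODULES = {"pandas", "numpy", "matplotlib"}
BLOCKED_CALLS = {"eval", "exec", "open"}

def verify_code(source):
    """Return a list of issues; an empty list means the code passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: syntax error: {exc.msg}"]

    issues = []
    for node in ast.walk(tree):
        # Collect module names from both import forms.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            root = module.split(".")[0]
            if root not in ALLOWED_MODULES:
                issues.append(
                    f"line {node.lineno}: import of '{root}' is not allowed; "
                    f"use one of {sorted(ALLOWED_MODULES)}"
                )
        # Flag direct calls to blocked built-ins such as eval(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            issues.append(f"line {node.lineno}: call to '{node.func.id}' is blocked")
    return issues

if __name__ == "__main__":
    for issue in verify_code("import os\nresult = eval('1 + 1')"):
        print(issue)
```

Because the check is purely static, it never executes the generated code. The returned messages can be fed back to the code generator as fix suggestions. A static check of this kind complements the sandbox rather than replacing it, since it cannot catch everything (for example, calls reached through aliases).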
TaskWeaver offers both a web-based chat interface and a command-line interface. Chat history compression summarizes older conversation rounds to stay within context window limits. Planning and execution events are streamed to an external dashboard for real-time observability of agent behavior and performance. The system is configured through a single JSON project file and supports custom role definitions, incorporation of domain-specific knowledge, and per-component LLM assignment.
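The idea behind chat history compression can be sketched as follows: keep the most recent rounds verbatim and replace everything older with a single summary entry. This is a minimal sketch under stated assumptions, not TaskWeaver's implementation. The compress_history and naive_summarize names are hypothetical, and naive_summarize is a stand-in for what would normally be an LLM summarization call.

```python
def compress_history(rounds, summarize, keep_recent=2):
    """Replace all but the last keep_recent rounds with one summary entry."""
    if len(rounds) <= keep_recent:
        return list(rounds)  # nothing old enough to compress
    split = len(rounds) - keep_recent
    older, recent = rounds[:split], rounds[split:]
    return [f"[summary] {summarize(older)}"] + list(recent)

def naive_summarize(rounds):
    """Placeholder summarizer (assumption: a real one would call an LLM)."""
    return f"{len(rounds)} earlier rounds; first: {rounds[0][:40]}"

if __name__ == "__main__":
    history = ["load sales.csv", "plot revenue", "find anomalies", "export report"]
    for entry in compress_history(history, naive_summarize, keep_recent=2):
        print(entry)
```

Keeping the latest rounds uncompressed preserves the detail the agent most likely needs for the next turn, while the summary bounds how much of the context window older rounds can consume.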