Lucida is a multimodal AI assistant framework and containerized microservice orchestrator. It provides a platform for building agents that process and integrate speech, vision, and text inputs to perform intelligent tasks, supported by a retrieval-augmented generation system for storing and querying factual data from texts, URLs, and images. The framework features a state-graph workflow engine to route user requests through a sequence of microservices using a predefined state machine. It also includes an extensible plugin interface that allows for the integration of custom functional modules
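As a rough illustration of the state-graph routing described above, the following minimal Python sketch passes a request through a fixed sequence of stand-in services. Every name in it (the `ASR`, `IMM`, and `QA` states, the handler functions, and `run_workflow`) is a hypothetical placeholder chosen for this example, not Lucida's actual API, and each "microservice" is a local function rather than a containerized service.

```python
from typing import Callable, Dict, List, Optional, Tuple

Payload = Dict[str, str]
Service = Callable[[Payload], Payload]


# Hypothetical stand-ins for microservices; a real deployment would call
# separate containers over the network instead of local functions.
def asr_service(payload: Payload) -> Payload:
    # Pretend to transcribe audio by stripping a marker prefix.
    return {**payload, "text": payload["audio"].replace("audio:", "")}


def vision_service(payload: Payload) -> Payload:
    # Pretend to label the image.
    return {**payload, "labels": "label-for-" + payload["image"]}


def qa_service(payload: Payload) -> Payload:
    # Combine the transcript with any visual context into an answer string.
    context = payload.get("labels", "none")
    return {**payload, "answer": f"Q={payload['text']} ctx={context}"}


# Assumed state names mapped to their handlers.
SERVICES: Dict[str, Service] = {
    "ASR": asr_service,
    "IMM": vision_service,
    "QA": qa_service,
}


def next_state(state: str, payload: Payload) -> Optional[str]:
    """Predefined transitions: ASR -> (IMM if an image is present) -> QA -> stop."""
    if state == "ASR":
        return "IMM" if "image" in payload else "QA"
    if state == "IMM":
        return "QA"
    return None  # QA is the terminal state


def run_workflow(payload: Payload, start: str = "ASR") -> Tuple[Payload, List[str]]:
    """Run the payload through the state machine and record the visited states."""
    state: Optional[str] = start
    visited: List[str] = []
    while state is not None:
        payload = SERVICES[state](payload)
        visited.append(state)
        state = next_state(state, payload)
    return payload, visited
```

Calling `run_workflow({"audio": "audio:what is this", "image": "cat.jpg"})` visits all three states in order, while a request without an `image` key skips the vision step. Keeping the transitions in one function separate from the handlers is one possible design choice: it lets the route change without touching the services themselves.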
This project is a framework for developing multimodal AI agents that function as programmable participants in real-time communication rooms. It enables the construction of agents that can see, hear, and speak by integrating speech-to-text, large language models, and text-to-speech pipelines to facilitate low-latency, natural conversations. The system is distinguished by its advanced orchestration of real-time media and conversational flow, including support for full-duplex speech, preemptive response generation, and sophisticated interruption management. It further differentiates itself throu