2. Orchestration & Memory
AIRS (AI-Oriented Resource Scheduler): Predicts short- and mid-term resource demand from historical task patterns, real-time workload, and node conditions, enabling intelligent scheduling of GPU/TPU/CPU resources that prioritizes latency-sensitive tasks.
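A minimal sketch of the two ideas named above, demand forecasting plus priority scheduling, assuming a toy single-resource model; class and method names (ResourceScheduler, predict_demand) are illustrative, not the real AIRS API:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                          # 0 = latency-sensitive, 1 = batch
    name: str = field(compare=False)
    gpu_demand: float = field(compare=False)

class ResourceScheduler:
    """Toy AIRS-style scheduler over a single GPU capacity budget."""

    def __init__(self, gpu_capacity: float):
        self.gpu_capacity = gpu_capacity
        self.queue: list[Task] = []

    def submit(self, name: str, gpu_demand: float, latency_sensitive: bool) -> None:
        priority = 0 if latency_sensitive else 1
        heapq.heappush(self.queue, Task(priority, name, gpu_demand))

    def predict_demand(self, history: list[float], window: int = 3) -> float:
        """Naive short-term forecast: moving average of recent usage samples."""
        recent = history[-window:]
        return sum(recent) / len(recent)

    def schedule(self) -> list[str]:
        """Admit tasks in priority order until GPU capacity is exhausted."""
        admitted, used = [], 0.0
        while self.queue and used + self.queue[0].gpu_demand <= self.gpu_capacity:
            task = heapq.heappop(self.queue)
            used += task.gpu_demand
            admitted.append(task.name)
        return admitted
```

With a 2.0-GPU budget, a 1.0-GPU latency-sensitive serving task is admitted ahead of a 1.5-GPU batch job even if the batch job was submitted first; a production scheduler would add preemption, per-accelerator pools, and a real forecasting model.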
Layered Context Storage (Session Memory): Abstracts conversational context into semantic blocks and manages them hierarchically across a hot cache, warm cache, and cold storage, balancing fast read/write access against cost-effective long-term retention.
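The tiering described above can be sketched as an LRU hierarchy: writes land in the hot tier, the least-recently-used blocks overflow downward, and reads promote a block back up. The class name, tier capacities, and LRU policy are assumptions for illustration, not Cottonia's actual implementation:

```python
from collections import OrderedDict

class LayeredContextStore:
    """Toy hot/warm/cold store for semantic context blocks.

    Hot and warm tiers are bounded LRU maps; cold storage is unbounded.
    Any read promotes the block back to the hot tier.
    """

    def __init__(self, hot_capacity: int = 2, warm_capacity: int = 4):
        self.hot: OrderedDict[str, str] = OrderedDict()
        self.warm: OrderedDict[str, str] = OrderedDict()
        self.cold: dict[str, str] = {}
        self.hot_capacity = hot_capacity
        self.warm_capacity = warm_capacity

    def put(self, block_id: str, content: str) -> None:
        self.hot[block_id] = content
        self.hot.move_to_end(block_id)        # mark as most recently used
        self._evict()

    def get(self, block_id: str):
        for tier in (self.hot, self.warm, self.cold):
            if block_id in tier:
                content = tier.pop(block_id)
                self.put(block_id, content)   # promote back to hot
                return content
        return None

    def _evict(self) -> None:
        while len(self.hot) > self.hot_capacity:
            k, v = self.hot.popitem(last=False)    # demote LRU block to warm
            self.warm[k] = v
        while len(self.warm) > self.warm_capacity:
            k, v = self.warm.popitem(last=False)   # demote LRU block to cold
            self.cold[k] = v
```

In a real deployment the hot tier would map to process memory, warm to something like Redis, and cold to object storage, with serialization at each boundary.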
Partial Inference & Recomposition: For long-context tasks, Cottonia supports splitting inference into multiple stages, caching intermediate states and reusing them when needed, avoiding redundant recomputation from scratch.