
Architecture

CodexA is organized as a layered system with 26 packages under semantic_code_intelligence/.

High-Level Overview

mermaid
graph TB
    subgraph "Interface Layer"
        CLI["CLI<br/>36 click commands"]
        WEB["Web UI<br/>REST API + browser"]
        BRIDGE["Bridge Server<br/>HTTP JSON protocol"]
        MCP["MCP Server<br/>Model Context Protocol"]
        LSP["LSP Server<br/>Language Server Protocol"]
        TUI["TUI<br/>Interactive terminal"]
    end

    subgraph "Intelligence Layer"
        TOOLS["Tool Protocol<br/>8 built-in tools"]
        LLM["LLM Providers<br/>OpenAI · Ollama · Mock"]
        CONTEXT["Context Builder<br/>Symbol · Call Graph · Deps"]
        EVOLUTION["Evolution Engine<br/>Self-improving loop"]
    end

    subgraph "Analysis Layer"
        CI["Quality & CI<br/>Metrics · Hotspots · Impact"]
        ANALYSIS["Code Analysis<br/>Explanation · Summary"]
        WORKFLOW["Workflow Intel<br/>Trace · Hotspots · Impact"]
    end

    subgraph "Core Layer"
        PARSING["Parsing<br/>tree-sitter · 12 languages"]
        INDEXING["Indexing<br/>Chunking · Scanning"]
        EMBEDDINGS["Embeddings<br/>sentence-transformers"]
        SEARCH["Search<br/>FAISS · BM25 · Hybrid"]
    end

    subgraph "Foundation Layer"
        STORAGE["Storage<br/>VectorStore · SymbolRegistry"]
        CONFIG["Config<br/>Pydantic models"]
        PLUGINS["Plugins<br/>22 hooks · PluginManager"]
        WORKSPACE["Workspace<br/>Multi-repo management"]
    end

    CLI --> TOOLS
    WEB --> TOOLS
    BRIDGE --> TOOLS
    MCP --> TOOLS
    TUI --> TOOLS

    TOOLS --> CONTEXT
    TOOLS --> LLM
    TOOLS --> CI
    TOOLS --> ANALYSIS

    EVOLUTION --> CI
    EVOLUTION --> LLM

    CI --> PARSING
    ANALYSIS --> CONTEXT
    CONTEXT --> SEARCH
    CONTEXT --> PARSING

    SEARCH --> EMBEDDINGS
    SEARCH --> INDEXING
    INDEXING --> PARSING

    PARSING --> STORAGE
    EMBEDDINGS --> STORAGE
    SEARCH --> STORAGE
    INDEXING --> STORAGE

    PLUGINS -.->|hooks| CLI
    PLUGINS -.->|hooks| SEARCH
    PLUGINS -.->|hooks| INDEXING
    PLUGINS -.->|hooks| CI
    PLUGINS -.->|hooks| TOOLS

Layer Descriptions

Interface Layer

The entry points for users and AI agents. All interfaces share the same underlying tool protocol.

| Package | Purpose |
| --- | --- |
| cli | 36 Click commands with --json, --pipe, and --verbose flags |
| web | Lightweight HTTP server with REST API and browser UI |
| bridge | Stateless JSON/HTTP bridge for IDE extensions |
| mcp | Model Context Protocol server (official MCP SDK) |
| lsp | Language Server Protocol for editor integration |
| tui | Textual-based interactive terminal REPL |

Intelligence Layer

Orchestrates AI-powered features and tool execution.

| Package | Purpose |
| --- | --- |
| tools | ToolExecutor, ToolInvocation, ToolRegistry; 8 built-in tools |
| llm | Provider abstraction: OpenAI, Ollama, Mock, with caching and streaming |
| context | ContextBuilder, ContextWindow, CallGraph, DependencyMap |
| evolution | EvolutionEngine, BudgetGuard, TaskSelector, PatchGenerator |
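
To make the tool protocol more concrete, here is a minimal sketch of how a registry and executor could fit together. The class names ToolRegistry, ToolExecutor, ToolInvocation, and ToolExecutionResult appear in this document, but every field, method signature, and the `fake_semantic_search` placeholder below are assumptions made for illustration, not CodexA's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToolInvocation:
    """A request to run one named tool (fields are assumed, not CodexA's)."""
    tool_name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToolExecutionResult:
    """Outcome of one invocation; `error` is set only on failure."""
    tool_name: str
    success: bool
    payload: Any = None
    error: Optional[str] = None


class ToolRegistry:
    """Maps tool names to plain Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._tools[name] = func

    def get(self, name: str) -> Optional[Callable[..., Any]]:
        return self._tools.get(name)


class ToolExecutor:
    """Looks up a tool and runs it, converting failures into result objects."""

    def __init__(self, registry: ToolRegistry) -> None:
        self._registry = registry

    def execute(self, invocation: ToolInvocation) -> ToolExecutionResult:
        tool = self._registry.get(invocation.tool_name)
        if tool is None:
            return ToolExecutionResult(
                invocation.tool_name, False,
                error=f"unknown tool: {invocation.tool_name}",
            )
        try:
            payload = tool(**invocation.arguments)
        except Exception as exc:  # report tool failures instead of raising
            return ToolExecutionResult(invocation.tool_name, False, error=str(exc))
        return ToolExecutionResult(invocation.tool_name, True, payload=payload)


def fake_semantic_search(query: str, top_k: int = 10) -> List[str]:
    # Hypothetical placeholder; a real tool would query the search layer.
    return [f"result for {query!r}"][:top_k]


registry = ToolRegistry()
registry.register("semantic_search", fake_semantic_search)
executor = ToolExecutor(registry)
result = executor.execute(
    ToolInvocation("semantic_search", {"query": "auth middleware"})
)
```

Because every interface would build a ToolInvocation and hand it to the same executor, a design along these lines keeps the CLI, web, and MCP front ends thin and lets them return identical results for identical requests.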

Analysis Layer

Code quality, metrics, and workflow intelligence.

| Package | Purpose |
| --- | --- |
| ci | Quality analysis, metrics snapshots, hotspot detection, impact analysis |
| analysis | RepoSummary, CodeExplanation, LanguageStats |
| services | IndexingResult, SearchResult: service-layer abstractions |

Core Layer

Parsing, indexing, embedding, and search infrastructure.

| Package | Purpose |
| --- | --- |
| parsing | tree-sitter AST parsing for 12 languages, Symbol extraction |
| indexing | CodeChunk, ScannedFile, SemanticChunk: file scanning and chunking |
| embeddings | sentence-transformers model management (all-MiniLM-L6-v2) |
| search | FAISS vector search, BM25 keyword search, hybrid RRF fusion |

Foundation Layer

Storage, configuration, plugins, and workspace management.

| Package | Purpose |
| --- | --- |
| storage | VectorStore, SymbolRegistry, IndexStats, QueryHistory |
| config | Pydantic models: AppConfig, EmbeddingConfig, SearchConfig, LLMConfig |
| plugins | PluginManager, PluginBase, 22 PluginHook points |
| workspace | Workspace, WorkspaceManifest: multi-repo management |
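
As an illustration of how hook points can work, the sketch below shows one common way a plugin manager dispatches data through registered handlers. The names PluginManager and PluginHook, and the three hook names, come from this document; the `register` and `dispatch` methods and the handler signature are assumptions, and the real plugin API may differ.

```python
from collections import defaultdict
from enum import Enum
from typing import Any, Callable, DefaultDict, List


class PluginHook(Enum):
    # Only the three hooks named in this document; the real enum has 22.
    PRE_INDEX = "pre_index"
    ON_CHUNK = "on_chunk"
    POST_INDEX = "post_index"


Handler = Callable[[Any], Any]


class PluginManager:
    """Keeps an ordered list of handlers per hook (assumed design)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[PluginHook, List[Handler]] = defaultdict(list)

    def register(self, hook: PluginHook, handler: Handler) -> None:
        self._handlers[hook].append(handler)

    def dispatch(self, hook: PluginHook, data: Any) -> Any:
        # Each handler receives the previous handler's output, so plugins
        # can transform the data in registration order.
        for handler in self._handlers[hook]:
            data = handler(data)
        return data


def strip_trailing_whitespace(chunk: str) -> str:
    """Example plugin handler: normalize a chunk before it is embedded."""
    return "\n".join(line.rstrip() for line in chunk.splitlines())


manager = PluginManager()
manager.register(PluginHook.ON_CHUNK, strip_trailing_whitespace)
cleaned = manager.dispatch(PluginHook.ON_CHUNK, "def f():   \n    return 1  ")
```

A hook with no registered handlers returns its input unchanged, so core code can call `dispatch` unconditionally.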

Data Flow

mermaid
sequenceDiagram
    participant User
    participant CLI
    participant ToolExecutor
    participant Search
    participant FAISS
    participant Embeddings

    User->>CLI: codex search "auth middleware"
    CLI->>ToolExecutor: execute(semantic_search, {query: "auth middleware"})
    ToolExecutor->>Search: hybrid_search(query, top_k=10)
    Search->>Embeddings: encode("auth middleware")
    Embeddings-->>Search: vector [384 dims]
    Search->>FAISS: similarity_search(vector, k=10)
    FAISS-->>Search: [(chunk_id, score), ...]
    Search->>Search: BM25 keyword search
    Search->>Search: RRF fusion (k=60)
    Search-->>ToolExecutor: HybridResult[]
    ToolExecutor-->>CLI: ToolExecutionResult
    CLI-->>User: Formatted results (rich / JSON / pipe)
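
The "RRF fusion (k=60)" step above refers to reciprocal rank fusion, which merges ranked lists by summing 1 / (k + rank) for each item across the lists. The following is a minimal sketch of that formula; the function name `rrf_fuse`, its parameters, and the sample chunk IDs are hypothetical and not taken from CodexA's code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(
    rankings: Sequence[Sequence[str]], k: int = 60, top_k: int = 10
) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    Items missing from a list contribute nothing for that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


# Hypothetical result lists from the vector and keyword searches.
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
keyword_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = rrf_fuse([vector_hits, keyword_hits])
fused_ids = [chunk_id for chunk_id, _ in fused]
```

Here `chunk_b` ranks first because it places near the top of both lists, while chunks that appear in only one list fall to the bottom. RRF uses ranks rather than raw scores, which avoids having to normalize FAISS similarity values against BM25 scores.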

Indexing Pipeline

mermaid
flowchart LR
    A[Source Files] -->|scan| B[ScannedFile]
    B -->|parse| C[tree-sitter AST]
    C -->|extract| D[Symbol Registry]
    B -->|chunk| E[CodeChunk]
    E -->|embed| F[Vectors 384d]
    F -->|store| G[FAISS Index]

    H[PluginHook.PRE_INDEX] -.-> A
    I[PluginHook.ON_CHUNK] -.-> E
    J[PluginHook.POST_INDEX] -.-> G
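
The stages in the flowchart above can be sketched end to end with stand-ins from the standard library. Everything here is an assumption made for illustration: Python's `ast` module substitutes for tree-sitter, a hash-based toy vector substitutes for sentence-transformers (it carries no semantic meaning), and a plain list substitutes for the FAISS index. The ScannedFile and CodeChunk names come from this document, but their fields are invented.

```python
import ast
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

EMBED_DIM = 384  # matches the vector size shown in the diagram


@dataclass
class ScannedFile:
    path: str
    text: str


@dataclass
class CodeChunk:
    path: str
    start_line: int  # 1-based line where the chunk begins
    text: str


def scan(files: Dict[str, str]) -> List[ScannedFile]:
    """Stand-in for directory scanning: wrap in-memory sources."""
    return [ScannedFile(path, text) for path, text in sorted(files.items())]


def extract_symbols(scanned: ScannedFile) -> List[str]:
    """Stand-in for tree-sitter: list function and class names via ast."""
    tree = ast.parse(scanned.text)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [node.name for node in ast.walk(tree) if isinstance(node, kinds)]


def chunk(scanned: ScannedFile, max_lines: int = 2) -> List[CodeChunk]:
    """Split a file into fixed-size line windows (a deliberately naive policy)."""
    lines = scanned.text.splitlines()
    return [
        CodeChunk(scanned.path, i + 1, "\n".join(lines[i:i + max_lines]))
        for i in range(0, len(lines), max_lines)
    ]


def embed(text: str) -> List[float]:
    """Toy deterministic vector; NOT a semantic embedding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(EMBED_DIM)]


files = {
    "auth.py": (
        "def login(user):\n"
        "    return check(user)\n"
        "\n"
        "def check(user):\n"
        "    return bool(user)\n"
    )
}

symbol_registry: Dict[str, List[str]] = {}
vector_index: List[Tuple[CodeChunk, List[float]]] = []

for scanned in scan(files):
    # Parse branch: file -> AST -> symbol registry.
    symbol_registry[scanned.path] = extract_symbols(scanned)
    # Chunk branch: file -> chunks -> vectors -> index.
    for code_chunk in chunk(scanned):
        vector_index.append((code_chunk, embed(code_chunk.text)))
```

The two branches of the loop mirror the diagram: symbol extraction and chunking both start from the scanned file and proceed independently.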

Directory Layout

semantic_code_intelligence/
├── __init__.py
├── analysis/        # Code explanation & summary
├── bridge/          # HTTP bridge server
├── ci/              # Quality, metrics, hotspots, impact, gates
├── cli/             # 36 Click commands
├── config/          # Pydantic configuration models
├── context/         # Context builder, call graph, deps
├── daemon/          # Background file watcher & indexer
├── docs/            # Documentation generation utilities
├── embeddings/      # Sentence-transformer model management
├── evolution/       # Self-improving development loop
├── indexing/        # Code chunking & file scanning
├── llm/             # LLM providers (OpenAI, Ollama, Mock)
├── lsp/             # Language Server Protocol
├── mcp/             # Model Context Protocol server
├── parsing/         # tree-sitter AST parsing (12 languages)
├── plugins/         # Plugin system (22 hooks)
├── scalability/     # Batch processing, memory-aware embedding
├── search/          # FAISS vector + BM25 + hybrid search
├── services/        # Service-layer result types
├── storage/         # VectorStore, SymbolRegistry, caches
├── tools/           # AI Agent Tool Protocol (8 tools)
├── tui/             # Textual interactive terminal
├── utils/           # Logging utilities
├── web/             # Web UI + REST API
└── workspace/       # Multi-repo workspace management

Released under the MIT License.