Modern Data+AI Stack - Spec Tables
| Software | Purpose | License | Version | Capability | Env | Est. Savings per Year |
|---|---|---|---|---|---|---|
| DuckDB | Storage and querying of structured data | MIT | 0.9.2+ | In-process analytical database | All | $25,000-$100,000 |
| dbt | Define, test, and document data transformations | Apache 2.0 | 1.6.0+ | Data transformation framework | All | $0-$20,000 |
| SQLite | Lightweight storage for metadata | Public Domain | 3.40.0+ | Embedded database | All | $1,000-$5,000 |
| LanceDB | Storage and searching of vector embeddings | Apache 2.0 | 0.5.0+ | Vector database | All | $10,000-$50,000 |
| KuzuDB | Embeddable graph database | MIT | 0.0.14+ | Graph database engine | All | $8,000-$40,000 |
| n8n | Integration and automation of data workflows | Fair Code | 1.0.0+ | Workflow automation | All | $12,000-$36,000 |
| uv | Dependency management and environment setup | MIT | 0.1.0+ | Python package manager | All | 200 eng hours |
| Evidence | Create interactive dashboards and reports | MIT | 2.0.0+ | Data visualization framework | All | $12,000-$50,000 |
| Rill Data | Real-time dashboards and data exploration | Apache 2.0 | 0.40.0+ | BI and self-service analytics platform | All | $10,000-$40,000 |
| Docling | Extract structured data from documents | MIT | 2.30.0+ | Document processing toolkit | All | $5,000-$25,000 |
| Marimo | Data exploration and analysis | Apache 2.0 | 0.1.0+ | Interactive notebooks | All | $5,000-$15,000 |
| LocalStack | Local AWS emulation for development | Apache 2.0 | 2.0.0+ | Local cloud service emulator | Local | $5,000-$30,000 |
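
The ranges in the Est. Savings per Year column can be totaled to get a rough combined figure. The sketch below is only illustrative: it copies the dollar-denominated bounds from the table above, skips the row expressed in engineering hours, and assumes the per-tool estimates are independent and can simply be added. The names `SAVINGS_RANGES` and `total_range` are hypothetical.

```python
# Low/high annual savings estimates (USD) copied from the table above.
# The uv row is omitted because it is given in engineering hours, not dollars.
SAVINGS_RANGES = {
    "DuckDB": (25_000, 100_000),
    "dbt": (0, 20_000),
    "SQLite": (1_000, 5_000),
    "LanceDB": (10_000, 50_000),
    "KuzuDB": (8_000, 40_000),
    "n8n": (12_000, 36_000),
    "Evidence": (12_000, 50_000),
    "Rill Data": (10_000, 40_000),
    "Docling": (5_000, 25_000),
    "Marimo": (5_000, 15_000),
    "LocalStack": (5_000, 30_000),
}


def total_range(ranges):
    """Sum the low bounds and the high bounds separately.

    Assumption: the estimates are independent, so the bounds are additive.
    """
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(SAVINGS_RANGES)
    print(f"Estimated combined savings: ${low:,} to ${high:,} per year")
```

Under these assumptions the core components add up to roughly $93,000 to $411,000 per year. Adopting only a subset of the tools is a matter of removing the corresponding entries from the dictionary.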

| Software | Purpose | License | Version | Capability | Env | Est. Savings per Year |
|---|---|---|---|---|---|---|
| Python | Core runtime for most components | PSF License | 3.11+ | Programming language | All | $0 |
| Node.js | Required for Evidence dashboards | MIT | 18.0.0+ | JavaScript runtime | All | $0 |
| Git | Track changes to code and configuration | GPL-2.0 | 2.35.0+ | Version control system | All | $0-$5,000 |
| Ghostty | High-performance terminal emulator | MIT | Latest | GPU-accelerated terminal | All | 200 eng hours |
| Warp | Modern, Rust-based terminal | Proprietary | Latest | AI-enhanced terminal with blocks | macOS | 200 eng hours |
| Airflow | Alternative for scheduling and monitoring workflows | Apache 2.0 | 2.6.0+ | Workflow orchestration | All | $10,000-$40,000 |
| Obsidian | Documentation viewer with diagrams & wikilinks | Proprietary | 1.4.5+ | Markdown knowledge base | Local | 500 eng hours |
| Logseq | Knowledge graph and outliner | AGPL-3.0 | 0.9.0+ | Connected note-taking system | All | 300 eng hours |
| Cursor | Development environment with AI capabilities | Proprietary | Latest | AI-assisted code editor | Local | 500 eng hours |
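
The minimum versions in the Version column can be checked on a workstation before setup. The following is a minimal sketch, not an official installer check: the helper names (`parse_version`, `meets_minimum`, `check_tool`) are hypothetical, the minimums are copied from the table above, and it assumes each command-line tool prints its version when called with `--version`.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions copied from the table above (command name -> minimum).
MINIMUMS = {"node": "18.0.0", "git": "2.35.0"}


def parse_version(text):
    """Extract the first dotted numeric version in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_tool(command, minimum):
    """Return a one-line status string for a command-line tool."""
    if shutil.which(command) is None:
        return f"{command}: not found on PATH"
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=False
    )
    try:
        ok = meets_minimum(result.stdout, minimum)
    except ValueError:
        return f"{command}: could not read version"
    status = "OK" if ok else f"below required {minimum}"
    return f"{command}: {result.stdout.strip()} ({status})"


if __name__ == "__main__":
    # Python 3.11+ is the core runtime listed in the table.
    py_ok = sys.version_info >= (3, 11)
    print(f"python: {sys.version.split()[0]} ({'OK' if py_ok else 'below required 3.11'})")
    for command, minimum in MINIMUMS.items():
        print(check_tool(command, minimum))
```

Versions are compared as integer tuples rather than strings, so that `2.9.0` correctly sorts below `2.35.0`.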
Optional & Extension Components
| Software | Purpose | License | Version | Capability | Env | Est. Savings per Year |
|---|---|---|---|---|---|---|
| MinIO | Local alternative to AWS S3 for data lake storage | AGPL-3.0 | Latest | S3-compatible object storage | Local | $1,000-$10,000 |
| AWS S3 | Scalable data lake storage | N/A (Service) | N/A | Cloud object storage | Cloud | N/A |
| PostgreSQL | Alternative storage for metadata | PostgreSQL License | 14.0+ | Relational database | Any | $6,000-$30,000 |
| Alembic | Database schema migrations | MIT | 1.10.0+ | Migration tool | All | $0 |
| DBeaver | Universal database tool | Apache 2.0 | 23.0.0+ | Database GUI | All | $2,000-$8,000 |
| Superset | Alternative to Evidence for visualization | Apache 2.0 | 2.1.0+ | BI platform | Any | $12,000-$50,000 |
AI Components
| Software | Purpose | License | Version | Capability | Env | Est. Savings per Year |
|---|---|---|---|---|---|---|
| LibreChat | Self-hosted AI chat interface | MIT | 0.6.0+ | Chat interface for multiple LLMs | All | $8,000-$25,000 |
| Claude Desktop | Local desktop app for Claude AI | Proprietary | Latest | Desktop client for Anthropic’s Claude | Local | $0-$10,000 |
| CrewAI | Framework for orchestrating AI agents | MIT | 0.22.0+ | Multi-agent orchestration framework | All | $15,000-$40,000 |
| Restack | Local LLM and AI stack deployment | Proprietary | Latest | One-click AI deployment platform | All | $5,000-$20,000 |
| AWS Bedrock | Managed service for foundation models | Proprietary | Latest | Access to multiple foundation models | Cloud | Pay-per-use |
| Ollama | Run open source LLMs locally | MIT | 0.1.19+ | Local LLM runner | All | $5,000-$25,000 |
| LlamaIndex | Data framework for LLM applications | MIT | 0.9.0+ | RAG framework | All | $10,000-$30,000 |
| GGML | Machine learning library for edge devices | MIT | 0.1.0+ | Tensor library for efficient inference | All | $3,000-$15,000 |