Portable Data Stack Tooling - Project Plan

Short Task List (Prioritized)

  1. Implement Marimo service (port 3001)
  2. Integrate Prefect (port 4200)
  3. Add n8n workflow automation (port 5678)
  4. Set up Evidence dashboard (port 3000)
  5. Launch Docling API/UI (port 5001)
  6. Integrate DuckDB and UI (port 4213)
  7. Integrate LanceDB (local vector DB, port 4230)
  8. [Optional, implement-later] Add Dagster support (enabled via flag)
  9. [Optional, implement-later] Add Airflow support (enabled via flag)
  10. [Optional] Add OpenWebUI (port 4240), llama-cpp-python (port 4250), Qwen3:4b (port 4260), and uv

Core Features

Setup and Configuration

  • Basic script structure and logging
  • Unified script for setup and runtime operations
  • Support for non-interactive mode (-y flag)
  • Configuration file for customization (config.yaml)
  • Template-based approach for all components
  • Dependency version management
  • Integrate DuckDB installation and UI launch into pqs-cli.sh
  • Service management and status reporting refactored in pqs-cli.sh
  • Colored output and summary table for user experience in pqs-cli.sh
  • Configurable ports via environment variables or arguments in pqs-cli.sh
  • Error handling for missing dependencies and failed service starts in pqs-cli.sh
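
The "configurable ports via environment variables or arguments" item can be illustrated with a small helper. This is only a sketch: the names resolve_port, cli_port and PQS_MARIMO_PORT are hypothetical and are not taken from pqs-cli.sh, and the precedence shown (argument, then environment variable, then default) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are hypothetical, not from pqs-cli.sh.

# Print the first non-empty argument; fail if every argument is empty.
first_nonempty() {
  local value
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

# Succeed only for an integer inside the TCP port range 1-65535.
is_valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Assumed precedence: command-line value, then environment value, then default.
resolve_port() {
  local cli_port="$1" env_port="$2" default_port="$3" port
  port="$(first_nonempty "$cli_port" "$env_port" "$default_port")" || return 1
  is_valid_port "$port" || return 1
  printf '%s\n' "$port"
}

# Example: Marimo falls back to its documented default of 3001.
resolve_port "${cli_port:-}" "${PQS_MARIMO_PORT:-}" 3001
```

Passing the three candidate values in explicitly keeps the helper free of indirect variable lookups, so it can be exercised without touching the real environment.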

Data Stack Components (Implementation Order)

  • Marimo notebooks (port 3001)
  • Prefect orchestration (port 4200)
  • n8n workflow automation (port 5678)
  • Evidence dashboards (port 3000)
  • Docling API/UI (port 5001)
  • DuckDB database and UI (port 4213)
  • LanceDB (local vector DB, port 4230)
  • [Optional, implement-later] Dagster orchestration (enabled via flag)
  • [Optional, implement-later] Airflow orchestration (enabled via flag)
  • [Optional] OpenWebUI (LLM UI, port 4240)
  • [Optional] llama-cpp-python (LLM backend, port 4250)
  • [Optional] Qwen3:4b (LLM model, port 4260)
  • [Optional] uv (Python package manager)
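
The default ports listed above could be kept in one lookup function so that every command reads them from a single place. A minimal sketch follows; the lowercase service keys and the function name default_port are assumptions, while the port numbers are the ones in this plan.

```shell
#!/usr/bin/env bash
# Sketch only: service keys and function name are hypothetical.
# Port numbers are the defaults listed in this plan.
default_port() {
  case "$1" in
    marimo)           echo 3001 ;;
    prefect)          echo 4200 ;;
    n8n)              echo 5678 ;;
    evidence)         echo 3000 ;;
    docling)          echo 5001 ;;
    duckdb)           echo 4213 ;;
    lancedb)          echo 4230 ;;
    openwebui)        echo 4240 ;;
    llama-cpp-python) echo 4250 ;;
    qwen3)            echo 4260 ;;
    *)                return 1 ;;  # unknown service, or one without a port (e.g. uv)
  esac
}

# Example: list the core services with their default ports.
for service in marimo prefect n8n evidence docling duckdb lancedb; do
  printf '%-10s %s\n' "$service" "$(default_port "$service")"
done
```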

Runtime Operations

  • Start/stop services (pqs-cli.sh)
  • [/] Materialize assets with prefect-dbt
  • Automatic browser opening for web UIs (only for just-started services)
  • [/] Service health monitoring (optional, via --health flag)
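
The health check and the "only for just-started services" browser rule might look roughly like the sketch below. All function names are hypothetical, the TCP probe is just one possible notion of "healthy" (pqs-cli.sh may check something else), and the probe relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not from pqs-cli.sh.

# Succeed if something accepts TCP connections on localhost:<port>.
# Relies on bash's /dev/tcp pseudo-device.
port_is_open() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Open the UI only when the service was down before the start command
# and is up afterwards, i.e. this run actually started it.
should_open_browser() {
  local was_running="$1" is_running="$2"
  [ "$was_running" = "no" ] && [ "$is_running" = "yes" ]
}

# Use the platform's URL opener: xdg-open on Linux, open on macOS.
open_url() {
  if command -v xdg-open >/dev/null 2>&1; then
    xdg-open "$1"
  elif command -v open >/dev/null 2>&1; then
    open "$1"
  else
    printf 'Open manually: %s\n' "$1"
  fi
}

# Intended flow (commented out so the sketch has no side effects):
#   port_is_open 3001 && before=yes || before=no
#   ... start Marimo ...
#   port_is_open 3001 && after=yes || after=no
#   should_open_browser "$before" "$after" && open_url "http://localhost:3001"
```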

Documentation

  • Basic README documentation
  • Marimo usage guide
  • Prefect integration guide
  • n8n workflow documentation
  • DuckDB integration guide
  • dbt usage examples
  • Evidence dashboard templates
  • Docling API/UI usage
  • LanceDB usage guide
  • OpenWebUI/LLM integration guide

Project Structure

Core Files

  • README.md - Documentation
  • config/config.yaml - Configuration settings
  • pyproject.toml - Python package configuration
  • pqs-cli.sh - Main setup and management script

Refactored Structure (by tool order)

  • src/marimo/ - Marimo notebooks
    • notebooks/ - Interactive notebooks
  • src/prefect/ - Prefect orchestration
    • flows/ - Prefect flow definitions
  • src/n8n/ - n8n workflow automation
    • workflows/ - n8n workflow definitions
    • resources/ - Shared resources for workflows
  • src/evidence/ - Evidence dashboards
    • pages/ - Dashboard pages
    • sources/ - Data sources
  • src/docling/ - Docling API/UI
  • src/duckdb/ - DuckDB database and UI
  • src/lancedb/ - LanceDB vector database
  • [Optional] src/dagster/ - Dagster orchestration (implement-later)
  • [Optional] src/airflow/ - Airflow orchestration (implement-later)
  • [Optional] src/openwebui/ - OpenWebUI
  • [Optional] src/llama_cpp_python/ - llama-cpp-python
  • [Optional] src/qwen3_4b/ - Qwen3:4b model
  • [Optional] src/uv/ - uv
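
For reference, the core part of this layout can be recreated with a short loop. This is a sketch under the assumption that the directories sit directly under a project root; the function name scaffold_src is hypothetical, and the optional directories are left out.

```shell
#!/usr/bin/env bash
# Sketch only: scaffold_src is a hypothetical helper, not part of pqs-cli.sh.
# Creates the core src/ layout listed above under the given root directory.
scaffold_src() {
  local root="$1" dir
  for dir in \
    src/marimo/notebooks \
    src/prefect/flows \
    src/n8n/workflows \
    src/n8n/resources \
    src/evidence/pages \
    src/evidence/sources \
    src/docling \
    src/duckdb \
    src/lancedb
  do
    # -p makes the call idempotent: existing directories are left alone.
    mkdir -p "$root/$dir" || return 1
  done
}

# Usage: scaffold_src /path/to/project
```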

Script Status: pqs-cli.sh

  • Robust service management (start/stop/status/health)
  • Colored output and summary table for user clarity
  • Configurable ports and improved error handling
  • Only opens browser for just-started services
  • Optional health checks via --health flag
  • All core stack services supported (Marimo, Prefect, n8n, Evidence, Docling, DuckDB, LanceDB)
  • Modern CLI: Enhanced help, command parsing, and option handling (2024-06)
  • [Optional, implement-later] Dagster and Airflow support (enabled via flag)
  • [Optional] OpenWebUI, llama-cpp-python, Qwen3:4b, and uv
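
As an illustration of the colored summary table, the sketch below prints one aligned row per service and adds ANSI color only when stdout is a terminal, so piped or logged output stays plain. The function name, column widths, and state labels are assumptions, not the actual pqs-cli.sh output format.

```shell
#!/usr/bin/env bash
# Sketch only: names, widths, and state labels are hypothetical.

# Print one table row: service name, port, and state.
# "running" is shown in green, anything else in red, but only on a terminal.
status_row() {
  local name="$1" port="$2" state="$3" color='' reset=''
  if [ -t 1 ]; then
    reset='\033[0m'
    if [ "$state" = "running" ]; then
      color='\033[32m'
    else
      color='\033[31m'
    fi
  fi
  # %b expands the escape sequences stored in color and reset.
  printf '%-10s %-6s %b%s%b\n' "$name" "$port" "$color" "$state" "$reset"
}

# Example table with made-up states.
printf '%-10s %-6s %s\n' "SERVICE" "PORT" "STATE"
status_row "Marimo" 3001 "running"
status_row "Prefect" 4200 "stopped"
```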

Tasks Completed

  1. Created a structured src/ directory for all components
  2. Moved data generators to proper assets directory
  3. Moved dbt SQL models to proper marts directory
  4. Moved shared resources to n8n resources directory
  5. Created configuration file structure
  6. Created Marimo notebook for data exploration
  7. Created Evidence dashboard
  8. Set up proper Python package configuration with pyproject.toml
  9. Refactored and improved setup script as pqs-cli.sh (see Script Status above)
  10. Implemented --start Service1,Service2 flag for selective service startup in pqs-cli.sh
  11. CLI/UX refactor: Enhanced help, command parsing, and option handling in pqs-cli.sh (2024-06)
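
The selective startup flag from item 10 could be parsed along these lines. This is a sketch, not the actual pqs-cli.sh parser: the function names and the START_LIST variable are hypothetical, and trimming whitespace and dropping empty entries are assumptions about how the list should be handled.

```shell
#!/usr/bin/env bash
# Sketch only: names are hypothetical, not from pqs-cli.sh.

# Scan the arguments for "--start <list>" and store the raw list in START_LIST.
parse_start_flag() {
  START_LIST=''
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --start)
        if [ "$#" -lt 2 ]; then
          echo "--start needs a comma-separated service list" >&2
          return 1
        fi
        START_LIST="$2"
        shift 2
        ;;
      *)
        shift  # other options are ignored in this sketch
        ;;
    esac
  done
}

# Turn "Marimo, Prefect,,n8n" into one service name per line,
# trimming surrounding whitespace and dropping empty entries.
split_services() {
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Example: pqs-cli.sh -y --start Marimo,Prefect
parse_start_flag -y --start Marimo,Prefect
split_services "$START_LIST" | while IFS= read -r service; do
  echo "would start: $service"
done
```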

Next Tasks (Priority per item)

  1. [/] Create Evidence report explaining the services within the Portable Data Stack (PDS) (Priority: 1)
  2. Create clean-up functionality (Priority: 1)
  3. [/] Add testing framework (Priority: 2)
  4. [/] Create simplified lite mode (Priority: 3)
  5. [/] Add more dbt staging models (Priority: 4)
  6. [-] Integrate optional Dagster and Airflow support (Priority: 5)
    • Add flags for enabling/disabling
    • Update documentation and project structure if needed
  7. Test and document the new --start flag for selective service startup (Priority: 1)
  8. Add LanceDB integration and usage documentation (Priority: 2)
  9. Add OpenWebUI, llama-cpp-python, Qwen3:4b, uv as optional services (Priority: 3)

Technical Requirements

  • Use uv instead of pip for Python package management
  • Use pnpm instead of npm for Node.js package management
  • Support Python 3.11+ environments
  • Store project location for runtime operations
  • Keep consistent logging format throughout
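
Several of these requirements can be checked up front in the script. The sketch below shows one possible shape: a single log function so every message shares a format, a dependency check for uv and pnpm, a Python version check, and a resolved project directory. The function names, the timestamp format, and the PROJECT_DIR variable are assumptions; how the project location is persisted between runs is left open here.

```shell
#!/usr/bin/env bash
# Sketch only: names and log format are hypothetical, not from pqs-cli.sh.

# One log function keeps the format consistent: "<timestamp> [LEVEL] message".
log() {
  local level="$1"
  shift
  printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*"
}

# Fail with a logged error if a required command is not on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    log ERROR "missing dependency: $1"
    return 1
  fi
}

# Succeed only if python3 exists and is version 3.11 or newer.
python_is_supported() {
  command -v python3 >/dev/null 2>&1 &&
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 11) else 1)'
}

# Absolute path of the directory containing this script.
PROJECT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Intended use near the top of the script (commented out in this sketch):
#   require_cmd uv || exit 1
#   require_cmd pnpm || exit 1
#   python_is_supported || { log ERROR "Python 3.11+ is required"; exit 1; }
#   log INFO "project directory: $PROJECT_DIR"
```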