Portable Data Stack - Specs

This document provides detailed specifications of all software components required for implementing the Portable Data Stack. For step-by-step setup instructions, see the Quickstart Guide.

Environment Key

  • All: Works in all environments (local, cloud, container)
  • Any: Works in multiple but not all environments
  • Local: Primarily for local development environments
  • Cloud: Primarily for cloud deployments

Total Estimated Savings

Using this Portable Data Stack can save organizations up to $293,000 per year in direct costs, plus approximately 1,300 engineering hours in development and maintenance time, compared to traditional enterprise cloud data stack options. Actual savings will vary based on data volume, team size, and specific requirements.
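The engineering-hours figure can be cross-checked against the per-tool estimates listed in the component tables of this document: UV (200), dagster-dg (100), Cursor (500), and Obsidian (500). The snippet below is only a worked sum of those four values; the variable names are illustrative and not part of the stack.

```shell
# Per-tool engineering-hour estimates, copied from the component tables.
uv_hours=200
dagster_dg_hours=100
cursor_hours=500
obsidian_hours=500

# Sum the four estimates with shell arithmetic.
total_hours=$((uv_hours + dagster_dg_hours + cursor_hours + obsidian_hours))
echo "Estimated engineering hours saved per year: $total_hours"
```

The result matches the approximately 1,300 hours quoted above.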

Core Components

| Software | Version | Capability | Purpose | License | Env | Est. Savings per Year | Est. Method | YouTube Videos |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DuckDB | 0.9.2+ | In-process analytical database | Storage and querying of structured data | MIT | All | $100,000 | Compared to Snowflake or BigQuery pricing | DuckDB in 100 Seconds, DuckDB: The Portable Data Stack |
| dbt | 1.6.0+ | Data transformation framework | Define, test, and document data transformations | Apache 2.0 | All | $20,000 | Compared to dbt Cloud pricing | dbt in 5 Minutes |
| Dagster | 1.5.0+ | Orchestration engine | Schedule and manage data pipeline execution | Apache 2.0 | All | $45,000 | Compared to Airflow Cloud or Dagster Cloud | Dagster in 10 Minutes |
| UV | 0.1.0+ | Python package manager | Dependency management and environment setup | MIT | All | 200 eng hours | Faster builds compared to pip (benchmarks) | UV Package Manager in 100 Seconds |
| Evidence | 2.0.0+ | Data visualization framework | Create interactive dashboards and reports | MIT | All | $50,000 | Compared to Tableau or Looker | Evidence Introduction |
| Docling | 2.30.0+ | Document processing toolkit | Extract structured data from documents | MIT | All | $25,000 | Compared to Textract or similar services | Docling Overview |

For an architectural overview of how these components work together, refer to the Architecture Diagram.

Development & Support Tools

| Software | Version | Capability | Purpose | License | Env | Est. Savings per Year | Est. Method | YouTube Videos |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Python | 3.11+ | Programming language | Core runtime for most components | PSF License | All | $0 | Open source standard | Python in 100 Seconds |
| Node.js | 18.0.0+ | JavaScript runtime | Required for Evidence dashboards | MIT | All | $0 | Open source standard | Node.js Crash Course |
| Git | 2.35.0+ | Version control system | Track changes to code and configuration | GPL-2.0 | All | $5,000 | Compared to proprietary VCS options | Git in 100 Seconds |
| dbt-duckdb | 1.6.0+ | dbt adapter for DuckDB | Connect dbt to DuckDB database | Apache 2.0 | All | Part of DuckDB savings | Included in DuckDB comparison | dbt with DuckDB |
| dagster-duckdb | 0.21.0+ | Dagster integration for DuckDB | Connect Dagster to DuckDB | Apache 2.0 | All | Part of Dagster savings | Included in Dagster comparison | Dagster IO Managers |
| dagster-dbt | 0.21.0+ | Dagster integration for dbt | Orchestrate dbt models in Dagster | Apache 2.0 | All | Part of Dagster savings | Included in Dagster comparison | Dagster + dbt Integration |
| dagster-dg | 0.1.0+ | Dagster CLI tool | Simplified Dagster project management | Apache 2.0 | All | 100 eng hours | Faster development compared to standard Dagster CLI | Dagster DG Introduction |
| Cursor | latest | AI-assisted code editor | Development environment with AI capabilities | Proprietary | Local | 500 eng hours | Productivity gains over standard editors | Cursor IDE Overview |
| Obsidian | 1.4.5+ | Markdown knowledge base | Documentation viewer with diagrams & wikilinks | Proprietary | Local | 500 eng hours | Compared to traditional documentation tools or Confluence | Obsidian for Beginners |
| n8n | 1.45.0+ | Workflow automation | Orchestrate and automate data pipelines and integrations | Sustainable Use License | All | $45,000 | Compared to Airflow Cloud or Dagster Cloud | n8n in 100 Seconds |
| Marimo | | Python notebook | | | | | | |

For information on how to use these tools together, see the Local Installation Guide.

Optional & Extension Components

| Software | Version | Capability | Purpose | License | Env | Est. Savings per Year | Est. Method | YouTube Videos |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Docker | 24.0.0+ | Containerization platform | Consistent deployment across environments | Apache 2.0 | Any | $15,000 | Compared to Docker Business pricing | Docker in 100 Seconds |
| MinIO | RELEASE.2023-04-20T17-56-55Z+ | S3-compatible object storage | Local alternative to AWS S3 for data lake storage | AGPL-3.0 | Local | $10,000 | Compared to AWS S3 for smaller workloads | MinIO Tutorial |
| AWS S3 | N/A (Service) | Cloud object storage | Scalable data lake storage | N/A (Service) | Cloud | N/A | Paid service - baseline comparison | AWS S3 Tutorial |
| PostgreSQL | 14.0+ | Relational database | Alternative storage for Dagster metadata | PostgreSQL License | Any | $30,000 | Compared to managed DB services | PostgreSQL Crash Course |
| SQLite | 3.40.0+ | Embedded database | Lightweight storage for Dagster metadata | Public Domain | Local | $5,000 | Compared to managed DB services for small workloads | SQLite in 100 Seconds |
| dbt-core | 1.6.0+ | dbt command-line interface | Run dbt transformations without adapters | Apache 2.0 | All | Part of dbt savings | Included in dbt comparison | dbt Core Tutorial |
| Faker | 20.0.0+ | Test data generation | Create realistic sample data | MIT | All | $3,000 | Compared to commercial data generation tools | Python Faker Tutorial |
| Parquet-tools | 1.12.0+ | Parquet file utilities | Inspect and manipulate Parquet files | Apache 2.0 | All | $0 | Open source standard | Parquet File Format |
| JupyterLab | 4.0.0+ | Notebook interface | Interactive data exploration | BSD-3-Clause | Local | $15,000 | Compared to managed notebook services | JupyterLab Tutorial |
| Superset | 2.1.0+ | BI platform | Alternative to Evidence for visualization | Apache 2.0 | Any | $50,000 | Compared to commercial BI tools | Apache Superset Tutorial |
| dbt-utils | 1.1.1+ | dbt helper macros | Extended functionality for dbt | Apache 2.0 | All | Part of dbt savings | Included in dbt comparison | dbt-utils Tutorial |

For more information on integrating S3 storage with this stack, see the S3 Data Lake Pattern section.

Installation Commands

Core Components Installation

# Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
 
# Install Dagster dg CLI tool
uv tool install dagster-dg
 
# Install Python dependencies
# (uv add needs a pyproject.toml; run 'uv init' first if the directory is not yet a uv project)
uv add duckdb dbt-duckdb dagster dagster-webserver dagster-duckdb dagster-dbt pandas pyarrow docling
 
# Install Evidence
npm create evidence@latest my-evidence-project -- --yes

For Docker-based installation, follow the Docker Setup Guide.
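After running the core installation commands, it can help to confirm that the expected executables are on the PATH. The sketch below is a hypothetical helper, not part of the stack: `missing_tools` is an illustrative name, and the assumption that `dagster-dg` exposes a `dg` command should be checked against your installed version.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || echo "$tool"
  done
  return 0
}

# Commands the core installation is expected to provide.
# 'dg' is an assumption about the executable installed by dagster-dg.
missing=$(missing_tools uv dg npm)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All expected tools found"
fi
```

An empty result only shows that the commands resolve; it does not verify their versions.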

Optional Components Installation

# Install MinIO (Docker-based)
docker run -p 9000:9000 -p 9001:9001 \
  -v ~/minio/data:/data \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  minio/minio server /data --console-address ":9001"
 
# Install JupyterLab
uv add jupyterlab
 
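# Install Superset (Docker-based)
# Recent Superset images refuse to start without a secret key; replace the placeholder value
docker run -d -p 8088:8088 \
  -e "SUPERSET_SECRET_KEY=replace-with-a-long-random-string" \
  --name superset apache/superset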

Documentation Setup with Obsidian

# Install Obsidian
# Download from https://obsidian.md/download and install
 
# Initialize Obsidian vault for documentation
mkdir -p portable-stack-docs
cd portable-stack-docs
 
# Create basic structure for Obsidian vault
mkdir -p attachments
touch "Portable Data Stack Guide.md"
touch "Portable Data Stack Quickstart.md"
touch "Data Stack Comparison Matrix.md"
touch "Orchestration Tools Comparison.md"
 
# Create Obsidian configuration
mkdir -p .obsidian
cat > .obsidian/app.json << EOF
{
  "promptDelete": false,
  "alwaysUpdateLinks": true,
  "newLinkFormat": "shortest",
  "useMarkdownLinks": false,
  "showUnsupportedFiles": true
}
EOF
 
# Configure appearance (Obsidian renders Mermaid diagrams natively; no extra setting is needed)
cat > .obsidian/appearance.json << EOF
{
  "baseFontSize": 16,
  "enabledCssSnippets": [],
  "theme": "obsidian"
}
EOF
 
# Configure graph view
cat > .obsidian/graph.json << EOF
{
  "collapse-filter": false,
  "search": "",
  "showTags": true,
  "showAttachments": false,
  "hideUnresolved": false,
  "showOrphans": true,
  "collapse-color-groups": false,
  "colorGroups": [
    {
      "query": "tag:#data-engineering",
      "color": { "a": 1, "h": 4, "s": 0.5, "l": 0.5 }
    },
    {
      "query": "tag:#quickstart",
      "color": { "a": 1, "h": 92, "s": 0.5, "l": 0.62 }
    },
    {
      "query": "tag:#portable-stack",
      "color": { "a": 1, "h": 220, "s": 0.68, "l": 0.5 }
    }
  ],
  "collapse-display": false,
  "showArrow": true,
  "textFadeMultiplier": 0,
  "nodeSizeMultiplier": 1,
  "lineSizeMultiplier": 1,
  "collapse-forces": false,
  "centerStrength": 0.518713248970312,
  "repelStrength": 10,
  "linkStrength": 1,
  "linkDistance": 250,
  "scale": 0.7132754626224427
}
EOF
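Because the three configuration files are written by hand through heredocs, a quick syntax check can catch a stray comma or brace before Obsidian opens the vault. The following is a minimal sketch that assumes `python3` is on the PATH (the stack already requires Python 3.11+); `validate_json` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only if the given file parses as JSON.
# Uses Python's built-in json.tool module, so no extra packages are needed.
validate_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Check each config file created above, skipping any that do not exist.
for f in .obsidian/app.json .obsidian/appearance.json .obsidian/graph.json; do
  if [ -f "$f" ]; then
    if validate_json "$f"; then
      echo "ok: $f"
    else
      echo "invalid JSON: $f"
    fi
  fi
done
```

This checks syntax only; it does not confirm that Obsidian recognizes every key.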

Version Compatibility Matrix

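| DuckDB | dbt | Dagster | Python | Node.js | Note | Env |
| --- | --- | --- | --- | --- | --- | --- |
| 0.9.2 | 1.6.0 | 1.5.0 | 3.11 | 18.x | Minimum recommended versions | All |
| 0.9.2 | 1.6.0 | 1.6.0 | 3.11 | 18.x | Stable combination | All |
| 0.9.2+ | 1.6.0+ | 1.5.0+ | 3.11+ | 18.x+ | Latest versions generally compatible | All |
| 0.8.x | 1.5.x | 1.4.x | 3.9-3.10 | 16.x | Legacy compatibility | Local |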

For troubleshooting version compatibility issues, refer to the Troubleshooting Guide.
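One way to compare an installed version against the minimums in the matrix is a small version-comparison helper. The sketch below assumes a `sort` that supports `-V` (version sort, as in GNU coreutils); `version_ge` is a hypothetical name, and the Python check is just one example of how it might be used.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
# Relies on 'sort -V' to order version strings numerically by component.
version_ge() {
  have=$1
  want=$2
  lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}

# Example: compare the local Python interpreter with the 3.11 minimum.
if command -v python3 > /dev/null 2>&1; then
  py_ver=$(python3 -c 'import platform; print(platform.python_version())')
  if version_ge "$py_ver" "3.11"; then
    echo "Python $py_ver meets the 3.11 minimum"
  else
    echo "Python $py_ver is below the 3.11 minimum"
  fi
fi
```

Version sort matters here because a plain string comparison would rank 1.10.0 below 1.9.0.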

Documentation Resources

For a comparison of orchestration tools including Dagster, see the Orchestration Tools Comparison.

Tags: software-requirements, portable-stack, specifications, versions, licensing