Portable Data Stack - Specs
This document provides detailed specifications of all software components required for implementing the Portable Data Stack. For step-by-step setup instructions, see the Quickstart Guide.
Environment Key
- All: Works in all environments (local, cloud, container)
- Any: Works in multiple but not all environments
- Local: Primarily for local development environments
- Cloud: Primarily for cloud deployments
Total Estimated Savings: Using this Portable Data Stack can save organizations up to $293,000 per year in direct costs, plus approximately 1,300 engineering hours of development and maintenance time, compared with traditional enterprise cloud data stack options. Actual savings vary with data volume, team size, and specific requirements.
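The headline figures can be cross-checked against the per-tool estimates in the tables that follow. The sketch below rests on an assumption inferred from the numbers rather than stated anywhere in this document: that the 293,000 figure is the sum of the dollar estimates for tools marked All, and that the 1,300 hours is the sum of the four engineering-hour rows.

```shell
# Dollar estimates (USD per year) for rows whose Env column is "All".
duckdb=100000; dbt=20000; dagster=45000; evidence=50000; docling=25000
git=5000; n8n=45000; faker=3000
total_usd=$((duckdb + dbt + dagster + evidence + docling + git + n8n + faker))

# Engineering-hour estimates: UV, dagster-dg, Cursor, Obsidian.
total_hours=$((200 + 100 + 500 + 500))

echo "Direct savings (All-environment tools): $total_usd USD per year"
echo "Engineering time saved: $total_hours hours per year"
```

Under that grouping the totals come to 293,000 USD and 1,300 hours. Optional components marked Any or Local are left out of the dollar total, so a team that adopts them would need its own estimate.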
Core Components
| Software | Version | Capability | Purpose | License | Env | Est. Savings per Year (USD or eng hours) | Estimation Method | YouTube Videos |
|---|---|---|---|---|---|---|---|---|
| DuckDB | 0.9.2+ | In-process analytical database | Storage and querying of structured data | MIT | All | 100,000 | Compared to Snowflake or BigQuery pricing | DuckDB in 100 Seconds, DuckDB: The Portable Data Stack |
| dbt | 1.6.0+ | Data transformation framework | Define, test, and document data transformations | Apache 2.0 | All | 20,000 | Compared to dbt Cloud pricing | dbt in 5 Minutes |
| Dagster | 1.5.0+ | Orchestration engine | Schedule and manage data pipeline execution | Apache 2.0 | All | 45,000 | Compared to managed Airflow services or Dagster Cloud | Dagster in 10 Minutes |
| UV | 0.1.0+ | Python package manager | Dependency management and environment setup | MIT | All | 200 eng hours | Faster builds compared to pip (benchmarks) | UV Package Manager in 100 Seconds |
| Evidence | 2.0.0+ | Data visualization framework | Create interactive dashboards and reports | MIT | All | 50,000 | Compared to Tableau or Looker | Evidence Introduction |
| Docling | 2.30.0+ | Document processing toolkit | Extract structured data from documents | MIT | All | 25,000 | Compared to Textract or similar services | Docling Overview |
For an architectural overview of how these components work together, refer to the Architecture Diagram.
Development & Support Tools
| Software | Version | Capability | Purpose | License | Env | Est. Savings per Year (USD or eng hours) | Estimation Method | YouTube Videos |
|---|---|---|---|---|---|---|---|---|
| Python | 3.11+ | Programming language | Core runtime for most components | PSF License | All | $0 | Open source standard | Python in 100 Seconds |
| Node.js | 18.0.0+ | JavaScript runtime | Required for Evidence dashboards | MIT | All | $0 | Open source standard | Node.js Crash Course |
| Git | 2.35.0+ | Version control system | Track changes to code and configuration | GPL-2.0 | All | 5,000 | Compared to proprietary VCS options | Git in 100 Seconds |
| dbt-duckdb | 1.6.0+ | dbt adapter for DuckDB | Connect dbt to DuckDB database | Apache 2.0 | All | Part of DuckDB savings | Included in DuckDB comparison | dbt with DuckDB |
| dagster-duckdb | 0.21.0+ | Dagster integration for DuckDB | Connect Dagster to DuckDB | Apache 2.0 | All | Part of Dagster savings | Included in Dagster comparison | Dagster IO Managers |
| dagster-dbt | 0.21.0+ | Dagster integration for dbt | Orchestrate dbt models in Dagster | Apache 2.0 | All | Part of Dagster savings | Included in Dagster comparison | Dagster + dbt Integration |
| dagster-dg | 0.1.0+ | Dagster CLI tool | Simplified Dagster project management | Apache 2.0 | All | 100 eng hours | Faster development compared to standard Dagster CLI | Dagster DG Introduction |
| Cursor | latest | AI-assisted code editor | Development environment with AI capabilities | Proprietary | Local | 500 eng hours | Productivity gains over standard editors | Cursor IDE Overview |
| Obsidian | 1.4.5+ | Markdown knowledge base | Documentation viewer with diagrams & wikilinks | Proprietary | Local | 500 eng hours | Compared to traditional documentation tools or Confluence | Obsidian for Beginners |
| n8n | 1.45.0+ | Workflow automation | Orchestrate and automate data pipelines and integrations | Sustainable Use License | All | 45,000 | Compared to managed Airflow services or Dagster Cloud | n8n in 100 Seconds |
| Marimo | | Python notebook | | | | | | |
For information on how to use these tools together, see the Local Installation Guide.
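The dbt-duckdb adapter listed above connects dbt to a DuckDB file through a dbt profile. The following is a minimal sketch of that wiring, not this project's actual configuration: the profile name `portable_stack`, the database file name, and the `PROFILE_DIR` variable are hypothetical placeholders.

```shell
# Write a minimal dbt profile for the dbt-duckdb adapter.
# "portable_stack" and the database file name are illustrative placeholders;
# the profile name must match the `profile:` key in dbt_project.yml.
PROFILE_DIR="${PROFILE_DIR:-.}"
mkdir -p "$PROFILE_DIR"
cat > "$PROFILE_DIR/profiles.yml" << 'EOF'
portable_stack:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: portable_stack.duckdb
      threads: 4
EOF
echo "Wrote $PROFILE_DIR/profiles.yml"
```

Running `dbt debug --profiles-dir "$PROFILE_DIR"` from the dbt project directory should then report whether dbt can open the DuckDB file.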
Optional & Extension Components
| Software | Version | Capability | Purpose | License | Env | Est. Savings per Year (USD or eng hours) | Estimation Method | YouTube Videos |
|---|---|---|---|---|---|---|---|---|
| Docker | 24.0.0+ | Containerization platform | Consistent deployment across environments | Apache 2.0 | Any | 15,000 | Compared to Docker Business pricing | Docker in 100 Seconds |
| MinIO | RELEASE.2023-04-20T17-56-55Z+ | S3-compatible object storage | Local alternative to AWS S3 for data lake storage | AGPL-3.0 | Local | 10,000 | Compared to AWS S3 for smaller workloads | MinIO Tutorial |
| AWS S3 | N/A (Service) | Cloud object storage | Scalable data lake storage | N/A (Service) | Cloud | N/A | Paid service - baseline comparison | AWS S3 Tutorial |
| PostgreSQL | 14.0+ | Relational database | Alternative storage for Dagster metadata | PostgreSQL License | Any | 30,000 | Compared to managed DB services | PostgreSQL Crash Course |
| SQLite | 3.40.0+ | Embedded database | Lightweight storage for Dagster metadata | Public Domain | Local | 5,000 | Compared to managed DB services for small workloads | SQLite in 100 Seconds |
| dbt-core | 1.6.0+ | dbt command-line interface | Core dbt engine and CLI; installed as a dependency of dbt-duckdb and needs an adapter to run against a database | Apache 2.0 | All | Part of dbt savings | Included in dbt comparison | dbt Core Tutorial |
| Faker | 20.0.0+ | Test data generation | Create realistic sample data | MIT | All | 3,000 | Compared to commercial data generation tools | Python Faker Tutorial |
| Parquet-tools | 1.12.0+ | Parquet file utilities | Inspect and manipulate Parquet files | Apache 2.0 | All | $0 | Open source standard | Parquet File Format |
| JupyterLab | 4.0.0+ | Notebook interface | Interactive data exploration | BSD-3-Clause | Local | 15,000 | Compared to managed notebook services | JupyterLab Tutorial |
| Superset | 2.1.0+ | BI platform | Alternative to Evidence for visualization | Apache 2.0 | Any | 50,000 | Compared to commercial BI tools | Apache Superset Tutorial |
| dbt-utils | 1.1.1+ | dbt helper macros | Extended functionality for dbt | Apache 2.0 | All | Part of dbt savings | Included in dbt comparison | dbt-utils Tutorial |
For more information on integrating S3 storage with this stack, see the S3 Data Lake Pattern section.
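As a rough illustration of the S3 integration, DuckDB's httpfs extension can be pointed at either AWS S3 or a local MinIO server through session settings. The sketch below only generates those settings as SQL text. The helper name `minio_duckdb_settings` is hypothetical, and the endpoint and credentials are the MinIO defaults used in the installation commands later in this document.

```shell
# Print DuckDB SQL that points the httpfs extension at an S3-compatible endpoint.
# Arguments: endpoint (host:port), access key, secret key.
minio_duckdb_settings() {
    cat << SQL
INSTALL httpfs;
LOAD httpfs;
SET s3_endpoint = '$1';
SET s3_access_key_id = '$2';
SET s3_secret_access_key = '$3';
SET s3_use_ssl = false;
SET s3_url_style = 'path';
SQL
}

# Defaults for a local MinIO container; replace them for any shared deployment.
minio_duckdb_settings "localhost:9000" "minioadmin" "minioadmin"
```

These settings apply per session, so they need to run in the same DuckDB session as the query that reads from the bucket. Assuming the DuckDB CLI is installed and a bucket exists, the output can be piped into `duckdb` together with a `read_parquet('s3://...')` query.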
Installation Commands
Core Components Installation
# Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install Dagster dg CLI tool
uv tool install dagster-dg
# Install Python dependencies (run inside a uv project; `uv init` creates one if needed)
uv add duckdb dbt-duckdb dagster dagster-webserver dagster-duckdb dagster-dbt pandas pyarrow docling
# Install Evidence
npm create evidence@latest my-evidence-project -- --yes
For Docker-based installation, follow the Docker Setup Guide.
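After running the core installation commands, a quick check that the expected executables are on the PATH can catch a failed step early. This is a minimal sketch: `require_cmd` is a hypothetical helper, and `dg` is assumed to be the command that dagster-dg installs.

```shell
# require_cmd NAME: report whether NAME is on PATH; returns non-zero if missing.
require_cmd() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Executables expected after the core installation steps.
missing=0
for tool in uv dg python3 node npm; do
    require_cmd "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

Python packages such as duckdb and dagster live inside the uv-managed environment, so they are better checked with `uv run python -c "import duckdb"` than with a PATH lookup.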
Optional Components Installation
# Install MinIO (Docker-based)
docker run -p 9000:9000 -p 9001:9001 \
  -v ~/minio/data:/data \
  -e "MINIO_ROOT_USER=minioadmin" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  minio/minio server /data --console-address ":9001"
# Install JupyterLab
uv add jupyterlab
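# Install Superset (Docker-based); Superset 2.1+ refuses to start with the default secret key, so set your own
docker run -d -p 8088:8088 \
  -e "SUPERSET_SECRET_KEY=replace-with-a-long-random-string" \
  --name superset apache/superset
Documentation Setup with Obsidian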
# Install Obsidian
# Download from https://obsidian.md/download and install
# Initialize Obsidian vault for documentation
mkdir -p portable-stack-docs
cd portable-stack-docs
# Create basic structure for Obsidian vault
mkdir -p attachments
touch "Portable Data Stack Guide.md"
touch "Portable Data Stack Quickstart.md"
touch "Data Stack Comparison Matrix.md"
touch "Orchestration Tools Comparison.md"
# Create Obsidian configuration
mkdir -p .obsidian
cat > .obsidian/app.json << EOF
{
"promptDelete": false,
"alwaysUpdateLinks": true,
"newLinkFormat": "shortest",
"useMarkdownLinks": false,
"showUnsupportedFiles": true
}
EOF
# Configure appearance (Obsidian renders Mermaid diagrams natively; no setting is needed to enable them)
cat > .obsidian/appearance.json << EOF
{
"baseFontSize": 16,
"enabledCssSnippets": [],
"theme": "obsidian"
}
EOF
# Configure graph view
cat > .obsidian/graph.json << EOF
{
"collapse-filter": false,
"search": "",
"showTags": true,
"showAttachments": false,
"hideUnresolved": false,
"showOrphans": true,
"collapse-color-groups": false,
"colorGroups": [
{
"query": "tag:#data-engineering",
"color": { "a": 1, "h": 4, "s": 0.5, "l": 0.5 }
},
{
"query": "tag:#quickstart",
"color": { "a": 1, "h": 92, "s": 0.5, "l": 0.62 }
},
{
"query": "tag:#portable-stack",
"color": { "a": 1, "h": 220, "s": 0.68, "l": 0.5 }
}
],
"collapse-display": false,
"showArrow": true,
"textFadeMultiplier": 0,
"nodeSizeMultiplier": 1,
"lineSizeMultiplier": 1,
"collapse-forces": false,
"centerStrength": 0.518713248970312,
"repelStrength": 10,
"linkStrength": 1,
"linkDistance": 250,
"scale": 0.7132754626224427
}
EOF
Version Compatibility Matrix
| DuckDB | dbt | Dagster | Python | Node.js | Note | Env |
|---|---|---|---|---|---|---|
| 0.9.2 | 1.6.0 | 1.5.0 | 3.11 | 18.x | Minimum recommended versions | All |
| 0.9.2 | 1.6.0 | 1.6.0 | 3.11 | 18.x | Stable combination | All |
| 0.9.2+ | 1.6.0+ | 1.5.0+ | 3.11+ | 18.x+ | Latest versions generally compatible | All |
| 0.8.x | 1.5.x | 1.4.x | 3.9-3.10 | 16.x | Legacy compatibility | Local |
For troubleshooting version compatibility issues, refer to the Troubleshooting Guide.
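The minimum versions in the matrix can be checked mechanically. The sketch below compares dotted numeric version strings in plain POSIX shell. `version_ge` and `check` are hypothetical helpers, the installed versions passed in are illustrative sample values rather than probed ones, and suffixes such as `rc1` or a leading `v` are not handled.

```shell
# version_ge INSTALLED MINIMUM: succeeds when INSTALLED >= MINIMUM.
# Compares dot-separated numeric components left to right; missing components count as 0.
# Uses global variables because POSIX sh has no `local`.
version_ge() {
    installed=$1
    minimum=$2
    while [ -n "$installed" ] || [ -n "$minimum" ]; do
        a=${installed%%.*}; b=${minimum%%.*}
        a=${a:-0}; b=${b:-0}
        [ "$a" -gt "$b" ] && return 0
        [ "$a" -lt "$b" ] && return 1
        # Drop the component just compared.
        case $installed in *.*) installed=${installed#*.} ;; *) installed= ;; esac
        case $minimum in *.*) minimum=${minimum#*.} ;; *) minimum= ;; esac
    done
    return 0
}

# check NAME INSTALLED MINIMUM: print a one-line verdict.
check() {
    if version_ge "$2" "$3"; then
        echo "ok:      $1 $2 (>= $3)"
    else
        echo "too old: $1 $2 (< $3)"
    fi
}

# Sample installed versions (illustrative) against the minimum row of the matrix.
check DuckDB  0.9.2   0.9.2
check dbt     1.7.4   1.6.0
check Dagster 1.4.17  1.5.0
check Python  3.11.6  3.11
check Node.js 16.20.0 18.0.0
```

In practice the second argument would come from each tool's own version output, for example `node --version` with the leading `v` stripped.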
Documentation Resources
| Component | Official Documentation | Community Resources |
|---|---|---|
| DuckDB | DuckDB Docs | DuckDB GitHub • Discord |
| dbt | dbt Docs | dbt Discourse • Slack |
| Dagster | Dagster Docs | Dagster Slack • GitHub |
| UV | UV Docs | UV GitHub • Discussions |
| Docker | Docker Docs | Docker Forums • Stack Overflow |
| Evidence | Evidence Docs | Evidence Discord • GitHub |
| Docling | Docling Docs | Docling GitHub • Issues |
| Obsidian | Obsidian Help | Obsidian Forum • Discord • Reddit |
For a comparison of orchestration tools including Dagster, see the Orchestration Tools Comparison.
Related Resources
- Comparing Different Data Stack Approaches
- Performance Optimization Guide
- FAQ
- DuckDB with dbt: Local Power
- Modern Data Stack in a Box with DuckDB
- Fully Local Data Transformation with dbt and DuckDB
Tags: software-requirements, portable-stack, specifications, versions, licensing