You have two options for setting up the Portable Data Stack:
- Automated Setup: Using the provided setup script
- Manual Setup: Following the step-by-step instructions
Option 1: Automated Setup
This option uses our setup script to automate the entire installation process.
Step 1: Download the Setup Script
```bash
# Download the setup script
curl -o portable_stack_setup.sh https://example.com/portable_stack_setup.sh
chmod +x portable_stack_setup.sh
```
Step 2: Run the Setup Script
```bash
# Run the script
./portable_stack_setup.sh
```
The script will:
- Install the UV package manager and Dagster DG CLI tool
- Create a new Dagster project with the appropriate structure
- Set up DuckDB integration
- Create data generator assets
- Configure dbt for transformations
- Set up Evidence for visualization
- Provide a helper script for common tasks
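The helper script itself isn't shown in this guide, so here is a minimal, hypothetical sketch of what such a `run.sh` dispatcher could look like, assuming the subcommand names (`start`, `materialize`, `evidence`) and the `dg`/`npm` commands that appear elsewhere in this guide; the script the setup actually generates may differ:

```shell
#!/usr/bin/env bash
# Hypothetical run.sh dispatcher -- the real script generated by the
# setup may differ. Subcommand names match the usage in this guide.
set -euo pipefail

run_task() {
  case "${1:-help}" in
    start)
      # Launch the Dagster dev server from the project directory
      (cd portable_stack && dg dev)
      ;;
    materialize)
      # Materialize all Dagster assets
      (cd portable_stack && dg materialize)
      ;;
    evidence)
      # Start the Evidence dev server
      (cd evidence_project && npm run dev)
      ;;
    *)
      # Unknown or missing subcommand: print usage
      echo "Usage: ./run.sh {start|materialize|evidence}"
      ;;
  esac
}

run_task "${1:-help}"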
Step 3: Start the Services
Once the script completes, you can start the services:
```bash
# Navigate to your project directory
cd ~/portable-data-stack

# Start Dagster server
./run.sh start

# In a new terminal, materialize assets
./run.sh materialize

# In another terminal, start Evidence dashboard
./run.sh evidence
```
Option 2: Manual Setup
```bash
# Create project directory
mkdir -p portable-data-stack
cd portable-data-stack

# Initialize a new Dagster project using DG
dg init portable_stack

# Navigate to the project directory
cd portable_stack
```
The dg init command creates a standard project structure with:
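The listing below is a rough, illustrative sketch of the kind of layout `dg init` produces; the exact files vary by Dagster/dg version, so check your generated project rather than relying on this tree:

```
portable_stack/
├── pyproject.toml          # project metadata and dependencies
├── src/
│   └── portable_stack/
│       ├── definitions.py  # Dagster Definitions entry point
│       └── defs/           # assets, resources, schedules
└── tests/
```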
Update the definitions file at src/portable_stack/definitions.py:
```python
import os

from dagster import Definitions, load_assets_from_modules, define_asset_job, ScheduleDefinition

from . import defs
from .defs.assets import dbt_models

# Import the DuckDB I/O manager
from .defs.asset_io_managers.duckdb_io_manager import build_duckdb_io_manager

# Create database directory if it doesn't exist
os.makedirs("../../db", exist_ok=True)

# Load assets
all_assets = load_assets_from_modules([defs.assets])
dbt_transformed_assets = dbt_models.dbt_assets

# Define a job to materialize all assets
materialize_all_job = define_asset_job(
    name="materialize_all",
    selection="*",
)

# Create a schedule to materialize all assets daily
daily_schedule = ScheduleDefinition(
    job=materialize_all_job,
    cron_schedule="0 0 * * *",
)

# Define all objects
defs = Definitions(
    assets=[*all_assets, *dbt_transformed_assets],
    schedules=[daily_schedule],
    resources={
        "io_manager": build_duckdb_io_manager(database_path="../../db/datamart.duckdb")
    },
)
```
Step 9: Set Up Evidence for Dashboards
```bash
# Create directory for Evidence
cd ..  # Go back to the main project directory
mkdir -p evidence_project
cd evidence_project

# Initialize Evidence project
npm create evidence@latest . -- --yes

# Create a source configuration
mkdir -p sources
cat > sources/duckdb.yml << EOF
name: 'duckdb'
type: 'duckdb'
path: '../db/datamart.duckdb'
EOF

# Create a simple dashboard
mkdir -p pages
cat > pages/index.md << EOF
# Sales Dashboard

\`\`\`sql sales_by_category
select category, sum(total_price) as revenue
from analytics.fact_sales
group by category
order by revenue desc
\`\`\`

## Category Performance

<BarChart data={sales_by_category} x=category y=revenue title="Revenue by Category"/>

## Sales by Location

\`\`\`sql sales_by_country
select country, sum(total_price) as revenue
from analytics.fact_sales
group by country
order by revenue desc
\`\`\`

<PieChart data={sales_by_country} value=revenue category=country title="Revenue by Country"/>

## Daily Trend

\`\`\`sql daily_sales
select date_trunc('day', order_date) as date, sum(total_price) as revenue
from analytics.fact_sales
group by date
order by date
\`\`\`

<LineChart data={daily_sales} x=date y=revenue title="Daily Sales Revenue"/>
EOF

# Return to project root
cd ..
```
Step 10: Start the Services
```bash
# Start Dagster in one terminal
cd portable_stack
dg dev

# In a new terminal, materialize assets
cd portable_stack
dg materialize

# In another terminal, start Evidence dashboard
cd evidence_project
npm run dev
```
Verification Checklist
Before proceeding further, use this checklist to verify that all components are working correctly:
| Component | Verification Step | Expected Result | ✓ |
|---|---|---|---|
| DuckDB | Run `duckdb db/datamart.duckdb "SELECT count(*) FROM analytics.dim_customers;"` | A non-zero customer count | |
Validation Query
Run this query to validate the full pipeline from raw data to transformed analytics:
```sql
SELECT
    p.category,
    COUNT(DISTINCT f.customer_id) AS unique_customers,
    SUM(f.total_price) AS total_revenue,
    AVG(f.total_price) AS avg_order_value
FROM analytics.fact_sales f
JOIN analytics.dim_products p ON f.product_id = p.product_id
GROUP BY p.category
ORDER BY total_revenue DESC;
```
This query should return results with multiple categories and meaningful metrics.
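If you want to see what this aggregation computes without a populated DuckDB database, the same join-and-group-by can be exercised on made-up rows with Python's built-in sqlite3 module (table and column names mirror the guide's analytics schema; the data is invented purely for illustration, and this particular SQL is portable between SQLite and DuckDB):

```python
# Toy reproduction of the validation query using stdlib sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_products (product_id INTEGER, category TEXT);
CREATE TABLE fact_sales (customer_id INTEGER, product_id INTEGER, total_price REAL);
INSERT INTO dim_products VALUES (1, 'Books'), (2, 'Games');
INSERT INTO fact_sales VALUES (10, 1, 20.0), (11, 1, 30.0), (10, 2, 5.0);
""")

rows = conn.execute("""
SELECT p.category,
       COUNT(DISTINCT f.customer_id) AS unique_customers,
       SUM(f.total_price)            AS total_revenue,
       AVG(f.total_price)            AS avg_order_value
FROM fact_sales f
JOIN dim_products p ON f.product_id = p.product_id
GROUP BY p.category
ORDER BY total_revenue DESC
""").fetchall()

# Books: 2 distinct customers, 50.0 revenue; Games: 1 customer, 5.0 revenue
for category, customers, revenue, avg_value in rows:
    print(category, customers, revenue, avg_value)
```

On real data, a healthy pipeline should likewise return one row per category with plausible customer counts and revenue totals.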