Data Model
pgflow’s data model separates flow definitions from runtime execution state. Flow orchestration uses PostgreSQL tables and functions in the pgflow schema for transparent, queryable behavior.
Schema Design Principles
Section titled “Schema Design Principles”🏷️ Slugs as Identifiers
Section titled “🏷️ Slugs as Identifiers”Flows and steps are identified by slugs - simple text identifiers like 'analyze_website' or 'fetch_data'. Slugs must be valid identifiers (alphanumeric plus underscores, max 128 characters) and serve as natural, readable keys throughout the system.
🔑 Composite Keys with Denormalization
Section titled “🔑 Composite Keys with Denormalization”The schema uses composite primary and foreign keys that include denormalized slug values:
-- Definition tablesflows: (flow_slug)steps: (flow_slug, step_slug)deps: (flow_slug, dep_slug, step_slug)
-- Runtime tablesruns: (run_id) + denormalized flow_slugstep_states: (run_id, step_slug) + denormalized flow_slugstep_tasks: (run_id, step_slug, task_index) + denormalized flow_slugDenormalized slugs simplify queries - no joins needed:
-- Get all tasks for a specific step across all runsSELECT * FROM pgflow.step_tasksWHERE flow_slug = 'my_flow' AND step_slug = 'my_step';Two Categories of Tables
Section titled “Two Categories of Tables”The schema organizes tables into two distinct categories that serve different purposes in the flow lifecycle. All tables live in the pgflow schema.
Static Definition Tables
Section titled “Static Definition Tables”These tables store the flow structure - the “blueprint” of what flows exist and how they connect:
flows - Flow identity and global configuration
- Contains
opt_max_attempts,opt_base_delay,opt_timeout - Each flow has a unique slug
steps - DAG nodes with step type (single or map)
single: Regular steps created with.step()or.array()DSL methodsmap: Map steps created with.map()method for parallel array processing- Can override flow-level options
- Must be added in topological order
🔗 deps - DAG edges defining step dependencies
- Determines execution order and data flow
Populated during deployment through migrations. Change only when flow definitions change.
Runtime State Tables
Section titled “Runtime State Tables”These tables track the execution state of flow instances:
runs - Execution instances of flows
- Tracks run status (
started,completed,failed) - Stores input and aggregated outputs
- Maintains
remaining_stepscounter for completion detection
step_states - State of individual steps within a run
- Tracks step status (
created,started,completed,failed) - For map steps, tracks
initial_tasksandremaining_taskscounts - Coordinates step-level completion and aggregation
step_tasks - Units of work for steps
- Single steps create 1 task, map steps create N tasks
- Each task has retry counter and attempts tracking
- Contains
task_indexfor map task array elements - Stores individual outputs for aggregation
Created and modified during execution. Each run creates new records tracking progress from start to completion.
Schema Visualization
Section titled “Schema Visualization”The relationships between these tables form the complete pgflow data model:
How the Categories Work Together
Section titled “How the Categories Work Together”The separation between definition and runtime tables enables several key capabilities:
🔒 Separation of Concerns - Definition tables read-only during execution, runtime tables query without affecting definitions, multiple runs execute concurrently
👁️ Observability - All execution state in queryable SQL tables (step_states, step_tasks), no hidden state or external coordination
⚡ ACID Guarantees - All state transitions in transactions, dependencies and task spawning atomic, no race conditions
Task Creation Patterns
Section titled “Task Creation Patterns”The relationship between steps and tasks depends on the step type:
| Step Type | Tasks Created | Task Index | Input Received |
|---|---|---|---|
single | Exactly 1 | N/A | Full input with dependencies |
map | N (one per array element) | 0 to N-1 | Individual array element |
This design handles both single-task steps and parallel fanout through the same execution model.
State Transitions and Counters
Section titled “State Transitions and Counters”🏃 Run-level tracking
remaining_stepsstarts at total step count, decrements on completion- When zero, run marked complete and outputs aggregated
🔸 Step-level tracking (map steps only)
initial_tasksset when array size knownremaining_tasksdecrements as tasks complete- When zero, outputs aggregated into array