Payload Bucket

The Cirrus payload bucket is an S3 bucket used to store workflow payloads that are either too large to pass inline within AWS service limits or that need to be persisted for state tracking, debugging, and retrieval after a workflow has finished executing.

Together with the StateDB, the payload bucket forms the record of a Cirrus deployment’s work: where the StateDB tracks that an execution happened and what state it’s in, the payload bucket stores the actual JSON payload content associated with each execution.

Configuration

Cirrus does not manage the payload bucket itself. The bucket is supplied to Cirrus code through the CIRRUS_PAYLOAD_BUCKET environment variable, and any S3 bucket that the running code has permission to read from and write to can serve as the payload bucket. A bucket provisioned alongside Cirrus (for example, via the reference CloudFormation in this repository) is one common way to satisfy this, but it is not a requirement.

All Cirrus code that reads or writes payload objects goes through the cirrus.lib.payload_bucket.PayloadBucket class, which resolves the bucket name from CIRRUS_PAYLOAD_BUCKET at construction time. If the variable is not set when a PayloadBucket is created, UndefinedPayloadBucketError is raised.

The root prefix under which all payload objects are organized can be customized via the optional CIRRUS_PAYLOAD_ROOT_PREFIX environment variable. When not set, it defaults to cirrus. This allows deployments that share a payload bucket with other systems to use a distinct namespace, avoiding key collisions.
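As a minimal sketch of the resolution rules described above (not the actual PayloadBucket implementation), the following shows how the two environment variables could be read. The helper name resolve_payload_settings and the locally defined exception class are hypothetical stand-ins; only the variable names and the cirrus default come from this page. Stripping slashes from the prefix is an assumption made for illustration.

```python
import os
from typing import Mapping, Optional, Tuple


class UndefinedPayloadBucketError(Exception):
    """Local stand-in for the Cirrus error of the same name."""


def resolve_payload_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (bucket_name, root_prefix) read from the environment.

    Hypothetical helper for illustration; pass a dict to avoid touching
    the real process environment.
    """
    env = os.environ if env is None else env

    bucket = env.get("CIRRUS_PAYLOAD_BUCKET")
    if not bucket:
        raise UndefinedPayloadBucketError("CIRRUS_PAYLOAD_BUCKET is not set")

    # Assumption: surrounding slashes are stripped so keys join cleanly.
    root_prefix = env.get("CIRRUS_PAYLOAD_ROOT_PREFIX", "cirrus").strip("/")
    return bucket, root_prefix
```

Passing an explicit mapping keeps the sketch easy to exercise in tests without modifying os.environ.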

Because Cirrus writes all of its objects under a single top-level prefix (see Key Organization), the bucket can also hold unrelated content, provided that content stays outside the Cirrus root prefix.

Key Organization

All objects written by Cirrus live under a configurable top-level prefix (default cirrus/, controlled by CIRRUS_PAYLOAD_ROOT_PREFIX). Within that, keys are split into two distinct namespaces based on how long the data is expected to live:

  • <root_prefix>/tmp/ is ephemeral storage for transient payloads that the system does not need to retain once processing has moved on. Objects under this prefix are expected to be cleaned up by a bucket lifecycle rule (see Lifecycle and Retention).

  • <root_prefix>/executions/ is persistent storage for per-execution input and output payloads that Cirrus uses to link workflow runs to their payload content.

The full layout is as follows (using the default cirrus prefix):

s3://<payload-bucket>/
└── cirrus/                               # configurable via CIRRUS_PAYLOAD_ROOT_PREFIX
    ├── tmp/                              # ephemeral
    │   ├── oversized/
    │   │   └── <uuid>.json               # overflow for oversized payloads
    │   ├── batch/
    │   │   └── <payload_id>/
    │   │       └── <uuid>.json           # payloads handed to batch tasks
    │   └── invalid/
    │       └── <uuid>.json               # payloads that failed validation
    └── executions/                       # persistent
        └── <payload_id>/
            └── <execution_id>/
                ├── input.json            # payload as received
                └── output.json           # payload after workflow completed

The prefix values are derived from the configured root prefix and are exposed as instance attributes on PayloadBucket. They are the single source of truth for where each type of object is written.
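To make the derivation concrete, here is a small sketch of how every prefix in the tree above can be computed from a single root prefix. The class name PayloadKeyLayout and its attribute names are hypothetical and chosen for illustration; the real attribute names on PayloadBucket may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayloadKeyLayout:
    """Hypothetical illustration: all prefixes derive from one root prefix."""

    root_prefix: str = "cirrus"

    @property
    def tmp_prefix(self) -> str:
        # Ephemeral namespace, expected to be expired by a lifecycle rule.
        return f"{self.root_prefix}/tmp/"

    @property
    def oversized_prefix(self) -> str:
        return f"{self.tmp_prefix}oversized/"

    @property
    def batch_prefix(self) -> str:
        return f"{self.tmp_prefix}batch/"

    @property
    def invalid_prefix(self) -> str:
        return f"{self.tmp_prefix}invalid/"

    @property
    def executions_prefix(self) -> str:
        # Persistent namespace for per-execution input/output payloads.
        return f"{self.root_prefix}/executions/"
```

Deriving each value from root_prefix, rather than storing the strings separately, means changing the root moves the whole tree at once.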

Note

Because a Cirrus payload_id has the form <collections>/workflow-<workflow_name>/<item_ids> and contains forward slashes, objects under <root_prefix>/executions/ and <root_prefix>/tmp/batch/ fan out into multiple levels of S3 prefixes. This is intentional: it allows listing all executions for a given collection, workflow, or item grouping with ordinary S3 prefix queries.
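The prefix queries mentioned in this note can be sketched as follows. The function executions_list_prefix is a hypothetical helper, and the collection, workflow, and item names in the usage comments are made up. The returned string is what would be supplied as the Prefix argument of an S3 list request.

```python
from typing import Optional


def executions_list_prefix(
    root_prefix: str,
    collections: str,
    workflow_name: Optional[str] = None,
    item_ids: Optional[str] = None,
) -> str:
    """Build an S3 listing prefix at collection, workflow, or item level.

    Hypothetical helper. The trailing slash matters: without it, a prefix
    for collection "sentinel-2" would also match "sentinel-2-l2a".
    """
    parts = [root_prefix, "executions", collections]
    if workflow_name is not None:
        parts.append(f"workflow-{workflow_name}")
        # item_ids only narrows the listing when a workflow is also given.
        if item_ids is not None:
            parts.append(item_ids)
    return "/".join(parts) + "/"


# Usage (names are illustrative only):
#   executions_list_prefix("cirrus", "sentinel-2-l2a")
#   executions_list_prefix("cirrus", "sentinel-2-l2a", "publish")
#   executions_list_prefix("cirrus", "sentinel-2-l2a", "publish", "S2A_item")
```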

Execution Payloads

When a workflow execution is claimed by the process lambda, the input payload is written to:

<root_prefix>/executions/<payload_id>/<execution_id>/input.json

On successful completion, the update-state lambda writes the output payload to the sibling key:

<root_prefix>/executions/<payload_id>/<execution_id>/output.json

The <execution_id> component is the Step Functions execution name, which is deterministically derived from the payload ID and the current list of executions in the StateDB record (see PayloadManagers.gen_execution_arn). Because a single payload_id may be re-run — for example on FAILED or ABORTED states, or with replace=True — multiple <execution_id> children may exist under the same payload_id prefix, one per attempt.

This layout is what makes the payload bucket and the StateDB addressable together: given a StateDB record, its payload_id plus the relevant entry from its executions list uniquely identify the corresponding input.json and output.json in the bucket. The management CLI’s get-input-payload and get-output-payload commands rely on exactly this correspondence.
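The correspondence can be illustrated with a short sketch. The function execution_payload_urls is hypothetical, and it assumes that an entry in the executions list is either a bare execution name or a Step Functions execution ARN (whose final colon-separated component is the execution name). It is not the CLI's actual implementation.

```python
from typing import Dict


def execution_payload_urls(
    bucket: str,
    root_prefix: str,
    payload_id: str,
    execution_entry: str,
) -> Dict[str, str]:
    """Map a payload_id plus one executions entry to its S3 URLs.

    Hypothetical helper. Assumption: execution_entry is either the
    execution name itself or an ARN ending in ":<execution name>".
    """
    # rsplit leaves a bare name untouched and extracts the name from an ARN.
    execution_id = execution_entry.rsplit(":", 1)[-1]
    base = f"s3://{bucket}/{root_prefix}/executions/{payload_id}/{execution_id}"
    return {
        "input": f"{base}/input.json",
        "output": f"{base}/output.json",
    }
```

Because a payload_id may have several attempts, calling this once per entry in the executions list yields one input/output pair per attempt.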

Oversized Payloads

Cirrus passes payloads between lambdas and Step Functions as JSON, but Amazon EventBridge embeds the Step Functions input and output as escaped strings inside its own event envelope. To stay safely below the EventBridge event size limit, PayloadManager enforces a conservative MAX_PAYLOAD_LENGTH (120 KB) on the JSON-escaped payload length.

When a payload exceeds that limit, its contents are uploaded to:

<root_prefix>/tmp/oversized/<uuid>.json

and replaced in-flight by a small reference object of the form {"url": "s3://.../<root_prefix>/tmp/oversized/<uuid>.json"}. Downstream code in cirrus.lib.utils.payload_from_s3 transparently re-hydrates these references when the full payload is needed again.

Because oversized payloads are only used as an overflow mechanism during a single execution, they live under <root_prefix>/tmp/ and are expected to be cleaned up by the bucket’s lifecycle rule.
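The overflow mechanism can be sketched as below. This is an illustration under stated assumptions, not the Cirrus implementation: the function names are hypothetical, "120 KB" is read here as 120 * 1024 characters, the escaped length is approximated by serializing the payload twice, and the S3 upload and download are injected as plain callables so the example runs without AWS access.

```python
import json
import uuid
from typing import Callable
from urllib.parse import urlparse

# Assumption: "120 KB" interpreted as 120 KiB; the real constant may differ.
MAX_PAYLOAD_LENGTH = 120 * 1024


def escaped_length(payload: dict) -> int:
    """Length of the payload once embedded as an escaped JSON string."""
    # Serializing twice mimics placing the JSON text inside a string value.
    return len(json.dumps(json.dumps(payload)))


def offload_if_oversized(
    payload: dict,
    bucket: str,
    root_prefix: str,
    upload: Callable[[str, str, bytes], None],
) -> dict:
    """Return the payload unchanged, or a {"url": ...} reference to it."""
    if escaped_length(payload) <= MAX_PAYLOAD_LENGTH:
        return payload
    key = f"{root_prefix}/tmp/oversized/{uuid.uuid4()}.json"
    upload(bucket, key, json.dumps(payload).encode("utf-8"))
    return {"url": f"s3://{bucket}/{key}"}


def rehydrate(payload: dict, download: Callable[[str, str], bytes]) -> dict:
    """Resolve a {"url": "s3://..."} reference back into the full payload."""
    url = payload.get("url")
    if not (isinstance(url, str) and url.startswith("s3://")):
        return payload
    parsed = urlparse(url)
    # netloc is the bucket; the path (minus its leading slash) is the key.
    return json.loads(download(parsed.netloc, parsed.path.lstrip("/")))
```

Injecting upload and download keeps the size check and key construction testable in isolation; in a real deployment they would wrap S3 put and get calls.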

Batch Payloads

Tasks that are dispatched to AWS Batch cannot receive their payload inline, so PayloadBucket.upload_batch_payload writes the payload to:

<root_prefix>/tmp/batch/<payload_id>/<uuid>.json

The Batch task is then invoked with a reference to this key. This path is also ephemeral and cleaned up by the lifecycle rule.

Note

The key format <payload_id>/<uuid>.json (rather than simply <uuid>/input.json) is retained for backwards compatibility with earlier versions of cirrus-lib.

Invalid Payloads

When a payload fails validation in a way that Cirrus wants to preserve for later inspection, it is uploaded to:

<root_prefix>/tmp/invalid/<uuid>.json

This is intended as short-lived diagnostic storage, not a permanent archive of invalid inputs, which is why it lives under <root_prefix>/tmp/. If long-term retention of invalid payloads is required for a particular deployment, the recommended approach is to ship them out to a separate bucket or archive from the handling code rather than to alter the lifecycle of <root_prefix>/tmp/.

Lifecycle and Retention

Cirrus assumes a simple retention model for the payload bucket:

  • Everything under <root_prefix>/tmp/ is ephemeral and should be removed by a bucket lifecycle rule. The reference CloudFormation in this repository configures a 10-day expiration on the <root_prefix>/tmp/ prefix (derived from the configured PayloadRootPrefix), which is a reasonable default that gives oversized and batch payloads enough lifetime to survive retries, backfills, and manual reprocessing.

  • Everything under <root_prefix>/executions/ is persistent and is not expired by any Cirrus-managed rule. These objects are the primary mechanism for reconstructing the history of a workflow run after the fact; losing them will break the get-input-payload and get-output-payload CLI commands and make post-hoc debugging significantly harder.

Operators bringing their own payload bucket should configure a lifecycle rule on <root_prefix>/tmp/ that matches this expectation. A 10-day expiration is a reasonable starting point. Shortening it risks removing objects that in-flight retries still need, and scoping the rule to anything broader than <root_prefix>/tmp/ will silently delete execution history.
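For operators who manage the bucket outside the reference CloudFormation, a lifecycle rule of this shape could be built as sketched below. The function name and the rule ID are hypothetical; the dictionary follows the structure that boto3's put_bucket_lifecycle_configuration accepts. The call itself is shown only in a comment because it needs AWS credentials.

```python
def tmp_expiration_lifecycle(root_prefix: str = "cirrus", days: int = 10) -> dict:
    """Build a lifecycle configuration expiring only <root_prefix>/tmp/.

    Hypothetical helper for illustration.
    """
    # Scope the rule strictly to the tmp/ namespace so that
    # <root_prefix>/executions/ is never expired by it.
    prefix = f"{root_prefix.strip('/')}/tmp/"
    return {
        "Rules": [
            {
                "ID": "expire-cirrus-tmp",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Applying it (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="<payload-bucket>",
#       LifecycleConfiguration=tmp_expiration_lifecycle(),
#   )
```

Note that this S3 call replaces the bucket's entire lifecycle configuration, so on a shared bucket any existing rules need to be merged into the Rules list before applying it.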

Operators who need different retention semantics for execution payloads — for example, cost control on very high-throughput deployments — can add their own lifecycle rules targeting the <root_prefix>/executions/ prefix, keeping in mind that any payload whose execution record is still referenced from the StateDB will become unreachable through the CLI once expired.

Access Patterns

The payload bucket is accessed in a small number of well-defined places:

  • process lambda — uploads execution input payloads via PayloadBucket.upload_input_payload, and uploads oversized payloads via PayloadBucket.upload_oversize_payload before invoking Step Functions.

  • update-state lambda — uploads execution output payloads via PayloadBucket.upload_output_payload on successful completion.

  • Task code — reads oversized payloads via cirrus.lib.utils.payload_from_s3 and, for batch tasks, fetches payload content from the batch prefix.

  • Management CLI — reads execution input and output payloads via PayloadBucket.get_input_payload_url and PayloadBucket.get_output_payload_url, which reconstruct the deterministic S3 URL from a payload_id and execution_id pair looked up in the StateDB.

No Cirrus code writes directly to the bucket without going through PayloadBucket, and the prefix attributes are not intended to be re-implemented elsewhere. Downstream code that needs to locate a payload object by key should import the helpers from cirrus.lib.payload_bucket rather than constructing prefixes by hand, so that any future reorganization of the layout remains a single-file change.