Getting started with Cirrus
What is Cirrus?
Cirrus is a STAC-based geospatial processing pipeline built on a serverless, scalable architecture. Cirrus can scale from tiny workloads of tens of items to massive workloads of millions of items in a manner that is both cost-efficient and performance-efficient, regardless of whether your pipeline processing takes seconds, hours, or longer.
Cirrus is made up of cirrus-geo, a CLI-based project management and deployment tool, and cirrus-lib, a Python library providing a number of useful abstractions that solve common needs for users writing their own Cirrus components.
Why Cirrus?
Concepts
STAC-based workflows
A key principle of Cirrus is the use of the STAC metadata specification as a central tenet of the Cirrus Process Payload format. In this way, Cirrus encourages a highly interoperable, metadata-first focus for pipeline operators and end users alike.
Cirrus pipelines are, ideally, STAC-in and STAC-out, ensuring compatibility with the full range of tooling and services available in the STAC ecosystem. Though opinionated in this respect, Cirrus remains flexible enough to accommodate varied use cases and data sources, such that input format requirements can be relaxed as needed for a given workflow.
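To make the STAC-in, STAC-out idea concrete, here is a minimal sketch of a payload-like structure built from a STAC Item. The Item fields follow the STAC Item specification. The envelope is an assumption for illustration only: the "process" block, its "workflow" key, and the helper names make_stac_item and make_payload are hypothetical and are not taken from the Cirrus Process Payload definition, which should be checked in the Cirrus docs.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id: str, collection: str) -> dict:
    """Build a minimal STAC Item (a GeoJSON Feature) as a plain dict."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "collection": collection,
        "geometry": {"type": "Point", "coordinates": [-105.0, 40.0]},
        "bbox": [-105.0, 40.0, -105.0, 40.0],
        "properties": {
            # STAC requires an RFC 3339 datetime in the Item properties.
            "datetime": datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat()
        },
        "links": [],
        "assets": {},
    }


def make_payload(items: list[dict], workflow: str) -> dict:
    """Wrap STAC Items in a payload-like envelope.

    ASSUMPTION: the "process" block and its "workflow" key are illustrative
    placeholders, not the authoritative Cirrus Process Payload schema.
    """
    return {
        "type": "FeatureCollection",
        "features": items,
        "process": {"workflow": workflow},
    }


payload = make_payload(
    [make_stac_item("scene-001", "example-collection")],
    workflow="example-workflow",
)
print(json.dumps(payload, indent=2))
```

Because the features are ordinary STAC Items, any STAC-aware tool can read them without knowing anything about the surrounding envelope.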
Cirrus Components
Cirrus is organized into reusable blocks called Components, which can be broken down into three main types:
Feeders: take arbitrary input and create a Cirrus Process Payload, which is enqueued for processing
Tasks: the basic unit of work in a Workflow; each Task uses a Cirrus Process Payload for both input and output
Workflows: a set of Tasks implementing a processing pipeline to transform a given input into one or more output STAC items
An additional component type is the Function, though Functions are less commonly extended by end users.
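The relationship between the three main component types can be sketched in plain Python. This is a conceptual model only, not the cirrus-lib API: every name here (scene_feeder, tag_processing_level, add_thumbnail_asset, run_workflow, the example bucket) is hypothetical, and a real Workflow runs as a state machine rather than as an in-process loop.

```python
from typing import Callable

Payload = dict  # stand-in for a Cirrus Process Payload
Task = Callable[[Payload], Payload]


def scene_feeder(scene_ids: list[str]) -> Payload:
    """Feeder: turn arbitrary input (here, scene IDs) into a payload."""
    features = [
        {"type": "Feature", "id": scene_id, "properties": {}, "assets": {}}
        for scene_id in scene_ids
    ]
    return {"type": "FeatureCollection", "features": features}


def tag_processing_level(payload: Payload) -> Payload:
    """Task: payload in, payload out; sets a property on each item."""
    features = [
        {**f, "properties": {**f["properties"], "processing:level": "L2"}}
        for f in payload["features"]
    ]
    return {**payload, "features": features}


def add_thumbnail_asset(payload: Payload) -> Payload:
    """Task: payload in, payload out; attaches a (hypothetical) asset."""
    features = [
        {
            **f,
            "assets": {
                **f["assets"],
                "thumbnail": {"href": f"s3://example-bucket/{f['id']}/thumb.png"},
            },
        }
        for f in payload["features"]
    ]
    return {**payload, "features": features}


def run_workflow(payload: Payload, tasks: list[Task]) -> Payload:
    """Workflow: an ordered set of Tasks applied to one payload."""
    for task in tasks:
        payload = task(payload)
    return payload


result = run_workflow(
    scene_feeder(["scene-001", "scene-002"]),
    [tag_processing_level, add_thumbnail_asset],
)
print([f["id"] for f in result["features"]])
```

Since each Task takes and returns the same payload shape, Tasks can be reordered or reused across Workflows without changing their signatures.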
Horizontal and vertical scaling
Cirrus can scale both horizontally and vertically to match the requirements of diverse workloads.
Cirrus supports scaling workflow execution capacity as needed, without requiring expensive capacity reservations to support peak demand. This scaling can accommodate anything from highly intermittent one-off executions to massively parallel processing across hundreds of thousands of simultaneous workflow executions (or more).
Vertical scaling support also allows compute resources to be matched to different workloads and requirements within a workflow execution. In other words, executions are not tied to a specific instance for their duration, but can instead use the optimal instance size and type on a per-task basis.
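As a rough illustration of per-task resource matching, the sketch below picks a compute backend for each task from its requirements. The TaskRequirements class and choose_compute helper are hypothetical and not part of Cirrus; in practice this choice is made when a workflow is defined. The two constants reflect AWS Lambda's published maximum timeout and memory, which are worth re-checking against current AWS quotas.

```python
from dataclasses import dataclass

# AWS Lambda hard limits (verify against current AWS quotas).
LAMBDA_MAX_TIMEOUT_S = 900     # 15 minutes
LAMBDA_MAX_MEMORY_MB = 10_240  # 10 GB


@dataclass(frozen=True)
class TaskRequirements:
    """Hypothetical description of what one task needs to run."""

    name: str
    expected_runtime_s: int
    memory_mb: int
    needs_gpu: bool = False


def choose_compute(task: TaskRequirements) -> str:
    """Return "lambda" when the task fits Lambda's limits, else "batch"."""
    fits_lambda = (
        task.expected_runtime_s <= LAMBDA_MAX_TIMEOUT_S
        and task.memory_mb <= LAMBDA_MAX_MEMORY_MB
        and not task.needs_gpu
    )
    return "lambda" if fits_lambda else "batch"


tasks = [
    TaskRequirements("update-metadata", expected_runtime_s=20, memory_mb=512),
    TaskRequirements("mosaic-scenes", expected_runtime_s=7_200, memory_mb=32_768),
]
for task in tasks:
    print(task.name, "->", choose_compute(task))
```

A single workflow can therefore mix short, cheap steps with long, resource-heavy ones without sizing everything for the heaviest step.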
Relationship with stac-server
Cirrus Workflows create STAC items, which are stored in S3 for persistence and can be published to stac-server (or any other STAC API) for indexing and search. In other words, Cirrus generates the data, and stac-server makes it easily accessible to end users and the whole world of STAC tooling.
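Once items are indexed, an end user can find them through the STAC API Item Search endpoint. The sketch below only builds the POST /search request and does not send it. The API URL, collection name, and build_search_request helper are placeholders assumed for illustration; the body fields (collections, bbox, datetime, limit) come from the STAC API specification.

```python
import json
import urllib.request


def build_search_request(
    api_url: str,
    collections: list[str],
    bbox: list[float],
    datetime_range: str,
    limit: int = 10,
) -> urllib.request.Request:
    """Build (but do not send) a STAC API Item Search request."""
    body = {
        "collections": collections,
        "bbox": bbox,                # [west, south, east, north]
        "datetime": datetime_range,  # RFC 3339 instant or "start/end" interval
        "limit": limit,
    }
    return urllib.request.Request(
        url=f"{api_url.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and collection, for illustration only.
request = build_search_request(
    "https://stac.example.com",
    collections=["example-collection"],
    bbox=[-106.0, 39.0, -104.0, 41.0],
    datetime_range="2024-01-01T00:00:00Z/2024-01-31T23:59:59Z",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen against a real STAC API would return a GeoJSON FeatureCollection of matching items.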
Example use cases
AWS services used
Cirrus is built on top of a number of AWS services that enable its serverless and scalable architecture, including:
Lambda: underlies tasks, feeders, and functions
Batch: supports longer runtimes and/or custom resource requirements for feeders and tasks
SNS: publishes messages to multiple subscribers
SQS: message queuing for reliability
DynamoDB: state-tracking database
Step Functions: multi-step functions underlying workflows
ECR: image hosting for Batch and Lambda containers
IAM: function roles and associated permissions/access policies
S3: persistent storage for input payloads and generated items and their assets
CloudFormation: infrastructure-as-code and deployment automation
EventBridge: trigger processing on specific events, like workflow completion
Where to go next?
New Cirrus users may want to progress through the Cirrus documentation along different paths, depending on their role. We’ve laid out a few tracks for key Cirrus user types. Work through the list of docs for your role in the order provided before branching out to the rest of the docs as necessary.
Infrastructure Engineers
Those that are deploying Cirrus and managing the Cirrus infrastructure.
Framework Users
Those that are configuring, operating, and monitoring pipeline workflows.
Algorithm Developers
Those writing code to be run as Cirrus tasks within workflows.
cirrus-lib documentation