Couler

Unified Machine Learning Workflow Optimization in Cloud

What is Couler?

  • Couler is a system designed for unified machine learning workflow optimization in the cloud. Couler endeavors to provide a unified interface for constructing and optimizing workflows across various workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow. Couler enhances workflow efficiency through features like Autonomous Workflow Construction, Automatic Artifact Caching Mechanisms, Big Workflow Auto Parallelism Optimization, and Automatic Hyperparameters Tuning.
  • Couler is included in CNCF Cloud Native Landscape and LF AI Landscape.
  • Check out our technical report published at ICDE 2024 here.

Note that while one of Couler's ambitious goals is to support multiple workflow engines, Argo Workflows is currently the only supported workflow orchestration backend. Efforts are underway to extend support to other workflow engines such as Tekton Pipelines and Apache Airflow. In addition, if you are looking for a Python SDK that provides access to all the available features of Argo Workflows, you might want to check out the low-level Python SDK maintained by the Argo Workflows team.

Who uses Couler?

You can find a list of organizations that use Couler in ADOPTERS.md. If you’d like to add your organization to the list, please send us a pull request.

Why use Couler?

Many workflow engines exist nowadays, e.g. Argo Workflows, Tekton Pipelines, and Apache Airflow. However, their programming experience varies, and they expose different levels of abstraction that are often obscure and complex. The code snippets below show how workflows are constructed with Apache Airflow and Kubeflow Pipelines.

Apache Airflow:

```python
# Imports assumed (Airflow 1.x module layout); the original snippet omitted them.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def create_dag(dag_id, schedule, dag_number, default_args):
    def hello_world_py(*args):
        print('Hello World')

    dag = DAG(dag_id, schedule_interval=schedule, default_args=default_args)
    with dag:
        t1 = PythonOperator(
            task_id='hello_world',
            python_callable=hello_world_py,
            dag_number=dag_number)
    return dag


# Register one DAG per id in the module namespace so Airflow can discover it.
for n in range(1, 10):
    dag_id = 'hello_world_{}'.format(str(n))
    default_args = {'owner': 'airflow', 'start_date': datetime(2018, 1, 1)}
    globals()[dag_id] = create_dag(dag_id, '@daily', n, default_args)
```

Kubeflow Pipelines:

```python
# Imports assumed (kfp v1 SDK); the original snippet omitted them.
from kfp import dsl
from kfp.dsl import graph_component


class FlipCoinOp(dsl.ContainerOp):
    """Flip a coin and output heads or tails randomly."""

    def __init__(self):
        super(FlipCoinOp, self).__init__(
            name='Flip',
            image='python:alpine3.6',
            command=['sh', '-c'],
            arguments=['python -c "import random; result = \'heads\' if random.randint(0,1) == 0 '
                       'else \'tails\'; print(result)" | tee /tmp/output'],
            file_outputs={'output': '/tmp/output'})


class PrintOp(dsl.ContainerOp):
    """Print a message."""

    def __init__(self, msg):
        super(PrintOp, self).__init__(
            name='Print',
            image='alpine:3.6',
            command=['echo', msg],
        )


# define the recursive operation
@graph_component
def flip_component(flip_result):
    print_flip = PrintOp(flip_result)
    flipA = FlipCoinOp().after(print_flip)
    # Recurse for as long as the coin keeps landing on heads.
    with dsl.Condition(flipA.output == 'heads'):
        flip_component(flipA.output)


@dsl.pipeline(
    name='pipeline flip coin',
    description='shows how to use graph_component.'
)
def recursive():
    flipA = FlipCoinOp()
    flipB = FlipCoinOp()
    flip_loop = flip_component(flipA.output)
    flip_loop.after(flipB)
    PrintOp('cool, it is over. %s' % flipA.output).after(flip_loop)
```

Couler is a system for unified machine learning (ML) workflow optimization in the cloud. Its contributions are outlined below:

  • Simplicity and Extensibility: Couler provides a unified programming interface for workflow definition, ensuring independence from the workflow engine and compatibility with various workflow engines such as Argo Workflows, Airflow, and Tekton.
  • Automation: Couler uses large language models (LLMs) to generate unified workflow code from natural language (NL) descriptions. It also automates hyperparameter tuning by integrating Dataset Cards and Model Cards, which improves the effectiveness of the AutoML process.
  • Efficiency: Couler introduces an Intermediate Representation (IR) that describes the workflow as a Directed Acyclic Graph (DAG). It optimizes large workflows by dividing them into smaller ones that can be parallelized automatically. Couler also caches artifacts (the outputs of jobs in the workflow) dynamically, which minimizes redundant computation and supports fault tolerance.
  • Open Source Community: The released open-source version of Couler has garnered adoption from multiple companies and end-users. For instance, over 3000 end users are utilizing Couler within Ant Group, and more than 20 companies have adopted Couler as their default workflow engine interface.
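
The two ideas behind the Efficiency point, a DAG-shaped intermediate representation and artifact caching, can be illustrated with a small standalone sketch. This is not Couler's implementation or API: the `WORKFLOW_IR` dictionary, `parallel_stages`, `cache_key`, and `ArtifactCache` are hypothetical names chosen for illustration, and the sketch uses only the Python standard library (`graphlib` requires Python 3.9 or later).

```python
import hashlib
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical IR: each step maps to the set of steps it depends on.
WORKFLOW_IR = {
    "load": set(),
    "clean": {"load"},
    "features": {"clean"},
    "train_a": {"features"},
    "train_b": {"features"},
    "evaluate": {"train_a", "train_b"},
}


def parallel_stages(ir):
    """Group steps into stages; steps within a stage do not depend on each other."""
    sorter = TopologicalSorter(ir)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # Every step returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


def cache_key(step, inputs):
    """Derive a stable key from the step name and its JSON-serializable inputs."""
    payload = json.dumps({"step": step, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class ArtifactCache:
    """In-memory stand-in for an artifact store (illustrative only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def run(self, step, inputs, fn):
        key = cache_key(step, inputs)
        if key in self._store:
            # Same step and inputs as before: reuse the stored artifact.
            self.hits += 1
            return self._store[key]
        result = fn(**inputs)
        self._store[key] = result
        return result
```

In this sketch, `parallel_stages(WORKFLOW_IR)` places `train_a` and `train_b` in the same stage because both depend only on `features`, so a scheduler could run them concurrently. Calling `ArtifactCache.run` twice with the same step name and inputs executes the function only once, which is the kind of redundant computation that artifact caching is meant to avoid.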