Getting started with Flyte on Mac M1
For about a year I had been looking for a pipeline orchestration platform that I could run without wrestling with Kubernetes, without success. A few months ago I tried to build and install Kubeflow locally on my Mac M1, but it was a hassle to run Kubeflow locally and to build and deploy my pipeline. So I started looking for a lightweight pipeline orchestration tool that could run in a low-resource environment, such as my local MacBook, as well as in a production environment on a single machine with less than 8GB of RAM. This is how I came across Flyte. In this post I will take you through my journey of running a pipeline with Flyte locally with k3d on a MacBook.
What is Flyte?
Flyte is an open source orchestrator that makes it easy to build production-grade ML pipelines. It is written in Go and uses Kubernetes under the hood to run pipelines. In this post I will describe how I installed Flyte on my MacBook M1 and how I got my pipeline running. This setup was inspired by the "Flyte the hard way" tutorial.
Prerequisites
This tutorial assumes that you have Docker Desktop and the kubectl CLI installed on your machine. It is divided into two parts: the first sets up the components needed to run Flyte on the Mac, and the second builds our first workflow using Flyte. Before we dive into the installation of Flyte, we need to make sure that we have a Kubernetes cluster in our local environment.
Installing K3d on the Mac
What is K3d?
Before we dive into what k3d is, let us understand what K3s is.
K3s is a production-ready, lightweight Kubernetes distribution. It is easy to install and packs Kubernetes and its other components into a single, simple launcher, which makes it one of the recommended ways to run Kubernetes in a low-resource environment. K3s is a great tool, but it runs in Linux-based environments and unfortunately does not support macOS or the M1 CPU. This is why k3d exists: k3d is a wrapper that runs k3s in Docker, and it has the same advantages as k3s.
Installing K3d
Install k3d with the following command:
wget -q -O - https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
This will install k3d on your machine; once it is installed you are ready to go. See the k3d installation page for more information on how to install k3d.
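Alternatively, if you use Homebrew on the Mac, you should be able to install it with brew install k3d (the k3d formula is available in Homebrew).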
Cluster Creation
Before creating the cluster, run the following command locally:
mkdir k3data
This creates a folder that will be used to store our cluster data. To create the cluster, run:
`k3d cluster create flyte-cluster -v $PWD/k3data:/data`
This creates a cluster named flyte-cluster and mounts the volume. By mounting the volume, everything in the /data folder inside the cluster is shared with the k3data folder on our local machine.
If all goes well, you should be able to see your cluster when you run
k3d cluster list
NAME SERVERS AGENTS LOADBALANCER
flyte-cluster 1/1 0/0 true
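By default, k3d also adds a k3d-flyte-cluster context to your kubeconfig and switches to it, so kubectl commands will target the new cluster; you can confirm this with:
kubectl config current-context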
With our k3d cluster installed, we can now create the flyte deployment.
Flyte Dependencies
A Flyte cluster has two main dependencies:
- A relational database:
This stores task status, execution information and all other task metadata.
- An S3-compatible object store:
This stores task metadata and data processed by workflows.
Minio
Minio is an S3-compatible object store that we will install as a Kubernetes deployment and expose with a service.
To install Minio, create a file in the flyte-k3d directory and name it minio.yaml.
apiVersion: v1
kind: Namespace
metadata:
name: flyte
labels:
name: flyte
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: minio-pv
namespace: flyte
spec:
storageClassName: hostpath
capacity:
storage: 25Gi
accessModes:
- ReadWriteOnce
hostPath:
path: /data
claimRef:
name: minio-pvc
namespace: flyte
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-pvc
namespace: flyte
spec:
storageClassName: hostpath
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
volumeName: minio-pv
---
apiVersion: apps/v1 # for k8s versions before 1.9.0 use apps/v1beta2 and before 1.8.0 use extensions/v1beta1
kind: Deployment
metadata:
# This name uniquely identifies the Deployment
name: minio
namespace: flyte
labels:
app.kubernetes.io/name: minio
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: minio
strategy:
type: Recreate
template:
metadata:
labels:
# Label is used as selector in the service.
app.kubernetes.io/name: minio
spec:
containers:
- name: minio
# Pulls the default Minio image from Docker Hub
image: docker.io/bitnami/minio:2024
imagePullPolicy: IfNotPresent
env:
# Minio access key and secret key
- name: MINIO_ROOT_USER
value: minio
- name: MINIO_ROOT_PASSWORD
value: miniostorage
- name: MINIO_DEFAULT_BUCKETS
value: flyte
- name: MINIO_DATA_DIR
value: "/data"
- name: MINIO_BROWSER_LOGIN_ANIMATION
value: 'off'
ports:
- containerPort: 9000
name: minio
- containerPort: 9001
name: minio-console
# Mount the volume into the pod
resources:
limits:
cpu: 200m
memory: 512Mi
requests:
cpu: 10m
memory: 128Mi
volumeMounts:
- name: minio-storage # must match the volume name, above
mountPath: "/data"
volumes:
- name: minio-storage
persistentVolumeClaim:
claimName: minio-pvc
---
apiVersion: v1
kind: Service
metadata:
name: minio
namespace: flyte
labels:
app.kubernetes.io/name: minio
spec:
type: NodePort
ports:
- name: minio
nodePort: 30084
port: 9000
protocol: TCP
targetPort: minio
- name: minio-console
nodePort: 30088
port: 9001
protocol: TCP
targetPort: minio-console
selector:
app.kubernetes.io/name: minio
Running kubectl apply -f flyte-k3d/minio.yaml will create the following components:
- A namespace called flyte, which will contain all deployments related to Flyte.
- A persistent volume (PV) and a persistent volume claim (PVC): these two objects let our service use storage. Note that for the persistent volume we set the hostPath to /data, which is the path where we mounted our volume in the cluster.
- The minio deployment, which defines the pod and how we run our object storage. In this deployment we created a volume mount that uses our persistent volume claim and mounts the storage at /data.
A note on storage
When creating the cluster, we mounted the local k3data folder to the /data folder in the Kubernetes cluster. Then, with the PV and the PVC, we mount the /data folder in our cluster to /data in our Minio pod.
With the volume mount set up this way, if we delete a pod, its data persists in the k3data folder. Each time we rebuild the service with the same setup, we can restore the data from our last run.
PS: Note that we have hardcoded the password in the environment variables. This is not good practice for a production environment; we should use a Kubernetes Secret for the password (see the sketch below).
- The service allows external clients to interact with Minio. It exposes the two Minio ports: the console port on 9001 and the API port on 9000.
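As an illustration of the secret-based approach, here is a minimal sketch; the secret name minio-credentials and key root-password are made up for this example and are not part of the manifests above:
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials # hypothetical secret holding the Minio root password
  namespace: flyte
type: Opaque
stringData:
  root-password: miniostorage
---
# In the minio container spec, the hardcoded env entry would then become:
# - name: MINIO_ROOT_PASSWORD
#   valueFrom:
#     secretKeyRef:
#       name: minio-credentials
#       key: root-password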
To test if the installation is running, you can run kubectl get pods -n flyte. This will show the status of the running Minio server.
Accessing the Minio Storage
To access the Minio UI, you will need to set up port forwarding for port 9001 (the console) and port 9000 (the API). You can do this by running the following in one terminal:
kubectl -n flyte port-forward --address 127.0.0.1 --address ::1 service/minio 9001:9001
And in another terminal, run:
kubectl -n flyte port-forward --address 127.0.0.1 --address ::1 service/minio 9000:9000
With these two commands you can open the Minio console in the browser at localhost:9001 and log in with the credentials from the manifest (minio / miniostorage). With the Minio server installed, we can do the same to install the Postgres database.
Postgres
The Postgres setup uses components similar to the Minio server: a persistent volume claim, a service and a deployment. Create a file in the flyte-k3d directory, name it postgres.yaml, and add the following content:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgresql-pvc
namespace: flyte
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
name: postgres
namespace: flyte
labels:
app.kubernetes.io/name: postgres
spec:
type: NodePort
ports:
- name: postgres
port: 5432
nodePort: 30089
protocol: TCP
targetPort: postgres
selector:
app.kubernetes.io/name: postgres
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres
namespace: flyte
labels:
app.kubernetes.io/name: postgres
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: postgres
template:
metadata:
labels:
app.kubernetes.io/name: postgres
spec:
containers:
- image: "ecr.flyte.org/ubuntu/postgres:13-21.04_beta"
imagePullPolicy: "IfNotPresent"
name: postgres
env:
- name: POSTGRES_PASSWORD
value: postgres #Change it to a different value if needed
- name: POSTGRES_USER
value: flyte
- name: POSTGRES_DB
value: flyte
ports:
- containerPort: 5432
name: postgres
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 10m
memory: 128Mi
volumeMounts:
- name: postgres-storage
mountPath: /var/lib/postgresql/data
volumes:
- name: postgres-storage
persistentVolumeClaim:
claimName: postgresql-pvc
This manifest creates a pod that runs the Postgres database instance. Apply it with:
kubectl apply -f flyte-k3d/postgres.yaml
You can use kubectl get pods -n flyte to make sure that all the pods associated with Flyte are running:
NAME READY STATUS RESTARTS AGE
minio-6bfbc8fd5c-rp72r 1/1 Running 0 25h
postgres-7dc7747447-wtgf7 1/1 Running 0 25h
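Optionally, you can also check that the database itself accepts connections. This assumes psql is available inside the Postgres container (it normally is in Postgres images) and that local connections inside the pod are trusted:
kubectl -n flyte exec -it deploy/postgres -- psql -U flyte -d flyte -c '\conninfo'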
Having installed the postgres database and the minio object store, we now need to install Flyte.
Flyte
Flyte Architecture
A Flyte deployment is made up of several components grouped into three categories: the user plane, the control plane and the data plane.
You can learn more about the architecture and those components here.
With this architecture, Flyte makes our job easier by giving us two ways to install it: as a single binary or as core.
The single binary option installs all Flyte components in a single pod. This setup is useful for environments with limited resources that need a quick setup. With core, each component runs as a standalone pod, sometimes with replicas. For our use case, as you may have guessed, we will install Flyte as a single binary.
Installing
To avoid storing the DB password in clear text in the values file, we use a feature of the flyte-binary chart that allows it to consume pre-created secrets.
kubectl create -f flyte-k3d/local-secrets.yaml
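The contents of local-secrets.yaml are not shown above. As a rough sketch, following the pattern used in the "Flyte the hard way" tutorial, it could look like the following; the key name (202-database-secrets.yaml) and structure are taken from that tutorial, so double-check them against the version you are following, and note that the secret name must match the inlineSecretRef value used in the Helm values below:
apiVersion: v1
kind: Secret
metadata:
  name: flyte-binary-inline-config-secret
  namespace: flyte
type: Opaque
stringData:
  202-database-secrets.yaml: |
    database:
      postgres:
        password: postgres # must match the POSTGRES_PASSWORD value from postgres.yaml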
Update the Helm Values
We will install Flyte using the flyte-binary Helm chart. That chart has placeholders for the database and object storage configuration.
Here is what those values look like (save them as flyte-k3d/onprem-flyte-binary-values.yaml, the file we pass to helm install below):
configuration:
database:
username: flyte
host: postgres.flyte.svc.cluster.local
dbname: flyte
storage:
type: minio
    metadataContainer: flyte #This is the default bucket created with Minio. Controlled by the MINIO_DEFAULT_BUCKETS env var in the minio.yaml manifest
userDataContainer: flyte
provider: s3
providerConfig:
s3:
authType: "accesskey"
endpoint: "http://minio.flyte.svc.cluster.local:9000"
accessKey: "minio"
secretKey: "miniostorage" #If you need to change this parameters, refer to the local-flyte-resources.yaml manifest and adjust the MINIO_ROOT_PASSWORD env var
disableSSL: "true"
secure: "false"
v2Signing: "true"
inlineSecretRef: flyte-binary-inline-config-secret
inline:
plugins:
k8s:
inject-finalizer: true
default-env-vars:
- FLYTE_AWS_ENDPOINT: "http://minio.flyte.svc.cluster.local:9000"
- FLYTE_AWS_ACCESS_KEY_ID: "minio"
- FLYTE_AWS_SECRET_ACCESS_KEY: "miniostorage" #Use the same value as the MINIO_ROOT_PASSWORD
task_resources:
defaults:
cpu: 100m
memory: 500Mi #change default requested resources and limits to fit your needs
limits:
memory: 2Gi
serviceAccount:
create: true
With those values, we can now create our flyte binary pod.
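If the flyteorg Helm repository is not yet configured on your machine, add it first (this is the repository URL documented by Flyte):
helm repo add flyteorg https://flyteorg.github.io/flyte && helm repo update
Then install the chart: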
helm install flyte-binary flyteorg/flyte-binary --values flyte-k3d/onprem-flyte-binary-values.yaml -n flyte
This will create a flyte-binary pod, which comes with two separate services: a gRPC service and an HTTP service.
Once the Helm chart is installed, we can access it by creating two port-forwarding sessions, one for HTTP and one for gRPC.
kubectl -n flyte port-forward service/flyte-binary-grpc 8089:8089
kubectl -n flyte port-forward service/flyte-binary-http 8088:8088
The HTTP session gives us access to the Flyte console UI; from this UI we can manage and trigger our workflows. We will submit workflows through the gRPC endpoint.
Configuring Flyte
To set up the Flyte connection, we need to install flytekit and the flytectl CLI locally and generate a config.
pip install -U flytekit
will install Flytekit locally. Note that flytectl is a separate binary; if you don't have it yet, install it by following the flytectl installation instructions in the Flyte docs. Once it is installed you can run `flytectl config init` to create the config file. Inside that config file add the following:
admin:
# For GRPC endpoints you might want to use dns:///flyte.myexample.com
endpoint: localhost:8089
authType: Pkce
insecure: true
logger:
show-source: true
level: 6
An update on config
The Flyte configuration refers to Minio by its in-cluster DNS name (minio.flyte.svc.cluster.local). For your local machine to resolve that name through the port-forward on port 9000, you need to add the following entry to your hosts file:
sudo vi /etc/hosts
127.0.0.1 minio.flyte.svc.cluster.local
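Assuming flytectl wrote its config to the default location (~/.flyte/config.yaml) and the gRPC port-forward from earlier is still running, you can quickly verify the connection:
flytectl get project
If everything is wired up correctly, this should list the project(s) seeded by Flyte (typically flytesnacks).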