
IBM Process Mining Architecture on IBM Cloud

Abstract

This document describes the deployment of IBM Process Mining on the RedHat OpenShift Kubernetes Service on IBM Cloud, known as ROKS, on Virtual Private Cloud (VPC) Gen 2 infrastructure.

[Topology diagram: IBM Process Mining on a ROKS cluster spanning three availability zones on VPC Gen 2]

As you can see in the topology above, the RedHat OpenShift Kubernetes Service cluster has been deployed in a MultiZone Region (MZR) with three availability zones (AZs) where Virtual Private Cloud (VPC) Gen 2 is available.

Warning

IBM Process Mining requires ReadWriteMany (RWX) storage. To offer RWX storage to the applications running on your RedHat OpenShift Kubernetes Service cluster on Virtual Private Cloud (VPC) Gen 2, you need to make OpenShift Data Foundation available in your RedHat OpenShift cluster.

OpenShift Data Foundation (ODF) is a storage solution that consists of open source technologies Ceph, Noobaa, and Rook. ODF allows you to provision and manage File, Block, and Object storage for your containerized workloads in Red Hat® OpenShift® on IBM Cloud™ clusters. Unlike other storage solutions where you might need to configure separate drivers and operators for each type of storage, ODF is a unified solution capable of adapting or scaling to your storage needs.

In order to install OpenShift Data Foundation (ODF) in your RedHat OpenShift Kubernetes Service (ROKS) cluster on IBM Cloud on Virtual Private Cloud (VPC) Gen 2, make sure that your cluster has at least three worker nodes. For high availability, create your cluster with at least one worker node per zone across the three zones. Each worker node must have a minimum of 16 CPUs and 64 GB RAM.

Important

The storageClass used to configure OpenShift Data Foundation to request storage volumes must be of type metro. A metro storage class sets the volumeBindingMode of the storageClass to WaitForFirstConsumer instead of the default Immediate. As a result, the Persistent Volume that satisfies a Persistent Volume Claim is not created and allocated by IBM Cloud Block Storage for VPC until the pod linked to that Persistent Volume Claim is scheduled. This lets IBM Cloud Block Storage for VPC know which Availability Zone of your MultiZone Region cluster the pod requesting block storage landed on and, as a result, provision that storage in the appropriate zone. If you instead used a storageClass whose volumeBindingMode was the default Immediate, the Persistent Volume would be created and allocated in one of the Availability Zones right away, which might not be the zone where the OpenShift pod scheduler later places the pod, making the storage inaccessible to it. See the official Kubernetes documentation here for further detail.
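To make the metro behavior concrete, the sketch below shows roughly what such a storage class looks like. The provisioner name and the profile parameter are assumptions based on the IBM Cloud VPC Block Storage CSI driver; the ibmc-vpc-block-metro-* classes ship with the cluster, so inspect the real one (for example with oc get sc ibmc-vpc-block-metro-10iops-tier -o yaml) rather than creating your own:

```yaml
# Illustrative sketch of a metro storage class; do not create this by hand,
# the ibmc-vpc-block-metro-* classes are preinstalled on the cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibmc-vpc-block-metro-10iops-tier
provisioner: vpc.block.csi.ibm.io          # assumed provisioner name (VPC Block CSI driver)
reclaimPolicy: Delete                      # must not be Retain
volumeBindingMode: WaitForFirstConsumer    # "metro": bind only after the pod is scheduled
parameters:
  profile: 10iops-tier                     # assumed parameter: VPC block storage IOPS profile
```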

Important

The storageClass you configure OpenShift Data Foundation to use must not have the Retain reclaim policy. If a Persistent Volume is retained, it might later be assigned to a pod in a different Availability Zone, making that storage inaccessible to the pod it is allocated to.

Therefore, the storageClassName you configure OpenShift Data Foundation to use in the deployment section needs to be of either the ibmc-vpc-block-metro-10iops-tier, ibmc-vpc-block-metro-5iops-tier or ibmc-vpc-block-metro-custom types.
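As a sketch only, when ODF is deployed through the managed add-on on IBM Cloud, the metro storage class is referenced from an OcsCluster custom resource. The field names below (osdStorageClassName, osdSize, numOfOsd and so on) are assumptions based on the IBM Cloud ODF add-on and must be checked against the current add-on documentation before applying anything:

```yaml
# Illustrative OcsCluster sketch for the IBM Cloud ODF add-on.
# Field names are assumptions; verify against the add-on's CRD.
apiVersion: ocs.ibm.io/v1
kind: OcsCluster
metadata:
  name: ocscluster-vpc
spec:
  monStorageClassName: ibmc-vpc-block-metro-10iops-tier  # metro class (WaitForFirstConsumer)
  osdStorageClassName: ibmc-vpc-block-metro-10iops-tier  # metro class backing the OSD devices
  osdSize: "250Gi"
  numOfOsd: 1
  billingType: advanced
  ocsUpgrade: false
```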

Storage

The full install of IBM Process Mining requires two mandatory persistent volumes and two optional persistent volumes. The mandatory persistent volumes store process mining events and task mining events. The optional storage is for IBM DB2 and MongoDB. IBM Process Mining requires MongoDB for the process mining component and IBM DB2 (or optionally MySQL) for the task mining component. The IBM Process Mining Operator automatically installs an embedded MongoDB and IBM DB2 by default. However, this deployment is only suitable for demonstration or evaluation use cases. For production environments, where performance matters more, configure your process mining and task mining components with external databases that you provision yourself. For production environments the following databases are required:

  • MongoDB v3.6 or higher for the process mining component.
  • IBM DB2 11.5.6.0 for the task mining component.
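As a purely hypothetical sketch, pointing the operator at self-provisioned databases is done in the ProcessMining custom resource. Every field name below is illustrative rather than the documented schema (consult the IBM Process Mining documentation for the real one), but it shows the general shape of such a configuration:

```yaml
# HYPOTHETICAL sketch of external database configuration in a ProcessMining
# custom resource. Field names are illustrative, not the documented schema.
apiVersion: processmining.ibm.com/v1beta1   # assumed group/version
kind: ProcessMining
metadata:
  name: processmining-prod
spec:
  processmining:
    storage:
      externalMongoDB:                       # external MongoDB v3.6+ (process mining)
        enabled: true
        uri: mongodb://mongo.example.com:27017   # hypothetical endpoint
  taskmining:
    storage:
      externalDB2:                           # external IBM DB2 11.5.6.0 (task mining)
        enabled: true
        host: db2.example.com                # hypothetical endpoint
        port: 50000
```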

See the Links section at the bottom for more information on storage for IBM Process Mining.

Security

TLS certificates are mandatory to secure the exposed routes of the application.

The certificates are required for the following routes:

  • Process Mining public Rest API.
  • Task Mining REST API for Agent and Designer integration.

In a default installation, the operator automatically creates self-signed certificates and no further action is required. However, for a production environment, you should provide your own certificates, issued by a trusted CA, within the ProcessMining custom resource (CR).
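Custom certificates are typically supplied to the cluster as a standard Kubernetes TLS secret; the secret name and namespace below are hypothetical examples, and how the ProcessMining resource references the secret is described in the linked documentation:

```yaml
# Standard Kubernetes TLS secret holding a CA-issued certificate and key.
# The name "processmining-tls" and the namespace are hypothetical examples.
apiVersion: v1
kind: Secret
metadata:
  name: processmining-tls
  namespace: process-mining
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>   # e.g. produced with: base64 -w0 tls.crt
  tls.key: <base64-encoded private key>
```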

See the Links section at the bottom for more information on security and certificates for IBM Process Mining.

High Availability

For production environments, it is recommended to install IBM Process Mining in a highly available configuration for better resiliency. All components of IBM Process Mining can be deployed highly available except the embedded IBM DB2 and MongoDB components, which are not recommended for production environments; external, self-provisioned MongoDB and IBM DB2 databases are strongly recommended instead (see the Storage section above).

The highly available deployment of each of the IBM Process Mining components can be configured through the IBM Process Mining Custom Resource Definition (CRD) when installing your IBM Process Mining instance.

See the Links section at the bottom for more information on deployment profiles for IBM Process Mining.

Backup and Restore

Backup and Restore Procedure for Process Mining

The Process Mining component stores information in two places:

  1. Raw events and process analyses, by directly accessing the file system through a persistent volume.
  2. Additional metadata on processes, organizations, and user profiling in MongoDB. The MongoDB instance can be external or hosted in a cluster pod. In the latter case, it stores the information on a persistent volume.

Backup and Restore Procedure for Task Mining

The Task Mining component stores information in two places:

  1. Task events and activity logs, by directly accessing the file system through a persistent volume.
  2. Process and workflow metadata in an IBM DB2 (or MySQL) database. The IBM DB2 (or MySQL) database can be provided externally or hosted in a cluster pod. In the latter case, it will be an IBM DB2 database and it will store the information on a persistent volume.

See the Links section at the bottom for full details on how to back up and restore the process mining and task mining components of IBM Process Mining.

Sizing

It is strongly recommended that you carefully read the official IBM Process Mining Sizing documentation here.

Minimum Resources required for IBM Process Mining

This basic setup does not require any specific configuration. It is only intended for non-production environments.

| Software       | Memory (GB) | CPU (cores) | Disk (GB) | Nodes |
|----------------|-------------|-------------|-----------|-------|
| Process Mining | 64          | 16          | 100       | 1     |
| Task Mining    | 16          | 4           | 100       | 1     |
| Total          | 80          | 20          | 200       | 1     |

Production Setup

Listed below are three sizing configurations for IBM Process Mining. In order to appropriately size your Red Hat OpenShift cluster on IBM Cloud, you need to consider data volume and data complexity (i.e. the number of events), plus the number of concurrent users active on the application; what matters is not so much how many users are working, but what they are doing concurrently.

| No. of Events | Software       | Memory (GB) | CPU (cores) | Disk (GB) | Nodes |
|---------------|----------------|-------------|-------------|-----------|-------|
| Up to 10M     | Process Mining | 64          | 16          | 300       | 1     |
| Up to 50M     | Process Mining | 128         | 32          | 600       | 1     |
| Up to 100M    | Process Mining | 192         | 48          | 1000      | 1     |

For Task Mining, a single common configuration is suggested:

| Software    | Memory (GB) | CPU (cores) | Disk (GB) | Nodes |
|-------------|-------------|-------------|-----------|-------|
| Task Mining | 32          | 8           | 300       | 1     |

Highly Available Setup

The following table details the minimum resources required for installing IBM Process Mining on Red Hat OpenShift for 10 concurrent users with up to 50 million events, with each of the IBM Process Mining components highly available.

| Software       | Memory (GB) | CPU (cores) | Disk (GB) | Nodes |
|----------------|-------------|-------------|-----------|-------|
| Process Mining | 128         | 48          | 200       | 3     |
| Task Mining    | 32          | 8           | 300       | 3     |
| Total          | 160         | 56          | 500       | 3     |

See the Deployment Profiles section in the official IBM Process Mining documentation for a better understanding of the CRDs that will deploy IBM Process Mining instances with the sizing characteristics explained above.

Summary

As you can see in the topology diagram above for the production reference architecture of IBM Process Mining on the RedHat OpenShift Kubernetes Service (ROKS) on IBM Cloud on Virtual Private Cloud (VPC) Gen 2, we strongly recommend creating your RedHat OpenShift Kubernetes Service cluster with six worker nodes: three of them could be dedicated to OpenShift Data Foundation only (see the tip below), while the other three would be reserved to run the IBM Process Mining components. The sizing sections above give the sizes for the worker nodes running OpenShift Data Foundation, as well as the total sizing of the three worker nodes running the IBM Process Mining components. Finally, make sure you use a metro storage class with the Delete reclaim policy for OpenShift Data Foundation, which will provide IBM Process Mining the required ReadWriteMany (RWX) storage.

Tip

You could use labels, taints, and tolerations if you want OpenShift Data Foundation workloads deployed on worker nodes completely separate from your application workloads (IBM Process Mining in this case). Check the OpenShift Data Foundation documentation here and pay attention to the workerNodes parameter.
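Separating storage from application workloads relies on standard Kubernetes taints and tolerations. As an illustrative sketch, a dedicated storage node could carry a taint like the one below so that only pods with a matching toleration are scheduled there; the taint key shown is an assumption modeled on the key conventionally used by OpenShift storage components, so confirm the exact key against the ODF documentation:

```yaml
# Illustrative: a taint on a dedicated storage node keeps non-tolerating
# application pods (such as IBM Process Mining) off that node.
# Typically applied with a command like:
#   oc adm taint nodes <node> node.ocs.openshift.io/storage=true:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: storage-node-zone-1          # hypothetical node name
spec:
  taints:
  - key: node.ocs.openshift.io/storage   # assumed key; verify in the ODF docs
    value: "true"
    effect: NoSchedule
```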