Known issues for Cloudera Data Services on premises 1.5.5 SP2

A list of the known issues and limitations, their areas of impact, and workarounds in Cloudera Data Services on premises 1.5.5 SP2.

The known issues in Cloudera Data Services on premises 1.5.5 are carried over into Cloudera Data Services on premises 1.5.5 SP2.

For more information on the 1.5.5 known issues, see Known Issues.

For more information on 1.5.5 SP1 known issues, see Known Issues.

Known issues identified in 1.5.5 SP2

The following are the known issues identified in 1.5.5 SP2:

OPSX-6950 - DRS Restore fails due to ClusterIP allocation conflict
During a DRS restore, the restore operation can fail with a Kubernetes error indicating that a service ClusterIP is already allocated. This occurs when the restore process attempts to recreate a service using a ClusterIP that is currently in use by another existing service in the cluster.

A typical error message looks like:

service "cdp-release-cert-manager-cainjector" is invalid:
spec.clusterIPs: failed to allocate IP <IP_ADDRESS>: provided IP is already allocated
To resolve this issue:
  1. Identify the service using the conflicting IP address: kubectl get svc -A -o wide | grep <IP_ADDRESS>
  2. Retry the DRS restore. On retry, the restore process cleans up the existing conflicting resources (including the service holding the conflicting IP) before proceeding with the restore. If you cannot access the Cloudera Control Plane UI to retry the restore, contact Cloudera Support for assistance.
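Step 1 above can be sketched as follows. The IP address, namespaces, and listing format below are illustrative samples, not output from a real cluster; on a live system you would run the kubectl command directly instead of filtering a saved listing.

```shell
# Hypothetical conflicting ClusterIP reported by the restore error
ip="10.43.12.34"

# Sample output (assumed format) of: kubectl get svc -A -o wide
svc_output='cdp           cdp-release-cert-manager-cainjector   ClusterIP   10.43.12.34   <none>   443/TCP   30d
kube-system   kube-dns                              ClusterIP   10.43.0.10    <none>   53/UDP    30d'

# Same filter as: kubectl get svc -A -o wide | grep <IP_ADDRESS>
# -F treats the IP as a literal string; -w avoids partial matches
printf '%s\n' "$svc_output" | grep -Fw "$ip"
```

The matching line identifies the namespace and name of the service currently holding the conflicting ClusterIP.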
OPSX-6867 - Post upgrade validation fails due to longhorn-system pods in CrashLoopBackOff
During an upgrade to 1.5.5 SP2, some Longhorn CSI plugin pods get stuck in a terminating state, which causes the upgrade to fail. This problem has been observed when Longhorn is not configured to use dedicated disks, which destabilizes the storage components and prevents proper pod shutdown and restart.
To address this issue:
  • For ECS customers, Longhorn must be deployed only on dedicated disks; shared disks can cause pods to shut down during the upgrade.
  • Delete the Longhorn CSI plugin pods that are stuck in the terminating state. Removing these pods clears the stuck resources and unblocks the Longhorn components.
  • Retry the upgrade. Once the conflicting pods are removed, the upgrade proceeds successfully.
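The pod cleanup step above can be sketched as follows. The pod names and listing format are assumed examples; on a live cluster you would run the kubectl commands directly against the longhorn-system namespace.

```shell
# Sample output (assumed format) of: kubectl get pods -n longhorn-system
pods_output='longhorn-csi-plugin-x7k2p   2/3   Terminating   0   4d
longhorn-csi-plugin-m9q4w   3/3   Running       0   4d
longhorn-manager-znb81      1/1   Running       0   4d'

# Select pods stuck in the Terminating state (status is column 3)
stuck=$(printf '%s\n' "$pods_output" | awk '$3 == "Terminating" {print $1}')
echo "$stuck"

# For each stuck pod, force deletion so the upgrade can proceed:
#   kubectl delete pod <name> -n longhorn-system --grace-period=0 --force
```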
OPSX-6858 - Cloudera Embedded Container Service first run fails at install-cp step due to mke2fs failure
During some Cloudera Embedded Container Service installations, the first-run process fails at the install-cp step because certain pods remain in a Creating state. The underlying cause is a failure to mount the associated Longhorn volume, which leads to an error when Kubernetes tries to format the block device. The pod event shows a message similar to the following:
Warning FailedMount ... MountVolume.MountDevice failed for volume "pvc-…"
rpc error: code = Internal desc = format of disk "/dev/longhorn/pvc-…" failed:
… mke2fs … /dev/longhorn/pvc-… is apparently in use by the system; will not make a filesystem here!
This happens when the Longhorn PVC's block device still contains stale filesystem or partition metadata from previous use. Because the device appears to be in use, the mke2fs command cannot create a new filesystem, which blocks the pod from starting.
To resolve this issue, manually clear the remaining metadata on the Longhorn block device on the node where the pod is running. Note that running wipefs with no options only lists the existing signatures; use the -a option to erase them. For example, run the following command on the affected device:
wipefs -a /dev/longhorn/pvc-2e2dc23b-82d6-45cd-9348-b40eba0fb4e1
This removes the existing filesystem signatures so that Longhorn (and Kubernetes) can format and mount the volume successfully. After clearing the metadata, retry the Cloudera Embedded Container Service installation; the pods should be able to proceed past the install-cp step.