Known issues in 1.5.5 SP2
You might run into some known issues while using Cloudera AI on premises 1.5.5 SP2.
The existing known issues from Cloudera AI on premises 1.5.5 and 1.5.5 SP1 carry over into Cloudera AI on premises 1.5.5 SP2.
Upgrade management
- DSE-50441: Unable to register or deploy Model with the 1.5.4_h30 → 1.5.5 SP1 → 1.5.5 SP2 upgrade path and DSE-50487: Add workaround for upgrading AI registry from 1.5.4 - 1.5.5
-
During the Cloudera AI Registry upgrade process, a new Cloudera AI Registry instance is provisioned, and the database is assigned a fresh Persistent Volume (PV). The upgrade workflow is designed to replace this fresh PV with the old Cloudera AI Registry PV to retain historical data.
A race condition occurs between the application startup and storage orchestration. The database schema migration logic might execute on the fresh PV before the volume swap sequence is completed. As a result, when the legacy PV is eventually attached, it lacks the necessary schema updates, leading to query failures caused by schema mismatches.
Workaround
Restart the Cloudera AI Registry v1 pod. This action forces the application to reinitialize, detect the correctly attached legacy PV, and apply the pending schema migrations.
- DSE-50712: On premises 1.5.5 SP2 Cloudera AI Registry on OpenShift Container Service: 1.5.4 SP2 to 1.5.5 SP2 upgrade fails with knox Init:ImagePullBackOff failure
-
Upgrading the on premises Cloudera AI Registry from 1.5.4 SP2 directly to 1.5.5 SP2 fails with an ImagePullBackOff error during Knox initialization.
Workaround
Upgrading directly from 1.5.4 SP2 to 1.5.5 SP2 is not a supported upgrade path. Upgrade first to 1.5.5 GA or 1.5.5 SP1, and then proceed to 1.5.5 SP2.
Log management
- DSE-49031: Add fluent bit sidecar for model endpoints in serving-default namespace and DSE-49032: Add fluent bit sidecar for knative pods
-
The diagnostic bundle includes live cluster logs from all namespaces; however, the archived logs are limited to the cml-serving infra namespace.
Workaround: None.
Quota management
- DSE-50530: Error observed when configuring custom user or team quota
-
When setting custom CPU or memory quotas for a user or team without specifying a GPU quota, the system displays the Request must contain at least one accelerator user quota error message.
Workaround
To resolve this issue, always include a GPU quota value when updating CPU or memory quotas for a user or team.
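As a minimal sketch, the workaround means every quota update should carry a GPU value alongside CPU and memory. The function and field names below are illustrative assumptions about the request shape, not the documented Cloudera AI quota API:

```python
# Hypothetical quota-update payload builder. The workaround is to always
# send a GPU value with CPU/memory, even when only CPU or memory changes.
def build_quota_update(cpu_cores, memory_gb, gpu_count):
    """Return a quota payload that always includes an accelerator quota.

    Field names here are illustrative; consult the Cloudera AI quota API
    documentation for the real request schema.
    """
    if gpu_count is None:
        # Omitting the accelerator quota is what triggers the
        # "Request must contain at least one accelerator user quota" error.
        raise ValueError("Include a GPU quota when updating CPU or memory quotas.")
    return {"cpu": cpu_cores, "memory": f"{memory_gb}Gi", "gpu": gpu_count}
```

Keeping the GPU field mandatory in your own tooling avoids hitting the error at request time.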
- DSE-49793: Different default quota in heterogeneous GPU for users and teams
-
In configurations with heterogeneous GPUs, the default GPU quota is shared by users and teams and cannot be configured separately for each.
Cloudera recommends setting a single default GPU quota that is suitable for both users and teams.
Workbench management
- DSE-49514: Review of the user resources and workbench resources does not display the right data
-
GPU resources are discovered by the workbench even if no GPUs were added during the creation or provisioning of the workbench. The workbench automatically discovers all GPUs available in the cluster, regardless of whether they were allocated to the workbench. As a result, the User Resources section inaccurately displays the total number of GPUs in the cluster instead of the GPUs assigned to the workbench.
Workaround: None.
Model training
- DSE-48824: Cloudera AI Registry does not work with Google Chrome browser version 142 or higher
- Cloudera AI Registry does not function properly when accessed using Google Chrome version 142 or higher. The Model Endpoints page fails to load and displays the following error message:
Error occurred while communicating with Cloudera AI Registry in environment '<ENVIRONMENT_NAME>'.
Workaround
To resolve this issue, either downgrade Google Chrome to a version lower than 142 or use an alternative browser.
Site Administration
- DSE-36561: Updating Cloudera AI applications fails if the Allow users to use ML runtime addons option is disabled
-
The Allow users to use ML runtime addons option is enabled by default in Cloudera AI on premises. However, if this option is disabled for any configuration activity, updating a Cloudera AI application can fail with the Whoops, there was an unexpected error HTTP 400 error message.
Workaround
To address this issue, enable the Allow users to use ML runtime addons option.
Model serving
- DSE-50375: Remove provisioning to update S3 bucket field in update storage configuration for Cloudera AI Inference service instance
-
The Cloudera AI Inference service does not use the S3 bucket option; consequently, the secret does not contain S3 fields either.
Workaround: None.
- DSE-48823: Cloudera AI Inference does not work with Google Chrome browser version 142 or higher
-
Cloudera AI Inference service does not function properly when accessed using Google Chrome version 142 or higher. The Model Endpoints page fails to load and displays the following error message:
Error occurred while communicating with Cloudera AI Inference service in environment '<ENVIRONMENT_NAME>'.
Workaround
To resolve this issue, either downgrade Google Chrome to a version lower than 142 or use an alternative browser.
- Cloudera AI Inference service known issues
-
- Updating the description after a model has been added to a model endpoint leads to a UI mismatch in the model builder between the models listed by the model builder and the deployed models.
- Embedding models function in two modes: query or passage. The mode must be specified when interacting with the models in one of the following ways:
  - Suffixing the model ID in the payload with either -query or -passage
  - Specifying the input_type parameter in the request payload.
  For more information, see the NVIDIA documentation.
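The two ways of selecting the mode can be sketched as JSON-style request payloads. This is a minimal illustration only; the model ID and the exact payload shape are assumptions drawn from typical NVIDIA embedding endpoints, not values from this release note:

```python
# Two equivalent ways to request "query"-mode embeddings from an
# embedding model (the model ID below is a hypothetical example).

# Option 1: suffix the model ID with -query (or -passage for documents).
payload_suffix = {
    "model": "example/embedding-model-query",  # hypothetical model ID
    "input": ["What is Cloudera AI Inference?"],
}

# Option 2: keep the plain model ID and pass input_type explicitly.
payload_param = {
    "model": "example/embedding-model",  # hypothetical model ID
    "input": ["What is Cloudera AI Inference?"],
    "input_type": "query",  # or "passage" when embedding documents
}

# Either payload would be sent as the JSON body of the embeddings request.
```

Use -passage or input_type "passage" when embedding documents for retrieval, and the query form when embedding the search query itself.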
- Embedding models only accept strings as input. Token stream input is currently not supported.
- Llama 3.2 Vision models are not supported on AWS on A10G and L40S GPUs.