Known issues in Cloudera Data Warehouse on premises 1.5.5 SP2

Review the known issues and limitations that you might run into while using the Cloudera Data Warehouse service in Cloudera Private Cloud Data Services.

Known issues identified in 1.5.5 SP2

DWX-22406: Log router reloader containers might fail with CrashLoopBackOff due to OOMKilled errors
During Cloudera Data Warehouse on premises environment activation, log-router pods created for the environment might fail with a CrashLoopBackOff status and OOMKilled reason for the reloader container. This occurs because the log router reloader container does not filter ConfigMaps and namespaces correctly and attempts to load many ConfigMaps into memory on large clusters, leading to excessive memory usage and out-of-memory failures.
Edit the log-router DaemonSet in the affected log router namespace and remove the memory limit from the reloader container by deleting the following lines under the resources field of the reloader container:
limits:
  memory: 1G
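After the edit, the resources section of the reloader container might look like the following. This is a sketch only; the requests values shown are illustrative, and you should keep whatever requests your DaemonSet already specifies:

```yaml
# Reloader container resources after the workaround (illustrative).
resources:
  requests:
    cpu: 100m        # keep your existing requests unchanged
    memory: 128Mi
  # The limits block containing "memory: 1G" has been deleted, so the
  # container is no longer OOMKilled when it loads many ConfigMaps.
```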
DWX-22360: ArrayIndexOutOfBoundsException when querying large MariaDB tables
When querying extremely large tables (containing hundreds of millions of rows or more) through the MariaDB connector, you might encounter an ArrayIndexOutOfBoundsException.
To avoid this issue, adopt the following data storage strategy:
  • Use MariaDB only for small to medium-sized dimension tables.
  • Offload large fact tables to scalable storage solutions such as HDFS, Ozone, or cloud object stores.
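With this split, Trino can still join across both systems in a single federated query. The catalog, schema, table, and column names below are hypothetical:

```sql
-- Hypothetical federated join: the small dimension table stays in
-- MariaDB, while the large fact table lives in a Hive catalog backed
-- by scalable storage (HDFS, Ozone, or an object store).
SELECT d.region_name,
       SUM(f.amount) AS total_amount
FROM hive.sales.fact_orders AS f      -- large fact table
JOIN mariadb.ref.dim_region AS d      -- small dimension table
  ON f.region_id = d.region_id
GROUP BY d.region_name;
```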
DWX-22578: Large queries fail with "Per-Node Memory Limit Exceeded" despite spilling enabled
Large queries involving heavy joins may fail with a "per-node memory limit exceeded" error, even if the "spill-to-disk" feature is properly enabled.

This issue is primarily caused by missing or inaccurate table statistics, particularly on join columns. Without accurate statistics, the Trino query optimizer may select suboptimal join strategies that cause memory usage to spike rapidly, hitting the node limit before the spill mechanism can react.

Perform the following workarounds:
  • You can force the query to use a partitioned join strategy by running the following session command before your query:
    SET SESSION join_distribution_type = 'PARTITIONED';
  • Alternatively, you can force the optimizer to avoid broadcasting large tables by lowering the size limit. For example:
    SET SESSION join_max_broadcast_table_size = '10MB';

To permanently address the root cause, ensure that table and column statistics are accurate and up-to-date. This allows the optimizer to automatically select the most efficient execution plan without requiring session overrides.
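For tables in a Hive catalog, statistics can be refreshed with Trino's ANALYZE statement. The catalog, schema, table, and column names below are hypothetical:

```sql
-- Collect table and column statistics so the optimizer can
-- size joins correctly.
ANALYZE hive.sales.fact_orders;

-- Optionally restrict collection to specific columns, such as join keys.
ANALYZE hive.sales.fact_orders WITH (columns = ARRAY['region_id']);
```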

DWX-22392: Trino compute update fails due to entity being leased by a background operation
Trino's auto-suspend and auto-scaling processes do not account for manual operations performed by users, such as Start or Update. To prevent conflicts between manual and automatic operations, the lease on the Trino Virtual Warehouse is held for 1 minute after a manual operation is executed.

During this delay, attempting another manual operation may result in the error message: "Compute entity is currently 'leased' by another internal operation."

This behavior is intentional and ensures system stability. Wait for 1 minute after completing a manual operation before initiating another.
DWX-21891: Delayed metadata loading in Hue for auto-suspended Trino Virtual Warehouses
When a Hue session starts, Hue sends requests to the Trino coordinator to fetch database metadata. For large metadata tables, the coordinator divides the metadata request into "splits" and assigns them to worker nodes.

However, if the Trino Virtual Warehouse is auto-suspended, all worker nodes are stopped, causing the metadata request to be queued. This triggers the auto-start process to spin up the worker nodes. Since it can take a few minutes to start worker nodes in a Kubernetes cluster, Hue may take a long time to load metadata for an auto-suspended Trino Virtual Warehouse.

Disable auto-suspend for the Trino Virtual Warehouse and enable auto-scaling with a minimum of one worker node to ensure at least one worker is always available.
CDPD-76644: information_schema.table_privileges metadata is unsupported
Querying the information_schema.table_privileges access control metadata for Ranger is unsupported, and a TrinoException is displayed indicating that the connector does not support table privileges.
None.
CDPD-76643/CDPD-76645: SET AUTHORIZATION SQL statement does not modify Ranger permissions
The following SQL statements do not dynamically modify the Ranger permissions:
CREATE SCHEMA test_createschema_authorization_user AUTHORIZATION user;
ALTER SCHEMA test_schema_authorization_user SET AUTHORIZATION user;
As an Administrator, you can authorize the permissions from the Ranger Admin UI.
CDPD-68246: Roles related operations are not authorized by Ranger Trino plugin
The SHOW ROLES, SHOW CURRENT ROLES, and SHOW ROLE GRANTS statements are not authorized by the Ranger Trino plugin. Users can run these commands without any policy, and audits are not generated for these statements.
None.
CDPD-81960: Row filter policy for same resource and same user across different repos is not supported
If a row filter policy exists for the same resource and the same user in both the cm_trino and cm_hive (Hadoop SQL) repositories, and the row filtering conditions differ, then querying the table as that user returns an empty response in the trino-cli.
Do not create row filter policies for the same resource and same user in different repos.
DWX-19626: Number of rows returned by Trino does not match with the Hive query results
If you run the exact same query, involving integer division, on both the Hive and Trino engines, the results returned by Trino might not match the results returned by Hive. This is due to Trino's default behavior when dividing two integers: Trino performs integer division and does not cast the result to a FLOAT data type.
To perform floating-point division on two integers, cast one of the integers to DOUBLE. For more information, see the Trino documentation.
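For example, in Trino the following two queries return different results because integer division truncates the fractional part:

```sql
SELECT 5 / 2;                  -- integer division: returns 2
SELECT CAST(5 AS DOUBLE) / 2;  -- floating-point division: returns 2.5
```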
CDPD-94500: Impala query fails on collection with late materialization enabled
When reading Parquet files from S3 with late materialization enabled, Impala queries might fail with an error indicating a failure to skip values in a column.
Disable the late materialization feature by setting the parquet_late_materialization_threshold query option to -1 in the coordinator flagfile configuration.
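As a sketch, the query option can be applied cluster-wide through the coordinator's default query options in the flagfile. The exact flagfile location and any other defaults already present vary by deployment:

```
# Impala coordinator flagfile entry (illustrative); merge this option
# into any existing --default_query_options value rather than replacing it.
--default_query_options=parquet_late_materialization_threshold=-1
```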

Apache Jira: IMPALA-14619