Known issues in Cloudera Data Warehouse on premises 1.5.5 SP2
Review the known issues and limitations that you might run into while using the Cloudera Data Warehouse service in Cloudera Private Cloud Data Services.
Known issues identified in 1.5.5 SP2
- DWX-22406: Log router reloader containers might fail with CrashLoopBackOff due to OOMKilled errors
- During Cloudera Data Warehouse on premises environment activation, log-router pods created for the environment might fail with a CrashLoopBackOff status and OOMKilled reason for the reloader container. This occurs because the log router reloader container does not filter ConfigMaps and namespaces correctly and attempts to load many ConfigMaps into memory on large clusters, leading to excessive memory usage and out-of-memory failures.
- DWX-22360: ArrayIndexOutOfBoundsException when querying large MariaDB tables
- When querying extremely large tables (containing hundreds of millions of rows or more) using the MariaDB connector, users may encounter an ArrayIndexOutOfBoundsException.
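  For illustration only: a full-scan aggregate over a very large table accessed through a MariaDB connector catalog is the kind of query that can hit this exception. The catalog, schema, and table names below are hypothetical.

  ```sql
  -- Hypothetical catalog/schema/table names; a full scan of a table with
  -- hundreds of millions of rows through the MariaDB connector can trigger
  -- the ArrayIndexOutOfBoundsException.
  SELECT count(*)
  FROM mariadb.sales.order_items;
  ```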
- DWX-22578: Large queries fail with "Per-Node Memory Limit Exceeded" despite spilling enabled
- Large queries involving heavy joins may fail with a "per-node memory limit exceeded" error, even if the "spill-to-disk" feature is properly enabled. This issue is primarily caused by missing or inaccurate table statistics, particularly on join columns. Without accurate statistics, the Trino query optimizer may select suboptimal join strategies that cause memory usage to spike rapidly, hitting the node limit before the spill mechanism can react.
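  Collecting statistics for the joined tables is not called out as a workaround here, but it addresses the stated cause. A minimal sketch, assuming a Hive catalog named hive and a schema named sales (both hypothetical), using Trino's ANALYZE and SHOW STATS statements:

  ```sql
  -- Gather table and column statistics so the optimizer has accurate row
  -- counts and NDV estimates for the join columns (names are hypothetical).
  ANALYZE hive.sales.orders;
  ANALYZE hive.sales.customers;

  -- Confirm that statistics are now present for the join columns.
  SHOW STATS FOR hive.sales.orders;
  ```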
- DWX-22392: Trino compute update fails due to entity being leased by a background operation
- Trino's auto-suspend and auto-scaling processes do not account for manual operations performed by users, such as Start or Update. To prevent conflicts between manual and automatic operations, a 1-minute delay is introduced after a manual operation is executed, during which the lease of the Trino Virtual Warehouse is held. During this delay, attempting another manual operation may result in the error message: "Compute entity is currently 'leased' by another internal operation."
- DWX-21891: Delayed metadata loading in Hue for auto-suspended Trino Virtual Warehouses
- When a Hue session starts, Hue sends requests to the Trino
coordinator to fetch database metadata. For large metadata tables, the coordinator divides the
metadata request into "splits" and assigns them to worker nodes.
However, if the Trino Virtual Warehouse is auto-suspended, all worker nodes are stopped, causing the metadata request to be queued. This triggers the auto-start process to spin up the worker nodes. Since it can take a few minutes to start worker nodes in a Kubernetes cluster, Hue may take a long time to load metadata for an auto-suspended Trino Virtual Warehouse.
- CDPD-76644: information_schema.table_privileges metadata is unsupported
- Querying the information_schema.table_privileges access control metadata for Ranger is unsupported, and a TrinoException is displayed indicating that the connector does not support table privileges.
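  For example, a query of the following form is expected to fail with a TrinoException instead of returning privilege metadata; the filter value is arbitrary.

  ```sql
  -- Expected to fail: the connector does not support table privileges.
  SELECT grantee, privilege_type
  FROM information_schema.table_privileges
  WHERE table_schema = 'default';
  ```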
- CDPD-76643/CDPD-76645: SET AUTHORIZATION SQL statement does not modify Ranger permissions
- The following SQL statements do not dynamically modify the Ranger permissions:
  CREATE SCHEMA test_createschema_authorization_user AUTHORIZATION user;
  ALTER SCHEMA test_schema_authorization_user SET AUTHORIZATION user;
- CDPD-68246: Role-related operations are not authorized by Ranger Trino plugin
- The SHOW ROLES, SHOW CURRENT ROLES, and SHOW ROLE GRANTS statements are not authorized by the Ranger Trino plugin. Users can run these commands without any policy, and audits are not generated for these statements.
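  The affected statements are listed below for reference; in an affected deployment they succeed for any user without a Ranger policy check and without generating audit entries.

  ```sql
  -- These role statements are not authorized or audited by the Ranger Trino plugin.
  SHOW ROLES;
  SHOW CURRENT ROLES;
  SHOW ROLE GRANTS;
  ```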
- CDPD-81960: Row filter policy for the same resource and same user across different repos is not supported
- When you have a row filter policy for the same resource and the same user in both the cm_trino and cm_hive (Hadoop SQL) repos, and the row filtering conditions are different, querying the table as that user returns an empty response in the trino-cli.
- DWX-19626: Number of rows returned by Trino does not match the Hive query results
- If you run the exact same query, involving integer division, on both the Hive and Trino engines, the results returned by Trino do not match the results returned by the Hive engine. This is due to the default behavior of Trino when dividing two integers: Trino does not cast the result to a FLOAT data type.
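  A minimal illustration of the behavior; the literal values are arbitrary. Hive returns 2.5 for the same expression because it promotes integer division to a double.

  ```sql
  -- On Trino, integer division truncates:
  SELECT 5 / 2;                   -- returns 2

  -- Casting an operand gives the Hive-like result:
  SELECT CAST(5 AS DOUBLE) / 2;   -- returns 2.5
  ```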
- CDPD-94500: Impala query fails on collection with late materialization enabled
- When reading Parquet files from S3 with late materialization enabled, Impala queries might fail with an error indicating a failure to skip values in a column.
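  This is not a documented workaround; as a troubleshooting sketch, if your Impala release exposes the PARQUET_LATE_MATERIALIZATION_THRESHOLD query option, disabling late materialization for the session can help confirm the issue. Treat the option behavior as an assumption to verify against your Impala version; the table and column names are hypothetical.

  ```sql
  -- Assumed option: setting the threshold to -1 disables Parquet late
  -- materialization for the session (verify against your Impala version).
  SET PARQUET_LATE_MATERIALIZATION_THRESHOLD=-1;

  -- Re-run the failing query over the collection column
  -- (hypothetical table with an ARRAY column named items).
  SELECT count(*) FROM parquet_events_s3 t, t.items;
  ```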
