What's new in Cloudera Data Warehouse on cloud

Review the new features introduced in this release of Cloudera Data Warehouse on cloud, version 1.12.1-b259.

What's new in Cloudera Data Warehouse on cloud

Azure AKS 1.34 upgrade
Cloudera supports the Azure Kubernetes Service (AKS) version 1.34. In 1.12.1-b259 (released March 31, 2026), when you activate an Environment, Cloudera Data Warehouse automatically provisions AKS 1.34. To upgrade to AKS 1.34 from a lower version of Cloudera Data Warehouse, you must back up and restore Cloudera Data Warehouse.
AWS EKS 1.34 upgrade
Cloudera supports the AWS Elastic Kubernetes Service (EKS) version 1.34. In 1.12.1-b259 (released March 31, 2026), when you activate an Environment, Cloudera Data Warehouse automatically provisions EKS 1.34. To upgrade to EKS 1.34 from a lower version of Cloudera Data Warehouse, you must back up and restore Cloudera Data Warehouse.
Removal of Unified Analytics
In this release, the Unified Analytics framework, including the Impala Virtual Warehouse implementation, is fully removed from Cloudera Data Warehouse on cloud. All remaining Unified Analytics components, configuration paths, and UI flows have been cleaned up or migrated to the standard Impala Virtual Warehouse architecture. This change simplifies operations and ensures continued support for existing Impala workloads on the current platform.

What's new in Cloudera Data Explorer (Hue) on Cloudera Data Warehouse on cloud

Product branding update
Starting with this release, the product component previously known as Hue has been renamed to Cloudera Data Explorer (Hue). This change reflects an updated branding initiative and will be rolled out in phases.

As part of this release, you may notice:

  • A new logo displayed in the UI
  • The service name updated to Data Explorer in the UI
  • The new product name reflected in documentation

Some UI references may still display the previous name as the branding update is completed incrementally in future releases.

There is no functional impact associated with this change. All existing configurations, workflows, and integrations continue to work as before.

Enhanced session security for Cloudera Data Explorer (Hue)
Data Explorer now includes enhanced security for the session ID (sessionid) cookie. This enhancement helps prevent unauthorized access that could result in data exposure, unauthorized query execution, and job submission across connected Data Explorer services.
For more information, see Securing sessions.
Facts support in SQL AI Assistant
You can now define custom system instructions to guide the SQL AI Assistant in generating more accurate queries based on your specific business logic. This enhancement supports complex, cross-database workflows by allowing you to persist organizational context in the Assistant settings.
For more information, see Fact support for SQL query.
Data Explorer support for the boto3 SDK
Data Explorer now supports the boto3 SDK for accessing AWS S3. This update replaces the legacy connector framework to provide improved performance and compatibility with AWS services.
To ensure a smooth transition, the system automatically converts your existing configurations to the new connector system. This feature is enabled by default, but you can manually disable the feature flag if necessary.
For more information, see Enabling the S3 File Browser for Cloudera Data Explorer (Hue) in Cloudera Data Warehouse with RAZ and Enabling the S3 File Browser for Cloudera Data Explorer (Hue) in Cloudera Data Warehouse without RAZ.

What's new in Hive on Cloudera Data Warehouse on cloud

Small file warnings in console
The MSCK and ANALYZE commands now display a warning in the console if the average file size for a table or partition is below the threshold. This helps you identify small files that might affect performance.

For more information, see Statistics generation and viewing commands in Cloudera Data Warehouse.
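As a hedged sketch (the table name web_logs is hypothetical), either statement surfaces the warning when it gathers file-level information for a table whose average file size falls below the threshold:

```sql
-- Both commands now print a console warning when the average
-- file size for the table or a partition is below the threshold.
MSCK REPAIR TABLE web_logs;
ANALYZE TABLE web_logs COMPUTE STATISTICS;
```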

Performance improvement for column changes
The ALTER CHANGE COLUMN command is now faster for tables that have many partitions. This change prevents the command from performing a separate Metastore service call to update column statistics for every partition, which previously caused long execution times and timeouts. For large partitioned tables, the execution time is reduced from hours to minutes.

Apache Jira: HIVE-28346

Hive Query History Service
The Hive query history service provides a scalable solution for storing and analyzing historical Hive query data. It captures detailed information about completed queries, such as runtime, accessed tables, errors, and metadata, and stores it in an efficient Iceberg table format. For more information, see Hive query history service.

What's new in Iceberg on Cloudera Data Warehouse on cloud

Table repair feature support for Iceberg tables
Impala introduces the repair_metadata() function for Iceberg tables. This function provides a self-service path to recover Iceberg tables that are inaccessible due to missing data files after manual file deletions in the underlying storage. For more information, see Table repair feature.
Support for SHOW FILES IN table PARTITION for Iceberg
Impala now supports the SHOW FILES IN command with the PARTITION clause to list data files for specific partitions in Iceberg tables. This enhancement extends metadata capabilities by enabling inspection of partition-level physical data directly from Impala. For more information, see Describe table metadata feature.
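As an illustration (the table name and partition value are hypothetical), the new clause narrows the file listing to a single partition:

```sql
-- List the data files backing one partition of an Iceberg table.
SHOW FILES IN store_sales PARTITION (ss_sold_date = '2024-01-01');
```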
Support for additional partition transform functions for Iceberg tables
Iceberg now supports additional partition transform functions such as BUCKET, TRUNCATE, IDENTITY, and VOID. These transformations extend partitioning capabilities by enabling hashing, value truncation, direct partitioning, and handling of null partitions. For more information, see Partition transform feature.
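A minimal sketch of the new transforms in a table definition, assuming Impala's PARTITIONED BY SPEC syntax for Iceberg tables (table and column names are hypothetical):

```sql
CREATE TABLE events (
  id BIGINT,
  category STRING,
  payload STRING
)
PARTITIONED BY SPEC (
  BUCKET(16, id),        -- hash id values into 16 buckets
  TRUNCATE(3, category)  -- partition on the first 3 characters
)
STORED AS ICEBERG;
```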
Support for partition columns in WHERE clause predicates
Hive Iceberg compaction now supports WHERE clause predicates on partition columns. This enhancement allows you to selectively compact data by filtering partition columns, improving efficiency and control over compaction operations. For more information, see Data compaction.
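A hedged sketch of a filtered compaction (the table, partition column, and exact clause placement are illustrative; see the linked Data compaction topic for the supported syntax):

```sql
-- Compact only the partitions matching the predicate.
ALTER TABLE sales COMPACT 'major'
WHERE region = 'EU';
```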

What's new in Impala on Cloudera Data Warehouse on cloud

Caching intermediate query results
Impala now supports caching intermediate results to improve query performance and resource efficiency for repetitive workloads. By storing results at various locations within the SQL plan tree, the system can reuse computation for similar queries even when they are not identical, provided the underlying data and settings remain unchanged. For more information, see Caching intermediate results.
User role management
You can now grant and revoke roles directly to and from individual users in Impala, providing more granular control over security management. This feature includes support for the GRANT ROLE, REVOKE ROLE, and SHOW ROLE GRANT USER statements, aligning Impala with Apache Hive's role-related functionality.

For more information, see impala role, impala grant role, impala show roles, and impala revoke role.

Apache Jira: IMPALA-14085
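The new statements follow Apache Hive's role syntax; as a sketch (the role and user names are hypothetical):

```sql
GRANT ROLE analyst TO USER alice;    -- grant a role directly to a user
SHOW ROLE GRANT USER alice;          -- list roles granted to alice
REVOKE ROLE analyst FROM USER alice; -- revoke the role from the user
```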

Native geospatial query acceleration
Cloudera Data Warehouse 2025.0.21.0 introduces native implementations for specific geospatial functions to accelerate simple queries. This feature reduces processing overhead by avoiding transitions to the Java Virtual Machine and optimizing file-level filtering for Parquet and Iceberg tables. For more information, see Impala Geospatial query acceleration.
OpenTelemetry integration for Impala
Cloudera Data Warehouse now provides OpenTelemetry (OTel) support to help you observe query performance and troubleshoot issues. This new feature collects and exports query telemetry data as OpenTelemetry traces to a central OpenTelemetry-compatible collector. The integration is designed to have minimal impact on performance because it uses data that is already being collected and handles the export in a separate process. For more information, see OpenTelemetry support for Impala.

Apache Jira: IMPALA-13234

Hierarchical metastore event processing
Cloudera Data Warehouse now supports a multi-layered, hierarchical approach to metastore event processing to improve synchronization speed and handle event dependencies more efficiently. By enabling this feature, you can segregate events based on their dependencies and process them independently through a system of database and table event executors. This method reduces synchronization time for HMS events by allowing parallel processing while maintaining linearizability for specific tables.
This update also introduces synchronization tracking metrics, such as event lag and dispatch time, and a graceful pausing mechanism to ensure metadata consistency. For more information, see Hierarchical metastore event processing.
New Impala AI function options
You can now use the impala_options parameter to control AI function behavior. This allows you to specify the API credential type, set the API standard, or provide a custom payload. For more information, see impala-ai-arguments.
Impala cookie secret file support
You can use the --cookie_secret_file startup flag to provide a path to a file containing a 256-bit (32-byte) secret in binary format. This allows the secret used for HMAC cookie verification for both HS2-HTTP clients and the Impala Web UI to be shared across instances and service restarts. Impala monitors the file for changes and reloads the secret if it is modified. For more information, see Impala client connections.
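A minimal sketch of preparing such a secret file (the path and daemon invocation are illustrative; only the --cookie_secret_file flag name comes from this release note):

```shell
# Create a 256-bit (32-byte) secret in binary format.
secret_file="$(mktemp)"
head -c 32 /dev/urandom > "$secret_file"
chmod 600 "$secret_file"
wc -c < "$secret_file"   # confirm the file holds exactly 32 bytes

# Point each Impala daemon at the shared file (illustrative):
# impalad --cookie_secret_file="$secret_file" ...
```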
Filtering SHOW PARTITIONS output
You can now use the WHERE clause with the SHOW PARTITIONS statement to filter results based on partition column values. This enhancement helps you manage tables with a large number of partitions by narrowing down the output using comparison operators, IN lists, BETWEEN clauses, IS NULL predicates, and logical expressions. For more information, see the SHOW PARTITIONS statement.

Apache Jira: IMPALA-14065
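For example (the table and column names are hypothetical), the clause can combine several of the supported predicate forms:

```sql
-- Narrow a large partition list with range and IN-list predicates.
SHOW PARTITIONS store_sales
WHERE ss_sold_date BETWEEN '2024-01-01' AND '2024-03-31'
  AND ss_store_id IN (10, 11, 12);
```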

Parallelizing JDBC queries
You can now execute queries on JDBC tables in parallel to improve performance for joins and aggregations. Impala now estimates the number of rows in a JDBC table by running a COUNT query during query preparation. This estimation allows the planner to assign multiple scanner threads, introduce exchange nodes, and produce more efficient join orders. You can also use the --min_jdbc_scan_cardinality backend flag to set a lower bound for these estimates. For more information, see Parallelizing JDBC queries.
Recreating tables with statistics
You can use the WITH STATS clause in the SHOW CREATE TABLE statement to generate the SQL required to recreate a table along with its column statistics and partition metadata. See SHOW CREATE TABLE WITH STATS statement.

Apache Jira: IMPALA-13066
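As a sketch (the table name is hypothetical), the clause is appended to the familiar statement; the output includes the DDL plus the statements needed to restore the column statistics and partition metadata:

```sql
SHOW CREATE TABLE store_sales WITH STATS;
```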

Quoting reserved words in column names
You can now explicitly quote all column names projected in SQL queries generated for JDBC external tables. Column names are wrapped with quote characters based on the JDBC driver being used:
  • Backticks (`) for Cloudera Runtime Hive, Impala, and MySQL
  • Double quotes (") for all other databases
This supports the use of case-sensitive or reserved column names. For more information, see Quoting reserved words in column names.

Apache Jira: IMPALA-13066

New catalogd flag to disable HMS sync by default
You can now use the disable_hms_sync_by_default catalogd startup flag to set a global default for the impala.disableHmsSync property. This feature allows you to skip event processing for all databases and tables by default while opting in specific elements as needed.

For more information, see Catalogd Daemon startup flag.

Apache Jira: IMPALA-14131

Parallel metadata loading in local catalog mode
Previously, when a query accessed multiple unloaded tables in local catalog mode, Impala loaded the metadata for those tables one after another. This sequential process caused significant latency and performance regressions compared to the legacy catalog mode.
This issue is addressed by parallelizing the table loading process. The fix allows Impala to load and gather metadata for multiple tables simultaneously. You can control the maximum number of threads used for this process by using the new max_stmt_metadata_loader_threads flag, which defaults to 8 threads per query compilation. See Catalog startup flag.

Apache Jira: IMPALA-14447

Specifying compression levels for LZ4, ZLIB, GZIP, and ZSTD
You can now specify compression levels for the LZ4, ZLIB, GZIP, and ZSTD codecs to achieve higher compression ratios. This includes support for high compression modes in LZ4 (levels 3–12) and negative compression levels for ZSTD. These levels are supported by using the compression_codec query option.

For more information, see compression_codec query option.

Apache Jira: IMPALA-10630, IMPALA-14082
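A hedged example of selecting a level (the codec:level form of the option value is an assumption; the table names are hypothetical):

```sql
SET COMPRESSION_CODEC=ZSTD:12;   -- ZSTD with an explicit level
CREATE TABLE sales_archive STORED AS PARQUET AS SELECT * FROM sales;

SET COMPRESSION_CODEC=LZ4:9;     -- LZ4 high-compression mode (levels 3-12)
```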

Configuring remote JCEKS keystores for Impala AI Functions
You can now specify a remote JCEKS keystore path by using the REMOTE_JCEKS_PATH environment variable. This allows the system to automatically copy remote keystores from S3 or Azure storage to the local filesystem on coordinator and executor pods, preventing initialization errors.

For more information, see Configuring remote JCEKS keystores for Impala AI Functions

Batch processing for reload events
Cloudera now supports batch processing of RELOAD events on the same table by using the BatchPartitionEvent logic. This enhancement allows you to load partitions in parallel and reduces duplicate reloads. By minimizing the number of times a table lock is acquired and reducing table version changes, this feature improves the performance of coordinators in local-catalog mode and reduces query planning retries.

Apache Jira: IMPALA-14082

Consolidated event processing for partition changes
Cloudera now supports the ALTER_PARTITIONS event type, which consolidates multiple partition changes into a single event. By processing one batch event instead of numerous individual ALTER_PARTITION events, the event processor can synchronize metadata more quickly and reduce the processing load on the CatalogD cache.

Apache Jira: IMPALA-13593

What's new in Trino on Cloudera Data Warehouse on cloud

General Availability (GA) of Trino in Cloudera Data Warehouse

Trino is a distributed SQL query engine designed to efficiently query large datasets across one or more heterogeneous data sources. This integration enables users to leverage Trino's powerful capabilities directly within Cloudera Data Warehouse.

The GA release of Trino in Cloudera Data Warehouse introduces several key capabilities:

  • Trino Virtual Warehouses — Offers full support for creating and managing Trino Virtual Warehouses across both Amazon Web Services (AWS) and Microsoft Azure environments. This enables efficient querying across diverse, large datasets regardless of your cloud provider. For information about creating a Trino Virtual Warehouse, see Adding a new Virtual Warehouse.
  • Federation and Connectivity — Seamless connection and management of various remote data sources is possible through Trino Federation Connectors, including the new Teradata custom connector. A dedicated connector management UI and backend facilitates the creation and configuration of these connectors. For more information, see Trino Federation Connectors.
  • Security and Governance — Governance is enforced by default through Apache Ranger using the cm_trino authorization service. You can create or update Ranger policies for specific resources and assign permissions to Trino users, groups, or roles. When a user submits a query to Trino, the system verifies the defined policies to ensure that the user has the necessary permissions to run queries. For more information, see Ranger authorization for Trino Virtual Warehouses.
  • Performance Optimization — Built-in capabilities for auto-suspend and auto-scaling are supported. These configurations help optimize resource utilization and ensure the provisioning of a high-performance and scalable Trino Virtual Warehouse.
  • Support for Teradata connector (Technical Preview) — Cloudera Data Warehouse now introduces support for a read-only Trino-Teradata connector. This feature is designed to facilitate SELECT operations on Teradata sources, operating in ANSI mode and optimizing performance by pushing down filters and aggregates. For more information, see Teradata connector.
  • Connection pooling for JDBC-based connectors — You can now configure connection pooling capabilities for JDBC-based Trino connectors, such as MySQL, PostgreSQL, MariaDB, Teradata, and Oracle. Connection pooling helps in better performance, resource utilization, and stability while querying different data sources using Trino. For more information, see Connection pooling for JDBC-based connectors.
  • Backup and restore behavior for Trino — The backup and restore functionality in Cloudera Data Warehouse now includes updates for Trino. Trino Virtual Warehouses are included in environment backups and are restored along with the environment. However, Trino connector objects are not backed up or restored as part of the environment reactivation workflow.

    After restoring an environment, it is necessary to manually recreate Trino connectors and attach them to the restored Trino Virtual Warehouses. For more information, see backup and restore Cloudera Data Warehouse.