Behavior changes in Cloudera Data Warehouse on premises 1.5.5 SP2
Summary: Increased Batch Sizes for COMPUTE STATS
Before this release: COMPUTE STATS queries failed on tables containing more than 5000 columns. This issue was specific to wide tables and could not be resolved by dropping and rerunning the query.
After this release: To resolve this issue, batch retrieval or insertion of object metadata is now enabled by default. The default value of the hive.metastore.direct.sql.batch.size property is changed from 0 to 1000, and the default value of the metastore.rawstore.batch.size property is changed from -1 to 500. With these changes, COMPUTE STATS queries run successfully on tables with more than 5000 columns.
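To see why a positive batch size helps with wide tables, the sketch below splits the metadata for a hypothetical 5001-column table into fixed-size batches, using the new default values 1000 and 500. It is a simplified illustration only: the function name `split_into_batches` is hypothetical, and the code does not model how the metastore interprets the old default values 0 and -1 or how it builds the underlying database queries.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Hypothetical helper: split items into chunks of at most batch_size.

    Each chunk stands in for one metadata request, so no single request
    has to carry all the columns of a very wide table.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


# Hypothetical wide table with 5001 column names.
columns = [f"col_{i}" for i in range(5001)]

direct_sql_batches = split_into_batches(columns, 1000)  # hive.metastore.direct.sql.batch.size
rawstore_batches = split_into_batches(columns, 500)     # metastore.rawstore.batch.size

print(len(direct_sql_batches), len(rawstore_batches))  # 6 11
```

With a batch size of 1000, the 5001 columns are handled in six smaller requests instead of one very large one.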
Summary: Default fe_service_threads increased for Impala Virtual Warehouse
Before this release: The default value of the fe_service_threads setting was 128.
After this release: Starting with Cloudera Data Warehouse on premises 1.5.5 SP2, the default value is 256. This change accommodates higher active connection counts and improves performance.
Summary: Parquet late materialization behavior has changed
The Parquet late materialization feature is now enabled by default for all column types, including collections.
Before this release: The Parquet late materialization feature was disabled by default. You used the parquet_late_materialization_threshold query option to set the minimum number of consecutive filtered rows required to trigger late materialization. The default value was -1. The feature was not supported for collection columns.
After this release: The Parquet late materialization feature is enabled by default. The parquet_late_materialization_threshold is set to 1 if the query option value is greater than or equal to 0 and there is a collection value that can be skipped. Otherwise, the threshold equals the query option value, which defaults to 20.
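The threshold rule above can be sketched as a small function. This is an illustration of the stated rule only, not Impala's implementation; the function name and the `has_skippable_collection` flag are hypothetical stand-ins for how Impala decides whether a collection value can be skipped.

```python
DEFAULT_THRESHOLD = 20  # new default of the parquet_late_materialization_threshold query option


def effective_late_materialization_threshold(
    query_option: int = DEFAULT_THRESHOLD,
    has_skippable_collection: bool = False,
) -> int:
    """Hypothetical sketch of the threshold rule described in the release note."""
    # A non-negative option combined with a skippable collection value lowers
    # the threshold to 1 consecutive filtered row.
    if query_option >= 0 and has_skippable_collection:
        return 1
    # In every other case the query option value is used unchanged.
    return query_option


print(effective_late_materialization_threshold())                               # 20
print(effective_late_materialization_threshold(has_skippable_collection=True))  # 1
print(effective_late_materialization_threshold(-1, True))                       # -1
```

A negative option value is passed through unchanged because the rule only applies the override when the option is greater than or equal to 0.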
Apache Jira: IMPALA-3841
Summary: TCP Keepalive is now enabled by default for client connections
Before this release: TCP keepalive was disabled by default for client connections. Idle connections dropped by load balancers remained active in Impala, consuming service threads (fe_service_threads).
After this release: TCP keepalive is now enabled by default for all client connections, enhancing stability and availability. Impala is configured to check idle connections aggressively, every 10 minutes.
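For readers unfamiliar with TCP keepalive, the following generic socket-level sketch shows what enabling it on a connection looks like. It is not Impala's configuration mechanism: the helper name is hypothetical, and mapping "every 10 minutes" to a 600-second keepalive idle time is an assumption. `TCP_KEEPIDLE` is only set where the platform exposes it (for example, Linux).

```python
import socket

# Assumption: the "every 10 minutes" check corresponds to a 600-second idle time.
KEEPALIVE_IDLE_SECONDS = 10 * 60


def enable_keepalive(sock: socket.socket, idle_seconds: int = KEEPALIVE_IDLE_SECONDS) -> None:
    """Hypothetical helper: turn on TCP keepalive probes for a socket."""
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Where supported, start probing after the connection has been idle this long.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
```

When a peer such as a load balancer silently drops an idle connection, the unanswered probes let the server detect the dead connection and release the resources it holds.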
Apache Jira: IMPALA-14031
Summary: Support for load-based routing in impala-proxy
Before this release: The impala-proxy used a random selection policy to choose a coordinator. This approach did not consider the current load on each coordinator, which led to an uneven distribution of connections and potential performance bottlenecks.
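After this release: The impala-proxy supports load-based routing, which takes each coordinator's current load into account when choosing a coordinator. The load is calculated using the following parameters:
- IMPALA_PROXY_COORDINATOR_LOAD_CPU_WEIGHT: Determines the weight applied to the current percentage of CPU utilization when calculating the coordinator's load.
- IMPALA_PROXY_COORDINATOR_LOAD_MEMORY_WEIGHT: Determines the weight applied to the current percentage of memory utilization when calculating the coordinator's load.
To configure these parameters: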
- Log in to the Cloudera web interface and navigate to the Cloudera Data Warehouse service.
- From the Overview page, click the Virtual Warehouses tab.
- Identify the Impala Virtual Warehouse you want to configure, and then click the Edit icon.
- In the Virtual Warehouse details page, click .
- Select env from the Configuration files drop-down.
- Modify the values as required for the following parameters:
  - IMPALA_PROXY_COORDINATOR_LOAD_CPU_WEIGHT
  - IMPALA_PROXY_COORDINATOR_LOAD_MEMORY_WEIGHT
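The following sketch shows one plausible way the CPU and memory weights could be combined into a coordinator load score. The weighted sum, the choice of the lowest-scoring coordinator, and the tie-breaking (first coordinator in the list wins) are assumptions for illustration. The class and function names and the sample utilization figures are hypothetical, not the impala-proxy implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CoordinatorStats:
    """Hypothetical snapshot of one coordinator's utilization."""
    name: str
    cpu_percent: float
    memory_percent: float


def coordinator_load(stats: CoordinatorStats, cpu_weight: float, memory_weight: float) -> float:
    # Assumed formula: weighted sum of CPU and memory utilization percentages.
    return cpu_weight * stats.cpu_percent + memory_weight * stats.memory_percent


def pick_coordinator(
    coordinators: Sequence[CoordinatorStats], cpu_weight: float, memory_weight: float
) -> CoordinatorStats:
    """Return the coordinator with the lowest weighted load (first one wins on ties)."""
    if not coordinators:
        raise ValueError("no coordinators available")
    return min(coordinators, key=lambda c: coordinator_load(c, cpu_weight, memory_weight))


coords = [
    CoordinatorStats("coordinator-0", cpu_percent=80.0, memory_percent=30.0),
    CoordinatorStats("coordinator-1", cpu_percent=40.0, memory_percent=70.0),
]
print(pick_coordinator(coords, cpu_weight=1.0, memory_weight=0.0).name)  # coordinator-1
print(pick_coordinator(coords, cpu_weight=0.0, memory_weight=1.0).name)  # coordinator-0
```

Under this model, raising one weight relative to the other makes routing favor coordinators with spare capacity in that resource.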
