Big Data Basics: Impala (3) Selected Tuning Notes
1) Separate the coordinator and executor roles
By default, each host in the cluster that runs the impalad daemon can act as the coordinator for an Impala query, execute the fragments of the execution plan for the query, or both. During highly concurrent workloads for large-scale queries, especially on large clusters, the dual roles can cause scalability issues:
- The extra work required for a host to act as the coordinator could interfere with its capacity to perform other work for the earlier phases of the query. For example, the coordinator can experience significant network and CPU overhead during queries containing a large number of query fragments. Each coordinator caches metadata for all table partitions and data files, which can be substantial and contend with memory needed to process joins, aggregations, and other operations performed by query executors.
- Having a large number of hosts act as coordinators can cause unnecessary network overhead, or even timeout errors, as each of those hosts communicates with the statestored daemon for metadata updates.
- The “soft limits” imposed by the admission control feature are more likely to be exceeded when there are a large number of heavily loaded hosts acting as coordinators.
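One way to apply this separation is through the impalad startup flags is_coordinator and is_executor. The sketch below only assembles those flags for each role. The role names and the helper impalad_role_flags are hypothetical, and how the flags actually reach each daemon (a management console, a flag file, and so on) depends on the deployment.

```python
# Minimal sketch: map a (hypothetical) role name to impalad startup flags.
# is_coordinator / is_executor are the impalad flags; everything else here
# is illustrative naming, not part of Impala.
ROLE_FLAGS = {
    "coordinator": {"is_coordinator": True, "is_executor": False},
    "executor": {"is_coordinator": False, "is_executor": True},
    "both": {"is_coordinator": True, "is_executor": True},  # default behaviour
}


def impalad_role_flags(role):
    """Return the list of startup flags for the given role name."""
    if role not in ROLE_FLAGS:
        raise ValueError(f"unknown role: {role!r}")
    return [
        f"--{name}={str(value).lower()}"
        for name, value in ROLE_FLAGS[role].items()
    ]
```

A small number of hosts would get the "coordinator" flags and the rest the "executor" flags, so that clients connect only to the dedicated coordinators.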
2) default_pool_max_requests defaults to 200; adjust it according to your cluster's memory capacity and the memory each query needs.
Maximum number of concurrent outstanding requests allowed to run before incoming requests are queued. Because this limit applies cluster-wide, but each Impala node makes independent decisions to run queries immediately or queue them, it is a soft limit; the overall number of concurrent queries might be slightly higher during times of heavy load. A negative value indicates no limit. Ignored if fair_scheduler_config_path and llama_site_path are set.
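As a rough starting point for that adjustment, the limit can be derived from the memory available to Impala and a typical query's footprint. The helper below is a back-of-the-envelope sketch, not an Impala formula: the function name, the headroom percentage, and the sample numbers are all assumptions.

```python
def estimate_max_requests(cluster_mem_gb, per_query_mem_gb, headroom_percent=80):
    """Estimate a value for default_pool_max_requests.

    cluster_mem_gb   -- total memory available to Impala across the cluster
    per_query_mem_gb -- typical aggregate memory used by one query
    headroom_percent -- share of cluster memory we are willing to commit
                        (assumed value; tune for your workload)
    """
    if cluster_mem_gb <= 0 or per_query_mem_gb <= 0:
        raise ValueError("memory sizes must be positive")
    if not 0 < headroom_percent <= 100:
        raise ValueError("headroom_percent must be in (0, 100]")
    # Integer arithmetic avoids floating-point rounding surprises.
    usable = cluster_mem_gb * headroom_percent
    return max(1, usable // (100 * per_query_mem_gb))
```

For example, with an assumed 2000 GB of cluster memory and queries that need about 20 GB each, committing 80% of memory gives 80 concurrent requests, well below the default of 200. Because the limit is soft, leaving some headroom is reasonable.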
3) Once Kerberos is enabled, JDBC access requires client-side load balancing, because the JDBC URL must carry the principal of the specific server being connected to.
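A minimal sketch of that client-side balancing: pick one coordinator at random and build a URL whose principal matches the chosen host. It assumes the Hive JDBC driver's URL form (jdbc:hive2://...;principal=...), the default impalad HiveServer2 port 21050, and the service name impala. The host names, realm, and function names are hypothetical, and the resulting string would still be passed to the JDBC driver by the actual client.

```python
import random


def build_impala_jdbc_url(host, realm, port=21050, database="default"):
    """Build a Kerberos JDBC URL whose principal matches the target host."""
    return (
        f"jdbc:hive2://{host}:{port}/{database};"
        f"principal=impala/{host}@{realm}"
    )


def pick_impala_jdbc_url(hosts, realm, rng=random):
    """Choose one host at random and return the matching JDBC URL."""
    if not hosts:
        raise ValueError("at least one host is required")
    return build_impala_jdbc_url(rng.choice(list(hosts)), realm)


# Hypothetical coordinator hosts and realm, for illustration only.
COORDINATORS = ["impala1.example.com", "impala2.example.com"]
url = pick_impala_jdbc_url(COORDINATORS, "EXAMPLE.COM")
```

Random choice spreads new connections across hosts without a proxy in between. A client could also retry with a different host from the list when a connection attempt fails.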