Storm Source Code Analysis - Topology Submit - Task
mk-task is fairly simple, because a task is only a conceptual structure: unlike workers and executors, it does not need a process or thread of its own.
So its core is really just mk-task-data:
1. Create the TopologyContext object, which essentially merges the earlier topology object with worker-data, so that the task can reach the topology information it needs while executing.
2. Create the task object (spout-object or bolt-object), which encapsulates the corresponding logic, such as nextTuple or execute.
3. Generate tasks-fn. The name is poorly chosen: it suggests the function executes the task, but it really just does some preparation work before each emit. The most important part is calling the grouper to produce the target tasks; it also invokes metrics and hooks.
Put plainly, mk-task itself does not do much:
```clojure
(defn mk-task [executor-data task-id]
  (let [task-data (mk-task-data executor-data task-id)  ;; 1 mk-task-data
        storm-conf (:storm-conf executor-data)]
    (doseq [klass (storm-conf TOPOLOGY-AUTO-TASK-HOOKS)]  ;; register the pre-configured hooks
      (.addTaskHook ^TopologyContext (:user-context task-data)
                    (-> klass Class/forName .newInstance)))
    ;; when this is called, the threads for the executor havent been started yet,
    ;; so we wont be risking trampling on the single-threaded claim strategy disruptor queue
    (send-unanchored task-data SYSTEM-STREAM-ID ["startup"])  ;; send a "startup" notification on the SYSTEM stream. Who consumes the SYSTEM stream...?
    task-data))
```
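The hook-registration step above boils down to instantiating each configured class by name via reflection (the `(-> klass Class/forName .newInstance)` form) and attaching it to the user context. A minimal self-contained Java sketch of that mechanism follows; the `TaskHook` interface and class names here are hypothetical stand-ins for Storm's `ITaskHook` and `TopologyContext.addTaskHook`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-in for Storm's ITaskHook.
interface TaskHook { void emit(String stream, List<?> values); }

public class AutoHookDemo {
    // A user hook class, referenced by fully-qualified class name in the config,
    // exactly like entries in TOPOLOGY-AUTO-TASK-HOOKS.
    public static class LoggingHook implements TaskHook {
        public final List<String> seen = new ArrayList<>();
        public void emit(String stream, List<?> values) { seen.add(stream); }
    }

    public static void main(String[] args) throws Exception {
        List<TaskHook> hooks = new ArrayList<>();  // plays the role of the user-context's hook list
        // mirrors (-> klass Class/forName .newInstance) followed by .addTaskHook
        for (String klass : new String[]{"AutoHookDemo$LoggingHook"}) {
            hooks.add((TaskHook) Class.forName(klass).getDeclaredConstructor().newInstance());
        }
        // mirrors the "startup" notification reaching registered hooks via an emit
        for (TaskHook h : hooks) h.emit("__system", List.of("startup"));
        System.out.println(((LoggingHook) hooks.get(0)).seen);  // → [__system]
    }
}
```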
1 mk-task-data
```clojure
(defn mk-task-data [executor-data task-id]
  (recursive-map
    :executor-data executor-data
    :task-id task-id
    :system-context (system-topology-context (:worker executor-data) executor-data task-id)
    :user-context (user-topology-context (:worker executor-data) executor-data task-id)
    :builtin-metrics (builtin-metrics/make-data (:type executor-data))
    :tasks-fn (mk-tasks-fn <>)
    :object (get-task-object (.getRawTopology ^TopologyContext (:system-context <>))
                             (:component-id executor-data))))
```

1.1 TopologyContext
See: Storm Source Code Analysis - Topology Submit - Task - TopologyContext
:system-context and :user-context differ only in the topology object held inside the context: for system-context it is the result of system-topology!.
1.2 builtin-metrics/make-data
The builtin-metrics here are used to record metrics about the execution of the spout or bolt.
1.3 mk-tasks-fn
Returns tasks-fn. This function mainly does the preparation work before an emit and returns the list of target tasks:
1. Call the grouper to produce the target tasks.
2. Run the emit hooks.
3. When the sampler condition is met, update stats and builtin-metrics.
tasks-fn has two arities:
[^String stream ^List values] — the easier one to understand: it collects the target tasks of every component subscribed to the stream (a stream can have multiple downstream components, so one piece of data may need to be sent to several bolts).
[^Integer out-task-id ^String stream ^List values] — with an explicit out-task-id, i.e. direct grouping.
Here out-task-id is validated:
out-task-id (if grouping out-task-id) — i.e. out-task-id is kept only when stream->component->grouper yields a non-nil grouping (which must be :direct) for the target component, which verifies that this stream really does route to the component that owns out-task-id.
If the validation fails, out-task-id is set to nil.
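The validation just described can be modeled as a small standalone Java sketch; all names and types here are hypothetical stand-ins, not Storm's API. The check keeps out-task-id only when the stream routes to the target component, and rejects emitDirect when the recorded grouping is not direct:

```java
import java.util.Map;

// Hypothetical sketch of the direct-emit check in tasks-fn.
public class DirectEmitCheck {
    enum Grouping { DIRECT, SHUFFLE }

    // stream -> component -> grouping; stands in for stream->component->grouper
    static Map<String, Map<String, Grouping>> routes =
        Map.of("s1", Map.of("boltA", Grouping.DIRECT));

    static Integer validate(String stream, String targetComponent, int outTaskId) {
        Grouping g = routes.getOrDefault(stream, Map.of()).get(targetComponent);
        if (g != null && g != Grouping.DIRECT)
            throw new IllegalArgumentException("Cannot emitDirect to a task expecting a regular grouping");
        return g != null ? outTaskId : null;  // validation failed -> out-task-id becomes nil
    }

    public static void main(String[] args) {
        System.out.println(validate("s1", "boltA", 7));  // → 7: boltA does expect a direct grouping on s1
        System.out.println(validate("s1", "boltB", 7));  // → null: s1 does not route to boltB
    }
}
```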
```clojure
(defn mk-tasks-fn [task-data]
  (let [task-id (:task-id task-data)
        executor-data (:executor-data task-data)
        component-id (:component-id executor-data)
        ^WorkerTopologyContext worker-context (:worker-context executor-data)
        storm-conf (:storm-conf executor-data)
        emit-sampler (mk-stats-sampler storm-conf)
        stream->component->grouper (:stream->component->grouper executor-data)  ;; see: Storm Source Code Analysis - Stream Grouping
        user-context (:user-context task-data)
        executor-stats (:stats executor-data)
        debug? (= true (storm-conf TOPOLOGY-DEBUG))]
    (fn ([^Integer out-task-id ^String stream ^List values]
          (when debug?
            (log-message "Emitting direct: " out-task-id "; " component-id " " stream " " values))
          (let [target-component (.getComponentId worker-context out-task-id)
                component->grouping (get stream->component->grouper stream)
                grouping (get component->grouping target-component)
                out-task-id (if grouping out-task-id)]
            (when (and (not-nil? grouping) (not= :direct grouping))
              (throw (IllegalArgumentException. "Cannot emitDirect to a task expecting a regular grouping")))
            (apply-hooks user-context .emit (EmitInfo. values stream task-id [out-task-id]))
            (when (emit-sampler)
              (builtin-metrics/emitted-tuple! (:builtin-metrics task-data) executor-stats stream)
              (stats/emitted-tuple! executor-stats stream)
              (if out-task-id
                (stats/transferred-tuples! executor-stats stream 1)
                (builtin-metrics/transferred-tuple! (:builtin-metrics task-data) executor-stats stream 1)))
            (if out-task-id [out-task-id])))
        ([^String stream ^List values]
          (when debug?
            (log-message "Emitting: " component-id " " stream " " values))
          (let [out-tasks (ArrayList.)]
            (fast-map-iter [[out-component grouper] (get stream->component->grouper stream)]
              (when (= :direct grouper)
                ;; TODO: this is wrong, need to check how the stream was declared
                (throw (IllegalArgumentException. "Cannot do regular emit to direct stream")))
              (let [comp-tasks (grouper task-id values)]  ;; run the grouper to produce the target tasks
                (if (or (sequential? comp-tasks) (instance? Collection comp-tasks))
                  (.addAll out-tasks comp-tasks)
                  (.add out-tasks comp-tasks))))
            (apply-hooks user-context .emit (EmitInfo. values stream task-id out-tasks))  ;; run the pre-registered emit hooks
            (when (emit-sampler)  ;; when the sampling condition is met, update emitted/transferred in stats and builtin-metrics
              (stats/emitted-tuple! executor-stats stream)
              (builtin-metrics/emitted-tuple! (:builtin-metrics task-data) executor-stats stream)
              (stats/transferred-tuples! executor-stats stream (count out-tasks))
              (builtin-metrics/transferred-tuple! (:builtin-metrics task-data) executor-stats stream (count out-tasks)))
            out-tasks)))))
```

1.4 get-task-object
This extracts the component's object.
For example, for a spout it extracts the ComponentObject spout_object from the SpoutSpec, which contains the spout's logic such as nextTuple().
```clojure
(defn- get-task-object [^TopologyContext topology component-id]
  (let [spouts (.get_spouts topology)
        bolts (.get_bolts topology)
        state-spouts (.get_state_spouts topology)
        obj (Utils/getSetComponentObject
              (cond
                (contains? spouts component-id) (.get_spout_object ^SpoutSpec (get spouts component-id))
                (contains? bolts component-id) (.get_bolt_object ^Bolt (get bolts component-id))
                (contains? state-spouts component-id) (.get_state_spout_object ^StateSpoutSpec (get state-spouts component-id))
                true (throw-runtime "Could not find " component-id " in " topology)))
        obj (if (instance? ShellComponent obj)
              (if (contains? spouts component-id)
                (ShellSpout. obj)
                (ShellBolt. obj))
              obj)
        obj (if (instance? JavaObject obj)
              (thrift/instantiate-java-object obj)
              obj)]
    obj))
```
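The lookup order above can be sketched as a tiny Java model; the names here are hypothetical, and the real code additionally checks state-spouts and unwraps ShellComponent / JavaObject instances, which is omitted:

```java
import java.util.Map;

// Hypothetical minimal model of get-task-object's dispatch over the topology's
// component maps: spouts first, then bolts, else fail.
public class GetTaskObjectSketch {
    static Object getTaskObject(Map<String, Object> spouts, Map<String, Object> bolts, String componentId) {
        if (spouts.containsKey(componentId)) return spouts.get(componentId);  // .get_spout_object path
        if (bolts.containsKey(componentId)) return bolts.get(componentId);    // .get_bolt_object path
        throw new RuntimeException("Could not find " + componentId + " in topology");  // throw-runtime
    }

    public static void main(String[] args) {
        Object boltLogic = new Object();
        Map<String, Object> spouts = Map.of("spout1", new Object());
        Map<String, Object> bolts = Map.of("bolt1", boltLogic);
        System.out.println(getTaskObject(spouts, bolts, "bolt1") == boltLogic);  // → true
    }
}
```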
This article is excerpted from cnblogs (博客园); original publication date: 2013-07-31.