Discretized Streams (DStreams)

Time: 2023-09-14 09:14:45

A Discretized Stream, or DStream, is the basic abstraction provided by Spark Streaming. It represents a continuous stream of data: either the input data stream received from a source, or the processed data stream generated by transforming the input stream. Internally, a DStream is represented by a continuous series of RDDs, the RDD being Spark's abstraction of an immutable, distributed dataset. Each RDD in a DStream contains the data from a certain interval.

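DStream: a stream of data.
RDD: Spark's abstraction of an immutable, distributed dataset. A DStream is made up of a series of RDDs, and each RDD in a DStream holds the data from one particular interval.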
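
The "one RDD per interval" idea can be illustrated with a toy model in plain Python. This is only a sketch and not the Spark API: the function name `discretize`, the `(timestamp, value)` record shape, and the one-second interval are all assumptions made for illustration, and each plain list stands in for one RDD.

```python
def discretize(records, batch_interval):
    """Group (timestamp, value) records into per-interval batches.

    Hypothetical helper, not part of Spark. Each returned batch stands in
    for one RDD: it holds the values whose timestamps fall in
    [start, start + batch_interval).
    """
    batches = {}
    for timestamp, value in records:
        index = int(timestamp // batch_interval)
        batches.setdefault(index, []).append(value)
    if not batches:
        return []
    first, last = min(batches), max(batches)
    # Intervals with no data still get an (empty) batch, so the result is
    # a continuous series of batches in time order.
    return [(i * batch_interval, batches.get(i, [])) for i in range(first, last + 1)]


records = [(0.2, "a b"), (0.9, "c"), (1.1, "d e"), (3.5, "f")]
stream = discretize(records, batch_interval=1)
for start, batch in stream:
    print(start, batch)
```

In this model the interval starting at 2 received no records, so its batch is empty, while the records at 0.2 and 0.9 end up together in the first batch.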
Any operation applied on a DStream translates to operations on the underlying RDDs. For example, in the earlier example of converting a stream of lines to words, the flatMap operation is applied on each RDD in the lines DStream to generate the RDDs of the words DStream.
In other words, an operator applied to a DStream, such as map or flatMap, is translated under the hood into the same operation on every RDD in that DStream,
because a DStream is made up of the RDDs of successive batches.
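
The per-RDD translation described above can be sketched in the same toy style. Again this is plain Python rather than Spark code: `flat_map_batch` and `flat_map_stream` are hypothetical names, a list of lists stands in for the lines DStream, and splitting on a single space is an assumed tokenizer.

```python
def flat_map_batch(func, batch):
    """Apply func to every element of one batch and flatten the results."""
    return [item for element in batch for item in func(element)]


def flat_map_stream(func, batches):
    """Apply the same flat-map to every batch of the stream, one batch at a time."""
    return [flat_map_batch(func, batch) for batch in batches]


# Three batches of lines; the middle interval received no data.
lines = [["hello world", "hello spark"], [], ["spark streaming"]]

# One "stream-level" call becomes one flat-map per underlying batch.
words = flat_map_stream(lambda line: line.split(" "), lines)
print(words)
```

The output has exactly as many batches as the input: the transformation never mixes data across intervals, it only rewrites the contents of each batch.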