DeveloperGuide Hive UDTF
hive
Time: 2023-09-11 14:18:41
Writing UDTFs
GenericUDTF Interface
A custom UDTF can be created by extending the GenericUDTF abstract class and implementing the initialize, process, and close methods. close is abstract, so it must be declared, but its body can be empty when there is nothing to clean up. Hive calls initialize to tell the UDTF which argument types to expect. The UDTF must then return an object inspector corresponding to the row objects that it will generate. Once initialize() has been called, Hive passes rows to the UDTF through the process() method. While in process(), the UDTF can produce rows and forward them to other operators by calling forward(). Finally, Hive calls close() once all rows have been passed to the UDTF.
UDTF Example:
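import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**
 * GenericUDTFCount2 outputs the number of rows seen, twice. It's output twice
 * to test outputting of rows on close with lateral view.
 */
public class GenericUDTFCount2 extends GenericUDTF {

  Integer count = Integer.valueOf(0);
  Object forwardObj[] = new Object[1];

  @Override
  public void close() throws HiveException {
    // Emit the final count as two identical single-column rows.
    forwardObj[0] = count;
    forward(forwardObj);
    forward(forwardObj);
  }

  @Override
  public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
    ArrayList<String> fieldNames = new ArrayList<String>();
    ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
    // One int column; the name is a placeholder that user aliases override.
    fieldNames.add("col1");
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
        fieldOIs);
  }

  @Override
  public void process(Object[] args) throws HiveException {
    // Count the row; nothing is forwarded until close().
    count = Integer.valueOf(count.intValue() + 1);
  }
}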
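To make the call order concrete, the sketch below replays the lifecycle described above (attach a collector, call process() once per input row, then call close()) without Hive on the classpath. RowSink, MiniUDTF, MiniCount2, and Count2LifecycleDemo are hypothetical stand-ins invented for this illustration, not Hive classes, and initialize() is left out because it needs Hive's ObjectInspector types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's Collector: receives each forwarded row.
interface RowSink {
    void collect(Object row);
}

// Hypothetical stand-in for GenericUDTF, reduced to the calls discussed above.
abstract class MiniUDTF {
    private RowSink sink;

    final void setSink(RowSink sink) {
        this.sink = sink;
    }

    // Hands one output row to whatever sink the caller attached.
    protected final void forward(Object row) {
        sink.collect(row);
    }

    abstract void process(Object[] args);

    abstract void close();
}

// Same counting logic as GenericUDTFCount2, written against the stand-in.
class MiniCount2 extends MiniUDTF {
    private int count = 0;
    private final Object[] forwardObj = new Object[1];

    @Override
    void process(Object[] args) {
        count++; // only count; no row is forwarded per input row
    }

    @Override
    void close() {
        forwardObj[0] = count;
        forward(forwardObj);
        forward(forwardObj);
    }
}

class Count2LifecycleDemo {
    // Plays the role of Hive: attach a sink, feed rows, then signal end of input.
    static List<Object[]> run(int inputRows) {
        List<Object[]> out = new ArrayList<>();
        MiniCount2 udtf = new MiniCount2();
        // Copy each row because the UDTF reuses one array for every forward().
        udtf.setSink(row -> out.add(((Object[]) row).clone()));
        for (int i = 0; i < inputRows; i++) {
            udtf.process(new Object[] {"row-" + i});
        }
        udtf.close();
        return out;
    }

    public static void main(String[] args) {
        // Three input rows produce two output rows, each holding the count 3.
        for (Object[] row : run(3)) {
            System.out.println(row[0]);
        }
    }
}
```

Because both rows are forwarded from close(), output appears only after the last input row, and even an empty input still yields two rows holding 0 in this sketch.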
For reference, here is the abstract class:
/**
 * A Generic User-defined Table Generating Function (UDTF)
 *
 * Generates a variable number of output rows for a single input row. Useful for
 * explode(array)...
 */
public abstract class GenericUDTF {
  Collector collector = null;

  /**
   * Initialize this GenericUDTF. This will be called only once per instance.
   *
   * @param argOIs
   *          An array of ObjectInspectors for the arguments
   * @return A StructObjectInspector for output. The output struct represents a
   *         row of the table where the fields of the struct are the columns. The
   *         field names are unimportant as they will be overridden by user
   *         supplied column aliases.
   */
  public abstract StructObjectInspector initialize(ObjectInspector[] argOIs)
      throws UDFArgumentException;

  /**
   * Give a set of arguments for the UDTF to process.
   *
   * @param args
   *          object array of arguments
   */
  public abstract void process(Object[] args) throws HiveException;

  /**
   * Called to notify the UDTF that there are no more rows to process.
   * Clean up code or additional forward() calls can be made here.
   */
  public abstract void close() throws HiveException;

  /**
   * Associates a collector with this UDTF. Can't be specified in the
   * constructor as the UDTF may be initialized before the collector has been
   * constructed.
   *
   * @param collector
   */
  public final void setCollector(Collector collector) {
    this.collector = collector;
  }

  /**
   * Passes an output row to the collector.
   *
   * @param o
   * @throws HiveException
   */
  protected final void forward(Object o) throws HiveException {
    collector.collect(o);
  }
}
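The class comment names explode(array) as a typical use: one input row yields a variable number of output rows. The following is a minimal sketch of that pattern, again using hypothetical stand-ins (SketchUDTF, its nested Sink, ExplodeSketch, ExplodeSketchDemo) in place of the real Hive classes so that it runs on its own. It is not Hive's explode implementation, and it assumes the single argument arrives as a java.util.List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for GenericUDTF: collector wiring plus forward().
abstract class SketchUDTF {
    // Hypothetical stand-in for Hive's Collector interface.
    interface Sink {
        void collect(Object row);
    }

    private Sink sink;

    final void setSink(Sink sink) {
        this.sink = sink;
    }

    protected final void forward(Object row) {
        sink.collect(row);
    }

    abstract void process(Object[] args);

    abstract void close();
}

// Explode-style UDTF: forwards one single-column row per list element.
class ExplodeSketch extends SketchUDTF {
    @Override
    void process(Object[] args) {
        List<?> elements = (List<?>) args[0];
        if (elements == null) {
            return; // a null list produces no output rows
        }
        for (Object element : elements) {
            forward(new Object[] {element});
        }
    }

    @Override
    void close() {
        // Nothing is buffered, so there is nothing to flush or clean up.
    }
}

class ExplodeSketchDemo {
    // Feeds each input list to the UDTF and gathers the forwarded values.
    static List<Object> explodeAll(List<List<Integer>> inputRows) {
        List<Object> out = new ArrayList<>();
        ExplodeSketch udtf = new ExplodeSketch();
        udtf.setSink(row -> out.add(((Object[]) row)[0]));
        for (List<Integer> input : inputRows) {
            udtf.process(new Object[] {input});
        }
        udtf.close();
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> input = new ArrayList<>();
        input.add(Arrays.asList(1, 2, 3));
        input.add(new ArrayList<Integer>()); // empty list: zero output rows
        input.add(Arrays.asList(4));
        System.out.println(explodeAll(input)); // prints [1, 2, 3, 4]
    }
}
```

Unlike the counting example, all rows here are forwarded from process(), so close() has an empty body.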