Managing Data with Arrow

Date: 2023-06-13 09:18:16

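In the earlier post "Data Mining: Time to Update the TCGA Data", I saved the TCGA data in the Arrow format because the files are small, reads and writes are fast, and it is supported across languages (including all three that I mainly use).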

Format

https://arrow.apache.org

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

Languages Supported

Arrow's libraries implement the format and provide building blocks for a range of use cases, including high performance analytics. Many popular projects use Arrow to ship columnar data efficiently or as the basis for analytic engines.

Libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

Ecosystem

Apache Arrow is software created by and for the developer community. We are dedicated to open, kind communication and consensus decision-making. Our committers come from a range of organizations and backgrounds, and we welcome all to participate with us.

R

install.packages("arrow")
library(arrow)
# write iris to iris.arrow, compressed with zstd
arrow::write_ipc_file(iris, 'iris.arrow', compression = "zstd", compression_level = 1)
# read iris.arrow back as a data frame
iris <- arrow::read_ipc_file('iris.arrow')

Python

# conda install -y pandas pyarrow
import pandas as pd
# read iris.arrow (e.g. the file written from R above) as a DataFrame
iris = pd.read_feather('iris.arrow')
# write iris to iris.arrow, compressed with zstd
iris.to_feather('iris.arrow', compression='zstd', compression_level=1)

Julia

using Pkg
Pkg.add(["Arrow","DataFrames"])

using Arrow, DataFrames
# read iris.arrow as a DataFrame
iris = Arrow.Table("iris.arrow") |> DataFrame
# write iris to iris.arrow, compressed with zstd, allowing up to 8 buffered write tasks
Arrow.write("iris.arrow", iris, compress=:zstd, ntasks=8)
