pandas in practice | Converting NC-format station observations to a CSV table

2023-03-14 22:41:45

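This year's observation data arrived in nc (NetCDF) format. To keep last year's scripts working unchanged, one option is to convert the observations to a CSV table first. The nc file contains the following: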

   dimensions:
      time = 1
      station = 3956  // unlimited
      strlen = 30
   variables:
      character stid ( station, strlen )

      float lon ( station )
         units :  degrees_east
         longname :  longitude

      float lat ( station )
         units :  degrees_east
         longname :  latitude

      float elev ( station )
         units :  meter
         longname :  elevation

      integer wd10a ( station, time )
         units :  degree
         longname :  Wind Direction,10 minute average value

      float ws10a ( station, time )
         units :  m/s
         longname :  Wind speed,10 minute average value

Two libraries do most of the work:

  • netCDF4: reads the variables from the nc file
  • pandas: builds the DataFrame object and writes the csv file

Example script

import netCDF4 as nc 
import numpy as np 
import pandas as pd

filename = "20210301100000.nc"
fout = "test.csv"

fn = nc.Dataset(filename,"r")
stid  = fn.variables['stid']
stid  = np.apply_along_axis(lambda x: x.tobytes().decode("utf-8"), 1, stid[:].data)

lon  = fn.variables['lon']
lat  = fn.variables['lat']
elev = fn.variables['elev']
wd10a = fn.variables['wd10a'] # Wind Direction,10 minute average value
ws10a = fn.variables['ws10a'] # Wind speed,10 minute average value

df = pd.DataFrame( { 'stid' : stid, 
                     'lon'  : lon[:], 
                     'lat'  : lat[:], 
                     'elev' : elev[:],
                     'wd10a': wd10a[:,0], # each column must be 1-D, so take the single time step
                     'ws10a': ws10a[:,0],
                   } )
df.to_csv(fout, index=False)
fn.close()
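
If some stations have no valid reading, netCDF4 may hand back masked arrays. The sketch below shows one possible way to turn a (station, time) array into a 1-D column with NaN for missing values before writing the CSV. It runs on small in-memory arrays instead of the real file; the values, the -999 missing marker, and the helper name column_1d are assumptions made for illustration only.

```python
import io

import numpy as np
import pandas as pd


def column_1d(values, time_index=0):
    """Return one time step of a (station, time) array as a 1-D float column."""
    arr = np.ma.asarray(values)
    if arr.ndim == 2:
        arr = arr[:, time_index]
    # cast to float so that masked entries can be replaced by NaN
    return np.ma.filled(arr.astype(float), np.nan)


# Hypothetical readings for three stations and one time step;
# -999 is an assumed missing-value marker, not taken from the real file
wd10a = np.ma.masked_equal(np.array([[270], [-999], [45]]), -999)
ws10a = np.array([[3.5], [1.25], [0.5]], dtype="f4")

df = pd.DataFrame({"wd10a": column_1d(wd10a), "ws10a": column_1d(ws10a)})

# Write to an in-memory buffer here; a file name works the same way
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

With the default to_csv settings, NaN is written as an empty field, so the missing wind direction shows up as a blank cell in the CSV.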

One more thing to watch is how stid is handled. The stid variable looks like this:

[[b'5' b'4' b'3' ... b'' b'' b'']
 [b'5' b'4' b'4' ... b'' b'' b'']
 [b'5' b'4' b'4' ... b'' b'' b'']
 ...
 [b'C' b'S' b'2' ... b'' b'' b'']
 [b'C' b'S' b'2' ... b'' b'' b'']
 [b'C' b'S' b'2' ... b'' b'' b'']]

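This is a two-dimensional character variable: axis 0 indexes the stations, and axis 1 holds one station's id with a single character per element. The empty trailing elements (b'') are padding up to strlen = 30. np.apply_along_axis applies the anonymous function lambda x: x.tobytes().decode("utf-8") to each row, which joins that row's bytes and decodes them as UTF-8 into one string per station.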
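
The same per-row decoding can also be written as a plain list comprehension. The following is a minimal sketch on a small in-memory array rather than the real file; the station ids, the strlen of 8, and the helper name decode_char_rows are made up for illustration. It strips the NUL padding explicitly after decoding.

```python
import numpy as np


def decode_char_rows(chars, encoding="utf-8"):
    """Join each row of a 2-D array of single bytes into one Python string."""
    # np.asarray also unwraps a masked array into its underlying data
    chars = np.asarray(chars, dtype="S1")
    # empty cells come out of tobytes() as NUL bytes, so strip them after decoding
    return [row.tobytes().decode(encoding).rstrip("\x00") for row in chars]


# Hypothetical station ids, padded with NUL bytes to an assumed strlen of 8
strlen = 8
raw = np.stack([
    np.frombuffer(s.encode("utf-8").ljust(strlen, b"\x00"), dtype="S1")
    for s in ("54398", "CS2001")
])

ids = decode_char_rows(raw)
print(ids)  # ['54398', 'CS2001']
```

A list of Python strings can be passed to pd.DataFrame directly, and each id keeps its own length regardless of how long the first one is.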