Elasticsearch Bulk Data Import, Part 2

Date: 2023-03-31 10:28:46

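The request returned an error, and the data was indeed not added. The cause: validation of the action line (action_and_meta_data) failed because the request did not follow the required format, so an exception was raised.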

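Correct import method

The fix is to correct the format by adding a line break between the action line and the document source.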

[root@es-bulk tmp]# vim test.json 
[root@es-bulk tmp]# cat test.json 
{"index":{"_index":"stuff_orders","_type":"order_list","_id":903713}}
{"real_name":"刘备","user_id":48430,"address_province":"上海","address_city":"浦东新区","address_district":null,"address_street":"上海市浦东新区广兰路1弄2号345室","price":30.0,"carriage":6.0,"state":"canceled","created_at":"2013-10-24T09:09:28.000Z","payed_at":null,"goods":["营养早餐:火腿麦满分"],"position":[121.53,31.22],"weight":70.0,"height":172.0,"sex_type":"female","birthday":"1988-01-01"}
[root@es-bulk tmp]# curl -XPOST 'localhost:9200/stuff_orders/_bulk?pretty' --data-binary @test.json
{
  "took" : 36,
  "errors" : false,
  "items" : [ {
    "index" : {
      "_index" : "stuff_orders",
      "_type" : "order_list",
      "_id" : "903713",
      "_version" : 1,
      "_shards" : {
        "total" : 2,
        "successful" : 1,
        "failed" : 0
      },
      "status" : 201
    }
  } ]
}
[root@es-bulk tmp]# curl localhost:9200/stuff_orders/order_list/903713?pretty
{
  "_index" : "stuff_orders",
  "_type" : "order_list",
  "_id" : "903713",
  "_version" : 1,
  "found" : true,
  "_source":{"real_name":"刘备","user_id":48430,"address_province":"上海","address_city":"浦东新区","address_district":null,"address_street":"上海市浦东新区广兰路1弄2号345室","price":30.0,"carriage":6.0,"state":"canceled","created_at":"2013-10-24T09:09:28.000Z","payed_at":null,"goods":["营养早餐:火腿麦满分"],"position":[121.53,31.22],"weight":70.0,"height":172.0,"sex_type":"female","birthday":"1988-01-01"}
}
[root@es-bulk tmp]# 
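
As the session above shows, each document takes two lines in the bulk file: the action line, then the source. The bulk API also expects the final line to end with a newline. The following is a minimal sketch of a pre-flight check for files that contain only index actions; the function name check_bulk_file is hypothetical, and the check looks only at line structure, not at whether each line is valid JSON.

```shell
# Hypothetical helper: sanity-check a bulk file that holds only "index" actions.
check_bulk_file() {
    file=$1
    # Command substitution strips a trailing newline, so a non-empty result
    # means the last byte of the file is not a newline.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo "missing trailing newline"
        return 1
    fi
    # Every document needs two lines: action metadata, then the source.
    lines=$(wc -l < "$file")
    if [ $((lines % 2)) -ne 0 ]; then
        echo "odd number of lines"
        return 1
    fi
    echo "ok: $((lines / 2)) documents"
}
```

Running check_bulk_file test.json before the curl call would catch a file in which the action line and the source are still joined on one line.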

Tip: When the data set is very large, fixing records one by one is impractical. A sed script can apply the change to the whole file at once:

[root@es-bulk summary]# sed -i 's/[}][}][{]/}}\n{/' jjjj.json 
[root@es-bulk summary]# less jjjj.json

All it does is match the right spot and insert a line break there.
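
To try the substitution on a sample before editing a file in place, a sketch like the following may help. It uses awk rather than sed because `\n` in a sed replacement is a GNU extension, while a newline inside an awk string is portable. The function name split_bulk_lines is hypothetical, and like the sed command it only splits the first `}}{` on each line, so it assumes one joined record per line and no `}}{` sequence inside the document data itself.

```shell
# Hypothetical helper: read joined bulk records on stdin and write them
# to stdout with a newline between the action line and the source.
split_bulk_lines() {
    # sub() replaces the first "}}{" on each input line with "}}", newline, "{".
    awk '{ sub(/[}][}][{]/, "}}\n{"); print }'
}
```

For example, split_bulk_lines < jjjj.json | head prints the first few corrected lines without modifying jjjj.json.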


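Out of memory

As long as you follow the steps above, the data will ideally be imported into ES without trouble. In a real environment, however, there are always surprises, and I ran into one of them: running out of memory.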

[root@es-bulk tmp]# time curl -XPOST 'localhost:9200/stuff_orders/_bulk?pretty' --data-binary @es_data.json > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 38  265M    0     0   38  102M      0  43.8M  0:00:06  0:00:02  0:00:04 43.9M
curl: (56) Failure when receiving data from the peer

real	0m5.351s
user	0m0.161s
sys	0m0.919s
[root@es-bulk tmp]#
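
One general way to keep each bulk request small, offered here as an assumption rather than as the fix used in this article, is to split the file into chunks and post them one at a time. The sketch below wraps split so that the chunk size is always an even number of lines, which keeps every action line together with its source. The function name split_bulk_file, the chunk size, and the /tmp/es_chunk_ prefix are all hypothetical.

```shell
# Hypothetical helper: split a bulk file into chunks without separating
# an action line from its source line.
split_bulk_file() {
    file=$1
    lines_per_chunk=$2   # must be even: two lines per document
    prefix=$3            # output files are named <prefix>aa, <prefix>ab, ...
    if [ $((lines_per_chunk % 2)) -ne 0 ]; then
        echo "lines_per_chunk must be even" >&2
        return 1
    fi
    split -l "$lines_per_chunk" "$file" "$prefix"
}

# Example usage (assumed chunk size and paths; same endpoint as above):
#   split_bulk_file es_data.json 20000 /tmp/es_chunk_
#   for f in /tmp/es_chunk_*; do
#       curl -s -XPOST 'localhost:9200/stuff_orders/_bulk' --data-binary @"$f" > /dev/null
#   done
```

A suitable chunk size depends on the document size and the memory available to the node, so the number above is only a placeholder.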