【大数据实时数据同步】GoldenGate实时同步异常:OGG-03533:character ‘c2 a0‘ at offset 0 that is not available报错解决
前言
博主所在单位目前使用Oracle GoldenGate将各个业务生产库汇聚到一起做数仓实时ODS平台,源端库可能涉及Oracle、Mysql、达梦、Guassdb库。
最近遇到了这个错误:OGG-03533:character 'c2 a0' at offset 0 that is not available。
导致报错的原因是源端库有业务人员手动用excel往数据库导入数据,源端库字符集:AMERICAN_AMERICA.AL32UTF8
,目标库ODS字符集:AMERICAN_AMERICA.ZHS16GBK
。
业务人员导入的数据中包含了全角空格及不间断空格,从utf8的数据库往gbk的数据库同步过程中,字符转码失败。
因为是金融领域生产环境,此同步进程中包含其他生产表,所以恢复要格外小心,否则可能会导致数据实时同步异常,数据错乱。
下面是我记录的本地测试解决方案。
一、重现报错
源端字符集:AMERICAN_AMERICA.AL32UTF8
ODS字符集:AMERICAN_AMERICA.ZHS16GBK
源端查询数据
2 SQL> select * from dept;
3
4 DEPTNO DNAME LOC
5 ‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
6 1 东东 a
7 10 ACCOUNTING NEW YORK
8 20 RESEARCH DALLAS
9 30 SALES CHICAGO
10 40 OPERATIONS BOSTON
ods端查询数据:
12 SQL> select deptno,dname,loc from dept;
13
14 DEPTNO DNAME LOC
15 ‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
16 10 ACCOUNTING NEW YORK
17 20 RESEARCH DALLAS
18 30 SALES CHICAGO
19 40 OPERATIONS BOSTON
20 1 东东 a
源端执行插入操作:
22
23 SQL> insert into dept values(60,'东
123'||chr(to_number('C2A0','xxxx'))||'东A','a');
24
25 1 row created.
26
27 SQL> commit;
28
29 Commit complete.
源端切换归档:
31 SQL> alter system switch logfile;
32
33 System altered.
源端再次查询数据:
35 SQL> select * from dept;
36
37 DEPTNO DNAME LOC
38 ‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
39 1 东东 a
40 60 东123 东A a
41 10 ACCOUNTING NEW YORK
42 20 RESEARCH DALLAS
43 30 SALES CHICAGO
44 40 OPERATIONS BOSTON
45
46 6 rows selected.
ODS端查询进程状态:
49 GGSCI (ogg) 24> info all
50
51 Program Status Group Lag at Chkpt Time Since Chkpt
52
53 MANAGER RUNNING
54 REPLICAT ABENDED R_SC 00:00:00 00:00:50
应用进程报错信息如下:
2018‐09‐20 22:39:36 INFO OGG‐06510 Using the following key columns for target table SCOTT.DEPT_AUDIT: DEPTNO.
2018‐09‐20 22:39:36 INFO OGG‐03010 Performing implicit conversion of column data from character set UTF‐8 to zhs16gbk.
Source Context :
SourceModule : [ggapp.csconv]
SourceID :
[/scratch/aime/adestore/views/aime_adc4150378/oggcore/OpenSys/src/ggli b/ggapp/csconv.cpp]
SourceFunction : [conversion_error_handler]SourceLine : [820]
ThreadBacktrace : [18] elements
:
[/u01/app/oracle/ogg12/libgglog.so(CMessageContext::AddThreadContext()
+0x1b) [0x7f8e943ebf2b]]
:
[/u01/app/oracle/ogg12/libgglog.so(CMessageFactory::CreateMessage(CSou rceContext*, unsigned int, ...)+0x134) [0x7f8e943e5f84]]
:
[/u01/app/oracle/ogg12/libgglog.so(_MSG_ERR_COLUMN_CHARSET_CONVERSION_ CSTOCS2(CSourceContext*, char const*, unsigned long, ggs::gglib::ggapp::CDBObjName<(DBObjTy
pe)13> const&, ggs::gglib::ggapp::CDBObjName<(DBObjType)13> const&, char const*, char const*, CMessageFactory::MessageDisposition)+0x89) [0x7f8e943b4b63]]
:
[/u01/app/oracle/ogg12/replicat(conversion_error_handler(ULibCharSet, ULibCharSet, ULibError, ULibBadCharDetail const&, col_def const*, col_def const*, bool, ext
r_ptr_def*, bool&, bool&)+0x184) [0x634b66]]
:
[/u01/app/oracle/ogg12/replicat(CSCONV_convert(short, col_def const*, ULibCharSet, short, col_def*, ULibCharSet, char*, char*, col_def*, bool, extr_ptr_def*)+0x3
52) [0x635580]]
:
[/u01/app/oracle/ogg12/replicat(map_columns_int(ggs::gglib::gglcr::Com monLCR const&, ggs::gglib::gglcr::CommonLCR&, char*, map_col_def*, short*, short*, short, s
hort, extr_ptr_def*, bool, bool, bool)+0x1d33) [0x63f1e3]]
:
[/u01/app/oracle/ogg12/replicat(map_columns(ggs::gglib::gglcr::CommonL CR const&, ggs::gglib::gglcr::CommonLCR&, map_col_def*, short*, short*, short, short, short
*, int, extr_ptr_def*)+0xfe) [0x63f70e]]
:
[/u01/app/oracle/ogg12/replicat(replicate_io(ggs::gglib::ggdatasource:
:DataSource*, file_def*, ggs::gglib::gglcr::CommonLCR const*, ggs::gglib::gglcr::CommonLCR*
, extr_ptr_def*, int*, int)+0x488) [0x61b858]]
:
[/u01/app/oracle/ogg12/replicat(process_record(ggs::gglib::ggdatasource::DataSource*, file_def*&, ggs::gglib::gglcr::CommonLCR const*, ggs::gglib::gglcr::CommonL
CR*, short&, extr_ptr_def*&, extr_ptr_def*&, int&, int&, int&, bool)+0x328) [0x623178]]
:
[/u01/app/oracle/ogg12/replicat(ggs::er::ReplicatContext::processRepli catLoop(ggs::Heartbeat::MapGeneratorParams&)+0x1261) [0x5da591]]
:
[/u01/app/oracle/ogg12/replicat(ggs::er::ReplicatContext::run(ggs::Hea rtbeat::MapGeneratorParams&)+0x29) [0x5ce579]]
: [/u01/app/oracle/ogg12/replicat()
[0x617936]]
:
[/u01/app/oracle/ogg12/replicat(ggs::gglib::MultiThreading::MainThread
::ExecMain()+0x60) [0x6d15c0]]
:
[/u01/app/oracle/ogg12/replicat(ggs::gglib::MultiThreading::Thread::Ru nThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*)+0x14d) [0x6d25dd]]
:
[/u01/app/oracle/ogg12/replicat(ggs::gglib::MultiThreading::MainThread
::Run(int, char**)+0xb1) [0x6d26c1]]
: [/u01/app/oracle/ogg12/replicat(main+0x3b)
[0x61ab9b]]
: [/lib64/libc.so.6( libc_start_main+0xfd)
[0x3dc9c1ed1d]]
: [/u01/app/oracle/ogg12/replicat()
[0x54eb69]]
2018‐09‐20 22:44:28 ERROR OGG‐03533 Conversion from character set UTF‐8 of source column DNAME to character set zhs16gbk of target column DNAME failed because the source column contains
a character 'c2 a0' at offset 0 that is not available in the target character set.
二、处理过程
在应用进程加入参数:
REPLACEBADCHAR ESCAPE
REPLACEBADCHAR 参 数 官 方 介 绍 如 下
REPLACEBADCHAR Valid For
Extract and Replicat
Description
Use the REPLACEBADCHAR parameter to control the response of the
process when a valid code point does not exist for either the source
or target character set when mapping character-type columns. By
default, the check for invalid code points is only performed when the
source and target databases have different character sets, and the
default response is to abend. You can use the FORCECHECK option to
force the process to check for invalid code points when the source and
target databases have the same character set. REPLACEBADCHAR applies
globally.Default
ABORT Syntax
REPLACEBADCHAR {ABORT | SKIP | ESCAPE | SUBSTITUTE string | NULL |
SPACE} [FORCECHECK] [NOWARNING] ABORT The process abends on an invalid
code point. This is the default.SKIP The process skips the record that has the invalid code point. Use
this option with caution, because skipping a record can cause data
discrepancies on the target.ESCAPE The process replaces the data value with an escaped version of
the data value. Depending on the character set of the source database,
the value is output as one of the following:If the source data is not UTF-16 (NCHAR/NVARCHAR), the output is
hexadecimal (\xXX).If the source data is UTF-16, the output is Unicode (\uXXXX).
SUBSTITUTE string The process replaces the data with a specified
string, either Unicode notation or up to four characters. By default
the default substitution character of the target character set is used
for replacement.NULL The process replaces an invalid character with the value of NULL
if the target column is nullable or, otherwise, assigns a white space
(U+0020).SPACE The process replaces an invalid character with a white space
(U+0020).FORCECHECK The process checks for invalid code points when the source
and target databases have identical character sets. This overrides the
default, where the validation is skipped when the source and target
character sets are identical.NOWARNING The process suppresses warning messages related to
conversion and validation errors.Examples
Example 1 The following example replaces invalid code points with
the value of NULL.REPLACEBADCHAR NULL Example 2 Because ESCAPE is specified, Oracle
GoldenGate will replace the Euro symbol in a source NCHAR column with
the escaped version of u20AC, because the target is ISO-8859-1, which
does not support the Euro code point.REPLACEBADCHAR ESCAPE Example 3 The following substitutes a control
character for invalid characters.REPLACEBADCHAR SUBSTITUTE \u001A
重启进程:
1 GGSCI (ogg) 25> edit params r_Sc
2
3 GGSCI (ogg) 26> start r_sc
4
5 Sending START request to MANAGER ...
6 REPLICAT R_SC starting
7
8
9 GGSCI (ogg) 27> info all
10
11 Program Status Group Lag at Chkpt Time Since Chkpt
12
13 MANAGER RUNNING
14 REPLICAT RUNNING R_SC 00:00:00 00:00:01
进程状态正常,再去ODS端查询数据:
1 SQL> select deptno,dname,loc from dept;
2
3 DEPTNO DNAME LOC
4 ‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
5 10 ACCOUNTING NEW YORK
6 20 RESEARCH DALLAS
7 30 SALES CHICAGO
8 40 OPERATIONS BOSTON
9 1 东东 a
10 60 东123\xC2\xA0东A a
11
12
发现之前报错的utf8编码为 c2a0的字符已在ODS端的表中替换成了源端utf8的字符编码。而其他字符还是正常的,可以在报错后不改数据库字符集的前提下将特殊字符转换成相应的uft8编码,再通过人工写函数或则存过映射替换修改这类字符,或则可以考虑将不间断空格改为GBK中的普通空格,使用如下配置:
REPLACEBADCHAR SPACE
ODS端使用这个参数后再次在源端插入相同的字符串看下:
源端插入数据:
2 SQL> insert into dept values(61,'东
123'||chr(to_number('C2A0','xxxx'))||'东A','a');
3
4 1 row created.
5
6 SQL> commit;
7
8 Commit complete.
9
10 SQL> alter system switch logfile;
11
12 System altered.
13 SQL> select * from dept;
14
15 DEPTNO DNAME LOC
16 ‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
17 1 东东 a
18 60 东123 东A a
19 61 东123 东A a
20 10 ACCOUNTING NEW YORK
21 20 RESEARCH DALLAS
22 30 SALES CHICAGO
23 40 OPERATIONS BOSTON
ODS端查询:
25 SQL> select deptno,dname,loc from dept;
26
27 DEPTNO DNAME LOC
28 ‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐ ‐‐‐‐‐‐‐‐
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
29 10 ACCOUNTING NEW YORK
30 20 RESEARCH DALLAS
31 30 SALES CHICAGO
32 40 OPERATIONS BOSTON
33 1 东东 a
34 60 东123\xC2\xA0东A a
35 61 东123 东A a
发现utf8的不间断空格c2a0已经被替换成GBK的普通空格(U+0020)了。问题解决,在生产环境做了配置,实时同步恢复正常!
总结
本文主要介绍的是通过GoldenGate实时同步两个字符集不同的数据库时,如果遇到字符不兼容的问题时,应该怎么处理的解决方案,生产环境需谨慎,最后测试完无异常后在生产环境做了修改。
相关文章
- python实时视频流播放
- 使用Postgres,MobilityDB和Citus大规模(百亿级)实时分析GPS轨迹
- lsyncd实时同步搭建指南——取代rsync+inotify
- Rsync+Inotify实现数据实时同步
- centos8安装sersync为rsync实现实时同步
- 【STM32F407的DSP教程】第47章 STM32F407的IIR带阻滤波器实现(支持逐个数据的实时滤波)
- 使用SAP Analytics Cloud显示新冠肺炎病毒感染人数的实时信息
- [转]CENTOS 使用RSYNC+INOTIFY实现文件实时自动同步
- AI:2020北京智源大会与五位图灵奖得主和100多位专家《共同探讨人工智能的下一个十年》——6月21日~6月24日的日程安排(实时更新,建议收藏)
- RTSP服务器综合项目:rtsp实时监控
- 实时音视频聊天中的延时问题一篇就够,低延时场景及优化
- 用 Python 构建实时股票警报消息系统机器人
- 数据可视化看板:基于 Echarts + Python Flask 动态实时大屏
- 【架构实践】全链路实时追踪系统架构实战: 链路追踪系统 Tracing Analysis System
- inotify+rsync实现实时同步并邮件通知
- 088:vue+openlayers实时显示单个卫星的位置及轨迹 (示例代码)
- rsync+inotify实时同步——筑梦之路
- 腾讯数十亿广告的秘密武器:利用大数据实时精准推荐
- [实时更新]jquery全部版本号下载
- druid.io 海量实时OLAP数据仓库 (翻译+总结) (1)——分析框架如hive或者redshift(MPPDB)、ES等
- Python实时垃圾分类系统(环境教程&完整源码&数据集)
- Rsync 文件同步(四):实时同步 Rsync + Lsyncd
- sersync-实时同步服务