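The default JVM encoding on the Hadoop cluster is US-ASCII, while our code is written in IDEA with UTF-8. Changing the cluster encoding to UTF-8 did not succeed, and changing the encoding of the local compiler had no effect.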
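One way to take the JVM default out of the picture is to print the charset each JVM actually uses and to decode input bytes with an explicit charset. The sketch below is a minimal illustration; `CharsetCheck` and `decodeUtf8` are hypothetical names, and it assumes the records are available as raw bytes. Note that the regex in this post and the delimiters it relies on (`,src_ip=`, `\],localIP=`, ...) are plain ASCII, so the default charset is probably not what decides whether a record matches.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object CharsetCheck {
  // Decode bytes with an explicit UTF-8 charset instead of the JVM default
  // (reported as US-ASCII on the cluster in this post).
  def decodeUtf8(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    // Charset of the JVM running this code (driver or executor)
    val fileEncoding = System.getProperty("file.encoding")
    println(s"default charset: ${Charset.defaultCharset()}")
    println(s"file.encoding:   $fileEncoding")

    // A non-ASCII sample round-trips regardless of the default charset
    val bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8)
    println(decodeUtf8(bytes) == "caf\u00e9")
  }
}
```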
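Problem: with a small amount of data the regex match works, but with a large amount of data it fails and the whole job crashes. What I have checked: the data format looks normal; no shared variables are used, so concurrency can be ruled out; the regex passes local tests, so a wrong regex can be ruled out. The error is: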
Lost task 7.3 in stage 6.0 (TID 299) on executor uhadoop-t5ff1u-task1: scala.MatchError (ver=[1.0.4.ip.c],session_id=6281417222615232142,dest_time=2016-05-06 11:46:55,dest_name=Ucloud-udbcluster,src_ip=NULL,src_port=0,dest_ip=NULL,dest_port=0,dest_term=linux,pts=pipe:[1397605744],home=/root,histfile=NULL,histfilesize=0,histsize=0,user=root,pid=28302,ppid=28300,pexe=/bin/bash,ppexe=/bin/bash,pargv=[/bin/bash /root/udb/script/mysql-5.6_lxc_lvm/calc_iops.sh /opt/udb/instance/mysql-5.6 6c50b15e-3187-4552-8a18-c93e2dfe2799],ppargv=[/bin/bash /root/udb/script/mysql-5.6_lxc_lvm/calc_iops.sh /opt/udb/instance/mysql-5.6 6c50b15e-3187-4552-8a18-c93e2dfe2799],pruid=0,peuid=0,ppruid=0,ppeuid=0,pwd=/,file_name=/root/udb/script/mysql-5.6_lxc_lvm/calc_iops.sh,interactive=0,cmd=[grep 6c50b1,localIP=172.28.162.13,UDPIP=172.23.7.111 (of class java.lang.String)) [duplicate 3]
The regex code is as follows:
val ub1 = "session_id=6281142782794927459,dest_time=2016-05-05 18:01:59,dest_name=Ucloud-udbcluster,src_ip=172.23.0.213,src_port=55917,dest_ip=172.23.93.20,dest_port=22,dest_term=dumb,pts=pipe:[2024273564],home=/root,histfile=NULL,histfilesize=0,histsize=0,user=root,pid=8112,ppid=8110,pexe=/bin/bash,ppexe=/bin/bash,pargv=[sh monitor-udb-scripts.sh e6343c24947c0b6b66c7da01bf24e825],ppargv=[sh monitor-udb-scripts.sh e6343c24947c0b6b66c7da01bf24e825],pruid=0,peuid=0,ppruid=0,ppeuid=0,pwd=/root/common_shells_import,file_name=/root/common_shells_import/monitor-udb-scripts.sh,interactive=0,cmd=[cut -d= -f2],localIP=172.23.93.20,UDPIP=172.23.7.111"
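import scala.util.matching.Regex
// Triple quotes pass \s, \S and \] to the regex engine unchanged (they are invalid escapes in an ordinary string literal)
val regex1 = new Regex("""([\s\S]*),src_ip=([\s\S]*),src_port=([\s\S]*),user=([\s\S]*),pid=([\s\S]*),ppexe=([\s\S]*),pargv=([\s\S]*),cmd=([\s\S]*)\],localIP=([\s\S]*)""")
// Extractor assignment: the whole string must match, otherwise scala.MatchError is thrown
val regex1(ver, src_ip, src_port, user, pid, ppexe, pargv, cmd, localIP) = ub1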
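In Scala, `val regex1(a, b, ...) = s` is compiled into a pattern match with no fallback case, and a `Regex` extractor only succeeds when the entire string matches, so any single non-matching record throws `scala.MatchError`. Judging only from the record in the error log, its `cmd` field reads `cmd=[grep 6c50b1,localIP=...` with no `]` before `,localIP=`, so the `\],localIP=` part of the pattern cannot match it; one such record is enough to fail the task, and it is more likely to show up in a large input. The sketch below keeps the same pattern but returns `None` for non-matching lines; `SafeParse`, `Rec` and `parse` are hypothetical names.

```scala
import scala.util.matching.Regex

object SafeParse {
  // Same pattern as in the post; triple quotes keep the regex escapes intact
  val regex1: Regex =
    """([\s\S]*),src_ip=([\s\S]*),src_port=([\s\S]*),user=([\s\S]*),pid=([\s\S]*),ppexe=([\s\S]*),pargv=([\s\S]*),cmd=([\s\S]*)\],localIP=([\s\S]*)""".r

  // Hypothetical record type holding the nine captured groups
  final case class Rec(ver: String, srcIp: String, srcPort: String, user: String,
                       pid: String, ppexe: String, pargv: String, cmd: String, localIP: String)

  // Returns None instead of throwing scala.MatchError when the line does not match
  def parse(line: String): Option[Rec] = line match {
    case regex1(ver, srcIp, srcPort, user, pid, ppexe, pargv, cmd, localIP) =>
      Some(Rec(ver, srcIp, srcPort, user, pid, ppexe, pargv, cmd, localIP))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val good = "session_id=1,src_ip=1.2.3.4,src_port=22,user=root,pid=1,ppexe=/bin/bash,pargv=[sh x],cmd=[cut -d= -f2],localIP=10.0.0.1,UDPIP=10.0.0.2"
    // cmd is not closed with ']' before ",localIP=", like the record in the error log
    val bad = "session_id=2,src_ip=NULL,src_port=0,user=root,pid=2,ppexe=/bin/bash,pargv=[sh y],cmd=[grep 6c50b1,localIP=10.0.0.1,UDPIP=10.0.0.2"
    println(parse(good).map(_.cmd)) // Some([cut -d= -f2)
    println(parse(bad))             // None
  }
}
```

In a Spark job, a function like this can be applied with `flatMap` (or followed by a count of the `None` results) so that malformed lines are dropped or reported instead of crashing the stage.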
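Splitting the string directly works now: no errors, and the program runs normally. I just want to understand why the regex approach fails. Any help would be appreciated.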
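The post does not show the split-based code, so the following is only a hypothetical illustration (`SplitParse` and `toFields` are made-up names) of why splitting tends not to crash on the same input: there is no all-or-nothing match, so a malformed `cmd` field just produces an odd value instead of an exception. It splits each `key=value` piece on the first `=` only; a value that itself contains `,` would be cut into several pieces.

```scala
object SplitParse {
  // "k1=v1,k2=v2,..." -> Map(k1 -> v1, ...), splitting each piece on its first '='
  // so that "cmd=[cut -d= -f2]" keeps "-d=" inside the value.
  // Pieces without a key (no '=' or '=' at position 0) are dropped.
  def toFields(line: String): Map[String, String] =
    line.split(",")
      .filter(_.indexOf('=') > 0)
      .map { piece =>
        val i = piece.indexOf('=')
        piece.substring(0, i) -> piece.substring(i + 1)
      }
      .toMap

  def main(args: Array[String]): Unit = {
    val bad = "session_id=2,src_ip=NULL,user=root,cmd=[grep 6c50b1,localIP=10.0.0.1"
    val f = toFields(bad)
    // The unclosed cmd field does not throw; it simply yields a truncated value
    println(f.get("cmd"))     // Some([grep 6c50b1)
    println(f.get("localIP")) // Some(10.0.0.1)
  }
}
```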