注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

PostgreSQL 中文网

 
 
 

日志

 
 

PostgreSQL: 流复制备库 PITR 恢复一例  

2014-07-16 16:37:13|  分类: PG案例分析 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

        今天发现一套流复制环境备库异常, 已经不能和主库同步了, 这个库是日志库, 经常出现主备数据同步延迟的告警, 这类告警经常被俺疏忽, 环境信息如下:
     
--流复制环境
PG 版本: 9.2.8
数据库大小 : > 3 TB
业务类型: 日志库
高可用: 流复制
WAL SIZE:  16 MB
每天归档量: 30000 个左右 WAL

--备库数据库日志

2014-07-16 10:14:33.515 CST,,,1202,,53c5e009.4b2,2,,2014-07-16 10:14:33 CST,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:  requested WAL segment 00000006000060EB000000A2 has already been removed",,,,,,,,"libpqrcv_receive, libpqwalreceiver.c:389",""
2014-07-16 10:14:38.520 CST,,,1210,,53c5e00e.4ba,1,,2014-07-16 10:14:38 CST,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,"libpqrcv_connect, libpqwalreceiver.c:171",""
2014-07-16 10:14:38.520 CST,,,1210,,53c5e00e.4ba,2,,2014-07-16 10:14:38 CST,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:  requested WAL segment 00000006000060EB000000A2 has already been removed"

备注:大量上述日志, 关于这个问题, 之前写了 blog ,如下:
PostgreSQL:“ FATAL: requested WAL segment 0000000800002A0000000000 has already been removed”

      之前的处理方法是重做备库,  这次不需要, 因为前段时间已开启了此库的归档。
     
--查看主库的归档

postgres@db--> psql
psql (9.2.8)
Type "help" for help.

postgres=# show archive_command ;
                                        archive_command                                         
------------------------------------------------------------------------------------------------
 DIR=/pg_arch/arch/`date +%F`; test ! -d $DIR && mkdir $DIR; test ! -f $DIR/%f && cp %p $DIR/%f
(1 row)

备注: 每天一个归档目录,  既然有了归档, 计划将归档目录 nfs 共享给备库做恢复

--归档目录

postgres@db--> ls /pg_arch/arch/
2014-06-30  2014-07-02  2014-07-04  2014-07-06  2014-07-08  2014-07-10  2014-07-12  2014-07-14  2014-07-16
2014-07-01  2014-07-03  2014-07-05  2014-07-07  2014-07-09  2014-07-11  2014-07-13  2014-07-15


配置 nfs
--修改主库 /etc/exports

/pg_arch        192.168.xxx.xxx/32(rw,no_root_squash,sync)


--重启 nfs 服务

[root@db- arch]# service nfs restart
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Shutting down RPC idmapd:                                  [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
^[[AStarting RPC idmapd:                                   [  OK  ]


--备库 mount 归档目录

[root@db2 ~]# mkdir /pg_restore
[root@db2 ~]# mount -t nfs 192.168.xxx.xxx:/pg_arch /pg_restore

备注: 我们将归档目录 mount 到 /pg_restore 目录, nfs 其它配置这里不记录了.请查看 nfs 相关文档,接下来可以进行还原操作了.

-- 修改备库 recovery.conf 文件的以下参数

restore_command = 'cp /pg_restore/arch/2014-07-[1-3][0-9]/%f %p;'

备注: 因为要取多天归档,  所以用了正则.

--操作前, 调整主库参数

wal_keep_segments = 20000

备注: 之前设置得太小了,调整后执行 reload 操作。

--重启备库

pg_ctl restart -m fast -D $PGDATA

备注: 接下来进行备库恢复操作, 重启备库。

--查看备库日志

2014-07-16 10:14:54.583 CST,,,1305,,53c5e01e.519,1,,2014-07-16 10:14:54 CST,,0,LOG,00000,"database system was shut down in recovery at 2014-07-16 10:14:42 CST",,,,,,,,"StartupXLOG, xlog.c:6273",""
2014-07-16 10:15:09.231 CST,,,1619,"",53c5e02d.653,1,"",2014-07-16 10:15:09 CST,,0,LOG,00000,"connection received: host=192.168.100.3 port=50518",,,,,,,,"BackendInitialize, postmaster.c:3471",""
2014-07-16 10:15:09.234 CST,,,1619,"192.168.100.3:50518",53c5e02d.653,2,"",2014-07-16 10:15:09 CST,,0,LOG,08P01,"incomplete startup packet",,,,,,,,"ProcessStartupPacket, postmaster.c:1505",""
2014-07-16 10:15:50.825 CST,,,2241,"",53c5e056.8c1,1,"",2014-07-16 10:15:50 CST,,0,LOG,00000,"connection received: host=127.0.0.1 port=11732",,,,,,,,"BackendInitialize, postmaster.c:3471",""
2014-07-16 10:15:50.825 CST,"postgres","postgres",2241,"127.0.0.1:11732",53c5e056.8c1,2,"",2014-07-16 10:15:50 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,"ProcessStartupPacket, postmaster.c:1759",""
2014-07-16 10:15:52.906 CST,,,2258,"",53c5e058.8d2,1,"",2014-07-16 10:15:52 CST,,0,LOG,00000,"connection received: host=127.0.0.1 port=11734",,,,,,,,"BackendInitialize, postmaster.c:3471",""
....
...
2014-07-16 10:23:15.393 CST,,,1305,,53c5e01e.519,11,,2014-07-16 10:14:54 CST,1/0,0,LOG,00000,"restored log file ""00000006000060EB0000000C"" from archive",,,,,,,,"RestoreArchivedFile, xlog.c:3273",""
2014-07-16 10:23:18.318 CST,,,1305,,53c5e01e.519,12,,2014-07-16 10:14:54 CST,1/0,0,LOG,00000,"restored log file ""00000006000060EB0000000D"" from archive",,,,,,,,"RestoreArchivedFile, xlog.c:3273",""
2014-07-16 10:24:17.714 CST,,,1305,,53c5e01e.519,13,,2014-07-16 10:14:54 CST,1/0,0,LOG,00000,"restored log file ""00000006000060EB0000000E"" from archive",,,,,,,,"RestoreArchivedFile, xlog.c:3273",""
2014-07-16 10:24:20.552 CST,,,1305,,53c5e01e.519,14,,2014-07-16 10:14:54 CST,1/0,0,LOG,00000,"restored log file ""00000006000060EB0000000F"" from archive",,,,,,,,"RestoreArchivedFile, xlog.c:3273",""

....
省略部分日志

备注: 可以看到, 备库在拼命地读归档日志并 apply,  库比较大,  WAL 也很多,  共有两天的 WAL 日志需要应用 , 估计没个一天半载跑不完。( 后来花了一天半才恢复!)

--后续措施
1 调高 wal_keep_segments 参数
2 加大监控力度

--参考
  评论这张
 
阅读(1487)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2016