注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

PostgreSQL 中文网

 
 
 

日志

 
 

GreenPlum 问题一例 "ERROR: Interconnect timeout: Unable to complete setup of all connections within time limit."  

2011-04-18 12:18:08|  分类: GreenPlumn |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

   

         虚拟机 GreenPlum 数据仓库搭建好以后,可以执行建库,建表,数据插入语句,就是不能执行"select"操作
报以下ERROR,这个问题折腾了好几天,后来在德哥的帮忙下,终于解决了,原来自己在一个配置上疏忽
了。。。。。


1问题现象,不能执行查询SQL
--1.1创建测试表,并插入数据
warehouse=# create table test_1 (id  integer , name varchar(32)) distributed by (id);
CREATE TABLE

warehouse=# \d test_1
           Table "public.test_1"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | integer               |
 name   | character varying(32) |
Distributed by: (id)

warehouse=# insert into test_1 values (1,'francs');
INSERT 0 1

warehouse=# insert into test_1 values (2,'fpZhou');
INSERT 0 1

--1.2查询报错
warehouse=# select * From test_1;
ERROR:  Interconnect timeout: Unable to complete setup of all connections within time limit.
DETAIL:  Completed 0 of 2 incoming and 0 of 0 outgoing connections.  gp_interconnect_setup_timeout = 20 seconds.


2 处理过程
--2.1检查Master节点,未发现明显异常

--2.2检查Segment节点,发现以下ERROR
2011-04-14 21:14:05.535064 CST,"warehouse","warehouse",p20619,th-1208420672,"192.168.1.50","41829",2011-04-14 21:08:26 CST,1279,con6,cmd16,seg0,slice1,,x1279,sx1,"LOG","58M01",Interconnect could not connect to seg-1 127.0.0.1:61572 pid=9845; will retry. Connection refused (connect errno 111)",,,,,,"select * From test_1

    从上面的信息来看“seg-1 127.0.0.1” , seg-1 指的是 master 节点,怎么解析成 "127.0.0.1"了, 其实这里我早应该
发现,这是关键信息。
  
--2.3尝试调整参数 gp_interconnect_setup_timeout
  将参数 gp_interconnect_setup_timeout 调大到 120s 后,依然无效。

--2.4 GP整体状态检查,未发现异常
[greenplum@gpmaster opt]$ gpstate -s
20110414:20:12:43:gpstate:gpmaster:greenplum-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
20110414:20:13:01:gpstate:gpmaster:greenplum-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
20110414:20:13:20:gpstate:gpmaster:greenplum-[INFO]:-Spawning parallel processes batch [1], please wait...
....
20110414:20:13:40:gpstate:gpmaster:greenplum-[INFO]:-Waiting for parallel processes batch [1], please wait...
...............
20110414:20:14:03:gpstate:gpmaster:greenplum-[INFO]:-Master Configuration & Status
20110414:20:14:03:gpstate:gpmaster:greenplum-[INFO]:------------------------------
20110414:20:14:04:gpstate:gpmaster:greenplum-[INFO]:-Master host = gpmaster
20110414:20:14:05:gpstate:gpmaster:greenplum-[INFO]:-Master postgres process ID = 20992
20110414:20:14:06:gpstate:gpmaster:greenplum-[INFO]:-Master data directory = /opt/gp_data/gp-1
20110414:20:14:06:gpstate:gpmaster:greenplum-[INFO]:-Database name = template1
20110414:20:14:07:gpstate:gpmaster:greenplum-[INFO]:-Master port = 5432
20110414:20:14:07:gpstate:gpmaster:greenplum-[INFO]:-Master current role = dispatch
20110414:20:14:08:gpstate:gpmaster:greenplum-[INFO]:-Greenplum array configuration type = Standard
20110414:20:14:09:gpstate:gpmaster:greenplum-[INFO]:-Greenplum initsystem version = 2|3.3.6.1 build 1
20110414:20:14:10:gpstate:gpmaster:greenplum-[INFO]:-Greenplum current version = PostgreSQL 8.2.13 (Greenplum Database 3.3.6.1 build 1) on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44) compiled on Apr 2 2010 16:39:49
20110414:20:14:10:gpstate:gpmaster:greenplum-[INFO]:-Postgres version = 8.2.13
20110414:20:14:11:gpstate:gpmaster:greenplum-[INFO]:-Greenplum fault mode = readonly
20110414:20:14:11:gpstate:gpmaster:greenplum-[INFO]:-Greenplum mirroring status = on
20110414:20:14:12:gpstate:gpmaster:greenplum-[INFO]:-Greenplum master standby = No master standby configured
20110414:20:14:12:gpstate:gpmaster:greenplum-[INFO]:-Greenplum standby master state =
20110414:20:14:13:gpstate:gpmaster:greenplum-[INFO]:-Segment instance status
20110414:20:14:13:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:14:gpstate:gpmaster:greenplum-[INFO]:-Parallel process exit status
20110414:20:14:14:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:15:gpstate:gpmaster:greenplum-[INFO]:-Total processes marked as completed = 4
20110414:20:14:15:gpstate:gpmaster:greenplum-[INFO]:-Total processes marked as killed = 0
20110414:20:14:15:gpstate:gpmaster:greenplum-[INFO]:-Total processes marked as failed = 0
20110414:20:14:16:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:17:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:17:gpstate:gpmaster:greenplum-[INFO]:-Segment Instance Status Report
20110414:20:14:17:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:18:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:18:gpstate:gpmaster:greenplum-[INFO]:-Segment instance hostname = gpnode1
20110414:20:14:19:gpstate:gpmaster:greenplum-[INFO]:-Segment instance port = 50001
20110414:20:14:19:gpstate:gpmaster:greenplum-[INFO]:-Segment instance datadir = /opt/gp_data/data/gp0 Type = Primary
20110414:20:14:20:gpstate:gpmaster:greenplum-[INFO]:-Master reports status as = Valid
20110414:20:14:20:gpstate:gpmaster:greenplum-[INFO]:-Total errors (remote) = 0
20110414:20:14:20:gpstate:gpmaster:greenplum-[INFO]:-File postmaster.pid = Found
20110414:20:14:21:gpstate:gpmaster:greenplum-[INFO]:-PID from postmaster.pid file = 18851
20110414:20:14:21:gpstate:gpmaster:greenplum-[INFO]:-Lock files in /tmp status = Found
20110414:20:14:22:gpstate:gpmaster:greenplum-[INFO]:-Active PID = 18851
20110414:20:14:22:gpstate:gpmaster:greenplum-[INFO]:-Instance status = Active
20110414:20:14:23:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:23:gpstate:gpmaster:greenplum-[INFO]:-Segment instance hostname = gpnode1
20110414:20:14:24:gpstate:gpmaster:greenplum-[INFO]:-Segment instance port = 60001
20110414:20:14:24:gpstate:gpmaster:greenplum-[INFO]:-Segment instance datadir = /opt/gp_data/mdata/gp1 Type = Mirror
20110414:20:14:24:gpstate:gpmaster:greenplum-[INFO]:-Master reports status as = Valid
20110414:20:14:25:gpstate:gpmaster:greenplum-[INFO]:-Total errors (remote) = 0
20110414:20:14:25:gpstate:gpmaster:greenplum-[INFO]:-File postmaster.pid = Found
20110414:20:14:26:gpstate:gpmaster:greenplum-[INFO]:-PID from postmaster.pid file = 18865
20110414:20:14:26:gpstate:gpmaster:greenplum-[INFO]:-Lock files in /tmp status = Found
20110414:20:14:26:gpstate:gpmaster:greenplum-[INFO]:-Active PID = 18865
20110414:20:14:27:gpstate:gpmaster:greenplum-[INFO]:-Mirror status = Passive
20110414:20:14:27:gpstate:gpmaster:greenplum-[INFO]:-Instance status = Valid
20110414:20:14:28:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:29:gpstate:gpmaster:greenplum-[INFO]:-Segment instance hostname = gpnode2
20110414:20:14:29:gpstate:gpmaster:greenplum-[INFO]:-Segment instance port = 50001
20110414:20:14:29:gpstate:gpmaster:greenplum-[INFO]:-Segment instance datadir = /opt/gp_data/data/gp1 Type = Primary
20110414:20:14:30:gpstate:gpmaster:greenplum-[INFO]:-Master reports status as = Valid
20110414:20:14:30:gpstate:gpmaster:greenplum-[INFO]:-Total errors (remote) = 0
20110414:20:14:31:gpstate:gpmaster:greenplum-[INFO]:-File postmaster.pid = Found
20110414:20:14:31:gpstate:gpmaster:greenplum-[INFO]:-PID from postmaster.pid file = 17141
20110414:20:14:32:gpstate:gpmaster:greenplum-[INFO]:-Lock files in /tmp status = Found
20110414:20:14:32:gpstate:gpmaster:greenplum-[INFO]:-Active PID = 17141
20110414:20:14:32:gpstate:gpmaster:greenplum-[INFO]:-Instance status = Active
20110414:20:14:33:gpstate:gpmaster:greenplum-[INFO]:------------------------------------------------
20110414:20:14:33:gpstate:gpmaster:greenplum-[INFO]:-Segment instance hostname = gpnode2
20110414:20:14:34:gpstate:gpmaster:greenplum-[INFO]:-Segment instance port = 60001
20110414:20:14:34:gpstate:gpmaster:greenplum-[INFO]:-Segment instance datadir = /opt/gp_data/mdata/gp0 Type = Mirror
20110414:20:14:35:gpstate:gpmaster:greenplum-[INFO]:-Master reports status as = Valid
20110414:20:14:35:gpstate:gpmaster:greenplum-[INFO]:-Total errors (remote) = 0
20110414:20:14:36:gpstate:gpmaster:greenplum-[INFO]:-File postmaster.pid = Found
20110414:20:14:36:gpstate:gpmaster:greenplum-[INFO]:-PID from postmaster.pid file = 17153
20110414:20:14:36:gpstate:gpmaster:greenplum-[INFO]:-Lock files in /tmp status = Found
20110414:20:14:37:gpstate:gpmaster:greenplum-[INFO]:-Active PID = 17153
20110414:20:14:38:gpstate:gpmaster:greenplum-[INFO]:-Mirror status = Passive
20110414:20:14:38:gpstate:gpmaster:greenplum-[INFO]:-Instance status = Valid

--2.5网络检查,发现网络不好
[greenplum@gpmaster ~]$ gpchecknet -f all_host_file -d test/
/opt/greenplum-db/./bin/gpchecknet -f all_host_file -d test/

-------------------
--  NETPERF TEST
-------------------

====================
==  RESULT
====================
Netperf bisection bandwidth test
gpnode1 -> gpnode2 = 6.090000
gpnode2 -> gpmaster = 7.730000
gpnode2 -> gpnode1 = 6.960000
gpmaster -> gpnode2 = 6.530000

Summary:
sum = 27.31 MB/sec
min = 6.09 MB/sec
max = 7.73 MB/sec
avg = 6.83 MB/sec
median = 6.96 MB/sec

[Warning] connection between gpnode1 and gpnode2 is no good
[Warning] connection between gpmaster and gpnode2 is no good


--2.6虚拟机用的是 HOST-ONLY 网络连接,尝试将网络连接改成桥接,依然无效


--2.7后来实在搞不定,在德哥的帮助下,终于确定了以下问题

--2.8ping gpmaster 主机发现问题
[root@gpmaster ~]# ping gpmaster
PING gpmaster (127.0.0.1) 56(84) bytes of data.
64 bytes from gpmaster (127.0.0.1): icmp_seq=1 ttl=64 time=2.55 ms
64 bytes from gpmaster (127.0.0.1): icmp_seq=2 ttl=64 time=0.073 ms
64 bytes from gpmaster (127.0.0.1): icmp_seq=3 ttl=64 time=0.040 ms
64 bytes from gpmaster (127.0.0.1): icmp_seq=4 ttl=64 time=0.049 ms
64 bytes from gpmaster (127.0.0.1): icmp_seq=5 ttl=64 time=0.039 ms

  ping gpmaster 发现,gpmaster 解析为 127.0.0.1, 应该是 192.168.1.50 才对


--2.9 检查 /etc/hosts

--2.9.1 原来的 /etc/hosts内容
# that require network functionality will fail.
127.0.0.1               gpmaster localhost.localdomain localhost
192.168.1.50 gpmaster
192.168.1.51 gpnode1
192.168.1.52 gpnode2

   发现, gpmaster 解析成 127.0.0.1 了,先读的第一行。

--2.9.2修改后的 /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.1.50 gpmaster
192.168.1.51 gpnode1
192.168.1.52 gpnode2

--2.9.3再次 ping gpmaster
[root@gpmaster ~]# ping gpmaster
PING gpmaster (192.168.1.50) 56(84) bytes of data.
64 bytes from gpmaster (192.168.1.50): icmp_seq=1 ttl=64 time=2.59 ms
64 bytes from gpmaster (192.168.1.50): icmp_seq=2 ttl=64 time=0.066 ms
64 bytes from gpmaster (192.168.1.50): icmp_seq=3 ttl=64 time=0.038 ms
64 bytes from gpmaster (192.168.1.50): icmp_seq=4 ttl=64 time=0.051 ms
64 bytes from gpmaster (192.168.1.50): icmp_seq=5 ttl=64 time=0.044 ms

   这时,gpmaster 的IP解析为 "192.168.1.50"。

--2.9.4退出 session 后,重新测试
warehouse=# select * From test_1;
 id |  name 
----+--------
  1 | francs
  2 | fpZhou
(2 rows)

       这次终于OK了,数据可以查询了。
     
总结: 1  这个问题折腾了好几天,结果问题发生在 /etc/hosts上,还是在高手帮忙的情况下
                解决的,应该好好反思下,OS层面的知识比较薄弱,对基本知识应该加强;
      
            2  /etc/hosts 解析是从上到下解析的,当解析文件 /etc/hosts 时,第一次匹配成功
                则返回。

  评论这张
 
阅读(28232)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2016