[pgpool-general: 2179] Re: read_startup_packet: incorrect packet length What does that mean?

Wed Oct 9 23:47:20 JST 2013

What is the Zabbix agent? I know there are several Zabbix templates for
PostgreSQL floating around.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hi Tatsuo.
> Enabling the log_connections setting helped.
> 
> The log showed this:
> 2013-10-09 08:35:01 ERROR: pid 28315: read_startup_packet: incorrect packet
> length (-1928938790)
> 2013-10-09 08:35:17 LOG:   pid 31625: connection received: host=127.0.0.1
> port=59225
> 2013-10-09 08:35:17 ERROR: pid 31625: read_startup_packet: incorrect packet
> length (-1928938790)
> 2013-10-09 08:35:31 LOG:   pid 28640: connection received: host=127.0.0.1
> port=59229
> 2013-10-09 08:35:31 LOG:   pid 6187: do_child: failback event found. restart
> myself.
> 2013-10-09 08:35:31 ERROR: pid 28640: read_startup_packet: incorrect packet
> length (-1928938790)
> 2013-10-09 08:35:47 LOG:   pid 29594: connection received: host=127.0.0.1
> port=59235
> 2013-10-09 08:35:47 ERROR: pid 29594: read_startup_packet: incorrect packet
> length (-1928938790)
> 2013-10-09 08:36:01 LOG:   pid 28800: connection received: host=127.0.0.1
> port=59239
> 2013-10-09 08:36:01 ERROR: pid 28800: read_startup_packet: incorrect packet
> length (-1928938790)
> 2013-10-09 08:36:17 LOG:   pid 6200: do_child: failback event found. restart
> myself.
> 2013-10-09 08:36:17 LOG:   pid 29099: connection received: host=127.0.0.1
> port=59247
> 2013-10-09 08:36:17 ERROR: pid 29099: read_startup_packet: incorrect packet
> length (-1928938790)
> 2013-10-09 08:36:31 LOG:   pid 32454: connection received: host=127.0.0.1
> port=59252
> 2013-10-09 08:36:31 ERROR: pid 32454: read_startup_packet: incorrect packet
> length (-1928938790)
> 2013-10-09 08:36:47 LOG:   pid 29429: connection received: host=127.0.0.1
> port=59258
> 2013-10-09 08:36:47 ERROR: pid 29429: read_startup_packet: incorrect packet
> length (-1928938790)
> 2013-10-09 08:37:01 LOG:   pid 28640: connection received: host=127.0.0.1
> port=59262
> 2013-10-09 08:37:01 ERROR: pid 28640: read_startup_packet: incorrect packet
> length (-1928938790)
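
For readers puzzling over the negative number in the log above: in the PostgreSQL frontend/backend protocol, a startup message begins with a 4-byte length in network byte order, and that length counts itself. The Python sketch below shows how four bytes with the high bit set decode to the logged value, and what a well-formed length looks like for comparison. The helper names are hypothetical, and the thread does not show which bytes pgpool actually read, so the byte string here is only reconstructed from the logged number.

```python
import struct

PROTOCOL_3_0 = 196608  # protocol version 3.0, i.e. (3 << 16) | 0


def decode_length(first_four_bytes: bytes) -> int:
    """Interpret four bytes as a signed big-endian 32-bit length."""
    (length,) = struct.unpack("!i", first_four_bytes)
    return length


def build_startup_packet(user: str) -> bytes:
    """Build a minimal protocol 3.0 startup message (illustrative only)."""
    body = struct.pack("!i", PROTOCOL_3_0)
    body += b"user\x00" + user.encode("ascii") + b"\x00"
    body += b"\x00"  # terminator after the last name/value pair
    # The length field includes its own four bytes.
    return struct.pack("!i", len(body) + 4) + body


# Bytes that would produce the value seen in the pgpool log.
garbage = struct.pack("!i", -1928938790)
print(garbage.hex())           # 8d06bada
print(decode_length(garbage))  # -1928938790

# A well-formed packet carries a small positive length.
packet = build_startup_packet("postgres")
print(decode_length(packet[:4]), len(packet))  # 23 23
```

A length that is negative or absurdly large is a sign that the peer did not send a real startup message at all.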
> 
> As I found out from my colleague, the Zabbix agent was checking through its
> trigger whether the Pgpool and PostgreSQL ports are alive. When he stopped
> it, the messages went away.
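
A bare "is the port alive" check can be sketched in Python as below. This is an assumption about the kind of probe involved, not a description of how the Zabbix agent is implemented: the thread only says that it checked whether the ports were alive. The function name `port_is_open` and the host and port in the usage lines are hypothetical. The point is that such a probe opens a TCP connection and closes it without ever sending a startup message, so the server side has nothing valid to parse.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    The connection is closed immediately; no PostgreSQL startup
    message is sent, which is what distinguishes this probe from a
    real client.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9999 is the pgpool port mentioned in this thread; adjust as needed.
    print(port_is_open("127.0.0.1", 9999))
```

A monitoring check that logs in and runs a trivial query instead would speak the protocol properly, at the cost of needing credentials.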
> 
> Hopefully this helps somebody.
> 
> Best regards,
> Michal Mistina
> -----Original Message-----
> From: Tatsuo Ishii [mailto:ishii at postgresql.org] 
> Sent: Tuesday, October 8, 2013 8:15 AM
> To: Mistina Michal
> Cc: pgpool-general at pgpool.net
> Subject: Re: [pgpool-general: 2141] Re: read_startup_packet: incorrect
> packet length What does that mean?
> 
>> Hi Tatsuo,
>>> Hi Michal,
>>>> Hi Tatsuo.
>>>> Sorry for the delayed answer.
>>>> In the meantime I've changed the configuration, but the message
>>>> "read_startup_packet: incorrect packet length" didn't go away. I am 
>>>> attaching configuration at the end of the e-mail and also a log file 
>>>> right after I started Pgpool.
>>>> 
>>>> I think I understand the issue I've described in the last e-mail. I 
>>>> mean the psql -c "show pool_nodes;" -d postgres -U postgres -p 9898 
>>>> -> the port should be 9999. Just silly typo.
>>>> 
>>>> I would like to summarize the issues I am experiencing now:
>>>> 1. There are periodic ERROR messages in the Pgpool log file -
>>>> "read_startup_packet: incorrect packet length (-1656006549)".
>>>> 2. There are periodic messages in the PostgreSQL log file -
>>>> "incomplete startup packet".
>>>> 3. If I halt or reboot the primary node (soft shutdown), the secondary
>>>> node triggers fail-over immediately. --> NOT DESIRED BEHAVIOUR
>>>> 4. If I power off the primary node (unexpectedly), the secondary node
>>>> applies health_check and triggers fail-over after the health_check
>>>> time-out. --> DESIRED BEHAVIOUR
>>>> 
>>>> I think the first two issues are related, but I don't know what could
>>>> cause them.
>>>> We are using special services which connect to Pgpool by using
>>>> JDBC. Can the services be the source of these error messages?
>> 
>>> No idea. However, you can enable "log_connections", which logs who is
>>> connecting to pgpool. This will reveal who the client is when the
>>> errors are raised.
>> Good point. I will try to investigate it and also ask our developers
>> what could do that. From what I know, the services are doing the
>> "SELECT 1" query to determine if the backend is alive. But that is a
>> normal thing. We'll see.
>> 
>>>> 
>>>> Regarding issue number 3: I can see in the log file what happened:
>>>> "postmaster on DB node 0 was shutdown by administrative command".
>>>> How can I force Pgpool not to fail over after the administrative
>>>> command was issued? I would like to apply the same health_check timeouts
>>>> in the situation when the administrative command was issued. This
>>>> is because in our environment one node is represented by several hosts,
>>>> and the PostgreSQL services migrate from one host to another
>>>> automatically. This also happens if the "halt"
>>>> or "reboot" command is issued on that one node. This is not an
>>>> erroneous condition in our environment, because if "halt" is
>>>> issued on one host, the PostgreSQL service will be brought up on another
>>>> node. A virtual IP address is in front of it.
>> 
>>> I understand your situation. What you can do is set:
>>>
>>> backend_flag0 = 'DISALLOW_TO_FAILOVER'
>>>
>>> This will prevent your primary node from failing over on "shutdown by
>>> administrative command". Please note that this will also prevent the
>>> primary from failing over on the health check.
>> This is what I was hoping to avoid, because it will absolutely disallow
>> fail-over from the primary node to the secondary. But thank you anyway.
>> I am now certain that there is no other way and that I should add
>> additional steps to the maintenance manual, which should be taken in the
>> process of restarting, by administrative command, a host where pgpool
>> currently runs.
>> 
>> One more question: do you plan to add a "do not fail over on
>> administrative command" setting in future releases, or is it an
>> absolutely wrong idea?
> 
> IMO, that would be a good idea. I will add it to the TODO list.