[pgpool-general: 6152] Re: What happens when node is assigned as shutdown

Pierre Timmermans ptim007 at yahoo.com
Thu Jun 28 03:45:05 JST 2018

Yes I think a node is detached only when there is a "failover", i.e. when pgpool detects that the node is down either via the health check or the backend error. 
If you had health check, it depends if the interval (health_check_max_retry * health_check_retry_interval) was bigger than the time the network was unavailable: then there would be no failover but otherwise well.
Unfortunately pgpool does not want to implement the feature that would re-attach a node when it comes back again (and it is healthy): in the FAQ section they explain that it would not be difficult but that it is not safe. So if you need this functionality you must implement it yourself,  one solution is to do a cron job that will check if pgpool status down and role standby: if this database is streaming from the master then the job would re-attach it. If it is not streaming than the job could do a pcp_node_recovery
I also believe it is better to use health check instead of fail_over_on_backend_error. In the doc they advise against using both because it does not make sense. With fail_over_on_backend_error, I think the failover does not happen directly, only when a pgpool child process receives a failure signal from postgres (typically if the database was stopped or is crashed), so if no connection is actively used the failover does not happen until a connection is used (to be verified).


    On Wednesday, June 27, 2018, 4:09:55 PM GMT+2, Mariel Cherkassky <mariel.cherkassky at gmail.com> wrote:  
 A node is detached only when failover (in the pgpool level) occurs ? Can you give me an example for a postgresql error that might cause a failover in case the ail_over_on_backend_error is set to on ? Why would someone set then all_over_on_backend_error to on ? Isnt it better to use the health check ?  If I'll set the all_over_on_backend_error to off and use the health check wouldnt it be better ? I mean, in my case I had a network conflict for 1-2 minutes and because of that the slave was failovered and I had to reattach it. If i had health check then after 2 minutes he was available again ?
thanks , Mariel.
2018-06-27 14:33 GMT+03:00 Pierre Timmermans <ptim007 at yahoo.com>:
You have to investigate in the log of pgpool why the node was detached, the log is normally quite verbose. My understanding is that if the node was detached by pgpool it means that a failover has occured. There are 3 cases that trigger a failover
* health_check is one possibility but it is disabled on your system.* Another parameter that can trigger a failover is "fail_over_on_backend_error". If fail_over_on_backend_error is "on" then when a backend postgres connection receives an error it triggers a fail-over.* Finally I believe that when you do a shutdown of the database then a failover is also always triggered, even if you disabled the health_check and the fail_over_on_backend_error.
In all cases, when pgpool detaches a failed standby it will not re-attach it automatically when the database comes back. So one must do like you did: re-attach it manually.
For your last comment, yes I am quite sure that all running queries on the failed node will be lost, the application is responsible for catching the error (connection lost) and to relauch the query. But this is normal, I think Oracle has a kind of functionality that enables (at least in theory) transparent application failure but it is so complex that I believe no one uses that !
As I am learning pgpool myself I am not 100% sure though, Tatsuo or Beng should correct me eventually

    On Wednesday, June 27, 2018, 11:55:33 AM GMT+2, Mariel Cherkassky <mariel.cherkassky at gmail.com> wrote:  
 The problem is that after 2 minutes I connected and attached the node and it worked like magic. I I checked the health_check settings in my pool and I guess that it worked immediatly because the health check parameters are default in my pool :#----------------------------- ------------------------------ -------------------# HEALTH CHECK PER NODE PARAMETERS (OPTIONAL)#----------------------------- ------------------------------ -------------------health_check_period0 = 0health_check_timeout0 = 20health_check_user0 = 'nobody'health_check_password0 = ''health_check_database0 = ''health_check_max_retries0 = 5 health_check_retry_delay0 = 20connect_timeout0 = 10000health_check_max_retries = 0
#----------------------------- ------------------------------ -------------------# HEALTH CHECK GLOBAL PARAMETERS#----------------------------- ------------------------------ -------------------
health_check_period = 0                                   # Health check period                                   # Disabled (0) by defaulthealth_check_timeout = 20                                   # Health check timeout                                   # 0 means no timeouthealth_check_user = 'nobody'                                   # Health check userhealth_check_password = ''                                   # Password for health check userhealth_check_database = ''                                   # Database name for health check. If '', tries 'postgres' frist, then 'template1'
health_check_max_retries = 0                                   # Maximum number of times to retry a failed health check before giving up.health_check_retry_delay = 1                                   # Amount of time to wait (in seconds) between retries.connect_timeout = 10000                                   # Timeout value in milliseconds before giving up to connect to backend.                                                                   # Default is 10000 ms (10 second). Flaky network user may want to increase                                                                   # the value. 0 means no timeout.                                                                   # Note that this value is not only used for health check,                                                                   # but also for ordinary conection to backend.

So, I guess I should change those defaults right ? Even if I dont want the pool to be responsible on the failover of the primary and the replication.
Just to understand your last paragraph, In case my first standby database id down pgpool will destroy all the connections to him(and all the queries that run at that moment will be destroyed) and new sessions will be created to the second node or to the primary. After the failover the pool wont run those queries right ?

2018-06-26 22:20 GMT+03:00 Pierre Timmermans <ptim007 at yahoo.com>:
I understand what you mean, but don't forget that the connections to the postgres database are established by pgpool, so the applications are not connected directly to postgres : the application connects to pgpool and pgpool connects to postgres. Therefore if pgpool cannot see the database anymore, because of a network failure, then all connections are stucked (even if postgres DB is still fine).
So when pgpool cannot see the database anymore (health check), it will try a few time (health_check_max_retries) and then it will do the failover : during the failover all backend connections are destroyed and then re-created (exception is when the standby fails, then backend connections to the primary are not affected, even if load_balance is on, as Tatsuo confirmed in other mail)
So it is not the same situation as in Oracle with the listener, in the case of the oracle listener, the listener process just establish the connection and then it hand it over to the client application and from this point the listener is not needed anymore for this process. The advantage of pgpool is that it can re-establish the connections transparently when a fail-over happens. it is not completely transparent because when the master is is failed-over all connections are first disconnected and then re-connected but it is impossible to avoid that I guess.

    On Tuesday, June 26, 2018, 4:53:05 PM GMT+2, Mariel Cherkassky <mariel.cherkassky at gmail.com> wrote:  
 But what if my standby didnt fail ? I mean, in my case I had a network conflict between the standby and the pool. The standby was up and was working. In that case stil the session will be disconneced ? And why only in case the load_balance_mode is off the seesion will not be disrupted ? Isnt it suppose to be the the opposite ? I have 2 standbys and In case one of my standby`s is down or doesnt work all the current sessions to him will be moved to the second standby.
2018-06-26 17:20 GMT+03:00 Pierre Timmermans <ptim007 at yahoo.com>:
Note that if standby node fails and if load_balance_node is off, then the pgpool connection will not be disrupted by the failover happening on the standby (at least with recent version of pgpool): this is documented in the doc and I can confirm that it is working in my tests.
In all other cases the backend connection will be disconnected and I suppose the client will receive an error and the query result will be lost

    On Tuesday, June 26, 2018, 3:35:41 PM GMT+2, Mariel Cherkassky <mariel.cherkassky at gmail.com> wrote:  
 The moment the pool assign the shutdown status to a specific node all active session to that node are closed immediatly ?
2018-06-26 16:11 GMT+03:00 Tatsuo Ishii <ishii at sraoss.co.jp>:
> Hi,
> I wanted to understand what happens to a session that was opened from the
> pool to   one of the nodes in my cluster (query that is taking x time).
> Suddenly there was a network conflict between the pool node and the
> database node and the pool change the status of the database node to
> shutdown(the node is slave). What happens to the query that runned ? What
> happens to the session that was open ?

The session will be closed. This is pretty much similar when you
directly connect to PostgreSQL.

> Will the output of the query will
> come back to the user ?

Though it really depends on the timing of failure, probably the output
of the query will not come bacl to the user in most cases.

Best regards,
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_ en.php
Japanese:http://www.sraoss.co. jp

______________________________ _________________
pgpool-general mailing list
pgpool-general at pgpool.net
http://www.pgpool.net/mailman/ listinfo/pgpool-general



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20180627/8759ce47/attachment-0001.html>

More information about the pgpool-general mailing list