<html><head></head><body><div style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:16px;"><div><div>I think this is entirely expected behavior: this pgpool instance found that it could not ping the trusted server, so it shut itself down to avoid a split-brain scenario. You should check that the other pgpool took over as cluster leader and that it acquired the VIP.</div><div><br></div><div>So it looks good!</div><div><br></div><div class="ydp676ae631signature">Pierre</div></div>
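<div><br></div>
<div>One way to check which node ended up as leader is to look for the watchdog state transitions in each pgpool log. The following is only a minimal sketch in Python: it assumes the log lines have the same form as the "watchdog node state changed from [MASTER] to [LOST]" line in the log quoted below, and the function names are made up for illustration. It does not check the VIP.</div>
<pre>
```python
import re

# Matches lines such as:
#   LOG:  watchdog node state changed from [MASTER] to [LOST]
# The line format is taken from the log in this thread; other pgpool
# versions may word it differently (assumption).
STATE_CHANGE = re.compile(r"watchdog node state changed from \[(.+?)\] to \[(.+?)\]")


def state_changes(log_text):
    """Return (old_state, new_state) pairs in the order they appear."""
    return STATE_CHANGE.findall(log_text)


def became_master(log_text):
    """True if the last recorded transition in this log ended in MASTER."""
    changes = state_changes(log_text)
    return bool(changes) and changes[-1][1] == "MASTER"
```
</pre>
<div>Run against the log of the surviving node, became_master() should be true if that node took over; against the log quoted below it would be false, since the last transition there ends in LOST.</div>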

            <div><br></div><div><br></div>

            

            <div id="ydp86ef1153yahoo_quoted_0690580538" class="ydp86ef1153yahoo_quoted">

                <div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">

                    

                    <div>

                        On Wednesday, February 28, 2018, 11:01:08 PM GMT+1, Alexander Dorogensky &lt;amazinglifetime@gmail.com&gt; wrote:

                    </div>

                    <div><br></div>

                    <div><br></div>

                    <div><div id="ydp86ef1153yiv7985310519"><div><div dir="ltr"><div class="ydp86ef1153yiv7985310519gmail_default" style="font-family:new, monospace;">It looks like the pgpool child crashes (see below), but I'm not sure.<br clear="none"></div><div class="ydp86ef1153yiv7985310519gmail_default" style="font-family:new, monospace;">So the question remains: is it a bug or expected behavior?<br clear="none"></div><div class="ydp86ef1153yiv7985310519gmail_default" style="font-family:new, monospace;"><br clear="none">DEBUG:&nbsp; watchdog trying to ping host "10.0.0.100"<br clear="none">WARNING:&nbsp; watchdog failed to ping host"10.0.0.100"<br clear="none">DETAIL:&nbsp; ping process exits with code: 2<br clear="none">WARNING:&nbsp; watchdog lifecheck, failed to connect to any trusted servers<br clear="none">LOG:&nbsp; informing the node status change to watchdog<br clear="none">DETAIL:&nbsp; node id :0 status = "NODE DEAD" message:"trusted server is unreachable"<br clear="none">LOG:&nbsp; new IPC connection received<br clear="none">LOCATION:&nbsp; watchdog.c:3319<br clear="none">LOG:&nbsp; received node status change ipc message<br clear="none">DETAIL:&nbsp; trusted server is unreachable<br clear="none">DEBUG:&nbsp; processing node status changed to DEAD event for node ID:0<br clear="none">STATE MACHINE INVOKED WITH EVENT = THIS NODE LOST Current State = MASTER<br clear="none">WARNING:&nbsp; watchdog lifecheck reported, we are disconnected from the network<br clear="none">DETAIL:&nbsp; changing the state to LOST<br clear="none">DEBUG:&nbsp; removing all watchdog nodes from the standby list<br clear="none">DETAIL:&nbsp; standby list contains 1 nodes<br clear="none">LOG:&nbsp; watchdog node state changed from [MASTER] to [LOST]<br clear="none">DEBUG:&nbsp; STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOST<br clear="none">FATAL:&nbsp; system has lost the network<br clear="none">LOG:&nbsp; Watchdog is shutting down<br 
clear="none">DEBUG:&nbsp; sending packet, watchdog node:[<a shape="rect" href="http://10.0.0.2:5432" rel="nofollow" target="_blank">10.0.0.2:5432</a> Linux alex2] command id:[67] type:[INFORM I AM GOING DOWN] state:[LOST]<br clear="none">DEBUG:&nbsp; sending watchdog packet to socket:7, type:[X], command ID:67, data Length:0<br clear="none">DEBUG:&nbsp; sending watchdog packet, command id:[67] type:[INFORM I AM GOING DOWN] state :[LOST]<br clear="none">DEBUG:&nbsp; new cluster command X issued with command id 67<br clear="none">LOG:&nbsp; watchdog: de-escalation started<br clear="none">DEBUG:&nbsp; shmem_exit(-1): 0 callbacks to make<br clear="none">DEBUG:&nbsp; proc_exit(-1): 0 callbacks to make<br clear="none">DEBUG:&nbsp; shmem_exit(3): 0 callbacks to make<br clear="none">DEBUG:&nbsp; proc_exit(3): 1 callbacks to make<br clear="none">DEBUG:&nbsp; exit(3)<br clear="none">DEBUG:&nbsp; shmem_exit(-1): 0 callbacks to make<br clear="none">DEBUG:&nbsp; proc_exit(-1): 0 callbacks to make<br clear="none">DEBUG:&nbsp; reaper handler<br clear="none">DEBUG:&nbsp; watchdog child process with pid: 30288 exit with FATAL ERROR. pgpool-II will be shutdown<br clear="none">LOG:&nbsp; watchdog child process with pid: 30288 exits with status 768<br clear="none">FATAL:&nbsp; watchdog child process exit with fatal error. 
exiting pgpool-II<br clear="none">LOG:&nbsp; setting the local watchdog node name to "<a shape="rect" href="http://10.0.0.1:5432" rel="nofollow" target="_blank">10.0.0.1:5432</a> Linux alex1"<br clear="none">LOG:&nbsp; watchdog cluster is configured with 1 remote nodes<br clear="none">LOG:&nbsp; watchdog remote node:0 on <a shape="rect" href="http://10.0.0.2:9000" rel="nofollow" target="_blank">10.0.0.2:9000</a><br clear="none">LOG:&nbsp; interface monitoring is disabled in watchdog<br clear="none">DEBUG:&nbsp; pool_write: to backend: 0 kind:X<br clear="none">DEBUG:&nbsp; pool_flush_it: flush size: 5<br clear="none">...<br clear="none">DEBUG:&nbsp; shmem_exit(-1): 0 callbacks to make<br clear="none">...<br clear="none">DEBUG:&nbsp; lifecheck child receives shutdown request signal 2, forwarding to all children<br clear="none">DEBUG:&nbsp; lifecheck child receives fast shutdown request<br clear="none">DEBUG:&nbsp; watchdog heartbeat receiver child receives shutdown request signal 2<br clear="none">DEBUG:&nbsp; shmem_exit(-1): 0 callbacks to make<br clear="none">DEBUG:&nbsp; proc_exit(-1): 0 callbacks to make<br clear="none">...<br clear="none"></div></div><div class="ydp86ef1153yiv7985310519yqt0009626052" id="ydp86ef1153yiv7985310519yqt02754"><div class="ydp86ef1153yiv7985310519gmail_extra"><br clear="none"><div class="ydp86ef1153yiv7985310519gmail_quote">On Wed, Feb 28, 2018 at 1:53 PM, Pierre Timmermans <span dir="ltr">&lt;<a shape="rect" href="mailto:ptim007@yahoo.com" rel="nofollow" target="_blank">ptim007@yahoo.com</a>&gt;</span> wrote:<br clear="none"><blockquote class="ydp86ef1153yiv7985310519gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div><div style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:16px;"><div><div><div style="color:rgb(0,0,0);">I am using pgpool inside a docker container so I cannot tell what the service command will say</div><div style="color:rgb(0,0,0);"><br clear="none"></div><div 
style="color:rgb(0,0,0);">I think you should look at the pgpool log file at the moment you unplug the interface: it will probably say that it cannot reach the trusted server and that it will exclude itself from the cluster (I am not sure). You can also start pgpool in debug mode to get extra logging. I think I validated this in the past, but I cannot find the doc anymore.</div><div style="color:rgb(0,0,0);"><br clear="none"></div><div style="color:rgb(0,0,0);">You can also execute the following command:</div><div style="color:rgb(0,0,0);"><br clear="none"></div><div style="color:rgb(0,0,0);">pcp_watchdog_info -h &lt;ip pgpool&gt; -p 9898 -w</div><div style="color:rgb(0,0,0);"><br clear="none"></div><div style="color:rgb(0,0,0);">It will return information about the watchdog, including the cluster quorum.</div><div style="color:rgb(0,0,0);"><br clear="none"></div><div style="color:rgb(0,0,0);">NB: due to a bug in the packaging by postgres, if you installed pgpool from the postgres yum repositories (and not from pgpool) then pcp_watchdog_info will not be in the path (it is in a directory somewhere, I forgot which).</div><span class="ydp86ef1153yiv7985310519HOEnZb"><font color="#888888"></font></span><div><br clear="none"></div><br clear="none"></div><span class="ydp86ef1153yiv7985310519HOEnZb"><font color="#888888"></font></span><div><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydp37821253signature">Pierre</div></div><div><div class="ydp86ef1153yiv7985310519h5">

            <div><br clear="none"></div><div><br clear="none"></div>

            

            <div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yahoo_quoted" id="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yahoo_quoted_0190394803">

                <div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">

                    

                    <div>

                        On Wednesday, February 28, 2018, 5:37:49 PM GMT+1, Alexander Dorogensky &lt;<a shape="rect" href="mailto:amazinglifetime@gmail.com" rel="nofollow" target="_blank">amazinglifetime@gmail.com</a>&gt; wrote:

                    </div>

                    <div><br clear="none"></div>

                    <div><br clear="none"></div>

                    <div><div id="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741"><div><div dir="ltr"><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741gmail_default" style="font-family:new, monospace;">With 'trusted_servers' configured, when I unplug 10.0.0.1 it kills pgpool, i.e. 'service pgpool status' reports 'pgpool dead but subsys locked'.<br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741gmail_default" style="font-family:new, monospace;">Is that how it should be?<br clear="none"><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741gmail_default" style="font-family:new, monospace;">Plug/unplug = ifconfig eth0 up/down</div><br clear="none"><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741gmail_default" style="font-family:new, monospace;"><br clear="none"></div></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741yqt0522836643" id="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741yqt15324"><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741gmail_extra"><br clear="none"><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741gmail_quote">On Tue, Feb 27, 2018 at 1:49 PM, Pierre Timmermans <span dir="ltr">&lt;<a shape="rect" href="mailto:ptim007@yahoo.com" rel="nofollow" target="_blank">ptim007@yahoo.com</a>&gt;</span> wrote:<br clear="none"><blockquote class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div><div style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:16px;"><div><div>To prevent this split brain scenario (caused by a network partition) you can use the configuration trusted_servers. 
This setting is a list of servers that pgpool can use to determine whether a node is suffering a network partition. If a node cannot reach any of the servers in the list, it will assume it is isolated (by a network partition) and will not promote itself to master.</div><div><br clear="none"></div><div>In general, I believe it is not safe to do automatic failover when you have only two nodes, unless you have some kind of fencing mechanism (that is, a way to shut down a failed node and prevent it from coming back after a failure).</div><div><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp2a8d5a3csignature">Pierre</div></div>
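<div><br clear="none"></div>
<div>For illustration only, the relevant part of pgpool.conf could look roughly like the fragment below. The address 10.0.0.100 is the trusted host that appears in the log elsewhere in this thread; the ping_path value is an assumption and depends on where ping lives on your system.</div>
<pre>
```conf
# pgpool.conf, watchdog section (illustrative values, adjust to your setup)
use_watchdog = on
trusted_servers = '10.0.0.100'   # comma-separated hosts that must answer ping
ping_path = '/bin'               # directory containing the ping command (assumed)
```
</pre>
<div>Ideally the trusted servers are hosts that both pgpool nodes can normally reach, such as a gateway, so that a node which cannot ping any of them can conclude that it is the isolated one.</div>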

            <div><br clear="none"></div><div><br clear="none"></div>

            

            <div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yahoo_quoted" id="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yahoo_quoted_0647994684">

                <div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;"><div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741h5">

                    

                    <div>

                        On Tuesday, February 27, 2018, 7:58:55 PM GMT+1, Alexander Dorogensky &lt;<a shape="rect" href="mailto:amazinglifetime@gmail.com" rel="nofollow" target="_blank">amazinglifetime@gmail.com</a>&gt; wrote:

                    </div>

                    <div><br clear="none"></div>

                    <div><br clear="none"></div>

                    </div></div><div><div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741h5"><div id="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748"><div dir="ltr"><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">Hi All,<br clear="none"><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">I have a <a shape="rect" href="http://10.0.0.1/10.0.0.2" rel="nofollow" target="_blank">10.0.0.1/10.0.0.2</a> master/hot standby configuration with streaming replication, where each node runs pgpool with watchdog enabled and postgres.<br clear="none"><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">I shut down the network interface on 10.0.0.1 and wait until 10.0.0.2 triggers failover and promotes itself to master through my failover script.<br clear="none"><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">Now the watchdogs on 10.0.0.1 and 10.0.0.2 are out of sync, have conflicting views on which node has failed and both think they are master.<br clear="none"><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">When I bring back the network interface on 10.0.0.1, 'show pool_nodes' says that 10.0.0.1 is master/up and 10.0.0.2 is standby/down. 
<br clear="none"><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">I want 10.0.0.1 to be standby and 10.0.0.2 to be master. <br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;"><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">I've been playing with the failover script.. e.g.<br clear="none"><br clear="none">if (default network gateway is pingable) {<br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">&nbsp;&nbsp;&nbsp; shut down pgpool and postgres<br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">} else if (this node is standby) {<br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">&nbsp;&nbsp;&nbsp; promote this node to master<br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">&nbsp;&nbsp;&nbsp; create a job that will run every minute and try to recover failed node (base backup) <br clear="none">&nbsp;&nbsp;&nbsp; cancel the job upon successful recovery<br clear="none">} <br clear="none"><br clear="none"></div><div 
class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">Can you please help me with this? Any ideas would be highly appreciated.<br clear="none"><br clear="none"></div><div class="ydp86ef1153yiv7985310519m_-6840305041535699284ydpaec9d671yiv6739008741m_3379743114426128903ydp44fa9843yiv4431463748gmail_default" style="font-family:new, monospace;">Regards, Alex<br clear="none"></div></div></div></div></div>_______________________________________________<br clear="none">pgpool-general mailing list<br clear="none"><a shape="rect" href="mailto:pgpool-general@pgpool.net" rel="nofollow" target="_blank">pgpool-general@pgpool.net</a><br clear="none"><a shape="rect" href="http://www.pgpool.net/mailman/listinfo/pgpool-general" rel="nofollow" target="_blank">http://www.pgpool.net/mailman/listinfo/pgpool-general</a><br clear="none"></div>
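<div><br clear="none"></div>
<div>The decision logic of the failover script described above can be sketched in Python as follows. Note one assumption: as written in the message, the first branch shuts the node down when the default gateway is pingable, and this sketch assumes the intended condition is the opposite (the gateway is not pingable, so the node is isolated). The function name, its inputs and the returned action names are hypothetical; the sketch only picks an action and does not run any pgpool or postgres command.</div>
<pre>
```python
def decide_action(gateway_reachable, node_role):
    """Pick the failover action for this node.

    gateway_reachable: result of pinging the default gateway (hypothetical input).
    node_role: "master" or "standby" as seen by this node (hypothetical input).
    """
    if not gateway_reachable:
        # Isolated node: stop pgpool and postgres so it cannot act as a second master.
        return "shutdown"
    if node_role == "standby":
        # Reachable standby: promote, then schedule recovery of the failed node.
        return "promote"
    # Reachable master: nothing to do.
    return "none"
```
</pre>
<div>For example, decide_action(False, "master") returns "shutdown" for the unplugged 10.0.0.1, while decide_action(True, "standby") returns "promote" for 10.0.0.2.</div>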

                </div>

            </div></div></div></blockquote></div><br clear="none"></div></div></div></div></div>

                </div>

            </div></div></div></div></div></blockquote></div><br clear="none"></div></div></div></div></div>

                </div>

            </div></div></body></html>