[pgpool-general: 2050] Re: Suggestion about two datacenters connected through WAN

Tatsuo Ishii ishii at postgresql.org
Mon Aug 19 15:48:00 JST 2013


> Yes, I am using DRBD. But there is a layer - Pacemaker - which handles
> DRBD. It works on what are called "resources". By a resource we mean a
> service. In my case the services are: virtual IP address, PostgreSQL,
> file system (XFS), DRBD, LVM. Pacemaker controls the order in which
> services are stopped and started on each node. The stop order in my
> environment is (on each node): turn off the virtual IP address -> stop
> PostgreSQL -> unmount the file system -> deactivate the LVM volume
> groups. The DRBD device was in the primary role on node 1. As the last
> step (after everything is stopped), DRBD is promoted to the primary
> role on node 2 and demoted to the secondary role on node 1. Then each
> service is started on node 2 in reverse order.
> 
> This worked until I attached streaming replication slave.
> 
> I read that pgpool can also run as a resource, so it is started as the
> last service, after PostgreSQL.

So the PostgreSQL database cluster directory is on the file system
managed by Pacemaker? That means the file system is initially not
mounted on the secondary node, so PostgreSQL cannot start on the
secondary node because there is no file system. Am I missing
something?
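The stop/start ordering described above could be expressed as Pacemaker constraints, for example with pcs. This is only a sketch; the resource names (drbd-clone, lvm, fs, postgresql, vip) are hypothetical and must match the resources actually defined in the cluster:

```shell
# Hypothetical Pacemaker ordering for the stack described above
# (resource names are illustrative). Start order:
# DRBD (primary) -> LVM -> file system -> PostgreSQL -> virtual IP;
# Pacemaker stops them in the reverse order automatically.
pcs constraint order promote drbd-clone then start lvm
pcs constraint order start lvm then start fs
pcs constraint order start fs then start postgresql
pcs constraint order start postgresql then start vip

# Keep the whole stack on the node where DRBD is in the primary role.
pcs constraint colocation add fs with master drbd-clone INFINITY
```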

> We needed some level of "dumb resistance", meaning that if somebody
> accidentally turns off or reboots one node, the services automatically
> start on node 2. If somebody then reboots node 2 while node 1 is up,
> the services are automatically started on node 1.
> 
> Do you think 4 nodes can be handled by pgpool with some level of that
> "dumb resistance"? I saw pgpool has the option to set a weight on each
> node. Is that a good option for designating which servers are in the
> primary technical center?
> TC1: node 1, node 2
> TC2: node 3, node 4

The "weight" parameter has nothing to do with designating a particular
server as "the primary technical center". Do you have a well-defined
policy for which node to promote when the primary node goes down? I
mean, you have 3 candidates (node 2, 3, 4) to be promoted, and you need
to decide which one should be promoted in that case. If you have a
well-defined policy that a computer program can implement, you could
set up a "dumb resistance" system.
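Such a promotion policy could be sketched as a small program. The example below is purely illustrative and assumes each standby reports its technical center and replication lag; the node names and fields are not part of pgpool:

```python
# Hypothetical failover-policy sketch: prefer standbys in the primary
# technical center (TC1), and among those pick the one with the least
# replication lag. Node names and attributes are illustrative only.

def choose_promotion_target(standbys):
    """standbys: list of dicts with 'name', 'tc' (datacenter), 'lag_bytes'."""
    # Prefer candidates in TC1; fall back to TC2 only if TC1 has none.
    tc1 = [s for s in standbys if s["tc"] == "TC1"]
    candidates = tc1 if tc1 else standbys
    # Among the candidates, promote the most up-to-date standby.
    return min(candidates, key=lambda s: s["lag_bytes"])["name"]

standbys = [
    {"name": "node2", "tc": "TC1", "lag_bytes": 1024},
    {"name": "node3", "tc": "TC2", "lag_bytes": 0},
    {"name": "node4", "tc": "TC2", "lag_bytes": 512},
]
print(choose_promotion_target(standbys))  # node2: TC1 wins over lower lag in TC2
```

A script like this could be invoked from pgpool's failover_command hook; the point is only that the policy must be explicit enough for a program to decide.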
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Streaming replication:
> Node 1 (master) -> Node 2, 3, 4 (slaves)
> 
> Pgpool:
> Node 1: pgpool / postgresql (R/W)
> Node 2: pgpool / postgresql (R)
> Node 3: pgpool / postgresql (R)
> Node 4: pgpool / postgresql (R) 
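For reference, a topology like the one above is declared to pgpool through its numbered backend parameters. The values below are a sketch (host names and paths are illustrative); note that backend_weight only distributes read queries, it does not set failover priority:

```
# pgpool.conf sketch -- illustrative values, not a working configuration
backend_hostname0 = 'node1'
backend_port0 = 5432
backend_weight0 = 1          # weight only affects read load balancing
backend_data_directory0 = '/var/lib/pgsql/9.2/data'

backend_hostname1 = 'node2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/9.2/data'
```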
> 
> Best regards,
> Michal Mistina
> -----Original Message-----
> From: Tatsuo Ishii [mailto:ishii at postgresql.org] 
> Sent: Monday, August 19, 2013 12:28 AM
> To: Mistina Michal
> Cc: pgpool-general at pgpool.net
> Subject: Re: [pgpool-general: 2029] Suggestion about two datacenters
> connected through WAN
> 
> I'm not familiar with Pacemaker at all, so this is just a guess. Do you
> sync the PostgreSQL database cluster using DRBD? If so, I think you
> should not do that. PostgreSQL modifies the database through the
> mounted file system while DRBD modifies the file system underneath,
> which would lead to corruption.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
> 
>> Hi Tatsuo.
>> 
>>> What kind of problem do you have with PostgreSQL streaming replication?
>> 
>> I don't know if this is the right forum, but maybe you can help me.
>> The issue does not directly concern pgpool, but I'd like to use
>> pgpool once I solve it. I already wrote to pgsql-general; nobody has
>> answered yet.
>> 
>> In summary, the main issue arises after I set up streaming
>> replication - I am unable to stop the postgresql service correctly on
>> the master. After issuing /etc/init.d/postgresql-9.2 stop, the
>> postmaster.pid file remains on the file system and, moreover, is
>> corrupted. I am unable to delete it even with rm.
>> 
>> It looks like this:
>> [root at tstcaps01 ~]# ll /var/lib/pgsql/9.2/data/
>> ls: cannot access /var/lib/pgsql/9.2/data/postmaster.pid: No such file or directory
>> total 56
>> drwx------ 7 postgres postgres    62 Jun 26 17:13 base
>> drwx------ 2 postgres postgres  4096 Aug 18 00:25 global
>> drwx------ 2 postgres postgres    17 Jun 26 09:54 pg_clog
>> -rw------- 1 postgres postgres  5127 Aug 17 16:24 pg_hba.conf
>> -rw------- 1 postgres postgres  1636 Jun 26 09:54 pg_ident.conf
>> drwx------ 2 postgres postgres  4096 Jul  2 00:00 pg_log
>> drwx------ 4 postgres postgres    34 Jun 26 09:53 pg_multixact
>> drwx------ 2 postgres postgres    17 Aug 18 00:23 pg_notify
>> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_serial
>> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_snapshots
>> drwx------ 2 postgres postgres     6 Aug 18 00:25 pg_stat_tmp
>> drwx------ 2 postgres postgres    17 Jun 26 09:54 pg_subtrans
>> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_tblspc
>> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_twophase
>> -rw------- 1 postgres postgres     4 Jun 26 09:53 PG_VERSION
>> drwx------ 3 postgres postgres  4096 Aug 18 00:25 pg_xlog
>> -rw------- 1 postgres postgres 19884 Aug 17 22:54 postgresql.conf
>> -rw------- 1 postgres postgres    71 Aug 18 00:23 postmaster.opts
>> ?????????? ? ?        ?            ?            ? postmaster.pid
>> -rw-r--r-- 1 postgres postgres   491 Aug 17 16:33 recovery.done
>> 
>> Have you been in this kind of curious situation before? Did you solve it
> somehow?
>> 
>> I will try to explain the whole situation and how I got into it.
>> 
>> The scenario of the redundant environment in "graphic"
>> representation... (http://www.asciiflow.com/#4899844131549967831)
>> 
>>              +-------------------------------+
>>              |              WAN              |
>> +------------+------------+     +------------v------------+
>> |pgpool                   |     |pgpool                   |
>> +------------+------------+     +------------+------------+
>> |pgsql       |pgsql       |     |pgsql       |pgsql       |
>> +------------+------------+     +------------+------------+
>> |drbd-pri    |drbd-sec    |     |drbd-pri    |drbd-sec    |
>> +------------+------------+     +------------+------------+
>> |        pacemaker        |     |        pacemaker        |
>> +-------------------------+     +-------------------------+
>> |        corosync         |     |        corosync         |
>> +------------+------------+     +------------+------------+
>> |node1       |node2       |     |node1       |node2       |
>> +------------+------------+     +------------+------------+
>>             TC1                             TC2
>> 
>> At any given moment there is only one active postgresql instance in
>> each technical center. Pgpool is currently not managed by Pacemaker,
>> because I wanted to test it first. Once it works, I will make it
>> managed by Pacemaker using the pgpool-ha resource agent.
>> 
>> Before streaming replication was established from TC1 to TC2,
>> migration of the resources managed by Pacemaker from node 1 to node 2
>> within TC1 was successful.
>> After I established streaming replication and tried to move resources
>> (including pgsql) from node 1 to node 2, migration of the postgres
>> resource failed, and I ended up with the aforementioned corrupted
>> postmaster.pid file on the file system of node 1. Pacemaker did
>> actually kill the postgres process, but I think it somehow checks
>> whether postmaster.pid still exists; if Pacemaker finds that
>> postmaster.pid is still there, it ends up with FAILED status.
>> Now I am stuck with this postmaster.pid file and cannot continue
>> debugging. I cannot start the postgres server, because even if I
>> started it, there would be two identical postmaster.pid files. These
>> are not clean conditions for testing and investigating.
>> 
>> I would be grateful if I could get past this issue. The day would be
>> nicer then :-)
>> 
>> Best regards,
>> Michal Mistina
>> 
>> 


More information about the pgpool-general mailing list