[pgpool-general: 2049] Re: Suggestion about two datacenters connected through WAN

Mon Aug 19 14:30:36 JST 2013

Yes, I am using DRBD. But there is a layer - pacemaker, which should handle
DRBD.
It works on what they called "resources". As a resource we understand
service. In my case the services are: VirtualIP address, PostgreSQL,
FileSystem (XFS), DRBD, LVM. Pacemaker controls the order of stopping and
starting services on each node. The order of stopping services in my
environment is like this (each node): Turn off VirtualIP address -> Stop
PostgreSQL -> Dismount Filesystem -> Dismount LVM Virtual Groups
. The DRBD device was in primary role on node 1. As the last thing (after
everyghing is stopped) the DRBD is promoted to primary role on node 2 and
degraded to secondary role on node 1. Then each service in reverse order is
started on the node 2.

This worked until I attached streaming replication slave.

I read, the pgpool can run also as the resource, so it is started as the
last service after PostgreSQL.

We needed some level of the "dumb resistence" which means if somebody
accidentally turn off or reboot one node, the services automatically start
on the node 2. If somebody then reboots node 2 and node 1 is up services are
automatically booted on node 1.

Do you think 4 nodes can be handled by pgpool with some level of that "dumb
resistence"? I saw pgpool has the option to set up weight on nodes. Is it a
good option to set which servers are in the primary technical center?
TC1: node 1, node 2
TC2: node 3, node 4

Streaming replication:
Node 1 (master) -> Node 2, 3, 4 (slaves)

Pgpool:
Node 1: pgpool / postgresql (R/W)
Node 2: pgpool / postgresql (R)
Node 3: pgpool / postgresql (R)
Node 4: pgpool / postgresql (R) 

Best regards,
Michal Mistina
-----Original Message-----
From: Tatsuo Ishii [mailto:ishii at postgresql.org] 
Sent: Monday, August 19, 2013 12:28 AM
To: Mistina Michal
Cc: pgpool-general at pgpool.net
Subject: Re: [pgpool-general: 2029] Suggestion about two datacenters
connected through WAN

I'm not familiar with pacemaker at all, so this is just a guess. Do you sync
PostgreSQL database cluster using DRBD? If so, I think you should not do
that. PostgreSQL modifies the database through the file system mounted and
in the mean time DRBD modifies the file system, which would lead to
corruption.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hi Tatsuo.
> 
>> What kind of problem do yo have with PostgreSQL streaming replication?
> 
> I don't know if this is the right forum, but maybe you can help me. The
issue does not concern directly pgpool, but I'd like to use pgpool if I
solve this one. I already wrote to pgsql-general. Nobody answered yet.
> 
> In summary, the main issue rises after I set up streaming replication - I
am unable to stop postgresql service correctly on master. After issuing
/etc/init.d/postgresql-9.2 stop the postmaster.pid remains on the filesystem
and moreover it is corrupted. I am unable to delete it with rm command.
> 
> It looks like this:
> [root at tstcaps01 ~]# ll /var/lib/pgsql/9.2/data/
> ls: cannot access /var/lib/pgsql/9.2/data/postmaster.pid: No such file 
> or directory total 56
> drwx------ 7 postgres postgres    62 Jun 26 17:13 base
> drwx------ 2 postgres postgres  4096 Aug 18 00:25 global
> drwx------ 2 postgres postgres    17 Jun 26 09:54 pg_clog
> -rw------- 1 postgres postgres  5127 Aug 17 16:24 pg_hba.conf
> -rw------- 1 postgres postgres  1636 Jun 26 09:54 pg_ident.conf
> drwx------ 2 postgres postgres  4096 Jul  2 00:00 pg_log
> drwx------ 4 postgres postgres    34 Jun 26 09:53 pg_multixact
> drwx------ 2 postgres postgres    17 Aug 18 00:23 pg_notify
> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_serial
> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_snapshots
> drwx------ 2 postgres postgres     6 Aug 18 00:25 pg_stat_tmp
> drwx------ 2 postgres postgres    17 Jun 26 09:54 pg_subtrans
> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_tblspc
> drwx------ 2 postgres postgres     6 Jun 26 09:53 pg_twophase
> -rw------- 1 postgres postgres     4 Jun 26 09:53 PG_VERSION
> drwx------ 3 postgres postgres  4096 Aug 18 00:25 pg_xlog
> -rw------- 1 postgres postgres 19884 Aug 17 22:54 postgresql.conf
> -rw------- 1 postgres postgres    71 Aug 18 00:23 postmaster.opts
> ?????????? ? ?        ?            ?            ? postmaster.pid
> -rw-r--r-- 1 postgres postgres   491 Aug 17 16:33 recovery.done
> 
> Have you been in this kind of curious situation before? Did you solve it
somehow?
> 
> I will try to explain whole situation and how I got into it.
> 
> The scenario of redundant environment is in the "graphic" 
> representation... (http://www.asciiflow.com/#4899844131549967831)
> 
>            +------------------------------------+
>            |                          WAN                        |
> +-----+-----+------------+                +-----v------+------------+
> |pgpool      |                    |                |pgpool       |
|
> +------------+------------+                +------------+------------+
> |pgsql         |pgsql          |                |pgsql          |pgsql
|
> +------------+------------+                +------------+------------+
> |drbd-pri   |drbd-sec   |                |drbd-pri    |drbd-sec  |
> +------------+------------+                +------------+------------+
> |           pacemaker         |                |           pacemaker
|
> +-------------------------+                
> +-------------------------+ +--------------------------+
> |            corosync             |                |            corosync
|
> +------------+------------+                +------------+------------+
> |node1       |node2        |                |node1       |node2       |
> +------------+------------+                +------------+------------+
>                    TC1
TC2
> 
> In one moment there is only one postgresql active in each technical
center. Pgpool is currently not managed by pacemaker, because I did want to
test it. After it works I will make it managed by pacemaker using pgpool-ha
resource agent.
> 
> Before streaming replication was established from TC1 to TC2, the
migration of resources managed by pacemaker from node1 to node2 within TC1
has been successful.
> After I established streaming replication and tried to move resources
(including pgsql) from node1 to node2, migration of postgres resource
failed. And I ended up with aforementioned corrupted postmaster.pid file on
the filesystem of node1. Pacemaker did actually kill postgres process but I
think it somehow checks if the postmaster.pid still exists or not. If the
pacemaker find postmaster.pid is still there it ends up with FAILED status.
> Now I am stucked with this postmaster.pid file and cannot continue further
with debugging. I cannot start postgres server because even if I start it
there are two identical postmaster.pid files. These are not clean conditions
for testing and investigating.
> 
> I would be grateful if I can get behind this issue. The day would be 
> nicer then :-)
> 
> Best regards,
> Michal Mistina
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3076 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130819/feceb99f/attachment.p7s>