[pgpool-general: 3241] Re: pgpool with Slony replication delay

Tue Oct 21 00:14:44 JST 2014

Hi Tatsuo,

We have two independent replication clusters.  One system has a single
master/slave setup in our primary data center.  The other has a master with
two slaves.  Both systems replicate to a colocation data center, with one
big server acting as the slave for both masters.  (The connection to the
co-lo is over gig fiber, so our latency remains low)

So, that's partially why Slony comes into play.  We need that fine grained
control over replication.  The second reason, is that our write traffic is
47% of our total db traffic.  (And that's a separate, long, shameful
story)  But the fact is, pg's streaming replication had too much overhead
from copying the WAL files around, and choked up our servers during peak
write traffic.  We also had Enterprise DB do a consulting job for us, and
they came to the conclusion that the way we're doing it is the only
reliable way.

So I think that circles around back to square one, of being able to direct
specific select queries to the master under certain conditions.  I really
like your new feature that adds that ability, but would it be possible to
trigger that condition with a client SET variable in the same client
connection?  Something like:

  SET force_master = on;
  SELECT * from hot_data;
  SET force_master = off;
  SELECT * from static_data;

We have a mix of clients with various abilities, but creating a separate
database connection for the master node, and a separate connection for the
slave nodes is going to be problematic.  The main feature I like about
pgpool is having one single client connection work on the replicated
cluster.  A single SET command on that connection to control the master
logic is feasible to implement in our system.  We can identify the hot
queries and add it in manually.  But two separate connections requires a
lot more work.  Is something like that possible to do in pgpool?

Thank you for any advice you have!

--Cal

On Fri, Oct 17, 2014 at 6:53 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:

> Using pgpool-II's own replication mode is an idea. However it has it's
> own downside: write performance could drop up to 50% of single
> PostgreSQL, you may have difficulties with certain queries (see manual
> for more details). I recommend you to test before switching to the
> mode.
>
> BTW why do you use slony? From my experience PostgreSQL's streaming
> replication shows better performance than slony.
>
> If you are going to use streaming replication, you may be interested
> in upcoming pgpool-II 3.4's new feature: more control on load
> balancing. From 3.4 doc:
>
> database_redirect_preference_list
>     you can set "database name:node id" pair to specify the node id
>     when connecting to the database. For example, by specifying
>     "test:1", pgpool-II always redirects SELECT to node 1 in case of
>     connecting to database "test". You can specify multiple "database
>     name:node id" pair by separating them using comma (,). Regular
>     expressions are accepted for database name. Special keyword
>     "primary" indicates the primary node and "standby" indicates one
>     of standby nodes.
>
>     Here is an example.
>
>     database_redirect_preference_list =
> 'postgres:primary,mydb[01]:1,mydb2:standby'
>
>
>     SELECTs will be redirected to primary if you connect to postgres
>     database. Connecting to mydb0 or mydb1 will redirect SELECTs to
>     node 1. Connecting to mydb2 will redirect SELECTs to one of
>     standby nodes.
>
>     You need to reload pgpool.conf if you change this directive.
>
> app_name_redirect_preference_list
>
>     you can set "application name:node id" pair to specify the node id
>     when the application is used. "Application name" is a name
>     specified by a client when it connects to database. You can use it
>     in PostgreSQL 9.0 or later. For example, application of psql
>     command is "psql". pgpool-II recognize application names only when
>     clients sends a start up packet. Clients can send application
>     names later on but pgpool-II will not recognize them.
>
>     The notion of app_name_redirect_preference_list is same as
>     database_redirect_preference_list. Thus you can use regular
>     expressions for application name.
>
>     Here is an example.
>
>     app_name_redirect_preference_list =
> 'psql:primary,myapp1:1,myapp2:standby'
>
>
>     In this example, psql sends SELECTs to primary node, myapp1 sends
>     to node 1, and myapp2 sends to one of standby nodes.
>
>     app_name_redirect_preference_list takes precedence over
>     database_redirect_preference_list. See the next example.
>
>                 database_redirect_preference_list = 'bigdb:primary'
>                 app_name_redirect_preference_list = 'myapp:2'
>
>
>     Applications connecting to bigdb database send SELECTs to primary
>     node. However myapp sends SELECTs to node 2 even if it connects to
>     bigdb. This is useful in a scenario: myapp2 sends very heavy
>     SELECTs to execute analysis jobs. You want to use node 2 solely
>     for analysis purpose.
>
>     You need to reload pgpool.conf if you change this directive.
>
>
> > Thank you Tatsuo,
> >
> > What solution is there for implementing HA in such a scenario?  Threaded
> > requests, to threaded client connections, to a master/slave replication
> > system.  Would I have to use pgpool's Replication Mode and ditch the
> Slony
> > approach?
> >
> > --Cal
> >
> > <cal at fbsdata.com>
> >
> > On Thu, Oct 16, 2014 at 6:06 PM, Tatsuo Ishii <ishii at postgresql.org>
> wrote:
> >
> >> > Hi everyone,
> >> >
> >> > I have encountered a problem with pgpool using Slony for replication
> to
> >> > slaves.  The client to pgpool in my case is an API, so requests come
> in
> >> > concurrently on independent connections.  If we have an insert
> directly
> >> > followed by a select for that record, the select is load balanced
> between
> >> > the master and slave, resulting in a 50% chance of a failure.  Since
> the
> >> > select ends up coming from a separate httpd child process, there is no
> >> way
> >> > to keep the insert and select in the same transaction.
> >> >
> >> > Am I missing something in the documentation and configs?  Can pgpool
> >> > intelligently recognize specific records that have been inserted to a
> >> > table, then direct all selects for that record to the master until
> Slony
> >> > has replicated it to the slaves?
> >>
> >> No. That's the limitaion of Slony (more presicely, asynchronous
> >> replication systems include streaming replication).
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20141020/6b1290e9/attachment.html>