[pgpool-general: 3242] Re: pgpool with Slony replication delay

Tue Oct 21 08:45:39 JST 2014

> Hi Tatsuo,
> 
> We have two independent replication clusters.  One system has a single
> master/slave setup in our primary data center.  The other has a master with
> two slaves.  Both systems replicate to a colocation data center, with one
> big server acting as the slave for both masters.  (The connection to the
> co-lo is over gig fiber, so our latency remains low)
> 
> So, that's partially why Slony comes into play.  We need that fine grained
> control over replication.  The second reason, is that our write traffic is
> 47% of our total db traffic.  (And that's a separate, long, shameful
> story)  But the fact is, pg's streaming replication had too much overhead
> from copying the WAL files around, and choked up our servers during peak
> write traffic.  We also had Enterprise DB do a consulting job for us, and
> they came to the conclusion that the way we're doing it is the only
> reliable way.

Interesting. Our testing (and other companies testing conducted by
Hitachi corp.) showed that streaming replication is pretty efficient
because it sends WAL records rather than WAL files unless streaming
slave cannot catch up and sending WAL files is the only way to catch
up. Maybe in your case the write traffic is so heavy and the slave
cannot catch up.

> So I think that circles around back to square one, of being able to direct
> specific select queries to the master under certain conditions.  I really
> like your new feature that adds that ability, but would it be possible to
> trigger that condition with a client SET variable in the same client
> connection?  Something like:
> 
>   SET force_master = on;
>   SELECT * from hot_data;
>   SET force_master = off;
>   SELECT * from static_data;
> 
> We have a mix of clients with various abilities, but creating a separate
> database connection for the master node, and a separate connection for the
> slave nodes is going to be problematic.  The main feature I like about
> pgpool is having one single client connection work on the replicated
> cluster.  A single SET command on that connection to control the master
> logic is feasible to implement in our system.  We can identify the hot
> queries and add it in manually.  But two separate connections requires a
> lot more work.  Is something like that possible to do in pgpool?

In the first place pgpool-II does not have SET commands at all and we
need to implement SET mechanism in the first place. However I think
your idea is not terribly hard and it will be doable. Will be added to
the TODO list.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Thank you for any advice you have!
> 
> --Cal
> 
> 
> On Fri, Oct 17, 2014 at 6:53 PM, Tatsuo Ishii <ishii at postgresql.org> wrote:
> 
>> Using pgpool-II's own replication mode is an idea. However it has it's
>> own downside: write performance could drop up to 50% of single
>> PostgreSQL, you may have difficulties with certain queries (see manual
>> for more details). I recommend you to test before switching to the
>> mode.
>>
>> BTW why do you use slony? From my experience PostgreSQL's streaming
>> replication shows better performance than slony.
>>
>> If you are going to use streaming replication, you may be interested
>> in upcoming pgpool-II 3.4's new feature: more control on load
>> balancing. From 3.4 doc:
>>
>> database_redirect_preference_list
>>     you can set "database name:node id" pair to specify the node id
>>     when connecting to the database. For example, by specifying
>>     "test:1", pgpool-II always redirects SELECT to node 1 in case of
>>     connecting to database "test". You can specify multiple "database
>>     name:node id" pair by separating them using comma (,). Regular
>>     expressions are accepted for database name. Special keyword
>>     "primary" indicates the primary node and "standby" indicates one
>>     of standby nodes.
>>
>>     Here is an example.
>>
>>     database_redirect_preference_list =
>> 'postgres:primary,mydb[01]:1,mydb2:standby'
>>
>>
>>     SELECTs will be redirected to primary if you connect to postgres
>>     database. Connecting to mydb0 or mydb1 will redirect SELECTs to
>>     node 1. Connecting to mydb2 will redirect SELECTs to one of
>>     standby nodes.
>>
>>     You need to reload pgpool.conf if you change this directive.
>>
>> app_name_redirect_preference_list
>>
>>     you can set "application name:node id" pair to specify the node id
>>     when the application is used. "Application name" is a name
>>     specified by a client when it connects to database. You can use it
>>     in PostgreSQL 9.0 or later. For example, application of psql
>>     command is "psql". pgpool-II recognize application names only when
>>     clients sends a start up packet. Clients can send application
>>     names later on but pgpool-II will not recognize them.
>>
>>     The notion of app_name_redirect_preference_list is same as
>>     database_redirect_preference_list. Thus you can use regular
>>     expressions for application name.
>>
>>     Here is an example.
>>
>>     app_name_redirect_preference_list =
>> 'psql:primary,myapp1:1,myapp2:standby'
>>
>>
>>     In this example, psql sends SELECTs to primary node, myapp1 sends
>>     to node 1, and myapp2 sends to one of standby nodes.
>>
>>     app_name_redirect_preference_list takes precedence over
>>     database_redirect_preference_list. See the next example.
>>
>>                 database_redirect_preference_list = 'bigdb:primary'
>>                 app_name_redirect_preference_list = 'myapp:2'
>>
>>
>>     Applications connecting to bigdb database send SELECTs to primary
>>     node. However myapp sends SELECTs to node 2 even if it connects to
>>     bigdb. This is useful in a scenario: myapp2 sends very heavy
>>     SELECTs to execute analysis jobs. You want to use node 2 solely
>>     for analysis purpose.
>>
>>     You need to reload pgpool.conf if you change this directive.
>>
>>
>> > Thank you Tatsuo,
>> >
>> > What solution is there for implementing HA in such a scenario?  Threaded
>> > requests, to threaded client connections, to a master/slave replication
>> > system.  Would I have to use pgpool's Replication Mode and ditch the
>> Slony
>> > approach?
>> >
>> > --Cal
>> >
>> > <cal at fbsdata.com>
>> >
>> > On Thu, Oct 16, 2014 at 6:06 PM, Tatsuo Ishii <ishii at postgresql.org>
>> wrote:
>> >
>> >> > Hi everyone,
>> >> >
>> >> > I have encountered a problem with pgpool using Slony for replication
>> to
>> >> > slaves.  The client to pgpool in my case is an API, so requests come
>> in
>> >> > concurrently on independent connections.  If we have an insert
>> directly
>> >> > followed by a select for that record, the select is load balanced
>> between
>> >> > the master and slave, resulting in a 50% chance of a failure.  Since
>> the
>> >> > select ends up coming from a separate httpd child process, there is no
>> >> way
>> >> > to keep the insert and select in the same transaction.
>> >> >
>> >> > Am I missing something in the documentation and configs?  Can pgpool
>> >> > intelligently recognize specific records that have been inserted to a
>> >> > table, then direct all selects for that record to the master until
>> Slony
>> >> > has replicated it to the slaves?
>> >>
>> >> No. That's the limitaion of Slony (more presicely, asynchronous
>> >> replication systems include streaming replication).
>> >>
>> >> Best regards,
>> >> --
>> >> Tatsuo Ishii
>> >> SRA OSS, Inc. Japan
>> >> English: http://www.sraoss.co.jp/index_en.php
>> >> Japanese:http://www.sraoss.co.jp
>> >>
>>