[pgpool-general: 5858] Re: What about load balancing?

Tatsuo Ishii ishii at sraoss.co.jp
Tue Jan 16 15:07:53 JST 2018


> Hello!
> 
> We are a 100% Microsoft-product company, just starting to build a
> PostgreSQL HA cluster based on:
> Ubuntu 16.04.3 (in docker) + PostgreSQL 10 + repmgr (for stream
> replication) + pgpool2 (version 3.7.1 - as load balance for one
> primary and two standby nodes) + pgbouncer (for connection pool).
> 
> Now we have started load testing our solution (without pgbouncer) on
> an i7-3770, 32GB RAM, Kingston KC300 240GB SSD.
> 
> Our changes in default "postgresql.conf":
>> listen_addresses = '*'
>> port = 5433
>> max_connections = 4096
>> shared_buffers = 4096MB
>> effective_cache_size = 16GB
>> checkpoint_timeout = 30min
>> work_mem = 1MB
>> maintenance_work_mem = 128MB
>> synchronous_commit = off
>> wal_level = hot_standby
>> wal_log_hints = on
>> wal_keep_segments = 0
>> max_wal_senders = 10
>> max_replication_slots = 5
>> hot_standby = on
>> archive_mode = on
>> archive_command = '/bin/true'
>> shared_preload_libraries = 'repmgr_funcs, pg_stat_statements'
> 
> Our changes in default "pgpool.conf":
>> listen_addresses = '*'
>> pid_file_name = '/var/run/pgpool.pid'
>> logdir = '/var/log'
>>
>> backend_hostname0 = 'node1'
>> backend_port0 = 5433
>> backend_data_directory0 = '/var/lib/postgresql/10/main/base'
>> backend_flag0 = 'ALLOW_TO_FAILOVER'
>> backend_hostname1 = 'node2'
>> backend_port1 = 5433
>> backend_weight1 = 1
>> backend_data_directory1 = '/var/lib/postgresql/10/main/base'
>> backend_flag1 = 'ALLOW_TO_FAILOVER'
>> backend_hostname2 = 'node3'
>> backend_port2 = 5433
>> backend_weight2 = 1
>> backend_data_directory2 = '/var/lib/postgresql/10/main/base'
>> backend_flag2 = 'ALLOW_TO_FAILOVER'
>>
>> enable_pool_hba = on
>> authentication_timeout = 30
>> num_init_children = 64 (set "net.core.somaxconn = 65535" in
>> /etc/sysctl.conf)
>> max_pool = 32
>> log_destination = 'stderr,syslog'
>> insert_lock = off
>> load_balance_mode = on
>> black_function_list = 'nextval,setval'
>> master_slave_mode = on
>> master_slave_sub_mode = 'stream'
>> sr_check_period = 0
>> sr_check_user = 'repmgr'
>> health_check_period = 10
>> health_check_user = 'repmgr'
>> health_check_max_retries = 5
>> health_check_retry_delay = 5
>> failover_command = 'echo \">>> Failover - that will initialize new
>> primary node search\"'
>> fail_over_on_backend_error = off
>> search_primary_node_timeout = 0
> 
> With pgbench we emulate SELECT workload on nodes (no pgpool):
>> ## pgbench -h nodeX -p 5433 -c 64 -j 8 -T 60 -S
>> ## (standby -> master)
>> # number of transactions actually processed: 2733742
>> # latency average = 1.405 ms
>> # tps = 45551.063683 (including connections establishing)
>> # tps = 45559.700752 (excluding connections establishing)
>> ## (standby -> standby)
>> # number of transactions actually processed: 2712321
>> # latency average = 1.416 ms
>> # tps = 45199.740907 (including connections establishing)
>> # tps = 45212.390333 (excluding connections establishing)
>> ## (standby -> local)
>> # number of transactions actually processed: 3161213
>> # latency average = 1.215 ms
>> # tps = 52675.138511 (including connections establishing)
>> # tps = 52686.008651 (excluding connections establishing)
> Then, when we direct the workload at the pgpool node, we get about half of a single node's result:
>> pgbench -h pool -p 9999 -U postgres -c 64 -j 8 -T 60 -S
>> starting vacuum...end.
>> transaction type: <builtin: select only>
>> scaling factor: 1
>> query mode: simple
>> number of clients: 64
>> number of threads: 8
>> duration: 60 s
>> number of transactions actually processed: 1413496
>> latency average = 2.719 ms
>> tps = 23535.490222 (including connections establishing)
>> tps = 23543.872064 (excluding connections establishing)
> With command
>>  psql -h pool -p 9999 -c "show pool_nodes"
> and resource monitoring, we see that all nodes take part in the workload.
> 
> We understand the hardware limitations, so we also tested with a CPU
> limit of 2 cores per node (no limit on pgpool).
> Results:
>> ### node3 => localhost
>> transaction type: <builtin: select only>
>> scaling factor: 1
>> query mode: simple
>> number of clients: 64
>> number of threads: 8
>> duration: 60 s
>> number of transactions actually processed: 1116951
>> latency average = 3.439 ms
>> tps = 18611.745016 (including connections establishing)
>> tps = 18616.499012 (excluding connections establishing)
>> ### node3 => node2
>> transaction type: <builtin: select only>
>> scaling factor: 1
>> query mode: simple
>> number of clients: 64
>> number of threads: 8
>> duration: 60 s
>> number of transactions actually processed: 838843
>> latency average = 4.579 ms
>> tps = 13977.347353 (including connections establishing)
>> tps = 13985.630946 (excluding connections establishing)
> The best result for pgpool is about tps = 23,000.

> What are we doing wrong?

Pgpool-II needs to read packets from a socket and then write them to
another socket, since it works as a proxy. Because pgbench -S produces
a very light load (it just selects one row using a b-tree index), the
test pretty much measures the proxy overhead of Pgpool-II. That's why
you see those numbers. If you benchmark with a more realistic
workload, you may see different results.
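This proxy overhead can be illustrated with a toy sketch (plain Python
sockets; the helper names and the echo "backend" are invented for
illustration, and Pgpool-II itself is of course written in C and does
far more than this):

```python
import socket
import threading

def start_echo_backend():
    """Stand-in for a PostgreSQL backend: echoes one request back."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))      # OS-assigned port
    srv.listen(1)

    def serve():
        conn, _ = srv.accept()
        conn.sendall(conn.recv(4096))
        conn.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]

def start_proxy(backend_port):
    """Minimal proxy: every message is read from one socket and
    written to another -- the per-query overhead described above."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def serve():
        client, _ = srv.accept()
        backend = socket.create_connection(("127.0.0.1", backend_port))
        backend.sendall(client.recv(4096))   # client -> proxy -> backend
        client.sendall(backend.recv(4096))   # backend -> proxy -> client
        backend.close()
        client.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]

# One round trip through the proxy: two extra socket reads and writes
# compared with talking to the backend directly.
proxy_port = start_proxy(start_echo_backend())
conn = socket.create_connection(("127.0.0.1", proxy_port))
conn.sendall(b"SELECT 1;")
reply = conn.recv(4096)
conn.close()
print(reply)    # the echoed query, after four socket hops instead of two
```

With a trivial per-query cost on the backend (as with pgbench -S), the
two extra hops dominate; with heavier queries they become a small
fraction of the total time.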


> Maybe pgpool has a good configuration for working only
> in load balancing mode?

Yes. I guess you are using raw mode (master_slave_mode = off and
replication_mode = off); in that case Pgpool-II actually uses only one
PostgreSQL server.
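For reference, streaming-replication mode with load balancing is
selected with settings like these in pgpool.conf (the configuration
quoted above already appears to contain them):

```
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
```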

Also, if you run Pgpool-II and all the PostgreSQL servers on the same
physical server using Docker, even load balancing will not help,
because you do not physically distribute the load across different
servers.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

