[pgpool-general: 1805] Re: insert_lock hangs

Tatsuo Ishii ishii at postgresql.org
Thu May 30 14:40:19 JST 2013


Thanks for the report. I found a nasty bug with pgpool-II. Please try
attached patches.

When insert lock is enabled and pgpool_catalog.insert_lock exists,
pgpool-II looks for the row which matches the oid of the target
table. If non, pgpool-II will insert the row to obtain a row lock.
The bug was in the process looking for the row. If extended protocol
is used like your perl script, pgpool-II was waiting for the row data
forever which will never come because there's no such a row in the
table. Attached patch considers the case.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hi!
> 
> Sorry, it took some time to further isolate the problem, but now I have
> a testcase!
> 
> Nozomi Anzai wrote:
>> Could you tell us some hints to reproduce this?
> 
> pgpool with two nodes,
> pgpool running on node 0, node 1 offline because of the current problem.
> 
> pgpool-II version 3.2.4 under Debian 6.0.6
> postgresql 9.1.9 under Debian 6.0.6
> 
>> - pgpool.conf
>>   - The mode do you use (replication mode, master slave mode with SR, etc.)
>>   - Using load balancing or not
> 
> replication_mode = true
> load_balance_mode = true
> replicate_select = false
> white_function_list = ''
> black_function_list = 'nextval,setval'
> insert_lock = true
> lobj_lock_table = ''
> 
> enable_pool_hba = true
> master_slave_mode = false
> parallel_mode = false
> 
> The table pgpool_catalog.insert_lock exists in all our databases
> (including template1), created by insert_lock.sql.
> 
>> - Steps you executed
> 
> As a test scenario, I created the following two tables:
> 
> test=# \d kd_test_serial_pgpool
>                           Table "public.kd_test_serial_pgpool"
>  Column  |  Type   |                             Modifiers
> ---------+---------+--------------------------------------------------------------------
>  id      | integer | not null default
>          .         . nextval('kd_test_serial_pgpool_id_seq'::regclass)
>  content | text    |
> 
> test=# \d kd_test_nonserial_pgpool
> Table "public.kd_test_nonserial_pgpool"
>  Column  |  Type   | Modifiers
> ---------+---------+-----------
>  id      | integer |
>  content | text    |
> 
> Inserting stuff into kd_test_serial_pgpool should (and does) grab a lock
> on pgpool_catalog.insert_lock, while doing so for
> kd_test_nonserial_pgpool should (and does) not.
> 
> The inserting script is written in perl using prepared queries as
> follows (line wrapping caused by MUA):
> 
> 
> #!/usr/bin/perl
> use strict;
> use DBI;
> my $dbh;
> my $sth;
> $dbh = DBI->connect('DBI:Pg:dbname=XXX;host=XXX','XXX','XXX');
> if ($dbh)
> {
>   $sth = $dbh->prepare("insert into kd_test_serial_pgpool(content)
> values (?)");      # hangs
> #  $sth = $dbh->prepare("insert into kd_test_nonserial_pgpool(content)
> values (?)");   # does not hang
>   $sth->execute('perl lkhzfjktfjktf');
>   $sth->finish;
> }
> $dbh->disconnect();
> 
> 
> Using the "insert into kd_test_serial_pgpool" line will hang
> indefinitely (I let it hang for about 20 days!) on calling
> $sth->execute, while "insert into kd_test_nonserial_pgpool" won't.
> (For log file content see below, as posted previously.)
> 
> 
> I made sure that no actual lock was set on insert_lock before running
> the test by checking the following query (and killing threads via select
> pg_terminate_backend($PID); as needed):
> 
> select
> l.pid,a.datname,a.usename,t.relname,l.locktype,l.mode,l.granted,a.waiting,a.query_start,a.backend_start,a.current_query
> from pg_locks l, pg_stat_all_tables t,pg_stat_activity a where
> l.relation=t.relid and l.pid=a.procpid order by datname,relname,granted
> desc,query_start;
> 
> 
> I tested the perl script on several systems, with equal behaviour:
> - Debian 4.0 (32 bit), perl 5.8.8-7etch6, libdbi-perl 1.53-1etch1
> - Debian 6.0.7 (32 bit), perl 5.10.1-17squeeze6, libdbi-perl 1.612-1
> - Kubuntu 12.10 (64 bit), perl 5.14.2-13ubuntu0.2, libdbi-perl 1.622-1
> 
> 
> Log excerpt as posted previously (different tables, but same symptom):
>>> pgpool.log:
>>> > 2013-05-06 14:22:02.235 CEST [13003] [unknown] 00000 LOG:  duration: 0.035 ms  statement: BEGIN
>>> > 2013-05-06 14:22:02.235 CEST [13003] [unknown] 00000 LOG:  duration: 0.070 ms  parse <unnamed>: 
>>> >                     DELETE FROM sessions
>>> >                     WHERE session_id = $1
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 LOG:  duration: 0.088 ms  bind <unnamed>: 
>>> >                     DELETE FROM sessions
>>> >                     WHERE session_id = $1
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 DETAIL:  parameters: $1 = '1eae4e7eb3f0adf2e2bc3f50f1446530f'
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 LOG:  duration: 0.063 ms  execute <unnamed>: 
>>> >                     DELETE FROM sessions
>>> >                     WHERE session_id = $1
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 DETAIL:  parameters: $1 = '1eae4e7eb3f0adf2e2bc3f50f1446530f'
>>> > 2013-05-06 14:22:02.236 CEST [13003] [unknown] 00000 LOG:  duration: 0.024 ms  statement: COMMIT
>>> > 2013-05-06 14:22:02.238 CEST [13003] [unknown] 00000 LOG:  duration: 0.426 ms  parse pgpool12980: SELECT count(*) FROM pg_catalog.pg_attrdef AS d, pg_catalog.pg_class AS c WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND c.oid = pgpool_regclass('sessions')
>>> > 2013-05-06 14:22:02.238 CEST [13003] [unknown] 00000 LOG:  duration: 0.054 ms  bind pgpool12980/pgpool12980: SELECT count(*) FROM pg_catalog.pg_attrdef AS d, pg_catalog.pg_class AS c WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND c.oid = pgpool_regclass('sessions')
>>> > 2013-05-06 14:22:02.239 CEST [13003] [unknown] 00000 LOG:  duration: 0.993 ms  execute pgpool12980/pgpool12980: SELECT count(*) FROM pg_catalog.pg_attrdef AS d, pg_catalog.pg_class AS c WHERE d.adrelid = c.oid AND d.adsrc ~ 'nextval' AND c.oid = pgpool_regclass('sessions')
>>> > 2013-05-06 14:22:02.240 CEST [13003] [unknown] 00000 LOG:  duration: 0.757 ms  parse pgpool12980: SELECT attname, d.adsrc, coalesce((d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%') AND (a.atttypid = 'timestamp'::regtype::oid OR a.atttypid = 'timestamp with time zone'::regtype::oid OR a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR a.atttypid = 'time with time zone'::regtype::oid) , false) FROM pg_catalog.pg_class c, pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_attrdef d ON (a.attrelid = d.adrelid AND a.attnum = d.adnum) WHERE c.oid = a.attrelid AND a.attnum >= 1 AND a.attisdropped = 'f' AND c.oid = pgpool_regclass('sessions') ORDER BY a.attnum
>>> > 2013-05-06 14:22:02.240 CEST [13003] [unknown] 00000 LOG:  duration: 0.087 ms  bind pgpool12980/pgpool12980: SELECT attname, d.adsrc, coalesce((d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%') AND (a.atttypid = 'timestamp'::regtype::oid OR a.atttypid = 'timestamp with time zone'::regtype::oid OR a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR a.atttypid = 'time with time zone'::regtype::oid) , false) FROM pg_catalog.pg_class c, pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_attrdef d ON (a.attrelid = d.adrelid AND a.attnum = d.adnum) WHERE c.oid = a.attrelid AND a.attnum >= 1 AND a.attisdropped = 'f' AND c.oid = pgpool_regclass('sessions') ORDER BY a.attnum
>>> > 2013-05-06 14:22:02.241 CEST [13003] [unknown] 00000 LOG:  duration: 0.787 ms  execute pgpool12980/pgpool12980: SELECT attname, d.adsrc, coalesce((d.adsrc LIKE '%now()%' OR d.adsrc LIKE '%''now''::text%') AND (a.atttypid = 'timestamp'::regtype::oid OR a.atttypid = 'timestamp with time zone'::regtype::oid OR a.atttypid = 'date'::regtype::oid OR a.atttypid = 'time'::regtype::oid OR a.atttypid = 'time with time zone'::regtype::oid) , false) FROM pg_catalog.pg_class c, pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_attrdef d ON (a.attrelid = d.adrelid AND a.attnum = d.adnum) WHERE c.oid = a.attrelid AND a.attnum >= 1 AND a.attisdropped = 'f' AND c.oid = pgpool_regclass('sessions') ORDER BY a.attnum
>>> > 2013-05-06 14:22:02.241 CEST [13003] [unknown] 00000 LOG:  duration: 0.026 ms  statement: BEGIN
>>> > 2013-05-06 14:22:02.263 CEST [13003] [unknown] 00000 LOG:  duration: 21.331 ms  parse pgpool12980: SELECT count(*) FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid) WHERE nspname = 'pgpool_catalog' AND relname = 'insert_lock'
>>> > 2013-05-06 14:22:02.263 CEST [13003] [unknown] 00000 LOG:  duration: 0.067 ms  bind pgpool12980/pgpool12980: SELECT count(*) FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid) WHERE nspname = 'pgpool_catalog' AND relname = 'insert_lock'
>>> > 2013-05-06 14:22:02.263 CEST [13003] [unknown] 00000 LOG:  duration: 0.038 ms  execute pgpool12980/pgpool12980: SELECT count(*) FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace n ON (c.relnamespace = n.oid) WHERE nspname = 'pgpool_catalog' AND relname = 'insert_lock'
>>> > 2013-05-06 14:22:02.264 CEST [13003] [unknown] 00000 LOG:  duration: 0.231 ms  parse pgpool12980: SELECT 1 FROM pgpool_catalog.insert_lock WHERE reloid = pgpool_regclass('sessions') FOR UPDATE
>>> > 2013-05-06 14:22:02.264 CEST [13003] [unknown] 00000 LOG:  duration: 0.044 ms  bind pgpool12980/pgpool12980: SELECT 1 FROM pgpool_catalog.insert_lock WHERE reloid = pgpool_regclass('sessions') FOR UPDATE
>>> > 2013-05-06 14:22:02.267 CEST [13003] [unknown] 00000 LOG:  duration: 3.819 ms  execute pgpool12980/pgpool12980: SELECT 1 FROM pgpool_catalog.insert_lock WHERE reloid = pgpool_regclass('sessions') FOR UPDATE
>>> (… then nothing happens in this thread anymore.)
>>> 
>>> 
>>> The original hanging query isn't logged by pgpool, not even when turning
>>> on pgpool's extensive debug output, but running OTRS against postgresql
>>> directly (without pgpool) shows this as the offending query:
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 LOG:  duration: 0.068 ms  parse <unnamed>: INSERT INTO sessions (session_id, data_key, data_value, serialized) VALUES ($1,$2,$3,$4)
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 LOG:  duration: 0.045 ms  bind <unnamed>: INSERT INTO sessions (session_id, data_key, data_value, serialized) VALUES ($1,$2,$3,$4)
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 DETAIL:  parameters: $1 = '11a98dce334f5dd318594a76edf08819b', $2 = 'LastScreenOverview', $3 = 'Action=AgentTicketQueue;QueueID=9;View=', $4 = '0'
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 LOG:  duration: 0.106 ms  execute <unnamed>: INSERT INTO sessions (session_id, data_key, data_value, serialized) VALUES ($1,$2,$3,$4)
>>> > 2013-05-06 14:47:10.487 CEST [14141] [unknown] 00000 DETAIL:  parameters: $1 = '11a98dce334f5dd318594a76edf08819b', $2 = 'LastScreenOverview', $3 = 'Action=AgentTicketQueue;QueueID=9;View=', $4 = '0'
>>> 
>>> The context of this query suggests that no other query is using/blocking
>>> pgpool's insert_lock table at this moment, especially not for
>>> minutes/hours, hence I'm quite at a loss here.
> 
> 
> Karsten
> _______________________________________________
> pgpool-general mailing list
> pgpool-general at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-general
-------------- next part --------------
A non-text attachment was scrubbed...
Name: insert_lock_hangs.diff
Type: text/x-patch
Size: 617 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-general/attachments/20130530/249d961a/attachment.bin>


More information about the pgpool-general mailing list