[pgpool-hackers: 3136] Re: New feature proposal: shared relation cache

Tatsuo Ishii ishii at sraoss.co.jp
Mon Nov 19 13:46:22 JST 2018


A concern with this approach is lock contention. Since we don't have a
shared lock (yet), before accessing the query cache storage we need to
acquire an exclusive lock. This could happen every time a pgpool child
checks whether the data is already in the cache. To reduce the lock
contention, we could copy the cache contents to local memory on first
access. Subsequent accesses to the local cache will not need any
locking, of course.
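
Here is a minimal sketch of that idea in C. The data structures and
names are hypothetical (not the pgpool-II API), and a pthread mutex
stands in for the shared-memory exclusive lock:

/*
 * Sketch only: a read-through local cache in front of an exclusively
 * locked shared relcache.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 32
#define NAMEDATALEN 64

typedef struct
{
    char table[NAMEDATALEN];
    int  is_temp;       /* one example of a cached per-table attribute */
} RelCacheEntry;

/* "shared" storage guarded by an exclusive lock (stand-in for shmem) */
static RelCacheEntry shared_cache[MAX_ENTRIES];
static int shared_count;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* per-process local copy: no locking needed once an entry is here */
static RelCacheEntry local_cache[MAX_ENTRIES];
static int local_count;

static RelCacheEntry *local_lookup(const char *table)
{
    for (int i = 0; i < local_count; i++)
        if (strcmp(local_cache[i].table, table) == 0)
            return &local_cache[i];
    return NULL;
}

RelCacheEntry *relcache_lookup(const char *table)
{
    RelCacheEntry *e = local_lookup(table);

    if (e)
        return e;       /* fast path: no lock at all */

    pthread_mutex_lock(&shared_lock);   /* exclusive lock */
    for (int i = 0; i < shared_count; i++)
    {
        if (strcmp(shared_cache[i].table, table) == 0 &&
            local_count < MAX_ENTRIES)
        {
            /* copy to local memory so later lookups skip the lock */
            local_cache[local_count] = shared_cache[i];
            e = &local_cache[local_count++];
            break;
        }
    }
    pthread_mutex_unlock(&shared_lock);
    return e;           /* NULL: caller falls back to catalog queries */
}

int main(void)
{
    RelCacheEntry *e;

    /* pretend an earlier catalog query already filled the shared cache */
    pthread_mutex_lock(&shared_lock);
    strcpy(shared_cache[0].table, "pgbench_accounts");
    shared_cache[0].is_temp = 0;
    shared_count = 1;
    pthread_mutex_unlock(&shared_lock);

    relcache_lookup("pgbench_accounts");        /* locks, copies locally */
    e = relcache_lookup("pgbench_accounts");    /* lock-free local hit */
    printf("is_temp = %d\n", e ? e->is_temp : -1);
    return 0;
}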

My plan is to first implement this without the local cache, then run a
benchmark. If the benchmark results are disappointing, I will add the
local cache.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> * Proposal background
> 
> Unlike PostgreSQL, Pgpool-II cannot directly access the system
> catalogs; it issues SQL against the PostgreSQL system catalogs
> whenever necessary. For example, to execute a simple query
> 
> select count(*) from pgbench_accounts;
> 
> Pgpool-II issues 9 SELECTs to system catalogs:
> 
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) FROM pg_catalog.pg_proc AS p WHERE p.proname = 'count' AND p.provolatile = 'i'
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) FROM pg_catalog.pg_class AS c, pg_attribute AS a WHERE c.relname = 'pg_class' AND a.attrelid = c.oid AND a.attname = 'relistemp'
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) FROM pg_catalog.pg_class AS c, pg_namespace AS n WHERE c.relname = 'pgbench_accounts' AND c.relnamespace = n.oid AND n.nspname ~ '^pg_temp_'
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) from (SELECT has_function_privilege('t-ishii', 'pg_catalog.to_regclass(cstring)', 'execute') WHERE EXISTS(SELECT * FROM pg_catalog.pg_proc AS p WHERE p.proname = 'to_regclass')) AS s
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.relname = 'pg_namespace'
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) FROM pg_class AS c, pg_namespace AS n WHERE c.oid = pg_catalog.to_regclass('"pgbench_accounts"') AND c.relnamespace = n.oid AND n.nspname = 'pg_catalog'
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pg_catalog.to_regclass('"pgbench_accounts"') AND (c.relkind = 'v' OR c.relkind = 'm')
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) FROM pg_catalog.pg_class AS c, pg_catalog.pg_attribute AS a WHERE c.relname = 'pg_class' AND a.attrelid = c.oid AND a.attname = 'relpersistence'
> 2018-11-15 12:57:34: pid 14885: LOG:  DB node id: 0 backend pid: 14904 statement: SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = pg_catalog.to_regclass('"pgbench_accounts"') AND c.relpersistence = 'u'
> 
> Of course this is only done the first time the table (in this case
> pgbench_accounts) is accessed. The query results are cached and
> subsequent queries use the cache.
> 
> However the cache is in the pgpool child process's local memory. That
> means:
> 
> 1) the cache is not shared among pgpool child processes
> 
> 2) if a child process dies, the newly created pgpool child process
> needs to re-select the system catalogs.
> 
> #1 in particular is a headache if num_init_children is large, since
> each pgpool child needs to issue the queries itself. For example, if
> num_init_children is 100, as many as 900 queries will be sent to
> PostgreSQL on first access.
> 
> And this shows the data for only a single table. If a user needs to
> access 100 different tables, 900*100 = 90,000 SELECTs are necessary.
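> 
> Just to spell out the arithmetic, a throwaway snippet (not pgpool
> code; the 9 comes from the log above, the other figures from this
> example):
> 
> #include <stdio.h>
> 
> int main(void)
> {
>     int per_table = 9;      /* catalog SELECTs per table, from the log */
>     int children  = 100;    /* num_init_children */
>     int tables    = 100;
> 
>     printf("one table, every child : %d\n", per_table * children);          /* 900 */
>     printf("100 tables, every child: %d\n", per_table * children * tables); /* 90000 */
>     return 0;
> }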
> 
> * The proposal
> 
> So I propose to move the relation cache (relcache) to shared
> memory. This should eliminate problem #1, since the cache can now be
> shared among different pgpool child processes. Moreover, the cache
> remains even after a pgpool child exits, so problem #2 is eliminated
> as well. In the example above, the number of SELECTs issued with this
> approach will be reduced from 90,000 to 900 (each table's 9 SELECTs
> are issued only once).
> 
> * Implementation
> 
> Instead of re-inventing a relcache in shared memory, I propose to use
> existing infrastructure, namely the in-memory query cache. By adding a
> few low-level APIs, the relcache data can easily be placed on the
> query cache storage. A bonus of this plan is that the relcache can
> live on memcached without any additional coding.
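> 
> To make the idea concrete, a minimal sketch of how relcache lookups
> could be layered on a generic get/put cache storage interface. All
> names here (shared_qcache_get/put, the key format) are hypothetical
> stand-ins for whatever low-level query cache APIs get added; the
> point is that the relcache layer does not care whether the backend is
> shared memory or memcached.
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> 
> #define MAX_ITEMS 64
> #define KEY_LEN   128
> #define VAL_LEN   64
> 
> /* ---- stand-in for the query cache storage backend ---- */
> static struct { char key[KEY_LEN]; char val[VAL_LEN]; } store[MAX_ITEMS];
> static int nitems;
> 
> static int shared_qcache_put(const char *key, const char *val)
> {
>     if (nitems >= MAX_ITEMS)
>         return -1;
>     snprintf(store[nitems].key, KEY_LEN, "%s", key);
>     snprintf(store[nitems].val, VAL_LEN, "%s", val);
>     nitems++;
>     return 0;
> }
> 
> static const char *shared_qcache_get(const char *key)
> {
>     for (int i = 0; i < nitems; i++)
>         if (strcmp(store[i].key, key) == 0)
>             return store[i].val;
>     return NULL;
> }
> 
> /* ---- relcache layered on top: one entry per catalog query ---- */
> static void relcache_key(char *buf, size_t len,
>                          const char *table, const char *property)
> {
>     snprintf(buf, len, "relcache:%s:%s", table, property);
> }
> 
> int relcache_get_int(const char *table, const char *property, int *value)
> {
>     char key[KEY_LEN];
>     const char *v;
> 
>     relcache_key(key, sizeof(key), table, property);
>     v = shared_qcache_get(key);
>     if (v == NULL)
>         return -1;      /* miss: caller runs the catalog SELECT */
>     *value = atoi(v);
>     return 0;
> }
> 
> void relcache_put_int(const char *table, const char *property, int value)
> {
>     char key[KEY_LEN], val[VAL_LEN];
> 
>     relcache_key(key, sizeof(key), table, property);
>     snprintf(val, sizeof(val), "%d", value);
>     shared_qcache_put(key, val);
> }
> 
> int main(void)
> {
>     int is_unlogged;
> 
>     /* first access: miss, so run the catalog SELECT and store the result */
>     if (relcache_get_int("pgbench_accounts", "is_unlogged", &is_unlogged) != 0)
>         relcache_put_int("pgbench_accounts", "is_unlogged", 0);
> 
>     /* any later access, from any child sharing the storage: a hit */
>     if (relcache_get_int("pgbench_accounts", "is_unlogged", &is_unlogged) == 0)
>         printf("cached: is_unlogged = %d\n", is_unlogged);
>     return 0;
> }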
> 
> Comments and/or suggestions are welcome.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
> _______________________________________________
> pgpool-hackers mailing list
> pgpool-hackers at pgpool.net
> http://www.pgpool.net/mailman/listinfo/pgpool-hackers

