[pgpool-hackers: 165] Improvements

Wed Jan 30 18:00:52 JST 2013

Hello, everyone!

I've added new features and improvements to pgpool (for parallel mode)
which I would like to contribute as well as get feedback from the community.
Features:
 - Transaction support added (for SELECT * FROM distributed_table);
 - CURSORS support added (for distributed tables);
 - Dynamic connections to System DB (optional);
 - SELECT rewriting fixes (some improvements and some bugfixes).

1) Transactions.
When client sends 'BEGIN' statement, pgpool child connects to another child
and resend all client statements to it, except for 'SELECT ... FROM
distributed_table', which executes in usual way. In this case, 'SELECT ...
FROM distributed_table' executes inside the same transaction block as well
as other statements.
The better way would be to use the same child's connections as it uses for
non-distributed SELECTs, but I haven't found the means of doing it.

2) Cursors.
For example, we have two nodes and distributed table:
 Node 0             Node 1

 value              value
-------            -------
 4                  3
 6                  1
 8                  7
 2                  9
 0                  5

 a. BEGIN;
 b. DECLARE cursor_name CURSOR FOR SELECT value FROM distributed_table
ORDER BY value;
 c. MOVE FORWARD 2 IN cursor_name;
 d. FETCH FORWARD 3 IN cursor_name:

 value
-------
 2
 3
 4

 CURSORS work for FORWARD/BACKWARD ALL/(positive number).
 Cursors work for any type of distribution, e.g. round-robin.

 3) Dynamic connections (optional).
 If number of clients is not usually too high, it is not necessary to have
connections to System DB for free childs.
 New parameter 'system_db_dynamic_connection' added to config.
 If value of parameter is 0, child works as usual.
 If value > 0, child connects to System DB when client connects. Also,
child without System DB connections sleeps for
'system_db_dynamic_connection' mcs before accepting new client. So new
client will be accepted by child, which already has connections to System
DB.

 4) SELECT.
 SELECT rewriting logic was changed a little, and now more complex queries
work correctly, such as:
  a. SELECT field_1, (SELECT field_2 FROM replicated_table WHERE
field_3=S1.field_3) AS field_2 FROM distributed_table S1;
  b. (SELECT field_1, (SELECT field_2 FROM replicated_table_1) AS field_2
FROM distributed_table_1 S1) UNION ALL (SELECT field_1, (SELECT field_3
FROM replicated_table_2) AS field_3 FROM distributed_table_2 S2) ORDER BY
field_1;
  c. Also, 'ORDER BY + LIMIT + OFFSET' optimization added:
SELECT field FROM distributed_table ORDER BY field LIMIT 10 OFFSET 100;
rewrites into:
SELECT "pool_c$1" AS field FROM dblink('host=localhost.localdomain
dbname=dbname port=9999 user=username','SELECT pool_parallel("SELECT
distributed_table.time FROM distributed_table ORDER BY field LIMIT 10 +
100")',false) AS pool_t$0("pool_c$1" field_type) ORDER BY "pool_c$1" OFFSET
100 LIMIT 10;
  d. some bugfixes.

If you are interested in, I could explain each change in detail.

Source code is available at https://github.com/afedonin/repo-test.git

P.S. All changes was made and tested for parallel_mode.
P.P.S. I don't know pgpool as good as you, so I appreciate any feedback.

Best regards,
Alexey Fedonin.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20130130/36535f69/attachment.html>