[pgpool-hackers: 166] Re: Improvements

Wed Jan 30 18:10:39 JST 2013

Awesome! I will look into this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Hello, everyone!
> 
> I've added new features and improvements to pgpool (for parallel mode)
> which I would like to contribute as well as get feedback from the community.
> Features:
>  - Transaction support added (for SELECT * FROM distributed_table);
>  - CURSORS support added (for distributed tables);
>  - Dynamic connections to System DB (optional);
>  - SELECT rewriting fixes (some improvements and some bugfixes).
> 
> 1) Transactions.
> When client sends 'BEGIN' statement, pgpool child connects to another child
> and resend all client statements to it, except for 'SELECT ... FROM
> distributed_table', which executes in usual way. In this case, 'SELECT ...
> FROM distributed_table' executes inside the same transaction block as well
> as other statements.
> The better way would be to use the same child's connections as it uses for
> non-distributed SELECTs, but I haven't found the means of doing it.
> 
> 2) Cursors.
> For example, we have two nodes and distributed table:
>  Node 0             Node 1
> 
>  value              value
> -------            -------
>  4                  3
>  6                  1
>  8                  7
>  2                  9
>  0                  5
> 
>  a. BEGIN;
>  b. DECLARE cursor_name CURSOR FOR SELECT value FROM distributed_table
> ORDER BY value;
>  c. MOVE FORWARD 2 IN cursor_name;
>  d. FETCH FORWARD 3 IN cursor_name:
> 
>  value
> -------
>  2
>  3
>  4
> 
>  CURSORS work for FORWARD/BACKWARD ALL/(positive number).
>  Cursors work for any type of distribution, e.g. round-robin.
> 
>  3) Dynamic connections (optional).
>  If number of clients is not usually too high, it is not necessary to have
> connections to System DB for free childs.
>  New parameter 'system_db_dynamic_connection' added to config.
>  If value of parameter is 0, child works as usual.
>  If value > 0, child connects to System DB when client connects. Also,
> child without System DB connections sleeps for
> 'system_db_dynamic_connection' mcs before accepting new client. So new
> client will be accepted by child, which already has connections to System
> DB.
> 
>  4) SELECT.
>  SELECT rewriting logic was changed a little, and now more complex queries
> work correctly, such as:
>   a. SELECT field_1, (SELECT field_2 FROM replicated_table WHERE
> field_3=S1.field_3) AS field_2 FROM distributed_table S1;
>   b. (SELECT field_1, (SELECT field_2 FROM replicated_table_1) AS field_2
> FROM distributed_table_1 S1) UNION ALL (SELECT field_1, (SELECT field_3
> FROM replicated_table_2) AS field_3 FROM distributed_table_2 S2) ORDER BY
> field_1;
>   c. Also, 'ORDER BY + LIMIT + OFFSET' optimization added:
> SELECT field FROM distributed_table ORDER BY field LIMIT 10 OFFSET 100;
> rewrites into:
> SELECT "pool_c$1" AS field FROM dblink('host=localhost.localdomain
> dbname=dbname port=9999 user=username','SELECT pool_parallel("SELECT
> distributed_table.time FROM distributed_table ORDER BY field LIMIT 10 +
> 100")',false) AS pool_t$0("pool_c$1" field_type) ORDER BY "pool_c$1" OFFSET
> 100 LIMIT 10;
>   d. some bugfixes.
> 
> If you are interested in, I could explain each change in detail.
> 
> Source code is available at https://github.com/afedonin/repo-test.git
> 
> P.S. All changes was made and tested for parallel_mode.
> P.P.S. I don't know pgpool as good as you, so I appreciate any feedback.
> 
> Best regards,
> Alexey Fedonin.