[pgpool-hackers: 166] Re: Improvements
Tatsuo Ishii
ishii at postgresql.org
Wed Jan 30 18:10:39 JST 2013
Awesome! I will look into this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> Hello, everyone!
>
> I've added new features and improvements to pgpool (for parallel mode)
> which I would like to contribute as well as get feedback from the community.
> Features:
> - Transaction support added (for SELECT * FROM distributed_table);
> - CURSORS support added (for distributed tables);
> - Dynamic connections to System DB (optional);
> - SELECT rewriting fixes (some improvements and some bugfixes).
>
> 1) Transactions.
> When client sends 'BEGIN' statement, pgpool child connects to another child
> and resend all client statements to it, except for 'SELECT ... FROM
> distributed_table', which executes in usual way. In this case, 'SELECT ...
> FROM distributed_table' executes inside the same transaction block as well
> as other statements.
> The better way would be to use the same child's connections as it uses for
> non-distributed SELECTs, but I haven't found the means of doing it.
>
> 2) Cursors.
> For example, we have two nodes and distributed table:
> Node 0 Node 1
>
> value value
> ------- -------
> 4 3
> 6 1
> 8 7
> 2 9
> 0 5
>
> a. BEGIN;
> b. DECLARE cursor_name CURSOR FOR SELECT value FROM distributed_table
> ORDER BY value;
> c. MOVE FORWARD 2 IN cursor_name;
> d. FETCH FORWARD 3 IN cursor_name:
>
> value
> -------
> 2
> 3
> 4
>
> CURSORS work for FORWARD/BACKWARD ALL/(positive number).
> Cursors work for any type of distribution, e.g. round-robin.
>
> 3) Dynamic connections (optional).
> If number of clients is not usually too high, it is not necessary to have
> connections to System DB for free childs.
> New parameter 'system_db_dynamic_connection' added to config.
> If value of parameter is 0, child works as usual.
> If value > 0, child connects to System DB when client connects. Also,
> child without System DB connections sleeps for
> 'system_db_dynamic_connection' mcs before accepting new client. So new
> client will be accepted by child, which already has connections to System
> DB.
>
> 4) SELECT.
> SELECT rewriting logic was changed a little, and now more complex queries
> work correctly, such as:
> a. SELECT field_1, (SELECT field_2 FROM replicated_table WHERE
> field_3=S1.field_3) AS field_2 FROM distributed_table S1;
> b. (SELECT field_1, (SELECT field_2 FROM replicated_table_1) AS field_2
> FROM distributed_table_1 S1) UNION ALL (SELECT field_1, (SELECT field_3
> FROM replicated_table_2) AS field_3 FROM distributed_table_2 S2) ORDER BY
> field_1;
> c. Also, 'ORDER BY + LIMIT + OFFSET' optimization added:
> SELECT field FROM distributed_table ORDER BY field LIMIT 10 OFFSET 100;
> rewrites into:
> SELECT "pool_c$1" AS field FROM dblink('host=localhost.localdomain
> dbname=dbname port=9999 user=username','SELECT pool_parallel("SELECT
> distributed_table.time FROM distributed_table ORDER BY field LIMIT 10 +
> 100")',false) AS pool_t$0("pool_c$1" field_type) ORDER BY "pool_c$1" OFFSET
> 100 LIMIT 10;
> d. some bugfixes.
>
> If you are interested in, I could explain each change in detail.
>
> Source code is available at https://github.com/afedonin/repo-test.git
>
> P.S. All changes was made and tested for parallel_mode.
> P.P.S. I don't know pgpool as good as you, so I appreciate any feedback.
>
> Best regards,
> Alexey Fedonin.
More information about the pgpool-hackers
mailing list