[pgpool-hackers: 168] Re: Improvements

Sun Feb 10 09:36:09 JST 2013

Thanks.

I have tested your patches a little, and now I would like to hear
more about the implementation, because your changes are big and the
details are hard for me to follow (especially in
pool_process_query.c).

Also here are some requests:

1) Please include doc changes (i.e. doc/pgpool-en.html).

2) We do not use C++ style coding (we follow the PostgreSQL coding
   style). So please do not use '//' comments. Also, please do not
   declare variables in the middle of a block. I mean:

   if (a < b)
   {
       int d;

       a = 1;
       b = 2;
       c = 3;
       :
       :
   }

   is ok but this is not ok:

   if (a < b)
   {
       a = 1;
       b = 2;
       int c = 3;
       :
       :
   }
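
For a version that compiles, here is a minimal sketch of both rules
together. The function sum_first is hypothetical and is not taken from
the pgpool source; it only illustrates '/* */' comments and
declarations placed at the top of each block. With gcc, the
-Wdeclaration-after-statement warning can help catch the second kind
of mistake.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of pgpool): returns the sum of the
 * first n elements of values.
 */
int
sum_first(const int *values, int n)
{
	int		total = 0;	/* declarations come before any statement */
	int		i;

	assert(values != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		/* a block-local variable is declared at the top of its block */
		int		v = values[i];

		total += v;
	}

	return total;
}
```
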
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Sure.
> 
> Best regards,
> Alexey Fedonin.
> 
> 2013/2/1 Tatsuo Ishii <ishii at postgresql.org>
> 
>> Alexey,
>>
>> I tried to make a diff between the current git repo master and your repo on
>> GitHub and found that yours is based on a slightly older version of the
>> repo. Could you please rebase it? Or even better, could you send a diff
>> against the current git repo master?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>> > Hello, everyone!
>> >
>> > I've added new features and improvements to pgpool (for parallel mode)
>> > which I would like to contribute, as well as get feedback from the
>> > community.
>> > Features:
>> >  - Transaction support added (for SELECT * FROM distributed_table);
>> >  - CURSORS support added (for distributed tables);
>> >  - Dynamic connections to System DB (optional);
>> >  - SELECT rewriting fixes (some improvements and some bugfixes).
>> >
>> > 1) Transactions.
>> > When the client sends a 'BEGIN' statement, the pgpool child connects to
>> > another child and resends all client statements to it, except for
>> > 'SELECT ... FROM distributed_table', which executes in the usual way.
>> > In this case, 'SELECT ... FROM distributed_table' executes inside the
>> > same transaction block as the other statements.
>> > A better way would be to use the same connections that the child uses
>> > for non-distributed SELECTs, but I haven't found a way to do it.
>> >
>> > 2) Cursors.
>> > For example, suppose we have two nodes and a distributed table:
>> >  Node 0             Node 1
>> >
>> >  value              value
>> > -------            -------
>> >  4                  3
>> >  6                  1
>> >  8                  7
>> >  2                  9
>> >  0                  5
>> >
>> >  a. BEGIN;
>> >  b. DECLARE cursor_name CURSOR FOR SELECT value FROM distributed_table
>> >     ORDER BY value;
>> >  c. MOVE FORWARD 2 IN cursor_name;
>> >  d. FETCH FORWARD 3 IN cursor_name; returns:
>> >
>> >  value
>> > -------
>> >  2
>> >  3
>> >  4
>> >
>> >  Cursors work for FORWARD/BACKWARD with ALL or a positive count.
>> >  Cursors work for any type of distribution, e.g. round-robin.
>> >
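
The cursor example above can be read as a merge of per-node sorted
streams. The following is a minimal sketch under that assumption; it
is not taken from the patch, and NodeCursor, merged_next,
merged_move_forward and merged_fetch_forward are hypothetical names.
Each node is assumed to return its own rows already sorted by the
ORDER BY key, and only forward movement is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* One backend node's cursor: rows already sorted by that node. */
typedef struct
{
	const int  *rows;
	size_t		nrows;
	size_t		pos;		/* index of the next unread row */
} NodeCursor;

/*
 * Take the smallest unread row across all nodes.
 * Returns 1 and stores the row in *value, or 0 if all nodes are exhausted.
 */
static int
merged_next(NodeCursor *nodes, size_t nnodes, int *value)
{
	size_t		best = nnodes;	/* nnodes means "no candidate yet" */
	size_t		i;

	assert(nodes != NULL && value != NULL);

	for (i = 0; i < nnodes; i++)
	{
		if (nodes[i].pos >= nodes[i].nrows)
			continue;
		if (best == nnodes ||
			nodes[i].rows[nodes[i].pos] < nodes[best].rows[nodes[best].pos])
			best = i;
	}
	if (best == nnodes)
		return 0;

	*value = nodes[best].rows[nodes[best].pos++];
	return 1;
}

/* MOVE FORWARD count: discard rows; returns how many were skipped. */
size_t
merged_move_forward(NodeCursor *nodes, size_t nnodes, size_t count)
{
	size_t		moved = 0;
	int			ignored;

	while (moved < count && merged_next(nodes, nnodes, &ignored))
		moved++;
	return moved;
}

/* FETCH FORWARD count: copy rows into out; returns how many were fetched. */
size_t
merged_fetch_forward(NodeCursor *nodes, size_t nnodes, size_t count, int *out)
{
	size_t		fetched = 0;

	while (fetched < count && merged_next(nodes, nnodes, &out[fetched]))
		fetched++;
	return fetched;
}
```

With node 0 holding 0, 2, 4, 6, 8 and node 1 holding 1, 3, 5, 7, 9
(the example data after each node sorts it), moving forward 2 and then
fetching 3 yields 2, 3, 4, which matches the result shown above.
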
>> >  3) Dynamic connections (optional).
>> >  If the number of clients is usually not too high, idle children do not
>> >  need to hold connections to the System DB.
>> >  A new parameter, 'system_db_dynamic_connection', was added to the config.
>> >  If the value of the parameter is 0, the child works as usual.
>> >  If the value is > 0, the child connects to the System DB when a client
>> >  connects. Also, a child without System DB connections sleeps for
>> >  'system_db_dynamic_connection' mcs before accepting a new client, so
>> >  that a new client is accepted by a child that already has connections
>> >  to the System DB.
>> >
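
The accept rule described above can be sketched as a small decision
function. This is only an illustration, not the patch's code:
accept_delay_usec is a hypothetical name, and it assumes that 'mcs'
means microseconds and that negative parameter values are invalid.

```c
#include <assert.h>

/*
 * Hypothetical helper: how long a child should wait before accepting
 * a new client.
 *
 * system_db_dynamic_connection: the config value (0 disables the feature).
 * has_system_db_conn: nonzero if this child already holds System DB
 * connections.
 */
long
accept_delay_usec(long system_db_dynamic_connection, int has_system_db_conn)
{
	/* assumption: negative values are not meaningful */
	assert(system_db_dynamic_connection >= 0);

	if (system_db_dynamic_connection == 0)
		return 0;		/* feature off: child works as usual */

	if (has_system_db_conn)
		return 0;		/* already connected: accept immediately */

	/* not connected yet: give already-connected children a head start */
	return system_db_dynamic_connection;
}
```

The delay only biases which child wins the accept; a child without
System DB connections still accepts the client once the delay passes.
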
>> >  4) SELECT.
>> >  SELECT rewriting logic was changed a little, and now more complex
>> >  queries work correctly, such as:
>> >   a. SELECT field_1, (SELECT field_2 FROM replicated_table WHERE
>> >      field_3=S1.field_3) AS field_2 FROM distributed_table S1;
>> >   b. (SELECT field_1, (SELECT field_2 FROM replicated_table_1) AS field_2
>> >      FROM distributed_table_1 S1) UNION ALL (SELECT field_1, (SELECT field_3
>> >      FROM replicated_table_2) AS field_3 FROM distributed_table_2 S2) ORDER BY
>> >      field_1;
>> >   c. Also, an 'ORDER BY + LIMIT + OFFSET' optimization was added:
>> >      SELECT field FROM distributed_table ORDER BY field LIMIT 10 OFFSET 100;
>> >      is rewritten into:
>> >      SELECT "pool_c$1" AS field FROM dblink('host=localhost.localdomain
>> >      dbname=dbname port=9999 user=username','SELECT pool_parallel("SELECT
>> >      distributed_table.field FROM distributed_table ORDER BY field LIMIT 10 +
>> >      100")',false) AS pool_t$0("pool_c$1" field_type) ORDER BY "pool_c$1"
>> >      OFFSET 100 LIMIT 10;
>> >   d. Some bugfixes.
>> >
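
The arithmetic behind the 'ORDER BY + LIMIT + OFFSET' rewrite can be
sketched as follows. This is a minimal illustration, not the patch's
implementation; LimitPlan and plan_limit_offset are hypothetical
names. The reasoning: any row that falls inside the global window
must be among the first LIMIT + OFFSET rows of its own node, so each
node only needs to return that many rows, and the outer query then
applies the original OFFSET and LIMIT to the merged, re-sorted result.

```c
#include <assert.h>

/* Limits used at each level of the rewritten query (hypothetical type). */
typedef struct
{
	long		node_limit;		/* LIMIT pushed down to every node */
	long		final_offset;	/* OFFSET applied to the merged result */
	long		final_limit;	/* LIMIT applied to the merged result */
} LimitPlan;

/* Compute the limits for "ORDER BY ... LIMIT limit OFFSET offset". */
LimitPlan
plan_limit_offset(long limit, long offset)
{
	LimitPlan	plan;

	assert(limit >= 0 && offset >= 0);

	plan.node_limit = limit + offset;	/* each node: first limit + offset rows */
	plan.final_offset = offset;			/* outer query keeps the original OFFSET */
	plan.final_limit = limit;			/* outer query keeps the original LIMIT */
	return plan;
}
```

For LIMIT 10 OFFSET 100, each node is asked for at most 110 rows, and
the outer query keeps OFFSET 100 LIMIT 10, as in the rewritten query
above.
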
>> > If you are interested, I can explain each change in detail.
>> >
>> > Source code is available at https://github.com/afedonin/repo-test.git
>> >
>> > P.S. All changes were made and tested for parallel_mode.
>> > P.P.S. I don't know pgpool as well as you do, so I would appreciate any
>> > feedback.
>> >
>> > Best regards,
>> > Alexey Fedonin.
>>