[pgpool-hackers: 3265] Re: Some performance improvements for Pgpool-II

Fri Mar 8 11:40:40 JST 2019

Hi Usama,

Thank you for the new patch. The essential idea to have two parsers
seems to be good. However, I have a concern of maintenance cost for
those two distinct parsers. It seems the difference between two
parsers are subtle. So why don't you have just 1 original parser file
(gram.y) then generate the other automatically by using a script? Or
you could have 1 gram.y file with some ifdef's to differentiate those
two parsers. What do you think?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Hi Ishii-San
> 
> Thanks for looking into the patch and your observation is correct, the
> patch I sent would break
> the timestamp overwriting functionality,
> 
> So here is another go at this, See the new attached patch, which uses two
> parsers,
> 1- src/parser/gram.y which is the standard parser brought in from PG code
> 2- src/parser/gram_minimal.y modified parser that short-circuits the INSERT
> and UPDATE statements
> 
> The idea here is when replication mode is enabled we use the standard
> parser and parse everything since
> we need the information in case of timestamp or other things. but when we
> are operating in the master-slave
> mode we don't need any extra information for WRITE queries, since we will
> always be sending them to the primary
> anyways, so the newly added minimal parser which short-circuits the INSERT
> and UPDATE queries
> (can also be extended to short-circuit all the write queries like CREATE
> and ALTER where ever we can)
> and try to enhance the write query performance of the Pgpool-II,
> 
> The parser selection is made in raw_parser() function and is controlled by
> the "bool use_minimal" argument
> and currently in the patch we always use the minimal parser whenever the
> replication mode is disabled in the
> pgpool.conf
> 
> This is the very radical change in the parsing function of Pgpool-II, but
> it can provide us with the platform to minimise
> the parsing overhead of Pgpool-II for master-slave mode.
> 
> Your thoughts and comments..
> 
> Thanks
> Best Regards
> Muhammad Usama
> 
> On Wed, Feb 27, 2019 at 3:58 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> 
>> Hi Usama,
>>
>> I think this patch breaks replication mode because it throws away
>> information needed for time stamp rewriting. What do you think?
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese:http://www.sraoss.co.jp
>>
>> > Hi Ishii San
>> >
>> > Can you have a look at the attached patch which tries to extract some
>> > performance in the area of query parsing and query analysis for routing
>> > decisions. Most of the performance gains from the changes in the patch
>> can
>> > be observed in large data INSERT statements.
>> >
>> > Patch contains the following changes
>> > ==========
>> > 1-- The idea here is since Pgpool-II only needs a very little information
>> > about the queries especially for the insert queries to decide where it
>> > needs to send the query,
>> > for example: For the INSERT queries we only need the type of query and
>> the
>> > relation name.
>> > But since the parser we use in Pgpool-II is taken from PostgreSQL source
>> > which parses the complete query including the value lists ( which is not
>> > required by Pgpool).
>> > This parsing of value part seems very harmless in small statements but in
>> > case of INSERTs with lots of column values and large data in each value
>> > item, this becomes significant.
>> > So this patch adds a smaller bison grammar rule to short circuit the
>> INSERT
>> > statement parsing when it gets the enough information required for
>> > Pgpool-II.
>> >
>> > 2-- The patch also re-arranges some of the if statements in
>> > pool_where_to_send() function and tries to make sure the pattern_compare
>> > and pool_has_function_call calls should only be made when they are
>> > absolutely necessary.
>> >
>> > 3--Another thing this patch does is, it tries to save the raw_parser()
>> > calls in case of un-recognised queries. Instead of invoking the parser of
>> > "dummy read" and "dummy write" queries in case of syntax error in
>> original
>> > query, the patch adds the functions to get pre-built parse_trees for
>> these
>> > dummy queries.
>> >
>> > 4-- strlen() call is removed from scanner_init() function and is passed
>> in
>> > as an argument to it, and the reason is we already have the query length
>> in
>> > most cases before invoking the parser so why waste CPU cycles on it.
>> Again
>> > this becomes significant in case of large query strings.
>> >
>> > Finally the patch tries to remove the unnecessary calls of
>> > pool_is_likely_select()
>> >
>> > As mentioned above the area of improvements in this patch are mostly
>> around
>> > writing queries and for the testing purpose I used a INSERT query with
>> > large binary insert data and I am getting a very huge performance gains
>> > with this patch
>> >
>> > *Current Master Branch*
>> > ================
>> > usama=# \i sample.sql
>> >  id
>> > -----
>> >  104
>> > (1 row)
>> >
>> > INSERT 0 1
>> > Time: *2059.807* ms (00:02.060)
>> >
>> > *WITH PATCH*
>> > ===============
>> > usama=# \i sample.sql
>> >  id
>> > -----
>> >  102
>> > (1 row)
>> >
>> > INSERT 0 1
>> > Time: *314.237* ms
>> >
>> >
>> > Performance gain* 655.50 %*
>> >
>> >
>> > Comments and suggestions?
>> >
>> > Please let me know if you also want the test data I used for the INSERT
>> test
>> >
>> > Thanks
>> > Best regards
>> > Muhammad Usama
>>