[pgpool-hackers: 3264] Re: Some performance improvements for Pgpool-II

Muhammad Usama m.usama at gmail.com
Fri Mar 1 23:36:19 JST 2019


Hi Ishii-San

Thanks for looking into the patch and your observation is correct, the
patch I sent would break
the timestamp overwriting functionality,

So here is another go at this, See the new attached patch, which uses two
parsers,
1- src/parser/gram.y which is the standard parser brought in from PG code
2- src/parser/gram_minimal.y modified parser that short-circuits the INSERT
and UPDATE statements

The idea here is when replication mode is enabled we use the standard
parser and parse everything since
we need the information in case of timestamp or other things. but when we
are operating in the master-slave
mode we don't need any extra information for WRITE queries, since we will
always be sending them to the primary
anyways, so the newly added minimal parser which short-circuits the INSERT
and UPDATE queries
(can also be extended to short-circuit all the write queries like CREATE
and ALTER where ever we can)
and try to enhance the write query performance of the Pgpool-II,

The parser selection is made in raw_parser() function and is controlled by
the "bool use_minimal" argument
and currently in the patch we always use the minimal parser whenever the
replication mode is disabled in the
pgpool.conf

This is the very radical change in the parsing function of Pgpool-II, but
it can provide us with the platform to minimise
the parsing overhead of Pgpool-II for master-slave mode.

Your thoughts and comments..

Thanks
Best Regards
Muhammad Usama

On Wed, Feb 27, 2019 at 3:58 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> I think this patch breaks replication mode because it throws away
> information needed for time stamp rewriting. What do you think?
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
> > Hi Ishii San
> >
> > Can you have a look at the attached patch which tries to extract some
> > performance in the area of query parsing and query analysis for routing
> > decisions. Most of the performance gains from the changes in the patch
> can
> > be observed in large data INSERT statements.
> >
> > Patch contains the following changes
> > ==========
> > 1-- The idea here is since Pgpool-II only needs a very little information
> > about the queries especially for the insert queries to decide where it
> > needs to send the query,
> > for example: For the INSERT queries we only need the type of query and
> the
> > relation name.
> > But since the parser we use in Pgpool-II is taken from PostgreSQL source
> > which parses the complete query including the value lists ( which is not
> > required by Pgpool).
> > This parsing of value part seems very harmless in small statements but in
> > case of INSERTs with lots of column values and large data in each value
> > item, this becomes significant.
> > So this patch adds a smaller bison grammar rule to short circuit the
> INSERT
> > statement parsing when it gets the enough information required for
> > Pgpool-II.
> >
> > 2-- The patch also re-arranges some of the if statements in
> > pool_where_to_send() function and tries to make sure the pattern_compare
> > and pool_has_function_call calls should only be made when they are
> > absolutely necessary.
> >
> > 3--Another thing this patch does is, it tries to save the raw_parser()
> > calls in case of un-recognised queries. Instead of invoking the parser of
> > "dummy read" and "dummy write" queries in case of syntax error in
> original
> > query, the patch adds the functions to get pre-built parse_trees for
> these
> > dummy queries.
> >
> > 4-- strlen() call is removed from scanner_init() function and is passed
> in
> > as an argument to it, and the reason is we already have the query length
> in
> > most cases before invoking the parser so why waste CPU cycles on it.
> Again
> > this becomes significant in case of large query strings.
> >
> > Finally the patch tries to remove the unnecessary calls of
> > pool_is_likely_select()
> >
> > As mentioned above the area of improvements in this patch are mostly
> around
> > writing queries and for the testing purpose I used a INSERT query with
> > large binary insert data and I am getting a very huge performance gains
> > with this patch
> >
> > *Current Master Branch*
> > ================
> > usama=# \i sample.sql
> >  id
> > -----
> >  104
> > (1 row)
> >
> > INSERT 0 1
> > Time: *2059.807* ms (00:02.060)
> >
> > *WITH PATCH*
> > ===============
> > usama=# \i sample.sql
> >  id
> > -----
> >  102
> > (1 row)
> >
> > INSERT 0 1
> > Time: *314.237* ms
> >
> >
> > Performance gain* 655.50 %*
> >
> >
> > Comments and suggestions?
> >
> > Please let me know if you also want the test data I used for the INSERT
> test
> >
> > Thanks
> > Best regards
> > Muhammad Usama
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20190301/b1e64775/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: multiple_parsers.diff.zip
Type: application/zip
Size: 361939 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20190301/b1e64775/attachment-0001.zip>


More information about the pgpool-hackers mailing list