[pgpool-hackers: 3357] Re: Some performance improvements for Pgpool-II

Thu Aug 1 06:11:15 JST 2019

Hi Ishii-San

While rebasing the multiple parser patch after the PostgreSQL parser import
commit, I was thinking of the ways
to minimize the maintenance overhead and merging overheads for the new
minimal parser as suggested by you
upthread.

So in the newer attached version of the patch, instead of maintaining two
separate gram files for standard and
minimal parsers I have changed the original gram.y file to gram_template.y
and added a new make target in
src/parser/ Makefile *'make generate_parsrers'* to generate (gram_minimal.y
and gram_standard.y) grammar files
for minimal and standard parsers.

Makefile uses the sunifdef (http://www.linuxcertif.com/man/1/sunifdef/)
utility to generate the grammar files from
gram_template.y file, So I have also made the appropriate changes to
Pgpool-II configure so that sunifdef utility path
can be configured. After the patch, we can also give the explicit sunifdef
patch to configure using the *--with-sunifdef*
switch if the utility is not present at the standard location.

The patch also contains the README file in src/parser/ directory which has
all the information about what needs
to be done to generate the grammar files and about importing the PostgreSQL
parser.

Also, note that sunifdef will only be required when we want to generate the
grammar files, which would be only after
importing the new PostgreSQL parser or when we want to make some explicit
grammar changes. And for the normal
build of Pgpool-II, we would not need sunifdef.

Thoughts and comments

P.S Sorry for the very huge patch because of autogenerated files.

Thanks
Best regards
Muhammad Usama

On Thu, May 2, 2019 at 4:11 PM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:

> Hi Usama,
>
> > Hi Ishii-San
> >
> > Thanks for the feedback, I have been exploring the options to generate
> two
> > parsers from one gram.y file but unfortunately can't figure out any clean
> > and easy way to
> > do that. The problem is the bison assembler does not support the
> > preprocessors
> > off the shelf and I am not sure about how we can achieve that without
> > preprocessor
> > using the scripts.
> >
> > However, after analysing the end result of using one gram.y file for
> > generating
> > two parser I think it would be more hectic to maintain and especially
> merge
> > the gram.y
> > file in that case. Since adding the code in the existing gram.y file will
> > increase the number
> > of changes in our gram.y file from that of PostgreSQL, and effectively
> that
> > mean at every
> > PostgreSQL release we would require a bigger merging effort.
> >
> > And if we go down the path of having two gram files as in the original
> > patch I shared,
> > despite the downside of having another parser file to maintain, the
> > approach has its own
> > benefits as well. Like it would be easier to track down the errors and
> > problems.
> >
> > Also the idea of having the minimal parser for master-slave mode is to
> > speed up the queries by
> > quickly analysing if we we need to parse the complete query or can we
> just
> > do with the minimal
> > information. Now my motivation of adding the minimal parser was that, in
> a
> > perfect scenario we should
> > only parse the read statements and short-circuit the parsing of almost
> all
> > write queries. So although the
> > patch does that short-circuiting only with INSERT statements, but for
> > future it would be easier to make
> > enhancements in the minimal parser if we go with 2 gram files approach.
> >
> > Another thing is since the minimal parser is based on the philosophy that
> > only queries that can be load-balanced
> > must be parsed completely and rest of the statements need to be routed to
> > primary anyways, and can
> > be consider as default write statements no matter the actual statement.
> So
> > effectively we might
> > not need to merge the minimal parser with PostgreSQL's parser in most of
> > the cases and the minimal parser
> > would not add too much maintenance overhead.
>
> But SET statement needs special treatment and needs full parser for
> it. See is_set_transaction_serializable (in
> src/context/pool_query_context.c) for an example. Same thing can be
> said to LOCK, SAVEPOINT, BEGIN (START TRANSACTION), and PREPARE.
>
> Also for DML, at least the target table info needs to be analyzed
> because in memory query cache needs the info.
>
> > What are your thought and suggestions on this?
>
> If we go for two gram.y approach, probably we want to minimize the
> diff to create minimal parser from the full parser in order to save
> the work to create minimal parser, no?
>
> > Thanks
> > Best Regards
> > Muhammad Usama
> >
> >
> > to maintain
> >
> >
> >
> >
> > On Fri, Mar 8, 2019 at 7:40 AM Tatsuo Ishii <ishii at sraoss.co.jp> wrote:
> >
> >> Hi Usama,
> >>
> >> Thank you for the new patch. The essential idea to have two parsers
> >> seems to be good. However, I have a concern of maintenance cost for
> >> those two distinct parsers. It seems the difference between two
> >> parsers are subtle. So why don't you have just 1 original parser file
> >> (gram.y) then generate the other automatically by using a script? Or
> >> you could have 1 gram.y file with some ifdef's to differentiate those
> >> two parsers. What do you think?
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >> > Hi Ishii-San
> >> >
> >> > Thanks for looking into the patch and your observation is correct, the
> >> > patch I sent would break
> >> > the timestamp overwriting functionality,
> >> >
> >> > So here is another go at this, See the new attached patch, which uses
> two
> >> > parsers,
> >> > 1- src/parser/gram.y which is the standard parser brought in from PG
> code
> >> > 2- src/parser/gram_minimal.y modified parser that short-circuits the
> >> INSERT
> >> > and UPDATE statements
> >> >
> >> > The idea here is when replication mode is enabled we use the standard
> >> > parser and parse everything since
> >> > we need the information in case of timestamp or other things. but
> when we
> >> > are operating in the master-slave
> >> > mode we don't need any extra information for WRITE queries, since we
> will
> >> > always be sending them to the primary
> >> > anyways, so the newly added minimal parser which short-circuits the
> >> INSERT
> >> > and UPDATE queries
> >> > (can also be extended to short-circuit all the write queries like
> CREATE
> >> > and ALTER where ever we can)
> >> > and try to enhance the write query performance of the Pgpool-II,
> >> >
> >> > The parser selection is made in raw_parser() function and is
> controlled
> >> by
> >> > the "bool use_minimal" argument
> >> > and currently in the patch we always use the minimal parser whenever
> the
> >> > replication mode is disabled in the
> >> > pgpool.conf
> >> >
> >> > This is the very radical change in the parsing function of Pgpool-II,
> but
> >> > it can provide us with the platform to minimise
> >> > the parsing overhead of Pgpool-II for master-slave mode.
> >> >
> >> > Your thoughts and comments..
> >> >
> >> > Thanks
> >> > Best Regards
> >> > Muhammad Usama
> >> >
> >> > On Wed, Feb 27, 2019 at 3:58 AM Tatsuo Ishii <ishii at sraoss.co.jp>
> wrote:
> >> >
> >> >> Hi Usama,
> >> >>
> >> >> I think this patch breaks replication mode because it throws away
> >> >> information needed for time stamp rewriting. What do you think?
> >> >>
> >> >> Best regards,
> >> >> --
> >> >> Tatsuo Ishii
> >> >> SRA OSS, Inc. Japan
> >> >> English: http://www.sraoss.co.jp/index_en.php
> >> >> Japanese:http://www.sraoss.co.jp
> >> >>
> >> >> > Hi Ishii San
> >> >> >
> >> >> > Can you have a look at the attached patch which tries to extract
> some
> >> >> > performance in the area of query parsing and query analysis for
> >> routing
> >> >> > decisions. Most of the performance gains from the changes in the
> patch
> >> >> can
> >> >> > be observed in large data INSERT statements.
> >> >> >
> >> >> > Patch contains the following changes
> >> >> > ==========
> >> >> > 1-- The idea here is since Pgpool-II only needs a very little
> >> information
> >> >> > about the queries especially for the insert queries to decide
> where it
> >> >> > needs to send the query,
> >> >> > for example: For the INSERT queries we only need the type of query
> and
> >> >> the
> >> >> > relation name.
> >> >> > But since the parser we use in Pgpool-II is taken from PostgreSQL
> >> source
> >> >> > which parses the complete query including the value lists ( which
> is
> >> not
> >> >> > required by Pgpool).
> >> >> > This parsing of value part seems very harmless in small statements
> >> but in
> >> >> > case of INSERTs with lots of column values and large data in each
> >> value
> >> >> > item, this becomes significant.
> >> >> > So this patch adds a smaller bison grammar rule to short circuit
> the
> >> >> INSERT
> >> >> > statement parsing when it gets the enough information required for
> >> >> > Pgpool-II.
> >> >> >
> >> >> > 2-- The patch also re-arranges some of the if statements in
> >> >> > pool_where_to_send() function and tries to make sure the
> >> pattern_compare
> >> >> > and pool_has_function_call calls should only be made when they are
> >> >> > absolutely necessary.
> >> >> >
> >> >> > 3--Another thing this patch does is, it tries to save the
> raw_parser()
> >> >> > calls in case of un-recognised queries. Instead of invoking the
> >> parser of
> >> >> > "dummy read" and "dummy write" queries in case of syntax error in
> >> >> original
> >> >> > query, the patch adds the functions to get pre-built parse_trees
> for
> >> >> these
> >> >> > dummy queries.
> >> >> >
> >> >> > 4-- strlen() call is removed from scanner_init() function and is
> >> passed
> >> >> in
> >> >> > as an argument to it, and the reason is we already have the query
> >> length
> >> >> in
> >> >> > most cases before invoking the parser so why waste CPU cycles on
> it.
> >> >> Again
> >> >> > this becomes significant in case of large query strings.
> >> >> >
> >> >> > Finally the patch tries to remove the unnecessary calls of
> >> >> > pool_is_likely_select()
> >> >> >
> >> >> > As mentioned above the area of improvements in this patch are
> mostly
> >> >> around
> >> >> > writing queries and for the testing purpose I used a INSERT query
> with
> >> >> > large binary insert data and I am getting a very huge performance
> >> gains
> >> >> > with this patch
> >> >> >
> >> >> > *Current Master Branch*
> >> >> > ================
> >> >> > usama=# \i sample.sql
> >> >> >  id
> >> >> > -----
> >> >> >  104
> >> >> > (1 row)
> >> >> >
> >> >> > INSERT 0 1
> >> >> > Time: *2059.807* ms (00:02.060)
> >> >> >
> >> >> > *WITH PATCH*
> >> >> > ===============
> >> >> > usama=# \i sample.sql
> >> >> >  id
> >> >> > -----
> >> >> >  102
> >> >> > (1 row)
> >> >> >
> >> >> > INSERT 0 1
> >> >> > Time: *314.237* ms
> >> >> >
> >> >> >
> >> >> > Performance gain* 655.50 %*
> >> >> >
> >> >> >
> >> >> > Comments and suggestions?
> >> >> >
> >> >> > Please let me know if you also want the test data I used for the
> >> INSERT
> >> >> test
> >> >> >
> >> >> > Thanks
> >> >> > Best regards
> >> >> > Muhammad Usama
> >> >>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20190801/5b755562/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: multiple_parsers_new.diff.zip
Type: application/zip
Size: 1126318 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20190801/5b755562/attachment-0001.zip>