[pgpool-hackers: 3242] Some performance improvements for Pgpool-II

Sat Feb 16 01:12:42 JST 2019

Hi Ishii San

Can you have a look at the attached patch which tries to extract some
performance in the area of query parsing and query analysis for routing
decisions. Most of the performance gains from the changes in the patch can
be observed in large data INSERT statements.

Patch contains the following changes
==========
1-- The idea here is since Pgpool-II only needs a very little information
about the queries especially for the insert queries to decide where it
needs to send the query,
for example: For the INSERT queries we only need the type of query and the
relation name.
But since the parser we use in Pgpool-II is taken from PostgreSQL source
which parses the complete query including the value lists ( which is not
required by Pgpool).
This parsing of value part seems very harmless in small statements but in
case of INSERTs with lots of column values and large data in each value
item, this becomes significant.
So this patch adds a smaller bison grammar rule to short circuit the INSERT
statement parsing when it gets the enough information required for
Pgpool-II.

2-- The patch also re-arranges some of the if statements in
pool_where_to_send() function and tries to make sure the pattern_compare
and pool_has_function_call calls should only be made when they are
absolutely necessary.

3--Another thing this patch does is, it tries to save the raw_parser()
calls in case of un-recognised queries. Instead of invoking the parser of
"dummy read" and "dummy write" queries in case of syntax error in original
query, the patch adds the functions to get pre-built parse_trees for these
dummy queries.

4-- strlen() call is removed from scanner_init() function and is passed in
as an argument to it, and the reason is we already have the query length in
most cases before invoking the parser so why waste CPU cycles on it. Again
this becomes significant in case of large query strings.

Finally the patch tries to remove the unnecessary calls of
pool_is_likely_select()

As mentioned above the area of improvements in this patch are mostly around
writing queries and for the testing purpose I used a INSERT query with
large binary insert data and I am getting a very huge performance gains
with this patch

*Current Master Branch*
================
usama=# \i sample.sql
 id
-----
 104
(1 row)

INSERT 0 1
Time: *2059.807* ms (00:02.060)

*WITH PATCH*
===============
usama=# \i sample.sql
 id
-----
 102
(1 row)

INSERT 0 1
Time: *314.237* ms

Performance gain* 655.50 %*

Comments and suggestions?

Please let me know if you also want the test data I used for the INSERT test

Thanks
Best regards
Muhammad Usama
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20190215/1c51d953/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: some_performance_improvements.diff
Type: application/octet-stream
Size: 3664770 bytes
Desc: not available
URL: <http://www.sraoss.jp/pipermail/pgpool-hackers/attachments/20190215/1c51d953/attachment-0001.obj>