<div dir="ltr"><div>Hi Ishii-San</div><div><br></div><div>Thanks for the feedback, I have been exploring the options to generate two</div><div>parsers from one gram.y file but unfortunately can't figure out any clean and easy way to</div><div>do that. The problem is the bison assembler does not support the preprocessors</div><div>off the shelf and I am not sure about how we can achieve that without preprocessor</div><div>using the scripts.</div><div><br></div><div>However, after analysing the end result of using one gram.y file for generating</div><div>two parser I think it would be more hectic to maintain and especially merge the gram.y</div><div>file in that case. Since adding the code in the existing gram.y file will increase the number</div><div>of changes in our gram.y file from that of PostgreSQL, and effectively that mean at every</div><div>PostgreSQL release we would require a bigger merging effort.</div><div><br></div><div>And if we go down the path of having two gram files as in the original patch I shared,</div><div>despite the downside of having another parser file to maintain, the approach has its own</div><div>benefits as well. Like it would be easier to track down the errors and problems.</div><div><br></div><div>Also the idea of having the minimal parser for master-slave mode is to speed up the queries by</div><div>quickly analysing if we we need to parse the complete query or can we just do with the minimal</div><div>information. Now my motivation of adding the minimal parser was that, in a perfect scenario we should</div><div>only parse the read statements and short-circuit the parsing of almost all write queries. So although the</div><div>patch does that short-circuiting only with INSERT statements, but for future it would be easier to make</div><div>enhancements in the minimal parser if we go with 2 gram files approach.</div><div><br></div><div>Another thing is since the minimal parser is based on the philosophy that only queries that can be load-balanced</div><div>must be parsed completely and rest of the statements need to be routed to primary anyways, and can</div><div>be consider as default write statements no matter the actual statement. So effectively we might</div><div>not need to merge the minimal parser with PostgreSQL's parser in most of the cases and the minimal parser</div><div>would not add too much maintenance overhead.</div><div><br></div><div>What are your thought and suggestions on this?</div><div><br></div><div><br></div><div>Thanks</div><div>Best Regards</div><div>Muhammad Usama</div><div><br></div><div><br></div><div>to maintain </div><div><br></div><div><br></div><div> </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 8, 2019 at 7:40 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp">ishii@sraoss.co.jp</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Usama,<br>
<br>
Thank you for the new patch. The essential idea to have two parsers<br>
seems to be good. However, I have a concern of maintenance cost for<br>
those two distinct parsers. It seems the difference between two<br>
parsers are subtle. So why don't you have just 1 original parser file<br>
(gram.y) then generate the other automatically by using a script? Or<br>
you could have 1 gram.y file with some ifdef's to differentiate those<br>
two parsers. What do you think?<br>
<br>
Best regards,<br>
--<br>
Tatsuo Ishii<br>
SRA OSS, Inc. Japan<br>
English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>
Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
<br>
> Hi Ishii-San<br>
> <br>
> Thanks for looking into the patch and your observation is correct, the<br>
> patch I sent would break<br>
> the timestamp overwriting functionality,<br>
> <br>
> So here is another go at this, See the new attached patch, which uses two<br>
> parsers,<br>
> 1- src/parser/gram.y which is the standard parser brought in from PG code<br>
> 2- src/parser/gram_minimal.y modified parser that short-circuits the INSERT<br>
> and UPDATE statements<br>
> <br>
> The idea here is when replication mode is enabled we use the standard<br>
> parser and parse everything since<br>
> we need the information in case of timestamp or other things. but when we<br>
> are operating in the master-slave<br>
> mode we don't need any extra information for WRITE queries, since we will<br>
> always be sending them to the primary<br>
> anyways, so the newly added minimal parser which short-circuits the INSERT<br>
> and UPDATE queries<br>
> (can also be extended to short-circuit all the write queries like CREATE<br>
> and ALTER where ever we can)<br>
> and try to enhance the write query performance of the Pgpool-II,<br>
> <br>
> The parser selection is made in raw_parser() function and is controlled by<br>
> the "bool use_minimal" argument<br>
> and currently in the patch we always use the minimal parser whenever the<br>
> replication mode is disabled in the<br>
> pgpool.conf<br>
> <br>
> This is the very radical change in the parsing function of Pgpool-II, but<br>
> it can provide us with the platform to minimise<br>
> the parsing overhead of Pgpool-II for master-slave mode.<br>
> <br>
> Your thoughts and comments..<br>
> <br>
> Thanks<br>
> Best Regards<br>
> Muhammad Usama<br>
> <br>
> On Wed, Feb 27, 2019 at 3:58 AM Tatsuo Ishii <<a href="mailto:ishii@sraoss.co.jp" target="_blank">ishii@sraoss.co.jp</a>> wrote:<br>
> <br>
>> Hi Usama,<br>
>><br>
>> I think this patch breaks replication mode because it throws away<br>
>> information needed for time stamp rewriting. What do you think?<br>
>><br>
>> Best regards,<br>
>> --<br>
>> Tatsuo Ishii<br>
>> SRA OSS, Inc. Japan<br>
>> English: <a href="http://www.sraoss.co.jp/index_en.php" rel="noreferrer" target="_blank">http://www.sraoss.co.jp/index_en.php</a><br>
>> Japanese:<a href="http://www.sraoss.co.jp" rel="noreferrer" target="_blank">http://www.sraoss.co.jp</a><br>
>><br>
>> > Hi Ishii San<br>
>> ><br>
>> > Can you have a look at the attached patch which tries to extract some<br>
>> > performance in the area of query parsing and query analysis for routing<br>
>> > decisions. Most of the performance gains from the changes in the patch<br>
>> can<br>
>> > be observed in large data INSERT statements.<br>
>> ><br>
>> > Patch contains the following changes<br>
>> > ==========<br>
>> > 1-- The idea here is since Pgpool-II only needs a very little information<br>
>> > about the queries especially for the insert queries to decide where it<br>
>> > needs to send the query,<br>
>> > for example: For the INSERT queries we only need the type of query and<br>
>> the<br>
>> > relation name.<br>
>> > But since the parser we use in Pgpool-II is taken from PostgreSQL source<br>
>> > which parses the complete query including the value lists ( which is not<br>
>> > required by Pgpool).<br>
>> > This parsing of value part seems very harmless in small statements but in<br>
>> > case of INSERTs with lots of column values and large data in each value<br>
>> > item, this becomes significant.<br>
>> > So this patch adds a smaller bison grammar rule to short circuit the<br>
>> INSERT<br>
>> > statement parsing when it gets the enough information required for<br>
>> > Pgpool-II.<br>
>> ><br>
>> > 2-- The patch also re-arranges some of the if statements in<br>
>> > pool_where_to_send() function and tries to make sure the pattern_compare<br>
>> > and pool_has_function_call calls should only be made when they are<br>
>> > absolutely necessary.<br>
>> ><br>
>> > 3--Another thing this patch does is, it tries to save the raw_parser()<br>
>> > calls in case of un-recognised queries. Instead of invoking the parser of<br>
>> > "dummy read" and "dummy write" queries in case of syntax error in<br>
>> original<br>
>> > query, the patch adds the functions to get pre-built parse_trees for<br>
>> these<br>
>> > dummy queries.<br>
>> ><br>
>> > 4-- strlen() call is removed from scanner_init() function and is passed<br>
>> in<br>
>> > as an argument to it, and the reason is we already have the query length<br>
>> in<br>
>> > most cases before invoking the parser so why waste CPU cycles on it.<br>
>> Again<br>
>> > this becomes significant in case of large query strings.<br>
>> ><br>
>> > Finally the patch tries to remove the unnecessary calls of<br>
>> > pool_is_likely_select()<br>
>> ><br>
>> > As mentioned above the area of improvements in this patch are mostly<br>
>> around<br>
>> > writing queries and for the testing purpose I used a INSERT query with<br>
>> > large binary insert data and I am getting a very huge performance gains<br>
>> > with this patch<br>
>> ><br>
>> > *Current Master Branch*<br>
>> > ================<br>
>> > usama=# \i sample.sql<br>
>> > id<br>
>> > -----<br>
>> > 104<br>
>> > (1 row)<br>
>> ><br>
>> > INSERT 0 1<br>
>> > Time: *2059.807* ms (00:02.060)<br>
>> ><br>
>> > *WITH PATCH*<br>
>> > ===============<br>
>> > usama=# \i sample.sql<br>
>> > id<br>
>> > -----<br>
>> > 102<br>
>> > (1 row)<br>
>> ><br>
>> > INSERT 0 1<br>
>> > Time: *314.237* ms<br>
>> ><br>
>> ><br>
>> > Performance gain* 655.50 %*<br>
>> ><br>
>> ><br>
>> > Comments and suggestions?<br>
>> ><br>
>> > Please let me know if you also want the test data I used for the INSERT<br>
>> test<br>
>> ><br>
>> > Thanks<br>
>> > Best regards<br>
>> > Muhammad Usama<br>
>><br>
</blockquote></div></div>