[pgpool-general: 2355] Re: Problems with recovered master node.

Tatsuo Ishii ishii at postgresql.org
Fri Dec 20 16:01:07 JST 2013


I see now the source of the problem:

> 2013-12-19 20:35:16 LOG:   pid 4328: statement: BEGIN; SET TRANSACTION
> ISOLATION LEVEL READ COMMITTED

pgpool-II cannot handle "multi statement queries" (multiple SQL
statement in single line like this). If pgpool-II finds such a query,
it sends to the primary node (in your case node 1) regardless the
contents of the query. So it sends "BEGIN" to node 1, while it does
not send to node 0. Later on "ROLLBACK" is sent to both node 0 and
1. Of course there's no explicit transaction is running on node 0, it
fails.

If such a query is generated by the flame work, There's not much
which pgpool-II can do...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> I actually don't know too  much about the internals of the application.
> And the problem is when the application is trying to start a connection.
> (The RollBack are probably internals of sqlalchemy, I really don't know)
> The only thing I know is that they are using sqlalchemy and flask.
> 
> Here is the pgppol log about the error:
> 2013-12-19 20:32:29 LOG:   pid 31014: received smart shutdown request
> 2013-12-19 20:32:30 LOG:   pid 4295: pgpool-II successfully started.
> version 3.3.2 (tokakiboshi)
> 2013-12-19 20:32:30 LOG:   pid 4295: find_primary_node: primary node id is 1
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SET DATESTYLE TO 'ISO'
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 0 backend pid: 5382
> statement: BEGIN
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: BEGIN
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 2 backend pid: 16458
> statement: BEGIN
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 0 backend pid: 5382
> statement: SET DATESTYLE TO 'ISO'
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: SET DATESTYLE TO 'ISO'
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 2 backend pid: 16458
> statement: SET DATESTYLE TO 'ISO'
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: COMMIT
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 2 backend pid: 16458
> statement: COMMIT
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 0 backend pid: 5382
> statement: COMMIT
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SHOW client_encoding
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: SHOW client_encoding
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SHOW
> default_transaction_isolation
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: SHOW default_transaction_isolation
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: BEGIN; SET TRANSACTION
> ISOLATION LEVEL READ COMMITTED
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: select version()
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: select version()
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: select current_schema()
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: select current_schema()
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: show transaction
> isolation level
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: show transaction isolation level
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SELECT CAST('test plain
> returns' AS VARCHAR(60)) AS anon_1
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: SELECT CAST('test
> unicode returns' AS VARCHAR(60)) AS anon_1
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
> 2013-12-19 20:35:16 LOG:   pid 4328: statement: ROLLBACK
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 1 backend pid: 9744
> statement: ROLLBACK
> 2013-12-19 20:35:16 LOG:   pid 4328: DB node id: 0 backend pid: 5382
> statement: ROLLBACK
> 2013-12-19 20:35:16 LOG:   pid 4328: pool_send_and_wait: Error or notice
> message from backend: : DB node id: 0 backend pid: 5382 statement:
> ROLLBACK message: there is no transaction in progress
> 2013-12-19 20:35:16 ERROR: pid 4328: read_kind_from_backend: 1 th kind C
> does not match with master or majority connection kind N
> 2013-12-19 20:35:16 ERROR: pid 4328: kind mismatch among backends.
> Possible last query was: "ROLLBACK" kind details are: 0[N: there is no
> transaction in progress] 1[C]
> 2013-12-19 20:35:16 LOG:   pid 4328: do_child: exits with status 1 due
> to error
> 
> And here is the stack trace:
> 2013-12-19 20:35:25,080 - application.flaskapp - ERROR - kind mismatch
> among backends. Possible last query was: "ROLLBACK" kind details are:
> 0[N: there is no transaction in progress] 1[C]
> HINT:  check data consistency among db nodes
> ERROR:  kind mismatch among backends. Possible last query was:
> "ROLLBACK" kind details are: 0[N: there is no transaction in progress] 1[C]
> HINT:  check data consistency among db nodes
> Traceback (most recent call last):
>   File
> "/usr/lib/python2.6/site-packages/application/server/flaskapp.py", line
> 34, in __init__
>     self._init_db()
>   File "/usr/lib/python2.6/site-packages/application/server/orgtool.py",
> line 47, in _init_db
>     self.db = ApplicationMutableDb(self.config)
>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
> 1014, in __init__
>     ApplicationAppendDb.__init__(self, *args, **kwargs)
>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
> 937, in __init__
>     ApplicationQueryDb.__init__(self, *args, **kwargs)
>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
> 234, in __init__
>     BaseDb.__init__(self, *args, **kwargs)
>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
> 192, in __init__
>     self.db_conn()
>   File "/usr/lib/python2.6/site-packages/application/server/db.py", line
> 216, in db_conn
>     self.conn = self.engine.connect()
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/base.py",
> line 2322, in connect
>     return self._connection_cls(self, **kwargs)
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/base.py",
> line 872, in __init__
>     self.__connection = connection or engine.raw_connection()
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/base.py",
> line 2408, in raw_connection
>     return self.pool.unique_connection()
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 169,
> in unique_connection
>     return _ConnectionFairy(self).checkout()
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 371,
> in __init__
>     rec = self._connection_record = pool._do_get()
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 758,
> in _do_get
>     return self._create_connection()
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 174,
> in _create_connection
>     return _ConnectionRecord(self)
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/pool.py", line 259,
> in __init__
>     pool.dispatch.first_connect.exec_once(self.connection, self)
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/event.py", line 265,
> in exec_once
>     self(*args, **kw)
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/event.py", line 274,
> in __call__
>     fn(*args, **kw)
>   File
> "/usr/lib/python2.6/site-packages/sqlalchemy/engine/strategies.py", line
> 166, in first_connect
>     dialect.initialize(c)
>   File
> "/usr/lib/python2.6/site-packages/sqlalchemy/dialects/postgresql/base.py",
> line 977, in initialize
>     super(PGDialect, self).initialize(connection)
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/default.py",
> line 187, in initialize
>     self.do_rollback(connection.connection)
>   File "/usr/lib/python2.6/site-packages/sqlalchemy/engine/default.py",
> line 299, in do_rollback
>     connection.rollback()
> InternalError: kind mismatch among backends. Possible last query was:
> "ROLLBACK" kind details are: 0[N: there is no transaction in progress] 1[C]
> HINT:  check data consistency among db nodes
> ERROR:  kind mismatch among backends. Possible last query was:
> "ROLLBACK" kind details are: 0[N: there is no transaction in progress] 1[C]
> HINT:  check data consistency among db nodes
> 
> 2013-12-19 20:35:25,085 - application - ERROR - Action failed: 500:
> Internal Server Error
> Traceback (most recent call last):
>   File "./orgtool", line 65, in main
>     org = self._get_orgtool(options.confbase)
>   File "./orgtool", line 19, in _get_orgtool
>     org = orgtool.OrgTool(confbase)
>   File "/usr/lib/python2.6/site-packages/application/server/orgtool.py",
> line 41, in __init__
>     BaseServer.__init__(self, environ, os.path.join(confbase,
> ".application"))
>   File
> "/usr/lib/python2.6/site-packages/application/server/flaskapp.py", line
> 37, in __init__
>     raise wzex.InternalServerError("DATABASE CONNECTION FAILED")
> 
> On 19/12/13 17:28, Tatsuo Ishii wrote:
>>> Hi.
>>>
>>> It wasn't installed, but I installed it and had the same result.
>> Can you show the PostgreSQL around "ROLLBACK"?
>>
>>> By the way, to install it I just need to execute this right?
>>> CREATE EXTENSION pgpool_regclass;
>> Yes.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>> On 18/12/13 18:17, Tatsuo Ishii wrote:
>>>> Have you installed pgpool_regclass?
>>>>
>>>> Best regards,
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>> English: http://www.sraoss.co.jp/index_en.php
>>>> Japanese: http://www.sraoss.co.jp
>>>>
>>>>> Hi everyone.
>>>>>
>>>>> I'm having an issue with pgpool.
>>>>> First of all my configuration:
>>>>> 1 Server with pgpool-II version 3.3.2 (tokakiboshi)
>>>>> 3 Servers with:
>>>>> postgresql93-contrib-9.3.2-1PGDG.rhel6.x86_64
>>>>> postgresql93-devel-9.3.2-1PGDG.rhel6.x86_64
>>>>> postgresql93-libs-9.3.2-1PGDG.rhel6.x86_64
>>>>> postgresql93-9.3.2-1PGDG.rhel6.x86_64
>>>>> postgresql93-server-9.3.2-1PGDG.rhel6.x86_64
>>>>>
>>>>> All 4 servers are RHEL 6 machines. (Amazon AMI's actually)
>>>>>
>>>>> Pgpool is configured to have Master/Slave Nodes with replication
>>>>> and Load Balancing.
>>>>>
>>>>> #------------------------------------------------------------------------------
>>>>> # LOAD BALANCING MODE
>>>>> #------------------------------------------------------------------------------
>>>>>
>>>>> load_balance_mode = on
>>>>>                                    # Activate load balancing mode
>>>>>                                    # (change requires restart)
>>>>> ignore_leading_white_space = on
>>>>>                                    # Ignore leading white spaces of each
>>>>> query
>>>>> white_function_list = ''
>>>>>                                    # Comma separated list of function names
>>>>>                                    # that don't write to database
>>>>>                                    # Regexp are accepted
>>>>> black_function_list = 'nextval,setval'
>>>>>                                    # Comma separated list of function names
>>>>>                                    # that write to database
>>>>>                                    # Regexp are accepted
>>>>>
>>>>>
>>>>> #------------------------------------------------------------------------------
>>>>> # MASTER/SLAVE MODE
>>>>> #------------------------------------------------------------------------------
>>>>>
>>>>> master_slave_mode = on
>>>>>                                    # Activate master/slave mode
>>>>>                                    # (change requires restart)
>>>>> master_slave_sub_mode = 'stream'
>>>>>                                    # Master/slave sub mode
>>>>>                                    # Valid values are combinations slony or
>>>>>                                    # stream. Default is slony.
>>>>>                                    # (change requires restart)
>>>>>
>>>>> # - Streaming -
>>>>>
>>>>> sr_check_period = 1
>>>>>                                    # Streaming replication check period
>>>>>                                    # Disabled (0) by default
>>>>> sr_check_user = 'postgres'
>>>>>                                    # Streaming replication check user
>>>>>                                    # This is necessary even if you disable
>>>>>                                    # streaming replication delay check with
>>>>>                                    # sr_check_period = 0
>>>>> sr_check_password = ''
>>>>>                                    # Password for streaming replication
>>>>> check user
>>>>> delay_threshold = 1
>>>>>                                    # Threshold before not dispatching
>>>>> query to standby node
>>>>>                                    # Unit is in bytes
>>>>>                                    # Disabled (0) by default
>>>>>
>>>>> # - Special commands -
>>>>>
>>>>> follow_master_command = '/bin/follow_master.sh %d %h %D %m %H %R'
>>>>>                                    # Executes this command after master
>>>>> failover
>>>>>                                    # Special values:
>>>>>                                    #   %d = node id
>>>>>                                    #   %h = host name
>>>>>                                    #   %p = port number
>>>>>                                    #   %D = database cluster path
>>>>>                                    #   %m = new master node id
>>>>>                                    #   %H = hostname of the new master node
>>>>>                                    #   %M = old master node id
>>>>>                                    #   %P = old primary node id
>>>>>                                    #   %r = new master port number
>>>>>                                    #   %R = new master database cluster path
>>>>>                                    #   %% = '%' character
>>>>>
>>>>>
>>>>> #All three nodes are configured like this
>>>>> backend_hostnamen = 'ip_node_n'
>>>>> backend_portn = 5432
>>>>> backend_weightn = 0
>>>>> backend_data_directoryn = '/var/lib/pgsql/9.3/data/'
>>>>> backend_flagn = 'ALLOW_TO_FAILOVER'
>>>>>
>>>>> I'm using repmgr to handle the replication in the nodes.
>>>>>
>>>>>
>>>>> My problem occurs when I kill the master node, everything works as
>>>>> expected and a slave takes it's place.
>>>>> My application continues working with no major issues until I decide to
>>>>> recover the dead master node.
>>>>>
>>>>> After that pgool starts sending this message:
>>>>> 2013-12-18 17:19:07 LOG:   pid 30801: do_child: exits with status 1 due
>>>>> to error
>>>>> 2013-12-18 17:19:10 LOG:   pid 30802: pool_send_and_wait: Error or
>>>>> notice message from backend: : DB node id: 0 backend pid: 27585
>>>>> statement: ROLLBACK message: there is no transaction in progress
>>>>> 2013-12-18 17:19:10 ERROR: pid 30802: read_kind_from_backend: 1 th kind
>>>>> C does not match with master or majority connection kind N
>>>>> 2013-12-18 17:19:10 ERROR: pid 30802: kind mismatch among backends.
>>>>> Possible last query was: "ROLLBACK" kind details are: 0[N: there is no
>>>>> transaction in progress] 1[C]
>>>>>
>>>>> And I identified that the problem is that pgpool is sending a rollback
>>>>> to the old master node [node 0] (Something illegal as far as I know) and
>>>>> then all the transaction is cancelled and my application fails.
>>>>> If I promote the old master again, then the application works with no
>>>>> problem.
>>>>>
>>>>> Anyone has an idea why is this ocurring?
>>>>>
>>>>> Regards!
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>>
>>>>> Be aware that if you reply directly to this message, your reply may not be 
>>>>> secure. Do not send us communications that contain unencrypted confidential 
>>>>> information such as passwords, account numbers, or Social Security numbers.
>>>>>
>>>>> This message, including any attachments, is for the sole use of the 
>>>>> intended recipient(s) and may contain confidential and privileged 
>>>>> information. If you are not the intended recipient, please destroy all 
>>>>> copies of this message and any attachments. In addition, please notify 
>>>>> Gazzang immediately by email to info at gazzang.com.
>>>>> _______________________________________________
>>>>> pgpool-general mailing list
>>>>> pgpool-general at pgpool.net
>>>>> http://www.pgpool.net/mailman/listinfo/pgpool-general
>>>
>>> -- 
>>>
>>>
>>> Be aware that if you reply directly to this message, your reply may not be 
>>> secure. Do not send us communications that contain unencrypted confidential 
>>> information such as passwords, account numbers, or Social Security numbers.
>>>
>>> This message, including any attachments, is for the sole use of the 
>>> intended recipient(s) and may contain confidential and privileged 
>>> information. If you are not the intended recipient, please destroy all 
>>> copies of this message and any attachments. In addition, please notify 
>>> Gazzang immediately by email to info at gazzang.com.
> 
> 
> -- 
> 
> 
> Be aware that if you reply directly to this message, your reply may not be 
> secure. Do not send us communications that contain unencrypted confidential 
> information such as passwords, account numbers, or Social Security numbers.
> 
> This message, including any attachments, is for the sole use of the 
> intended recipient(s) and may contain confidential and privileged 
> information. If you are not the intended recipient, please destroy all 
> copies of this message and any attachments. In addition, please notify 
> Gazzang immediately by email to info at gazzang.com.


More information about the pgpool-general mailing list