Welcome to the pgpool-II tutorial. From here, you can learn how
to install, set up, and run parallel queries or do replication using
pgpool-II. We assume that you already know basic PostgreSQL operations,
! so please refer to the PostgreSQL documentation if necessary.
- Table of Contents
*************** so please refer to the PostgreSQL docume
*** 37,43 ****
! First, we must learn how to install, configure pgpool-II and
database nodes before using replication or parallel query.
--- 37,43 ----
! First, we must learn how to install and configure pgpool-II and
database nodes before using replication or parallel query.
*************** permission on the installation directory
*** 65,83 ****
/usr/local
directory.
Note: pgpool-II requires libpq library in PostgreSQL 7.4
! or later (version 3 protocol). If configure
script
! displays the following error message, libpq library may not be
installed, or it is not of version 3.
configure: error: libpq is not installed or libpq is old
! If the library is version 3, but above message is still displayed,
your libpq library is probably not recognized by
! configure
script.
! configure
script searches for libpq library under
! /usr/local/pgsql
libaray. If you have installed
! PostgreSQL to a directory other than /usr/local/pgsql
,
use --with-pgsql
, or --with-pgsql-includedir
and --with-pgsql-libdir
command line options when you
execute configure
.
--- 65,83 ----
/usr/local
directory.
Note: pgpool-II requires the libpq library from PostgreSQL 7.4
! or later (version 3 protocol). If the configure
script
! displays the following error message, the libpq library may not be
installed, or it is not of version 3.
configure: error: libpq is not installed or libpq is old
! If the library is version 3, but the above message is still displayed,
your libpq library is probably not recognized by
! the configure
script.
! The configure
script searches for the libpq library under
! /usr/local/pgsql
. If you have installed
! PostgreSQL in a directory other than /usr/local/pgsql
,
use --with-pgsql
, or --with-pgsql-includedir
and --with-pgsql-libdir
command line options when you
execute configure
.
*************** execute configure
.
*** 85,91 ****
pgpool-II configuration parameters are saved in
! pgpool.conf
file. The file is in "parameter =
value
" per line format. When you install pgpool-II,
pgpool.conf.sample
is automatically created. We recommend
copying and renaming it to pgpool.conf
, and edit it as
--- 85,91 ----
pgpool-II configuration parameters are saved in
! the pgpool.conf
file. The file contains one "parameter =
value
" pair per line. When you install pgpool-II,
pgpool.conf.sample
is automatically created. We recommend
copying and renaming it to pgpool.conf
, and editing it as
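Because the file is plain "parameter = value" lines, a setting can be read back with standard shell tools. The sketch below writes a two-line sample file (the path and values are illustrative only, not pgpool-II defaults) and extracts the port:

```shell
# Write a minimal sample in pgpool.conf's "parameter = value" format.
# The file path and values here are illustrative only.
cat > /tmp/pgpool.conf.demo <<'EOF'
listen_addresses = 'localhost'
port = 9999
EOF

# Extract the value of the "port" parameter with sed.
port=$(sed -n 's/^port *= *//p' /tmp/pgpool.conf.demo)
echo "$port"
```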
*************** port = 9999
*** 106,117 ****
! pgpool-II has an interface for administration purpose to retrieve
information on database nodes, shutdown pgpool-II, etc. via
network. To use PCP commands, user authentication is required. This
authentication is different from PostgreSQL's user authentication. A
! username and password need to be defined in pcp.conf
! file. In the file, a username and password are listed as a pair on
each line, and they are separated by a colon (:). Passwords are
encrypted in md5 hash format.
--- 106,117 ----
! pgpool-II has an interface for administrative purposes to retrieve
information on database nodes, shut down pgpool-II, etc. over the
network. To use PCP commands, user authentication is required. This
authentication is different from PostgreSQL's user authentication. A
! username and password need to be defined in the pcp.conf
! file. In the file, a username and password are listed as a pair on
each line, and they are separated by a colon (:). Passwords are
encrypted in md5 hash format.
*************** automatically created. We recommend copy
*** 123,135 ****
$ cp /usr/local/etc/pcp.conf.sample /usr/local/etc/pcp.conf
! To encrypt your password into md5 hash format, use pg_md5 command,
! which is installed as a part of pgpool-II
! executables. pg_md5
takes text as an command line
argument, and displays its md5-hashed text.
! For example, give "postgres" as the command line argument, at
! pg_md5
displays md5-hashed text to the standard
output.
--- 123,135 ----
$ cp /usr/local/etc/pcp.conf.sample /usr/local/etc/pcp.conf
! To encrypt your password into md5 hash format, use the pg_md5 command,
! which is installed as one of pgpool-II's
! executables. pg_md5
takes text as a command line
argument, and displays its md5-hashed text.
! For example, give "postgres" as the command line argument, and
! pg_md5
displays md5-hashed text on its standard
output.
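If pg_md5 is not at hand yet, the same digest can be produced with md5sum, since pg_md5 simply prints the plain MD5 hash of its argument as 32 hex digits. This sketch builds a pcp.conf-style line for the example password "postgres" (the username "postgres" is an assumption for illustration):

```shell
# md5-hash the password "postgres" the same way pg_md5 should
# (plain MD5 of the text, printed as 32 hex digits).
hash=$(printf '%s' postgres | md5sum | awk '{print $1}')

# A pcp.conf entry is "username:md5hash" on one line.
echo "postgres:$hash"
```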
*************** SELECT queries are equally distributed a
*** 189,195 ****
The above command, however, prints no log messages because pgpool
detaches the terminal. If you want to show pgpool log messages, you
! pass -n
option to pgpool command. pgpool-II is executed
as non-daemon process, and the terminal will not be detached.
--- 189,195 ----
The above command, however, prints no log messages because pgpool
detaches the terminal. If you want pgpool to show its log messages, you
! can pass the -n
option to the pgpool command so that pgpool-II is executed
as a non-daemon process, and the terminal will not be detached.
*************** $ pgpool -n &
*** 199,235 ****
The log messages are printed on the terminal, so the recommended
! options to use are like the following.
$ pgpool -n -d > /tmp/pgpool.log 2>&1 &
! -d
option enables debug messages to be generated.
The above command keeps appending log messages to /tmp/pgpool.log. If
you need to rotate log files, pass the logs to a external command
! which have log rotation function.
! For example, you can use rotatelogs coming with Apache2:
$ pgpool -n 2>&1 | /usr/local/apache2/bin/rotatelogs \
-l -f /var/log/pgpool/pgpool.log.%A 86400 &
! This will generate a log file named "pgpool.log.Thursday" everyday then
! rotate it 00:00 at midnight. Rotatelogs adds log to a file if it already
exists. To delete old log files before rotation, you could use cron:
55 23 * * * /usr/bin/find /var/log/pgpool -type f -mtime +5 -exec /bin/rm -f '{}' \;
! Please note that rotaelogs may exist as /usr/sbin/rotatelogs2 in some
distributions.
-f option generates a log file as soon as rotatelogs starts and is
! available apache2 2.2.9 or greater.
Also cronolog
helps you.
--- 199,235 ----
The log messages are printed on the terminal, so the recommended
! options to use are the following.
$ pgpool -n -d > /tmp/pgpool.log 2>&1 &
! The -d
option enables debug messages to be generated.
The above command keeps appending log messages to /tmp/pgpool.log. If
you need to rotate log files, pass the logs to an external command
! which has log rotation functionality.
! For example, you can use rotatelogs from Apache2:
$ pgpool -n 2>&1 | /usr/local/apache2/bin/rotatelogs \
-l -f /var/log/pgpool/pgpool.log.%A 86400 &
! This will generate a log file named "pgpool.log.Thursday" each day, then
! rotate it at midnight (00:00). rotatelogs appends logs to a file if it already
exists. To delete old log files before rotation, you could use cron:
55 23 * * * /usr/bin/find /var/log/pgpool -type f -mtime +5 -exec /bin/rm -f '{}' \;
! Please note that rotatelogs may exist as /usr/sbin/rotatelogs2 in some
distributions.
The -f option makes rotatelogs generate the log file as soon as it starts; it is
! available in Apache2 2.2.9 or later.
Also cronolog
can be used.
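The effect of the cron job above can be tried safely in a throwaway directory instead of /var/log/pgpool. The sketch below (assuming GNU touch and find) creates one stale and one fresh log file, then runs the same find invocation to remove only files older than 5 days:

```shell
# Simulate /var/log/pgpool in a temporary sandbox directory.
logdir=$(mktemp -d)
touch -d '10 days ago' "$logdir/pgpool.log.Monday"   # stale: older than 5 days
touch "$logdir/pgpool.log.Thursday"                  # fresh: today

# Same find invocation as the cron job, pointed at the sandbox.
/usr/bin/find "$logdir" -type f -mtime +5 -exec /bin/rm -f '{}' \;

ls "$logdir"   # only the fresh file should remain
```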
*************** $ pgpool -n 2>&1 | /usr/sbin/cronolog \
*** 238,249 ****
'/var/log/pgsql/%Y-%m-%d-pgpool.log' &
! To stop pgpool-II process, execute the following command.
$ pgpool stop
! If any client is still connected, pgpool-II waits for them to
! disconnect, and then terminate itself. Execute the following command
instead if you want to shutdown pgpool-II forcibly.
$ pgpool -m fast stop
--- 238,249 ----
'/var/log/pgsql/%Y-%m-%d-pgpool.log' &
! To stop pgpool-II, execute the following command.
$ pgpool stop
! If any client is still connected, pgpool-II waits for it to
! disconnect, and then terminates itself. Run the following command
instead if you want to shut down pgpool-II forcibly.
$ pgpool -m fast stop
*************** nodes.
*** 256,262 ****
In this section, we'll use three database nodes, which we have
already set up in section "1. Let's Begin!", and
takes you step by step to create a database replication system. Sample
! data to be replicated will be generated by pgbench benchmark
program.
--- 256,262 ----
In this section, we'll use three database nodes, which we have
already set up in section "1. Let's Begin!", and
take you step by step through creating a database replication system. Sample
! data to be replicated will be generated by the pgbench benchmark
program.
*************** href="#start-shutdown">Starting/Stopping
*** 286,296 ****
After configuring pgpool.conf
and restarting
pgpool-II, let's try the actual replication and see if everything is
! working OK.
First, we need to create a database to be replicated. We will name
it "bench_replication". This database needs to be created on all the
! nodes. Use createdb
commands through pgpool-II, and the
database will be created on all the nodes.
$ createdb -p 9999 bench_replication
--- 286,296 ----
After configuring pgpool.conf
and restarting
pgpool-II, let's try the actual replication and see if everything is
! working.
First, we need to create a database to be replicated. We will name
it "bench_replication". This database needs to be created on all the
! nodes. Use the createdb
command through pgpool-II, and the
database will be created on all the nodes.
$ createdb -p 9999 bench_replication
*************** branches, tellers, accounts, and history
*** 344,352 ****
! Data within the different range is stored in two or more data base nodes in a parallel Query. This is called a partitioning.
Moreover you could replicate some of tables among database nodes even
! in the parallel query mode.
To enable parallel query in pgpool-II, you must set up another
--- 344,352 ----
! In parallel query mode, data within different ranges is stored across two or more database nodes. This is called partitioning.
Moreover, you can replicate some tables among database nodes even
! in parallel query mode.
To enable parallel query in pgpool-II, you must set up another
*************** pgbench again to create sample data.
*** 368,374 ****
parallel_mode = true
! Setting paralle_mode
to true does not start parallel
query automatically. pgpool-II needs SystemDB and the rules
to know how to distribute data to the database nodes.
--- 368,374 ----
parallel_mode = true
! Setting parallel_mode
to true does not start parallel
query automatically. pgpool-II needs SystemDB and the rules
to know how to distribute data to the database nodes.
*************** configured so that pgpool-II accepts tho
*** 379,391 ****
listen_addresses = '*'
Attention: The replication is not done for the table that does the partitioning
though a parallel Query and the replication can be made effective at the same time.
!
Attention: You can have both partitioned tables and replicated
! tables. However a table cannot be a partioned and replicated one at
the same time. Because the data structure of partioned tables and
! replicated tables are different, "bench_replication" database created
in section "2. Your First Replication"
cannot be reused in parallel query mode.
--- 379,392 ----
listen_addresses = '*'
+
Attention: Although parallel query and replication can be enabled at the
same time, replication is not performed for partitioned tables.
!
Attention: You can have both partitioned tables and replicated
! tables. However, a table cannot be partitioned and replicated at
the same time. Because the data structures of partitioned tables and
! replicated tables differ, the "bench_replication" database created
in section "2. Your First Replication"
cannot be reused in parallel query mode.
*************** load_balance_mode = true
*** 407,419 ****
"System database" is just an ordinaly database. The only
! requirement is, dblink functions and the dist_def table, which
describes partioning rule, must be installed in the system
! database. You could have system database on a database node, or you
! could have multiple node having system database by using cascade
! configuration of pgpool-II.
! In this section, we will create SystemDB within the 5432 port
node. The following list is the configuration parameters for
SystemDB
--- 408,420 ----
"System database" is just an ordinary database. The only
! requirement is that the dblink functions and the dist_def table, which
describes the partitioning rules, must be installed in the system
! database. You could have a system database on a database node, or you
! could have multiple nodes having system databases by using a cascade
! configuration in pgpool-II.
! In this section, we will create SystemDB on the 5432 port
node. The following are the configuration parameters for
SystemDB
*************** $ createdb -p 5432 -O pgpool pgpool
*** 438,444 ****
3.2.1. Installing dblink
Next, we must install dblink into "pgpool" database. dblink is one
! of the tools included in contrib
directory in the
PostgreSQL source code.
To install dblink to your system, execute the following commands.
--- 439,445 ----
3.2.1. Installing dblink
Next, we must install dblink into "pgpool" database. dblink is one
! of the tools included in the contrib
directory in the
PostgreSQL source code.
To install dblink to your system, execute the following commands.
*************** $ USE_PGXS=1 make -C contrib/dblink inst
*** 449,455 ****
After dblink has been installed into your system, we will define
! dblink functions in "pgpool" database. If PostgreSQL is installed in
/usr/local/pgsql
, dblink.sql
(a file with
function definitions) should have been installed in
/usr/local/pgsql/share/contrib
. Now, execute the
--- 450,456 ----
After dblink has been installed into your system, we will define
! dblink functions in the "pgpool" database. If PostgreSQL is installed in
/usr/local/pgsql
, dblink.sql
(a file with
function definitions) should have been installed in
/usr/local/pgsql/share/contrib
. Now, execute the
*************** CREATE TABLE pgpool_catalog.dist_def (
*** 498,504 ****
A distribution rule decides how to distribute data to a
particular node. Data will be distributed depending on the value of
! "col_name" column. "dist_def_func" is a function that takes the value
of "col_name" as its argument, and returns an integer which
points to the appropriate database node ID where the data should be
stored.
--- 499,505 ----
A distribution rule decides how to distribute data to a
particular node. Data will be distributed depending on the value of
! the "col_name" column. "dist_def_func" is a function that takes the value
of "col_name" as its argument, and returns an integer which
points to the appropriate database node ID where the data should be
stored.
*************** nodes can be merged into one result.
*** 510,516 ****
3.2.3. Defining replicate_def table
If you want to use replicated tables in SELECT in parallel mode, you
! need to register information of such tables(replication rule) to a table
called replicate_def.
The replicate_def table has already been made when making it from the
system_db.sql file when dist_def is defined.
--- 511,517 ----
3.2.3. Defining replicate_def table
If you want to use replicated tables in SELECT in parallel mode, you
! need to register information about such tables (replication rule) in a table
called replicate_def.
The replicate_def table has already been created if you executed the
system_db.sql file when defining dist_def.
*************** CREATE TABLE pgpool_catalog.replicate_de
*** 530,536 ****
replicate_def includes table's meta data information(dbname, schema_name, table_name, col_list, type_list).
! All the query analysis and query rewriting process are depending on the
information (table, column and type) stored in dist_def and/or
replicate_def table. If the information is not correct, analysis and
query rewriting process will produce wrong results.
--- 531,537 ----
replicate_def holds a table's metadata (dbname, schema_name, table_name, col_list, type_list).
! All the query analysis and query rewriting processes are dependent on the
information (table, column and type) stored in the dist_def and/or
replicate_def tables. If the information is not correct, the analysis and
query rewriting processes will produce wrong results.
*************** sample data into three database nodes. T
*** 543,549 ****
by "pgbench -i -s 3" (i.e. scale factor of 3). We will create a new
database called "bench_parallel" for this section.
! In pgpool-II's source code, you can find
dist_def_pgbench.sql
file in sample
directoy. We will use this sample file here to create distribution
rules for pgbench. Execute the following command in extracted
--- 544,550 ----
by "pgbench -i -s 3" (i.e. scale factor of 3). We will create a new
database called "bench_parallel" for this section.
! In pgpool-II's source code, you can find the
dist_def_pgbench.sql
file in the sample
directory. We will use this sample file here to create distribution
rules for pgbench. Execute the following command in the extracted
*************** pgpool-II source code directory.
*** 556,562 ****
Inside dist_def_pgbench.sql
, we are inserting one
row into "dist_def" table. There is a distribution
function for accounts table.
! For key-column, aid is defined for accounts respectively (which is primary keys)
INSERT INTO pgpool_catalog.dist_def VALUES (
--- 557,564 ----
Inside dist_def_pgbench.sql
, we are inserting one
row into the "dist_def" table. There is a distribution
function for the accounts table.
! As the key column, aid (the primary key of the
! accounts table) is specified.
INSERT INTO pgpool_catalog.dist_def VALUES (
*************** INSERT INTO pgpool_catalog.dist_def VALU
*** 570,587 ****
);
! Now, we must define the distribution function for accounts table. Note
that you can use the same function from different tables. Also, you
can define functions using languages other than SQL (e.g. PL/pgSQL,
PL/Tcl, etc.).
The accounts table when data is initialized specifying 3 scale factor,
The value of the aid is 1 to 300000.
The function is defined so that data is evenly distributed to three data base nodes.
! SQL function will be defined as the return of the number of the data base node.
--- 572,590 ----
);
! Now, we must define the distribution function for the accounts table. Note
that you can use the same function from different tables. Also, you
can define functions using languages other than SQL (e.g. PL/pgSQL,
PL/Tcl, etc.).
+
When the accounts table is initialized with a scale factor of 3,
the value of aid ranges from 1 to 300000.
The function is defined so that data is evenly distributed over the three database nodes.
! The SQL function is defined to return the ID of the target database node.
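The even three-way split described above can also be sketched outside SQL. Assuming the aid range of 1 to 300000, each block of 100000 ids maps to one node ID; the shell function below is a hypothetical counterpart of the SQL distribution function, not the sample file's actual definition:

```shell
# Hypothetical shell counterpart of the SQL distribution function:
# aid 1-100000 -> node 0, 100001-200000 -> node 1, 200001-300000 -> node 2.
dist_def_accounts() {
  echo $(( ($1 - 1) / 100000 ))
}

dist_def_accounts 1        # -> 0
dist_def_accounts 150000   # -> 1
dist_def_accounts 300000   # -> 2
```

The sample file implements the same mapping with a CASE expression over explicit ranges; any function with this signature works as long as it returns a valid node ID for every key value.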
*************** $$ LANGUAGE sql;
*** 596,605 ****
--- 599,610 ----
+
A replication rule decides which tables are to be replicated.
+
Here, the branches and tellers tables used by pgbench are registered.
As a result, queries that use the accounts table and the branches table
*************** INSERT INTO pgpool_catalog.replicate_def
*** 623,630 ****
--- 628,637 ----
);
+
replicate_def_pgbench.sql is provided in the sample directory.
+
To define the replication rule with this file, execute the psql command as follows in the extracted source code directory.
*************** be restarted. Please refer to section "1
*** 638,649 ****
href="#start-shutdown">Starting/Stopping pgpool-II".
After configuring pgpool.conf
and restarting
! pgpool-II, let's try and see if parallel query is working OK.
First, we need to create a database to be distributed. We will name
it "bench_parallel". This database needs to be created on all the
! nodes. Use createdb
commands through pgpool-II, and the
! database will be created on all the nodes.
$ createdb -p 9999 bench_parallel
--- 645,656 ----
href="#start-shutdown">Starting/Stopping pgpool-II".
After configuring pgpool.conf
and restarting
! pgpool-II, let's try and see if parallel query is working.
First, we need to create a database to be distributed. We will name
it "bench_parallel". This database needs to be created on all the
! nodes. Use the createdb
command through pgpool-II, and the
! databases will be created on all the nodes.
$ createdb -p 9999 bench_parallel