PostgreSQL Test AQMS System Operations Manual

AQMS/DB: Commands

  • Start PostgreSQL database:
[postgres@rt1 ~]$ sudo systemctl [re]start postgresql-10
  • Stop PostgreSQL database:
[postgres@rt1 ~]$ sudo systemctl stop postgresql-10
  • PostgreSQL database status:
[postgres@rt1 ~]$ systemctl status postgresql-10
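  • Optional quick connectivity check (pg_isready is a standard PostgreSQL client utility; the binary path and the default local port shown here are assumptions for this install):
[postgres@rt1 ~]$ /usr/pgsql-10/bin/pg_isready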
  • Show RT/PP system role:
[aqms@rt2 ~]$ showrole
System is shadow
[aqms@rt2 ~]$ 

[aqms@pp2 events]$ showrole
System is primary
[aqms@pp2 events]$ 
  • AQMS command:
[aqms@rt1 ~]$ aqms -h
AQMS realtime-processing control script can be used to start,stop, restart, show status

 Usage: aqms [start|stop|status|restart|forcestop]

stop does a kill -1 and forcestop does a kill -9, returns 1 when stopped
status exits with 0 if all required processes are running and 1 if not.
restart does a stop 1 or 9 followed by start and status.

[aqms@rt1 ~]$ 
  • Start all AQMS+EW codes on each system:
[aqms@rt1 ~]$ aqms start
  • Status of what is running:
[aqms@rt1 ~]$ aqms status

 2334 java -Xms16m -Xmx512m -Dcms -jar /home/aqms/jars/QWServer.jar -configFile /home/aqms/cms/QWServerConfig.xml
 2375 grep -E /home/aqms/jars/QWServer.jar
adadup_slave /home/aqms/configs/adadup_slave/adadup_rt1_to_menlo1.cfg NOT RUNNING
adadup_slave /home/aqms/configs/adadup_slave/adadup_rt1_to_menlo2.cfg NOT RUNNING
adadup_slave /home/aqms/configs/adadup_slave/adadup_rt1_to_ucb1.cfg NOT RUNNING
adadup_slave /home/aqms/configs/adadup_slave/adadup_rt1_to_ucb2.cfg NOT RUNNING
ntrcg2 /home/aqms/configs/ntrcg2/ntrcg2.cfg running on pid 9237
eqrun /home/aqms/configs/eqrun/eqrun.cfg running on pid 9262
ec /home/aqms/configs/ec/ec.cfg running on pid 9173
tc2 /home/aqms/configs/tc2/tc2.cfg running on pid 9200
trig2db /home/aqms/configs/trig2db/trig2db.cfg running on pid 9223
trimag /home/aqms/configs/trimag/trimag.cfg NOT RUNNING
[aqms@rt1 ~]$ 
[aqms@rt1 ~]$ status
using default config file startstop_unix.d
NOTE: If next line reads "ERROR: tport_attach...", Earthworm is not running.
      Sent request for status; waiting for response...

                    EARTHWORM-64 SYSTEM STATUS

        Hostname-OS:            rt1 - Linux 3.10.0-1160.2.1.el7.x86_64
        Start time (UTC):       Mon Aug 23 21:59:56 2021
        Current time (UTC):     Fri Sep 17 16:01:06 2021
        Disk space avail:       35686244 kb
        Ring  1 name/key/size:  HYPO_RING / 1015 / 1024 kb
        Ring  2 name/key/size:  PICK_RING / 1005 / 1024 kb
        Ring  3 name/key/size:  FILTERPICK_RING / 1039 / 1024 kb
        Startstop's Log Dir:    /home/aqms/logs/
        Startstop's Params Dir: /home/aqms/ew/params/
        Startstop's Bin Dir:    /home/aqms/ew/bin
        Startstop Version:      v7.10 2019-08-13 (64 bit)

         Process  Process               Class/    CPU
          Name      Id      Status     Priority   Used    Argument
         -------  -------   ------     --------   ----    --------
       startstop    2152    Alive         ??/ 0 00:01:31  -
      import_ack   31047    Alive         ??/ 0 00:01:15  imp_pktrg_menlo1.d
      import_ack   31009    Alive         ??/ 0 00:01:15  imp_pktrg_menlo2.d
      import_ack    2228    Alive         ??/ 0 00:00:31  imp_pktrg_ucb1.d
      import_ack    2229    Alive         ??/ 0 00:00:31  imp_pktrg_ucb2.d
        pkfilter    2231    Alive         ??/ 0 00:15:02  pkfilter.d
   statrigfilter    2232    Alive         ??/ 0 00:15:26  statrigfilter.d
       binder_ew    2233    Alive         ??/ 0 00:12:00  binder_ew.d
      eqassemble    2234    Alive         ??/ 0 00:07:11  eqassemble.d
         hyps2ps    2235    Alive         ??/ 0 00:01:32  hyps2ps.d
     carlsubtrig    2236    Alive         ??/ 0 00:05:45  carlsubtrig.d
         trig2ps    2237    Alive         ??/ 0 00:01:31  trig2ps.d
         statmgr    2238    Alive         ??/ 0 00:01:40  statmgr.d
      telafeeder    2239    NoExec                     telafeeder.d
         diskmgr    2240    Alive         ??/ 0 00:01:28  diskmgr.d
      copystatus    2241    Alive         ??/ 0 00:01:26  PICK_RING HYPO_RING
      copystatus    2242    Alive         ??/ 0 00:01:21  FILTERPICK_RING HYPO_RING

[aqms@rt1 ~]$ 
[aqms@pp1 ~]$ aqms status
aqms      2345  0.0  0.0  96308  4264 ?        S    Nov03   0:01 pws pws_dart.cfg
aqms      2346  0.0  0.0  14908  1464 ?        S    Nov03   0:14 conlog /home/aqms/logs/pws_dart_con
aqms      2351  0.0  0.0 113480  4592 ?        S    Nov03   0:01 pws pws_CE.cfg
aqms      2352  0.0  0.0  14908  1464 ?        S    Nov03   0:19 conlog /home/aqms/logs/pws_CE_con
aqms      2359  0.0  0.0  96308  4272 ?        S    Nov03   0:01 pws pws_UCB.cfg
aqms      2360  0.0  0.0  14908  1468 ?        S    Nov03   0:22 conlog /home/aqms/logs/pws_UCB_con
aqms      2367  0.0  0.0 113480  4604 ?        S    Nov03   0:01 pws pws_NN.cfg
aqms      2368  0.0  0.0  14908  1460 ?        S    Nov03   0:25 conlog /home/aqms/logs/pws_NN_con
aqms      2373  0.0  0.0 113480  4668 ?        S    Nov03   0:03 pws pws.cfg
aqms      2374  0.0  0.0  14908  1464 ?        S    Nov03   0:52 conlog /home/aqms/logs/pws_con
aqms     22902  0.1  0.0 219132  5212 ?        S    16:43   0:00 pws pws_UCB.cfg
aqms     22903  0.0  0.0 218168  3948 ?        S    16:43   0:00 pws pws_dart.cfg
aqms     22904  0.1  0.0 236312  5336 ?        S    16:43   0:00 pws pws_NN.cfg
aqms      2407  0.0  0.0 209876  7008 ?        S    Nov03   1:12 wanc wanc_AZBKBPLBPBUS.cfg
aqms      2408  0.0  0.0  14908  1464 ?        S    Nov03   0:00 conlog /home/aqms/logs/wanc_AZBKBPLBPBUS_con
aqms      2430  0.0  0.0 208728  5876 ?        S    Nov03   2:05 wanc wanc_BG.cfg
aqms      2431  0.0  0.0  14908  1464 ?        S    Nov03   0:00 conlog /home/aqms/logs/wanc_BG_con
aqms      2466  0.0  0.0 209812  6916 ?        S    Nov03   0:59 wanc wanc_CE.cfg
aqms      2467  0.0  0.0  14908  1464 ?        S    Nov03   0:01 conlog /home/aqms/logs/wanc_CE_con
aqms      2490  0.0  0.0 211464  6596 ?        S    Nov03   0:36 wanc wanc_CINN.cfg
aqms      2491  0.0  0.0  14908  1464 ?        S    Nov03   0:00 conlog /home/aqms/logs/wanc_CINN_con
aqms      2509  0.0  0.0 209332  6520 ?        S    Nov03   1:11 wanc wanc_NC.cfg
aqms      2510  0.0  0.0  14908  1464 ?        S    Nov03   0:00 conlog /home/aqms/logs/wanc_NC_con
aqms      2531  0.0  0.0 209404  6544 ?        S    Nov03   1:12 wanc wanc_GMGSNPPGSBSFUOUWWR.cfg
aqms      2532  0.0  0.0  14908  1468 ?        S    Nov03   0:00 conlog /home/aqms/logs/wanc_GMGSNPPGSBSFUOUWWR_con
solserver.pl running on pids 2758
2771
[aqms@pp1 ~]$ 
  • RT1 Processes:
UID        PID  PPID  C STIME TTY          TIME CMD

aqms      2334     1  0 Nov03 ?        00:11:06 java -Xms16m -Xmx512m -Dcms -jar /home/aqms/jars/QWServer.jar -configFile /home/aqms/cms/QWServerConfig.xml
aqms      9167     1  0 Nov03 ?        00:00:11 startstop
aqms      9173     1  0 Nov03 ?        00:00:09 ec /home/aqms/configs/ec/ec.cfg
aqms      9174     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/ec_console
aqms      9180  9167  0 Nov03 ?        00:01:34 import_ack imp_pktrg_menlo1.d
aqms      9181  9167  0 Nov03 ?        00:01:34 import_ack imp_pktrg_menlo2.d
aqms      9182  9167  0 Nov03 ?        00:00:03 import_ack imp_pktrg_ucb1.d
aqms      9183  9167  0 Nov03 ?        00:00:03 import_ack imp_pktrg_ucb2.d
aqms      9184  9167  0 Nov03 ?        00:02:00 pkfilter pkfilter.d
aqms      9185  9167  0 Nov03 ?        00:01:56 statrigfilter statrigfilter.d
aqms      9186  9167  0 Nov03 ?        00:02:05 binder_ew binder_ew.d
aqms      9188  9167  0 Nov03 ?        00:01:00 eqassemble eqassemble.d
aqms      9189  9167  0 Nov03 ?        00:00:11 hyps2ps hyps2ps.d
aqms      9190  9167  0 Nov03 ?        00:00:43 carlsubtrig carlsubtrig.d
aqms      9191  9167  0 Nov03 ?        00:00:11 trig2ps trig2ps.d
aqms      9192  9167  0 Nov03 ?        00:00:12 statmgr statmgr.d
aqms      9193  9167  0 Nov03 ?        00:00:00 [startstop] <defunct>
aqms      9194  9167  0 Nov03 ?        00:00:10 diskmgr diskmgr.d
aqms      9195  9167  0 Nov03 ?        00:00:10 copystatus PICK_RING HYPO_RING
aqms      9196  9167  0 Nov03 ?        00:00:10 copystatus FILTERPICK_RING HYPO_RING
aqms      9200     1  0 Nov03 ?        00:00:06 tc2 /home/aqms/configs/tc2/tc2.cfg
aqms      9201     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/tc2_console
aqms      9205  9188  0 Nov03 ?        00:00:00 eqcoda eqcoda.d
aqms      9222  9205  0 Nov03 ?        00:00:02 hyp2000_mgr hyp2000_mgr.d cal4-1.4.hyp
aqms      9223     1  0 Nov03 ?        00:00:05 trig2db /home/aqms/configs/trig2db/trig2db.cfg
aqms      9224     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/trig2db_console
aqms      9237     1  0 Nov03 ?        00:00:08 ntrcg2 /home/aqms/configs/ntrcg2/ntrcg2.cfg
aqms      9238     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/ntrcg2_console
aqms      9262     1  0 Nov03 ?        00:00:22 eqrun /home/aqms/configs/eqrun/eqrun.cfg
aqms      9263     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/eqrun_console

postgres  1426     1  0 Nov03 ?        00:00:11 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
postgres  1932  1426  0 Nov03 ?        00:00:00 postgres: logger process   
postgres  2013  1426  0 Nov03 ?        00:00:01 postgres: checkpointer process   
postgres  2016  1426  0 Nov03 ?        00:00:03 postgres: writer process   
postgres  2018  1426  0 Nov03 ?        00:00:05 postgres: wal writer process   
postgres  2019  1426  0 Nov03 ?        00:00:06 postgres: autovacuum launcher process   
postgres  2020  1426  0 Nov03 ?        00:00:09 postgres: stats collector process   
postgres  2021  1426  0 Nov03 ?        00:00:00 postgres: bgworker: pglogical supervisor   
postgres  2022  1426  0 Nov03 ?        00:00:00 postgres: bgworker: logical replication launcher   
postgres  2222  1426  0 Nov03 ?        00:00:00 postgres: bgworker: pglogical manager 16384   
postgres  9385  1426  0 Nov03 ?        00:00:04 postgres: wal sender process repadmin 169.229.197.23(55226) idle
  • PP1 Processes:
UID        PID  PPID  C STIME TTY          TIME CMD

aqms      2345     1  0 Nov03 ?        00:00:01 pws pws_dart.cfg
aqms      2346     1  0 Nov03 ?        00:00:14 conlog /home/aqms/logs/pws_dart_con
aqms      2351     1  0 Nov03 ?        00:00:01 pws pws_CE.cfg
aqms      2352     1  0 Nov03 ?        00:00:19 conlog /home/aqms/logs/pws_CE_con
aqms      2359     1  0 Nov03 ?        00:00:01 pws pws_UCB.cfg
aqms      2360     1  0 Nov03 ?        00:00:22 conlog /home/aqms/logs/pws_UCB_con
aqms      2367     1  0 Nov03 ?        00:00:01 pws pws_NN.cfg
aqms      2368     1  0 Nov03 ?        00:00:25 conlog /home/aqms/logs/pws_NN_con
aqms      2373     1  0 Nov03 ?        00:00:03 pws pws.cfg
aqms      2374     1  0 Nov03 ?        00:00:52 conlog /home/aqms/logs/pws_con
aqms      2407     1  0 Nov03 ?        00:01:11 wanc wanc_AZBKBPLBPBUS.cfg
aqms      2408     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/wanc_AZBKBPLBPBUS_con
aqms      2430     1  0 Nov03 ?        00:02:05 wanc wanc_BG.cfg
aqms      2431     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/wanc_BG_con
aqms      2466     1  0 Nov03 ?        00:00:59 wanc wanc_CE.cfg
aqms      2467     1  0 Nov03 ?        00:00:01 conlog /home/aqms/logs/wanc_CE_con
aqms      2490     1  0 Nov03 ?        00:00:36 wanc wanc_CINN.cfg
aqms      2491     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/wanc_CINN_con
aqms      2509     1  0 Nov03 ?        00:01:11 wanc wanc_NC.cfg
aqms      2510     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/wanc_NC_con
aqms      2531     1  0 Nov03 ?        00:01:12 wanc wanc_GMGSNPPGSBSFUOUWWR.cfg
aqms      2532     1  0 Nov03 ?        00:00:00 conlog /home/aqms/logs/wanc_GMGSNPPGSBSFUOUWWR_con
aqms      2758     1  0 Nov03 ?        00:00:00 /bin/perl -w? /home/aqms/bin/solserver.pl -p 6600
aqms      2771     1  0 Nov03 ?        00:00:00 /bin/perl -w? /home/aqms/bin/solserver.pl -p 6601

postgres  1421     1  0 Nov03 ?        00:00:44 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
postgres  1948  1421  0 Nov03 ?        00:00:00 postgres: logger process   
postgres  2127  1421  0 Nov03 ?        00:00:18 postgres: checkpointer process   
postgres  2128  1421  0 Nov03 ?        00:00:03 postgres: writer process   
postgres  2129  1421  0 Nov03 ?        00:00:18 postgres: wal writer process   
postgres  2130  1421  0 Nov03 ?        00:00:06 postgres: autovacuum launcher process   
postgres  2131  1421  0 Nov03 ?        00:00:28 postgres: stats collector process   
postgres  2132  1421  0 Nov03 ?        00:00:00 postgres: bgworker: pglogical supervisor   
postgres  2133  1421  0 Nov03 ?        00:00:00 postgres: bgworker: logical replication launcher   
postgres  2252  1421  0 Nov03 ?        00:00:00 postgres: bgworker: pglogical manager 16384   
postgres  3506  1421  0 Nov03 ?        00:00:54 postgres: wal sender process repadmin 130.118.43.106(58750) streaming F/3151AF8
postgres  9408  1421  0 Nov03 ?        00:00:24 postgres: bgworker: pglogical apply 16384:1687662095   

AQMS: Metadata Loading

LoadStationXML (from aqms_ir) deletes all the metadata for a station when you load its StationXML.
The assumption is that the StationXML file contains everything that you want to have in the database for
that station. The script also cannot yet handle multiple station epochs (multiple channel and other
epochs are fine). Newer versions can also load poles and zeros, but that part is somewhat buggy (for
example, they can only be loaded once), so it was made optional; the poles and zeros are not needed at
the moment and there is no time to debug that part yet. The script was added by ISTI for use in northern California.
Remark:
Currently, it seems to populate only the following tables:
  * d_abbreviation
  * d_unit
  * d_format
  * station_data
  * channel_data
  * simple_response
  * channelmap_ampparms   <--- Should not populate this table
  * channelmap_codaparms  <--- Should not populate this table
  * sensitivity

loadStationXML from aqms_ir has a pole-zero (PZ) loading option, but it needs some work before UW would
be interested in using it; NCSN may be able to use it as-is.
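
To spot-check what a load actually wrote, one of the tables listed above can be queried directly; a hedged
example (the database name and the BK MERC station are placeholders, and the column names assume the
standard AQMS schema):

[postgres@rt1 ~]$ psql -d rtdb1 -c "SELECT net, sta, ondate, offdate FROM station_data WHERE net='BK' AND sta='MERC';"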
  • Metadata loading at NCSS:
    • As the 'aqms' user on the active pp, rt1 & rt2, cd into /home/aqms/configs/metadata/aqms-ir/.
    • To load all stations for all networks:
      • Run 'getNCEDCstaXML' to retrieve all stationXML files from the NCEDC.
      • Run 'loadAllStaXML.sh' to load all stations (takes about 1.5 hours).
    • To load a given network:
      • Run 'getNCEDCstaXML <Network>' to retrieve all stationXML files for the given network.
      • Run 'loadAllStaXML.sh' to load all stations for the given network.
    • To load a given station:
      • Run 'getNCEDCstaXML <Network> <Station>' to retrieve the stationXML file for the given station.
      • Run 'loadAllStaXML.sh' to load the station.
    • To load a list of networks and stations:
      • Run 'getFromList' to retrieve the stationXML files for the networks and stations specified in 'netsta.lst', e.g.:
[aqms@rt1 aqms-ir]$ cat netsta.lst 
SB
BK MERC
NP 1858
UO
[aqms@rt1 aqms-ir]$ 
  • Note: Instead of running these scripts on all machines, run them on one (e.g. rt1) and then execute the script 'refreshMetadata.csh' on rt2 & the active pp.
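  • Example: reloading the metadata for a single station (BK MERC, from the netsta.lst example above). A sketch, assuming the scripts are run from the aqms-ir directory as described above:
cd /home/aqms/configs/metadata/aqms-ir/
getNCEDCstaXML BK MERC       # fetch the StationXML for BK MERC from the NCEDC
loadAllStaXML.sh             # load the retrieved file(s) into the database
refreshMetadata.csh          # then run this on rt2 and the active pp (see the Note above)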

AQMS: Adding a Station

Remark:
This is currently a workaround. In production mode:
 * StationUI will be used to maintain:
    * The AppChannels & Applications tables (for auxiliary lists and section 101?).
 * AppChan.sql will be used to generate:
    * The AppChannels & Applications tables (based on the auxiliary lists).
    * The Config_Channel & Program tables.
 * codaparamsDB will generate:
    * The ChannelMap_AmpParms & ChannelMap_CodaParms tables.
    * The StaCorrections table (section Md).
    * The AppChannels & Applications tables (section 111).
 * loadCISNmagcorr will generate:
    * The StaCorrections table (section Ml; changes very rarely).
 * Logger_Info table update? Integrate with metadata updates?
  • AppChannels/StaCorrections loading at NCSS:
    • For the Application tables, cd into /home/aqms/configs/db/ncsn_sql/AppCha/.
    • Run the following script on an Oracle DB (e.g. RTUCB2) to create the file 'app_chan_101_111.sql':
set colsep ','
set echo off
set feedback off
set linesize 1000
set pagesize 0
set sqlprompt ''
set trimspool on
set headsep off

spool app_chan_101_111.sql
select 'INSERT INTO AppChannels (progid,net,sta,seedchan,location,config,ondate,offdate,lddate) VALUES ('||''''||progid||''''||','||''''||net||''''||','||''''||sta||''''||','||''''||seedchan||''''||','||''''||location||''''||','||''''||config||''''||','||''''||ondate||''''||','||''''||offdate||''''||','||''''||lddate||''''||');' FROM AppChannels WHERE progid IN (101,111);
spool off
quit
  • Run the following script on an Oracle DB (e.g. RTUCB2) to create the file 'stacorrections.sql':
set colsep ','
set echo off
set feedback off
set linesize 1000
set pagesize 0
set sqlprompt ''
set trimspool on
set headsep off

spool stacorrections.sql
select 'INSERT INTO StaCorrections (net,sta,seedchan,channel,channelsrc,location,auth,corr,corr_flag,corr_type,ondate,offdate,lddate) VALUES ('||''''||net||''''||','||''''||sta||''''||','||''''||seedchan||''''||','||''''||channel||''''||','||''''||channelsrc||''''||','||''''||location||''''||','||''''||auth||''''||','||''''||corr||''''||','||''''||corr_flag||''''||','||''''||corr_type||''''||','||''''||ondate||''''||','||''''||offdate||''''||','||''''||lddate||''''||');' FROM StaCorrections;
spool off
quit
  • FTP the files 'app_chan_101_111.sql' and 'stacorrections.sql' to the PostgreSQL system in /home/aqms/configs/db/ncsn_sql/AppCha/.
  • Update the 'appChan_pg.sql' script if needed (Oracle version on ucbpp:/home/ncss/run/SQL).
  • Run 'refreshAppCha.csh' to refresh the Applications, AppChannels & StaCorrections tables.
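  • Optional sanity check after the refresh: count the rows that were loaded (an illustrative check; the table and column names come from the INSERT statements above, the database name is an assumption):
psql -d archdb1 -c "SELECT progid, count(*) FROM AppChannels WHERE progid IN (101,111) GROUP BY progid;"
psql -d archdb1 -c "SELECT count(*) FROM StaCorrections;"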

AQMS: Real Time Role Switch

NOTES:

The command 'switchrole' can be launched on either rt1 or rt2.
The real-time switch should happen before the post-processing one because a PP system is currently
considered primary at startup if its corresponding RT system is primary (showrole script logic).
  • Switchover:
[aqms@rt1 ~]$ switchrole 
Will switch rt1 (rtdb1) from primary to shadow
and rt2(rtdb2) from shadow to primary
OK (Y or N):y
Will do it.
rtem@rtdb1
System was primary, changing database to shadow
spawn psql -q -h rt1.geo.berkeley.edu -d rtdb1 -U rtem -W
Password for user rtem: 
rtdb1=> \i /home/aqms/utils/scripts/set_shadow.sql
rtdb1=> \q
rtem@rtdb2
System was shadow, changing database to primary
spawn psql -q -h rt2.wr.usgs.gov -d rtdb2 -U rtem -W
Password for user rtem: 
rtdb2=> \i /home/aqms/utils/scripts/set_primary.sql
rtdb2=> \q
[aqms@rt1 ~]$ switchrole 
Will switch rt1 (rtdb1) from shadow to primary
and rt2(rtdb2) from primary to shadow
OK (Y or N):n
NOT doing it. Exit.
  • Primary Status:
[aqms@rt1 ~]$ psql -q -h rt1.geo.berkeley.edu -d rtdb1 -U rtem -W
Password for user rtem: 
rtdb1=> select * from rt_role;
 primary_system |     modification_time      
----------------+----------------------------
 FALSE          | 2019-11-01 00:00:00
 true           | 2019-11-25 21:28:10.59765
 false          | 2021-06-30 22:11:19.622779
(3 rows)

rtdb1=> \q
  • Standby Status:
[aqms@rt1 ~]$ psql -q -h rt2.wr.usgs.gov -d rtdb2 -U rtem -W
Password for user rtem: 
rtdb2=> select * from rt_role;
 primary_system |     modification_time      
----------------+----------------------------
 TRUE           | 2019-05-01 00:00:00
 false          | 2019-11-25 23:08:39.9445
 true           | 2021-06-30 22:11:19.731711
(3 rows)

rtdb2=> \q
[aqms@rt1 ~]$ exit

AQMS/DB: Post Processing Role Switch

NOTE:

MAKE SURE that the DB instances are started and stopped with systemctl, not pg_ctl.
          systemctl status postgresql-10 
          sudo systemctl stop postgresql-10 
          sudo systemctl [re]start postgresql-10 
  • Post Processing Failover:
- In a controlled environment (e.g. patching the active PP):

  * On the standby system as 'postgres', do a dry run first:

% repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind --dry-run

  * If you get no errors from the dry run, on the primary system as 'aqms' shut down AQMS:

% aqms stop

  * Check whether anything is still running and kill any remaining processes (see the example after this procedure):

% aqms status

  * On the standby system, sync the event waveform archive:

% rsync -av -e ssh --progress aqms@{primary}:/home/aqms/data/events/active/ /home/aqms/data/events/active
  where {primary} = {pp1|pp2}

  * On the standby system as 'postgres', do the switchover:

% repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind

  * On the NEW primary system as 'aqms', start AQMS:

% aqms start

  * Check to make sure that all processes are running:

% aqms status


- In an emergency (e.g. active PP has crashed):

  If something bad has happened, or you do not expect a clean switchover to be
  possible, then instead of using --force-rewind use the following command,
  which kills the original primary after taking over.

% repmgr standby switchover --always-promote

  * Stop AQMS on the old primary if it isn't already, and start it up on the new primary (see above).
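
  * Example of checking for and killing leftover processes after 'aqms stop' (standard ps/kill usage; substitute the actual PIDs reported on your system):

% ps -u aqms -f        # list anything still running as the aqms user
% kill <pid>           # ask a leftover process to exit
% kill -9 <pid>        # force it if it does not exit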
  • Switchover dry run:
[postgres@pp2 ~]$ repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind --dry-run
INFO: checking for package configuration file "/etc/repmgr/10/repmgr.conf"
INFO: configuration file found at: "/etc/repmgr/10/repmgr.conf"
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: checking switchover on node "walreceiver" (ID: 2) in --dry-run mode
WARNING: no superuser connection available
DETAIL: it is recommended to perform switchover operations with a database superuser
HINT: provide the name of a superuser with -S/--superuser
INFO: searching for primary node
INFO: checking if node 1 is primary
INFO: current primary node is 1
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "pp1" succeeded
INFO: able to execute "repmgr" on remote host "pp1"
INFO: 1 walsenders required, 3 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "walreceiver" (ID: 2) would be promoted to primary; current primary "pp1" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "pp1":
  "sudo /bin/systemctl stop postgresql-10"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
[postgres@pp2 ~]$ 
  • Switchover:
[postgres@pp2 ~]$ repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind
INFO: checking for package configuration file "/etc/repmgr/10/repmgr.conf"
INFO: configuration file found at: "/etc/repmgr/10/repmgr.conf"
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: executing switchover on node "walreceiver" (ID: 2)
WARNING: no superuser connection available
DETAIL: it is recommended to perform switchover operations with a database superuser
HINT: provide the name of a superuser with -S/--superuser
INFO: searching for primary node
INFO: checking if node 1 is primary
INFO: current primary node is 1
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "pp1" succeeded
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "walreceiver" (ID: 2) will be promoted to primary; current primary "pp1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "pp1" (ID: 1)
WARNING: a superuser connection is required to issue a CHECKPOINT
HINT: provide a superuser with -S/--superuser
DETAIL: executing server command "sudo /bin/systemctl stop postgresql-10"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 33/22B08078
NOTICE: promoting standby to primary
DETAIL: promoting server "walreceiver" (ID: 2) using "pg_ctl  -w -D '/home/pgsql/data' promote"
waiting for server to promote.... done
server promoted
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
INFO: standby promoted to primary after 0 second(s)
NOTICE: STANDBY PROMOTE successful
DETAIL: server "walreceiver" (ID: 2) was successfully promoted to primary
WARNING: no superuser connection available, unable to issue CHECKPOINT
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 33/22B08078; rejoin target node's fork point: 33/22B080E8
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=pp1 user=repadmin dbname=repmgr connect_timeout=2 application_name=pp1"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo /bin/systemctl start postgresql-10"
WARNING: node "pp1" not found in "pg_stat_replication"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
INFO: node "pp1" (ID: 1) is pingable
WARNING: node "pp1" not found in "pg_stat_replication"
INFO: waiting for node "pp1" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: checking for record in node "walreceiver"'s "pg_stat_replication" table where "application_name" is "pp1"
INFO: node "pp1" (ID: 1) has attached to its upstream node
NOTICE: node  "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "walreceiver" is now primary and node "pp1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
[postgres@pp2 ~]$ 
  • Primary Status:
[postgres@pp1 ~]$ repmgr node status
Node "pp1":
	PostgreSQL version: 10.10
	Total data size: 2172 MB
	Conninfo: host=pp1 user=repadmin dbname=repmgr connect_timeout=2
	Role: primary
	WAL archiving: off
	Archive command: (none)
	Replication connections: 1 (of maximal 3)
	Replication slots: 0 physical (of maximal 80; 0 missing)
	Replication lag: n/a

[postgres@pp1 ~]$ psql
psql (10.10)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 22980
usesysid         | 18520
usename          | repadmin
application_name | walreceiver
client_addr      | 130.118.43.106
client_hostname  |
client_port      | 39488
backend_start    | 2021-05-27 08:53:11.665653-07
backend_xmin     |
state            | streaming
sent_lsn         | 28/5A603470
write_lsn        | 28/5A603470
flush_lsn        | 28/5A603470
replay_lsn       | 28/5A603470
write_lag        | 00:00:00.004746
flush_lag        | 00:00:00.005036
replay_lag       | 00:00:00.005777
sync_priority    | 0
sync_state       | async

postgres=# \q
[postgres@pp1 ~]$
  • Standby Status:
[postgres@pp2 ~]$ repmgr node status
Node "walreceiver":
	PostgreSQL version: 10.14
	Total data size: 2172 MB
	Conninfo: host=pp2 user=repadmin dbname=repmgr connect_timeout=2
	Role: standby
	WAL archiving: off
	Archive command: (none)
	Replication connections: 0 (of maximal 3)
	Replication slots: 0 physical (of maximal 80; 0 missing)
	Upstream node: pp1 (ID: 1)
	Replication lag: 0 seconds
	Last received LSN: 28/5A3D6EC8
	Last replayed LSN: 28/5A3D6EC8

[postgres@pp2 ~]$ psql
psql (10.14)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 2913
status                | streaming
receive_start_lsn     | 28/1D000000
receive_start_tli     | 11
received_lsn          | 28/5A501748
received_tli          | 11
last_msg_send_time    | 2021-05-27 13:23:52.2518-07
last_msg_receipt_time | 2021-05-27 13:23:55.041877-07
latest_end_lsn        | 28/5A501748
latest_end_time       | 2021-05-27 13:23:52.2518-07
slot_name             | 
conninfo              | user=repadmin passfile=/var/lib/pgsql/.pgpass connect_timeout=2 dbname=replication host=pp1 port=5432 application_name=walreceiver fallback_application_name=walreceiver sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any

postgres=# \q
[postgres@pp2 ~]$ 

DB: Shadow Post Processing DB Recovery

  • Streaming Replication Status (shadow node pp1 is detached):
[postgres@pp2 ~]$ repmgr cluster show
WARNING: node "pp1" not found in "pg_stat_replication"
 ID | Name        | Role    | Status    | Upstream      | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+---------------+----------+----------+----------+--------------------------------------------------------
 1  | pp1         | standby |   running | ! walreceiver | default  | 100      | 12       | host=pp1 user=repadmin dbname=repmgr connect_timeout=2
 2  | walreceiver | primary | * running |               | default  | 100      | 12       | host=pp2 user=repadmin dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "pp1" (ID: 1) is not attached to its upstream node "walreceiver" (ID: 2)

[postgres@pp2 ~]$
  • If WAL segments have not yet been recycled:
==> Run the following on the shadow PP system (e.g. pp1):
[postgres@pp1 ~]$ repmgr node rejoin --force-rewind --verbose -d 'host=pp2 user=repadmin dbname=repmgr connect_timeout=2'
  • If WAL segments have been recycled:
- If errors like the following appear in the log, or pg_rewind fails when running 'repmgr node rejoin':

2021-07-01 14:46:12.115 PDT [2491] ERROR:  requested WAL segment 0000000C0000003A000000AA has already been removed

First, on the shadow PP DB, move $PGDATA out of the way (and back it up if you want to).
Then run:

/usr/pgsql-10/bin/pg_basebackup -h source.host.com -R -D $PGDATA -U repadmin -v -P 

The "-R" generates a recovery file when doing this copy, so when you start up this instance it will be a streaming standby kept in sync with source.host.com.
  • Recovery example:
[postgres@pp1 ~]$ sudo systemctl stop postgresql-10

[postgres@pp1 ~]$ pwd
/var/lib/pgsql
[postgres@pp1 ~]$ mv 10 10.0
[postgres@pp1 ~]$ /usr/pgsql-10/bin/pg_basebackup -h pp2 -R -D $PGDATA -U repadmin -v -P
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 6D/4F1BE930 on timeline 12
pg_basebackup: starting background WAL receiver
5474940/5474940 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 6D/83E8EE50
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
[postgres@pp1 ~]$

[postgres@pp1 ~]$ cd 10/data
[postgres@pp1 ~]$ cp ../../10.0/data/recovery.conf .
[postgres@pp1 ~]$ cp ../../10.0/data/recovery.done .
[postgres@pp1 ~]$ cp ../../10.0/data/pg_hba.conf .

[postgres@pp1 ~]$ sudo systemctl start postgresql-10

[postgres@pp1 ~]$ repmgr node check
INFO: connection is not a superuser connection, falling back to simple check
HINT: provide a superuser with -S/--superuser, or add the "repadmin" user to role "pg_read_all_settings" or "pg_monitor"
Node "pp1":
        Server role: OK (node is standby)
        Replication lag: OK (0 seconds)
        WAL archiving: OK (0 pending archive ready files)
        Upstream connection: OK (node "pp1" (ID: 1) is attached to expected upstream node "walreceiver" (ID: 2))
        Downstream servers: OK (this node has no downstream nodes)
        Replication slots: OK (node has no physical replication slots)
        Missing physical replication slots: OK (node has no missing physical replication slots)
        Configured data directory: OK (configured "data_directory" is "/home/pgsql/data")
[postgres@pp1 ~]$ repmgr node status
Node "pp1":
        PostgreSQL version: 10.10
        Total data size: 3746 MB
        Conninfo: host=pp1 user=repadmin dbname=repmgr connect_timeout=2
        Role: standby
        WAL archiving: off
        Archive command: (none)
        Replication connections: 0 (of maximal 3)
        Replication slots: 0 physical (of maximal 80; 0 missing)
        Upstream node: walreceiver (ID: 2)
        Replication lag: 0 seconds
        Last received LSN: 6D/869FEEA8
        Last replayed LSN: 6D/869FEEA8

[postgres@pp1 ~]$ repmgr cluster show
 ID | Name        | Role    | Status    | Upstream    | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+-------------+----------+----------+----------+--------------------------------------------------------
 1  | pp1         | standby |   running | walreceiver | default  | 100      | 12       | host=pp1 user=repadmin dbname=repmgr connect_timeout=2
 2  | walreceiver | primary | * running |             | default  | 100      | 12       | host=pp2 user=repadmin dbname=repmgr connect_timeout=2
[postgres@pp1 ~]$

[postgres@pp2 ~]$ repmgr node check
INFO: connection is not a superuser connection, falling back to simple check
HINT: provide a superuser with -S/--superuser, or add the "repadmin" user to role "pg_read_all_settings" or "pg_monitor"
Node "walreceiver":
        Server role: OK (node is primary)
        Replication lag: OK (N/A - node is primary)
        WAL archiving: OK (0 pending archive ready files)
        Upstream connection: OK (N/A - node is primary)
        Downstream servers: OK (1 of 1 downstream nodes attached)
        Replication slots: OK (node has no physical replication slots)
        Missing physical replication slots: OK (node has no missing physical replication slots)
        Configured data directory: OK (configured "data_directory" is "/home/pgsql/data")
[postgres@pp2 ~]$ repmgr node status
Node "walreceiver":
        PostgreSQL version: 10.14
        Total data size: 3746 MB
        Conninfo: host=pp2 user=repadmin dbname=repmgr connect_timeout=2
        Role: primary
        WAL archiving: off
        Archive command: (none)
        Replication connections: 1 (of maximal 3)
        Replication slots: 0 physical (of maximal 80; 0 missing)
        Replication lag: n/a

[postgres@pp2 ~]$ repmgr cluster show
 ID | Name        | Role    | Status    | Upstream    | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+-------------+----------+----------+----------+--------------------------------------------------------
 1  | pp1         | standby |   running | walreceiver | default  | 100      | 12       | host=pp1 user=repadmin dbname=repmgr connect_timeout=2
 2  | walreceiver | primary | * running |             | default  | 100      | 12       | host=pp2 user=repadmin dbname=repmgr connect_timeout=2
[postgres@pp2 ~]$

DB: Streaming Replication Status

  • Node Status:
[postgres@pp2 ~]$ repmgr node check
WARNING: node "pp1" not found in "pg_stat_replication"
INFO: connection is not a superuser connection, falling back to simple check
HINT: provide a superuser with -S/--superuser, or add the "repadmin" user to role "pg_read_all_settings" or "pg_monitor"
Node "walreceiver":
	Server role: OK (node is primary)
	Replication lag: OK (N/A - node is primary)
	WAL archiving: OK (0 pending archive ready files)
	Upstream connection: OK (N/A - node is primary)
	Downstream servers: CRITICAL (1 of 1 downstream nodes not attached; missing: pp1 (ID: 1))
	Replication slots: OK (node has no physical replication slots)
	Missing physical replication slots: OK (node has no missing physical replication slots)
	Configured data directory: OK (configured "data_directory" is "/home/pgsql/data")
[postgres@pp2 ~]$ 
[postgres@pp2 ~]$ repmgr node status
WARNING: node "pp1" not found in "pg_stat_replication"
Node "walreceiver":
	PostgreSQL version: 10.14
	Total data size: 3683 MB
	Conninfo: host=pp2 user=repadmin dbname=repmgr connect_timeout=2
	Role: primary
	WAL archiving: off
	Archive command: (none)
	Replication connections: 0 (of maximal 3)
	Replication slots: 0 physical (of maximal 80; 0 missing)
	Replication lag: n/a

WARNING: following issue(s) were detected:
  - 1 of 1 downstream nodes not attached:
    - pp1 (ID: 1)

HINT: execute "repmgr node check" for more details
[postgres@pp2 ~]$ 
[postgres@pp1 ~]$ repmgr node check
WARNING: node "pp1" not found in "pg_stat_replication"
INFO: connection is not a superuser connection, falling back to simple check
HINT: provide a superuser with -S/--superuser, or add the "repadmin" user to role "pg_read_all_settings" or "pg_monitor"
Node "pp1":
	Server role: OK (node is standby)
	Replication lag: CRITICAL (3678118 seconds, critical threshold: 600))
	WAL archiving: OK (0 pending archive ready files)
	Upstream connection: CRITICAL (node "pp1" (ID: 1) is not attached to expected upstream node "walreceiver" (ID: 2))
	Downstream servers: OK (this node has no downstream nodes)
	Replication slots: OK (node has no physical replication slots)
	Missing physical replication slots: OK (node has no missing physical replication slots)
	Configured data directory: OK (configured "data_directory" is "/home/pgsql/data")
[postgres@pp1 ~]$ 
[postgres@pp1 ~]$ repmgr node status
Node "pp1":
	PostgreSQL version: 10.10
	Total data size: 2639 MB
	Conninfo: host=pp1 user=repadmin dbname=repmgr connect_timeout=2
	Role: standby
	WAL archiving: off
	Archive command: (none)
	Replication connections: 0 (of maximal 3)
	Replication slots: 0 physical (of maximal 80; 0 missing)
	Upstream node: walreceiver (ID: 2)
	Replication lag: 3678160 seconds
	Last received LSN: 3A/AA000000
	Last replayed LSN: 3A/AA19ECF8

[postgres@pp1 ~]$ 
  • Cluster Status:
[postgres@pp2 ~]$ repmgr cluster show
WARNING: node "pp1" not found in "pg_stat_replication"
 ID | Name        | Role    | Status    | Upstream      | Location | Priority | Timeline | Connection string                                     
----+-------------+---------+-----------+---------------+----------+----------+----------+--------------------------------------------------------
 1  | pp1         | standby |   running | ! walreceiver | default  | 100      | 12       | host=pp1 user=repadmin dbname=repmgr connect_timeout=2
 2  | walreceiver | primary | * running |               | default  | 100      | 12       | host=pp2 user=repadmin dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "pp1" (ID: 1) is not attached to its upstream node "walreceiver" (ID: 2)

[postgres@pp2 ~]$ repmgr cluster event
 Node ID | Name        | Event              | OK | Timestamp           | Details                                                                                                         
---------+-------------+--------------------+----+---------------------+------------------------------------------------------------------------------------------------------------------
 1       | pp1         | node_rejoin        | t  | 2021-07-01 13:49:18 | node 1 is now attached to node 2                                                                                
 2       | walreceiver | standby_switchover | t  | 2021-06-03 13:32:54 | node  "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby                          
 2       | walreceiver | standby_promote    | t  | 2021-06-03 13:32:50 | server "walreceiver" (ID: 2) was successfully promoted to primary                                               
 1       | pp1         | standby_switchover | t  | 2021-04-27 00:56:54 | node  "pp1" (ID: 1) promoted to primary, node "walreceiver" (ID: 2) demoted to standby                          
 1       | pp1         | standby_promote    | t  | 2021-04-27 00:56:52 | server "pp1" (ID: 1) was successfully promoted to primary                                                       
 2       | walreceiver | standby_switchover | t  | 2021-04-27 00:53:00 | node  "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby                          
 2       | walreceiver | standby_promote    | t  | 2021-04-27 00:52:59 | server "walreceiver" (ID: 2) was successfully promoted to primary                                               
 1       | pp1         | standby_switchover | t  | 2021-04-27 00:51:10 | node  "pp1" (ID: 1) promoted to primary, node "walreceiver" (ID: 2) demoted to standby                          
 1       | pp1         | standby_promote    | t  | 2021-04-27 00:51:08 | server "pp1" (ID: 1) was successfully promoted to primary                                                       
 2       | walreceiver | standby_switchover | t  | 2021-04-27 00:48:40 | node  "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby                          
 2       | walreceiver | standby_promote    | t  | 2021-04-27 00:48:40 | server "walreceiver" (ID: 2) was successfully promoted to primary                                               
 1       | pp1         | standby_switchover | t  | 2021-04-27 00:08:55 | node  "pp1" (ID: 1) promoted to primary, node "walreceiver" (ID: 2) demoted to standby                          
 1       | pp1         | standby_promote    | t  | 2021-04-27 00:08:53 | server "pp1" (ID: 1) was successfully promoted to primary                                                       
 2       | walreceiver | standby_switchover | t  | 2021-04-26 23:54:35 | node  "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby                          
 2       | walreceiver | standby_promote    | t  | 2021-04-26 23:54:35 | server "walreceiver" (ID: 2) was successfully promoted to primary                                               
 1       | pp1         | standby_switchover | t  | 2021-04-26 23:46:28 | node  "pp1" (ID: 1) promoted to primary, node "walreceiver" (ID: 2) demoted to standby                          
 1       | pp1         | standby_promote    | t  | 2021-04-26 23:46:26 | server "pp1" (ID: 1) was successfully promoted to primary                                                       
 2       | walreceiver | standby_switchover | t  | 2021-04-26 23:34:39 | node  "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby                          
 2       | walreceiver | standby_promote    | t  | 2021-04-26 23:34:38 | server "walreceiver" (ID: 2) was successfully promoted to primary                                               
 1       | pp1         | standby_switchover | f  | 2021-04-26 20:52:10 | node "pp1" (ID: 1) promoted to primary, but demote node "walreceiver" (ID: 2) did not connect to the new primary

[postgres@pp2 ~]$ 

DB: DDL Updates

===== If rt1/pp1 are primary =====

0.- Use 'postgres' DB user for all queries.

1.- Alter subscription on pp1:

	= \c archdb1
	= SELECT pglogical.alter_subscription_add_replication_set('rt1subscr', 'ddl_sql');

2.- Create subscription on rt2:

	= \c rtdb2
	= SELECT pglogical.create_subscription(
	subscription_name := 'rtsubscrddl',
	provider_dsn := 'host=rt1 port=5432 user=repadmin
	password=repadmin_pass_ucb dbname=rtdb1',
	replication_sets := ARRAY['ddl_sql'],
	synchronize_structure := 'FALSE',
	synchronize_data := FALSE
	);

3.- Apply schema change on rt1:

	= \c rtdb1
	= SELECT pglogical.replicate_ddl_command('alter table trinetdb.swarm_state add testfield bigint');

4.- Alter subscription on pp1:

	= \c archdb1
	= SELECT pglogical.alter_subscription_remove_replication_set('rt1subscr', 'ddl_sql');

5.- Drop subscription on rt2:

        = \c rtdb2
        = SELECT pglogical.drop_subscription( subscription_name := 'rtsubscrddl');


===== If rt2/pp2 are primary =====

0.- Use 'postgres' DB user for all queries.

1.- Alter subscription on pp2:

        = \c archdb1
	= SELECT pglogical.alter_subscription_add_replication_set('rt2subscr', 'ddl_sql');

2.- Create subscription on rt1:

        = \c rtdb1
        = SELECT pglogical.create_subscription(
        subscription_name := 'rtsubscrddl',
        provider_dsn := 'host=rt2 port=5432 user=repadmin
        password=repadmin_pass_wr dbname=rtdb2',
        replication_sets := ARRAY['ddl_sql'],
        synchronize_structure := 'FALSE',
        synchronize_data := FALSE
        );

3.- Apply schema change on rt2:

        = \c rtdb2
        = SELECT pglogical.replicate_ddl_command('alter table trinetdb.swarm_state add testfield bigint');

4.- Alter subscription on pp2:

        = \c archdb1
	= SELECT pglogical.alter_subscription_remove_replication_set('rt2subscr', 'ddl_sql');

5.- Drop subscription on rt1:

        = \c rtdb1
        = SELECT pglogical.drop_subscription( subscription_name := 'rtsubscrddl');
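
Optional verification (applies to either case): confirm that the schema change reached the subscriber databases with a psql describe command. An illustrative check; the table name is taken from the example above, and which RT database is the subscriber depends on which pair is primary:

        psql -d archdb1 -c "\d trinetdb.swarm_state"    # PP subscriber
        psql -d rtdb1 -c "\d trinetdb.swarm_state"      # RT subscriber (rtdb2 when rt1/pp1 are primary)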

DB: Adding New Tables

In this example, the PDL_Ids and PDL_Product tables are added.

1.- Create tables and grant permissions on rtdb1, rtdb2 and archdb1 (active PP DB):

[postgres@rt1 PDL_Tables]$ more *.sql
::::::::::::::
create_PDL_IDS.sql
::::::::::::::
CREATE TABLE PDL_IDS
(	DBSETID BIGINT      NOT NULL,
	TNAME   VARCHAR(30) NOT NULL,
	ID      BIGINT      NOT NULL,
	LDDATE TIMESTAMP DEFAULT (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'),
	CONSTRAINT PDL_IDS_PK PRIMARY KEY (DBSETID, TNAME)
);

::::::::::::::
create_PDL_PRODUCT.sql
::::::::::::::
CREATE TABLE PDL_PRODUCT
(	EVID     BIGINT      NOT NULL,
	PRODTYPE VARCHAR(30) NOT NULL,
	DEST     VARCHAR(30) NOT NULL,
	SOURCE   VARCHAR(8)  NOT NULL,
	TIME     TIMESTAMP   NOT NULL,
	ACTION   VARCHAR(2)  NOT NULL,
	DBSETID  BIGINT,
	LDDATE TIMESTAMP DEFAULT (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'),
	CONSTRAINT PDL_PRO_PK   PRIMARY KEY (EVID, PRODTYPE, ACTION, DEST, TIME, SOURCE),
	CONSTRAINT PDL_PRO_TYPE CHECK (PRODTYPE IN ('origin','phase-data','moment-tensor','focal-mechanism','nearby-cities','shakemap')),
	CONSTRAINT PDL_PRO_DEST CHECK (DEST IN ('Recent_event_PDL','ComCat_PDL','dev-ComCat_PDL','dev-Recent_event_PDL')),
	CONSTRAINT PDL_PRO_ACT  CHECK (ACTION IN ('U','D','u','d'))
);

::::::::::::::
grant.sql
::::::::::::::

GRANT SELECT ON PDL_IDS TO trinetdb_read, code;
GRANT INSERT, UPDATE, DELETE ON PDL_IDS TO trinetdb_write, code;

GRANT SELECT ON PDL_PRODUCT TO trinetdb_read, code;
GRANT INSERT, UPDATE, DELETE ON PDL_PRODUCT TO trinetdb_write, code;

[postgres@rt1 PDL_Tables]$ 

[postgres@rt1 PDL_Tables]$ psql -U trinetdb -d rtdb1 < create_PDL_IDS.sql 
CREATE TABLE
[postgres@rt1 PDL_Tables]$ psql -U trinetdb -d rtdb1 < create_PDL_PRODUCT.sql 
CREATE TABLE
[postgres@rt1 PDL_Tables]$ psql -U trinetdb -d rtdb1 < grant.sql 
GRANT
GRANT
GRANT
GRANT
[postgres@rt1 PDL_Tables]$ 

2.- Set up replication on rtdb1, rtdb2 and archdb1:

In this example, RT2/PP2 are the active systems.

2.1.- Remove RT1 replication set on PP2:

[postgres@pp2 ~]$ psql -d archdb1
psql (10.14)
archdb1=# SELECT pglogical.alter_subscription_remove_replication_set('rt1subscr', 'rtrepset');
 alter_subscription_remove_replication_set 
-------------------------------------------
 t
(1 row)

archdb1=# \q
[postgres@pp2 ~]$

2.2.- Add tables to replication set on RT1:

[postgres@rt1 ~]$ psql -d rtdb1
psql (9.2.24, server 10.10)
Type "help" for help.

rtdb1=# SELECT pglogical.replication_set_add_table(set_name:='rtrepset', relation:='trinetdb.pdl_ids',synchronize_data:=TRUE);
 replication_set_add_table 
---------------------------
 t
(1 row)

rtdb1=# SELECT pglogical.replication_set_add_table(set_name:='rtrepset', relation:='trinetdb.pdl_product',synchronize_data:=TRUE);
 replication_set_add_table 
---------------------------
 t
(1 row)

rtdb1=# \q
[postgres@rt1 ~]$ 

2.3.- Add RT1 replication set on PP2:

[postgres@pp2 ~]$ psql -d archdb1
psql (10.14)
Type "help" for help.

archdb1=# SELECT pglogical.alter_subscription_add_replication_set('rt1subscr', 'rtrepset');
 alter_subscription_add_replication_set 
----------------------------------------
 t
(1 row)

archdb1=# \q
[postgres@pp2 ~]$

2.4.- Do an active/standby role switch (RT1/PP1 are now active).

2.5.- Remove RT2 replication set on PP1:

[postgres@pp1 ~]$ psql -d archdb1
psql (10.14)
Type "help" for help.

archdb1=# SELECT pglogical.alter_subscription_remove_replication_set('rt2subscr', 'rtrepset');
 alter_subscription_remove_replication_set 
-------------------------------------------
 t
(1 row)

archdb1=# \q
[postgres@pp1 ~]$ 

2.6.- Add tables to replication set on RT2:

[postgres@rt2 ~]$ psql -d rtdb2
psql (10.16)
Type "help" for help.

rtdb2=# SELECT pglogical.replication_set_add_table(set_name:='rtrepset', relation:='trinetdb.pdl_ids',synchronize_data:=TRUE);
 replication_set_add_table 
---------------------------
 t
(1 row)

rtdb2=# SELECT pglogical.replication_set_add_table(set_name:='rtrepset', relation:='trinetdb.pdl_product',synchronize_data:=TRUE);
 replication_set_add_table 
---------------------------
 t
(1 row)

rtdb2=# \q
[postgres@rt2 ~]$ 

2.7.- Add RT2 replication set on PP1:

[postgres@pp1 ~]$ psql -d archdb1
psql (10.14)
Type "help" for help.

archdb1=# SELECT pglogical.alter_subscription_add_replication_set('rt2subscr', 'rtrepset');
 alter_subscription_add_replication_set 
----------------------------------------
 t
(1 row)

archdb1=# \q
[postgres@pp1 ~]$ 
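
3.- Optional check: once both replication sets include the new tables, compare row counts between the active RT DB and archdb1 to confirm that data is flowing (an illustrative check; counts will only match once rows exist and have replicated):

psql -d rtdb1 -c "SELECT count(*) FROM trinetdb.pdl_ids;"       # on the active RT system
psql -d archdb1 -c "SELECT count(*) FROM trinetdb.pdl_ids;"     # on the active PP system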