PostgreSQL Test AQMS System Operations Manual
AQMS/DB: Commands
- Start PostgreSQL database:
[postgres@rt1 ~]$ sudo systemctl [re]start postgresql-10
- Stop PostgreSQL database:
[postgres@rt1 ~]$ sudo systemctl stop postgresql-10
- PostgreSQL database Status:
[postgres@rt1 ~]$ systemctl status postgresql-10
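- Check that the database is accepting connections: 'systemctl' only reports the service state, while 'pg_isready' (shipped with PostgreSQL) verifies that the server actually answers. A minimal sketch; host and database names are examples, adjust for the node being checked:
[postgres@rt1 ~]$ pg_isready -h rt1.geo.berkeley.edu -p 5432 -d rtdb1
rt1.geo.berkeley.edu:5432 - accepting connections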
- Show RT/PP system role:
[aqms@rt2 ~]$ showrole
System is shadow
[aqms@rt2 ~]$
[aqms@pp2 events]$ showrole
System is primary
[aqms@pp2 events]$
- AQMS command:
[aqms@rt1 ~]$ aqms -h
AQMS realtime-processing control script, can be used to start, stop, restart, show status
Usage: aqms [start|stop|status|restart|forcestop]
stop does a kill -1 and forcestop does a kill -9, returns 1 when stopped
status exits with 0 if all required processes are running and 1 if not.
restart does a stop 1 or 9 followed by start and status.
[aqms@rt1 ~]$
- Start all AQMS+EW codes on each system:
[aqms@rt1 ~]$ aqms start
- Status of what is running:
[aqms@rt1 ~]$ aqms status
2334 java -Xms16m -Xmx512m -Dcms -jar /home/aqms/jars/QWServer.jar -configFile /home/aqms/cms/QWServerConfig.xml
2375 grep -E /home/aqms/jars/QWServer.jar
adadup_slave /home/aqms/configs/adadup_slave/adadup_rt1_to_menlo1.cfg NOT RUNNING
adadup_slave /home/aqms/configs/adadup_slave/adadup_rt1_to_menlo2.cfg NOT RUNNING
adadup_slave /home/aqms/configs/adadup_slave/adadup_rt1_to_ucb1.cfg NOT RUNNING
adadup_slave /home/aqms/configs/adadup_slave/adadup_rt1_to_ucb2.cfg NOT RUNNING
ntrcg2 /home/aqms/configs/ntrcg2/ntrcg2.cfg running on pid 9237
eqrun /home/aqms/configs/eqrun/eqrun.cfg running on pid 9262
ec /home/aqms/configs/ec/ec.cfg running on pid 9173
tc2 /home/aqms/configs/tc2/tc2.cfg running on pid 9200
trig2db /home/aqms/configs/trig2db/trig2db.cfg running on pid 9223
trimag /home/aqms/configs/trimag/trimag.cfg NOT RUNNING
[aqms@rt1 ~]$
[aqms@rt1 ~]$ status
using default config file startstop_unix.d
NOTE: If next line reads "ERROR: tport_attach...", Earthworm is not running.
Sent request for status; waiting for response...

                    EARTHWORM-64 SYSTEM STATUS

        Hostname-OS:            rt1 - Linux 3.10.0-1160.2.1.el7.x86_64
        Start time (UTC):       Mon Aug 23 21:59:56 2021
        Current time (UTC):     Fri Sep 17 16:01:06 2021
        Disk space avail:       35686244 kb
        Ring 1 name/key/size:   HYPO_RING / 1015 / 1024 kb
        Ring 2 name/key/size:   PICK_RING / 1005 / 1024 kb
        Ring 3 name/key/size:   FILTERPICK_RING / 1039 / 1024 kb
        Startstop's Log Dir:    /home/aqms/logs/
        Startstop's Params Dir: /home/aqms/ew/params/
        Startstop's Bin Dir:    /home/aqms/ew/bin
        Startstop Version:      v7.10 2019-08-13 (64 bit)

         Process       Process  Status  Class/    CPU
          Name           Id             Priority  Used      Argument
         -------       -------  ------  --------  --------  --------
        startstop       2152    Alive   ??/ 0     00:01:31  -
        import_ack     31047    Alive   ??/ 0     00:01:15  imp_pktrg_menlo1.d
        import_ack     31009    Alive   ??/ 0     00:01:15  imp_pktrg_menlo2.d
        import_ack      2228    Alive   ??/ 0     00:00:31  imp_pktrg_ucb1.d
        import_ack      2229    Alive   ??/ 0     00:00:31  imp_pktrg_ucb2.d
        pkfilter        2231    Alive   ??/ 0     00:15:02  pkfilter.d
        statrigfilter   2232    Alive   ??/ 0     00:15:26  statrigfilter.d
        binder_ew       2233    Alive   ??/ 0     00:12:00  binder_ew.d
        eqassemble      2234    Alive   ??/ 0     00:07:11  eqassemble.d
        hyps2ps         2235    Alive   ??/ 0     00:01:32  hyps2ps.d
        carlsubtrig     2236    Alive   ??/ 0     00:05:45  carlsubtrig.d
        trig2ps         2237    Alive   ??/ 0     00:01:31  trig2ps.d
        statmgr         2238    Alive   ??/ 0     00:01:40  statmgr.d
        telafeeder      2239    NoExec                      telafeeder.d
        diskmgr         2240    Alive   ??/ 0     00:01:28  diskmgr.d
        copystatus      2241    Alive   ??/ 0     00:01:26  PICK_RING HYPO_RING
        copystatus      2242    Alive   ??/ 0     00:01:21  FILTERPICK_RING HYPO_RING
[aqms@rt1 ~]$
[aqms@pp1 ~]$ aqms status
aqms  2345  0.0 0.0  96308 4264 ? S Nov03 0:01 pws pws_dart.cfg
aqms  2346  0.0 0.0  14908 1464 ? S Nov03 0:14 conlog /home/aqms/logs/pws_dart_con
aqms  2351  0.0 0.0 113480 4592 ? S Nov03 0:01 pws pws_CE.cfg
aqms  2352  0.0 0.0  14908 1464 ? S Nov03 0:19 conlog /home/aqms/logs/pws_CE_con
aqms  2359  0.0 0.0  96308 4272 ? S Nov03 0:01 pws pws_UCB.cfg
aqms  2360  0.0 0.0  14908 1468 ? S Nov03 0:22 conlog /home/aqms/logs/pws_UCB_con
aqms  2367  0.0 0.0 113480 4604 ? S Nov03 0:01 pws pws_NN.cfg
aqms  2368  0.0 0.0  14908 1460 ? S Nov03 0:25 conlog /home/aqms/logs/pws_NN_con
aqms  2373  0.0 0.0 113480 4668 ? S Nov03 0:03 pws pws.cfg
aqms  2374  0.0 0.0  14908 1464 ? S Nov03 0:52 conlog /home/aqms/logs/pws_con
aqms 22902  0.1 0.0 219132 5212 ? S 16:43 0:00 pws pws_UCB.cfg
aqms 22903  0.0 0.0 218168 3948 ? S 16:43 0:00 pws pws_dart.cfg
aqms 22904  0.1 0.0 236312 5336 ? S 16:43 0:00 pws pws_NN.cfg
aqms  2407  0.0 0.0 209876 7008 ? S Nov03 1:12 wanc wanc_AZBKBPLBPBUS.cfg
aqms  2408  0.0 0.0  14908 1464 ? S Nov03 0:00 conlog /home/aqms/logs/wanc_AZBKBPLBPBUS_con
aqms  2430  0.0 0.0 208728 5876 ? S Nov03 2:05 wanc wanc_BG.cfg
aqms  2431  0.0 0.0  14908 1464 ? S Nov03 0:00 conlog /home/aqms/logs/wanc_BG_con
aqms  2466  0.0 0.0 209812 6916 ? S Nov03 0:59 wanc wanc_CE.cfg
aqms  2467  0.0 0.0  14908 1464 ? S Nov03 0:01 conlog /home/aqms/logs/wanc_CE_con
aqms  2490  0.0 0.0 211464 6596 ? S Nov03 0:36 wanc wanc_CINN.cfg
aqms  2491  0.0 0.0  14908 1464 ? S Nov03 0:00 conlog /home/aqms/logs/wanc_CINN_con
aqms  2509  0.0 0.0 209332 6520 ? S Nov03 1:11 wanc wanc_NC.cfg
aqms  2510  0.0 0.0  14908 1464 ? S Nov03 0:00 conlog /home/aqms/logs/wanc_NC_con
aqms  2531  0.0 0.0 209404 6544 ? S Nov03 1:12 wanc wanc_GMGSNPPGSBSFUOUWWR.cfg
aqms  2532  0.0 0.0  14908 1468 ? S Nov03 0:00 conlog /home/aqms/logs/wanc_GMGSNPPGSBSFUOUWWR_con
solserver.pl running on pids 2758 2771
[aqms@pp1 ~]$
- RT1 Processes:
UID       PID   PPID  C STIME TTY TIME     CMD
aqms      2334  1     0 Nov03 ?   00:11:06 java -Xms16m -Xmx512m -Dcms -jar /home/aqms/jars/QWServer.jar -configFile /home/aqms/cms/QWServerConfig.xml
aqms      9167  1     0 Nov03 ?   00:00:11 startstop
aqms      9173  1     0 Nov03 ?   00:00:09 ec /home/aqms/configs/ec/ec.cfg
aqms      9174  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/ec_console
aqms      9180  9167  0 Nov03 ?   00:01:34 import_ack imp_pktrg_menlo1.d
aqms      9181  9167  0 Nov03 ?   00:01:34 import_ack imp_pktrg_menlo2.d
aqms      9182  9167  0 Nov03 ?   00:00:03 import_ack imp_pktrg_ucb1.d
aqms      9183  9167  0 Nov03 ?   00:00:03 import_ack imp_pktrg_ucb2.d
aqms      9184  9167  0 Nov03 ?   00:02:00 pkfilter pkfilter.d
aqms      9185  9167  0 Nov03 ?   00:01:56 statrigfilter statrigfilter.d
aqms      9186  9167  0 Nov03 ?   00:02:05 binder_ew binder_ew.d
aqms      9188  9167  0 Nov03 ?   00:01:00 eqassemble eqassemble.d
aqms      9189  9167  0 Nov03 ?   00:00:11 hyps2ps hyps2ps.d
aqms      9190  9167  0 Nov03 ?   00:00:43 carlsubtrig carlsubtrig.d
aqms      9191  9167  0 Nov03 ?   00:00:11 trig2ps trig2ps.d
aqms      9192  9167  0 Nov03 ?   00:00:12 statmgr statmgr.d
aqms      9193  9167  0 Nov03 ?   00:00:00 [startstop] <defunct>
aqms      9194  9167  0 Nov03 ?   00:00:10 diskmgr diskmgr.d
aqms      9195  9167  0 Nov03 ?   00:00:10 copystatus PICK_RING HYPO_RING
aqms      9196  9167  0 Nov03 ?   00:00:10 copystatus FILTERPICK_RING HYPO_RING
aqms      9200  1     0 Nov03 ?   00:00:06 tc2 /home/aqms/configs/tc2/tc2.cfg
aqms      9201  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/tc2_console
aqms      9205  9188  0 Nov03 ?   00:00:00 eqcoda eqcoda.d
aqms      9222  9205  0 Nov03 ?   00:00:02 hyp2000_mgr hyp2000_mgr.d cal4-1.4.hyp
aqms      9223  1     0 Nov03 ?   00:00:05 trig2db /home/aqms/configs/trig2db/trig2db.cfg
aqms      9224  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/trig2db_console
aqms      9237  1     0 Nov03 ?   00:00:08 ntrcg2 /home/aqms/configs/ntrcg2/ntrcg2.cfg
aqms      9238  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/ntrcg2_console
aqms      9262  1     0 Nov03 ?   00:00:22 eqrun /home/aqms/configs/eqrun/eqrun.cfg
aqms      9263  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/eqrun_console
postgres  1426  1     0 Nov03 ?   00:00:11 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
postgres  1932  1426  0 Nov03 ?   00:00:00 postgres: logger process
postgres  2013  1426  0 Nov03 ?   00:00:01 postgres: checkpointer process
postgres  2016  1426  0 Nov03 ?   00:00:03 postgres: writer process
postgres  2018  1426  0 Nov03 ?   00:00:05 postgres: wal writer process
postgres  2019  1426  0 Nov03 ?   00:00:06 postgres: autovacuum launcher process
postgres  2020  1426  0 Nov03 ?   00:00:09 postgres: stats collector process
postgres  2021  1426  0 Nov03 ?   00:00:00 postgres: bgworker: pglogical supervisor
postgres  2022  1426  0 Nov03 ?   00:00:00 postgres: bgworker: logical replication launcher
postgres  2222  1426  0 Nov03 ?   00:00:00 postgres: bgworker: pglogical manager 16384
postgres  9385  1426  0 Nov03 ?   00:00:04 postgres: wal sender process repadmin 169.229.197.23(55226) idle
- PP1 Processes:
UID       PID   PPID  C STIME TTY TIME     CMD
aqms      2345  1     0 Nov03 ?   00:00:01 pws pws_dart.cfg
aqms      2346  1     0 Nov03 ?   00:00:14 conlog /home/aqms/logs/pws_dart_con
aqms      2351  1     0 Nov03 ?   00:00:01 pws pws_CE.cfg
aqms      2352  1     0 Nov03 ?   00:00:19 conlog /home/aqms/logs/pws_CE_con
aqms      2359  1     0 Nov03 ?   00:00:01 pws pws_UCB.cfg
aqms      2360  1     0 Nov03 ?   00:00:22 conlog /home/aqms/logs/pws_UCB_con
aqms      2367  1     0 Nov03 ?   00:00:01 pws pws_NN.cfg
aqms      2368  1     0 Nov03 ?   00:00:25 conlog /home/aqms/logs/pws_NN_con
aqms      2373  1     0 Nov03 ?   00:00:03 pws pws.cfg
aqms      2374  1     0 Nov03 ?   00:00:52 conlog /home/aqms/logs/pws_con
aqms      2407  1     0 Nov03 ?   00:01:11 wanc wanc_AZBKBPLBPBUS.cfg
aqms      2408  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/wanc_AZBKBPLBPBUS_con
aqms      2430  1     0 Nov03 ?   00:02:05 wanc wanc_BG.cfg
aqms      2431  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/wanc_BG_con
aqms      2466  1     0 Nov03 ?   00:00:59 wanc wanc_CE.cfg
aqms      2467  1     0 Nov03 ?   00:00:01 conlog /home/aqms/logs/wanc_CE_con
aqms      2490  1     0 Nov03 ?   00:00:36 wanc wanc_CINN.cfg
aqms      2491  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/wanc_CINN_con
aqms      2509  1     0 Nov03 ?   00:01:11 wanc wanc_NC.cfg
aqms      2510  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/wanc_NC_con
aqms      2531  1     0 Nov03 ?   00:01:12 wanc wanc_GMGSNPPGSBSFUOUWWR.cfg
aqms      2532  1     0 Nov03 ?   00:00:00 conlog /home/aqms/logs/wanc_GMGSNPPGSBSFUOUWWR_con
aqms      2758  1     0 Nov03 ?   00:00:00 /bin/perl -w? /home/aqms/bin/solserver.pl -p 6600
aqms      2771  1     0 Nov03 ?   00:00:00 /bin/perl -w? /home/aqms/bin/solserver.pl -p 6601
postgres  1421  1     0 Nov03 ?   00:00:44 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
postgres  1948  1421  0 Nov03 ?   00:00:00 postgres: logger process
postgres  2127  1421  0 Nov03 ?   00:00:18 postgres: checkpointer process
postgres  2128  1421  0 Nov03 ?   00:00:03 postgres: writer process
postgres  2129  1421  0 Nov03 ?   00:00:18 postgres: wal writer process
postgres  2130  1421  0 Nov03 ?   00:00:06 postgres: autovacuum launcher process
postgres  2131  1421  0 Nov03 ?   00:00:28 postgres: stats collector process
postgres  2132  1421  0 Nov03 ?   00:00:00 postgres: bgworker: pglogical supervisor
postgres  2133  1421  0 Nov03 ?   00:00:00 postgres: bgworker: logical replication launcher
postgres  2252  1421  0 Nov03 ?   00:00:00 postgres: bgworker: pglogical manager 16384
postgres  3506  1421  0 Nov03 ?   00:00:54 postgres: wal sender process repadmin 130.118.43.106(58750) streaming F/3151AF8
postgres  9408  1421  0 Nov03 ?   00:00:24 postgres: bgworker: pglogical apply 16384:1687662095
AQMS: Metadata Loading
LoadStationXML (from aqms_ir) deletes all the metadata for a station when you load that station's StationXML. The assumption is that the StationXML file contains everything you want in the database for that station. The script also cannot yet handle multiple station epochs (multiple channel and other epochs are fine). In newer versions, the pole-zero part is somewhat buggy (for example, poles and zeros can only be loaded once), so it was made optional: the poles and zeros are not needed in the database at the moment, and there is no time to debug that part now. The script was added by ISTI for use in northern California.
Remark: Currently, it seems to populate only the following tables:
* d_abbreviation
* d_unit
* d_format
* station_data
* channel_data
* simple_response
* channelmap_ampparms <--- Should not populate this table
* channelmap_codaparms <--- Should not populate this table
* sensitivity
loadStationXML from aqms_ir has a PZ filling option, but that needs some work (for UW to be interested in using it; NCSN may be able to use it as-is).
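A quick sanity check after a load is to count rows in the populated tables for the station concerned. This is a hedged sketch: the net/sta column names are assumed from the standard AQMS schema, and 'BK'/'CMB' are example values only:
SELECT 'station_data' AS tbl, count(*) FROM station_data WHERE net='BK' AND sta='CMB'
UNION ALL SELECT 'channel_data', count(*) FROM channel_data WHERE net='BK' AND sta='CMB'
UNION ALL SELECT 'simple_response', count(*) FROM simple_response WHERE net='BK' AND sta='CMB';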
- Metadata loading at NCSS:
- As the 'aqms' user on the active pp, rt1 & rt2, cd into /home/aqms/configs/metadata/aqms-ir/.
- To load all stations for all networks:
- Run 'getNCEDCstaXML' to retrieve all stationXML files from the NCEDC.
- Run 'loadAllStaXML.sh' to load all stations (takes about 1.5 hours).
- To load a given network:
- Run 'getNCEDCstaXML <Network>' to retrieve all stationXML files for the given network.
- Run 'loadAllStaXML.sh' to load all stations for the given network.
- To load a given station:
- Run 'getNCEDCstaXML <Network> <Station>' to retrieve the stationXML file for the given station.
- Run 'loadAllStaXML.sh' to load the station.
- To load a list of networks and stations:
- Run 'getFromList' to retrieve the stationXML files for the networks and stations specified in 'netsta.lst', e.g.:
[aqms@rt1 aqms-ir]$ cat netsta.lst
SB
BK MERC
NP 1858
UO
[aqms@rt1 aqms-ir]$
- Note: Instead of running those scripts on all machines, run them on one machine (e.g. rt1) and then execute the script 'refreshMetadata.csh' on rt2 & the active pp.
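Putting the steps together, a single-station update might look like the sketch below. BK/CMB are example values, and the refreshMetadata.csh path is an assumption (check where the script lives on rt2 and the active pp):
[aqms@rt1 aqms-ir]$ getNCEDCstaXML BK CMB
[aqms@rt1 aqms-ir]$ loadAllStaXML.sh
[aqms@rt1 aqms-ir]$ ssh rt2 /home/aqms/configs/metadata/aqms-ir/refreshMetadata.csh   # path assumed
[aqms@rt1 aqms-ir]$ ssh pp1 /home/aqms/configs/metadata/aqms-ir/refreshMetadata.csh   # run on the active pp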
AQMS: Adding a Station
Remark: This is currently a workaround. In production mode:
* StationUI will be used to maintain:
  * The AppChannels & Applications tables (for auxiliary lists and section 101?).
* AppChan.sql will be used to generate:
  * The AppChannels & Applications tables (based on the auxiliary lists).
  * The Config_Channel & Program tables.
* codaparamsDB will generate:
  * The ChannelMap_AmpParms & ChannelMap_CodaParms tables,
  * The StaCorrections table section Md.
  * The AppChannels & Applications tables (section 111).
* loadCISNmagcorr will generate:
  * The StaCorrections table section Ml (changes very rarely).
* Logger_Info table update? Integrate with metadata updates?
- AppChannels/StaCorrections loading at NCSS:
- For the Application tables, cd into /home/aqms/configs/db/ncsn_sql/AppCha/.
- Run the following script on an Oracle DB (e.g. RTUCB2) to create the file 'app_chan_101_111.sql':
set colsep ','
set echo off
set feedback off
set linesize 1000
set pagesize 0
set sqlprompt ''
set trimspool on
set headsep off
spool app_chan_101_111.sql
select 'INSERT INTO AppChannels (progid,net,sta,seedchan,location,config,ondate,offdate,lddate) VALUES ('||''''||progid||''''||','||''''||net||''''||','||''''||sta||''''||','||''''||seedchan||''''||','||''''||location||''''||','||''''||config||''''||','||''''||ondate||''''||','||''''||offdate||''''||','||''''||lddate||''''||');' FROM AppChannels WHERE progid IN (101,111);
spool off
quit
- Run the following script on an Oracle DB (e.g. RTUCB2) to create the file 'stacorrections.sql':
set colsep ','
set echo off
set feedback off
set linesize 1000
set pagesize 0
set sqlprompt ''
set trimspool on
set headsep off
spool stacorrections.sql
select 'INSERT INTO StaCorrections (net,sta,seedchan,channel,channelsrc,location,auth,corr,corr_flag,corr_type,ondate,offdate,lddate) VALUES ('||''''||net||''''||','||''''||sta||''''||','||''''||seedchan||''''||','||''''||channel||''''||','||''''||channelsrc||''''||','||''''||location||''''||','||''''||auth||''''||','||''''||corr||''''||','||''''||corr_flag||''''||','||''''||corr_type||''''||','||''''||ondate||''''||','||''''||offdate||''''||','||''''||lddate||''''||');' FROM StaCorrections;
spool off
quit
- FTP the files 'app_chan_101_111.sql' and 'stacorrections.sql' to the PostgreSQL system in /home/aqms/configs/db/ncsn_sql/AppCha/.
- Update the 'appChan_pg.sql' script if needed (Oracle version on ucbpp:/home/ncss/run/SQL).
- Run 'refreshAppCha.csh' to refresh the Applications, AppChannels & StaCorrections tables.
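For reference, each line spooled by the Oracle scripts above is a self-contained INSERT statement; the values below are purely illustrative:
INSERT INTO AppChannels (progid,net,sta,seedchan,location,config,ondate,offdate,lddate) VALUES ('101','BK','CMB','HHZ','--','...','...','...','...');
After running 'refreshAppCha.csh', a hedged way to verify the reload is a per-progid row count (table name as lowercased by PostgreSQL):
archdb1=> SELECT progid, count(*) FROM appchannels GROUP BY progid ORDER BY progid;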
AQMS: Real Time Role Switch
NOTES:
The command 'switchrole' can be launched on either rt1 or rt2. The real time switch should happen before the post processing one because currently a PP system is considered primary at startup if its corresponding RT system is primary (showrole script logic).
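Before switching, confirm that exactly one RT system currently reports primary. A minimal check, assuming 'showrole' is on the aqms user's PATH on both hosts:
[aqms@rt1 ~]$ showrole
System is primary
[aqms@rt1 ~]$ ssh rt2 showrole
System is shadow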
- Switchover:
[aqms@rt1 ~]$ switchrole
Will switch rt1 (rtdb1) from primary to shadow and rt2 (rtdb2) from shadow to primary
OK (Y or N): y
Will do it.
rtem@rtdb1 System was primary, changing database to shadow
spawn psql -q -h rt1.geo.berkeley.edu -d rtdb1 -U rtem -W
Password for user rtem:
rtdb1=> \i /home/aqms/utils/scripts/set_shadow.sql
rtdb1=> \q
rtem@rtdb2 System was shadow, changing database to primary
spawn psql -q -h rt2.wr.usgs.gov -d rtdb2 -U rtem -W
Password for user rtem:
rtdb2=> \i /home/aqms/utils/scripts/set_primary.sql
rtdb2=> \q

[aqms@rt1 ~]$ switchrole
Will switch rt1 (rtdb1) from shadow to primary and rt2 (rtdb2) from primary to shadow
OK (Y or N): n
NOT doing it. Exit.
- Primary Status:
[aqms@rt1 ~]$ psql -q -h rt1.geo.berkeley.edu -d rtdb1 -U rtem -W
Password for user rtem:
rtdb1=> select * from rt_role;
 primary_system |     modification_time
----------------+----------------------------
 FALSE          | 2019-11-01 00:00:00
 true           | 2019-11-25 21:28:10.59765
 false          | 2021-06-30 22:11:19.622779
(3 rows)
rtdb1=> \q
- Standby Status:
[aqms@rt1 ~]$ psql -q -h rt2.wr.usgs.gov -d rtdb2 -U rtem -W
Password for user rtem:
rtdb2=> select * from rt_role;
 primary_system |     modification_time
----------------+----------------------------
 TRUE           | 2019-05-01 00:00:00
 false          | 2019-11-25 23:08:39.9445
 true           | 2021-06-30 22:11:19.731711
(3 rows)
rtdb2=> \q
[aqms@rt1 ~]$ exit
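Note that rt_role keeps a history of role changes, so the newest row reflects the current role. To read only the current role (column names as in the output above):
rtdb1=> SELECT primary_system FROM rt_role ORDER BY modification_time DESC LIMIT 1;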
AQMS/DB: Post Processing Role Switch
NOTE:
MAKE SURE that the DB instances are started and stopped with systemctl, not pg_ctl:
systemctl status postgresql-10
sudo systemctl stop postgresql-10
sudo systemctl [re]start postgresql-10
- Post Processing Failover:
- In a controlled environment (e.g. patching the active PP) -- a consolidated sketch follows this list:
  * On the standby system as 'postgres', do a dry run first:
    % repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind --dry-run
  * If you get no errors from the dry run, on the primary system as 'aqms' shut down AQMS:
    % aqms stop
  * Check to see if anything is still running, and kill any remaining processes:
    % aqms status
  * On the standby system, sync the event waveform archive:
    % rsync -av -e ssh --progress aqms@{primary}:/home/aqms/data/events/active/ /home/aqms/data/events/active
    where {primary} = {pp1|pp2}
  * On the standby system as 'postgres', do the switchover:
    % repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind
  * On the NEW primary system as 'aqms', start AQMS:
    % aqms start
  * Check to make sure that all processes are running:
    % aqms status
- In an emergency (e.g. the active PP has crashed): if something bad happened, or you don't expect the standby to be able to take over cleanly, instead of using --force-rewind use this command, which kills the original primary after taking over:
    % repmgr standby switchover --always-promote
  * Stop AQMS on the old primary if it isn't already stopped, and start it up on the new primary (see above).
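The controlled procedure above, condensed into one annotated sequence. This is a sketch, not a turnkey script: it assumes pp1 is the current primary and pp2 the standby, and that the postgres/aqms user switches are handled by the ssh hops shown.
# On pp2 as postgres -- dry run; stop here if it reports errors:
repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind --dry-run
# On pp1 as aqms -- stop AQMS and confirm nothing is left running:
ssh aqms@pp1 'aqms stop; aqms status'
# On pp2 as aqms -- sync the event waveform archive from the current primary:
rsync -av -e ssh --progress aqms@pp1:/home/aqms/data/events/active/ /home/aqms/data/events/active
# On pp2 as postgres -- perform the switchover:
repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind
# On pp2 (the NEW primary) as aqms -- start AQMS and verify:
aqms start && aqms status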
- Switchover dry run:
[postgres@pp2 ~]$ repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind --dry-run
INFO: checking for package configuration file "/etc/repmgr/10/repmgr.conf"
INFO: configuration file found at: "/etc/repmgr/10/repmgr.conf"
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: checking switchover on node "walreceiver" (ID: 2) in --dry-run mode
WARNING: no superuser connection available
DETAIL: it is recommended to perform switchover operations with a database superuser
HINT: provide the name of a superuser with -S/--superuser
INFO: searching for primary node
INFO: checking if node 1 is primary
INFO: current primary node is 1
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "pp1" succeeded
INFO: able to execute "repmgr" on remote host "pp1"
INFO: 1 walsenders required, 3 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "walreceiver" (ID: 2) would be promoted to primary; current primary "pp1" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "pp1": "sudo /bin/systemctl stop postgresql-10"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
[postgres@pp2 ~]$
- Switchover:
[postgres@pp2 ~]$ repmgr standby switchover --username repadmin --verbose --remote-user=postgres --force-rewind
INFO: checking for package configuration file "/etc/repmgr/10/repmgr.conf"
INFO: configuration file found at: "/etc/repmgr/10/repmgr.conf"
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: executing switchover on node "walreceiver" (ID: 2)
WARNING: no superuser connection available
DETAIL: it is recommended to perform switchover operations with a database superuser
HINT: provide the name of a superuser with -S/--superuser
INFO: searching for primary node
INFO: checking if node 1 is primary
INFO: current primary node is 1
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "pp1" succeeded
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "walreceiver" (ID: 2) will be promoted to primary; current primary "pp1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "pp1" (ID: 1)
WARNING: a superuser connection is required to issue a CHECKPOINT
HINT: provide a superuser with -S/--superuser
DETAIL: executing server command "sudo /bin/systemctl stop postgresql-10"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 33/22B08078
NOTICE: promoting standby to primary
DETAIL: promoting server "walreceiver" (ID: 2) using "pg_ctl -w -D '/home/pgsql/data' promote"
waiting for server to promote.... done
server promoted
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
INFO: standby promoted to primary after 0 second(s)
NOTICE: STANDBY PROMOTE successful
DETAIL: server "walreceiver" (ID: 2) was successfully promoted to primary
WARNING: no superuser connection available, unable to issue CHECKPOINT
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 33/22B08078; rejoin target node's fork point: 33/22B080E8
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=pp1 user=repadmin dbname=repmgr connect_timeout=2 application_name=pp1"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo /bin/systemctl start postgresql-10"
WARNING: node "pp1" not found in "pg_stat_replication"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
INFO: node "pp1" (ID: 1) is pingable
WARNING: node "pp1" not found in "pg_stat_replication"
INFO: waiting for node "pp1" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: checking for record in node "walreceiver"'s "pg_stat_replication" table where "application_name" is "pp1"
INFO: node "pp1" (ID: 1) has attached to its upstream node
NOTICE: node "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "walreceiver" is now primary and node "pp1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
[postgres@pp2 ~]$
- Primary Status:
[postgres@pp1 ~]$ repmgr node status
Node "pp1":
        PostgreSQL version: 10.10
        Total data size: 2172 MB
        Conninfo: host=pp1 user=repadmin dbname=repmgr connect_timeout=2
        Role: primary
        WAL archiving: off
        Archive command: (none)
        Replication connections: 1 (of maximal 3)
        Replication slots: 0 physical (of maximal 80; 0 missing)
        Replication lag: n/a
[postgres@pp1 ~]$ psql
psql (10.10)
Type "help" for help.
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 22980
usesysid         | 18520
usename          | repadmin
application_name | walreceiver
client_addr      | 130.118.43.106
client_hostname  |
client_port      | 39488
backend_start    | 2021-05-27 08:53:11.665653-07
backend_xmin     |
state            | streaming
sent_lsn         | 28/5A603470
write_lsn        | 28/5A603470
flush_lsn        | 28/5A603470
replay_lsn       | 28/5A603470
write_lag        | 00:00:00.004746
flush_lag        | 00:00:00.005036
replay_lag       | 00:00:00.005777
sync_priority    | 0
sync_state       | async
postgres=# \q
[postgres@pp1 ~]$
- Standby Status:
[postgres@pp2 ~]$ repmgr node status
Node "walreceiver":
        PostgreSQL version: 10.14
        Total data size: 2172 MB
        Conninfo: host=pp2 user=repadmin dbname=repmgr connect_timeout=2
        Role: standby
        WAL archiving: off
        Archive command: (none)
        Replication connections: 0 (of maximal 3)
        Replication slots: 0 physical (of maximal 80; 0 missing)
        Upstream node: pp1 (ID: 1)
        Replication lag: 0 seconds
        Last received LSN: 28/5A3D6EC8
        Last replayed LSN: 28/5A3D6EC8
[postgres@pp2 ~]$ psql
psql (10.14)
Type "help" for help.
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+---------------------------------------------------------------
pid                   | 2913
status                | streaming
receive_start_lsn     | 28/1D000000
receive_start_tli     | 11
received_lsn          | 28/5A501748
received_tli          | 11
last_msg_send_time    | 2021-05-27 13:23:52.2518-07
last_msg_receipt_time | 2021-05-27 13:23:55.041877-07
latest_end_lsn        | 28/5A501748
latest_end_time       | 2021-05-27 13:23:52.2518-07
slot_name             |
conninfo              | user=repadmin passfile=/var/lib/pgsql/.pgpass connect_timeout=2 dbname=replication host=pp1 port=5432 application_name=walreceiver fallback_application_name=walreceiver sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any
postgres=# \q
[postgres@pp2 ~]$
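In addition to 'repmgr node status', replication lag can be read directly on the primary in bytes; pg_current_wal_lsn() and pg_wal_lsn_diff() are standard PostgreSQL 10 functions:
postgres=# SELECT application_name, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes FROM pg_stat_replication;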
DB: Shadow Post Processing DB Recovery
- Streaming Replication Status (shadow node pp1 is detached):
[postgres@pp2 ~]$ repmgr cluster show
WARNING: node "pp1" not found in "pg_stat_replication"
 ID | Name        | Role    | Status    | Upstream      | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+---------------+----------+----------+----------+---------------------------------------------------------
 1  | pp1         | standby |   running | ! walreceiver | default  | 100      | 12       | host=pp1 user=repadmin dbname=repmgr connect_timeout=2
 2  | walreceiver | primary | * running |               | default  | 100      | 12       | host=pp2 user=repadmin dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "pp1" (ID: 1) is not attached to its upstream node "walreceiver" (ID: 2)
[postgres@pp2 ~]$
- If WAL segments have not yet been recycled:
==> Run the following on the shadow PP system (e.g. pp1):
[postgres@pp1 ~]$ repmgr node rejoin --force-rewind --verbose -d 'host=pp2 user=repadmin dbname=repmgr connect_timeout=2'
- If WAL segments have been recycled:
- If errors are present in the log, or pg_rewind fails when running 'repmgr node rejoin', e.g.:
2021-07-01 14:46:12.115 PDT [2491] ERROR: requested WAL segment 0000000C0000003A000000AA has already been removed
First, on the shadow PP DB, move $PGDATA out of the way (and back it up if you want to). Then run:
/usr/pgsql-10/bin/pg_basebackup -h source.host.com -R -D $PGDATA -U repadmin -v -P
The "-R" option generates a recovery file during the copy, so when you start up this instance it will be a streaming standby kept in sync with source.host.com.
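For reference, the recovery.conf written by 'pg_basebackup -R' on PostgreSQL 10 looks roughly like the following; the exact primary_conninfo string will differ (compare the conninfo shown by pg_stat_wal_receiver above):
standby_mode = 'on'
primary_conninfo = 'user=repadmin passfile=''/var/lib/pgsql/.pgpass'' host=pp2 port=5432'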
- Recovery example:
[postgres@pp1 ~]$ sudo systemctl stop postgresql-10
[postgres@pp1 ~]$ pwd
/var/lib/pgsql
[postgres@pp1 ~]$ mv 10 10.0
[postgres@pp1 ~]$ /usr/pgsql-10/bin/pg_basebackup -h pp2 -R -D $PGDATA -U repadmin -v -P
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 6D/4F1BE930 on timeline 12
pg_basebackup: starting background WAL receiver
5474940/5474940 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 6D/83E8EE50
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
[postgres@pp1 ~]$
[postgres@pp1 ~]$ cd 10/data
[postgres@pp1 ~]$ cp ../../10.0/data/recovery.conf .
[postgres@pp1 ~]$ cp ../../10.0/data/recovery.done .
[postgres@pp1 ~]$ cp ../../10.0/data/pg_hba.conf .
[postgres@pp1 ~]$ sudo systemctl start postgresql-10
[postgres@pp1 ~]$ repmgr node check
INFO: connection is not a superuser connection, falling back to simple check
HINT: provide a superuser with -S/--superuser, or add the "repadmin" user to role "pg_read_all_settings" or "pg_monitor"
Node "pp1":
        Server role: OK (node is standby)
        Replication lag: OK (0 seconds)
        WAL archiving: OK (0 pending archive ready files)
        Upstream connection: OK (node "pp1" (ID: 1) is attached to expected upstream node "walreceiver" (ID: 2))
        Downstream servers: OK (this node has no downstream nodes)
        Replication slots: OK (node has no physical replication slots)
        Missing physical replication slots: OK (node has no missing physical replication slots)
        Configured data directory: OK (configured "data_directory" is "/home/pgsql/data")
[postgres@pp1 ~]$ repmgr node status
Node "pp1":
        PostgreSQL version: 10.10
        Total data size: 3746 MB
        Conninfo: host=pp1 user=repadmin dbname=repmgr connect_timeout=2
        Role: standby
        WAL archiving: off
        Archive command: (none)
        Replication connections: 0 (of maximal 3)
        Replication slots: 0 physical (of maximal 80; 0 missing)
        Upstream node: walreceiver (ID: 2)
        Replication lag: 0 seconds
        Last received LSN: 6D/869FEEA8
        Last replayed LSN: 6D/869FEEA8
[postgres@pp1 ~]$ repmgr cluster show
 ID | Name        | Role    | Status    | Upstream    | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+-------------+----------+----------+----------+---------------------------------------------------------
 1  | pp1         | standby |   running | walreceiver | default  | 100      | 12       | host=pp1 user=repadmin dbname=repmgr connect_timeout=2
 2  | walreceiver | primary | * running |             | default  | 100      | 12       | host=pp2 user=repadmin dbname=repmgr connect_timeout=2
[postgres@pp1 ~]$

[postgres@pp2 ~]$ repmgr node check
INFO: connection is not a superuser connection, falling back to simple check
HINT: provide a superuser with -S/--superuser, or add the "repadmin" user to role "pg_read_all_settings" or "pg_monitor"
Node "walreceiver":
        Server role: OK (node is primary)
        Replication lag: OK (N/A - node is primary)
        WAL archiving: OK (0 pending archive ready files)
        Upstream connection: OK (N/A - node is primary)
        Downstream servers: OK (1 of 1 downstream nodes attached)
        Replication slots: OK (node has no physical replication slots)
        Missing physical replication slots: OK (node has no missing physical replication slots)
        Configured data directory: OK (configured "data_directory" is "/home/pgsql/data")
[postgres@pp2 ~]$ repmgr node status
Node "walreceiver":
        PostgreSQL version: 10.14
        Total data size: 3746 MB
        Conninfo: host=pp2 user=repadmin dbname=repmgr connect_timeout=2
        Role: primary
        WAL archiving: off
        Archive command: (none)
        Replication connections: 1 (of maximal 3)
        Replication slots: 0 physical (of maximal 80; 0 missing)
        Replication lag: n/a
[postgres@pp2 ~]$ repmgr cluster show
 ID | Name        | Role    | Status    | Upstream    | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+-------------+----------+----------+----------+---------------------------------------------------------
 1  | pp1         | standby |   running | walreceiver | default  | 100      | 12       | host=pp1 user=repadmin dbname=repmgr connect_timeout=2
 2  | walreceiver | primary | * running |             | default  | 100      | 12       | host=pp2 user=repadmin dbname=repmgr connect_timeout=2
[postgres@pp2 ~]$
DB: Streaming Replication Status
- Node Status:
[postgres@pp2 ~]$ repmgr node check
WARNING: node "pp1" not found in "pg_stat_replication"
INFO: connection is not a superuser connection, falling back to simple check
HINT: provide a superuser with -S/--superuser, or add the "repadmin" user to role "pg_read_all_settings" or "pg_monitor"
Node "walreceiver":
        Server role: OK (node is primary)
        Replication lag: OK (N/A - node is primary)
        WAL archiving: OK (0 pending archive ready files)
        Upstream connection: OK (N/A - node is primary)
        Downstream servers: CRITICAL (1 of 1 downstream nodes not attached; missing: pp1 (ID: 1))
        Replication slots: OK (node has no physical replication slots)
        Missing physical replication slots: OK (node has no missing physical replication slots)
        Configured data directory: OK (configured "data_directory" is "/home/pgsql/data")
[postgres@pp2 ~]$
[postgres@pp2 ~]$ repmgr node status
WARNING: node "pp1" not found in "pg_stat_replication"
Node "walreceiver":
        PostgreSQL version: 10.14
        Total data size: 3683 MB
        Conninfo: host=pp2 user=repadmin dbname=repmgr connect_timeout=2
        Role: primary
        WAL archiving: off
        Archive command: (none)
        Replication connections: 0 (of maximal 3)
        Replication slots: 0 physical (of maximal 80; 0 missing)
        Replication lag: n/a
WARNING: following issue(s) were detected:
  - 1 of 1 downstream nodes not attached:
    - pp1 (ID: 1)
HINT: execute "repmgr node check" for more details
[postgres@pp2 ~]$
[postgres@pp1 ~]$ repmgr node check
WARNING: node "pp1" not found in "pg_stat_replication"
INFO: connection is not a superuser connection, falling back to simple check
HINT: provide a superuser with -S/--superuser, or add the "repadmin" user to role "pg_read_all_settings" or "pg_monitor"
Node "pp1":
        Server role: OK (node is standby)
        Replication lag: CRITICAL (3678118 seconds, critical threshold: 600)
        WAL archiving: OK (0 pending archive ready files)
        Upstream connection: CRITICAL (node "pp1" (ID: 1) is not attached to expected upstream node "walreceiver" (ID: 2))
        Downstream servers: OK (this node has no downstream nodes)
        Replication slots: OK (node has no physical replication slots)
        Missing physical replication slots: OK (node has no missing physical replication slots)
        Configured data directory: OK (configured "data_directory" is "/home/pgsql/data")
[postgres@pp1 ~]$
[postgres@pp1 ~]$ repmgr node status
Node "pp1":
        PostgreSQL version: 10.10
        Total data size: 2639 MB
        Conninfo: host=pp1 user=repadmin dbname=repmgr connect_timeout=2
        Role: standby
        WAL archiving: off
        Archive command: (none)
        Replication connections: 0 (of maximal 3)
        Replication slots: 0 physical (of maximal 80; 0 missing)
        Upstream node: walreceiver (ID: 2)
        Replication lag: 3678160 seconds
        Last received LSN: 3A/AA000000
        Last replayed LSN: 3A/AA19ECF8
[postgres@pp1 ~]$
- Cluster Status:
[postgres@pp2 ~]$ repmgr cluster show
WARNING: node "pp1" not found in "pg_stat_replication"
 ID | Name        | Role    | Status    | Upstream      | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+---------------+----------+----------+----------+---------------------------------------------------------
 1  | pp1         | standby |   running | ! walreceiver | default  | 100      | 12       | host=pp1 user=repadmin dbname=repmgr connect_timeout=2
 2  | walreceiver | primary | * running |               | default  | 100      | 12       | host=pp2 user=repadmin dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "pp1" (ID: 1) is not attached to its upstream node "walreceiver" (ID: 2)

[postgres@pp2 ~]$ repmgr cluster event
 Node ID | Name        | Event              | OK | Timestamp           | Details
---------+-------------+--------------------+----+---------------------+----------------------------------------------------------------------------------------
 1       | pp1         | node_rejoin        | t  | 2021-07-01 13:49:18 | node 1 is now attached to node 2
 2       | walreceiver | standby_switchover | t  | 2021-06-03 13:32:54 | node "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby
 2       | walreceiver | standby_promote    | t  | 2021-06-03 13:32:50 | server "walreceiver" (ID: 2) was successfully promoted to primary
 1       | pp1         | standby_switchover | t  | 2021-04-27 00:56:54 | node "pp1" (ID: 1) promoted to primary, node "walreceiver" (ID: 2) demoted to standby
 1       | pp1         | standby_promote    | t  | 2021-04-27 00:56:52 | server "pp1" (ID: 1) was successfully promoted to primary
 2       | walreceiver | standby_switchover | t  | 2021-04-27 00:53:00 | node "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby
 2       | walreceiver | standby_promote    | t  | 2021-04-27 00:52:59 | server "walreceiver" (ID: 2) was successfully promoted to primary
 1       | pp1         | standby_switchover | t  | 2021-04-27 00:51:10 | node "pp1" (ID: 1) promoted to primary, node "walreceiver" (ID: 2) demoted to standby
 1       | pp1         | standby_promote    | t  | 2021-04-27 00:51:08 | server "pp1" (ID: 1) was successfully promoted to primary
 2       | walreceiver | standby_switchover | t  | 2021-04-27 00:48:40 | node "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby
 2       | walreceiver | standby_promote    | t  | 2021-04-27 00:48:40 | server "walreceiver" (ID: 2) was successfully promoted to primary
 1       | pp1         | standby_switchover | t  | 2021-04-27 00:08:55 | node "pp1" (ID: 1) promoted to primary, node "walreceiver" (ID: 2) demoted to standby
 1       | pp1         | standby_promote    | t  | 2021-04-27 00:08:53 | server "pp1" (ID: 1) was successfully promoted to primary
 2       | walreceiver | standby_switchover | t  | 2021-04-26 23:54:35 | node "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby
 2       | walreceiver | standby_promote    | t  | 2021-04-26 23:54:35 | server "walreceiver" (ID: 2) was successfully promoted to primary
 1       | pp1         | standby_switchover | t  | 2021-04-26 23:46:28 | node "pp1" (ID: 1) promoted to primary, node "walreceiver" (ID: 2) demoted to standby
 1       | pp1         | standby_promote    | t  | 2021-04-26 23:46:26 | server "pp1" (ID: 1) was successfully promoted to primary
 2       | walreceiver | standby_switchover | t  | 2021-04-26 23:34:39 | node "walreceiver" (ID: 2) promoted to primary, node "pp1" (ID: 1) demoted to standby
 2       | walreceiver | standby_promote    | t  | 2021-04-26 23:34:38 | server "walreceiver" (ID: 2) was successfully promoted to primary
 1       | pp1         | standby_switchover | f  | 2021-04-26 20:52:10 | node "pp1" (ID: 1) promoted to primary, but demote node "walreceiver" (ID: 2) did not connect to the new primary
[postgres@pp2 ~]$
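A detached standby can go unnoticed for weeks (the 3678160-second lag above is about 42 days), so a small cron job on each node helps surface the problem early. A hedged sketch, assuming repmgr exits non-zero when any check is not OK (verify for the installed repmgr version) and with a placeholder alert address:
#!/bin/sh
# Run from cron as the postgres user, e.g. every 15 minutes.
OUT=$(repmgr node check 2>&1)
if [ $? -ne 0 ]; then
    echo "$OUT" | mail -s "repmgr node check failed on $(hostname)" oncall@example.com
fi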
DB: DDL Updates
===== If rt1/pp1 are primary =====
0.- Use the 'postgres' DB user for all queries.
1.- Alter the subscription on pp1:
    \c archdb1
    SELECT pglogical.alter_subscription_add_replication_set('rt1subscr', 'ddl_sql');
2.- Create the subscription on rt2:
    \c rtdb2
    SELECT pglogical.create_subscription(
        subscription_name := 'rtsubscrddl',
        provider_dsn := 'host=rt1 port=5432 user=repadmin password=repadmin_pass_ucb dbname=rtdb1',
        replication_sets := ARRAY['ddl_sql'],
        synchronize_structure := 'FALSE',
        synchronize_data := FALSE
    );
3.- Apply the schema change on rt1:
    \c rtdb1
    SELECT pglogical.replicate_ddl_command('alter table trinetdb.swarm_state add testfield bigint');
4.- Alter the subscription on pp1:
    \c archdb1
    SELECT pglogical.alter_subscription_remove_replication_set('rt1subscr', 'ddl_sql');
5.- Drop the subscription on rt2:
    \c rtdb2
    SELECT pglogical.drop_subscription(subscription_name := 'rtsubscrddl');

===== If rt2/pp2 are primary =====
0.- Use the 'postgres' DB user for all queries.
1.- Alter the subscription on pp2:
    \c archdb1
    SELECT pglogical.alter_subscription_add_replication_set('rt2subscr', 'ddl_sql');
2.- Create the subscription on rt1:
    \c rtdb1
    SELECT pglogical.create_subscription(
        subscription_name := 'rtsubscrddl',
        provider_dsn := 'host=rt2 port=5432 user=repadmin password=repadmin_pass_wr dbname=rtdb2',
        replication_sets := ARRAY['ddl_sql'],
        synchronize_structure := 'FALSE',
        synchronize_data := FALSE
    );
3.- Apply the schema change on rt2:
    \c rtdb2
    SELECT pglogical.replicate_ddl_command('alter table trinetdb.swarm_state add testfield bigint');
4.- Alter the subscription on pp2:
    \c archdb1
    SELECT pglogical.alter_subscription_remove_replication_set('rt2subscr', 'ddl_sql');
5.- Drop the subscription on rt1:
    \c rtdb1
    SELECT pglogical.drop_subscription(subscription_name := 'rtsubscrddl');
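At any point during this procedure, the subscription state can be checked on the subscriber side; pglogical.show_subscription_status() is provided by the pglogical extension:
rtdb2=# SELECT subscription_name, status FROM pglogical.show_subscription_status();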
DB: Adding New Tables
In this example, the PDL_Ids and PDL_Product tables are added.

1.- Create tables and grant permissions on rtdb1, rtdb2 and archdb1 (active PP DB):

[postgres@rt1 PDL_Tables]$ more *.sql
::::::::::::::
create_PDL_IDS.sql
::::::::::::::
CREATE TABLE PDL_IDS
(
  DBSETID  BIGINT NOT NULL,
  TNAME    VARCHAR(30) NOT NULL,
  ID       BIGINT NOT NULL,
  LDDATE   TIMESTAMP DEFAULT (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'),
  CONSTRAINT PDL_IDS_PK PRIMARY KEY (DBSETID, TNAME)
);
::::::::::::::
create_PDL_PRODUCT.sql
::::::::::::::
CREATE TABLE PDL_PRODUCT
(
  EVID     BIGINT NOT NULL,
  PRODTYPE VARCHAR(30) NOT NULL,
  DEST     VARCHAR(30) NOT NULL,
  SOURCE   VARCHAR(8) NOT NULL,
  TIME     TIMESTAMP NOT NULL,
  ACTION   VARCHAR(2) NOT NULL,
  DBSETID  BIGINT,
  LDDATE   TIMESTAMP DEFAULT (CURRENT_TIMESTAMP AT TIME ZONE 'UTC'),
  CONSTRAINT PDL_PRO_PK PRIMARY KEY (EVID, PRODTYPE, ACTION, DEST, TIME, SOURCE),
  CONSTRAINT PDL_PRO_TYPE CHECK (PRODTYPE IN ('origin','phase-data','moment-tensor','focal-mechanism','nearby-cities','shakemap')),
  CONSTRAINT PDL_PRO_DEST CHECK (DEST IN ('Recent_event_PDL','ComCat_PDL','dev-ComCat_PDL','dev-Recent_event_PDL')),
  CONSTRAINT PDL_PRO_ACT CHECK (ACTION IN ('U','D','u','d'))
);
::::::::::::::
grant.sql
::::::::::::::
GRANT SELECT ON PDL_IDS TO trinetdb_read, code;
GRANT INSERT, UPDATE, DELETE ON PDL_IDS TO trinetdb_write, code;
GRANT SELECT ON PDL_PRODUCT TO trinetdb_read, code;
GRANT INSERT, UPDATE, DELETE ON PDL_PRODUCT TO trinetdb_write, code;
[postgres@rt1 PDL_Tables]$
[postgres@rt1 PDL_Tables]$ psql -U trinetdb -d rtdb1 < create_PDL_IDS.sql
CREATE TABLE
[postgres@rt1 PDL_Tables]$ psql -U trinetdb -d rtdb1 < create_PDL_PRODUCT.sql
CREATE TABLE
[postgres@rt1 PDL_Tables]$ psql -U trinetdb -d rtdb1 < grant.sql
GRANT
GRANT
GRANT
GRANT
[postgres@rt1 PDL_Tables]$

2.- Set up replication on rtdb1, rtdb2 and archdb1. In this example, RT2/PP2 are the active systems.

2.1.- Remove the RT1 replication set on PP2:
[postgres@pp2 ~]$ psql -d archdb1
psql (10.14)
archdb1=# SELECT pglogical.alter_subscription_remove_replication_set('rt1subscr', 'rtrepset');
 alter_subscription_remove_replication_set
-------------------------------------------
 t
(1 row)
archdb1=# \q
[postgres@pp2 ~]$

2.2.- Add the tables to the replication set on RT1:
[postgres@rt1 ~]$ psql -d rtdb1
psql (9.2.24, server 10.10)
Type "help" for help.
rtdb1=# SELECT pglogical.replication_set_add_table(set_name:='rtrepset', relation:='trinetdb.pdl_ids', synchronize_data:=TRUE);
 replication_set_add_table
---------------------------
 t
(1 row)
rtdb1=# SELECT pglogical.replication_set_add_table(set_name:='rtrepset', relation:='trinetdb.pdl_product', synchronize_data:=TRUE);
 replication_set_add_table
---------------------------
 t
(1 row)
rtdb1=# \q
[postgres@rt1 ~]$

2.3.- Add the RT1 replication set on PP2:
[postgres@pp2 ~]$ psql -d archdb1
psql (10.14)
Type "help" for help.
archdb1=# SELECT pglogical.alter_subscription_add_replication_set('rt1subscr', 'rtrepset');
 alter_subscription_add_replication_set
----------------------------------------
 t
(1 row)
archdb1=# \q
[postgres@pp2 ~]$

2.4.- Do an active/standby role switch (RT1/PP1 are now active).

2.5.- Remove the RT2 replication set on PP1:
[postgres@pp1 ~]$ psql -d archdb1
psql (10.14)
Type "help" for help.
archdb1=# SELECT pglogical.alter_subscription_remove_replication_set('rt2subscr', 'rtrepset');
 alter_subscription_remove_replication_set
-------------------------------------------
 t
(1 row)
archdb1=# \q
[postgres@pp1 ~]$

2.6.- Add the tables to the replication set on RT2:
[postgres@rt2 ~]$ psql -d rtdb2
psql (10.16)
Type "help" for help.
rtdb2=# SELECT pglogical.replication_set_add_table(set_name:='rtrepset', relation:='trinetdb.pdl_ids', synchronize_data:=TRUE);
 replication_set_add_table
---------------------------
 t
(1 row)
rtdb2=# SELECT pglogical.replication_set_add_table(set_name:='rtrepset', relation:='trinetdb.pdl_product', synchronize_data:=TRUE);
 replication_set_add_table
---------------------------
 t
(1 row)
rtdb2=# \q
[postgres@rt2 ~]$

2.7.- Add the RT2 replication set on PP1:
[postgres@pp1 ~]$ psql -d archdb1
psql (10.14)
Type "help" for help.
archdb1=# SELECT pglogical.alter_subscription_add_replication_set('rt2subscr', 'rtrepset');
 alter_subscription_add_replication_set
----------------------------------------
 t
(1 row)
archdb1=# \q
[postgres@pp1 ~]$
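To confirm which tables ended up in a replication set, the pglogical catalog on the provider can be queried; the catalog names below are from pglogical 2.x, verify for the installed version:
rtdb1=# SELECT s.set_name, t.set_reloid FROM pglogical.replication_set s JOIN pglogical.replication_set_table t USING (set_id);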