Starting and Stopping AQMS for NCSS
This page describes the various means of starting and stopping AQMS systems for the NCSS. It starts at the highest level, where the OS boot and shutdown system interacts with AQMS. The discussion then proceeds to lower levels, down to ways of starting and stopping individual AQMS programs.
Init Scripts
Currently, almost all the computers on which NCSS runs AQMS programs use “init” scripts to start and stop things. This is the facility that Solaris and Linux up through Red Hat 6 provide. Newer Linux systems running Red Hat 7 offer a different facility for controlling processes during bootup: systemd.
The following tables show the init scripts and the run levels and priorities assigned to them:
Network Service Systems ucbns1, ucbns2
Script Name | Start Run Levels | Start Priority | Kill Run Levels | Kill Priority | Function |
---|---|---|---|---|---|
netmon | 2,3,4 | 91 | no automatic stopping | | starts data acquisition |
ncss | 2,3,4 | 95 | 0,1,5,6 | 05 | starts and stops AQMS |
Network Service Systems mnlons1, mnlons2
Script Name | Start Run Levels | Start Priority | Kill Run Levels | Kill Priority | Function |
---|---|---|---|---|---|
ncss | 2,3,4 | 94 | 0,1,5,6 | 05 | starts and stops non-EW AQMS |
earthworm | 2,3,4 | 95 | 0,1,5,6 | 05 | starts and stops Earthworm |
RT System ubbrt
Script Name | Start Run Levels | Start Priority | Kill Run Levels | Kill Priority | Function |
---|---|---|---|---|---|
dbora | 2,3,4 | 82 | 0,1,5,6 | 10 | starts and stops Oracle DB |
cms | 2,3,4 | 89 | 0,1,5,6 | 10 | starts and stops CMS |
ncss | 2,3,4 | 95 | 0,1,5,6 | 05 | starts and stops AQMS |
RT System mnlort1
Script Name | Start Run Levels | Start Priority | Kill Run Levels | Kill Priority | Function |
---|---|---|---|---|---|
dbora | 2,3,4 | 82 | 0,1,5,6 | 10 | starts and stops Oracle DB |
cms | 2,3,4 | 89 | 0,1,5,6 | 10 | starts and stops CMS |
ncss | 2,3,4 | 95 | 0,1,5,6 | 05 | starts and stops non-EW AQMS |
earthworm | 2,3,4 | 98 | 0,1,5,6 | 02 | starts and stops Earthworm |
Post-Proc System ucbpp
Script Name | Start Run Levels | Start Priority | Kill Run Levels | Kill Priority | Function |
---|---|---|---|---|---|
cms | 2,3,4 | 89 | 0,1,5,6 | 10 | starts and stops CMS |
ncss | 2,3,4 | 95 | 0,1,5,6 | 05 | starts and stops parts of AQMS |
dcmgr | 2,3,4 | 95 | 0,1,5,6 | 05 | starts and stops dcmgr monitoring |
Post-Proc System mnlodb1
Script Name | Start Run Levels | Start Priority | Kill Run Levels | Kill Priority | Function |
---|---|---|---|---|---|
cms | 3 | 89 | no auto shutdown | | starts CMS |
dbora | 3 | 93 | 0,1,S | 10 | starts and stops Oracle DB |
ncss | 3 | 97 | 0,1,S | 05 | starts and stops parts of AQMS |
dcmgr | 3 | 98 | 0,1,S | 05 | starts and stops dcmgr monitoring |
As you can see, there is little consistency in the Start Priority values in the above tables; it is the relative order that matters. Systems that have a local Oracle database should start it before “ncss”, the main part that depends on the database. Likewise, on systems running CMS, CMS should be started before “ncss”, which depends on it. On the UCB acquisition and network service systems, “netmon” has a lower Start Priority than “ncss”, but it starts slowly enough that the AQMS part (WDA) will already be available by the time netmon starts the WDA writers.
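To make the ordering rule concrete, here is a minimal sketch of how Start Priority values translate into a start order. The `start_order` function is a hypothetical helper written for this page, not part of AQMS or of init; it only sorts "priority:script" pairs the way init processes lower numbers first.

```shell
#!/bin/bash
# Print script names in the order init would start them.
# Each argument is "priority:script"; lower priorities start first.
# start_order is a hypothetical helper for illustration only.
start_order() {
    printf '%s\n' "$@" | sort -t: -k1,1n | cut -d: -f2
}

# Values taken from the RT system tables above.
start_order 95:ncss 82:dbora 89:cms
```

With the RT system values, this lists dbora first, then cms, then ncss, which matches the dependency order described above.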
Note that many parts of the post-processing systems are started by crontab entries instead of by init scripts. On the two RT systems, the solution servers are also started by crontab entries. Each of the NCSS computers has various crontab entries for running miscellaneous support codes, not described here.
User-level Scripts
The above init scripts (except for dbora, provided by Oracle) are quite simple. Each one calls a user-level script to perform the startup (and shutdown, if applicable) work, as follows:
- ncss: calls ~ncss/run/bin/run_all for startup, ~ncss/run/bin/stop_all for shutdown.
- cms: calls ~ncss/run/cms/runAll start for startup, ~ncss/run/cms/runAll stop for shutdown.
- dcmgr: calls ~dcmgr/run/bin/run_all for startup, ~dcmgr/run/bin/stop_all for shutdown.
- netmon: calls ~ncss/config/bin/run_netmon for startup. Acquisition must be stopped manually.
- earthworm: runs startstop in background with appropriate environment and configuration file, stdout & stderr redirected to /dev/null; kills startstop on shutdown.
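As an illustration of how thin such a wrapper can be, the following is a sketch of what an init script like ncss might look like. It is not the actual NCSS script: the use of `su`, the `RUN` dry-run prefix, and the function name `ncss_init` are assumptions made for this example. Only the `run_all` and `stop_all` paths come from the list above.

```shell
#!/bin/bash
# Hypothetical sketch of a thin init wrapper; not the real NCSS init script.
NCSS_USER=ncss
NCSS_BIN='~ncss/run/bin'   # left unexpanded so the ncss login shell resolves it
RUN=echo                   # assumption: dry run by default; set RUN= to execute

ncss_init() {
    case "$1" in
        start) $RUN su - "$NCSS_USER" -c "$NCSS_BIN/run_all" ;;
        stop)  $RUN su - "$NCSS_USER" -c "$NCSS_BIN/stop_all" ;;
        *)     echo "Usage: ncss {start|stop}" >&2; return 1 ;;
    esac
}

# init passes "start" or "stop" as the first argument.
if [ $# -gt 0 ]; then
    ncss_init "$1"
fi
```

In this sketch all real work stays in the user-level scripts, so changes to what gets started never require editing files owned by root.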
The run_all scripts are pretty straightforward bash scripts. They check some environment variables and set some others that are needed by most AQMS programs. Then they run dbping to check that the configured database is available for use. dbping connects to the database and does a simple query to ensure that the database is working correctly. If the database is OK, then the run_all script calls all the scripts and programs needed to start the AQMS components needed for the particular user and host. Each run_all script is custom made for that user and host! The run_all script also starts several programs that do not depend on the Oracle database.
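The flow described above can be sketched roughly as follows. This is an assumption-laden outline, not a real run_all: it assumes dbping exits with status 0 when the database answers, and `RUN_DIR`, `start_component`, `run_all_sketch`, and the component names are all hypothetical. The order in which database-independent programs are started, and what happens when the database check fails, are also guesses.

```shell
#!/bin/bash
# Hypothetical outline of the run_all flow; every name below is illustrative.
DBPING=${DBPING:-dbping}             # assumed to exit 0 when the database is usable
export RUN_DIR=${RUN_DIR:-/tmp}      # stand-in for environment setup

# Placeholder for calling an individual run script with "start".
start_component() {
    echo "starting $1"
}

run_all_sketch() {
    # Programs that do not need Oracle are started regardless (assumed order).
    start_component non_db_program

    # Only start database-dependent programs if the database check succeeds.
    if $DBPING >/dev/null 2>&1; then
        start_component db_program
    else
        echo "database unavailable; skipping database-dependent programs" >&2
        return 1
    fi
}
```

Keeping the database check in one place means the individual run scripts do not each have to repeat it.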
For stopping AQMS programs, the stop_all script stops many of the programs previously started by run_all. Some of that work is done by searching the run_all script for commands that follow a simple pattern, making that part of stop_all generic.
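One way such a generic search could work is sketched below. The pattern used here, a line consisting of a command followed by the word `start`, is an assumption for illustration; the actual pattern that stop_all looks for is not documented on this page. The function name `stop_commands_from` and the script names in the usage note are likewise hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: derive stop commands from a run_all-style script.
# Assumed pattern: "<command> start" on its own, uncommented, unindented line.
stop_commands_from() {
    sed -n 's/^\([^#[:space:]][^[:space:]]*\)[[:space:]]\{1,\}start[[:space:]]*$/\1 stop/p' "$1"
}
```

Given a file containing the lines `run_prog_a start` and `run_prog_b start`, this prints `run_prog_a stop` and `run_prog_b stop`; commented-out lines and lines that do not follow the pattern are ignored. A real stop_all would then execute the derived commands rather than print them.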
Individual AQMS run scripts
Most AQMS programs used by NCSS have their own scripts for starting and stopping. Where possible, these run scripts have only a few lines and then call a generic script, run/bin/runguts. As the name implies, runguts has all the guts of the script. It provides the options start, stop, stopwait, and restart. Many AQMS programs, especially the ones connecting to CMS, take many seconds to exit after they have been sent the kill signal. The “stopwait” and “restart” options verify that the particular application has really exited before proceeding.
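The "stopwait" idea can be sketched as a signal followed by a polling loop. This is not the runguts code; the function name `stop_wait`, the default 60-second timeout, and the one-second polling interval are assumptions chosen for the example.

```shell
#!/bin/bash
# Hypothetical sketch of "stopwait"; not the actual runguts implementation.

# Send TERM to PID, then poll until the process is gone or TIMEOUT seconds pass.
# Returns 0 once the process has exited, 1 if it is still alive at the timeout.
stop_wait() {
    local pid=$1 timeout=${2:-60} waited=0
    kill "$pid" 2>/dev/null
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A restart built on this would start the new instance only after `stop_wait` returns 0, which avoids running two copies of a program that is slow to shut down.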
Some exceptions to the simple run scripts:
- run_monitor: the monitor script will reread its configuration file when sent the HUP signal. The run_monitor script does that when called with the restart option.
- run_pws*: the proxy wave server (pws) works by forking a new process for each client connection. The run_pws* script can only kill the initial server process, not the forked ones. Thus the script cannot safely restart pws. It is up to the user to decide when it is safe to start a new pws server after stopping one.
- run_wanc*: the NC waveform archiver wanc does not reliably exit on a kill signal. Thus the run_wanc* scripts do not provide a restart option.