/data/dc5/reporting.NCEDC/archive_size/    (dcmgr on strike)

This directory contains the data and programs used to determine the size
of the NCEDC archive.  Information is aggregated by year and month.

1.  Configuration info:

    conf/gps.list
        List of directories that store GPS data to be scanned.
        Programs in the bin directory use this file to determine
        where to look for GPS data.

    conf/nets.continuous.list
        List of networks for which we archive continuous MSEED data.
        Programs in the bin directory use this file to determine which
        networks to scan the filesystem for when computing the
        continuous MSEED data sizes.

2.  Monthly data files created by the programs in the bin directory
    consist of:
    a.  gzipped files created by the "find" filesystem scan.
    b.  csv files that summarize the info from the "find" files or
        SQL queries.

    data/cont_mseed/year/year.month.csv
        data:   year,month,nbytes
    data/gps/year/gps.year.month.csv
        data:   year,month,nbytes
    data/event_mseed_egs/year.month.egs.csv
        hdr:    YEAR,MONTH,EVIDCOUNT,WAVEFORMBYTES,WVIDCOUNT,SNCLEVIDCOUNT
        data:   year,month,evidcount,nbytes,wvidcount,snclevidcount
    data/event_mseed_ncss/year.month.ncss.csv
        hdr:    YEAR,MONTH,EVIDCOUNT,WAVEFORMBYTES,WVIDCOUNT,SNCLEVIDCOUNT
        data:   year,month,evidcount,nbytes,wvidcount,snclevidcount

------------------------------------------------------------------------------

Programs for getting information on the NCEDC archive size:

==============================================================================

To use:

1.  Edit ../setup.csh to update the years for the various scans and
    computations.
2.  Source ../setup.csh
3.  Run
        gen_sql_event_mseed_all
    to run the 2 scripts that generate the SQL query files that query
    the database to get size info for the NCSS and EGS event miniSEED
    files.  Each script writes its SQL queries to a single file in the
    ../sql directory.
4.  Run
        compute_size_event_mseed_all
    to run yasql using the 2 SQL query files.
    You will need the EGS_RO database password and the NETDC database
    password.
5.  Run
        get_cont_mseed_data_all
        get_gps_data_all
    to scan the NCEDC filesystem to get archive size info for the
    continuous mseed and gps data directories.
    Scanned info is saved in monthly files in the ../data directory.
6.  Run
        compute_size_cont_mseed_all
        compute_size_gps_all
    to compute the various archive sizes based on the scanned files or
    the SQL queries to the database.
    Output info is saved in monthly csv files in the ../data directory.
7.  Run
        run_merge_monthly_csv
    to merge ALL of the monthly csv files into a single csv file that
    can be imported into the ncedc_archive_size.xls spreadsheet.
    Output file is saved in the ../results directory.

==============================================================================

Programs in this directory:

1.  Programs to generate SQL requests (put in ../sql).
        gen_sql_event_mseed_all
        gen_sql_event_mseed_egs
        gen_sql_event_mseed_ncss
2.  Programs to scan archive file system for all years.
        get_cont_mseed_data_all
        get_gps_data_all
3.  Programs to scan archive file system for a specific year
    (run by the _all programs above).
        get_cont_mseed_data
        get_gps_data
4.  Programs that use the scanned file system data and SQL requests
    to compute the size of the NCEDC archive.
        compute_size_cont_mseed_all
        compute_size_event_mseed_all
        compute_size_gps_all
5.  Programs used by the above _all programs.
        compute_size_cont_mseed
        compute_size_gps
6.  Program to merge the monthly csv files.
        merge_monthly_csv
7.  Support programs.
        compute_size_net_sta
        sum_reduce

------------------------------------------------------------------------------

ncedc_data_archive.xls
Updated: 2015/12/11

This spreadsheet contains multiple sheets.
1.  sheet 1: NCEDC_Data - data COPIED from the most recent merged.csv.
    See instructions below.
2.  sheet 2: NCEDC_Summary - Summary info using formulas that reference
    data from the NCEDC_Data sheet.
3.  sheet 3: NCEDC_Summary_Year - Summary info using formulas that
    reference data from the NCEDC_Summary sheet.

To update data in this spreadsheet:

0.  MAKE A BACKUP COPY OF ncedc_data_archive.xls.
1.  Run merge_monthly_csv to create a csv file with all NCEDC archive
    size data.
        bin/merge_monthly_csv > results/merged.csv
2.  Open the ncedc_data_archive.xls spreadsheet (e.g. with soffice).
3.  Import the new csv file into a NEW SHEET in the spreadsheet.
4.  Select ALL of the data in the NEW SHEET, copy it, and paste it into
    the NCEDC_Data sheet.

You have to copy and paste because fields in the other NCEDC_* sheets
reference fields in the NCEDC_Data sheet.

When you want to add a new year to the spreadsheet, you will have to
CAREFULLY add rows to the NCEDC_Summary* sheets and make sure that they
reference the appropriate fields in the NCEDC_Data sheet and the
NCEDC_Summary sheet.
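
Before merging and importing, it can help to sanity-check a monthly csv
file by totalling its bytes per year.  The following is a minimal
sketch, not one of the programs in the bin directory.  It assumes the
year,month,nbytes layout documented above for the cont_mseed and gps
csv files; the function name and the sample values are made up for
illustration.  The layout of merged.csv is not documented here, so the
sketch is not applied to it.

```python
import csv
import io
from collections import defaultdict


def total_bytes_by_year(csv_text):
    """Sum the nbytes column per year for rows shaped year,month,nbytes."""
    totals = defaultdict(int)
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines, short rows, and any header row (non-numeric year).
        if len(row) < 3 or not row[0].strip().isdigit():
            continue
        year, _month, nbytes = row[0], row[1], row[2]
        totals[int(year)] += int(nbytes)
    return dict(totals)


# Made-up sample rows in the assumed year,month,nbytes layout.
sample = "2021,11,100\n2021,12,250\n2022,01,75\n"
print(total_bytes_by_year(sample))  # {2021: 350, 2022: 75}
```

The per-year totals can then be compared by eye against the values
that show up in the summary sheets after the import.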

NCEDC Archive as of: Thu Mar 31 23:55:01 PDT 2022

Total archive:
Filesystem         Size  Used Avail Use% Mounted on
strike:/sam/ncedc  250T  169T   82T  68% /data/ncedc

Continuous MiniSEED data for current year:
454G    BG/2022
629G    BK/2022
84G     BP/2022
688M    CC/2022
156G    CE/2022
312G    CI/2022
5.7G    GM/2022
1.6G    GS/2022
576G    NC/2022
110G    NN/2022
181G    NP/2022
41G     PB/2022
19G     PG/2022
20G     SB/2022
7.0G    SF/2022
58G     UO/2022
32G     UW/2022
62G     WR/2022
2.7T    total

GPS data for current year:
151G    gps/highrate/raw/2022
56G     gps/highrate/rinex/2022
5.1G    gps/rt/BK/2022
13G     gps/rt/CI/2022
4.0K    gps/rt/events/2022
18G     gps/rt/NC/2022
45G     gps/rt/PB/2022
88G     gps/rt/PW/2022
373G    total

Total Continuous MiniSEED data:
17T     BG
28T     BK
7.3T    BP
5.0G    CC
2.2T    CE
12T     CI
28K     db
91G     GM
78G     GS
44T     NC
4.8T    NN
9.7T    NP
6.6T    PB
2.6T    PG
325G    SB
5.3T    SF
906G    TA
7.7G    UL
219G    UO
2.8G    US
108G    UW
3.6T    WR
143T    total

Total Event data:
0       events/active
0       events/active22
1.1T    events/EGSEVT
3.0T    events/NCEVT
31G     events/SFEVT
4.1T    total

Total GPS data:
11T     gps

Total Misc data sets:
3.5T    misc

Continuous data daily rate:
5.1G    BG/2022/2022.075
7.1G    BK/2022/2022.075
981M    BP/2022/2022.075
13M     CC/2022/2022.075
1.8G    CE/2022/2022.075
3.6G    CI/2022/2022.075
65M     GM/2022/2022.075
18M     GS/2022/2022.075
6.5G    NC/2022/2022.075
1.3G    NN/2022/2022.075
2.1G    NP/2022/2022.075
491M    PB/2022/2022.075
236M    PG/2022/2022.075
230M    SB/2022/2022.075
637M    UO/2022/2022.075
364M    UW/2022/2022.075
734M    WR/2022/2022.075
31G     total

GPS data daily rate:
1.7G    raw/2022/2022.075
624M    rinex/2022/2022.075
2.3G    total

21M     BK/2022/2022.075
150M    CI/2022/2022.075
596M    NC/2022/2022.075
540M    PB/2022/2022.075
1.1G    PW/2022/2022.075
2.3G    total
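
The "total" lines in the report above can be cross-checked by adding up
the individual entries.  The following is a minimal sketch, assuming
the sizes use 1024-based K/M/G/T suffixes (as "du -h" and "df -h"
print them); the helper names are hypothetical and not part of the
programs in the bin directory.

```python
# 1024-based multipliers; assumed to match the suffixes used in the report.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '454G', '4.0K' or '0' to a number of bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)  # plain number with no suffix, e.g. '0'


def sum_sizes(sizes):
    """Add up a list of human-readable sizes and return the total in bytes."""
    return sum(parse_size(s) for s in sizes)


# "GPS data for current year" entries, copied from the report above.
gps_2022 = ["151G", "56G", "5.1G", "13G", "4.0K", "18G", "45G", "88G"]
total_gib = sum_sizes(gps_2022) / UNITS["G"]
print(f"{total_gib:.1f}G")
```

Each entry in the report is already rounded, so the sum should be
expected to agree with the printed total (373G here) only to within a
few percent, not exactly.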