NCEDC Archive Size Estimation

Detailed Estimate

/data/dc5/reporting.NCEDC/archive_size/ (user dcmgr on host strike)

This directory contains the data and programs used to determine
the size of the NCEDC archive.  Information is aggregated by 
year and month.

1.  Configuration info:

	List of directories that store GPS data to be scanned.
	This file is used by programs in the bin directory to determine
	where to look for GPS data.

	List of networks for which we archive continuous MSEED data.
	This file is used by programs in the bin directory to determine
	which networks to scan the filesystem for when computing
	the continuous MSEED data sizes.

2.  Monthly data files created by the programs in the bin directory contain:
a.  gzipped files created by "find" filesystem scan.
b.  csv files that summarize the info from the "find" files or SQL queries.

	data:	year,month,nbytes

	data:	year,month,nbytes

	data:	year,month,evidcount,nbytes,wvidcount,snclevidcount

	data:	year,month,evidcount,nbytes,wvidcount,snclevidcount
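
As an illustration of how the year,month,nbytes csv files could be consumed,
here is a minimal Python sketch that sums nbytes per year.  It assumes each
row is a plain "year,month,nbytes" line with no prefix; the function name
yearly_totals and the sample values are hypothetical and not part of the
archive_size programs.

```python
import csv
import io
from collections import defaultdict


def yearly_totals(csv_text):
    """Sum nbytes per year from rows of the form year,month,nbytes."""
    totals = defaultdict(int)
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and any header row (assumed to be non-numeric).
        if len(row) < 3 or not row[0].strip().isdigit():
            continue
        year, nbytes = int(row[0]), int(row[2])
        totals[year] += nbytes
    return dict(totals)


# Hypothetical sample rows, not real archive figures.
sample = "2021,11,1000\n2021,12,2500\n2022,1,4000\n"
print(yearly_totals(sample))  # {2021: 3500, 2022: 4000}
```

The event csv files carry extra columns (evidcount, wvidcount, snclevidcount),
so a reader for those would need to pick nbytes by its column position instead.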


Programs for getting information on the NCEDC archive size:


To use:

1.  Edit
    to update the years for the various scans and computations.

2.  Source ../setup.csh

3.  Run
    to run the 2 scripts that generate the SQL query files that
    query the database to get size info for the NCSS and EGS event miniSEED
    files.  The output of each SQL query file is written to a single file
    in the ../sql directory.

4.  Run
    to run yasql using the 2 SQL query files.  You will need the
    EGS_RO database password and the NETDC database password.

5.  Run
    to scan the NCEDC filesystem to get archive size info for the continuous MSEED
    and GPS data directories.  Scanned info is saved in monthly files in the
    ../data directory.

6.  Run
    to compute the various archive sizes based on the scanned files or the
    SQL queries to the database.  Output info is saved in monthly csv files
    in the ../data directory.

7.  Run
    to merge ALL of the monthly csv files into a single csv file that can be
    imported into the ncedc_archive_size.xls spreadsheet.  Output file
    is saved in the ../results directory.
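
The step above that computes sizes from the scanned files can be pictured
with the following Python sketch.  It is not the actual program.  It assumes
the scan lists one file per line as "<size in bytes> <path>" (for example
from GNU find with -printf '%s %p\n') and that paths contain a day directory
named <YYYY>.<DDD>, as in BK/2022/2022.075.  The function name monthly_bytes
and the sample file names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date, timedelta

# Assumed layout: .../<NET>/<YYYY>/<YYYY>.<DDD>/<file>
DAY_DIR = re.compile(r"/(\d{4})\.(\d{3})(?:/|$)")


def monthly_bytes(scan_lines):
    """Aggregate file sizes by (year, month) using the day-of-year directory."""
    totals = defaultdict(int)
    for line in scan_lines:
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # not a "<size> <path>" line
        size, path = int(parts[0]), parts[1].strip()
        match = DAY_DIR.search(path)
        if not match:
            continue  # no <YYYY>.<DDD> directory in this path
        year, doy = int(match.group(1)), int(match.group(2))
        if doy < 1:
            continue
        day = date(year, 1, 1) + timedelta(days=doy - 1)
        if day.year != year:
            continue  # day-of-year out of range for this year
        totals[(day.year, day.month)] += size
    return dict(totals)


# Hypothetical scan lines, not real archive contents.
lines = [
    "1000 BK/2022/2022.075/FARB.BK.HHZ.00.D.2022.075",
    "2000 BK/2022/2022.031/FARB.BK.HHZ.00.D.2022.031",
    "500 BK/2022/2022.032/FARB.BK.HHZ.00.D.2022.032",
]
print(monthly_bytes(lines))  # {(2022, 3): 1000, (2022, 1): 2000, (2022, 2): 500}
```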


Programs in this directory:

1.  Programs to generate SQL requests (put in ../sql)

2.  Programs to scan archive file system for all years.

3.  Programs to scan archive file system for specific year
    (run by the _all programs above).

4.  Programs that use the scanned file system data and
    sql requests to compute the size of the NCEDC archive.

5.  Programs used by the above _all programs.

6.  Program

7.  Support programs.

	Updated:  2015/12/11

	This spreadsheet contains multiple sheets.
1.	sheet 1:	NCEDC_Data		- COPIED data from most recent merged.csv
						  See instructions below.
2.	sheet 2:	NCEDC_Summary		- Summary info using formulas that reference
						  data from NCEDC_Data.
3.	sheet 3:	NCEDC_Summary_Year	- Summary info using formulas that reference
						  data from NCEDC_Summary sheet.

To update data in this spreadsheet:

0.  MAKE A BACKUP COPY OF THE ncedc_data_archive.xls.

1.  Run merge_monthly_csv to create a csv file with all ncedc archive size data.
	bin/merge_monthly_csv > results/merged.csv

2.  Open the ncedc_data_archive.xls spreadsheet (e.g., with soffice).

3.  Import the new csv file into a NEW SHEET in the spreadsheet.

4.  Select ALL of the data in the NEW SHEET, copy it, and paste it into
	the NCEDC_Data sheet.  You have to copy and paste, rather than replace
	the sheet, because fields in the other NCEDC_* sheets reference fields
	in the NCEDC_Data sheet.

When you want to add a new year to the spreadsheet, you will have
to CAREFULLY add rows to the NCEDC_Summary* sheets and make sure that
they reference the appropriate fields in the NCEDC_Data sheet and the
NCEDC_Summary sheet.

Summary Estimate

  • On dcmgr@transform a cron job runs at the end of each month and generates the following information in /home/dcmgr/misc/NCEDC.output:
NCEDC Archive as of:
Thu Mar 31 23:55:01 PDT 2022

Total archive:
Filesystem         Size  Used Avail Use% Mounted on
strike:/sam/ncedc  250T  169T   82T  68% /data/ncedc

Continuous MiniSEED data for current year:
454G	BG/2022
629G	BK/2022
84G	BP/2022
688M	CC/2022
156G	CE/2022
312G	CI/2022
5.7G	GM/2022
1.6G	GS/2022
576G	NC/2022
110G	NN/2022
181G	NP/2022
41G	PB/2022
19G	PG/2022
20G	SB/2022
7.0G	SF/2022
58G	UO/2022
32G	UW/2022
62G	WR/2022
2.7T	total

GPS data for current year:
151G	gps/highrate/raw/2022
56G	gps/highrate/rinex/2022
5.1G	gps/rt/BK/2022
13G	gps/rt/CI/2022
4.0K	gps/rt/events/2022
18G	gps/rt/NC/2022
45G	gps/rt/PB/2022
88G	gps/rt/PW/2022
373G	total

Total Continuous MiniSEED data:
17T	BG
28T	BK
7.3T	BP
5.0G	CC
2.2T	CE
12T	CI
28K	db
91G	GM
78G	GS
44T	NC
4.8T	NN
9.7T	NP
6.6T	PB
2.6T	PG
325G	SB
5.3T	SF
906G	TA
7.7G	UL
219G	UO
2.8G	US
108G	UW
3.6T	WR
143T	total

Total Event data:
0	events/active
0	events/active22
1.1T	events/EGSEVT
3.0T	events/NCEVT
31G	events/SFEVT
4.1T	total

Total GPS data:
11T	gps

Total Misc data sets:
3.5T	misc

Continuous data daily rate:
5.1G	BG/2022/2022.075
7.1G	BK/2022/2022.075
981M	BP/2022/2022.075
13M	CC/2022/2022.075
1.8G	CE/2022/2022.075
3.6G	CI/2022/2022.075
65M	GM/2022/2022.075
18M	GS/2022/2022.075
6.5G	NC/2022/2022.075
1.3G	NN/2022/2022.075
2.1G	NP/2022/2022.075
491M	PB/2022/2022.075
236M	PG/2022/2022.075
230M	SB/2022/2022.075
637M	UO/2022/2022.075
364M	UW/2022/2022.075
734M	WR/2022/2022.075
31G	total

GPS data daily rate:
1.7G	raw/2022/2022.075
624M	rinex/2022/2022.075
2.3G	total
21M	BK/2022/2022.075
150M	CI/2022/2022.075
596M	NC/2022/2022.075
540M	PB/2022/2022.075
1.1G	PW/2022/2022.075
2.3G	total
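
The daily rates above can be turned into a rough yearly growth figure.  The
Python sketch below is only an illustration.  It assumes the sizes are
du -h style values with 1024-based K/M/G/T suffixes and that the single
sampled day (2022.075) is representative of the whole year.  The function
names parse_du_size and yearly_growth_tib are hypothetical.

```python
# 1024-based multipliers, as assumed for du -h style output.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_du_size(text):
    """Convert a size such as '31G', '624M' or '0' to a number of bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def yearly_growth_tib(daily_sizes, days=365):
    """Extrapolate a list of daily sizes to TiB per year."""
    total_bytes = sum(parse_du_size(size) for size in daily_sizes)
    return total_bytes * days / UNITS["T"]


# Continuous data only, using the 31G daily total listed above.
print(round(yearly_growth_tib(["31G"]), 1))  # 11.0
# Continuous data plus the two 2.3G GPS daily totals listed above.
print(round(yearly_growth_tib(["31G", "2.3G", "2.3G"]), 1))  # 12.7
```

Because du -h rounds each value, the result is an approximation and should
not be expected to match the detailed estimate exactly.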
operations/db_ncedc/data_volume.txt · Last modified: 2022/04/03 12:50 by stephane