====== Metrics analysis ======
=== Distribution Information (From DB) ===
* On **transform**: (as **dcmgr**)
* Edit and execute the SQL scripts **req_by_format_prog.sql**, **dist_by_format_prog.sql** & **dist_total.sql**. It will generate the files //req_by_format_prog.csv//, //dist_by_format_prog.csv// & //dist_total.csv// for all the data in the database.
[dcmgr@transform analysis]# cd /data/dc5/reporting.NCEDC/analysis
[dcmgr@transform analysis]# vi req_by_format_prog.sql dist_by_format_prog.sql dist_total.sql
[dcmgr@transform analysis]# yasql ncdist@dcucb @req_by_format_prog.sql
[dcmgr@transform analysis]# yasql ncdist@dcucb @dist_by_format_prog.sql
[dcmgr@transform analysis]# yasql ncdist@dcucb @dist_total.sql
* Run the **csv_b2g** script to convert bytes into gigabytes:
[dcmgr@transform analysis]$ ./csv_b2g -f3 dist_by_format_prog.csv
FORMAT,PROG,NBYTES
assemble,httpd,0.461
cat,ftp,0.014
cat,httpd,44.362
catalog,fdsnws-event,145.248
gps,ftp,1778.876
gps,httpd,279.055
metadata,fdsnws-station,1362.311
metadata,ncedcws-dataless,1.856
metadata,ncedcws-resp,0.103
metadata,ncedcws-sacpz,0.382
mseed,fdsnws-dataselect,40431.028
mseed,ncedcws-eventdata,10.793
mseed,sws,594.182
rawdata,httpd,0.001
xmldata,ftp,68.643
xmldata,httpd,0.005
[dcmgr@transform analysis]$ ./csv_b2g -f1 dist_total.csv
NBYTES
44717.320
[dcmgr@transform analysis]$
=== Geolocation Information (From DB) ===
* On **transform**: (as **dcmgr**)
* Edit the script **sql.dist_by_type** to specify the time interval (yyyymm for start and end month):
[dcmgr@transform analysis]# cd /data/dc5/reporting.NCEDC/analysis
[dcmgr@transform analysis]# vi sql.dist_by_type
* Run the script **sql.dist_by_type**. It will generate 3 output files:
* //count.data.csv// - for all types of data (mseed, gps, xml, raw, assemble, ...).
* //count.data.cat// - for all catalog queries.
* //count.data.meta// - for all metadata queries.
[dcmgr@transform analysis]# ./sql.dist_by_type
* Run the script **geolocate.pl** to get geolocation information for the above files, e.g.:
[dcmgr@transform analysis]# ./geolocate.pl count.data.csv -> # creates count.data.csv.geo
* Notes:
Notes based on current data distribution processing.
1. IPADDR is NEVER NULL.
2. DOMAIN is NEVER NULL.
3. IPADDR is set to '-' when no IP address is available.
4. DOMAIN is set to the ipaddress when no domain is available.
So, values of fields can be:
case: IPADDR DOMAIN
---------------------------------------
1 real_ip_addr real_domainname
2 real_ip_addr real_ip_addr
3 - real_domainname
To get a count of everything once and only, perform 3 selections:
select ... where IPADDR != DOMAIN and IPADDR != '-' # Case 1 - use either
select ipaddr ... where IPADDR = DOMAIN # Case 2 - only IP: domain set to IP
select domain ... where IPADDR = '-' # Case 3 - only domain
==============================================================================
Query
curl -s http://freegeoip.net/csv/128.32.149.11
Alternatives investigated but not used:
curl -s http://geoip.nekudo.com/api/128.32.149.11/en/short
https://dns.google.com/resolve?name=usgs.gov
curl -s http://dns.google.com/resolve\?name=hotmail.com
{"Status": 0,"TC": false,"RD": true,"RA": true,"AD": false,"CD": false,"Question":[ {"name": "hotmail.com.","type": 1}],"Answer":[ {"name": "hotmail.com.","type": 1,"TTL": 3568,"data": "157.56.198.220"},{"name": "hotmail.com.","type": 1,"TTL": 3568,"data": "65.55.118.92"}]}
{"Status": 0,
"TC": false,
"RD": true,
"RA": true,
"AD": false,
"CD": false,
"Question":[
{"name": "hotmail.com.","type": 1}
],
"Answer":[
{"name": "hotmail.com.","type": 1,"TTL": 3568,"data": "157.56.198.220"},
{"name": "hotmail.com.","type": 1,"TTL": 3568,"data": "65.55.118.92"}
]
}
==============================================================================
# 1. Identify what is a domain name vs a hostname.
# 2. While domain name,
use dns.google.com, and take result from "data" attribute.
# 3. Lookup with
rake% curl -s http://dns.google.com/resolve\?name=icjta.csic.es
{"Status": 3,"TC": false,"RD": true,"RA": true,"AD": false,"CD": false,"Question":[ {"name": "icjta.csic.es.","type": 1}],"Authority":[ {"name": "csic.es.","type": 6,"TTL": 1611,"data": "olmo.csic.es. hostmaster.csic.es. 2010042607 86400 7200 2592000 86400"}]}
rake% curl -s http://dns.google.com/resolve\?name=olmo.csic.es
{"Status": 0,"TC": false,"RD": true,"RA": true,"AD": false,"CD": false,"Question":[ {"name": "olmo.csic.es.","type": 1}],"Answer":[ {"name": "olmo.csic.es.","type": 1,"TTL": 70114,"data": "161.111.10.3"}]}
rake% curl -s http://freegeoip.net/csv/olmo.csic.es
161.111.10.3,ES,Spain,MD,Madrid,Madrid,28001,Europe/Madrid,40.4167,-3.6838,0
rake% curl -s http://freegeoip.net/csv/161.111.10.3
161.111.10.3,ES,Spain,MD,Madrid,Madrid,28001,Europe/Madrid,40.4167,-3.6838,0
=== Yearly Summary Information (From files) ===
* On **transform**: (as **dcmgr**)
[dcmgr@transform csv]$ cd /data/dc5/reporting.NCEDC/csv/
* Run the **usr_csv_year.csh ** script to generate the number of distinct users for a given year, e.g.:
[dcmgr@transform csv]$ usr_csv_year.csh 2012
11003
[dcmgr@transform csv]$
* Run the **sum_csv_year.csh ** script to generate the total amount of data for a given year, e.g.:
[dcmgr@transform csv]$ sum_csv_year.csh 2012
10975G
[dcmgr@transform csv]$
* Run the **sum_csv_year_sta.csh ** script to generate the total amount of data for a given year and list of stations (defined in **sum_csv_sta**), e.g.:
[dcmgr@transform csv]$ sum_csv_year_sta.csh 2018
271G
[dcmgr@transform csv]$