====== Metrics analysis ====== === Distribution Information (From DB) === * On **transform**: (as **dcmgr**) * Edit and execute the SQL scripts **req_by_format_prog.sql**, **dist_by_format_prog.sql** & **dist_total.sql**. It will generate the files //req_by_format_prog.csv//, //dist_by_format_prog.csv// & //dist_total.csv// for all the data in the database. [dcmgr@transform analysis]# cd /data/dc5/reporting.NCEDC/analysis [dcmgr@transform analysis]# vi req_by_format_prog.sql dist_by_format_prog.sql dist_total.sql [dcmgr@transform analysis]# yasql ncdist@dcucb @req_by_format_prog.sql [dcmgr@transform analysis]# yasql ncdist@dcucb @dist_by_format_prog.sql [dcmgr@transform analysis]# yasql ncdist@dcucb @dist_total.sql * Run the **csv_b2g** script to convert bytes into gigabytes: [dcmgr@transform analysis]$ ./csv_b2g -f3 dist_by_format_prog.csv FORMAT,PROG,NBYTES assemble,httpd,0.461 cat,ftp,0.014 cat,httpd,44.362 catalog,fdsnws-event,145.248 gps,ftp,1778.876 gps,httpd,279.055 metadata,fdsnws-station,1362.311 metadata,ncedcws-dataless,1.856 metadata,ncedcws-resp,0.103 metadata,ncedcws-sacpz,0.382 mseed,fdsnws-dataselect,40431.028 mseed,ncedcws-eventdata,10.793 mseed,sws,594.182 rawdata,httpd,0.001 xmldata,ftp,68.643 xmldata,httpd,0.005 [dcmgr@transform analysis]$ ./csv_b2g -f1 dist_total.csv NBYTES 44717.320 [dcmgr@transform analysis]$ === Geolocation Information (From DB) === * On **transform**: (as **dcmgr**) * Edit the script **sql.dist_by_type** to specify the time interval (yyyymm for start and end month): [dcmgr@transform analysis]# cd /data/dc5/reporting.NCEDC/analysis [dcmgr@transform analysis]# vi sql.dist_by_type * Run the script **sql.dist_by_type**. It will generate 3 output files: * //count.data.csv// - for all types of data (mseed, gps, xml, raw, assemble, ...). * //count.data.cat// - for all catalog queries. * //count.data.meta// - for all metadata queries. [dcmgr@transform analysis]# ./sql.dist_by_type * Run the script **geolocate.pl** to get geolocation information for the above files, e.g.: [dcmgr@transform analysis]# ./geolocate.pl count.data.csv -> # creates count.data.csv.geo * Notes: Notes based on current data distribution processing. 1. IPADDR is NEVER NULL. 2. DOMAIN is NEVER NULL. 3. IPADDR is set to '-' when no IP address is available. 4. DOMAIN is set to the ipaddress when no domain is available. So, values of fields can be: case: IPADDR DOMAIN --------------------------------------- 1 real_ip_addr real_domainname 2 real_ip_addr real_ip_addr 3 - real_domainname To get a count of everything once and only, perform 3 selections: select ... where IPADDR != DOMAIN and IPADDR != '-' # Case 1 - use either select ipaddr ... where IPADDR = DOMAIN # Case 2 - only IP: domain set to IP select domain ... where IPADDR = '-' # Case 3 - only domain ============================================================================== Query curl -s http://freegeoip.net/csv/128.32.149.11 Alternatives investigated but not used: curl -s http://geoip.nekudo.com/api/128.32.149.11/en/short https://dns.google.com/resolve?name=usgs.gov curl -s http://dns.google.com/resolve\?name=hotmail.com {"Status": 0,"TC": false,"RD": true,"RA": true,"AD": false,"CD": false,"Question":[ {"name": "hotmail.com.","type": 1}],"Answer":[ {"name": "hotmail.com.","type": 1,"TTL": 3568,"data": "157.56.198.220"},{"name": "hotmail.com.","type": 1,"TTL": 3568,"data": "65.55.118.92"}]} {"Status": 0, "TC": false, "RD": true, "RA": true, "AD": false, "CD": false, "Question":[ {"name": "hotmail.com.","type": 1} ], "Answer":[ {"name": "hotmail.com.","type": 1,"TTL": 3568,"data": "157.56.198.220"}, {"name": "hotmail.com.","type": 1,"TTL": 3568,"data": "65.55.118.92"} ] } ============================================================================== # 1. Identify what is a domain name vs a hostname. # 2. While domain name, use dns.google.com, and take result from "data" attribute. # 3. Lookup with rake% curl -s http://dns.google.com/resolve\?name=icjta.csic.es {"Status": 3,"TC": false,"RD": true,"RA": true,"AD": false,"CD": false,"Question":[ {"name": "icjta.csic.es.","type": 1}],"Authority":[ {"name": "csic.es.","type": 6,"TTL": 1611,"data": "olmo.csic.es. hostmaster.csic.es. 2010042607 86400 7200 2592000 86400"}]} rake% curl -s http://dns.google.com/resolve\?name=olmo.csic.es {"Status": 0,"TC": false,"RD": true,"RA": true,"AD": false,"CD": false,"Question":[ {"name": "olmo.csic.es.","type": 1}],"Answer":[ {"name": "olmo.csic.es.","type": 1,"TTL": 70114,"data": "161.111.10.3"}]} rake% curl -s http://freegeoip.net/csv/olmo.csic.es 161.111.10.3,ES,Spain,MD,Madrid,Madrid,28001,Europe/Madrid,40.4167,-3.6838,0 rake% curl -s http://freegeoip.net/csv/161.111.10.3 161.111.10.3,ES,Spain,MD,Madrid,Madrid,28001,Europe/Madrid,40.4167,-3.6838,0 === Yearly Summary Information (From files) === * On **transform**: (as **dcmgr**) [dcmgr@transform csv]$ cd /data/dc5/reporting.NCEDC/csv/ * Run the **usr_csv_year.csh ** script to generate the number of distinct users for a given year, e.g.: [dcmgr@transform csv]$ usr_csv_year.csh 2012 11003 [dcmgr@transform csv]$ * Run the **sum_csv_year.csh ** script to generate the total amount of data for a given year, e.g.: [dcmgr@transform csv]$ sum_csv_year.csh 2012 10975G [dcmgr@transform csv]$ * Run the **sum_csv_year_sta.csh ** script to generate the total amount of data for a given year and list of stations (defined in **sum_csv_sta**), e.g.: [dcmgr@transform csv]$ sum_csv_year_sta.csh 2018 271G [dcmgr@transform csv]$