Metrics database import

  • Uncompress the CSV files that need to be loaded (in csv/csv.YYYY.MM/).
  • On host transform, as user dcmgr:
    • Edit the script dbload_all to include the years and months to be loaded:
[dcmgr@transform dbload]# cd /data/dc5/reporting.NCEDC/dbload
[dcmgr@transform dbload]# vi dbload_all
  • Run the script dbload_all:
[dcmgr@transform dbload]# ./dbload_all
  • Notes:
- Oracle database user is 'ncdist@dcucb'.
- To remove all data, use 'TRUNCATE TABLE PROD_DIST;' instead of DELETE to avoid generating redo logs.
- Add more space to the 'NCDIST' tablespace if needed.
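The uncompress step can be scripted rather than done by hand. This is a minimal sketch, assuming the files are gzip-compressed with a `.csv.gz` suffix and that the script runs from the directory that contains `csv/`; neither is stated on this page. The helper names `month_dirs` and `uncompress_csv` are hypothetical. The `.gz` originals are kept so the step can be rerun safely.

```python
import gzip
import shutil
from pathlib import Path


def month_dirs(base: Path, year: int, months: list[int]) -> list[Path]:
    """Return the csv/csv.YYYY.MM directories for one year and a list of months."""
    return [base / "csv" / f"csv.{year:04d}.{m:02d}" for m in months]


def uncompress_csv(directory: Path) -> list[Path]:
    """Decompress every *.csv.gz in `directory` next to the original file.

    Assumes gzip compression; the .gz file is left in place.
    """
    written = []
    for gz_path in sorted(directory.glob("*.csv.gz")):
        out_path = gz_path.with_suffix("")  # foo.csv.gz -> foo.csv
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written


if __name__ == "__main__":
    base = Path(".")  # hypothetical: run from the directory containing csv/
    for d in month_dirs(base, 2014, [4, 5]):
        if d.is_dir():
            for p in uncompress_csv(d):
                print("uncompressed", p)
```

The same year and months passed to `month_dirs` are the ones that would then be entered in dbload_all.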
2014/05/07
New processing rules:
1.  Remove all entries in subnet 169.229.197.0/26 (or /24).

2.  Data from email-based distribution methods have an email address
    but no IP information.

For email user@domain:
	a.  lowercase(username)		=> username
	b.  lowercase(domain)		=> domain
	c.  "-"				=> ipaddr

3.  Data from web or other service daemon distribution methods have
    IP address but not user info.

For ipaddr:
	a.  ipaddr			=> ipaddr
	b.  dns(ipaddr)|ipaddr		=> domain
	c.  "-"				=> username
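The three rules above can be sketched as a per-entry normalization step. This is a minimal sketch, assuming each entry arrives as either an email address or an IP address string; the names `Record`, `normalize`, `reverse_dns`, and `EXCLUDED_NET` are hypothetical. It uses the /26 mask (change it to /24 if the wider range should be dropped). `dns(ipaddr)|ipaddr` is read as "reverse-DNS hostname, or the IP itself if the lookup fails", using the standard `socket.gethostbyaddr`.

```python
import ipaddress
import socket
from typing import Callable, NamedTuple, Optional

# Rule 1: entries from this subnet are dropped (the page allows /26 or /24).
EXCLUDED_NET = ipaddress.ip_network("169.229.197.0/26")


class Record(NamedTuple):
    ipaddr: str
    username: str
    domain: str


def reverse_dns(ip: str) -> str:
    """dns(ipaddr)|ipaddr: the PTR hostname, falling back to the IP itself."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return ip


def normalize(email: Optional[str] = None, ip: Optional[str] = None,
              resolve: Callable[[str], str] = reverse_dns) -> Optional[Record]:
    """Apply rules 1-3 to one entry; return None if the entry is dropped."""
    if email is not None:
        # Rule 2: email-based methods have an address but no IP.
        user, _, domain = email.rpartition("@")
        return Record(ipaddr="-", username=user.lower(), domain=domain.lower())
    if ip is not None:
        # Rule 1: drop entries from the excluded subnet.
        if ipaddress.ip_address(ip) in EXCLUDED_NET:
            return None
        # Rule 3: web/daemon methods have an IP but no user info.
        return Record(ipaddr=ip, username="-", domain=resolve(ip))
    raise ValueError("entry has neither an email address nor an IP address")
```

For example, `normalize(email="Jane.Doe@Example.ORG")` gives `Record(ipaddr='-', username='jane.doe', domain='example.org')`. Passing a stub `resolve` function lets the rules be checked without live DNS lookups.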

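Distinct email users:
	-- Table name assumed to be PROD_DIST (the table named in the Notes).
	SELECT DISTINCT username, domain FROM PROD_DIST WHERE ipaddr = '-';
Distinct IP users:
	SELECT DISTINCT username, domain FROM PROD_DIST WHERE ipaddr != '-';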
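To see what the two distinct-user queries return before running them against Oracle, they can be exercised on a throwaway SQLite table. This is a sketch only: the table name `prod_dist`, its three columns, and the helper `distinct_users` are assumptions based on the rules above, not the actual Oracle schema.

```python
import sqlite3

# Hypothetical stand-in for the Oracle table: only the three columns the rules define.
SCHEMA = "CREATE TABLE prod_dist (ipaddr TEXT, username TEXT, domain TEXT)"

EMAIL_USERS = "SELECT DISTINCT username, domain FROM prod_dist WHERE ipaddr = '-'"
IP_USERS = "SELECT DISTINCT username, domain FROM prod_dist WHERE ipaddr != '-'"


def distinct_users(rows):
    """Load (ipaddr, username, domain) rows into an in-memory table and run both queries."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO prod_dist VALUES (?, ?, ?)", rows)
        email_users = sorted(conn.execute(EMAIL_USERS).fetchall())
        ip_users = sorted(conn.execute(IP_USERS).fetchall())
        return email_users, ip_users
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("-", "jane", "example.org"),
        ("-", "jane", "example.org"),  # duplicate collapses under DISTINCT
        ("192.0.2.7", "-", "host.example.edu"),
    ]
    print(distinct_users(sample))
```

Running it prints `([('jane', 'example.org')], [('-', 'host.example.edu')])`. Because rule 3 sets username to "-" for every IP entry, the second query in practice lists distinct domains.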
operations/db_ncedc/dd_db.txt · Last modified: 2022/01/14 16:24 by stephane