Metrics database import
- Uncompress the csv files that need to be loaded (in csv/csv.YYYY.MM/).
- On transform (as dcmgr):
- Edit the script dbload_all to include the years and months to be loaded:
[dcmgr@transform dbload]# cd /data/dc5/reporting.NCEDC/dbload
[dcmgr@transform dbload]# vi dbload_all
- Run the script dbload_all:
[dcmgr@transform dbload]# ./dbload_all
- Notes:
  - Oracle database user is 'ncdist@dcucb'.
  - To remove all data, use 'TRUNCATE TABLE PROD_DIST;' rather than DELETE, to avoid generating redo logs.
  - Add more space to the 'NCDIST' tablespace if needed.
2014/05/07 New processing rules:
1. Remove all entries in subnet 169.229.197.0/26 (or /24).
2. Data from email-based distribution methods have an email address, but no IP info. For email user@domain:
   a. lowcase(username) => username
   b. lowcase(domain) => domain
   c. "-" => ipaddr
3. Data from web or other service daemon distribution methods have an IP address, but no user info. For ipaddr:
   a. ipaddr => ipaddr
   b. dns(ipaddr)|ipaddr => domain
   c. "-" => username
Distinct email users:
  select unique username,domain where ipaddr = '-';
Distinct ip users:
  select unique username,domain where ipaddr != '-';
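The processing rules above can be sketched in Python as follows. This is only an illustration, not the code used by dbload_all: the function names (normalize, reverse_dns), the (username, domain, ipaddr) tuple order, and the choice of the /26 mask over /24 are all assumptions made for the example.

```python
import ipaddress
import socket

# Assumption: rule 1 uses the /26 mask; the notes also mention /24.
EXCLUDED_NET = ipaddress.ip_network("169.229.197.0/26")


def reverse_dns(ipaddr):
    """Return the DNS name for ipaddr, or ipaddr itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ipaddr)[0]
    except OSError:
        return ipaddr


def normalize(email=None, ipaddr=None, resolve=reverse_dns):
    """Return (username, domain, ipaddr), or None if the entry is removed.

    Hypothetical helper: pass email for email-based distribution records,
    or ipaddr for web/service-daemon records.
    """
    if email:
        # Rule 2: lowercase username and domain, no IP information.
        username, _, domain = email.rpartition("@")
        return (username.lower(), domain.lower(), "-")
    # Rule 1: drop entries from the excluded subnet.
    if ipaddress.ip_address(ipaddr) in EXCLUDED_NET:
        return None
    # Rule 3: domain is the DNS name if available, else the IP address.
    return ("-", resolve(ipaddr), ipaddr)
```

For example, normalize(email="User@Example.EDU") yields ("user", "example.edu", "-"), and an address inside the excluded subnet yields None. The resolve parameter exists only so the DNS lookup can be replaced when testing offline.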
operations/db_ncedc/dd_db.txt · Last modified: 2022/01/14 16:24 by stephane