Talk:Data Warehouse
Conversions
I've written 2 queries to convert the temperatures. They will only process rows in which ??_temp_c is not NULL and ??_temp_f is still NULL. to make sure we don't do the same thing twice. I does seem that the ambient temp values are sometimes a bit of the scale ("999.9"), Do I need to include some kind of lower and upper limit to skip those? Or do we do some kind of basic sanity checking at some other time.
I've directly updated the HIVE_DATA table, I hope that's fine?
mysql> update HIVE_DATA set hive_temp_f = (hive_temp_c * 1.8) + 32 where hive_temp_c is not NULL and hive_temp_f is NULL; Query OK, 0 rows affected, 65535 warnings (11.07 sec) Rows matched: 4270276 Changed: 0 Warnings: 3279801
mysql> update HIVE_DATA set ambient_temp_f = (ambient_temp_c * 1.8) + 32 where ambient_temp_c is not NULL and ambient_temp_f is NULL; Query OK, 3682033 rows affected, 65535 warnings (16.86 sec) Rows matched: 3682033 Changed: 3682033 Warnings: 2445369
No sure where the warnings come from though.
for easy execution:
# mysql --verbose -p < CelciusToFarenheit.sql Enter password:
update HIVE_DATA set hive_temp_f = (hive_temp_c * 1.8) + 32 where hive_temp_c is not NULL and hive_temp_f is NULL
update HIVE_DATA set ambient_temp_f = (ambient_temp_c * 1.8) + 32 where ambient_temp_c is not NULL and ambient_temp_f is NULL
- cat CelciusToFarenheit.sql
use _hivetool_converted; update HIVE_DATA set hive_temp_f = (hive_temp_c * 1.8) + 32 where hive_temp_c is not NULL and hive_temp_f is NULL; update HIVE_DATA set ambient_temp_f = (ambient_temp_c * 1.8) + 32 where ambient_temp_c is not NULL and ambient_temp_f is NULL;
Start for summary tables
Copied HIVE_DATA to S_HIVE_DATA
Date fields To do some easy grouping per date unit
Added fields some fields that will be useful for aggregation: day_of_year,month,month_name,quarter,year,month_year
Maybe season is a usefull one as well?
day_of_year int(3) month int(2) month_name varchar(10) quarter int(1) year int(4) month_year varchar(15)
to fill them:
# mysql --verbose -p < FillDateFields.sql # cat FillDateFields.sql
use centesb7_hivetool_converted; update S_HIVE_DATA set day_of_year=DAYOFYEAR(timestamp), month=MONTH(timestamp), month_name=lower(MONTHNAME(timestamp)), quarter=QUARTER(timestamp), year=YEAR(timestamp), month_year=lower(DATE_FORMAT(timestamp,'%M-%Y')) where timestamp != 0;
Some easy example queries:
- select year,month_name,hive_id,avg(hive_temp_c) from S_HIVE_DATA where timestamp != 0 group by year,month_name,hive_id; - select hive_id,year,month_name,avg(hive_temp_c),avg(ambient_temp_c) from S_HIVE_DATA where timestamp != 0 and hive_id=10 group by hive_id,year,month_name order by year,month; - select hive_id,year,month_name,avg(hive_temp_c),avg(ambient_temp_c),max(hive_weight_kgs) from S_HIVE_DATA where timestamp != 0 and hive_id=10 group by hive_id,year,month_name order by year,month;
rownums per hive We will need those to easily calculate delta's
Added field "hive_row_id" bigint(20)
The query below for fills in the rowid for hive_id 10 set @rownum:=0; update S_HIVE_DATA set hive_row_id = ( select @rownum := @rownum + 1) where hive_id=10 order by row_id asc;
To do it for all hives:
# cat WriteFillHiveRownums.sql use centesb7_hivetool_converted;
select CONCAT('set @rownum:=0; update S_HIVE_DATA set hive_row_id = ( select @rownum := @rownum + 1) where hive_id=',hive_id,' order by row_id asc;') from HIVE_PARAMETERS;
# echo "use centesb7_hivetool_converted;" > FillHiveRownums.sql # mysql -N -p < WriteFillHiveRownums.sql >> FillHiveRownums.sql # mysql -p < FillHiveRownums.sql
TODO:
- Add hive param fields to make easier selection on location, name, ...
- fill in delta's on weight.