Well I am just breaking my sequence of RUNSTATS and REORG for a while (I promise my next blog will be having that)...
Actually I have just been back from an interview session... I took a few interviews for filling up vacancies for java developer... I just asked few basic SQLs, the general approach people took for those SQLs raises a smile across my face... Actually those were the most obvious approach that comes to our mind when we try to reach/approach such problems... I have seen such patterns quite often in reports and prepared statement during my reviews (of SQLs used by development)... But if you stretch a little bit, try thinking the Anti conditions (wondering what they are??? Just wait...), you might get a more efficient option...
Let me share a few to guide you thru this...
I will try to skip the table structure etc (unless it is very much reqd) in order to avoid boring you guys/gals...
1) OK, if you have to find out all those customers (from a CUSTOMER table in bank database) whose PINCODE (an integet field) starts with 56 (i.e. of a particular city/state)... The most obvious response is:
select * from customer wherecast(pincode as varchar(6)) like '56%'
Weren't you thinking exactly that? But unfortunately that case will avoid an index scan and has a casting overhead... Put your Matriculation Hat on your head... Think of the max and min no having 6 digits (i.e. in INDIA pincodes are 6 digit) which can starts with 56??? Does that help??? So now you got a better query:
select * from customer where pincode between 560000 and 569999
If you thought that off pretty early (before I gave the hint), hats off to you...
2)Now if you have a timestamp/date column (in DB2 9.7 date will store time as well till the seconds) to store the transaction date for a banking application (sorry guys but that is the only domain I have worked with), and you want to list all the transactions that happened on a particular date? Obvious query:
select * from transaction_posting_table where date(txn_posting_date)='2010/04/30'
Some one who has attended an Oracle Usability/Oracle Enablement session for DB2 might think
select * from transaction_posting_table wheretrunc(txn_posting_date)='2010/04/30'
'2010/04/30' is '2010/04/30' with time 12:00:00.0000AM and trunc will also convert all the date/timestamps to that day's 12:00:00.0000AM... Hence we get all the transactions of '2010/04/30' irrespective of its time...
As these queries go for table scan, people suggest of adding an extra column which is always computed based on date(txn_posting_date) and then index... Ahhh!!! Before that check this out:
select * from transaction_posting_table where txn_posting_date)>='2010/04/30' and txn_posting_date)<'2010/04/30'+1 Kwel???
'2010/04/30' is '2010/04/30' with time 12:00:00.0000AM
and '2010/04/30' +1 is '2010/05/01' with time 12:00:00.0000AM...
So you pick up any time on '2010/04/30' it will satisfy above condition... The trick is to move the computation from LHS (i.e. instead of converting all the date/timestamp to 12 midnight, search for all date/timestamps b/e two midnights, one inclusive and other exclusive )... :)
3) Another famous interview question is (I have never asked this one) "Write a query to find the 2nd max salary from emp table". But I know it will send across a smile on many faces... (Yeah I know that, I did that in my graduation viva!!!...) So is this what you are thinking?
select max(sal) as secnd_max_sal from emp where sal<(select max(sal) from emp)
Yeah??? that is what you said in your graduation viva? Atleast I said that, but in an interview I will say differently... Well the problem is when you generate a plan you see two sort operation (I would frown at a query with one sort...)
select sal as secnd_max_sal from (select dense_rank() over(order by sal) rnk,sal from emp)tmp_rslt wher tmp_rslt.rnk=2;
(more extensible and works with just one sort and an rid scan!!! :) )
4) Another one that I have seen quite often, that people JOIN two table and the JOIN result is joined with third table on a condition where the key of third can be either matched to first or the second table... This happens more commonly when the first query is left outer join and the join condition can be on the key from first table or if it is null then on second column...
so a few optimization that comes to my mind (I will just write the JOIN clause):
>>A LEFT OUTER JOIN B on A.Aid=B.Bid JOIN C on(A.Aid=C.Cid or B.Bid=C.Cid)
Better way to write is:
>>A LEFT OUTER JOIN B on A.Aid=B.Bid JOIN C on (C.Cid COALESCE(A.Aid,B.Bid))
Another scenario is
>>A JOIN B on A.Aid=B.Bid JOIN C on(A.a1=C.Cid or B.b1=C.Cid)
This again might not go for an index scan on either of the column and might go for a hash join. A better way would be to Use IN... Wondering how???
>>A JOIN B on A.Aid=B.Bid JOIN C on C.Cid IN( B.b1,A.a1)
What say??? I hope this blog would have helped people understand negative/anti condition (I had talked just in brief about this in my ppt in DB2 has talent)... I will try to put up more such content if I come across one... Esp a lot of stuffs can be optimized with CASE statements...
So I hope next time a database developer writes a query, he will think that extra mile and a DBA reviews a SQL he will be more critical...
For those who want more before they can do on their own... Spend a few bucks and get this book on SQL Antipatterns... I havent yet read this but some one suggested it is on the same lines (and the Title of the book suggests the same)...
Actually I have just been back from an interview session... I took a few interviews for filling up vacancies for java developer... I just asked few basic SQLs, the general approach people took for those SQLs raises a smile across my face... Actually those were the most obvious approach that comes to our mind when we try to reach/approach such problems... I have seen such patterns quite often in reports and prepared statement during my reviews (of SQLs used by development)... But if you stretch a little bit, try thinking the Anti conditions (wondering what they are??? Just wait...), you might get a more efficient option...
Let me share a few to guide you thru this...
I will try to skip the table structure etc (unless it is very much reqd) in order to avoid boring you guys/gals...
1) OK, if you have to find out all those customers (from a CUSTOMER table in bank database) whose PINCODE (an integet field) starts with 56 (i.e. of a particular city/state)... The most obvious response is:
select * from customer where
Weren't you thinking exactly that? But unfortunately that case will avoid an index scan and has a casting overhead... Put your Matriculation Hat on your head... Think of the max and min no having 6 digits (i.e. in INDIA pincodes are 6 digit) which can starts with 56??? Does that help??? So now you got a better query:
select * from customer where pincode between 560000 and 569999
If you thought that off pretty early (before I gave the hint), hats off to you...
2)Now if you have a timestamp/date column (in DB2 9.7 date will store time as well till the seconds) to store the transaction date for a banking application (sorry guys but that is the only domain I have worked with), and you want to list all the transactions that happened on a particular date? Obvious query:
select * from transaction_posting_table where date(txn_posting_date)='2010/04/30'
Some one who has attended an Oracle Usability/Oracle Enablement session for DB2 might think
select * from transaction_posting_table where
'2010/04/30' is '2010/04/30' with time 12:00:00.0000AM and trunc will also convert all the date/timestamps to that day's 12:00:00.0000AM... Hence we get all the transactions of '2010/04/30' irrespective of its time...
As these queries go for table scan, people suggest of adding an extra column which is always computed based on date(txn_posting_date) and then index... Ahhh!!! Before that check this out:
select * from transaction_posting_table where txn_posting_date)>='2010/04/30' and txn_posting_date)<'2010/04/30'+1 Kwel???
'2010/04/30' is '2010/04/30' with time 12:00:00.0000AM
and '2010/04/30' +1 is '2010/05/01' with time 12:00:00.0000AM...
So you pick up any time on '2010/04/30' it will satisfy above condition... The trick is to move the computation from LHS (i.e. instead of converting all the date/timestamp to 12 midnight, search for all date/timestamps b/e two midnights, one inclusive and other exclusive )... :)
3) Another famous interview question is (I have never asked this one) "Write a query to find the 2nd max salary from emp table". But I know it will send across a smile on many faces... (Yeah I know that, I did that in my graduation viva!!!...) So is this what you are thinking?
select max(sal) as secnd_max_sal from emp where sal<
Yeah??? that is what you said in your graduation viva? Atleast I said that, but in an interview I will say differently... Well the problem is when you generate a plan you see two sort operation (I would frown at a query with one sort...)
select sal as secnd_max_sal from (select dense_rank() over(order by sal) rnk,sal from emp)tmp_rslt wher tmp_rslt.rnk=2;
(more extensible and works with just one sort and an rid scan!!! :) )
4) Another one that I have seen quite often, that people JOIN two table and the JOIN result is joined with third table on a condition where the key of third can be either matched to first or the second table... This happens more commonly when the first query is left outer join and the join condition can be on the key from first table or if it is null then on second column...
so a few optimization that comes to my mind (I will just write the JOIN clause):
>>A LEFT OUTER JOIN B on A.Aid=B.Bid JOIN C on
Better way to write is:
>>A LEFT OUTER JOIN B on A.Aid=B.Bid JOIN C on (C.Cid COALESCE(A.Aid,B.Bid))
Another scenario is
>>A JOIN B on A.Aid=B.Bid JOIN C on
This again might not go for an index scan on either of the column and might go for a hash join. A better way would be to Use IN... Wondering how???
>>A JOIN B on A.Aid=B.Bid JOIN C on C.Cid IN( B.b1,A.a1)
What say??? I hope this blog would have helped people understand negative/anti condition (I had talked just in brief about this in my ppt in DB2 has talent)... I will try to put up more such content if I come across one... Esp a lot of stuffs can be optimized with CASE statements...
So I hope next time a database developer writes a query, he will think that extra mile and a DBA reviews a SQL he will be more critical...
For those who want more before they can do on their own... Spend a few bucks and get this book on SQL Antipatterns... I havent yet read this but some one suggested it is on the same lines (and the Title of the book suggests the same)...