About Me

My photo
Singapore, Singapore, Singapore
I am currently working as a DB Consultant with Ashnik. We are into Open Source and Cloud based enterprise class solutions. Prior to this I was working as a DBA @ Misys India (Banking Services). Working on IBM DB2, Oracle Microsoft SQL Server. Selected as IBM Champion for Data Management 2011. I like bit of photography, cycling, poetry and I just love gadgets!!

Connect to me

See my profile on LinkedIn Visit my Ashnik Website

SQL Anti logic!!!

Well I am just breaking my sequence of RUNSTATS and REORG for a while (I promise my next blog will be having that)...

Actually I have just been back from an interview session... I took a few interviews for filling up vacancies for java developer... I just asked few basic SQLs, the general approach people took for those SQLs raises a smile across my face... Actually those were the most obvious approach that comes to our mind when we try to reach/approach such problems... I have seen such patterns quite often in reports and prepared statement during my reviews (of SQLs used by development)... But if you stretch a little bit, try thinking the Anti conditions (wondering what they are??? Just wait...), you might get a more efficient option...
Let me share a few to guide you thru this...

I will try to skip the table structure etc (unless it is very much reqd) in order to avoid boring you guys/gals...

1) OK, if you have to find out all those customers (from a CUSTOMER table in bank database) whose PINCODE (an integet field) starts with 56 (i.e. of a particular city/state)... The most obvious response is:

select * from customer where cast(pincode as varchar(6)) like '56%'

Weren't you thinking exactly that? But unfortunately that case will avoid an index scan and has a casting overhead... Put your Matriculation Hat on your head... Think of the max and min no having 6 digits (i.e. in INDIA pincodes are 6 digit) which can starts with 56??? Does that help??? So now you got a better query:
select * from customer where pincode between 560000 and 569999
If you thought that off pretty early (before I gave the hint), hats off to you...

2)Now if you have a timestamp/date column (in DB2 9.7 date will store time as well till the seconds) to store the transaction date for a banking application (sorry guys but that is the only domain I have worked with), and you want to list all the transactions that happened on a particular date? Obvious query:

select * from transaction_posting_table where date(txn_posting_date)='2010/04/30'

Some one who has attended an Oracle Usability/Oracle Enablement session for DB2 might think

select * from transaction_posting_table where trunc(txn_posting_date)='2010/04/30'

'2010/04/30' is '2010/04/30' with time 12:00:00.0000AM and trunc will also convert all the date/timestamps to that day's 12:00:00.0000AM... Hence we get all the transactions of '2010/04/30' irrespective of its time...

As these queries go for table scan, people suggest of adding an extra column which is always computed based on date(txn_posting_date) and then index... Ahhh!!! Before that check this out:

select * from transaction_posting_table where txn_posting_date)>='2010/04/30' and txn_posting_date)<'2010/04/30'+1 Kwel???
'2010/04/30' is '2010/04/30' with time 12:00:00.0000AM
and '2010/04/30' +1 is '2010/05/01' with time 12:00:00.0000AM...

So you pick up any time on '2010/04/30' it will satisfy above condition... The trick is to move the computation from LHS (i.e. instead of converting all the date/timestamp to 12 midnight, search for all date/timestamps b/e two midnights, one inclusive and other exclusive )... :)

3) Another famous interview question is (I have never asked this one) "Write a query to find the 2nd max salary from emp table". But I know it will send across a smile on many faces... (Yeah I know that, I did that in my graduation viva!!!...) So is this what you are thinking?

select max(sal) as secnd_max_sal from emp where sal< (select max(sal) from emp)

Yeah??? that is what you said in your graduation viva? Atleast I said that, but in an interview I will say differently... Well the problem is when you generate a plan you see two sort operation (I would frown at a query with one sort...)

select sal as secnd_max_sal from (select dense_rank() over(order by sal) rnk,sal from emp)tmp_rslt wher tmp_rslt.rnk=2;

(more extensible and works with just one sort and an rid scan!!! :) )

4) Another one that I have seen quite often, that people JOIN two table and the JOIN result is joined with third table on a condition where the key of third can be either matched to first or the second table... This happens more commonly when the first query is left outer join and the join condition can be on the key from first table or if it is null then on second column...
so a few optimization that comes to my mind (I will just write the JOIN clause):

>>A LEFT OUTER JOIN B on A.Aid=B.Bid JOIN C on (A.Aid=C.Cid or B.Bid=C.Cid)

Better way to write is:
>>A LEFT OUTER JOIN B on A.Aid=B.Bid JOIN C on (C.Cid COALESCE(A.Aid,B.Bid))

Another scenario is
>>A JOIN B on A.Aid=B.Bid JOIN C on (A.a1=C.Cid or B.b1=C.Cid)

This again might not go for an index scan on either of the column and might go for a hash join. A better way would be to Use IN... Wondering how???
>>A JOIN B on A.Aid=B.Bid JOIN C on C.Cid IN( B.b1,A.a1)


What say??? I hope this blog would have helped people understand negative/anti condition (I had talked just in brief about this in my ppt in DB2 has talent)... I will try to put up more such content if I come across one... Esp a lot of stuffs can be optimized with CASE statements...

So I hope next time a database developer writes a query, he will think that extra mile and a DBA reviews a SQL he will be more critical...

For those who want more before they can do on their own... Spend a few bucks and get this book on SQL Antipatterns... I havent yet read this but some one suggested it is on the same lines (and the Title of the book suggests the same)...

6 comments:

  1. Great post! Looking forward for a sequel of this post ! - Sathy

    ReplyDelete
  2. thnx sathy!!! Once I have ample example, on SQL tuning I shall post a new one!!!

    ReplyDelete
  3. I think query to find 2nd max salary should be
    select sal as secnd_max_sal from (select dense_rank() over(order by sal DESC) rnk,sal from emp)tmp_rslt wher tmp_rslt.rnk=2

    Also I would like to know what exactly does dense_rank() mean...is it giving serial numbers (1,2...N) to the columns ordered in over(...) clause. And in the result produced is rank=10 for a row means there are 9 rows which satisfy condition in over(..) clause?

    ReplyDelete
  4. Hey Saurabh, thanks for the correction. You are correct. I missed a DESC in the over clause.

    dense_rank and rank functions gives a rank to the data set as per the order by clause. Further you can partition on the ranking. e.g if you have a table which stores student_name, subjectcode, score then you can do

    select student_name, subjectcode, dense_rank()over(order by score partition by subjectcode) from my_table;


    this will give rank 1 to highest scorer of each subject.


    it is more or less similar to row_number, but row_number will give serial two different serial numbers to two person who have same score.

    e.g. if Anita and Rita have 90 in History, and Amar has 80 they will be "row_numbered" as

    Rita history 1
    Anita history 2
    Amar history 3


    when instead of row_number you use rank() they will be "ranked" as

    Rita history 1
    Anita history 1
    Amar history 3

    notice that rank did a justice with both the toppers :)
    But poor amar, though actually his score is 2nd highest got 3rd rank cause 1 rank was taken up by 2 people. This usually helps in getting points table for leagues (e.g. in Cricket of football leagues, top two teams with same points and 3rd team with 2nd highest point will be awarded).


    Now to be a little more fair with Amar you may use dense_rank (which is also required when using 2nd max salary and even in case of athletics, where if two people cross the line at exact same fraction of second, both should get gold). so we "dense_ranked" to give some more credit to Amar for his score and we got



    Rita history 1
    Anita history 1
    Amar history 2

    I hope it is clear now. If you use partition by you may get something like this

    Name SubjectCode Score Rank
    Amar HIST 85 2
    Rita HIST 90 1
    Anita HIST 90 1
    Rita ENG 90 3
    Anita ENG 92 2
    Amar ENG 95 1


    ReplyDelete