db2 performance: April 2011

SQL Anti logic!!!

Well I am just breaking my sequence of RUNSTATS and REORG for a while (I promise my next blog will be having that)...

Actually I have just been back from an interview session... I took a few interviews for filling up vacancies for java developer... I just asked few basic SQLs, the general approach people took for those SQLs raises a smile across my face... Actually those were the most obvious approach that comes to our mind when we try to reach/approach such problems... I have seen such patterns quite often in reports and prepared statement during my reviews (of SQLs used by development)... But if you stretch a little bit, try thinking the Anti conditions (wondering what they are??? Just wait...), you might get a more efficient option...
Let me share a few to guide you thru this...

I will try to skip the table structure etc (unless it is very much reqd) in order to avoid boring you guys/gals...

1) OK, if you have to find out all those customers (from a CUSTOMER table in bank database) whose PINCODE (an integet field) starts with 56 (i.e. of a particular city/state)... The most obvious response is:

select * from customer where ~~cast(pincode as varchar(6)) like '56%'~~

Weren't you thinking exactly that? But unfortunately that case will avoid an index scan and has a casting overhead... Put your Matriculation Hat on your head... Think of the max and min no having 6 digits (i.e. in INDIA pincodes are 6 digit) which can starts with 56??? Does that help??? So now you got a better query:
select * from customer where pincode between 560000 and 569999
If you thought that off pretty early (before I gave the hint), hats off to you...

2)Now if you have a timestamp/date column (in DB2 9.7 date will store time as well till the seconds) to store the transaction date for a banking application (sorry guys but that is the only domain I have worked with), and you want to list all the transactions that happened on a particular date? Obvious query:

select * from transaction_posting_table where date(txn_posting_date)='2010/04/30'

Some one who has attended an Oracle Usability/Oracle Enablement session for DB2 might think

select * from transaction_posting_table where ~~trunc(txn_posting_date)='2010/04/30'~~

'2010/04/30' is '2010/04/30' with time 12:00:00.0000AM and trunc will also convert all the date/timestamps to that day's 12:00:00.0000AM... Hence we get all the transactions of '2010/04/30' irrespective of its time...

As these queries go for table scan, people suggest of adding an extra column which is always computed based on date(txn_posting_date) and then index... Ahhh!!! Before that check this out:

select * from transaction_posting_table where txn_posting_date)>='2010/04/30' and txn_posting_date)<'2010/04/30'+1 Kwel???
'2010/04/30' is '2010/04/30' with time 12:00:00.0000AM
and '2010/04/30' +1 is '2010/05/01' with time 12:00:00.0000AM...

So you pick up any time on '2010/04/30' it will satisfy above condition... The trick is to move the computation from LHS (i.e. instead of converting all the date/timestamp to 12 midnight, search for all date/timestamps b/e two midnights, one inclusive and other exclusive )... :)

3) Another famous interview question is (I have never asked this one) "Write a query to find the 2nd max salary from emp table". But I know it will send across a smile on many faces... (Yeah I know that, I did that in my graduation viva!!!...) So is this what you are thinking?

select max(sal) as secnd_max_sal from emp where sal< ~~(select max(sal) from emp)~~

Yeah??? that is what you said in your graduation viva? Atleast I said that, but in an interview I will say differently... Well the problem is when you generate a plan you see two sort operation (I would frown at a query with one sort...)

select sal as secnd_max_sal from (select dense_rank() over(order by sal) rnk,sal from emp)tmp_rslt wher tmp_rslt.rnk=2;

(more extensible and works with just one sort and an rid scan!!! :) )

4) Another one that I have seen quite often, that people JOIN two table and the JOIN result is joined with third table on a condition where the key of third can be either matched to first or the second table... This happens more commonly when the first query is left outer join and the join condition can be on the key from first table or if it is null then on second column...
so a few optimization that comes to my mind (I will just write the JOIN clause):

>>A LEFT OUTER JOIN B on A.Aid=B.Bid JOIN C on ~~(A.Aid=C.Cid or B.Bid=C.Cid~~)

Better way to write is:
>>A LEFT OUTER JOIN B on A.Aid=B.Bid JOIN C on (C.Cid COALESCE(A.Aid,B.Bid))

Another scenario is
>>A JOIN B on A.Aid=B.Bid JOIN C on ~~(A.a1=C.Cid or B.b1=C.Cid)~~

This again might not go for an index scan on either of the column and might go for a hash join. A better way would be to Use IN... Wondering how???
>>A JOIN B on A.Aid=B.Bid JOIN C on C.Cid IN( B.b1,A.a1)

What say??? I hope this blog would have helped people understand negative/anti condition (I had talked just in brief about this in my ppt in DB2 has talent)... I will try to put up more such content if I come across one... Esp a lot of stuffs can be optimized with CASE statements...

So I hope next time a database developer writes a query, he will think that extra mile and a DBA reviews a SQL he will be more critical...

For those who want more before they can do on their own... Spend a few bucks and get this book on SQL Antipatterns... I havent yet read this but some one suggested it is on the same lines (and the Title of the book suggests the same)...

RUNSTATS and REORG-2: How to automate runstats

Hi guys!!! I am back... n back as an IBM Information Champion (yup got selected for 2011 :) ) !!!
Please rate this blog as informative/interesting/cool/all/two of these once you are done... Any comments on improvement are appreciated and will be welcomed...

Well in last post I talked about importance of runstats and reorg for a db2 database...

just a recap, reorg keeps the data pages near by for same table hence reducing disk seek n runstats will keep the stats updated so helping the optimizer make best use of available indexes and generate better plan...

well... automating runstats is pretty simpler... I will take you little deep into this automation n its impact...
Following are the automatic maintainence parameters in db2...
__________________________________________________________________
Automatic maintenance (AUTO_MAINT)
|
|-- Automatic database backup (AUTO_DB_BACKUP)
|
|-- Automatic table maintenance (AUTO_TBL_MAINT)
|
|---- Automatic runstats (AUTO_RUNSTATS)
|
|-------- Automatic statement statistics (AUTO_STMT_STATS)
|
|---- Automatic statistics profiling (AUTO_STATS_PROF)
|
|--------- Automatic profile updates (AUTO_PROF_UPD)
|
|---- Automatic reorganization (AUTO_REORG)
__________________________________________________________________
So it must be clear that above is an hierarchy... To set ON a child you need to set ON the parent first...
Now lets understand the parameters we are concerned about...

1) First one is AUTO_RUNSTATS
If you set it effective (not just ON... guess what i mean by this???), it will automatically keep doing runstats on tables...
And do not worry, this will not impact your other db2 processes by more that 7-10% (no matter what you have set as UTIL_IMPACT_PRIORITY)... and it will not keep doing this every time you do a CRUD operation... It will do a runstats only on those tables whose data has changed more than 10-15%...good news is db2 luw v9.7 will switch it on automatically by default for any new db created...

2) Next will be automatic statement statistics
If you set this ON, it will try to collect the stats synchronously for any CRUD operation you do... In fact at times it may cause your CRUD operation to wait... It is a real time stats gather...
Well here are a few SNAP DB monitor elements, which might help you decide best about these parameters...
i) STATS_FABRICATIONS Vs ASYNC_RUNSTATS+SYNC_RUNSTATS is STATS usage TO STATS GATHER ratio,
ii) SYNC_RUNSTATS vs SYNC_RUNSTATS+ASYNC_RUNSTATS is FRACTION of SYNC RUNSTAT
iii) STATS_FABRICATE_TIME Vs SYNC_RUNSTATS_TIME is usage stats compared to SYNC update of STATS
iv) SYNC_RUNSTATS_TIME Vs ELAPSED_EXEC_TIME_S & ELAPSED_EXEC_TIME_MS will give you impact of SYNC STAT COLLECTION ON STMT execution
BY default his parameter is ON... Well I had used the monitor elements above and was surprised that 22% of txn were getting delayed due to synchronous stats updates... 30% of stats were being updated synchronously...I leave it to you to decide what to do when you run into this...

3) Third is Automatic statistics profiling and Automatic profile updates
Switching on the automatic stats profiling, will turn on stats profile generation...
but you can not set it on if your db cfg section_actual is ON...
Would like to make a point that generally auto_runstats will respect the user profile if user has run a manual runstats with some profile...
Automatic Profile update will update the runstats profile with recommendations...
If it is off profile recommendations generated are stored in opt_feedback_ranking...
Both of these are off by default...
Will suggest keeping them off to let auto_runstats pick up the default profile... or run a manual runstats with a specific profile for a table n auto_runstats will keep it in mind...

One more way of statistics gathering, is to do a run-stats in idle time, with a background scheduled job... A few months ago, I would have preferred that, when my seniors explained me that...

"db2 does it automatically n won't impact the current processes by much fraction... More over if you automate it urself, you will update stats for all the tables, even if they did not change much (or not at all) so its better we leave it to db2..."

So this is all about automating the runstats...
I will get back on auto maint parameters for automatic reorg and how to identify when to do a reorg...

For people interested in exploring more...
--> Go thru this info center page for more content on auto_maint parameters...
--> Go thru this article on dev works... It explains the automatic maintainence parameters for statistics gathering quit well...

RUNSTATS and REORG-1: How important they are

Hi guys...

So I am back... little quick this time...
So this is final post in my pseudo participation blog for DB2 Night Show (I had an early exit :-( after qualifying for finals)...
Congrats to Norberto for winning the competitions... Check his blog for tips...
Congrats to JB, Momi and Thiru as well for making to final...
I am looking forward to more such opportunities in future...

So I have come with some tips for people (people as the DBA community) to understand how important is the usage of two DB2 commands RUNSTATS and REORG, more importantly how you can determine when to REORG/RUNSTATS on which tables...

1) RUNSTATS are important for your tables, as they keep the stats updated for your table. This will always help your optimizer generate a better plan, rather the apt plan for fetching the data. I have seen situation when a filtered fetch goes for a table scan for 0.1-1million records even though column in filter is indexed...

2) REORG is equally important as it helps in putting the data in continous chunks... Generally for better insert performance, the data is put where you have space rather than putting it in a place (or near it) where other data of the table... The placement of data is also dependent on availability of space in nearest extent/block... If your data is too scattered across, there might be too much of disk seek involved and might as well degrade the performance of sequential and concurrent fetch...

Well you can AUTOMATE your task to certain extent and use automatic maintainance database configuration for auto runstats and auto reorg...

I shall come up with more on pros and cons of using those parameters and ways to optimize your RUNSTATS and REORG, in my future blogs...

I would like to thank Susan and Sathey for their continuous support, encouragement and tips for improvement on by blogging...

I will keep my blogs little short and will split the massive overdose of info over several blogs (thanks to Sathey)...

Latter I will also try to split my previous long posts into several blogs, which shall help my new subscriber...

My db2 success story: Five things to check when you see high memory usage on db server

Hi, so I am back with my blog sequence of db2 has talent... I know I have been little late with this one... But can not help it when India is playing cricket world cup... They finally won it... After 28 long years... First host nation to have won it... So had given up everything else for following the team create history...

by thw way the grand finale concluded recently and the last four finalists are waiting for your votes... There are some really cool tips to adopt in your dba life and that can help you lift your quality of work... so do see the replay and vote before 7th april...

I hope Scott and DBI will surely gonna come up with more of this... Some of you might find it interesting that JB (of the contestant who made it to mega finals) has been invited by Susan Visser to write a book... TO get a glimpse of her check this out, she was Susan's guest blogger...

Well let me continue with my would had been presentation in the episode just before the grand finales... All the contestants talked about their success story, where they concentrated on one single thing with DB2 that made them hero (well this is what I assume from all the presentations)...

So, for me being a junior DBA, there had not been many such moments... One i have already shared in my previous blog... Let me attempt to share another one...

We had a client running db2 8.2.8, and we had planned for migration to db2 9.5... But before the transition could kick-off there was a big performance bottleneck... All of a sudden they were reporting high I/O, high memory consumption and very high swap-in swap-out activity...
Eyebrow were raised and much like any other issue, this too made people point towards database... Our team was consulted, and my seniors being on leave on that day, I was involved in lot of mail exchanges that day... Let me share the information I requested for:

First Step: I requested for snapshot of memory usage by db2 instance and db2 database hosting the production db:
db2mtrk -i -d [-v]

It confirmed that the db was not using too much of memory, it was using whatever was allocated to bufferpool and cache-pool and db shared memory...

Second thing I did was a request to check the tablespaces where the tables are lying and the bufferpool used by all the different tablespaces we had suggested

This confirmed that not all the tables are in the same tablespace, and also all tablespaces are not pointing to the same bufferpool. This helped me overcome the doubt that, may be only one bufferpool is being used for all the I/O, other bufferpools have simply reserved there share of memory but are not using it... Moreover one bufferpool being used for all tabels/tablespaces means too many flush events...

third step of my investigation was to check if there is CIO (oncurrent I/O) enabled at the filesystem level and to check if the filesystem chaching is switched off at tablespace level...

the result confirmed the cause for high I/O and too much of memory usage...
we saw that there was filesystem caching enabled for all the tablespaces (shit!!! the db2 8.2.8 version had this as default, so guys take my advice and migrate to new version, where default is to create tablespaces with filesystem level caching off)... Moreover the CIO was disabled at Unix level as well...
If you have filesystem level caching enabled, what happens is your data is once cached by the OS in memory and then database (i.e. DB2 application) will cache is again in the bufferpool area created by db2 in the application memory allocated to it by kernel... firstly it causes to much of memory to be used, and secondly there is more time wasted in double caching :P...

Correcting this helped alot...

fourth thing to be inspected was prefetch size at db and tablespace level, if this value is not apt there might be too much of I/O waits involved (though this was not in the problem definition, i thought it will be good to check it) the monitor element tablespace_prefetch_size shall help you get that info (i had to use the table-function interface, as I was working with 8.2.8 version, it is much easier to get in v9.7)... Ideally 1.5-3 times your block size is a good number for prefetch data...

fifth thing I confirmed was to check if there is a scope given for block I/O or not... Actually you can reserve some space in your bufferpool for blocked I/O while creating the bufferpool... Blocked I/O is when DB2 performs sequential prefetching of pages into the block area of the buffer pool...
you can check the performance of blocked I/O using block_ios and vertored_ios monitor elements belonging to your bufferpool...
For optimal performance, it is recommended (source db2 info center for v8) that you bind tablespaces with the same extent size to a block-based buffer pool with a block size equal to the extent size...

These all things can help you as well, when you see a shooting high increase in memory usage by db2, or high memory usage on server hosting db2 database or a very high I/O or too much of I/O waits...

querying all these parameters regularly and fixing them (even when, rather esp when you are not facing any issues) will help you keep memory and I/O issues away...

My mantra for db2 DBAs in production is a snapshot a day keeps performance issues aways... i.e. run one snapshot a day and work on the results, fix the problems you anticipate out of their results...

Well I am myself looking forward to DBI's most awaited webinar on DB2 LUW Performance: 10 SQL Snapshots You MUST Run Today!
Register Now!!!

To end it I hope DBI won't mind me pretending to be a pseudo contestant despite being out of the show now... With all due regards I apologize, if in any way I have overlooked any term/conditions/legality of this event which is a mega success... there will be many to follow...

db2 performance

About Me

Connect to me

SQL Anti logic!!!

RUNSTATS and REORG-2: How to automate runstats

RUNSTATS and REORG-1: How important they are

My db2 success story: Five things to check when you see high memory usage on db server