db2 performance: How to convey performance metrics from SNAPSHOT views...

I had recently participated in db2 has talent and was lucky to be selected for finals... I missed the finals though due to my travel schedule... But I will write my next few blog posts with the content I would have presented if I would have reached till last episode... And you do watch the actual show (it has tips much more kwel)

To start with I will write on Performance metrics from snapshot info which can help you identify performance issues (especially before you hit any bottle necks)...

Sometime back I had been involved in a performance tuning exercise. We tuned all the top expensive SQL, we added a few new indexes as well (a lot is identified during development itself, a good practice to have), we had fixed all the individual SQLs which were taking time more than the benchmark. Despite all this, overall system performance still could not come up... I decided to take up the task of monitoring the database while performance load tests... That was an adventurous experience... I had decided to track the following:

1) SQLs which are although below the benchmark but are taking more time compared to
what they should be taking
2) SQLs which are spending too much of time in sorting the data
3) SQLs which are executed with a misguided plan (i.e. not getting proper statistics)
4) SQLs which are spending too much time on I/O and fetching lesser no of rows i.e.
which very high I/O time per row fetched
5) I started looking for no of data pages and index pages in bufferpool that are read
and written by a query (trust me you can not get a better metrics for identifying
table scans)
6) One thing which I had not earlier not thought of but eventually became my concern
was SQL which get executed only once or twice
7) SQLs which are executed over and over and total execution time was too much
8) How efficient is log write and log reads
9) Are there too long transactions and what is the frequency of commit

After this I had to explore a lot on which snapshot view has what info, which can help me arriving at the correct conclusion regarding above points... Going thru all the snapshot views and their columns helped me adding a few of above parameters (which i didnt had on my list when I started)

Following metrics helped me get the stats I was looking for:
Following statements helped me get some metrics for statements:

1) SUM(TOTAL_SORT_TIME)/NULLIF(SUM((STMT_USR_CPU_TIME_S*1000000) + STMT_USR_CPU_TIME_MS) + SUM(TOTAL_SORT_TIME),0 )
AS TOTAL_SORT_TIME_FRACTION

2) SUM((POOL_TEMP_INDEX_L_READS+POOL_TEMP_DATA_L_READS) - (POOL_TEMP_DATA_P_READS+POOL_TEMP_INDEX_P_READS))/NULLIF(SUM(POOL_TEMP_INDEX_L_READS+POOL_TEMP_DATA_L_READS),0)
AS TOTAL_TEMP_BP_UTIL

3) SUM(POOL_DATA_P_READS)/(NULLIF(SUM(POOL_DATA_L_READS),0)) TOTAL_DATA_BP_REFRESH

4)QUERY_CARD_ESTIMATE AS EXPECTED_CARDINALITY,
ROWS_READ, where EXPECTED_CARDINALITY/ROWS_READ>=1.5

5) SUM(CAST(total_sort_time as decimal(24,10)))/NULLIF(SUM((STMT_USR_CPU_TIME_S*1000000) + STMT_USR_CPU_TIME_MS),0 )
AS TOTAL_SORT_TIME_FRACTION

Following metrics helped me check the dynamic sqls (from SNAPDYN_SQL snapshot view):

6) CAST(TOTAL_SORT_TIME*(NUM_EXECUTIONS)as decimal(24,10))/NULLIF(PREP_TIME_BEST * NUM_COMPILATIONS,0) AS SORT_TO_COMPILE_RATIO

7) CAST(((PREP_TIME_WORST + PREP_TIME_BEST)*NUM_COMPILATIONS)as decimal(24,10))/NULLIF(2*NUM_EXECUTIONS,0) COMPILE_EFFICIENCY,

8) cast(SUM((POOL_TEMP_INDEX_L_READS+POOL_TEMP_DATA_L_READS) - (POOL_TEMP_DATA_P_READS+POOL_TEMP_INDEX_P_READS)) as decimal(24,10))/NULLIF(SUM(POOL_TEMP_INDEX_L_READS+POOL_TEMP_DATA_L_READS),0)
AS TOTAL_TEMP_BP_UTIL

Following are helpful metrics for monitoring locking performance (from SNAPDB):

9) LOCK_ESCALS LOCK_ESCALATIONS,

10) DEADLOCKS/NULLIF(LOCK_WAITS,0) DEADLOCK_PER_LOCK,

11) X_LOCK_ESCALS EXCLUSIVE_LOCL_ESCAL,

12) LOCKS_WAITING/NULLIF(LOCKS_HELD,0) LOCK_ISSUES_PER_LOCK,

13) DEADLOCKS/NULLIF(COMMIT_SQL_STMTS,0) DEADLOCK_PER_STMNT

Following are some good metrics to look for while measuring i/o efficiency (from SNAPDB)

14) cast(POOL_DRTY_PG_STEAL_CLNS*1000 as decimal(24,10))/ NULLIF(COMMIT_SQL_STMTS,0)
DIRTY_PAGE_STEAL_FRACTION,

15) cast(DIRECT_READS as decimal(20,10)) / NULLIF(DIRECT_READ_REQS,0) DIRECT_READ_EFFECIENCY,

16) cast(float(NUM_LOG_WRITE_IO)/NULLIF(LOG_WRITES,0) as decimal(20,10)) LOG_IO_PER_WRITE,

17) cast(float(NUM_LOG_READ_IO)/NULLIF(LOG_READS,0) as decimal(20,10)) LOG_IO_PER_READ,

18) cast(NUM_LOG_BUFFER_FULL as decimal(20,10))/NULLIF(LOG_WRITES,0) LOG_BUFF_UTIL,

19) CAST(LOG_WRITES AS DECIMAL(20,10))/NULLIF(COMMIT_SQL_STMTS,0) LOG_WRITE_EFFECIENCY,

20) cast(NUM_LOG_DATA_FOUND_IN_BUFFER as decimal(20,10))/NULLIF(NUM_LOG_READ_IO+NUM_LOG_DATA_FOUND_IN_BUFFER,0) AS LOG_BUFF_READ_EFFICIENCY,

21) cast(LOG_HELD_BY_DIRTY_PAGES *100 as decimal(24,10))/(10240*4096) AS LOG_IOCLEANERS_EFFICIENCY,

22) cast(UNREAD_PREFETCH_PAGES as decimal(24,10))/NULLIF(PREFETCH_WAIT_TIME,0) AS WASTEFULL_PREFETCH_PER_WAIT_MSEC

23) cast(POOL_NO_VICTIM_BUFFER as decimal(24,10))/NULLIF((POOL_INDEX_P_READS+POOL_DATA_P_READS),0) AS BP_CLEANER_FAILURE_FRACTION

24) --Overall Bufferpool Utilization per bufferpool
SELECT
SNAPSHOT_TIMESTAMP,
DB_NAME,
BP_NAME,
TOTAL_LOGICAL_READS,
TOTAL_PHYSICAL_READS,
TOTAL_HIT_RATIO_PERCENT,
INDEX_HIT_RATIO_PERCENT,
DATA_HIT_RATIO_PERCENT,
DATA_LOGICAL_READS,
DATA_PHYSICAL_READS,
INDEX_LOGICAL_READS,
INDEX_PHYSICAL_READS
FROM BP_HITRATIO
where DATA_HIT_RATIO_PERCENT<90 OR INDEX_HIT_RATIO_PERCENT<95;

25)
SELECT
SNAPSHOT_TIMESTAMP,
BP_NAME,
CAST(VECTORED_IOS as decimal(24,10))/NULLIF((POOL_DATA_P_READS+POOL_INDEX_P_READS),0) AS VECTORED_IO_FRACTION,
CAST(PAGES_FROM_VECTORED_IOS as decimal(24,10))/NULLIF((PAGES_FROM_VECTORED_IOS+PAGES_FROM_BLOCK_IOS),0) AS VECTORED_PAGE_READ_FRACTION,
CAST(UNREAD_PREFETCH_PAGES as decimal(24,10))/NULLIF(PAGES_FROM_BLOCK_IOS,0) AS WASTEFULL_PREFETCH_FRACTION,
CAST((POOL_ASYNC_DATA_READ_REQS + POOL_ASYNC_INDEX_READ_REQS) as decimal(24,10))/NULLIF((POOL_DATA_P_READS+POOL_INDEX_P_READS)-VECTORED_IOS,0) AS ANSYNC_PREFETCH_EFFICIENCY,
CAST(POOL_NO_VICTIM_BUFFER as decimal(24,10))/NULLIF((POOL_INDEX_P_READS+POOL_DATA_P_READS),0) AS BP_CLEANER_FAILURE_FRACTION
FROM SNAPBP;

Following parameters helped me get some imp points about how fine are the automatic maintainence parameters working

26)
cast(STATS_FABRICATIONS as decimal(24,10))/NULLIF((ASYNC_RUNSTATS+SYNC_RUNSTATS),0) AS STATS_USE_TO_GATHER_RATIO,

27)
cast(SYNC_RUNSTATS as decimal(24,10)) /NULLIF((SYNC_RUNSTATS+ASYNC_RUNSTATS),0) SYNC_RUNSTAT_FRACTION,

28)
cast(STATS_FABRICATE_TIME as decimal(24,10))/NULLIF(SYNC_RUNSTATS_TIME,0) AS STATS_SYNC_COMPUTE_UTIL,

29)
cast(SYNC_RUNSTATS_TIME*1000000 as decimal(24,10))/NULLIF(ELAPSED_EXEC_TIME_S+ELAPSED_EXEC_TIME_MS*1000000,0) AS SYNC_STAT_COLLECT_TIME_FRACTION

30) Two parameters in SYSIBMADM.SNAPDYN_SQL which can help you see if the queries provided by development is properly parameterized or not:

NUM_EXECUTIONS and NUM_COMPILATIONS

Following are some important monitor elements, which can help you see dynamic (SNAPDYN_SQL) or static sql (from SNAPSTMT) which are having heavy table scan operation

31)
CAST((POOL_DATA_L_READS-POOL_DATA_P_READS) as decimal(24,10))/NULLIF(POOL_DATA_L_READS,0) DATA_BP_UTIL,

32)
CAST(POOL_DATA_P_READS as decimal(24,10))/NULLIF(POOL_DATA_L_READS,0) DATA_BP_REFRESH,

33)
CAST((POOL_INDEX_L_READS-POOL_INDEX_P_READS)as decimal(24,10))/NULLIF(POOL_INDEX_L_READS,0) INDX_BP_UTIL,

34)
CAST(POOL_INDEX_P_READS as decimal(24,10))/NULLIF(POOL_INDEX_L_READS,0) INDX_BP_REFRESH,

35)
CAST((POOL_TEMP_INDEX_L_READS+POOL_TEMP_DATA_L_READS) - (POOL_TEMP_DATA_P_READS+POOL_TEMP_INDEX_P_READS) as decimal(24,10))/NULLIF(POOL_TEMP_INDEX_L_READS+POOL_TEMP_DATA_L_READS,0)
AS TOTAL_TEMP_BP_UTIL,

I hope above metrics and monitor elements help any one who reads this blog. I have kept the column alias as elaborate (to convey the meaning they carry) as possible. I might have ignored the parameters like STMT_USR_CPU_TIME_S and STMT_USR_CPU_TIME_MS but they are equally important. These snapshot elements mentioned by me will help you super tune your application, once you have already tuned the long running sqls and have indexed the database properly!!!

Would like to acknowledge the encouragements received from reader of previous blogs (the increasing no of views every time i opened this page is a boost factor) and Susan Visser (whose tweets inspired me to continue writing the blog)!!!

Will continue writing!!! db2expln is next to come!!!

db2 performance

About Me

Connect to me

How to convey performance metrics from SNAPSHOT views...

No comments:

Post a Comment