db2 performance: MERGE: MAKE Your UPSERTs quick!!!

Hi guys... I am back... this time as an Advanced DB2 9.7 DBA... Thanks to Sathy for helping me prepare for the exam and Beth Flood for giving me a free voucher (otherwise I would never have had the guts to put 5k at stake on my capabilities)... But very soon an all-new DB2 is expected; I hope it will have a hell of a lot of new features, and I will have to upgrade or take the certification exams from scratch...

Well, recently I have been working with some scenarios where people came to me with an update statement which was running very slowly... When I looked at it, I said "I need to tell people there is something like MERGE"... There was one thing in common: all the statements had correlated sub-queries for the SET value... I replaced their queries with a MERGE and it worked wonders...

Let's see some examples...

UPDATE Table1
SET T1Col1=(select T2Col1 from Table2 where Table1.T1IDPK=T2IDPK);

So the problem with this query is that the inner query will be executed for every row of Table1...

So I wrote them a MERGE query, which first builds a HASH JOIN and then does a RID-based update:

MERGE INTO Table1
USING Table2 on (Table1.T1IDPK=Table2.T2IDPK)
WHEN MATCHED THEN UPDATE
SET Table1.T1Col1=Table2.T2Col1;
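
One thing worth checking before swapping the two statements: they are only equivalent when every row of Table1 has a partner in Table2. The correlated UPDATE touches every row of Table1 and sets T1Col1 to NULL wherever the sub-query finds nothing, while the MERGE only touches matched rows. A minimal sketch of an UPDATE that behaves like the MERGE, assuming T2IDPK is unique in Table2 (both forms rely on that):

```sql
-- Sketch: restrict the correlated UPDATE to rows that have a match in Table2,
-- so unmatched rows keep their current T1Col1 (the same effect as the MERGE).
UPDATE Table1
SET T1Col1 = (SELECT T2Col1 FROM Table2 WHERE Table1.T1IDPK = Table2.T2IDPK)
WHERE EXISTS (SELECT 1 FROM Table2 WHERE Table1.T1IDPK = Table2.T2IDPK);
```

This still runs the sub-query per row, so it is shown only to make the difference in results visible, not as a faster alternative.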

Another scenario was a scheduled process that picks up rows from a table (which has designated page numbers/row numbers) in batches; after processing a certain number of rows in every batch, the scheduled process sleeps... The remaining rows then have to be renumbered and partitioned into batches again...

UPDATE PagedTab t1
set SEQNUM=(select rnum from (select PagedTab_IDPKColumn, row_number() over(order by PagedTab_IDPKColumn) as rnum from PagedTab) t2 where t1.PagedTab_IDPKColumn=t2.PagedTab_IDPKColumn);

So I just changed it to below:

MERGE INTO PagedTab t1
USING (select PagedTab_IDPKColumn,row_number() over(order by PagedTab_IDPKColumn) as rnum from PagedTab ) t2
on t1.PagedTab_IDPKColumn=t2.PagedTab_IDPKColumn
WHEN MATCHED THEN UPDATE
SET t1.SEQNUM=t2.rnum;

I have simulated the scenario below to make it easier to understand, as the actual one was more complex...
The third case was a Master-Detail (dependent entity) kind of relationship, where some details were stored in the master table and for every IDPK in the master table one and only one record existed in the detail table... ACCOUNTID, LASTACCESSDATE and STATUS were some of the details... Someone wanted to first find the LATEST record (based on date) in the detail table for every ACCOUNTID, then find the primary key of that record, and then copy whatever status the detail table holds for that PK to the master... So here goes the query people will write in their first attempt...

UPDATE MasterTable outTab
SET STATUS=
(select STATUS from DetailsTable t2 join
(select ACCOUNTID, MAX(LASTACCESSDATE) DT from DetailsTable group by ACCOUNTID)t1
on t1.ACCOUNTID=t2.ACCOUNTID and t2.LASTACCESSDATE=t1.DT and t2.IDPK=outTab.IDPK
);

Actually the query as originally written used one more IN clause, which I have converted to a JOIN to avoid more confusion...

I modified it...

MERGE into MasterTable t1
USING
(select IDPK, DENSE_RANK() over(PARTITION BY ACCOUNTID order by LASTACCESSDATE DESC) rn, STATUS from DetailsTable) t2
ON (t1.IDPK=t2.IDPK and rn=1)
WHEN MATCHED THEN UPDATE
SET STATUS=t2.STATUS;

And this avoided the join and sort which were being done for every row... It did a single sort, a hash join and an update based on RID... Wow!!!
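
To check that plan on your own system, a minimal sketch follows. It assumes the explain tables already exist in your schema, and it assumes DESC ordering so that rn=1 is the latest record per ACCOUNTID. EXPLAIN PLAN FOR only captures the access plan; it does not run the MERGE.

```sql
-- Capture the access plan for the MERGE without executing it.
-- Assumption: explain tables have already been created for the current schema.
EXPLAIN PLAN FOR
MERGE INTO MasterTable t1
USING
 (SELECT IDPK,
         DENSE_RANK() OVER (PARTITION BY ACCOUNTID ORDER BY LASTACCESSDATE DESC) rn,
         STATUS
    FROM DetailsTable) t2
ON (t1.IDPK = t2.IDPK AND t2.rn = 1)
WHEN MATCHED THEN UPDATE
SET STATUS = t2.STATUS;
```

The captured plan can then be formatted with db2exfmt and inspected for the join method and the number of sorts; what the optimizer actually picks depends on your statistics and data volumes.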

One final instance where I saw something like this:

UPDATE TableX X
SET
Col1=(select Col1 from TableY Y where X.IDPK=Y.IDPK),
Col2=(select Col2 from TableY Y where X.IDPK=Y.IDPK),
Col3=(select Col3 from TableY Y where X.IDPK=Y.IDPK),
Col4=(select Col4 from TableY Y where X.IDPK=Y.IDPK)
;

and then

INSERT INTO TableX (COl1, COl2, COl3, COl4)
Select COl1, COl2, COl3, Col4 from TableY Y
where not exists (select 1 from TableX X where X.IDPK=Y.IDPK);

and finally...

DELETE FROM TableX X where not exists (select 1 from TableY Y where X.IDPK=Y.IDPK);

I will leave it up to the readers to interpret the purpose of these statements... I will just give an alternate query...

MERGE INTO TableX X
USING TableY Y on (X.IDPK=Y.IDPK)
WHEN MATCHED then UPDATE
SET
X.COl1=Y.COl1,
X.COl2=Y.COl2,
X.COl3=Y.COl3,
X.COl4=Y.COl4
WHEN NOT MATCHED THEN INSERT(COl1, COl2, COl3, COl4)
VALUES (Y.COl1, Y.COl2, Y.COl3, Y.COl4)
;

This combines the UPDATE and INSERT statements and is much more efficient, because the hash join is executed once, as opposed to executing 4 correlated sub-queries for every row in TableX for the UPDATE and then performing an anti-join for the INSERT...

The delete cannot be optimized or combined into the MERGE, which is a little disappointing, and I hope to see this feature in DB2 very soon... A "WHEN NOT MATCHED BY SOURCE" clause exists in SQL Server, using which I could have just said:

MERGE INTO TableX X
USING TableY Y on (X.IDPK=Y.IDPK)
WHEN MATCHED then UPDATE
SET
X.COl1=Y.COl1,
X.COl2=Y.COl2,
X.COl3=Y.COl3,
X.COl4=Y.COl4
WHEN NOT MATCHED THEN INSERT(COl1, COl2, COl3, COl4)
VALUES (Y.COl1, Y.COl2, Y.COl3, Y.COl4)
WHEN NOT MATCHED BY SOURCE THEN DELETE
;

Well, this post has become too bloated and I do not want to bore people... You can have multiple WHEN MATCHED and WHEN NOT MATCHED clauses, combine each with another condition (AND col1=col2) and have a different action for every branch; hence you write one single statement instead of writing one update for every condition... I will try to put up some examples sometime...
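
A rough sketch of what such a multi-branch statement could look like, reusing TableX and TableY from above. The 'D' flag in Col4, the conditions themselves, and the inclusion of IDPK in the INSERT are assumptions made purely for illustration; Col4 is assumed to be a character column.

```sql
-- Sketch only: one MERGE with a different action per branch.
-- Branches are checked in order and each target row is acted on at most once.
MERGE INTO TableX X
USING TableY Y ON (X.IDPK = Y.IDPK)
-- Hypothetical rule: source rows flagged 'D' remove the matching target row.
WHEN MATCHED AND Y.Col4 = 'D' THEN
    DELETE
-- Only rewrite rows whose Col1 actually changed.
WHEN MATCHED AND X.Col1 <> Y.Col1 THEN
    UPDATE SET X.Col1 = Y.Col1, X.Col2 = Y.Col2
-- Insert new rows, but skip source rows already flagged for deletion.
WHEN NOT MATCHED AND Y.Col4 <> 'D' THEN
    INSERT (IDPK, Col1, Col2, Col3, Col4)
    VALUES (Y.IDPK, Y.Col1, Y.Col2, Y.Col3, Y.Col4)
-- Everything else is left untouched.
ELSE IGNORE;
```

Note that the DELETE branch here only removes target rows that do have a match in the source, so it is not a substitute for SQL Server's WHEN NOT MATCHED BY SOURCE.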

BTW, thanks to all my readers... I feel lucky and honored to be listed in "My Favorite DB2 Blogs" by Troy Coleman @ db2tutor

If you are reading this, Mr. Coleman, trust me, db2tutor and db2geek are my first reference points for any query... :) and thanks for putting me on the list...

12 comments:

Ayub, January 2, 2012 at 8:30 AM
How can I delete the not matched record. After the update and Insert I have written like this

DELETE where id = 0 ;

I am getting this error:

An unexpected token "DELETE where id = 0" was found
following "l, null, null, null)". Expected tokens may include: "".
LINE NUMBER=301. SQLSTATE=42601

Sameer, January 3, 2012 at 8:23 AM
Hey, what do you intend to do?? Do you want to delete the records which are not matched with any record in SOURCE (SOURCE is the table/resultset one mentions in the USING clause)? That facility is not there in DB2. :(
SQL Server allows you to do that.

It would be a nice-to-have feature in DB2.

Pratik, August 20, 2012 at 9:12 PM
Hi ,

I'm trying to execute a merge as a prepared statement from java.
It is a simple merge wherein a record is inserted if an id is not present or the same is updated.
But when I run the application the merge executes fine , that is without any errors but I don't see any insert or update in the concerned tables.

Would you be able to help please?

Pratik, August 22, 2012 at 12:37 PM
Hi ,

Thanks so much for your reply. That helped.
I was also trying something as follows

MERGE INTO TABLE USING (DATA)
ON (CONDITION)
WHEN MATCHED THEN
SELECT * FROM FINAL TABLE (INSERT Stmt)
WHEN NOT MATCHED THEN
SELECT * FROM FINAL TABLE (UPDATE Stmt)

And this didn't work. Found a workaround.
But would like to know what I'm doing wrong.
Could you please advise?

Pratik, August 29, 2012 at 9:45 AM
Hi Sam,
Thanks for your reply. Actually I gave it a thought. It seems I had gotten the whole concept of merge wrong. But now, after trying out different options with merge, I'm a bit more comfortable with the ways to use it.

Anyway, are there other means to contact you? Many times I'm stuck with some db2-related issue and just can't find a way out, and could really use a DBA opinion in these situations.

Thanks
Pratik

Sameer, August 29, 2012 at 9:56 AM
And could really use a DBA opinion in these situations.

I have always advocated involving a DBA right from requirement phase... What you can get from forums and blogs can never be as good as having a dba involved in your development... :P

But if you are stuck somewhere you can ping me on my mailID sameer.kasi200x@gmail.com or you can ping me on twitter...

December 30, 2013 at 9:11 AM
Hi Sam,

I have a query with just SELECT fields from Table where condition.

Is there a way to optimize this query?

I can't use the FETCH FIRST n ROWS option as the number of records cannot be determined.

Anonymous, October 16, 2014 at 8:52 PM
Hi
Is there a way where I can use a commit count when using a merge statement. I have to update/insert around 40 lakhs records per day into an archive table.

Sameer, October 17, 2014 at 9:20 AM
You can use paginated queries.

Or you can try using partitioned tables. It will become easier to move them around from live to archive tables!
