A few tips for faster Bulk Refresh of a Table


So I am back with some tips to improve insert performance... I am wondering why I did not choose this topic for my second chance at the DB2Night Show's "DB2 Has Talent"...
Well, a lot has happened in the DB2 world in the meanwhile... There was a new LUW release, and I shall soon be writing some stuff about that...
There are various scenarios in which you have to do a bulk insert, and I too had to deal with one of those... The story and the various solutions which came to me may bore you, but stick with it so that you know what you should not do, and I will share some pieces of code at the end... :)

There are obviously other tips for improving bulk inserts (or any type of insert), but those have more to do with your DB CFG and the physical design of your database... Here I will share more on how you can handle it programmatically...


So there was this very complex query which read data from 6-7 tables, each of which had around 4-5 million records... With all the filters and everything, the query returned around 6-7 million records... Obviously you cannot process this chunk of data at once on your server, and hence you paginate the data and read it only in bits... But then reading page by page means the view query is fired again and again, and the sort also happens for every new page read...
So a better way is to use either an MQT or just a table which is loaded with this data plus a row sequence number, so that you can read in bits based on the row sequence... We had used the table approach in our scenario (see the sketch below)...
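A minimal sketch of that staging-table idea; the table, view, and column names are made up for illustration, with MYSCHEMA.V_COMPLEX standing in for the complex multi-table query:

-- populate the staging table once, tagging every row with a sequence number
INSERT INTO MYSCHEMA.STG_RESULT (ROWSEQ, ACCT_ID, BALANCE)
SELECT ROW_NUMBER() OVER (ORDER BY Q.ACCT_ID), Q.ACCT_ID, Q.BALANCE
FROM MYSCHEMA.V_COMPLEX Q;

-- each pass then reads one slice, ideally via an index on ROWSEQ
SELECT ACCT_ID, BALANCE
FROM MYSCHEMA.STG_RESULT
WHERE ROWSEQ BETWEEN 1 AND 50000;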

I believe usage of an MQT is good when you do not have to access the MQT directly, and queries written over the underlying tables are automatically rewritten to use the MQT...
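For reference, a REFRESH DEFERRED MQT which the optimizer can route such queries to might look like this (the query and names are illustrative, not our actual definition):

CREATE TABLE MYSCHEMA.MQT_RESULT AS
  (SELECT ACCT_ID, SUM(AMT) AS BALANCE
   FROM MYSCHEMA.TXN
   GROUP BY ACCT_ID)
  DATA INITIALLY DEFERRED REFRESH DEFERRED
  ENABLE QUERY OPTIMIZATION;

REFRESH TABLE MYSCHEMA.MQT_RESULT;

-- deferred MQTs are considered for query rewrite only when the
-- refresh age allows it
SET CURRENT REFRESH AGE ANY;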
But the problem was that the insert of data into the temporary table was taking a huge amount of time... Also, we had indexes on a few columns which were accessed during processing... This was not an isolated case, and there were a few more such processes doing bulk deletes and bulk inserts which were slow... Moreover, these temporary tables were to be marked/updated so that one knows how the failure recovery should happen in case the process is aborted/terminated abnormally... Another problem which complicated it: whatever solution I had to give should be generic and should work on Oracle and SQL Server as well (well, that comes with my job)...

Someone suggested that one can create an MQT for the query which populates data in the temporary table (and its counterparts, a Materialized View in Oracle and an Indexed View in SQL Server)...
The problem was that updating the MQT counterparts in Oracle and SQL Server is either difficult, not possible, or requires a special license...

So the suggestion was extended to use this MQT (and the MV and IV in Oracle and SQL Server) to load data into the temporary table...
But I discarded it...
You first load 7 million records into one table (an MQT is essentially a table) and then load another table from this table... As a DBA, you are doing your best to screw yourself down the line...


So I thought: if an MQT is also maintained as a table, and DB2 internally populates the data when doing a full refresh, let us try to think how DB2 would be optimizing this process...

So I thought
1) Maybe it truncates the table instead of deleting the records...
So I created a stored proc to truncate the table but retain the storage (I could not just use the TRUNCATE statement because I had to support Oracle as well, where the syntax is different; BTW, does anyone have an idea why DB2 mandates the use of the IMMEDIATE keyword?)

CREATE OR REPLACE PROCEDURE TESTPROC
( TABLENAME VARCHAR(30), SCHEMANAME VARCHAR(15))
LANGUAGE SQL
BEGIN
     DECLARE STATEMENT1 VARCHAR(1024);--
     SET STATEMENT1='SET CURRENT_SCHEMA=' || SCHEMANAME; --
     EXECUTE IMMEDIATE STATEMENT1;--

     SET STATEMENT1='TRUNCATE TABLE ' ||
                      TRIM(SCHEMANAME) ||
                       '.' ||
                       TRIM(TABLENAME) ||
                       ' REUSE STORAGE IMMEDIATE';--
     -- TRUNCATE must be the first statement in its unit of work,
     -- hence the COMMIT right before it
     commit;--
     EXECUTE IMMEDIATE STATEMENT1;--
     commit;--
END;
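A call would then look something like this (the table and schema names are illustrative):

CALL TESTPROC('STG_RESULT', 'MYSCHEMA');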



But this only resolved the issue of deletion... Inserts were still taking a lot of time... So the second thought which came to my mind was...
2) Maybe it drops the indexes on the MQT before populating the data (a disable-index feature is a requirement which the DB2 team should start looking at)

So I modified the above stored proc...
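One note before the code: the procedure declares an array variable of type WASADMIN.VARCHAR_ARR, a user-defined array type which must already exist. I would expect it to have been created roughly like this (the length and cardinality here are my assumptions):

-- hypothetical definition of the array type used below
CREATE TYPE WASADMIN.VARCHAR_ARR AS VARCHAR(2048) ARRAY[100];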
CREATE OR REPLACE PROCEDURE TESTPROC
( TABLENAME VARCHAR(30),  SCHEMANAME VARCHAR(15), COLUMNLIST VARCHAR(1024), INSERTVAL VARCHAR(1024))
SPECIFIC TESTPROC
LANGUAGE SQL
BEGIN
  DECLARE STATEMENT1  VARCHAR(1024);--
  DECLARE STATEMENT2 VARCHAR(2048);--
  DECLARE INDEX_STMT WASADMIN.VARCHAR_ARR;--
  DECLARE CNT INTEGER;--
  
  SET STATEMENT1='SET CURRENT_SCHEMA=' || SCHEMANAME; --
  EXECUTE IMMEDIATE STATEMENT1;--
  
-- Truncate the table before loading/refreshing with new data
  set CNT=1;--
  SET STATEMENT1='TRUNCATE TABLE ' || TRIM(SCHEMANAME) ||  '.' || TRIM(TABLENAME) || ' REUSE STORAGE IMMEDIATE';--
  commit;--
  EXECUTE IMMEDIATE STATEMENT1;--

-- Get a copy of index creation statements...

  FOR CR1 AS a CURSOR
  WITH HOLD
  FOR
     -- filter on the schema too, else indexes of a same-named table in another schema would match
     (SELECT INDNAME FROM SYSCAT.INDEXES WHERE TABNAME=TABLENAME AND TABSCHEMA=TRIM(SCHEMANAME))
  DO
    SET INDEX_STMT[CNT]='CREATE INDEX ' ||
                          CR1.INDNAME ||
                            ' ON ' ||
                              TABLENAME ||
                                  '( ';--
    
    FOR CR2 AS a CURSOR WITH HOLD FOR
    (
        SELECT I.INDNAME,
        IC.COLNAME ||
            CASE WHEN  IC.COLORDER='A' THEN ' ASC ,' ELSE ' DESC ,' END AS COL
        , IC.COLSEQ  
        FROM SYSCAT.INDEXES I JOIN SYSCAT.INDEXCOLUSE IC ON I.INDNAME=IC.INDNAME
        AND I.INDSCHEMA=IC.INDSCHEMA
        AND I.INDNAME=CR1.INDNAME
        AND I.TABNAME=TABLENAME
        ORDER BY IC.COLSEQ
    )
    DO
      SET INDEX_STMT[CNT]=INDEX_STMT[CNT] || CR2.COL ;--
    END FOR;--

    set INDEX_STMT[CNT]=SUBSTR(INDEX_STMT[CNT],1,length(INDEX_STMT[CNT])-1);--

    set INDEX_STMT[CNT]=INDEX_STMT[CNT] || ')';--

    SET CNT=CNT+1;--

-- Drop the index once its definition is stored...

    set STATEMENT1='drop index ' || CR1.INDNAME;--
    EXECUTE IMMEDIATE STATEMENT1;--
    commit;--
  END FOR;--
  

-- Once the definition is captured for all indexes (and they are dropped),
-- insert the data into the table using:
--   the column list provided in the argument, and
--   the values, which can be VALUES (ValueFor_Column1, Val_ForColumn2, ... Val_forColumnN)
--   or can be a SELECT statement

  set STATEMENT2='INSERT INTO '
                      || trim(SCHEMANAME) ||
                           '.' || trim(TABLENAME) ||
                               ' ( ' || COLUMNLIST ||
                                      ' ) ' ||
                                        INSERTVAL;--
  EXECUTE IMMEDIATE STATEMENT2;--
  commit;--
  
-- Now re-create all the indexes...
  FOR CR3 as a CURSOR
    WITH HOLD
    for
      ( SELECT T.STMT FROM UNNEST(INDEX_STMT) WITH ORDINALITY AS T(STMT,ID) ORDER BY T.ID)
  DO
    EXECUTE IMMEDIATE CR3.STMT;--
  END FOR;--
END;
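With the truncate, index capture, insert, and re-create all wrapped up, a call might look like this (the column list and source query are made up for illustration):

CALL TESTPROC('STG_RESULT', 'MYSCHEMA',
              'ROWSEQ, ACCT_ID, BALANCE',
              'SELECT ROW_NUMBER() OVER (ORDER BY ACCT_ID), ACCT_ID, BALANCE FROM MYSCHEMA.V_COMPLEX');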




I could have used a file to store the index creation statements instead of a variable, but I wanted to keep it generic so that it works even when new indexes are added, and I wanted to avoid the I/O.


Now a tip for such tables, whose cardinality shoots up during a bulk insert: you can mark them as VOLATILE, which tells the optimizer to prefer index access even when the statistics say the table is small:

"ALTER TABLE TABLENAME VOLATILE"

Further performance gains can be achieved if you are ready to compromise on the recoverability of the table being loaded. In that case you can activate NOT LOGGED INITIALLY just before the insert:

"ALTER TABLE TABLENAME ACTIVATE NOT LOGGED INITIALLY"

If anyone can help me figure out what other optimizations are done during an MQT full refresh, I can try to write a piece of code for that as well...

Keep optimizing...