Test Data Manager

  • 1.  FDM and cluster index in SQL Server

    Posted Mar 22, 2017 08:10 AM

    Hello,

     

    We have to mask a field that has a Cluster Index on it and is there any other solution beside dropping the Cluster Index in SQL Server.

     

    Since this type of Index you can't de-activate we will have to drop it and recreate it. This will be time consuming.

     

    We might also need to add a dummy sequence field with a cluster index to improve the masking time.

     

    Does anybody ever had this issue and find a work around?

     

    Thanks

     



  • 2.  Re: FDM and cluster index in SQL Server
    Best Answer

    Broadcom Employee
    Posted Mar 22, 2017 01:33 PM

    Hi Francois,

    SQL server has its own challenges with regards to indexing that are overcome in other databases.  For instance, this is much easier in Oracle, as FDM uses in the built rowid for its updates automatically for tables with no PK/UK.  Similarly it will use RRN for DB2.  However, in SQL, to accomplish this, you have to add in a Unique Column and use a Sequential number (i.e. a row number) and then index on that newly created column. 

    I consulted our pool of SMEs and this is the response I received from one of the original Grid Tools Developers and his response is directed to using FDM:

    *******************************************************************

    The issue is that some of the masked values may already exist in the database, and if this is the case then the mask will throw a database constraint error and not work.

    So for a really simplistic example, if I have a table of 10 rows with ID values from 1 to 10, and mask with a sequence starting from 5, the first 5 sequence values would cause constraint errors.  In this case , much easier to drop the constraint, do the mask so we have new values from 5 to 15 , then recreate the constraint.

    The issue with the mask,  is having dropped the PK,  the table potentially has no defined uniqueness, which FDM needs for its update statements.

    You can add the ID column into the unique columns mask csv to mitigate against this, but the updates will be slow as there is no index for the database to work with, and so each update statement will perform a full table scan.

    In this instance it’s better to add a NON UNIQUE index for column ID, do the mask, then drop the index. (In addition to dropping the PK and recreating it later)

    In short we need an enhancement to FDM to provide all the necessary pre and post scripts automatically.

    The only other alternative is to have a function that will guarantee not to create ANY of the existing values (not easy to do), the most obvious would be to use a sequence that starts from the high water mark of the existing ID (in this example a sequence starting at 11) – but this becomes a lot more tricky if the column(s) that make up the constraint are character or date columns.

    Hope this makes sense.

    *******************************************************************

    The best practice is to drop the Clustered Index, add a NON UNIQUE index for a column ID, adding the column(s) that make up uniqueness into the unique columns section in the masking CSV, mask, drop the Non-Unique Index, and re-add the Clustered Index. 

    To add clarity (if it is needed) to the "adding the column(s) that make up uniqueness into the unique columns section in the masking CSV" section, you have to add unique columns into the masking csv (so FDM knows how to do its updates). You don’t have to use a sequential number, just a function that guarantees uniqueness ,so FORMATENCRYPT could be used as well.

    SQL adds a layer of complication that the other mentioned databases avoid by not needing to add columns to unique columns in the mask or create a non-unique index.

    Cheers! Les



  • 3.  Re: FDM and cluster index in SQL Server

    Posted Mar 23, 2017 08:35 AM

    Thanks for the response Les.