Service Virtualization

  • 1.  Masking running for too long

    Posted Mar 28, 2017 07:25 AM

    Hi,

    I am currently running FDM remotely to mask tables that comprise nearly 8 million rows overall. This is a trial masking run, so it is running in audit mode. Note also that FDM, the database, and the point from which FDM is triggered (i.e. the FDM batch files) are all located on different servers. The process has been running for the past 2 days and has not completed yet. Please let me know what can be done to shorten the run time.

     

    Regards,

    Ekta



  • 2.  Re: Masking running for too long

    Broadcom Employee
    Posted Mar 29, 2017 07:37 PM

    Hi Ekta,

    One is to make sure that you have indexes on your database file and you are using them.  If not, then you are doing complete table scans for every record processed.

    Are the servers in the same Data Center or separated by miles?  I could not determine that by them simply being on different servers.  If miles, then latency is the issue.

    Anything in the logs?

    What are the server resources running at?

    Could you simply try a small run and see how long 1000 rows take?

    Cheers!
    Les.



  • 3.  Re: Masking running for too long

    Posted Mar 31, 2017 07:36 AM

    Hi Les,

    The database has indexes built and, yes, the servers are separated by miles (NAM and APAC regions).

    For 1000 rows the time for completion was approx. 1 hr. in audit mode.

     

    When I ran the process with the parameters below in the options file, the full volume (8 million rows) completed in approximately the same time, i.e. 1 hr (no audit). That is a drastic difference from the previous masking run, which kept running for 2 days.

     

    BACKUPDIR=I:\Fastdatamasker\backups
    DBUPDATES=N
    ERRORDIR=I:\Fastdatamasker\errorlogs
    LOGDIR=I:\Fastdatamasker\logs

     

    Could you please help me understand whether this major difference in time is because writing the actual and masked values to the audit file was excluded?

    And with DBUPDATES=Y, would it take around the same time as with DBUPDATES=N, or will the updates to the database take a major chunk of the time?

     

    Regards,

    Ekta



  • 4.  Re: Masking running for too long

    Posted Apr 04, 2017 01:28 AM

    Hi Les,

    Any updates on the above question?

    Regards,

    Ekta



  • 5.  Re: Masking running for too long
    Best Answer

    Broadcom Employee
    Posted Apr 06, 2017 10:56 AM

    Hi Ekta,

     

    I apologize for my late reply, as I am in a training class.  However, I do have an answer.  The ultimate reason for the vast difference in time is latency: each fetch from the database costs some number of milliseconds, that cost is paid for every fetch (8 million of them), and then the Audit of that data has to be written on a distant server.  You could ask your Network Admins what the latency is.  A simple ping will give you a rough idea.
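
    As a rough illustration of the arithmetic above, here is a minimal Python sketch that multiplies the number of fetches by the round-trip time. The function name, the 200 ms figure, and the `rows_per_fetch` parameter are assumptions made for illustration only; they are not FDM settings or measured values, and the real fetch pattern of the tool may differ.

```python
import math


def estimated_fetch_hours(rows: int, round_trip_ms: float, rows_per_fetch: int = 1) -> float:
    """Estimate the hours spent purely waiting on network round trips.

    Assumption: every fetch costs one full round trip and nothing overlaps.
    """
    if rows_per_fetch < 1:
        raise ValueError("rows_per_fetch must be at least 1")
    fetches = math.ceil(rows / rows_per_fetch)
    return fetches * round_trip_ms / 1000 / 3600


if __name__ == "__main__":
    # 200 ms is an assumed round-trip time; replace it with your own ping result.
    for batch in (1, 1000):
        hours = estimated_fetch_hours(8_000_000, 200, rows_per_fetch=batch)
        print(f"rows per fetch = {batch}: about {hours:.1f} hours of waiting")
```

    Plugging in the round-trip time reported by ping shows quickly whether latency alone could account for a multi-day run.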

     

    If this was all with DBUPDATES=N, then the time it took is attributable strictly to the exhaustive writing of the Audit file, which can be 8 to 10 times as much data as the actual database.  When we are doing an audit, we are doing a substantial amount of I/O.  As a result, the distance comes into play.  By the way, it is best practice to turn the Audit on during the initial Debug phase of the project to make sure that the masking and data generation are working, and then turn it off during actual production runs.  We have heard from many Client Auditors that they see the Audit File as a security breach when it is run against production or actual Client Data.

     

    Now, with DBUPDATES=Y, the latency between the databases (assuming your source is, say, APAC and your target is NAM) will come into play if the data has to be read from one distant database to be written into a near one.  Once again, this can be tested with a small run of 1000 rows.  This test masking would be discarded when you do your final run, but it will give you foreknowledge of what to expect from the production run.
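
    One way to turn such a 1000-row test into an expectation for the full run is a simple linear extrapolation. The sketch below is illustrative only: the function name and the `fixed_overhead_seconds` parameter (start-up cost that does not grow with row count) are assumptions, and a real run may not scale linearly.

```python
def extrapolate_hours(sample_rows: int, sample_seconds: float, total_rows: int,
                      fixed_overhead_seconds: float = 0.0) -> float:
    """Scale a timed sample run up to the full row count, in hours.

    Assumption: run time = fixed overhead + (constant per-row cost * rows).
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    if not 0 <= fixed_overhead_seconds <= sample_seconds:
        raise ValueError("fixed overhead must be between 0 and the sample time")
    per_row_seconds = (sample_seconds - fixed_overhead_seconds) / sample_rows
    return (fixed_overhead_seconds + per_row_seconds * total_rows) / 3600


if __name__ == "__main__":
    # Sample figures are placeholders; substitute your own timed test run.
    hours = extrapolate_hours(sample_rows=1000, sample_seconds=3600, total_rows=8_000_000)
    print(f"Projected full run: about {hours:.0f} hours")
```

    If the projection is far larger than what the full run actually takes, the sample was probably dominated by fixed start-up cost, and timing two samples of different sizes gives a better per-row estimate.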

     

    Okay, so let's assume a 2-day run time.  Is this acceptable?  Is there Risk involved in the span of time during which the masked data is being created?  Is my production data vulnerable to exposure during this time?  Can I decrease that time?  These are the questions that I would be looking at if I were doing this run.  The next questions would be: can I mitigate the exposure?  And if so, how?

     

    One would hope that the transmission between your servers is secure, but that is in someone else's hands.  However, 2 days is a long time.  Is my data run exposed to complete failure due to unscheduled outages, and what will it take to recover?  If the failure occurs at 1 day, 23 hours, and 30 minutes, and we have no way to run just the remaining 30 minutes and must redo the entire run, that would be costly.  Have you explored the restartability of your data run?  See the "Specify the Restartability Option" topic in the documentation.
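
    To make the idea of restartability concrete, here is a generic Python sketch of checkpointed batch processing. It is not how FDM implements its restartability option; the function, the checkpoint file format, and the `process_batch` callback are all hypothetical and only illustrate the general technique of resuming from the last completed batch.

```python
import json
import os
from pathlib import Path
from typing import Callable


def run_in_batches(total_rows: int, batch_size: int,
                   process_batch: Callable[[int, int], None],
                   checkpoint_path: str) -> int:
    """Process rows [0, total_rows) in batches, resuming from a checkpoint file.

    Assumption: process_batch(start, end) is safe to repeat for a batch that
    was interrupted before its checkpoint was written.
    """
    path = Path(checkpoint_path)
    # Resume from the last recorded position, or start from row 0.
    next_row = json.loads(path.read_text())["next_row"] if path.exists() else 0
    for start in range(next_row, total_rows, batch_size):
        end = min(start + batch_size, total_rows)
        process_batch(start, end)
        # Write to a temporary file, then rename it over the checkpoint,
        # so a crash never leaves a half-written checkpoint behind.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"next_row": end}))
        os.replace(tmp, path)
        next_row = end
    return next_row
```

    With this pattern, an outage late in the run costs at most one batch of rework rather than the whole run.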

     

    Can I shorten the time if it is 2 days?  If the ultimate issue is latency, then I would lean strongly toward writing the data to the database where the data resides and then importing it to the "distant" server.  This may not be an option, but it is one that will dramatically reduce both the time to complete the process and the Risk.  Since the fastest way to complete any computer process is to do the work in the machine's memory and keep disk and network I/O to an absolute minimum, this is where you have to look to save time.

     

    I hope this clarifies your questions.

     

    Cheers!
    Les



  • 6.  Re: Masking running for too long

    Posted Apr 11, 2017 01:26 AM

    Thanks a lot, Les. I will consider these risk factors and look to incorporate a workaround to mitigate the latency issues.

     

    Regards,

    Ekta