Hi Ekta,
I apologize for the late reply; I am in a training class. However, I do have an answer. The ultimate reason for the vast difference in run time is latency: every fetch from the database pays a network round trip, so the total is roughly the number of fetches (8 million) times the per-fetch latency in milliseconds, plus the time spent writing the audit of that data to a distant server. You could ask your network admins what the latency is; a simple ping will give you a rough idea.
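To make that concrete, here is a quick back-of-envelope sketch in Python. The latency value is an illustrative placeholder, not your measured number; substitute whatever your ping reports.

    # Rough estimate of the time spent in network round trips alone.
    # The latency value is a placeholder; use your measured ping time.
    rows = 8_000_000        # one fetch per row
    latency_ms = 2.0        # illustrative round-trip latency
    total_hours = rows * latency_ms / 1000 / 3600
    print(f"~{total_hours:.1f} hours in network waits alone")  # ~4.4 hours

At 2 ms that is about 4.4 hours before any masking or auditing happens; at a 20 ms cross-region round trip it is closer to 44 hours, which is exactly how a run stretches toward two days.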
If this was all run with DBUPDATES=N, then the time it took is strictly attributable to the exhaustive writing of the Audit file, which can hold 8 to 10 times as much data as the actual database. When we are doing an audit, we are doing a substantial amount of I/O, so the distance comes into play. By the way, it is best practice to turn the Audit on during the initial debug phase of the project to make sure that the masking and data generation are working, and then turn it off during actual production runs. We have heard from many client auditors that they see the Audit file as a security breach when it is run against production or actual client data.
Now, with DBUPDATES=Y, the latency from the distance between the databases (assuming your source is in, say, APAC and your target is in NAM) will come into play if the data has to be read from a distant database to be written into a near one. Once again, this can be tested with a small run of 1,000 rows. That test masking would be discarded before your final run, but it will give you advance knowledge of what to expect from the production run.
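Here is a minimal sketch of how I would extrapolate from such a trial. The run_masking function is a stand-in for whatever kicks off your 1,000-row test, and straight-line extrapolation is optimistic, so treat the result as a floor, not a promise.

    import time

    TRIAL_ROWS = 1_000
    PRODUCTION_ROWS = 8_000_000

    def run_masking(rows):
        # Placeholder: invoke your actual trial masking run here.
        pass

    start = time.perf_counter()
    run_masking(TRIAL_ROWS)
    elapsed = time.perf_counter() - start

    # Linear scale-up from the trial to the full row count.
    estimate_hours = elapsed * (PRODUCTION_ROWS / TRIAL_ROWS) / 3600
    print(f"Trial took {elapsed:.1f}s; full run estimate: ~{estimate_hours:.1f} hours")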
Okay, so let's assume a two-day run time. Is that acceptable? Is there risk in the span of time during which the masked data is being created? Is my production data vulnerable to exposure during this time? Can I decrease that time? These are the questions I would be looking at if I were doing this run. The next questions would be: can I mitigate the exposure, and if so, how?
One would hope that the transmission between your servers is secure, so that part is in someone else's hands. However, two days is a long time. Is my run exposed to complete failure from an unscheduled outage, and what will it take to recover? If the failure occurs at 1 day, 23 hours, and 30 minutes, and we have no way to redo just that last 30 minutes, we have to redo the entire run, and that would be costly. Have you explored the restartability of your run? If your tooling offers a restartability option, specify it.
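I do not know which restart facility your tool exposes, so the following is only a generic illustration of the checkpoint idea (the file name, batch size, and mask_batch helper are all hypothetical): persist the last committed key so a failed run resumes where it stopped instead of at row one.

    import os

    CHECKPOINT_FILE = "mask_run.checkpoint"   # hypothetical location
    BATCH_SIZE = 10_000
    TOTAL_ROWS = 8_000_000

    def load_checkpoint():
        # Resume point: the highest key already committed, or 0 on a fresh run.
        if os.path.exists(CHECKPOINT_FILE):
            with open(CHECKPOINT_FILE) as f:
                return int(f.read().strip())
        return 0

    def save_checkpoint(last_key):
        with open(CHECKPOINT_FILE, "w") as f:
            f.write(str(last_key))

    def mask_batch(start_key, size):
        # Placeholder for fetching, masking, and committing one batch;
        # returns the highest key processed.
        return start_key + size

    last_key = load_checkpoint()
    while last_key < TOTAL_ROWS:
        last_key = mask_batch(last_key, BATCH_SIZE)
        save_checkpoint(last_key)  # after a crash, the next run resumes here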
Can I shorten the time if it is two days? If the ultimate issue is latency, then I would lean strongly toward writing the masked data on the database server where the source data resides and then importing it to the "distant" server. This may not be an option for you, but it is one that will dramatically reduce both the time to completion and the risk. The fastest way to complete any computing process is to keep it in the machine's memory and hold disk and network I/O to an absolute minimum, and that is where you have to look to save time.
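For a sense of why the local-then-import approach wins, compare the round-trip cost against a single bulk transfer. Every number below is an illustrative assumption; row size, latency, and line speed will differ in your environment.

    # Row-by-row writes over the WAN pay the round trip 8 million times;
    # a local run plus one bulk export/import pays it roughly once.
    rows = 8_000_000
    wan_latency_ms = 20.0     # assumed cross-region round trip
    row_bytes = 500           # assumed average row size
    mb_per_s = 50.0           # assumed sustained bulk-transfer rate

    remote_hours = rows * wan_latency_ms / 1000 / 3600
    bulk_seconds = rows * row_bytes / (mb_per_s * 1_000_000)

    print(f"Row-by-row over the WAN: ~{remote_hours:.0f} hours")    # ~44
    print(f"One bulk copy of ~4 GB:  ~{bulk_seconds:.0f} seconds")  # ~80

The masking itself still takes its local run time, of course; the point is that the network cost collapses from hours to minutes.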
I hope this answers your questions.
Cheers!
Les