Test Data Manager

  • 1.  Bulk Data Generation (1-2 million rows per file)

    Posted May 02, 2018 08:30 AM

    Hi

    I am working on a POC where I need to generate a member eligibility file with a volume close to 1-2 million rows for performance testing. I have completed the data definition setup and generated a 1k-row file. Please find the log details below.

     

    Data generation log:

    Total columns: 220
    Data generation rules applied: 10-15 columns
    File generation: 1000 rows
    Time taken: 424 seconds

     

    Is there any way I can improve this generation time?



  • 2.  Re: Bulk Data Generation (1-2 million rows per file)

    Posted May 02, 2018 10:44 AM

    Could you provide some more details?

    Are you generating in Datamaker or Portal?

    Are all 220 columns in a single table?

    What do your data generation rules look like?

    Where is your repo DB located in relation to your DM/Portal server? (ping times, hops)

    Are you publishing to a connection profile? If so, where is your target DB in relation to your DM/Portal server? (ping times, hops)

     

    Does the GT Server meet the minimum server requirements as noted in our documentation:

    System Requirements - CA Test Data Manager - 4.5 - CA Technologies Documentation



  • 3.  Re: Bulk Data Generation (1-2 million rows per file)

    Posted May 04, 2018 03:51 AM

    Hi

    PFB comments inline:

    Are you generating in Datamaker or Portal? Datamaker

    Are all 220 columns in a single table? It's a flat file comprising 220-odd columns.

    What do your data generation rules look like? Unique number generation; seed lists for first name, last name, DOB+1; seed lists for address, city, state, phone number, and SSN.

    Where is your repo DB located in relation to your DM/Portal server? (ping times, hops) The repo DB and Datamaker are located on the same server.

    Are you publishing to a connection profile? If so, where is your target DB in relation to your DM/Portal server? (ping times, hops) This is file generation. I have registered the file layout in the repository, defined the generation rules, and published the FD file from the repository.

     

    Let me know if you require any further details.

     



  • 4.  Re: Bulk Data Generation (1-2 million rows per file)
    Best Answer

    Posted May 07, 2018 07:28 PM

    Madhava,

     

    I ran some tests on a smaller scale...

     

    1 FD file, with 5 columns of generated data and 1 column of fixed data.

     

    1,000 rows = 54 seconds

    10,000 rows = 7m:19s (439 seconds)

     

    Based on those results, Datamaker estimated that 100,000 rows would take 1h:13m:10s.
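    As a side note, that estimate works out to exactly ten times the 10,000-row timing, i.e. Datamaker appears to extrapolate linearly. A quick sketch using the numbers above:

```python
def hms(seconds: int) -> str:
    """Format a duration in seconds as Hh:MMm:SSs."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h:{m:02d}m:{s:02d}s"

t_10k = 439                 # measured: 10,000 rows in 7m:19s
estimate_100k = 10 * t_10k  # linear extrapolation to 100,000 rows
print(hms(estimate_100k))   # -> 1h:13m:10s, matching the estimate
```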

     

    I have a 300k row job running now that I will check on in the morning.

     

    All jobs as noted above caused the gtdatamaker.exe process to use about 25% CPU (on a quad-core system). So that's 100% of a single CPU.

     

    You have 2-3 times the amount of generated data and about 36 times the total data per row. Your job is ultimately taking nearly 8 times longer (using the 1,000-row runs to compare). Obviously we have some differences in the amount of data being generated per row, which would explain some of the performance difference. As this type of job is CPU intensive, hardware could also be a factor. My job was run in a test VM on a slightly older ESX server with a Xeon X5670 CPU @ 2.93 GHz. Without knowing the hardware differences, I would say that my tests and your results line up fairly well - especially considering that my data generation rules are very simple (generally just one function pulling from a seed list). Examples:

    @randlov(0,@seedlist(Credit Card)@)@
    @randlov(0,@list(MR,MS,MRS,DR)@)@
    @randlov(0,@seedlist(FirstName)@)@
    @randlov(0,@seedlist(LastName)@)@
    @string(@randdate(1900-01-01,2000-01-01)@, YYYYMMDD)@
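    For anyone wanting to reason about these rules outside Datamaker, here is a rough Python equivalent of the seed-list style above. The seed lists are made-up stand-ins, not TDM's actual repository lists:

```python
import random
from datetime import date, timedelta

# Stand-in seed lists - in TDM these would come from the repository.
TITLES = ["MR", "MS", "MRS", "DR"]       # @list(MR,MS,MRS,DR)@
FIRST_NAMES = ["ALICE", "BOB", "CAROL"]  # @seedlist(FirstName)@
LAST_NAMES = ["SMITH", "JONES", "LEE"]   # @seedlist(LastName)@

def rand_date(start: date, end: date) -> str:
    """Mimic @randdate(...)@ formatted as YYYYMMDD."""
    d = start + timedelta(days=random.randrange((end - start).days))
    return d.strftime("%Y%m%d")

def make_row() -> list[str]:
    """One generated row, one field per rule above."""
    return [
        random.choice(TITLES),
        random.choice(FIRST_NAMES),
        random.choice(LAST_NAMES),
        rand_date(date(1900, 1, 1), date(2000, 1, 1)),
    ]

rows = [make_row() for _ in range(1000)]
```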

     

    With that said, even though this is a fixed-width file that you're attempting to generate, you may want to attempt an "enterprise publish" and have the file generated via TDM Portal. To do so, you just need to configure the source/target DB connections as a DSN-less ODBC connection - you can set up a new connection profile for this to your local SQL Server. Once that's done, the "Enterprise Mode" option will be enabled. Using this method on the same VM (and same CPU, of course) as the tests above, I published 300,000 rows in under 4 minutes.

     

    The java.exe (Portal) process used upwards of 3 GB of memory for a time and spiked above 25% CPU occasionally, but mostly stayed in that 25% CPU range for the duration of the job. This was also run at the same time as the other 300,000-row Datamaker job, so I suspect that if it were the only job running, it would be slightly faster.
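    To put the two publish paths side by side (taking "under 4 minutes" as 240 seconds, so the speedup figure is a lower bound):

```python
# Rows per second from the figures above.
datamaker_rate = 1_000 / 54    # ~18.5 rows/s (Datamaker, 1,000-row run)
portal_rate = 300_000 / 240    # 1,250 rows/s (Portal enterprise publish)
speedup = portal_rate / datamaker_rate
print(round(speedup, 1))       # roughly 67.5x faster
```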

     

    There are some limitations in Portal publishing files. Please refer to the documentation accordingly:

    Publish Data Using Datamaker - CA Test Data Manager - 4.5 - CA Technologies Documentation 

    Publish Data Using the CA TDM Portal - CA Test Data Manager - 4.5 - CA Technologies Documentation 

     

    Hope this helps...