gilta03

Tech Tip: Publishing Performance Example

Discussion created by gilta03 Champion on Dec 4, 2017

This will also be continually updated in our TDM 4.4+ documentation. 

 

The development team performed testing to show performance benchmarks for publishing to CSV and XLSX files. You can use these benchmarks to help tune the performance of your system.

 

  • Environment Details
    • Machine specs
    • Table used for the performance publish
  • Test 1 - Hard-coded Data
    • Publish to CSV
    • Publish to XLSX
  • Test 2 - One Expression
    • Publish to CSV
    • Publish to XLSX
    • SQL Server Target publish (default config)
    • SQL Server Target publish (iterationsBeforeCommit=20000)
    • SQL Server Target publish (iterationsBeforeCommit=50000)
  • Test 3 - With Expressions
    • Table DDL Used
    • Expressions Used
    • Publish to CSV

 

Environment Details

The following environment was set up to gather performance data for publishing to CSV and XLSX files.

Machine specs

The machine has 16 GB physical memory, 4 vCPU, and runs Microsoft Windows Server 2012 R2 DataCenter. The repository was on a local SQL Server 2016.

Table used for the performance publish

CREATE TABLE [dbo].[CARD_ACCOUNT](
    [CARD_ID] [int] NOT NULL,
    [CARD_BA_ID] [int] NOT NULL,
    [CARD_RA_ID] [int] NOT NULL,
    [CARD_BCH_ID] [int] NULL,
    [CARD_ICH_ID] [int] NULL,
    [CARD_NO] [varchar](16) NOT NULL,
    [CARD_EXP_DATE] [varchar](4) NOT NULL,
    [CARD_VALID_DATE] [varchar](4) NOT NULL,
    [CARD_NAME] [varchar](30) NOT NULL,
    [CARD_CVV] [decimal](4, 0) NOT NULL,
    [CARD_PRI] [varchar](3) NULL,
    [CARD_SUP] [varchar](3) NULL,
    [CARD_ADD] [varchar](3) NULL
)

Test 1 - Hard-coded Data

No expressions were used. All data were hard-coded in the generator.

Publish to CSV

repeater
performance based on number of rows
10,0000.5 seconds
100,0001.2 seconds
1,000,0007.6 seconds
10,000,0001 min 08 seconds
100,000,00011 min 48 seconds

Publish to XLSX

Publish to XLSX is memory intensive compared to publish to CSV. Using the default CA TDM Portal configuration, the publish hit a wall around 300,000 counts when the CPU usage went high and stayed high. In fact, performance started degrading around 260,000 counts.

This behavior is caused by the GC (Garbage collector) going overdrive when trying to clean up some memory to make sure that the Portal application does not crash with an out-of-memory exception.

The following graph outlines the impact of garbage collection on the Portal:

Before starting a high-volume publish to XLSX, make sure you increase the memory that is allocated to the Portal.

Edit the config file called wrapper.conf located under CA\CA Test Data Manager Portal\service\conf. You can set either maxmemory or maxmemory.percent. With maxmemory.percent, the maximum allocated memory is calculated from the number that was set, times the physical memory. For more information about what needs to be done to increase the memory size used by the java process, see https://wrapper.tanukisoftware.com/doc/english/prop-java-maxmemory.html.

repeater
performance based on number of rows
100,00032 seconds
200,00059 seconds
300,0001 min 30 seconds
400,0001 min 49 seconds
500,0002 min 40 seconds
600,0003 min 36 seconds

Test 2 - One Expression

We publish using one expression ~NEXT~ in the generator.

Publish to CSV

repeater
performance based on number of rows
1,000,00014 seconds
10,000,0001 min 29 seconds
100,000,00012 min 38 seconds

Publish to XLSX

repeater
performance based on number of rows
100,00024 seconds
200,00046 seconds
300,0001 min 18 seconds
400,0001 min 47 seconds
500,0002 min 17 seconds
600,0002 min 41 seconds
700,0003 min 16 seconds

SQL Server Target publish (default config)

repeater
performance based on number of rows
100,0004 min 18 seconds
200,0008 min 16 seconds
400,00017 min 36 seconds

(portal restarted)

800,000

23 min 12 seconds

SQL Server Target publish (iterationsBeforeCommit=20000)

tdmweb.publish.batchCommit=true

tdmweb.publish.iterationsBeforeCommit=20000

repeater
performance based on number of rows
repeater
performance based on number of rows
800,00035 seconds
10,000,0006 min 33 seconds
100,000,0001 hour 6 min 46 seconds

SQL Server Target publish (iterationsBeforeCommit=50000)

tdmweb.publish.batchCommit=true

tdmweb.publish.iterationsBeforeCommit=50000

repeater
performance based on number of rows
1,000,00042 seconds
10,000,0006 min 31 seconds

 

Test 3 - With Expressions

We publish using several expressions.

Table DDL Used

CREATE TABLE equifax_records (
"update_period" numeric (38, 0) ,
"peer" varchar (20) ,
"state" varchar (20) ,
"county" varchar (20) ,
"product" varchar (20) ,
"vintage" varchar (20) ,
"originalrisk" varchar (20) ,
"currentrisk" varchar (20) ,
"term" varchar (20) ,
"smallbusinessownerflag" varchar (20) ,
"mortgageindicator" varchar (20) ,
"consumer_age" varchar(20) ,
"edti" varchar (20) ,
"pim" varchar (20) ,
"n_cur" numeric (38, 0) ,
"n_030" numeric (38, 0) ,
"n_060" numeric (38, 0) ,
"n_090" numeric (38, 0) ,
"n_120" numeric (38, 0) ,
"n_svr" numeric (38, 0) ,
"n_bkr" numeric (38, 0) ,
"n_misc" numeric (38, 0) ,
"n_closed_pos" numeric (38, 0) ,
"bal_cur" numeric (38, 0) ,
"bal_030" numeric (38, 0) ,
"bal_060" numeric (38, 0) ,
"bal_090" numeric (38, 0) ,
"bal_120" numeric (38, 0) ,
"bal_svr" numeric (38, 0) ,
"bal_misc" numeric (38, 0) ,
"bal_closed_pos" numeric (38, 0) ,
"pmt" numeric (38, 0) ,
"hc" numeric (38, 0) ,
"bal_bk" numeric (38, 0) ,
"n_fcs" numeric (38, 0) ,
"bal_fcs" numeric (38, 0) ,
"n_pos_bal" numeric (38, 0) );

Expressions Used

Table Name Column NameDefinitionData Type

EQUIFAX_RECORDS

n_pos_bal

@randlov(0,@perclist(90%@randrange(1,2)@,5%0,5%@randrange(3,10)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

vintage

Q@randlov(0,@list(1,2,3,4)@)@@string(@randdate(2005/01/01,~YEAR~/01/01)@,YYYY)@

VARCHAR(20)

EQUIFAX_RECORDS

peer

@randlov(0,@list(GM,CM,CP,NC,OT)@)@

VARCHAR(20)

EQUIFAX_RECORDS

bal_closed_pos

@if(^bal_cur^=0,@randrange(0,60000)@,0)@

NUMBER(38)

EQUIFAX_RECORDS

n_bkr

@randlov(0,@perclist(1%1,99%0)@)@

NUMBER(38)

EQUIFAX_RECORDS

state

@seqlov(0,@seedlist(State_County,S)@,2)@

VARCHAR(20)

EQUIFAX_RECORDS

bal_120

@randlov(0,@perclist(99%0,1%@randrange(0,30000)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

n_120

@randlov(0,@perclist(1%1,99%0)@)@

NUMBER(38)

EQUIFAX_RECORDS

edti

@randrange(0,9)@

VARCHAR(20)

EQUIFAX_RECORDS

n_svr

@randlov(0,@perclist(1%1,99%0)@)@

NUMBER(38)

EQUIFAX_RECORDS

bal_060

@randlov(0,@perclist(99%0,1%@randrange(0,60000)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

hc

@if(^bal_cur^=0,0,@addrand(^bal_cur^,0,20000)@)@

NUMBER(38)

EQUIFAX_RECORDS

update_period

~YEAR~@randlov(0,@list(01,02,03,04,05,06,07,08,09,10,11,12)@)@

NUMBER(38)

EQUIFAX_RECORDS

originalrisk

@randrange(0,14)@

VARCHAR(20)

EQUIFAX_RECORDS

smallbusinessownerflag

@randlov(0,@perclist(15%1,85%~EMPTY~)@)@

VARCHAR(20)

EQUIFAX_RECORDS

n_misc

@randlov(0,@perclist(1%1,98%0,1%-1)@)@

NUMBER(38)

EQUIFAX_RECORDS

bal_030

@randlov(0,@perclist(97%0,3%@randrange(0,90000)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

bal_cur

@randlov(0,@perclist(20%@randrange(100000,900000)@,40%@randrange(0,9000)@,35%@randrange(10000,90000)@,5%@randrange(1000000,2000000)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

bal_fcs

0

NUMBER(38)

EQUIFAX_RECORDS

bal_misc

@randlov(0,@perclist(99%0,1%@randrange(0,20000)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

n_060

@randlov(0,@perclist(25%1,70%0,5%2)@)@

NUMBER(38)

EQUIFAX_RECORDS

n_090

@randlov(0,@perclist(1%1,99%0)@)@

NUMBER(38)

EQUIFAX_RECORDS

n_030

@randlov(0,@perclist(45%1,35%0,15%2,1%3,1%4,1%5,1%10,1%25)@)@

NUMBER(38)

EQUIFAX_RECORDS

bal_bk

0

NUMBER(38)

EQUIFAX_RECORDS

pmt

@randlov(0,@perclist(80%@randrange(0,1000)@,20%@randrange(1000,10000)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

county

@upper(@seqlov(0,@seedlist(State_County,S)@,5)@)@

VARCHAR(20)

EQUIFAX_RECORDS

pim

@randrange(0,9)@

VARCHAR(20)

EQUIFAX_RECORDS

n_fcs

0

NUMBER(38)

EQUIFAX_RECORDS

mortgageindicator

@randlov(0,@perclist(45%1,55%0)@)@

VARCHAR(20)

EQUIFAX_RECORDS

n_cur

@randrange(1,6)@

NUMBER(38)

EQUIFAX_RECORDS

term

@randrange(0,11)@

VARCHAR(20)

EQUIFAX_RECORDS

n_closed_pos

@randlov(0,@perclist(10%1,89%0,1%2)@)@

NUMBER(38)

EQUIFAX_RECORDS

bal_090

@randlov(0,@perclist(99%0,1%@randrange(0,60000)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

currentrisk

@randrange(0,14)@

VARCHAR(20)

EQUIFAX_RECORDS

product

@randlov(0,@list(AB1,AF2,AF1,AB2)@)@

VARCHAR(20)

EQUIFAX_RECORDS

bal_svr

@randlov(0,@perclist(99%0,1%@randrange(0,60000)@)@)@

NUMBER(38)

EQUIFAX_RECORDS

consumer_age

@randrange(0,7)@

VARCHAR(20)

Publish to CSV

repeater
performance based on number of rows
100,00014 min 51 seconds

Outcomes