
Putting the ‘T’ back into TDM

Blog Post created by pryth01 Employee on Jul 4, 2016

For many organizations, Test Data Management (TDM) starts and ends with copying production data, masking it, and moving it to test environments. In other words, TDM is viewed as a purely logistical matter, with the goal of moving subsets of production data as quickly as possible into testing.


We’ve already written about the broader challenges of this view of TDM, but it also presents specific pain points for testers, and this narrow conception of TDM is one reason why testing is so frequently blamed for bottlenecks. Some of these testing-specific frustrations are set out below.


Data Provisioning is too slow

Testers are usually dependent on another team to fulfil their data requests, and this team quickly finds itself inundated with requests. Often it lacks automated tools for data discovery or creation, and so must manually copy, mask or create data to meet specific test cases or requirements.


This can be highly complex, as the data must remain coherent across systems. It is therefore no surprise that a team faced with requests for tens of thousands of records per week cannot keep up; some testers we’ve worked with have had to wait longer than the sprint itself for a data request to be fulfilled.


The wrong data gets provisioned

This is worth calling out in its own right, given how often we’ve heard it from testers. The error-prone nature of manual data processing, combined with poorly defined requests, means that the wrong data is frequently provisioned.


If you’ve already waited 3-4 weeks for that data, the frustration should be obvious. Erroneous provisioning also creates further issues, such as automated test failures which must then be investigated.


Data is unavailable in parallel

In addition to upstream dependencies, there are often dependencies between test teams who compete for a limited number of copies of production data.


One team might have to wait for another to finish, while a change made to the data by one team will affect all the others. Rare or interesting data might be lost, or tests might start failing for no apparent reason, and that data may be lost for good if another team requests a refresh.


“I can’t test because another system is in use, is not ready yet, or contains bad data”

There is also competition for environments. At first glance this does not look like a test data issue, but having the right virtual data is paramount to effective environment provisioning.


Service Virtualization, for instance, often involves manually engineering complex Request-Response Pairs, which must both cover the scenarios needed for testing and be realistic. This is time-consuming and costly, and test teams might be left waiting weeks for production-like systems to test with.
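To make the idea of a Request-Response Pair concrete, here is a minimal sketch of a virtual service: canned responses stand in for a downstream system that is in use, not ready, or full of bad data. All names and routes are illustrative, not any particular tool's API.

```python
# Each pair maps a canonical request (method, path) to a canned response.
# Note that realistic virtualization needs both "happy path" and negative
# pairs, which is part of what makes engineering them by hand costly.
RESPONSE_PAIRS = {
    ("GET", "/accounts/1001"): (200, {"id": 1001, "status": "active"}),
    ("GET", "/accounts/9999"): (404, {"error": "account not found"}),  # negative scenario
}

def virtual_service(method, path):
    """Return the canned response for a request, or 501 if no pair was recorded."""
    return RESPONSE_PAIRS.get((method, path), (501, {"error": "no pair recorded"}))

# A test can now run without the real accounts system being available.
status, body = virtual_service("GET", "/accounts/1001")
print(status, body["status"])  # prints: 200 active
```

Every scenario a test needs must have a pair recorded up front, which is why hand-crafting these for a complex system can take weeks.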


Does the right data exist in the first place?

Arguably the greatest issue with copying production data for testing is quality. Production data is drawn from past scenarios and so will not contain the future scenarios needed to test evolving systems. It will rarely contain “bad data” or negative scenarios, and our audits have found just 10-20% coverage to be the norm. Sampling methods therefore rarely provide the data needed for testing, but manually creating the missing data is slow and error-prone.
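The gap can be illustrated with a small sketch: a production copy supplies plenty of well-formed records, but the negative and boundary cases have to be created synthetically. The record shape and field rules here are hypothetical, chosen only to show the contrast.

```python
import random

def make_valid_customer(rng):
    """A well-formed record: the kind a sampled production copy already contains."""
    return {"id": rng.randint(1, 10_000),
            "email": "user%d@example.com" % rng.randint(1, 999),
            "age": rng.randint(18, 90)}

def make_negative_customers():
    """The 'bad data' production copies rarely contain, created upfront instead."""
    return [
        {"id": -1,  "email": "user@example.com", "age": 30},   # invalid key
        {"id": 101, "email": "not-an-email",     "age": 30},   # malformed email
        {"id": 102, "email": "user@example.com", "age": 200},  # out-of-range age
        {"id": 103, "email": "",                 "age": None}, # missing values
    ]

rng = random.Random(42)
test_set = [make_valid_customer(rng)] + make_negative_customers()
```

No amount of sampling from the valid generator would ever produce the four negative records, which is the coverage gap the audits describe.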


A more complete approach to Test Data Management

In order to eliminate these delays, a more complete approach to TDM is needed. We advocate an end-to-end approach, where re-usable data sets are stored in a central Test Data Warehouse and are exposed to testers on demand via a self-service web portal.


The more requests which are made, the larger the warehouse becomes, and it becomes less likely that testers will be dependent on an upstream team to provision the data they need. If new data is needed, CA Test Data Manager provides high performance masking and powerful synthetic data to quickly fulfil the request.
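The self-service pattern described above amounts to "find or generate": look for a matching data set in the warehouse first, and only synthesize new data on a miss, so every fulfilled request enriches the warehouse. A minimal sketch follows; the function names are illustrative and not CA Test Data Manager's actual API.

```python
warehouse = []  # stands in for the central Test Data Warehouse

def request_data(criteria, generator):
    """Return a stored data set matching the criteria, generating one on a miss."""
    for record in warehouse:
        if all(record.get(k) == v for k, v in criteria.items()):
            return record, "reused"
    record = generator(criteria)
    warehouse.append(record)  # every fulfilled request enriches the warehouse
    return record, "generated"

def make_account(criteria):
    """Hypothetical synthetic generator for an account matching the criteria."""
    return {**criteria, "id": len(warehouse) + 1}

first, how1 = request_data({"status": "overdrawn"}, make_account)   # generated
second, how2 = request_data({"status": "overdrawn"}, make_account)  # reused
```

The second request is served from the warehouse with no upstream dependency at all, which is the effect the self-service portal is meant to deliver.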


In this approach, TDM moves beyond logistics and becomes test-oriented. Rather than just copying, masking and moving production data to test environments, the focus shifts to identifying what test data is needed to fulfil test cases and test plans. All the data needed for testing is then created upfront and made available on demand.

Outcomes