How many XOG instances should you read/write in a single request?

Blog Post created by nick_darlington Employee on Jul 7, 2015

This question comes up from time to time, and a couple of recent occurrences made me think it was still worth covering.

 

XOG requests take time (at least more than a typical UI action).  The data they input in a single request can match hours' worth of manual entry in the UI.

 

As such, if a request contains too much data, a network timeout can occur while that data is still being processed.  Unfortunately, the loss of connectivity will cause the XOG request to error out.

 

It is generally, although not universally, known that modifying your XOG file to work with batches of data rather than a single request can bring the following benefits:

 

  1. Avoiding timeouts
  2. Increasing throughput (more instances processed in the same amount of run time)
  3. In some cases, avoiding overloading the client's or server's memory allowances
  4. Lowering overhead and load on the server, allowing it to handle more tasks concurrently without impacting other work (such as UI tasks for other users)

 

So, if batching is the solution, the next question is inevitably 'how many records do we batch at once?'.

 

There are some crude guidelines floating around.  Some say keep the XML file under 5MB or 10MB; others say keep the number of 'instances' down to 100 or 200 or some other number.

 

The truth is that these are all fairly arbitrary suggestions and the optimal numbers depend on each customer (because of their different data usage patterns and configurations within Clarity) and each XOG request.

 

It is a 'how long is a piece of string' question, so treat those rule-of-thumb numbers merely as starting points for profiling if your current attempts try to do everything in one big file and suffer from one or more of the problems above.

 

This means the answer depends on testing and profiling.  Don't forget that XOG is an API, and like any API (such as a database API) you have to tune your code and data requests if you want it to perform and scale.  Efforts are under way to improve XOG's own governing so that this tuning becomes less critical, but even with all the governing in the world it is still in our customers' interests to profile in order to find the optimal settings, especially for business-to-business interfaces that may haul a large number of records in a limited time window.

 

For a XOG write, batching is typically just a matter of subdividing the input file into multiple files (each containing a full, valid, standalone request, just with fewer detailed records in it).
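
As a rough illustration of that subdivision, here is a minimal Python sketch. It assumes the records to split are direct children of a single container element one level below the root; the element names in the usage note (`NikuDataBus`, `Header`, `Projects`, `Project`) are only there for illustration, and `split_into_batches` is a hypothetical helper, not part of XOG.

```python
import copy
import xml.etree.ElementTree as ET


def split_into_batches(xml_text, container_tag, batch_size):
    """Split one request into several full requests.

    Every returned string keeps everything outside the container
    (for example the header) and holds at most batch_size records.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    root = ET.fromstring(xml_text)
    container = root.find(container_tag)  # direct child of the root
    if container is None:
        raise ValueError("no <%s> element under the root" % container_tag)
    records = list(container)

    # Build an empty skeleton once: the full request minus its records.
    skeleton = copy.deepcopy(root)
    skeleton_container = skeleton.find(container_tag)
    for child in list(skeleton_container):
        skeleton_container.remove(child)

    batches = []
    for start in range(0, len(records), batch_size):
        batch_root = copy.deepcopy(skeleton)
        batch_container = batch_root.find(container_tag)
        for record in records[start:start + batch_size]:
            batch_container.append(record)
        batches.append(ET.tostring(batch_root, encoding="unicode"))
    return batches
```

Calling `split_into_batches(text, "Projects", 30)` on a file with 1000 `Project` elements would return 34 strings, each of which can be written to its own file and submitted as a separate request.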

 

For a XOG read, aside from the new governor limits and pagination that apply to many request types in the latest versions, batching will require some amount of filtering.

 

In some cases, ranged filtering may be sufficient to extract the data in batches.  The quantities will not necessarily be uniform (let's say you extract projects by month, and have 10 in January, 15 in February, 3 in March, 20 in April, etc.), but they may still be manageable.

 

  • The pros: this approach works well with data that has a fixed reference in time and is unlikely to move enough to disrupt expectations from one run to the next.  In the example of having 10 projects in January, if we are already in March of that year, you wouldn't really expect more projects to suddenly appear from nowhere with a start date 2 months in the past.  Whereas if you filtered by an alpha prefix (like extracting all the projects beginning with 'A') the quantity can be a lot more volatile from one run to the next.
  • The cons: the data won't have any real uniformity.  If most projects are entered into the system near the start of a calendar or fiscal year, then that month could be overloaded whilst the other periods are underutilized.
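
The month-by-month ranges in that example can be generated rather than typed by hand. The sketch below is only a helper for producing the date pairs; `month_ranges` is a hypothetical name, and the filter attribute and syntax that each pair would be plugged into depend on the object being read and are not shown.

```python
import calendar
from datetime import date


def month_ranges(first, last):
    """Yield (first_day, last_day) for each calendar month from the
    month containing `first` through the month containing `last`."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, last_day)
        # Step to the next month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


if __name__ == "__main__":
    for start, end in month_ranges(date(2015, 1, 1), date(2015, 4, 30)):
        print(start.isoformat(), end.isoformat())
```

Each pair then drives one read request, so the batches follow the calendar rather than a fixed record count.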

 

This might be solved by using another attribute if it is available in the filter (an auto-number ranged attribute for example) to give greater control on the spread of data.

 

If neither of these options is feasible, another solution is to create a simple NSQL query that contains your own filtering logic.  Called through the NSQL WSDL query interface ( http://yourserver/niku/wsdl/Query ), optionally with its pagination feature, the query can fetch instance identifiers in batches, and each batch can then be added as a filter to a subsequent XOG object read request.  It takes a little extra coding (either in GEL or in whatever other web-service-consuming code is being written against the XOG API), but it's essentially just an extra outer loop wrapped around the original piece of work.

 

  • The pros of this method are that you retain absolute control over the spread of the data being fetched: exactly the N records that you want, no more and (provided other filtering isn't present on the read request to add complication) no less.
  • The cons of this method are the extra NSQL query call(s) needed to build the filters, and the extra outer loop needed to construct a filtered read request for each of the batches.
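
The shape of that outer loop can be sketched as follows. This is only the control flow: `fetch_ids` and `read_batch` are hypothetical callables standing in for the NSQL query call and the filtered XOG read request respectively, and neither is an actual XOG or Clarity function.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def read_in_batches(fetch_ids, read_batch, size):
    """Outer loop: fetch the identifiers once, then issue one read
    per chunk of identifiers.

    fetch_ids()       -> iterable of identifiers (stand-in for the NSQL query)
    read_batch(chunk) -> result of one read filtered to that chunk
    """
    return [read_batch(chunk) for chunk in chunked(fetch_ids(), size)]
```

Because the chunk size is a plain parameter, the same loop can be rerun with different sizes when profiling.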

 

Now that we have this method, we still don't actually have the answer on which quantity is the right quantity to go with - where are the sweet spots, and where are the cut-offs between simply being feasible or not producing a result?

 

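For that, different quantities should be tested and notes taken on the performance.  Extrapolate the times by the number of batches required to complete the full request in order to (probably) generate a curve of results similar to the following:

[Chart not reproduced here.  From the discussion below: it plots time against batch size, with a red line for the time of an individual request, a green line for the total processing time, and a blue line for the timeout.]

 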

Notes:

  • In this hypothetical test, we needed to extract a total of 1000 records/instances of 'something'.
  • We have a 15 minute network timeout constraint to live with (not imposed by the Clarity product or any timeout configuration settings within it, but externally such as on the network infrastructure).
  • We can likely expect that, due to other overheads, requesting very small numbers of records will probably result in comparable performance (it may take near enough the same time to extract 1 record at a time as it does to extract 3, or 5, and so on).
  • We can also likely expect that as we ramp up the numbers, the processing time will not stay flat or grow linearly.  It may grow progressively faster, or it may stay at tolerable levels for some time and then shoot up (or the request may stop responding) once some unseen threshold is triggered.
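
The extrapolation itself is simple arithmetic: the number of requests is the record count divided by the batch size, rounded up, and the estimated total is that count multiplied by the measured time for one request. A minimal sketch, where `extrapolate_total` is a hypothetical helper and the timing passed in is whatever you measured yourself:

```python
import math


def extrapolate_total(total_records, batch_size, seconds_per_request):
    """Return (number_of_requests, estimated_total_seconds)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    requests = math.ceil(total_records / batch_size)
    return requests, requests * seconds_per_request
```

For the 1000 records in this test, a batch size of 30 means 34 requests (33 full ones plus a final partial one), while a batch size of 100 means 10. The estimate ignores any pause between requests.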

 

With the results of these tests plotted, we can start to see where we stand with this environment (dataset) and type of request.

 

For instance, with too few items at once (10 or 20), the round-trip overhead of making so many requests means our total processing time suffers.  However, the individual processing time of each file in the batch is very safely below our timeout.

 

At the other end of the scale, once we hit 90+ then even in batches we may risk some of those individual requests timing out, which would cause complications and delays in the total processing time.

 

The sweet spot on this test is shown with batches of 30 instances at a time (give or take a few), as it shows the total processing time to be the fastest along with the individual requests not likely being in danger of hitting the timeout.
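
Picking the sweet spot from a set of measurements can also be automated. The sketch below is one possible selection rule, not the only one: it discards batch sizes whose individual request time comes too close to the timeout, then keeps the size with the lowest extrapolated total. The `margin` parameter and the figures in the usage note are made up for illustration.

```python
import math


def pick_batch_size(seconds_per_request, total_records, timeout_s, margin=0.8):
    """Choose a batch size from measured per-request timings.

    seconds_per_request maps batch size -> measured seconds for one
    request of that size. Sizes whose single request takes longer than
    margin * timeout_s are treated as too risky. Returns the size with
    the lowest estimated total time, or None if no size is safe.
    """
    best_size, best_total = None, None
    for size, seconds in seconds_per_request.items():
        if seconds > timeout_s * margin:
            continue  # an individual request could hit the timeout
        total = math.ceil(total_records / size) * seconds
        if best_total is None or total < best_total:
            best_size, best_total = size, total
    return best_size
```

With invented timings such as `{10: 12, 30: 25, 90: 850}` seconds, 1000 records and a 900 second (15 minute) timeout, the function returns 30: size 90 is rejected as too close to the timeout, and size 10 loses on total time.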

 

Interestingly though, you'll also notice that in these results a batch size of 30 meant that all the requests combined (34 of them, rounding up) not only gave the fastest throughput but also came in under the timeout (the green line is below the blue).  That wasn't our goal here (we only need to keep the red line away from the blue line to avoid the threat of timeouts), but it does make for an interesting observation.

 

Finding these sweet spot values takes some effort, and they won't always be as pronounced as this contrived example, but they are worth some amount of time to find for the longer term benefits.

 

 

Closing footnote: Please also be aware that the amount of physical data (in bytes) in the input/output files does not directly correlate with the effort needed to process the requests or produce the responses.  The work that takes place in the middle can be orders of magnitude greater (consuming correspondingly more RAM/CPU too), as anybody performing a content pack XOG update to an object can testify.  Assuming otherwise is an easy mistake to make.
