The ISSUG Infobase contains archived postings from our on-line discussion forums. Nothing here may be quoted without the written permission of the author(s) of the original postings. The Infobase is organised into 12 sections concerning different topics.

FAQ content: Ram Malladi

                         

Part 3: Planning / Analysis

 3.1) How to Model History?
  3.2) Component Based Development (CBD) and Batch
  3.3) CBD 96
  3.4) Can we take an existing data model and tailor it to our needs?


3.1) How to Model History?

Steve Thomas
Dated  : 17 December, 1996 at 16:04:07
Subject: Modeling History
 
I thought I'd show up with a question (that's how I usually work).

Any suggestions on modeling history?  We need to model current as well as historical information.  We have been thinking on how that is best done.  We could give each record a timestamp identifier, then when a change occurs, copy the old record to another record in the entity type, but with a different timestamp to identify and sequence them, with the latest always being current.  Or, we could create a history entity type, and move all the historical records there, and only keep current ones in the main entity type.  Or maybe some other way.
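
For what it is worth, here is a minimal sketch of the two table shapes being weighed up, using Python's built-in sqlite3 purely as a stand-in for whatever RDBMS is in use; the table and column names are invented for illustration.

import sqlite3

con = sqlite3.connect(":memory:")

# Option 1: keep every version in the same table, distinguished by a timestamp
# that forms part of the identifier; the latest timestamp is the current row.
con.execute("""
    CREATE TABLE customer_version (
        customer_id   INTEGER NOT NULL,
        effective_ts  TEXT    NOT NULL,
        name          TEXT,
        credit_limit  INTEGER,
        PRIMARY KEY (customer_id, effective_ts))""")
con.executemany("INSERT INTO customer_version VALUES (?, ?, ?, ?)",
                [(42, "1996-01-01", "Acme Ltd", 5000),
                 (42, "1996-12-17", "Acme Ltd", 7500)])

# Option 2: keep only current rows in the main table and move superseded
# rows to a separate history table, stamped with the date they were superseded.
con.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        name          TEXT,
        credit_limit  INTEGER)""")
con.execute("""
    CREATE TABLE customer_history (
        customer_id   INTEGER NOT NULL,
        superseded_ts TEXT    NOT NULL,
        name          TEXT,
        credit_limit  INTEGER,
        PRIMARY KEY (customer_id, superseded_ts))""")

# Under option 1, 'current' is simply the row with the latest timestamp.
print(con.execute("""
    SELECT name, credit_limit
      FROM customer_version
     WHERE customer_id = ?
     ORDER BY effective_ts DESC
     LIMIT 1""", (42,)).fetchone())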

Any thoughts? And any references to books, white papers, etc?

Thanks, and great idea for the Bulletin Board!
====

Andy Ward
Dated  : 17 December, 1996 at 16:30:50
Subject: Re: Modeling History
 
Hi Steve. This has always been a thorny question, and as usual, there is no straight answer. A lot depends on the use of the entity type and its likely volumetrics. If you are having a lot of changes to a row, and are loading history on the same table, then fairly obviously the table grows pretty quickly. Issues such as fragmentation (both data and index) quickly occur, and of course any queries to the table may take longer depending on the access path chosen. You do not mention which RDBMS you are using, as some of the issues may be specific to that.

Traditionally, I always prefer to keep historical information away from current info, unless operationally there is a need for them to be together. This has to be looked at for each application. For instance, my company has to value customer portfolios based on the share's value at the time it was purchased (for capital gains tax purposes) and this means that we need to hold historical information. For the stock itself, we hold the date purchased and the price at that time. We still need to keep a history of prices (for other reasons) and these are kept in a separate table.

You do not mention if the date/time stamp was going to be part of the key. Often it is desirable from a search point of view to keep it there, but from an update point of view it is an irritant. Often delete and re-insert is the answer to this. This also doesn't help the fragmentation case.

On a DB/2 table we used table partitioning to keep the individual tablespaces down to a reasonable size, which can be a useful technique. We used the quarter and year as the partitioning attribute (e.g. Q197) and because you can have 64 partitions, this gives us 16 years before we have to 'drop' off the earliest. Not quite a year 2000 problem, but certainly a year 2012 problem!

Let me know more of the details (e.g. target DB, table size, query activity, ins/del activity and the like), and maybe I can be of more help.
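
As a small illustration of the quarter-and-year partitioning attribute described above (the function name is invented; Python just stands in for whatever builds the key):

from datetime import date

def partition_key(d: date) -> str:
    # Quarter digit followed by two-digit year, in the 'Q197' style described above.
    quarter = (d.month - 1) // 3 + 1
    return f"Q{quarter}{d.year % 100:02d}"

print(partition_key(date(1997, 2, 17)))   # -> Q197
# One partition per quarter and a 64-partition limit gives 64 / 4 = 16 years
# of history before the earliest partition has to be dropped.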

Cheers

Andy
====
Steve Thomas
Dated  : 18 February, 1997 at 17:30
Subject: History Modeling
 
We're having a difficult time on a project settling on an approach to modeling history. We have a requirement that we keep all versions of an application for services, and be able to recreate them as of a point in time.  So, the original comes in, and over the course of working on it several amendments to the original are posted.  We must be able to see what the status of this information is at any given time in its history.  The application for service is in reality a lot of information, 25 separate forms with 2-4 pages each, so a lot of data.

What is a good way to model this, so we have the current and all past versions easily available?

Any ideas will be appreciated.
====

David Rothschild
Dated  : 19 February, 1997 at 01:40
Subject: Re: History Modeling
 
Of the many possible methods of modeling history, the one that is right for you has a lot to do with the frequency with which you access the historical data and whether space or performance is the highest priority. Answering the following questions will help you decide the best approach:

- Total number of records (not including history) in the table.
- Frequency of access to the current record.
- Frequency of access to history records.
- Frequency with which 'what changed' analysis is done.
- Is the requirement to produce the value at a date and time, or to detail what changed over time?
 
 ====
 
 Doug Scott
Dated  : 20 February, 1997 at 19:47
 Subject: Re: History Modeling
 
 Are the forms (you say 25, each of 2-4 pages) totally replaced with each update?
 
 I must confess that my inclination is to have a series of transactions, and then to retain the transactions together with an annual snapshot of the master data. That way you can re-create to a chosen point in time, without the overhead of holding all the intermediate data.
 
 It also satisfies one of my criteria - every transaction must be reversible. The history transactions then allow me to control a reversal mechanism should I ever need it.
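
A rough sketch of that snapshot-plus-transactions idea, in Python with entirely invented names, just to make the replay explicit:

from datetime import date

# Annual snapshot of the master data, plus the transactions applied since.
snapshot_date = date(1997, 1, 1)
snapshot = {"status": "pending", "cover": 10000}
transactions = [
    (date(1997, 2, 3), {"status": "accepted"}),
    (date(1997, 5, 9), {"cover": 12500}),
]

def as_of(when: date) -> dict:
    # Recreate the record as it stood on a chosen date by replaying the
    # transactions posted on or before that date over the snapshot.
    record = dict(snapshot)
    for applied_on, change in transactions:
        if applied_on <= when:
            record.update(change)
    return record

print(as_of(date(1997, 3, 1)))   # {'status': 'accepted', 'cover': 10000}
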
 ====
 
 Alida Riddell
Dated  : 21 February, 1997 at 14:30
 Subject: Re: History Modeling
 
 In essence you need to use 'effective dating'. Each record (row) contains the date it becomes effective as this is part of the identifier.  For performance reasons, we have implemented two databases (or tables) where the history with changes is kept on one 'history' database and the current active version is kept on the operational database.
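
To make the effective-dating concrete, here is a minimal sketch of the point-in-time read it gives you, with sqlite3 standing in for the actual DBMS and an invented table:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE salary (
        member_id      INTEGER NOT NULL,
        effective_date TEXT    NOT NULL,   -- part of the identifier
        amount         INTEGER,
        PRIMARY KEY (member_id, effective_date))""")
con.executemany("INSERT INTO salary VALUES (?, ?, ?)",
                [(1, "1996-01-01", 30000), (1, "1997-01-01", 32000)])

# The version in force at a given date is the latest one effective on or before it.
row = con.execute("""
    SELECT amount
      FROM salary
     WHERE member_id = ? AND effective_date <= ?
     ORDER BY effective_date DESC
     LIMIT 1""", (1, "1996-06-30")).fetchone()
print(row[0])   # 30000 -- the salary in force in mid-1996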
 
 You are welcome to call me for any further details.
 ====
 
 Stacy R. Pickett
Dated  : 14 July, 1997 at 21:05
 Subject: Modelling and processing history
 
 A question which falls under the "associated methods, or related matters,..." part of the BBS guidelines.
 
 Does anyone know of some good literature (book, white paper, technical article, etc...) which discusses the pros and cons of different methods to model and process the history requirements of a system?
 
 By different methods I mean model history in the same entity type, model it in a separate "history" entity type for every "current information" entity type, model a completely separate data warehouse, etc.....
 
 I haven't been able to find a good comparison of the different history processing methods and thought that I would poll the collective minds and libraries of the IEF/Composer/COOL:Gen community.
 
 On a completely unrelated topic, I wonder if the new COOL:Gen tool(s) will install into and over the IEF directory on my PC like the Composer tool does? Then we'll have COOL:Stuff that we can't find on our PCs because it's named something completely different than what I am expecting.  8-}
 ====
 
 Anders Romell, Volvo Data AB
Dated  : 18 July, 1997 at 14:10
 Subject: Re: Modelling and processing history
 
 I attended a course a couple of years ago called "Advanced Data Modeling". It was a Codd & Date course but it was presented by Larry English from "Information Impact International". In this course we covered a lot of different special cases of data modeling. Perhaps LE is giving a course near you, or maybe he has written a book? Check out their Web site at:
 
 http://www.infoimpact.com
 
 or courses at
 
 http://www.infoimpact.com/educate.htm
 ====
 
 Steve Thomas
Dated  : 24 July, 1997 at 00:37
 Subject: Re: Modelling and processing history
 
 Stacy,
 
 Just went through all that history stuff.  We produced a few papers and found a little stuff on it.  I do have a file from the Compuserve TI/IEF forum on this subject, with lots of opinions.  Generally, every approach imaginable was suggested and championed! All approaches had application in some cases.
 
 I'd be glad to get this together and send to you.  I might have to snailmail some of it, but I can email most of the stuff (two or three papers, and all the dialog from Compuserve).
 
 Let me see what I can do over the next couple of days.
 ====
 
 Richard Veryard
Dated  : 07 August, 1997 at 18:06
 Subject: Re: Modelling and processing history
 
 If you're thinking in object terms, it's worth noting that the behaviour of the 'history' object is often completely different to the behaviour of the 'current' object. In fact, all history objects probably share some common processing characteristics. (In object-speak, they 'inherit' the properties of some generic history object type.)
 
 It follows that when you partition the data into separate logical data stores, or onto different physical platforms, it may well make sense to place the 'history' objects separately from the 'current' objects.
 But this gets us into the details of the distribution design for your chosen technical architecture (e.g. client/server).
 
 That's fine if you're working with a pure object-oriented platform.  But what about Composer (or Cool, as I guess I'm going to have to get used to calling it)?
 
 If you're trying to use Composer/Cool in an object-oriented way, you'd have separate entity types for the history data and for the current data. One problem here is that you may want to use the same UI - the same screens or windows - for the current data and the history data. Then if you've put the current data and the history data into different places, you may have to have a component that accesses both places and presents both current and history data onto the same windows in the same format.
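
A sketch, with entirely invented names, of the shape of component being described here: the windows call one access component, which hides whether a given version came from the current store or the history store.

from typing import Dict, List

class CurrentStore:
    def read(self, key: int) -> Dict:
        return {"key": key, "status": "active", "version": 3}

class HistoryStore:
    def read_all(self, key: int) -> List[Dict]:
        return [{"key": key, "status": "pending", "version": 1},
                {"key": key, "status": "accepted", "version": 2}]

class PolicyAccess:
    # The single component the windows call; it fetches from both places and
    # presents every version in one format, oldest first.
    def __init__(self, current: CurrentStore, history: HistoryStore):
        self.current, self.history = current, history

    def versions(self, key: int) -> List[Dict]:
        return self.history.read_all(key) + [self.current.read(key)]

print(PolicyAccess(CurrentStore(), HistoryStore()).versions(42))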
 
 I don't think there is a single right answer - there never is - but I'm inclined to favour a more object-oriented approach - separating current and history data - in the interests of future maintainability. For one thing, it makes it much easier to implement changes to the data structure if you only have to convert current data and not history data as well. Similar arguments apply to data portability - moving data across platforms.
 
 Hope this helps.  Feel free to contact me if you want to discuss further.
 
 Richard
 rxv@veryard.com
 http://www.veryard.com


3.2) Component Based Development (CBD) and Batch

Tim Courtney
Dated  : 09 June, 1997 at 09:31
Subject: CBD and Batch at Old Mutual.
 
The following article is probably only of interest to those of you involved with
component based development (CBD). However, I encourage anybody and everybody
to comment if they wish.

Tim Courtney
Design Architecture Team, Old Mutual, Cape Town, South Africa.
-----------------------------------------------------------------

Objective.

What follows is a summary of what has been done at OM with the batch side of the Flexible Investment Choice (FIC) CBD project. My objective in circulating this is to gather opinions and experiences in order that we may put together a set of guidelines for future CBD projects which  have a need to use batch programming styles.

Brief history

About 2 years ago we started on the component identification and construction phase of the FIC project. This was not the first CBD project at OM, but it was the first to use the then-current CBD architecture which would have a need to use batch processing on input/output files from/to OM clients. In order to identify the operations (previously called methods) and to confirm the boundaries of the components, the project went through a transaction analysis phase. This was chiefly done with the on-line processes and produced a set of operations for each component to support the user interface, in this case a GUI C/S front end.

At this time the architecture at OM called for the user interface to be outside of the component boundaries, and as a result of what was learnt during this project the architecture has evolved such that this is no longer always true. It is probably also worth saying that the project had identified ten components, of which two were of type 2 (Infrastructure) and the remainder fell into type 3 (Domain Generic) or type 4 (Domain Specific).
 
The Batch issue

The latter part of this project involved the construction of a number of batch programs to process files received from or to be sent to OM clients. A number of common themes ran through all of these batch programs, mainly that they all needed to use the code table (type 2) component and they all used at least two of the other business (type 3 or 4) components. The question then arose as to where the core (main loop) of the batch program should lie. Should it be within a component, and if so which one, or should it lie outside of all of the components? After much debate, the decision was taken to place the core of the batch programs outside of all of the components. This was done because it was felt that if the core was placed inside a business component, which then required information from other business components (via operations), these components would start to develop an affinity for each other, which was considered to be a bad trend.

If you thought of the core as a sort of user interface, then this was also in line with the OM architecture in force at that time, namely that the user interface layer should lie outside of the component.

The result of this was that the core of the program lay outside of all of the components and had to access all component data required via operations. The problem with this was that the performance of the resulting batch programs was appalling. In a traditional environment a constructor could join tables and maintain open cursors on the required tables, but by forcing them to use operations this flexibility was lost. It was realised at the time that the component approach had a cost in terms of program efficiency, but the scale of that cost didn't become apparent until later. Now, having put some additional effort into tuning these batch programs and the component operations, we have improved the performance without breaking our component architecture. It could be argued that this should have been done in the first place, but nobody expected the performance cost to be so high.
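
One shape such tuning can take, sketched here in Python with invented names (this is an illustration of the idea, not a description of what was actually done at OM): instead of the batch core calling a component operation once per input record, the component exposes a set-oriented operation that answers for a whole batch of keys in one call, so the component boundary survives but most of the per-record overhead does not.

from typing import Dict, Iterable, List

class ClientComponent:
    def read_salary(self, client_id: int) -> int:
        # One operation call (and its own SQL) per record: the slow path.
        return 30000 + client_id

    def read_salaries(self, client_ids: Iterable[int]) -> Dict[int, int]:
        # Set-oriented operation: one call answers for many keys at once.
        return {cid: 30000 + cid for cid in client_ids}

def batch_core(records: List[dict], client: ClientComponent) -> None:
    # The core still only talks to the component through its operations.
    salaries = client.read_salaries(r["client_id"] for r in records)
    for r in records:
        r["salary"] = salaries[r["client_id"]]

records = [{"client_id": 1}, {"client_id": 2}]
batch_core(records, ClientComponent())
print(records)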

What next ?

Since the FIC project we have updated our architecture. The main amendments
to this are that :-
a) components may in certain cases now contain a user interface.
b) some business components seem always to have an affinity for other business components. This is now recognised and projects must now define these inter component relationships up front.

What we would like to do now is to establish a set of batch design and construction guidelines for building batch CBD programs in the future. From our experiences with FIC a number of observations and views have been expressed which may provide input to these guidelines. Please note that what follows is not OM CBD policy, but rather opinions from various people which may help in forming the CBD batch guidelines we are after. The statements are intended to provoke comment, and I have not included any comments of my own on them; rather, that is what I would like to come from you.

- Batch programs have no place in a CBD environment.
- Batch programs should not be constrained by component boundaries, but should be allowed to cross these boundaries, join tables etc.
- The cost in poor performance will be outweighed by the improvement in our ability to build systems to meet our business needs and should therefore be ignored.
- Allowing a batch program's core to be within a component will improve performance sufficiently to bring the performance cost within acceptable boundaries.
- The concept of batch programs is forced on us by our historical view of processing data. In some environments there is no concept of batch and we should rather adjust our view of how to process data to a more transaction based concept.
- Some experienced developers have a very fixed and traditional view of how systems should be designed and built and don't have the flexibility of mind to embrace the use of components and address the issues that they raise. (I know this is a real bitchy point, but I note that it has been raised by a number of different CBD sites.)
- The transaction analysis should have taken more account of the batch processing requirements which may in turn have affected the component boundaries and operations.
- Our operations were designed for an on-line environment, so why is anybody surprised that when called from a batch program they perform badly? What we should really have is two types of operations, one for on-line and one for batch programs.

An example.

To aid further discussion I have included an example of one of our batch programs with component names. In case you were not aware of what we do, our business is about providing Employee Benefits schemes for a number of large corporate clients.

Batch member update program.

An 80,000-record file is loaded to the DB2 application layer tables.  The batch program core exists within the application (user interface) layer. In order to process the records, information from 4 business components is required.  In some cases, a component is visited more than once for information, based on different criteria.  The majority of the information is stored/retrieved using an effective date.

The program uses the following components :-

CLIENT
CAR                        (Client Agreement Role)
AGREEMENT
RISK
CODE TABLE & PERMITTED VALUES

Program Flow

For each  input record of type 1060:-

The CLIENT component is read to get the client relationship details, in order to obtain the salary details from the CLIENT component. If there has been a salary increase, the new salary is inserted into the CLIENT component. The CAR component is visited to get the employment details; this is used to get the plan details from the AGREEMENT component. The AGREEMENT component is accessed to determine the annualising factor, based on the salary type, payment details and frequency, which is used to calculate the annual salary; this is then inserted into the CAR component under membership details. The RISK component is called to determine the formula, based on the age of the client and frequency of payment, for calculating the benefits; the new benefits are then inserted into the CAR component.
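
Restated as a Python sketch so the component hops are easier to follow; the component interfaces and names below are invented, and only the order of the calls is taken from the description above.

def process_1060(record, client, car, agreement, risk):
    # CLIENT: relationship and salary details; insert the new salary on an increase.
    relationship = client.read_relationship(record["client_id"])
    if record["salary"] > client.read_salary(relationship):
        client.insert_salary(relationship, record["salary"])

    # CAR: employment details, used to find the plan in AGREEMENT.
    employment = car.read_employment(relationship)
    plan = agreement.read_plan(employment)

    # AGREEMENT: annualising factor -> annual salary, stored back in CAR.
    factor = agreement.read_annualising_factor(
        record["salary_type"], record["payment_details"], record["frequency"])
    annual_salary = record["salary"] * factor
    car.insert_annual_salary(employment, annual_salary)

    # RISK: formula (by client age and payment frequency) -> benefits, into CAR.
    formula = risk.read_formula(client.read_age(relationship), record["frequency"])
    car.insert_benefits(employment, formula(annual_salary))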

These stats may not mean much to you, but just in case you are interested, the size of the last end-of-month file from just one customer site was 31,564 records. After validation only 564 were processed, in 6.04 minutes.

The TOTAL number of DB2 statements executed in these 6.04 minutes was 535,575, made up of the following:

- Select = 93,311
- Insert = 1,374
- Update = 30,924 (this being a status indicator on the application table)
- Open cursor = 126,294
- Close cursor = 126,292
- Fetch from cursor = 157,380
====

Mike Scott
Dated  : 20 June, 1997 at 22:42
Subject: Re: CBD and Batch at Old Mutual.
 
Tim,

Thanks for the information, it has come at just the right time for us.  We are currently designing a sales bonusing system.  The rules for the bonus engine are likely to change every year, and so we have decided to componentize (is that a word?) rather than try to build a complicated data-driven rule-based system. The main part of the system will be batch and is likely to call operations of 3 or 4 domain specific components.  We will have to process about one million input records each month and run the bonus engine for a few thousand employees. Your statements have caused a bit of agitation here, as your timings would appear to suggest that it will take us over a week to process a million records!  Or have I misunderstood?
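
For what it is worth, the back-of-the-envelope arithmetic behind that worry, assuming (and it is only an assumption) that the quoted run scales linearly:

# 564 records processed in 6.04 minutes, scaled linearly to a million records.
records_per_minute = 564 / 6.04                       # roughly 93 per minute
minutes_for_million = 1_000_000 / records_per_minute
print(minutes_for_million / (60 * 24))                # about 7.4 days -- over a week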

You asked for comments on your statements, here is my two penny worth.
- CBD must support batch for it to be any use in large enterprises.  Batch is not just a technical convenience; there are business reasons for doing it. In any case, if components can't be made to perform and you go large scale down this route and attempt to do all business operations on-line, then on-line operations will be too slow to use.
- Allowing systems to go under the covers defeats the object of components.  If you start by allowing it for batch, it is not long before you discover other exceptions, and eventually we will end up having to invent another method to make CBD usable (just like CBD did for OO).
- I agree with the sentiment about performance; in a few years' time the machines will be so fast we will wonder what the fuss was about (our COBOL standards used to tell us to use GO TO instead of PERFORM because it saved a couple of machine cycles). However, in the short term we must develop ways of making components perform.
- I don't know if it is possible to add a batch core to components; by their nature, batch programs will need to call the operations of a number of components.  I wonder if it is possible to add operations to components which do some of the batch work.
- Batch is not going to go away in the short term so we need to tackle the issues now.
- As you said the culture issue has been discussed before and I don't have anything to add except that we had the same battles with CASE tools so maybe we can learn from that.
- A good point.  It looks like batch performance is another factor which must be considered when putting the component architecture together.
- I can see a need to have some operations which will only be used in a batch environment, but we will be creating a maintenance nightmare if we start designing parallel versions of operations.

Other points.
- Is it a problem of maintaining context on an object instance?
- Would memory caching of object instances help? What are the side effects of this?
- If all operations are designed with batch in mind, will on-line performance suffer?
- Accessing components via their specifications should allow one to tune the implementation without affecting the application, so is there a better way to implement?

This is a critical issue for us and will seriously affect the success of our latest CBD project if we do not get it resolved.  So I would appreciate any contributions to this thread.

Regards,

Mike Scott
====

Darius Panahy
Dated  : 23 June, 1997 at 21:41
Subject: Re: CBD and Batch at Old Mutual.
 
Hi Tim,

My initial impression is that this is an extreme example of a problem that has afflicted IEF/Composer for many years.

The issues that are raised by CBD components have arisen in the past when having to deal with the question of common action blocks, elementary processes, etc. Do we use a CAB or do we re-code as in-line code?

Fine tuning the code for the specific batch program will always offer better performance at the cost of extra development effort and duplication of code, leading to higher maintenance costs over time.

I feel that the 80/20 rule applies here and that limited duplication of code to achieve the desired level of performance is necessary and justifiable. This does not mean that CBD has no place in batch, merely that you have to be prepared to compromise.

Apart from the overhead caused by calling action blocks, there are many other causes of poor performance. Some of the main ones are:

- Systems are often designed for online execution with little consideration for implications in high volume batch. Symptoms of this include:
  - common routines that are designed for single execution, often re-reading data to avoid having to pass data through import/export views
  - little use of persistent entity views
  - highly normalised data models
- Overuse of IEF-supplied functions (concat, substr, numtext and the real killers, the date routines). The TI runtime routines can often consume a considerable amount of the CPU in a high volume batch program (see the sketch after this list).
- Reluctance to use EABs or external code even when this would result in a significant performance advantage over the Composer equivalent, for example the DB2 load utility for bulk data load.
- Inability to have multiple open cursors in Composer action diagrams.
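
As a tiny illustration of the point about the supplied functions and date routines (Python standing in for the generated code; the example is invented): work that is the same for every record, here formatting today's date, is computed once before the loop rather than through a routine call per row.

from datetime import date
from timeit import timeit

records = range(100_000)

def per_row():
    # Date routine invoked once per record.
    return [f"{r}:{date.today().isoformat()}" for r in records]

def hoisted():
    # Same result, but the invariant work is done once before the loop.
    today = date.today().isoformat()
    return [f"{r}:{today}" for r in records]

print(timeit(per_row, number=1), timeit(hoisted, number=1))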

Something else to consider is the use of 3rd party products to help with the SQL overhead that CBD brings. I know of one company that has had dramatic reductions in CPU through using an SQL caching product that is transparent to the application code.

Regards

Darius Panahy
Information Engineering Technology Ltd
====

Tim Courtney
Dated  : 27 June, 1997 at 16:13
Subject: Re: CBD and Batch at Old Mutual.
 
Mike,

Please don't get too concerned about the timings quoted in my example. They were given to demonstrate how surprised we were when this particular program was initially run. The project team have now spent some time improving the performance so that it falls within acceptable bounds.

However, it would appear from other correspondence I have received on this subject that some feel they would rather allow their batch programs to break component boundaries. We view this very much as a last resort and feel that there are a number of other design and technical options, like those highlighted by Darius, which would solve this problem. Whatever the answer is, if CBD is to be more than just a buzzword of the moment then it will have to address this issue.

On your other points, we are not entirely clear what you mean in items 1 and 4; perhaps you could give small examples. As for item 2, we opted to cache some static data, and I would expect that batch-orientated operations would not fit well in an on-line environment.
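
As a minimal sketch of the kind of static-data caching mentioned above (the component and table names are invented, and functools.lru_cache merely does the bookkeeping): the first read of a code table goes through the component operation, and repeat reads within the same run come from memory.

from functools import lru_cache

class CodeTableComponent:
    def read_permitted_values(self, table_name: str) -> list:
        print(f"operation call (and SQL) for {table_name}")
        return ["A", "B", "C"]

code_tables = CodeTableComponent()

@lru_cache(maxsize=None)
def permitted_values(table_name: str) -> tuple:
    return tuple(code_tables.read_permitted_values(table_name))

for _ in range(3):
    permitted_values("SALARY_TYPE")   # only the first call reaches the component
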
====

Tim Courtney
Dated  : 30 June, 1997 at 09:50
Subject: Re: CBD and Batch at Old Mutual.
 
Hi Darius,

I think the points you list are good examples of problems which can occur when using Composer and most experienced developers could probably add a few more items to the list. It has been my experience that to some extent many Composer sites go through a learning curve with these items and it is perhaps a pity that TI have not been able to help out with things such as the date/time routines. Indeed there are a number of things which we would ask TI/SS to do to the toolset to assist with these issues.

What was also a disappointment to us was the fact that those who purport to know so much about CBD seemed unable to offer advice on CBD in a batch mode. The performance problems we experienced were due in part to some of the issues you listed, but also we believe to the CBD architecture and level of operations which we had defined. For our current and forthcoming CBD projects we hope we have addressed both of these areas.

However, one of the side effects of the performance problem has been that we have had a number of suggestions put forward which would start to break our component boundaries and join them in a much more permanent way. This, to my way of thinking, is starting to defeat the object of CBD, but there are some who would strongly disagree with me on this.

I also know of another site which, independently of us, has already decided to join their components when they are used by batch programs. There has also been a suggestion from another to use DB2 views across components (so-called friendly views). I am not convinced that this is the correct solution to the problem and have a suspicion, based on our experiences here, that the performance problem is due to a combination of poor component boundary definition as well as the issues you listed.
 
From your comments I was also unsure whether you felt that CBD was just the buzzword of today.

------
Regards,
Tim Courtney (MB&A),
Design Architecture Team, Old Mutual, Cape Town, South Africa.
====

Darius Panahy
Dated  : 01 July, 1997 at 10:38
Subject: Re: CBD and Batch at Old Mutual.
 
I do not think that CBD is just a buzz word. I support its aims of promoting reuse of software, not just within a Composer environment but across different technologies. Reusability has always been possible in a Composer world but difficult outside of Composer. If CBD and the Microsoft repository, UML, etc. can deliver the vision of CBD, then this will be a major step forward.

However the constraints of today's tools and technologies need to be recognised when designing systems, especially high volume batch. Just as the early IEF promise of 'programmerless' development proved to be misleading, there is still a need with CBD (and all other development approaches) to design systems carefully.

Darius


3.3) CBD 96
 
 Gerry Wethington
Dated  : 02 January, 1997 at 20:07
 Subject: CBD 96
 
 Here at the Missouri State Highway Patrol we are beginning an aggressive project utilizing the CBD standards.  We will be working closely with TI and their Rosalyn(sp) Center to implement CBD at the Patrol, exercise the standards documented to date and make
 recommendations based upon the results of this project.  It is our intent to move two legacy applications through the process, taking them from older 3GL and 4GL languages to Composer 4.  We hope to take the lessons learned and the resulting standards to form the
 basis for future projects and potentially use them in our Year 2000 project which will begin July 1, 1997.  We are currently having the assigned staff review the CBD material and will be sending them through training in late January.  Project activity is to begin the first week of February, 1997.
 
 We would be happy to share the results.
 
 Gerry Wethington
 ========
 
 Mike Scott
Dated  : 07 January, 1997 at 16:39
 Subject: Re: CBD 96
 
 I would be very grateful if you could keep me posted about your progress. We are about to embark on a small scale project using some of the CBD 96 method.  At present we are not proposing full scale adoption of the standards but will stay fairly near them so any feedback from you will help us avoid the pitfalls and keep on the right track. I will of course share our results with you.
 
 Mike Scott 


  3.4) Can we take an existing data model and tailor it to our needs?
 
 Thiyagu
Dated  : 19 December, 1996 at 09:07
 Subject: Analysis
 
 We have to build a logistics system on IEF-DB2-COBOL. There is an existing system on IDMS/MVS written in COBOL. The existing system handles one region, and the proposed system is a global solution. Our target environment is CICS/DB2. Now, is it a good idea to take the existing IDMS data model and then modify it to meet the new requirements? That is, go directly to data modeling, create the entity types present in the existing system along with their attributes, and draw the ERD.
 
 What effect will it have on the design issues and on later phases?
 ========
 
 Andy Ward
Dated  : 19 December, 1996 at 09:58
 Subject: Re: Analysis
 
 Hi Thiyagu! I don't know if I can help too much on this. As ever, with all these things there's never a straight answer. There is absolutely nothing to stop you taking a data model from a different source and then clicking Composer diagrams against it, but you may lose out on some of the facilities offered by the tool.

 Composer was originally designed to work against DB/2 and has since been targeted at other RDBMSs. The key word here is relational. I know that IDMS was originally not relational, but it may have made some progress along that route. If it is truly relational, then you will be able to make use of this from within Composer. If it is not, then it may be more tricky (for instance, to read along relationships).

 The other concern is whether or not you wish to have a model of your business. Taking a database schema and using it as a data model is fine for all us techy people, but means little to business users. One of the concepts behind Composer was the use of a business model to enable business users to contribute to the development. This is then transformed into a technical design for the particular RDBMS you are using. At this point you would use denormalisation techniques, add indices, duplicate data, etc. as necessary. Leaving out the 'analysis' portion may preclude some of this.

 I hope this has been helpful, but if you want any more info, then post a reply.
 
 Regards
 
 Andy Ward
 ========
 
George Hawthorne
Dated  : 19 December, 1996 at 10:50
Subject: Re: Analysis
 
 There is a tool from Viasoft (your TI account manager should be able to get details) which will take an IDMS schema and create from it a Composer ERD. There is also a tool which captures information from COBOL source code. I don't know how effective these tools are, but they may be labour-saving devices. You would then, of course, have an analysis phase to rationalize the resulting ERD before forward engineering.
 
 TI has a method for approaching the situation you describe. They call it Transition Solutions and are trying it out in a few companies at present. I suggest you ask your account manager for information on it. If the existing application is big it makes sense to automate as much of the re-engineering as possible.