Part 3: Planning / Analysis
3.1) How to Model History?
3.2) Component Based Development (CBD) and Batch
3.3) CBD 96
3.4) Can we take an existing data model and tailor it to our needs?
3.1) How to Model History?
Steve Thomas
Dated : 17 December, 1996 at 16:04:07
Subject: Modeling History
I thought I'd show up with a question (that's how I usually work).
Any suggestions
on modeling history? We need to model current as well as historical
information. We have been thinking on how that is best done.
We could give each record a timestamp identifier, then when a change occurs,
copy the old record to another record in the entity type, but with a different
timestamp to identify and sequence them, with the latest always being
current. Or, we could create a history entity type, and move all
the historical records there, and only keep current ones in the main entity
type. Or maybe some other way.
Any thoughts?
And any references to books, white papers, etc?
Thanks,
and great idea for the Bulletin Board!
====
Andy Ward
Dated : 17 December, 1996 at 16:30:50
Subject: Re: Modeling History
Hi Steve. This has always been a thorny question, and as usual, there
is no straight answer. A lot depends on the use of the entity type and
its likely volumetrics. If you are having a lot of changes to a row, and
are loading history on the same table, then fairly obviously the table
grows pretty quickly. Issues such as fragmentation (both data and index)
quickly occur, and of course any queries to the table may take longer
depending on the access path chosen. You do not mention which RDBMS you
are using as some of the issues may be specific to that. Traditionally,
I always prefer to keep historical information away from current info,
unless operationally there is a need for them to be together. This has
to be looked at for each application. For instance, my company has to
value customer portfolios based on the shares value at the time it was
purchased (for Capital gains tax purposes) and this means that we need
to hold historical information. For the stock itself, we hold the data
purchased, and the price at that time. We still need to keep a history
of prices (for other reasons) and these are kept in a separate table.
You do not mention if the date/time stamp was going to be part of the
key. Often it is desirable from a search criteria to keep it there, but
from an update point of view this is an irritant. Often delete and re-insert
is the answer to this. This also doesn't help the fragmentation case.
On a DB/2 table we used table partitioning to keep the individual tablespaces
down to a reasonable size, which can be a useful technique. We used the
quarter and year as the partitioning attribute (e.g. Q197) and because
you can have 64 partitions, this gives us 16 years before we have to 'drop'
off the earliest. Not quite a year 2000 problem, but certainly a year
2012 problem! Let me know more of the details (e.g. target DB, table size,
query activity, ins/del activity and the like), and maybe I can be of more
help.
Cheers
Andy
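The quarter-and-year partitioning scheme Andy describes (e.g. Q197, 64 partitions, so 16 years of quarters) could be sketched roughly like this; the function name and key format are illustrative, not from any actual schema:

```python
from datetime import date

def partition_key(d: date) -> str:
    """Map a date to a quarter/year partition key, e.g. 'Q197' for Q1 1997."""
    quarter = (d.month - 1) // 3 + 1
    return f"Q{quarter}{d.year % 100:02d}"

# With 64 partitions and 4 quarters per year, the scheme covers
# 64 / 4 = 16 years before the earliest partition must be dropped.
assert partition_key(date(1996, 12, 17)) == "Q496"
assert partition_key(date(1997, 2, 17)) == "Q197"
```

Note the two-digit year in the key is exactly what creates the "year 2012 problem" Andy jokes about: the partitions run out 16 years after the scheme starts.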
====
Steve Thomas
Dated : 18 February, 1997 at 17:30
Subject: History Modeling
We're having a difficult time on a project settling on an approach to
modeling history. We have a requirement that we keep all versions of an
application for services, and be able to recreate them as of a point in
time. So, the original comes in, and over the course of working
on this several amendments to the original are posted. We must be
able to see what the status of this information is at any given time in
its history. The application for service is in
reality a lot of information, 25 separate forms with 2-4 pages each, so
a lot of data.
What is
a good way to model this, so we have the current and all past versions
easily available?
Any ideas
will be appreciated.
====
David Rothschild
Dated : 19 February, 1997 at 01:40
Subject: Re: History Modeling
Of the many possible methods of modeling history, the one that is right
for you has a lot to do with the frequency with which you access the historical
data and whether space or performance is the highest priority. Answering
the following questions will help you decide the best approach:
- Total number of records (not including history) in the table
- Frequency of access to the current record
- Frequency of access to history records
- Frequency with which "what changed" analysis is done
- Is the requirement to produce a value at a date and time, or to detail what changed over time?
====
Doug Scott
Dated : 20 February, 1997 at 19:47
Subject: Re: History Modeling
Are the forms (you say 25, each of 2-4 pages) totally replaced with
each update?
I must confess that my inclination is to have a series of transactions,
and then to retain the transactions together with an annual snapshot of
the master data. That way you can re-create to a chosen point in time,
without the overhead of holding all the intermediate data.
It also satisfies one of my criteria - every transaction must be
reversible. The history transactions then allow me to control a reversal
mechanism should I ever need it.
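Doug's snapshot-plus-transactions idea can be illustrated with a toy replay routine. The record shape and transaction format here are invented for the sketch; the point is that a chosen point in time is recreated from the last snapshot plus the retained transactions, without holding every intermediate version:

```python
from datetime import date

def recreate(snapshot: dict, transactions: list, as_of: date) -> dict:
    """Rebuild master data as of a point in time by replaying transactions
    (each a (date, field, new_value) tuple) on top of the last snapshot."""
    state = dict(snapshot)                    # start from the annual snapshot
    for tx_date, field, value in sorted(transactions):
        if tx_date > as_of:
            break                             # everything later is ignored
        state[field] = value
    return state

snap = {"status": "received", "premium": 100}
txs = [(date(1997, 3, 1), "status", "amended"),
       (date(1997, 6, 1), "premium", 120)]
# State as of April 1997: the amendment applied, the premium change not yet.
assert recreate(snap, txs, date(1997, 4, 1)) == {"status": "amended", "premium": 100}
```

Doug's reversibility criterion would be met by also storing each field's old value in the transaction, so any transaction can be backed out.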
====
Alida Riddell
Dated : 21 February, 1997 at 14:30
Subject: Re: History Modeling
In essence you need to use 'effective dating'. Each record (row)
contains the date it becomes effective, as this is part of the identifier.
For performance reasons, we have implemented two databases (or tables)
where the history with changes is kept on one 'history' database and the
current active version is kept on the operational database.
You are welcome to call me for any further details.
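A minimal illustration of the effective dating Alida describes, where the effective date is part of the key and the version in force at a given date is the row with the latest effective date not after it (table and column names are hypothetical; sqlite stands in for the real RDBMS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Effective date forms part of the primary key, as Alida suggests.
conn.execute("CREATE TABLE price (stock TEXT, effective_date TEXT, value REAL,"
             " PRIMARY KEY (stock, effective_date))")
conn.executemany("INSERT INTO price VALUES (?, ?, ?)",
                 [("ACME", "1996-01-01", 10.0),
                  ("ACME", "1996-07-01", 12.5),
                  ("ACME", "1997-01-01", 11.0)])

# The version in force at a given date is the row with the latest
# effective_date that is not after that date.
row = conn.execute(
    "SELECT value FROM price WHERE stock = ? AND effective_date <= ? "
    "ORDER BY effective_date DESC LIMIT 1", ("ACME", "1996-09-30")).fetchone()
assert row[0] == 12.5
```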
====
Stacy R. Pickett
Dated : 14 July, 1997 at 21:05
Subject: Modelling and processing history
A question which falls under the "associated methods, or related
matters,..." part of the BBS guidelines.
Does anyone know of some good literature (book, white paper, technical
article, etc...) which discusses the pros and cons of different methods
to model and process the history requirements of a system?
By different methods I mean model history in the same entity type,
model it in a separate "history" entity type for every "current
information" entity type, model a completely separate data warehouse,
etc.....
I haven't been able to find a good comparison of the different history
processing methods and thought that I would poll the collective minds
and libraries of the IEF/Composer/COOL:Gen community.
On a completely unrelated topic, I wonder if the new COOL:Gen tool(s)
will install into and over the IEF directory on my PC like the Composer
tool does? Then we'll have COOL:Stuff that we can't find on our PCs because
it's named something completely different than what I am expecting.
8-}
====
Anders Romell, Volvo Data AB
Dated : 18 July, 1997 at 14:10
Subject: Re: Modelling and processing history
I attended a course a couple of years ago called "Advanced
Data Modeling". It was a Codd & Date course but it was presented
by Larry English from "Information Impact International". In
this course we learned a lot of different special case data modeling.
Perhaps LE is giving a course near you or maybe he has written a book?
Check out their Web-site at:
http://www.infoimpact.com
or courses at
http://www.infoimpact.com/educate.htm
====
Steve Thomas
Dated : 24 July, 1997 at 00:37
Subject: Re: Modelling and processing history
Stacy,
Just went through all that history stuff. We produced a few
papers and found a little stuff on it. I do have a file from the
Compuserve TI/IEF forum on this subject, with lots of opinions.
Generally, every approach imaginable was suggested and championed! All
approaches had application in some cases.
I'd be glad to get this together and send to you. I might
have to snailmail some of it, but I can email most of the stuff (two or
three papers, and all the dialog from Compuserve).
Let me see what I can do over the next couple of days.
====
Richard Veryard
Dated : 07 August, 1997 at 18:06
Subject: Re: Modelling and processing history
If you're thinking in object terms, it's worth noting that the behaviour
of the 'history' object is often completely different to the behaviour
of the 'current' object. In fact, all history objects probably share some
common processing characteristics. (In object-speak, they 'inherit' the
properties of some generic history object type.)
It follows that when you partition the data into separate logical
data stores, or onto different physical platforms, it may well make sense
to place the 'history' objects separately from the 'current' objects.
But this gets us into the details of the distribution design for
your chosen technical architecture (e.g. client/server).
That's fine if you're working with a pure object-oriented platform.
But what about Composer (or Cool, as I guess I'm going to have to get
used to calling it)?
If you're trying to use Composer/Cool in an object-oriented way,
you'd have separate entity types for the history data and for the current
data. One problem here is that you may want to use the same UI - the same
screens or windows - for the current data and the history data. Then if
you've put the current data and the history data into different places,
you may have to have a component that accesses both places and presents
both current and history data onto the same windows in the same format.
I don't think there is a single right answer - there never is -
but I'm inclined to favour a more object-oriented approach - separating
current and history data - in the interests of future maintainability.
For one thing, it makes it much easier to implement changes to the data
structure if you only have to convert current data and not history data
as well. Similar arguments apply to data portability - moving data across
platforms.
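Richard's point that all history objects share common processing characteristics, inherited from a generic history object type, might look like this in object terms (the class names and fields are invented for illustration):

```python
from datetime import datetime

class HistoryRecord:
    """Generic behaviour shared by all history objects: they are
    timestamped, frozen once captured, and ordered by time."""
    def __init__(self, captured_at: datetime, data: dict):
        self.captured_at = captured_at
        self._data = dict(data)   # frozen copy; history is never updated

    def sort_key(self):
        return self.captured_at

class CustomerHistory(HistoryRecord):
    """A specific history type inherits the generic behaviour and adds
    only what is particular to customers."""
    def __init__(self, captured_at: datetime, data: dict):
        super().__init__(captured_at, data)
        self.customer_id = data["customer_id"]

h = CustomerHistory(datetime(1997, 2, 18), {"customer_id": 42, "status": "amended"})
assert h.customer_id == 42
```

The behavioural split is the argument for partitioning: current objects get update operations, history objects get only capture and retrieval.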
Hope this helps. Feel free to contact me if you want to discuss
further.
Richard
rxv@veryard.com
http://www.veryard.com
3.2) Component Based Development (CBD) and Batch
Tim Courtney
Dated : 09 June, 1997 at 09:31
Subject: CBD and Batch at Old Mutual.
The following article is probably only of interest to those of you involved
with
component based development (CBD). However, I encourage anybody and everybody
to comment if they wish.
Tim Courtney
Design Architecture Team, Old Mutual, Cape Town, South Africa.
-----------------------------------------------------------------
Objective.
What follows
is a summary of what has been done at OM with the batch side of the Flexible
Investment Choice (FIC) CBD project. My objective in circulating this
is to gather opinions and experiences in order that we may put together
a set of guidelines for future CBD projects which have a need to
use batch programming styles.
Brief history
About 2
years ago we started on the component identification and construction
phase of the FIC project. This was not the first CBD project at OM, but
it was the first to use the then current CBD architecture which
would have a need to use batch processing on input/output files from/to
OM clients. In order to identify the operations (previously called methods)
and to confirm the boundaries of the components, the project went through
a transaction analysis phase. This was chiefly done with the on-line processes
and produced a set of operations for each component to support the user
interface, in this case a GUI C/S front end.
At this
time the architecture at OM called for the user interface to be outside
of the component boundaries and as a result of what was learnt during
this project the architecture has evolved such that this is no
longer always true. It is probably also worth saying that the project
had identified ten components, of which two were of type 2 - Infrastructure
and the remainder fell into type 3 - Domain Generic or 4 - Domain Specific.
The Batch issue
The latter
part of this project involved the construction of a number of batch programs
to process files received from or to be sent to OM clients. A number of
common themes ran through all of these batch programs, mainly that they
all needed to use the code table (type 2) component and they all used
at least two of the other business (type 3 or 4) components. The question
then arose as to where the core (main loop) of the batch program should
lie. Should it be within a component, and if so which one, or should it
lie outside of all of the components. After much debate, the decision
was taken to place the core of the batch programs outside of all of the
components. This was done because it was felt that if the core was placed
inside of a business component, which then required information from other
business components (via operations), these components would start to
develop an affinity for each other which was considered to be a bad trend.
If you thought
of the core as a sort of user interface, then this was also in line
with the OM architecture in force at that time, namely that the user interface
layer should lie outside of the component.
The result
of this was that the core of the program lay outside of all of the components
and had to access all component data required via operations. The problem
with this was the performance of the resulting batch programs was appalling.
In a traditional environment a constructor could join tables and maintain
open cursors on the required tables, but by forcing them to use operations
this flexibility was lost. It was realised
at the time that the component approach had a cost in terms of program
efficiency, but the scale of that cost didn't become apparent until later.
Now, having put some additional effort into tuning these batch programs
and the component operations, we have improved the performance without
breaking our component architecture. It could be argued that this should
have been done in the first place, but nobody expected the performance
cost to be so high.
What next?
Since the
FIC project we have updated our architecture. The main amendments
to this are that :-
a) components may in certain cases now contain a user interface.
b) some business components seem always to have an affinity for other
business components. This is now recognised and projects must now define
these inter component relationships up front.
What we
would like to do now is to establish a set of batch design and construction
guidelines for building batch CBD programs in the future. From our experiences
with FIC a number of observations and views have been expressed which
may provide input to these guidelines. Please note what follows is not
OM CBD policy, but rather opinions from various people which may help
in forming the CBD batch guidelines which we are after. The statements
are intended to provoke comment and I have not included any comments on
them, rather this is what I would like to come from you.
- Batch programs have no place in a CBD environment.
- Batch programs should not be constrained by component boundaries, but
should be allowed to cross these boundaries, join tables etc.
- The cost in poor performance will be outweighed by the improvement
in our ability to build systems to meet our business needs and should
therefore be ignored.
- Allowing a batch program's core to be within a component will improve
performance sufficiently to bring the performance cost within acceptable
boundaries.
- The concept of batch programs is forced on us by our historical view
of processing data. In some environments there is no concept of batch
and we should rather adjust our view of how to process data to a more
transaction based concept.
- Some experienced developers have a very fixed and traditional view of
how systems should be designed and built, and don't have the flexibility of
mind to embrace the use of components and address the issues that they
raise. (I know this is a real bitchy point, but I note that it has been
raised by a number of different CBD sites.)
- The transaction analysis should have taken more account of the batch
processing requirements which may in turn have affected the component
boundaries and operations.
- Our operations were designed for an on-line environment, so why is anybody
surprised that, when called from a batch program, they perform badly? What
we should really have is two types of operations, one for on-line and
one for batch programs.
An example.
To aid further
discussion I have included an example of one of our batch programs with
component names. In case you were not aware of what we do, our business
is about providing Employee Benefits schemes for a number of large corporate
clients.
Batch member
update program.
An 80 000
record file is loaded to the DB2 Application layer tables. The Batch
program core exists within the application (user interface) layer. In
order to process the records, information from 4 business components is
required. In some cases, a component is visited more than once for
information, based on different criteria. The majority of the information
is stored/retrieved utilising effective date.
The program
uses the following components :-
CLIENT
CAR (Client Agreement Role)
AGREEMENT
RISK
CODE TABLE & PERMITTED VALUES
Program Flow
For each input record of type 1060:-
The
CLIENT component is read to get the Client relationship details, in order
to obtain the salary details from the CLIENT component. If there has been
a salary increase, the new salary is inserted into the CLIENT component.
The CAR component is visited to get the employment details, which are used
to get the Plan details from the AGREEMENT component. The AGREEMENT component
is accessed to determine the annualising factor, based on the salary type,
payment details and frequency; this is used to calculate the annual salary,
which is then inserted into the CAR component under membership details.
The RISK component is called to determine the formula, based on the age
of the Client and frequency of payment, for calculating the benefits;
the new benefits are then inserted into the CAR component.
These stats
may not mean much to you, but just in case you are interested, the size
of the last end-of-month file from just one customer site was 31,564 records.
After validation, only 564 were processed, in 6.04 minutes.
The TOTAL number of DB2 statements executed in these 6.04 minutes = 535,575,
made up of the following:
Select = 93,311
Insert = 1,374
Update = 30,924 (this being a status indicator on the application table)
Open cursor = 126,294
Close cursor = 126,292
Fetch from cursor = 157,380
====
Mike Scott
Dated : 20 June, 1997 at 22:42
Subject: Re: CBD and Batch at Old Mutual.
Tim,
Thanks for
the information, it has come at just the right time for us. We are
currently designing a sales bonusing system. The rules for the bonus
engine are likely to change every year and so we have decided to componentize
(is that a word?) rather than try to build a complicated data-driven rule
based system. The main part of the system will be batch and is likely
to call operations of 3 or 4 domain specific components. We will
have to process about one million input records each month and run the
bonus engine for a few thousand employees. Your statements have caused
a bit of agitation here as your timings would appear to suggest that it
will take us over a week to process a million records! Or have I
misunderstood?
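For what it's worth, Mike's extrapolation can be checked against the figures quoted in Tim's example (564 records processed in 6.04 minutes):

```python
# Back-of-envelope check of the "over a week" extrapolation.
records = 564
minutes = 6.04
rate_per_min = records / minutes                 # about 93 records per minute
days_for_million = 1_000_000 / rate_per_min / (60 * 24)
print(f"{days_for_million:.1f} days")            # prints "7.4 days": over a week
```

So at the un-tuned rate, a million records would indeed take more than seven days; Tim's follow-up below confirms the quoted timings were the pre-tuning worst case.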
You asked
for comments on your statements, here is my two penny worth.
- CBD must support batch for it to be any use in large enterprises.
Batch is not just a technical convenience there are business reasons for
doing it. In any case if components can't be made to perform and you go
large scale down this route and attempt to do all business operations online
then online operations will be too slow to use.
- Allowing systems to go under the covers defeats the object of components.
If you start by allowing it for batch it is not long before you discover
other exceptions and eventually we will end up having to invent another
method to make CBD usable (just like CBD did for OO).
- I agree with the sentiment about performance, in a few years time the
machines will be so fast we will wonder what the fuss was about (our COBOL
standards used to tell us to use GO TO instead of PERFORM because it
saved a couple of machine cycles). However in the short term we must develop
ways of making components perform.
- I don't know if it is possible to add a batch core to components, by
their nature batch programs will need to call the operations of a number
of components. I wonder if it is possible to add operations to components
which do some of the batch work.
- Batch is not going to go away in the short term so we need to tackle
the issues now.
- As you said the culture issue has been discussed before and I don't
have anything to add except that we had the same battles with CASE tools
so maybe we can learn from that.
- A good point. It looks like batch performance is another factor
which must be considered when putting the component architecture together.
- I can see a need to have some operations which will only be used in
a batch environment, but we will be creating a maintenance nightmare if
we start designing parallel versions of operations.
Other points.
- Is it a problem of maintaining context on an object instance?
- Would memory caching of object instances help? What are the side effects
of this?
- If all operations are designed with batch in mind, will online performance
suffer?
- Accessing components via their specifications should allow one to tune
the implementation without affecting the application so is there a better
way to implement?
This is
a critical issue for us and will seriously affect the success of our latest
CBD project if we do not get it resolved. So I would appreciate
any contributions to this thread.
Regards,
Mike Scott
====
Darius Panahy
Dated : 23 June, 1997 at 21:41
Subject: Re: CBD and Batch at Old Mutual.
Hi Tim,
My initial
impression is that this is an extreme example of a problem that has afflicted
IEF/Composer for many years.
The issues
that are raised by CBD components have arisen in the past when having
to deal with the question of common action blocks, elementary processes,
etc. Do we use a CAB or do we re-code as in-line code?
Fine tuning
the code for the specific batch program will always offer better performance
at the cost of extra development effort and duplication of code, leading
to higher maintenance costs over time.
I feel that
the 80/20 rule applies here and that limited duplication of code to achieve
the desired level of performance is necessary and justifiable. This does
not mean that CBD has no place in batch, merely that you have to be prepared
to compromise.
Apart from
the overhead caused by calling action blocks, there are many other causes
of poor performance. Some of the main ones are:
- Systems
are often designed for online execution with little consideration for
implications in high volume batch. Symptoms of this include:
- Common routines that are designed for single execution, often re-reading
data to avoid having to pass data through import/export views
- little use of persistent entity views
- highly normalised data models
- Overuse of IEF supplied functions (concat, substr, numtext and the real
killers, the date routines). The TI runtime routines can often consume
a considerable amount of the CPU in a high volume batch program.
- Reluctance to use EABs or external code even when this would result
in a significant performance advantage c.f. Composer equivalent, for example
DB2 load utility for bulk data load.
- Inability to have multiple open cursors in Composer action diagrams.
Something
else to consider is the use of 3rd party products to help with the SQL
overhead that CBD brings. I know of one company that has had dramatic
reductions in CPU through using an SQL caching product that is transparent
to the application code.
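The kind of caching Darius alludes to, applied in-process rather than via a third-party product, could be sketched as a simple read-through cache for reference data in a batch run (the class and the fetch callback are illustrative, not any real product's API):

```python
class ReadCache:
    """Read-through cache for reference data in a batch run: each key is
    fetched once (e.g. one SELECT), then served from memory on repeat visits."""
    def __init__(self, fetch):
        self._fetch = fetch          # the expensive read, e.g. a DB2 SELECT
        self._store = {}
        self.hits = self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._fetch(key)
        return self._store[key]

calls = []
cache = ReadCache(lambda k: calls.append(k) or k.upper())
for k in ["gbp", "usd", "gbp", "gbp"]:
    cache.get(k)
assert calls == ["gbp", "usd"]                 # each key fetched only once
assert (cache.hits, cache.misses) == (2, 2)
```

For something like the code table component in Tim's example, which every input record touches, this is where the bulk of the repeated SELECTs disappears; the trade-off is staleness if the cached data can change mid-run.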
Regards
Darius Panahy
Information Engineering Technology Ltd
====
Tim Courtney
Dated : 27 June, 1997 at 16:13
Subject: Re: CBD and Batch at Old Mutual.
Mike,
Please don't
get too concerned about the timings quoted in my example. They were given
to demonstrate how surprised we were when this particular program was
initially run. The project team have now spent some time improving the
performance so that it falls within acceptable bounds.
However
it would appear from other correspondence I received on this subject that
some feel that they would rather allow their batch programs to break component
boundaries. We view this as very much a last resort and feel that there
are a number of other design and technical options like those highlighted
by Darius which would solve this problem. Whatever the answer is, if CBD
is to be more than just a buzzword of the moment, then it will have to
address this issue.
On your
other points we are not entirely clear what you mean in items 1 and 4,
perhaps you could give small examples. As for item 2, we opted to cache
some static data and I would expect that batch orientated operations would
not fit well in an online environment.
====
Tim Courtney
Dated : 30 June, 1997 at 09:50
Subject: Re: CBD and Batch at Old Mutual.
Hi Darius,
I think
the points you list are good examples of problems which can occur when
using Composer and most experienced developers could probably add a few
more items to the list. It has been my experience that to some extent
many Composer sites go through a learning curve with these items and it
is perhaps a pity that TI have not been able to help out with things such
as the date/time routines. Indeed there are a number of things which we
would ask TI/SS to do to the toolset to assist with these issues.
What was
also a disappointment to us was the fact that those who purport to know
so much about CBD seemed unable to offer advice on CBD in a batch mode.
The performance problems we experienced were due in part to some of the
issues you listed, but also we believe to the CBD architecture and level
of operations which we had defined. For our current and forthcoming CBD
projects we hope we have addressed both of these areas.
However
one of the side effects of the performance problem has been that we have
had a number of suggestions put forward which would start to break our
component boundaries and join them in a much more permanent way. This
to my way of thinking is starting to defeat the object of CBD, but
there are some who would strongly disagree with me on this.
I also know
of another site which, independently of us, has already decided to join their
components when being used by batch programs. There has also been a suggestion
from another to use DB2 views across components (so called friendly views).
I am not convinced that this is the correct solution to the problem and
have a suspicion, based on our experiences here that the performance problem
is due to a combination of poor component boundary definition as well
as the issues you listed.
From your comments I was also unsure if you felt that CBD was just the
buzz word of today.
------
Regards,
Tim Courtney (MB&A),
Design Architecture Team, Old Mutual, Cape Town, South Africa.
====
Darius Panahy
Dated : 01 July, 1997 at 10:38
Subject: Re: CBD and Batch at Old Mutual.
I do not think that CBD is just a buzz word. I support its aims of promoting
reuse of software, not just within a Composer environment but across different
technologies. Reusability has always been possible in a Composer world
but difficult outside of Composer. If CBD and the Microsoft repository,
UML, etc. can deliver the vision of CBD, then this will be a major step
forward.
However
the constraints of today's tools and technologies need to be recognised
when designing systems, especially high volume batch. Just as the early
IEF promise of 'programmerless' development proved to be misleading, there
is still a need with CBD (and all other development approaches) to design
systems carefully.
Darius
3.3) CBD 96
Gerry Wethington
Dated : 02 January, 1997 at 20:07
Subject: CBD 96
Here at the Missouri State Highway Patrol we are beginning an aggressive
project utilizing the CBD standards. We will be working closely
with TI and their Rosalyn(sp) Center to implement CBD at the Patrol, exercise
the standards documented to date and make
recommendations based upon the results of this project. It
is our intent to move two legacy applications through the process, taking
them from older 3GL and 4GL languages to Composer 4. We hope to
take the lessons learned and the resulting standards to form the
basis for future projects and potentially use them in our Year 2000
project which will begin July 1, 1997. We are currently having the
assigned staff review the CBD material and will be sending them through
training in late January. Project activity is to begin the first
week of February, 1997.
We would be happy to share the results.
Gerry Wethington
========
Mike Scott
Dated : 07 January, 1997 at 16:39
Subject: Re: CBD 96
I would be very grateful if you could keep me posted about your
progress. We are about to embark on a small scale project using some of
the CBD 96 method. At present we are not proposing full scale adoption
of the standards but will stay fairly near them so any feedback from you
will help us avoid the pitfalls and keep on the right track. I will of
course share our results with you.
Mike Scott
3.4) Can we take an existing data model and tailor it to our needs?
Thiyagu
Dated : 19 December, 1996 at 09:07
Subject: Analysis
We have to build a logistics system on IEF-DB2-COBOL. There is an
existing system on IDMS/MVS written in COBOL. The existing system is for
handling one region, and the proposed system is a global solution. Our target
environment is CICS/DB2. Now, is it a good idea to take the existing IDMS data
model and then modify it to meet the new requirements? That is, go directly to
data modeling, create the entity types present in the existing system along
with their attributes, and draw the ERD.
What effect will it have on the design issues and on later
phases?
========
Andy Ward, Dated : 19 December, 1996 at 09:58
Subject: Re: Analysis
Hi Thiyagu! I don't know if I can help too much on this. As ever,
with all these things there's never a straight answer. There is absolutely
nothing to stop you taking a data model from a different source, and then
clicking Composer diagrams against it, but you may lose out on some of
the facilities offered by the tool. Composer was originally designed to
work against DB/2 and has since been targeted at other RDBMS. The key
word here, is relational. I know that IDMS was originally not relational,
but may have made some progress along that route. If it is truly relational,
then you will be able to make use of this from within Composer. If it
is not, then it may be more tricky (for instance to read along relationships).
The other concern is whether or not you wish to have a model of your business.
Taking a database schema and using this as a data model is fine for all
us techy people, but means little to business users. One of the concepts
behind Composer was the use of a business model to enable business users
to contribute to the development. This is then transformed into a technical
design for the particular RDBMS you are using. At this time, you would
use denormalisation techniques, add indices, duplicate data, etc. as necessary.
Leaving out the 'analysis' portion may preclude some of this. I hope this
has been helpful, but if you want any more info, then post a reply.
Regards
Andy Ward
========
George Hawthorne
Dated : 19 December, 1996 at 10:50
Subject: Re: Analysis
There is a tool from Viasoft (your TI account manager should be
able to get details) which will take an IDMS schema and create from it
a Composer ERD. There is also a tool which captures information from COBOL
source code. I don't know how effective these tools are but it may be
a labour-saving device. You would then, of course, have an analysis phase
to rationalize the resulting ERD before forward engineering.
TI has a method for approaching the situation you describe. They
call it Transition Solutions and are trying it out in a few companies
at present. I suggest you ask your account manager for information on
it. If the existing application is big it makes sense to automate as much
of the re-engineering as possible.