Hallett_German

CA Tuesday Tip: A General Approach to Troubleshooting APM Integration Issues

Discussion created by Hallett_German Employee on Jun 8, 2014
Latest reply on Jun 12, 2014 by Anand Yadav

CA Tuesday Tip: Integration Problems: Visual Inspection/Functional Workflow Approach

Introduction
In Tech #33, I talked about the steps for resolving issues with Problem Resolution Triage (also known as CEM transaction trace or original integration).

Steps for integrating problem-resolution (transaction trace) APM CEM with Introscope:
https://support.ca.com/irj/portal/kbtech?searchID=TEC599818&docid=599818&bypass=yes&fromscreen=productKBDocs&techDocAccess=N

Following that approach should solve many of these issues. However, a steady stream of these cases is still coming in, so I decided to expand what was written into a tech note. (It will be announced in a Community posting when ready.) This got me thinking about what one should do when working an integration issue of any kind. Here are the steps I follow:

Step 0: Do the Pre-work
Before looking at any logs, you should already know the following:

- Which components comprise the integration?
- What function does each component perform?
- Which files need to be configured for the integration to work?
- How will I know that the integration is working?
- Which logs are available for each component? Do they have a debug mode?
- On which server does each component reside in my environment?

To find the answers to these questions, look for the following in the APM documentation:

- A components list
- An architectural diagram
- A workflow diagram or description
- Screenshots of the integration functionality working
- A list of log and configuration file names and locations

A good example of a document having this is the CA SiteMinder Application Server Agents Guide.
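
The pre-work answers can be kept in a small inventory so that gaps are obvious before an issue occurs. The following is a minimal sketch, not a CA tool: the Component class, the open_questions helper, the host name, and the file names in angle brackets are all hypothetical placeholders to be replaced with values from your own documentation and environment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Component:
    """One component of the integration (hypothetical structure)."""
    name: str
    function: str
    server: str = "unknown"
    config_files: List[str] = field(default_factory=list)
    logs: List[str] = field(default_factory=list)
    has_debug_mode: Optional[bool] = None  # None means "not yet checked"


def open_questions(components: List[Component]) -> List[Tuple[str, str]]:
    """Return (component name, unanswered item) pairs from the pre-work list."""
    gaps = []
    for c in components:
        if c.server == "unknown":
            gaps.append((c.name, "server"))
        if not c.config_files:
            gaps.append((c.name, "config_files"))
        if not c.logs:
            gaps.append((c.name, "logs"))
        if c.has_debug_mode is None:
            gaps.append((c.name, "has_debug_mode"))
    return gaps


# Placeholder values only; fill these in from your own environment.
inventory = [
    Component(
        name="Introscope Agent",
        function="Matches transactions against its ruleset",
        server="hostA",
        config_files=["<agent configuration file>"],
        logs=["<agent log file>"],
        has_debug_mode=True,
    ),
    Component(name="TIM", function="<fill in from the documentation>"),
]

for name, item in open_questions(inventory):
    print(f"Pre-work still needed: {name} -> {item}")
```

Any pair that is printed is a question to answer before the next integration issue arrives.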

Step 1: Troubleshooting (In the Fire)

The inevitable issue has occurred, and it is time to show how the pre-work helps in resolving it. Let us get back to Problem Resolution Triage. Two different approaches can be used together:

- Visual Inspection
- Functional Workflow

Visual Inspection
With Problem Resolution Triage, seeing what is and is not appearing can be an important clue as to what to do next. This includes checking:

- What is appearing in the Investigator. (For example, is anything appearing under Custom Metric Host?)
- What is appearing in the APM CE defect. (For example, look in the Introscope view in the APM CE GUI or at the x-wily-info header in the defect.)

If something is not appearing, it is probably a configuration issue or a threshold not being exceeded (such as percent of slow time).
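
The visual inspection can be written down as a simple checklist so that nothing is skipped under pressure. This is only a sketch: the missing_items helper is hypothetical, and the True/False values are filled in by hand after looking at the Investigator and the APM CE GUI. Nothing here queries either product.

```python
def missing_items(observations):
    """Return the names of expected items that were not seen."""
    return [name for name, seen in observations.items() if not seen]


# Recorded by hand after looking at each screen (example values).
observations = {
    "Metrics under Custom Metric Host in the Investigator": False,
    "Introscope view in the APM CE GUI": True,
    "x-wily-info header in the defect": True,
}

for item in missing_items(observations):
    print(f"Not appearing: {item} -> check configuration and thresholds first")
```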


Functional Workflow
After completing the visual inspection approach, you can then look at the workflow diagram or description and determine the following:

- Which was the last step successfully completed?
- What is the next step in the integration workflow?
- Which components are involved in those two steps?

By knowing which components are involved, you can avoid looking needlessly at other components. For example, suppose that, after reading the section "CA APM problem resolution triage overview" in the Configuration and Administration Guide, I determine that a Transaction Trace start request is made as part of a new incident, but the agent is not matching the transaction definition. Knowing the components involved, I would ignore all others, such as the TIM and the database. My focus would be on why the component (APM Introscope Agent) was not successful in performing this function (matching an APM CE transaction against the Introscope Agent ruleset). I would start with an Introscope Agent log in debug mode and a screenshot of the APM CE transaction definition. This saves a great deal of troubleshooting time.
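
The narrowing step described above can be sketched in a few lines. The step list below is an illustrative simplification and not the documented workflow; in particular, which component sends the start request is an assumption made for the example. Step, narrow_focus, and the step descriptions are hypothetical names. Take the real steps and components from the workflow description in the guide.

```python
from typing import List, NamedTuple, Set, Tuple


class Step(NamedTuple):
    description: str
    components: Set[str]


def narrow_focus(workflow: List[Step], last_ok: int) -> Tuple[Set[str], Set[str]]:
    """Return (components to examine, components to set aside).

    last_ok is the index of the last step that completed successfully.
    Only that step and the one after it are treated as suspects.
    """
    if not 0 <= last_ok < len(workflow):
        raise ValueError("last_ok must be a valid step index")
    suspects = workflow[last_ok:last_ok + 2]
    focus = set().union(*(s.components for s in suspects))
    everything = set().union(*(s.components for s in workflow))
    return focus, everything - focus


# Illustrative steps only; component assignments are assumptions.
workflow = [
    Step("Defect detected", {"TIM"}),
    Step("Incident recorded", {"database"}),
    Step("Transaction Trace start request made", {"Enterprise Manager", "Introscope Agent"}),
    Step("Agent matches the transaction definition", {"Introscope Agent"}),
]

focus, set_aside = narrow_focus(workflow, last_ok=2)
print("Examine:", sorted(focus))
print("Set aside for now:", sorted(set_aside))
```

With the start request as the last successful step, the sketch points at the agent side and sets the TIM and the database aside, which mirrors the example in the paragraph above.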

Step 2: Post-Resolution

Build yourself a wiki to keep track of how you used this approach, and add questions to the pre-work, visual inspection, and functional workflow steps as needed.


Questions for Discussion:
1) Does this approach make sense to you?
2) If not, which approach do you use?
3) What other troubleshooting topics would you like to be covered in Tuesday Tips?
