After a bit more digging, I noticed that the trellis probe is encountering issues on startup:
Oct 21 14:08:57:615 [main, trellis] Initiator 'com.ca.trellis.persist.relational.DataSourceInitiator' threw an exception during application.
Oct 21 14:08:57:615 [main, trellis] Reason:
Oct 21 14:08:57:616 [main, trellis] com.lift.SystemException: configuration
Caused by: (4) not found, Received status (4) on response (for sendRcv) for cmd = 'nametoip' name = 'data_engine'
Oct 21 14:08:59:004 [main, trellis] Initiator 'com.ca.trellis.persist.relational.PersistenceUnitInitiator' threw an exception during application.
Oct 21 14:08:59:004 [main, trellis] Reason:
Oct 21 14:08:59:004 [main, trellis] com.ca.trellis.spi.deployment.DeploymentException: Referenced object identified by 'tnt2-ds' did not existPlease fix your configuration
Oct 21 14:08:59:980 [main, trellis] Caught exception while trying to start Trellis. The probe should be responsive, but Trellis isn't
Oct 21 14:08:59:980 [main, trellis] java.lang.IllegalStateException: org.springframework.context.annotation.AnnotationConfigApplicationContext@e4f8592 has not been refreshed yet
So to test, I started the data_engine probe on the Secondary, since the documentation appears to list it as a requirement. The trellis log now looks better, except for the ACE messages. I am not sure whether the ACE probe is required for the sdgtw:
Oct 21 14:51:17:308 [main, trellis] Creating Shift Context
Oct 21 14:51:17:376 [main, trellis] Registering service: class com.nimsoft.events.nas.NasAlarmServiceImpl
Oct 21 14:51:18:272 [main, trellis] Creating Shift Context
Oct 21 14:51:18:273 [main, trellis] Registering service: class com.ca.uim.services.ugs.DefaultGroupService
Oct 21 14:51:18:273 [main, trellis] Registering service: class com.ca.uim.tnt2.services.DefaultLegacyGroupService
Oct 21 14:51:18:273 [main, trellis] Registering service: class com.ca.uim.ugs.metadata.FlywayMigrationService
Oct 21 14:51:18:273 [main, trellis] Registering service: class com.ca.uim.tnt2.services.DefaultComputerSystemService
Oct 21 14:51:18:273 [main, trellis] Registering service: class com.ca.uim.tnt2.services.DefaultConfigurationItemService
Oct 21 14:51:18:925 [taskScheduler-1, trellis] ACE could not be located. Not configuring
Oct 21 14:51:19:120 [main, trellis] ****************[ Starting ]****************
Oct 21 14:51:19:120 [main, trellis] 2.01
Oct 21 14:51:23:748 [main, trellis] Failed to contact ACE. Configuration
After 'Resolving' a ticket within ServiceNow, I am now getting a different message in the sdgtw trace log. It appears to be ignoring the incident now:
Oct 21 14:30:18:368 [ServiceNow, sdgtw] responseCode :: [200] response messege :: [OK]
Oct 21 14:30:18:375 [ServiceNow, sdgtw] Incident found for closing [com.ca.integration.normalization.omodel.Incident@14982fb2]
Oct 21 14:30:18:375 [ServiceNow, sdgtw] Completed executing the filter. Number of records returned - 1
Oct 21 14:30:18:375 [ServiceNow, sdgtw] Ignoring the incidentId '198d782ddb992b80995a791c8c961905' as it is not associated with any Alarm.
But the thing is, there is an alarm with that ID in the console. I am not sure why it is being ignored.
...and now the trellis probe is kicking out some more interesting log messages. It keeps repeating this:
Oct 21 14:58:57:579 [attach_socket, trellis] Dispatcher caught unchecked service exception. This could be normal behavior, but you may want to examine it anyway
Additionally, while comparing the production Trellis to the test Trellis: both of them log "ACE could not be located. Not configuring", but the prod Trellis also logs "Failed to contact ACE". So I looked at the ACE logs for both prod and test, and they both have:
Oct 21 15:11:24:679 ERROR [attach_socket, com.nimsoft.nimbus.NimServerSession] Exception in NimServerSessionThread.run. Closing session.
Oct 21 15:11:24:680 ERROR [attach_socket, com.nimsoft.nimbus.NimServerSession] (2) communication error, Error when trying to send on session (S) com.nimsoft.nimbus.NimServerSession(Socket[addr=/10.240.135.14,port=56388,localport=48033]): Software caused connection abort: socket write error
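For anyone repeating the prod-versus-test comparison, a small script can strip the timestamp and thread prefix and list the messages that appear in one log but not the other. This is only a minimal sketch, assuming every line follows the "Mon DD HH:MM:SS:mmm [thread, probe] message" layout seen in the excerpts above (with an optional level such as ERROR). The function names are made up for illustration and are not part of any UIM tooling.

```python
import re

# Layout assumed from the excerpts: "Oct 21 14:51:18:925 [thread, probe] message",
# optionally with a level such as ERROR in front of the bracket.
LINE_RE = re.compile(
    r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}:\d{3} "
    r"(?:[A-Z]+ )?\[[^\]]*\] (?P<msg>.*)$"
)


def messages(lines):
    """Return the set of message texts with timestamp and thread stripped."""
    found = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            found.add(match.group("msg"))
    return found


def only_in_first(first_lines, second_lines):
    """Messages that occur in the first log but never in the second."""
    return sorted(messages(first_lines) - messages(second_lines))


if __name__ == "__main__":
    # Sample lines for illustration; the timestamp on the test line is invented.
    prod = [
        "Oct 21 14:51:18:925 [taskScheduler-1, trellis] ACE could not be located. Not configuring",
        "Oct 21 14:51:23:748 [main, trellis] Failed to contact ACE. Configuration",
    ]
    test = [
        "Oct 21 14:49:02:101 [taskScheduler-1, trellis] ACE could not be located. Not configuring",
    ]
    for msg in only_in_first(prod, test):
        print(msg)
```

In practice the two lists would come from reading each probe's log file line by line. Messages that embed ports or object hashes will always show up as different, so those would need extra normalizing.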
...I decided to restart the ACE probe, because why not, and only the production Trellis logged the following:
Oct 21 15:05:41:368 [attach_socket, trellis] An exception occurred while processing a message from Socket[addr=/10.240.135.14,port=56171,localport=48043].
Oct 21 15:05:41:368 [attach_socket, trellis] (120) Callback error, Exception in callback for public void com.ca.trellis.shift.core.TrellisDispatchCoordinator.dispatch(com.nimsoft.nimbus.NimSession,com.nimsoft.nimbus.PDS) throws com.nimsoft.nimbus.NimException: No qualifying bean of type [com.ca.trellis.shift.core.ShiftDispatcher] is defined: No qualifying bean of type [com.ca.trellis.shift.core.ShiftDispatcher] is defined
Looks like we have circled back around to this 'dispatcher'. Does anyone have any insight on this one?