Monday, June 23, 2008

Goodbye BAM/BI, hello SAM/SI

OK, not quite, but it's a nice title ;-)

The term Business Activity Monitoring (BAM) is used to describe the real-time access to critical business performance metrics in order to improve the efficiency and effectiveness of business processes. Real-time process/service monitoring is a common capability supported in many distributed infrastructures. However, BAM differs in that it draws information from multiple sources to enable a broader and richer view of business activities. BAM also encompasses Business Intelligence (BI) as well as network and systems management. Plus BAM is often weighted toward the business side of the enterprise.

Although BAM was popularized by BPM, the fundamental basis behind it (monitoring the activities in an environment and informing interested parties when certain events are triggered) has been around since the early days of (distributed) system management and monitoring. BAM specializes this general notion and targets the business analyst. Using BAM within an autonomic infrastructure is often difficult if not impossible (depending upon the implementation in question).

Within a distributed environment (and many local environments) services are monitored by the infrastructure for a number of reasons, including performance and fault tolerance, e.g., detecting when services fail so that new instances can be automatically started elsewhere. Over the years distributed system implementations have typically provided different solutions to specific monitoring requirements, e.g., failure detection (or suspicion) would be implemented differently from that used to detect performance bottlenecks. For some types of event monitoring this leads to overlap and possible inefficiencies. For instance, some approaches to detecting (or suspecting) failures may also be used to detect services that are simply slow, indicating problems with the network or overloaded machine on which the service resides. But where these ad hoc approaches have differed from BAM/BI is in their intended target audience: other software components (e.g., a load balancer) rather than humans.

This separation of audience is useful from a high-level perspective: business analysts shouldn't have to be concerned about low-level infrastructural details. But in many cases this ad hoc (bolt-on) approach to BAM and BI can lead to less information being delivered to the entities that need it at the time they need it. Therefore, within the Overlord project we are working on Service Activity Monitoring (SAM) and associated Service Intelligence (SI), which will provide an architecture (and corresponding infrastructure) that brings together many different approaches to entity monitoring within distributed systems (where an entity could be a service, a machine, a network link or something else entirely) and particularly SOIs. The emergence of event processing has also seen an impact on this general entity monitoring, where some implementations treat failure, slowness to respond etc. as particular events. This uniform monitoring includes the following:

• Message throughput (the number of messages a service can process within a unit of time). This might also include the time taken to process specific types of messages (e.g., how long to do transformations).
• Service availability (whether or not the service is active).
• Service Mean Time To Failure (MTTF) and Mean Time To Recovery (MTTR).
• Information about where messages are sent.

The information is made available to the infrastructure so that it may be able to take advantage of it for improved QoS, fault tolerance etc. The streams may be pulled from existing infrastructure, such as availability probing messages that are typically used to detect machine or service failures, or may be created specifically for the SAM environment. Furthermore, streams may be dynamically generated in real-time (and perhaps persisted over time) or static, pre-defined information, where the SAM can be used to mine the data over time and based on explicit queries.

With the advent of SAM we will see BAM implementations that are built on it, narrowing the types of events of interest for the business analyst. The SAM approach offers more flexibility and power to monitoring and management over the traditional BAM approaches. As BPM and SOA move steadily towards each other, this kind of infrastructure will become more important to maintaining agility and flexibility.

No comments: