Why web response time is so critical, and why Jennifer is the best tool for managing it: Jennifer approaches the problem from the user's point of view.
Why Monitor Service Response Time?
Web Application Server (WAS) service response time may be the most important measure of customer satisfaction. Even if a system contains bugs, a bug that affects neither service response time nor the site's functionality is not a problem from the customer's point of view. Conversely, even if no bugs are found, a system whose service response time is too slow to satisfy customers has a problem and cannot be considered healthy. Service response time is an important source of information for measuring a system's stability and diagnosing its problems. The following sections describe how to use service response time to resolve system performance issues and why monitoring system resources alone is not the correct approach.
Resource Usage Cannot Exceed 100%
System resource usage cannot exceed 100%. This means that system resource usage cannot be used to diagnose system capacity. Let's take a look at a situation where vmstat is being used to monitor CPU usage. CPU usage is constantly very high, at 95% to 100%. Is this a problem? Most system administrators cannot tell. All they can say is that the CPU is being used heavily. They cannot determine whether the number of incoming requests exceeds system capacity just by monitoring system resources. For example, let's say that it takes 20 concurrent requests to max out the CPU of a WAS server. What if there are 30 concurrent requests? Whether there are 20 or 30 requests, CPU usage will be 100% in both cases. And of course, administrators usually do not know how many concurrent incoming requests it takes to max out a system resource.
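The 20-versus-30 example can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that CPU usage grows linearly with concurrent requests until it saturates; the function name `cpu_utilization` and the `saturation_point` parameter are hypothetical and are not part of Jennifer or vmstat.

```python
def cpu_utilization(concurrent_requests: int, saturation_point: int) -> float:
    """Return CPU usage in percent under an assumed linear model, capped at 100."""
    return min(100.0, 100.0 * concurrent_requests / saturation_point)


# Assumed saturation point of 20 concurrent requests, as in the example above.
for n in (10, 20, 30):
    print(f"{n} concurrent requests -> {cpu_utilization(n, saturation_point=20)}% CPU")
```

Under this model, 20 and 30 concurrent requests both report 100%, so the CPU figure alone cannot show that the second case is 50% over capacity.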
Monitoring all system resources is inefficient.
Another limitation of resource monitoring is that there are too many things to monitor. Any given system has many hardware and software resources, such as CPU, memory, network, heap, and connection pools. Monitoring all of these resources individually is inefficient (and probably impossible), and it is not really necessary either.
Incoming requests exceeding system capacity results in delayed response time.
To overcome the limitations of system resource monitoring, service response time must be monitored. When incoming requests exceed system capacity, service response time keeps increasing, which tells administrators that a resource shortage exists somewhere in the system. Because response time increases whenever any system resource is lacking, response time can stand in for monitoring the individual resources.
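A rough way to see why response time keeps growing under overload is a per-second backlog model. The sketch below is an assumption-laden illustration, not a description of how Jennifer measures anything: it assumes a fixed arrival rate, a fixed processing capacity, and first-come-first-served handling, and the function name `simulate_waits` is hypothetical.

```python
def simulate_waits(arrivals_per_sec: int, capacity_per_sec: int, seconds: int) -> list:
    """Return the approximate queueing delay (in seconds) at the end of each second."""
    backlog = 0  # requests accepted but not yet processed
    waits = []
    for _ in range(seconds):
        # Requests that cannot be processed within this second carry over.
        backlog = max(0, backlog + arrivals_per_sec - capacity_per_sec)
        # A newly arriving request waits for the whole backlog to drain first.
        waits.append(backlog / capacity_per_sec)
    return waits


print(simulate_waits(arrivals_per_sec=15, capacity_per_sec=20, seconds=5))
print(simulate_waits(arrivals_per_sec=30, capacity_per_sec=20, seconds=5))
```

With 15 requests per second against a capacity of 20, the delay stays at zero. With 30 requests per second, the delay grows every second for as long as the overload lasts, even though resource usage would read 100% the whole time.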
Response time must be measured per transaction.
How, then, should service response time be measured? Before discussing this point, let's look at the relationship between services and resources. In a web system, a service may interact with many different components, such as classes, databases, LDAP, and files, and once the system resources tied to each component are counted, the total can be very large. In addition, some resources are used only by specific requests, while others are shared by many different services. In short, the relationship between resources and services is N:M and cannot be clearly defined. An N:M relationship like this cannot be expressed as an average response time, grouped by service name or functional category, in a line graph. Instead, each individual transaction must be plotted separately.
There are a few reasons why Jennifer monitors service response time individually rather than in groups.
First, when identical services are executed multiple times, the response time may be delayed for specific transactions only. No matter how the grouping is done, an individual service's response time is diluted when it is averaged with the other services in the group. Second, there is a mapping issue between response time and profiling. If services are grouped by service name, the mapping makes some sense, but if they are grouped by business object, the mapping becomes too complicated to use effectively. Third, services cannot be classified easily by name. Because a service name is determined by the initial request that called it, the name does not capture the internal changes that occur while the request is processed. Grouping different services simply because they share a service name, without knowing those internal changes, is not a very effective way to group them.
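The dilution effect in the first point is easy to reproduce. The following sketch uses made-up response times for ten calls to the same service, one of which is badly delayed; the sample values and the 3000 ms threshold are assumptions chosen only for illustration.

```python
from statistics import mean

# Hypothetical response times (ms) for ten executions of the same service.
response_times_ms = [120, 110, 130, 125, 115, 9000, 118, 122, 127, 113]

# Grouped view: a single averaged number for the whole service.
average_ms = mean(response_times_ms)

# Per-transaction view: each execution is inspected on its own.
slow_threshold_ms = 3000  # assumed threshold, not a Jennifer default
slow_transactions = [t for t in response_times_ms if t > slow_threshold_ms]

print(f"average: {average_ms} ms")
print(f"slow transactions: {slow_transactions}")
```

The average works out to 1008 ms, a value that matches none of the ten executions: it overstates the nine normal ones and hides how slow the single delayed one was. Looking at each transaction separately shows both facts at once.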
Jennifer can show response time per individual transaction in a single view
Jennifer's X-View offers a direct and powerful way to monitor performance issues. Seeing every service's individual response time and its details in one view is more effective than combining many different views or graphs. A resource shortage appears in X-View as delayed response times, and the plotted dots form patterns that depend on the underlying issue, which gives users an advantage over other solutions.
The IT team quickly sees the value of Jennifer. Tasks that may take hours or minutes with other solutions take only minutes or seconds with Jennifer's X-View.