Organic Server Management
Organic Server Management (OSM)
:Organic Server Management (OSM) is a system based on proactive monitoring. Leveraging the concepts of autonomic computing, it proposes a self-managed environment that requires minimal human intervention, except for inherently manual tasks such as server installs.
Objectives
:* Capacity on Demand model
:* Maximize server utilization
:* Consolidate standby capacity
:* Reduce operational management costs and complexity
:* Self-managing, self-provisioning, self-sustaining operational environment
:* Minimal human interaction
:* Increase availability and uptime, leading to higher SLAs
:* Policy-driven resource management
----
High Level Concept
----
Major Components
:1. Hardware Asset Library
::A record of installed hardware and the features of each, e.g. Server 102: HP BL680, 3.2 GHz chip, 64 GB RAM.
:2. Software Asset Library
::A repository of all licensed software, including the license count, expiration date and terms for each product, e.g. Oracle 10G, 100 licenses, effective 01/01/2001, expires 01/01/2020.
:3. Application Configuration Requirements
::A list of all software (COTS, OSS, internal) that each application requires to satisfy a request type (or group of requests) processed in the OSM environment, e.g. Application XXX: Linux 5.0, Oracle 10G, FUSE ESB 4.0.
:4. Software Container
::A collection of software, pre-packaged in accordance with the Application Configuration Requirements (above).
:5. Change Log
::A record of each change that has occurred in the system, e.g. 200901012314 Server 300 Provisioned with Container 401.
:6. Run Time View
::A record of the real-time configuration status of each server in the OSM environment, e.g. Server 298 Container 309; Server 299 Container 309; Server 300 Container 401.
:7. Usage Trends
::A record of the hourly peaks by server and across the application suite.
::The data is used to forecast peaks and valleys more accurately for capacity planning, covering both increases and decreases in the server environment.
:8. Alerts
::Application and system metric alerts, e.g. CPU utilization, memory utilization, I/O utilization, application errors, warnings, time-outs, etc.
:9. Rules
::Alert condition rules. Every time an alert is submitted, OSM monitoring evaluates each rule against the new alert and, if a matching condition is found, executes the corresponding script.
:::e.g. IF memory, CPU or I/O utilization is >85%, execute the provision script for Container xxx
:10. Scripts
::A list of scripts that handle provisioning, deprovisioning, batch jobs, etc. as a result of a rule match condition (above).
:11. Provisioner
::A service that dynamically provisions a server with all the software (OS, tools, packages, patches, databases, applications, etc.) it requires to perform its duties (execute requests, commands, database calls, etc.).
:12. OSM Orchestration
::One or more tools capable of monitoring the alerts, finding matching rules and, under match conditions, executing the corresponding scripts as well as the service calls requested by those scripts.
----
Use Cases
:Using the high-level objectives and functional components defined above, the following steps through how OSM would handle three fundamental operational tasks.
:1. Insufficient Capacity
::a. OSM Orchestration monitors Alerts and evaluates each one against the rules in the Rules table
::b. Rules engine finds a matching condition suggesting that a capacity threshold has been reached and executes the associated script
::c. Script engine executes the Provisioner with the container type needed to supplement that environment
::d. Provisioner checks Hardware Assets, Software Assets and Software Container to ensure all building blocks are available and initiates the provisioning service
::e. 
Change Log is updated with all activity during this process
::f. Run Time View is updated to reflect the changes
::g. In a pull model (where the application pulls message requests from a queue), the newly provisioned server automatically begins consuming its targeted message requests
::h. In a push model (router/load balancer), the Provisioner updates the routing table of the load balancer/router with the VIP of the newly provisioned server, which then begins receiving traffic and consuming its targeted message requests
:2. Server Failure - Recycle / Add New Server
::a. OSM Orchestrator sees a rapid succession of timeout Alerts, indicating that an abnormal condition or failure has occurred. It evaluates them against the rules in the Rules table, finds a match and executes the associated script.
::b. Script engine executes a command to locate the unresponsive server and recycles it x times.
::c. Change Log is updated to reflect the recycle attempts and their success rate
::d. If Alerts continue after the server has been recycled x times and the server remains unresponsive, the server is considered to be down.
::e. Rules/Script engines determine whether there is sufficient capacity to support the current message volumes targeted at the down server without replacing that server.
::f. If not, the Provisioner service is called with the container type needed to supplement that environment
::g. Provisioner checks Hardware Assets, Software Assets and Software Container to ensure all building blocks are available to begin provisioning
::h. Change Log is updated with all activity during this process
::i. Run Time View is updated to reflect the changes
:::In a pull model (where the application pulls message requests from a queue), the newly provisioned server automatically begins consuming messages as they populate the queue. In a push model, the new server's VIP is passed to the load balancer, which then begins passing it messages
:3. 
Application exceeds maximum connections
::a. OSM Orchestrator receives Alerts showing a sudden increase in the rate of new connections, finds a matching Rule and executes the associated script
::b. Script engine executes the Provisioner with the container type needed to supplement that environment, as above
Guidelines
:* OSM is designed to be phased in organically, starting with no rules in the Rules engine and with a human operator monitoring the Alerts, responding accordingly and initiating requests to the Provisioner as applicable.
:* Simple rules should be added first; as confidence and experience grow, more and more human tasks should be migrated to the automated OSM Rules process.
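The alert, rule and script flow described under Major Components and in the Insufficient Capacity use case can be illustrated with a small sketch. It is only one possible shape, not a reference implementation: the class names (Alert, Rule, Osm), the script key "provision_401" and the in-memory Change Log and Run Time View are all assumptions made for illustration, and the provisioning step is a stand-in that only records what a real Provisioner would do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    server: str    # e.g. "Server 298"
    metric: str    # e.g. "cpu", "memory", "io"
    value: float   # utilization in percent


@dataclass
class Rule:
    name: str
    condition: Callable[[Alert], bool]  # returns True when the alert matches
    script: str                         # key into the Scripts table


@dataclass
class Osm:
    """Hypothetical orchestration core: rules, scripts, log and run time view."""
    rules: List[Rule]
    scripts: Dict[str, Callable[["Osm", Alert], None]]
    change_log: List[str] = field(default_factory=list)
    run_time_view: Dict[str, int] = field(default_factory=dict)

    def submit(self, alert: Alert) -> List[str]:
        """Evaluate every rule against the alert and run the script of each match."""
        matched = []
        for rule in self.rules:
            if rule.condition(alert):
                self.scripts[rule.script](self, alert)
                matched.append(rule.name)
        return matched

    def provision(self, server: str, container: int) -> None:
        """Stand-in for the Provisioner: update the view and record the change."""
        self.run_time_view[server] = container
        self.change_log.append(f"{server} Provisioned with Container {container}")


def provision_container_401(osm: Osm, alert: Alert) -> None:
    # Hypothetical script: bring up Server 300 with Container 401.
    osm.provision("Server 300", 401)


# Rule from the text: memory, CPU or I/O utilization above 85% triggers provisioning.
high_utilization = Rule(
    name="utilization-over-85",
    condition=lambda a: a.metric in {"cpu", "memory", "io"} and a.value > 85,
    script="provision_401",
)

osm = Osm(
    rules=[high_utilization],
    scripts={"provision_401": provision_container_401},
    run_time_view={"Server 298": 309, "Server 299": 309},
)

osm.submit(Alert("Server 298", "cpu", 72.0))  # below threshold, no rule matches
osm.submit(Alert("Server 298", "cpu", 91.5))  # matches, Server 300 is provisioned
print(osm.run_time_view)
print(osm.change_log)
```

Starting with an empty rules list reproduces the phased introduction described in the Guidelines: every alert is still submitted, nothing is automated, and the operator calls the provisioning step by hand until rules are added.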