Monitoring probes and alerting service (for UEC and more)

Registered by Mathias Gug

As a UEC admin, I'm alerted by UEC when physical nodes go down or services are flaky.
As a UEC admin, I can integrate my UEC deployment into my existing Nagios system.

Notes: Nagios probes; monitoring probes (Munin, Ganglia).
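
To illustrate what a Nagios probe for a UEC node could look like, here is a minimal sketch. It relies only on the standard Nagios plugin convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The script name check_uec_tcp.py, the function names, and the choice of a plain TCP connect as the health test are assumptions made for illustration; the blueprint does not define which checks are needed.

```python
"""Hypothetical Nagios-style probe: is a TCP service on a node reachable?"""
import socket
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Try a TCP connect and return (exit_code, status_line)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def main(argv):
    """Parse HOST and PORT, print the status line, return the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_uec_tcp.py HOST PORT")
        return UNKNOWN
    try:
        port = int(argv[2])
    except ValueError:
        print(f"UNKNOWN - invalid port: {argv[2]!r}")
        return UNKNOWN
    if not 0 < port < 65536:
        print(f"UNKNOWN - port out of range: {port}")
        return UNKNOWN
    code, message = check_tcp(argv[1], port)
    print(message)
    return code

# A deployed plugin would finish with: sys.exit(main(sys.argv))
```

An existing Nagios installation could then call such a script through an ordinary command definition, which is what would let a UEC deployment plug into a monitoring setup that is already in place.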

Blueprint information

Status:
Not started
Approver:
Robbie Williamson
Priority:
Undefined
Drafter:
None
Direction:
Needs approval
Assignee:
None
Definition:
Review
Series goal:
None
Implementation:
Deferred
Milestone target:
None

Related branches

Sprints

Whiteboard

To be discussed:
Should monitoring (collectd) and logging (rsyslog) share a single network transport? If so, which one: collectd, RELP syslog, or reconnoiter?

Work Items:
Move collectd to main (MIR).
Refine relevant measures for UEC deployments.
Write collectd input plugins for each of them.
Refine monitoring probes for UEC deployments.
Provide nagios plugins for each of them.
Install collectd on every UEC component.
Install all monitoring and measuring probes on every UEC component.
Automatically set up collectd to send all monitoring data to the central monitoring server (CLC) with puppet recipes.
Investigate graphing solutions (munin, graphite, reconnoiter (omniti - not packaged), visage, ganglia).
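
One possible shape for the collectd input plugins listed above is a script run by collectd's exec plugin, which reads PUTVAL lines from the script's standard output. The sketch below only shows how such lines could be built and emitted. The plugin name "uec", the type instance "load_1min", and the use of the 1-minute load average as the measured value are placeholders, since the relevant UEC measures are still to be defined; using the exec plugin instead of a native plugin is also an assumption.

```python
"""Hypothetical collectd exec-plugin script emitting PUTVAL lines."""
import os
import socket
import sys
import time


def format_putval(host, plugin, type_name, value, interval,
                  plugin_instance=None, type_instance=None, timestamp=None):
    """Build one PUTVAL line for collectd's plain-text protocol.

    The identifier has the form host/plugin[-instance]/type[-instance].
    A timestamp of None is sent as "N", which collectd reads as "now".
    """
    plugin_part = plugin if not plugin_instance else f"{plugin}-{plugin_instance}"
    type_part = type_name if not type_instance else f"{type_name}-{type_instance}"
    when = "N" if timestamp is None else str(int(timestamp))
    return (f'PUTVAL "{host}/{plugin_part}/{type_part}" '
            f"interval={interval} {when}:{value}")


def read_measure():
    """Placeholder measure: the 1-minute load average of this machine."""
    return os.getloadavg()[0]


def emit_forever(out=sys.stdout):
    """Emit one value per interval until collectd stops the script."""
    # The exec plugin exports these two variables to the scripts it runs.
    host = os.environ.get("COLLECTD_HOSTNAME", socket.gethostname())
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    while True:
        line = format_putval(host, "uec", "gauge", read_measure(), interval,
                             type_instance="load_1min")
        print(line, file=out, flush=True)  # collectd needs unbuffered output
        time.sleep(interval)

# A deployed script would finish with: emit_forever()
```

Each component would then only need a local collectd with this script enabled, and the existing collectd network plugin could forward the values to the central server.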
