Shinken (software)

Shinken is an open source computer system and network monitoring software application compatible with Nagios. It watches hosts and services, gathers performance data and alerts users when error conditions occur and again when the conditions clear.

Shinken's architecture aims to offer easier load balancing and high availability. The administrator manages a single configuration, which the system automatically "cuts" into parts and dispatches to worker nodes. The name comes from this functionality: a shinken is a Japanese sword.
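
The "cutting" step can be pictured with a minimal sketch. This is not Shinken's actual dispatching code: the function name `cut_configuration`, the host names and the balancing rule are all hypothetical, and the sketch assumes that hosts linked by a relation should stay in the same part.

```python
from collections import defaultdict


def cut_configuration(hosts, links, n_parts):
    """Split hosts into n_parts, keeping linked hosts in the same part.

    hosts: list of host names
    links: list of (host_a, host_b) pairs that must not be separated
    (Hypothetical helper, for illustration only.)
    """
    # Union-find: group hosts connected by a link into "packs".
    parent = {h: h for h in hosts}

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path compression
            h = parent[h]
        return h

    for a, b in links:
        parent[find(a)] = find(b)

    packs = defaultdict(list)
    for h in hosts:
        packs[find(h)].append(h)

    # Greedy balancing: place the largest pack first, always into
    # the part that currently holds the fewest hosts.
    parts = [[] for _ in range(n_parts)]
    for pack in sorted(packs.values(), key=len, reverse=True):
        min(parts, key=len).extend(pack)
    return parts


hosts = ["router", "web1", "web2", "db1", "mail1", "dns1"]
links = [("router", "web1"), ("router", "web2")]
parts = cut_configuration(hosts, links, 2)
```

With two parts, the three linked hosts end up together in one part and the three independent hosts in the other, so each worker node would receive a comparable share of the single configuration.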

Shinken was written by Jean Gabès as a proof of concept for a new Nagios architecture. Believing the new implementation was faster and more flexible than the old C code, he proposed it as the new development branch of Nagios 4. This proposal was turned down by the Nagios authors, so Shinken became an independent network monitoring software application compatible with Nagios.

Shinken is designed to run on any operating system where Python runs. It is developed on Linux, but also runs well on other Unix variants and on Windows. The reactionner process (responsible for sending notifications) can also run on Android. It is free software, licensed under the terms of the GNU Affero General Public License as published by the Free Software Foundation.

Overview

 * Design
 * Monitoring system written in Python
 * Distributed architecture using Pyro remote objects
 * Active and Passive monitoring methods
 * Monitoring of network services (SMTP, POP3, HTTP, NNTP, ICMP, SNMP, FTP, SSH)
 * Monitoring of host resources (processor load, disk usage, system logs) on a majority of network operating systems, including Microsoft Windows
 * Using agents such as NSClient++, send_nsca, Check MK, Thrift TSCA
 * Using agents that permit remote script execution via Nagios Remote Plugin Executor (an embedded pure-Python implementation is included with Shinken)
 * Using agent-less methods such as SNMP, WMI, scripted SSH or HTTP(SSL)
 * Send check results directly from programs using Apache Thrift (Java, Python, Ruby)
 * Monitoring of systems that can send collected data over a network to specifically written plugins (e.g. VMware ESX 3/4/5, collectd)
 * Remote monitoring supported through SSH or SSL encrypted tunnels.
 * Simple plugin design that allows users to easily develop their own service checks depending on needs, by using the tools of choice (shell scripts, C++, Perl, Ruby, Python, PHP, C#, etc.)
 * Ability to calculate KPIs from state and performance data in the Shinken core to create new services and performance data
 * System external interfaces
 * Livestatus compatible API that exposes state, configuration and performance information
 * Exports data to graphing modules (PNP4Nagios, Graphite, and others available)
 * Support for native messaging API of Android
 * Export event data to logging systems using syslog and RabbitMQ
 * Modules can be attached to any Shinken process to extend its capabilities in very efficient ways
 * Performance
 * Parallelized service and host checks available
 * Ability to distribute poller processes on multiple servers
 * Support for easily implementing redundant and load-balanced monitoring hosts
 * Support for multiple redundant external interfaces
 * Ability to route checks to dedicated pollers (processes specialized in executing plugins)
 * Correlation and business intelligence
 * Parent child relations
 * Ability to define network host hierarchy using "parent" hosts, allowing detection of and distinction between hosts that are down and those that are unreachable
 * 1 to 1, 1 to N
 * Free form dependency trees between any service and host
 * 1 to 1, 1 to N
 * Support for integrated business rules
 * Calculated hosts or services representing the state of a business service
 * Support assigning a business impact to each service, host or business process
 * Ability to show only root problems
 * Automatically changes child states to unknown when parent is unavailable
 * Other features
 * Contact notifications when service or host problems occur and get resolved (via e-mail, pager, SMS, or any user-defined method through plugin system)
 * Ability to define event handlers to be run during service or host events for proactive problem resolution
 * Ability to redefine the severity of an alert based on regular expression rules
 * Support for UTF-8 object names
 * Support for monitoring multiple customers with one administration point
 * Support for recurring downtimes through the maintenance_period attribute
 * Advanced template system with inheritance and overloading
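
The distinction between hosts that are down and hosts that are merely unreachable, listed above under parent/child relations, can be illustrated with a short sketch. It is an assumption-laden model rather than Shinken's implementation: the function names `classify` and `root_problems`, the state strings and the sample topology are hypothetical. The rule applied is that a failing host counts as unreachable only when none of its parents is up.

```python
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def classify(host, parents, check_ok):
    """Return the state of a host from raw check results and parent links.

    parents:  dict mapping a host to the list of its parent hosts
    check_ok: dict mapping a host to True when its check succeeded
    (Hypothetical helper, for illustration only.)
    """
    if check_ok[host]:
        return UP
    host_parents = parents.get(host, [])
    # A failing host with no parents is a problem in its own right.
    if not host_parents:
        return DOWN
    # If at least one parent is up, the path to the host works,
    # so the failure belongs to the host itself.
    if any(classify(p, parents, check_ok) == UP for p in host_parents):
        return DOWN
    # Every parent is down or unreachable: the host cannot be judged.
    return UNREACHABLE


def root_problems(parents, check_ok):
    """List only the hosts that are down themselves, hiding the consequences."""
    return sorted(h for h in check_ok if classify(h, parents, check_ok) == DOWN)


parents = {"switch": ["router"], "web1": ["switch"], "web2": ["switch"]}
check_ok = {"router": True, "switch": False, "web1": False, "web2": False}
states = {h: classify(h, parents, check_ok) for h in check_ok}
problems = root_problems(parents, check_ok)
```

In this sample topology the switch is the only root problem; the two web servers behind it are reported as unreachable rather than down.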

Architecture
A Shinken installation consists of several processes, each optimized for a specific task.
 * Arbiter
 * Loads the configuration files and dispatches the host and service objects to the scheduler(s)
 * Watchdog for all other processes and responsible for initiating failovers if an error is detected
 * Can route check result events from a Receiver to its associated Scheduler
 * Arbiter modules
 * There is a variety of modules to manipulate configuration data
 * Scheduler
 * Plans the next run of host and service checks
 * Dispatches checks to the poller(s)
 * Calculates state and dependencies
 * Applies KPI triggers
 * Raises Notifications and dispatches them to the reactionner(s)
 * Updates the retention file (or other retention backends)
 * Sends broks (internal events of any kind) to the broker(s)
 * Poller
 * Gets checks from the scheduler, executes plugins or integrated poller modules and sends the results to the scheduler
 * Poller modules
 * NRPE - Executes active data acquisition for Nagios Remote Plugin Executor agents
 * SNMP - Executes active data acquisition for SNMP enabled agents (In beta stage using PySNMP)
 * CommandPipe - Receives passive status and performance data from the check_mk script; it does not process commands
 * Reactionner
 * Gets notifications and eventhandlers from the scheduler, executes plugins/scripts and sends the results to the scheduler
 * Broker
 * Has multiple modules (usually running in their own processes)
 * Gets broks from the scheduler and forwards them to the broker modules
 * Modules decide if they handle a brok depending on a brok's type (log, initial service/host status, check result, begin/end downtime, ...)
 * Modules process the broks in many different ways. Some of the modules are:
 * webui - updates in-memory objects and provides a webserver for the native Shinken GUI
 * livestatus - updates in-memory objects which can be queried using an API by GUIs like Thruk or Check_MK Multisite
 * graphite - exports data to a Graphite database
 * ndodb - updates an ndo database (MySQL or Oracle)
 * simple_log - centralizes the logs of all the Shinken processes
 * status_dat - writes to a status.dat file which can be read by the classic cgi-based GUI
 * Receiver (optional)
 * Receives data passively from local or remote protocols
 * Buffers passively received data before forwarding it to the appropriate Scheduler (or to the Arbiter for global commands)
 * Allows a "farm" of Receivers to be set up to handle a high rate of incoming events
 * Modules for receivers
 * NSCA - NSCA protocol receiver
 * Collectd - Receives performance data from collectd via the network
 * CommandPipe - Receives commands, status updates and performance data
 * TSCA - Apache Thrift interface to send check results using a high rate buffered TCP connection directly from programs
 * Web Service - A web service that accepts http posts of check results (beta)
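
The way the Broker hands broks to its modules, with each module deciding by brok type whether to handle one, can be sketched as follows. The class, method names and brok type strings are illustrative assumptions, not Shinken's real module interface, and a brok is modelled here as a plain dictionary.

```python
class BrokerModule:
    """Hypothetical broker module that accepts only some brok types."""

    def __init__(self, name, wanted_types):
        self.name = name
        self.wanted_types = set(wanted_types)
        self.handled = []

    def want_brok(self, brok):
        # The module decides on the brok's type alone.
        return brok["type"] in self.wanted_types

    def manage_brok(self, brok):
        # A real module would update a database, a log file or
        # in-memory objects; this sketch only records the brok.
        self.handled.append(brok)


def forward_broks(broks, modules):
    """Offer every brok to every module; each module filters for itself."""
    for brok in broks:
        for module in modules:
            if module.want_brok(brok):
                module.manage_brok(brok)


# Illustrative brok types; the real type names may differ.
log_module = BrokerModule("simple_log", ["log"])
status_module = BrokerModule("status", ["check_result", "downtime_start"])

broks = [
    {"type": "log", "data": "scheduler started"},
    {"type": "check_result", "data": "web1 is UP"},
    {"type": "downtime_start", "data": "db1 maintenance"},
]
forward_broks(broks, [log_module, status_module])
```

Because the filtering lives in each module, a new module can be attached without changing the forwarding loop.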

There can be multiple instances for each type of process, either on a single host or spread over many hosts. Adding more processes automatically distributes the load.
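
The effect of adding processes can be modelled with a small sketch in which several workers pull checks from one shared queue. It is only an analogy under stated assumptions: real pollers are separate processes that may run on other hosts, whereas this example uses threads in a single program, and `run_pollers` is a hypothetical name.

```python
import queue
import threading
from collections import Counter


def run_pollers(checks, n_pollers):
    """Let n_pollers workers consume checks from one shared queue.

    Returns a Counter mapping each worker name to the number of
    checks it processed. (Hypothetical helper, for illustration only.)
    """
    todo = queue.Queue()
    for check in checks:
        todo.put(check)

    done = Counter()
    lock = threading.Lock()

    def poller(name):
        while True:
            try:
                todo.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            # A real poller would execute a plugin here.
            with lock:
                done[name] += 1

    threads = [
        threading.Thread(target=poller, args=(f"poller-{i}",))
        for i in range(n_pollers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


result = run_pollers(range(100), 4)
total = sum(result.values())
```

Every check is processed exactly once regardless of the number of workers; how the 100 checks are shared among the workers depends on scheduling and varies from run to run.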

The Shinken WebUI is the built-in web interface. It provides near-real-time status information, configuration, interaction, a dashboard for visualizing trending data from Graphite databases, and visualization of dependency tree graphs.

The Shinken skonfUI is an independent web front-end used to manage the discovery process and configuration tasks.

The shinken-admin CLI script is used to manage process-level aspects of the system at runtime, such as changing logging levels and getting health reports.

The install.sh CLI script is the main management script to install, remove or update Shinken and its associated software.

Development
Shinken follows an open, test-driven development approach, with contributors to the project providing new features, refactoring, code quality improvements and bug fixes.

The source code is hosted on GitHub. An integration server runs tests at each commit and in-depth tests at regular intervals.

The Shinken documentation is hosted on a wiki.