SNAMP

SNAMP is open-source, cross-platform software for telemetry, tracing and elasticity management of distributed applications.

Overview
The main purpose of SNAMP is to simplify the management of microservices running inside containers or software-defined data centers. It provides telemetry data gathering (metrics and events), end-to-end tracing of requests between software components that reflects the topology of communication paths, automatic scaling of cluster nodes based on workload, and unification of telemetry data sources.

Telemetry
Metrics, events and health checks are used to monitor the state and health of services in the IT landscape. Data is gathered in real time. The collected data can be used for visualization in the form of charts, for alerting, and for executing maintenance actions. A watcher can be set up for an important metric. The watcher defines the limits and conditions applied to the value of the metric. If the value is out of range, the watcher can execute a trigger. The trigger can be a handwritten script in one of the supported scripting languages, e.g. Groovy. It can perform a maintenance action (restarting a server node) or send a notification (an e-mail alert).
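
The watcher and trigger relationship described above can be sketched in a few lines. This is an illustration only, written in Python rather than in one of SNAMP's scripting languages; the `Watcher` class, the `notify` function and the metric name `heap_used_percent` are hypothetical and are not part of the SNAMP API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Watcher:
    """Hypothetical watcher: runs a trigger when a metric leaves [low, high]."""
    metric: str
    low: float
    high: float
    trigger: Callable[[str, float], None]

    def check(self, value: float) -> bool:
        """Return True (and run the trigger) if the value is out of range."""
        out_of_range = not (self.low <= value <= self.high)
        if out_of_range:
            self.trigger(self.metric, value)
        return out_of_range


alerts: List[str] = []


def notify(metric: str, value: float) -> None:
    # Stand-in for a notification; a real trigger might send an e-mail
    # or restart a server node instead of appending to a list.
    alerts.append(f"{metric} out of range: {value}")


watcher = Watcher(metric="heap_used_percent", low=0.0, high=85.0, trigger=notify)
watcher.check(42.0)  # within range, the trigger is not executed
watcher.check(97.5)  # out of range, the trigger is executed
```

Only the second call leaves the configured range, so `alerts` ends up with a single entry.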

Additionally, SNAMP can extend the functionality of a monitoring tool already used in the enterprise. SNAMP can gather telemetry data and expose it to the outside using any combination of supported protocols. For example, data collected over JMX can be exposed through SNMP and acquired by other network management software such as Nagios.

Tracing of requests
Request tracing makes it possible to identify communication paths between services and to collect important metrics such as response time, requests per second, availability, scalability etc. This data helps to troubleshoot latency and scalability issues and to find bottlenecks. Additionally, communication paths can be visualized as a graph in the Web Console, which makes it possible to observe the entire IT landscape in real time.
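
As a rough illustration of how communication paths and response times can be derived from trace data, the sketch below builds service-to-service edges from parent/child spans and averages the duration of the calls on each edge. The `Span` record and the service names are hypothetical; this is not the span format or the algorithm used by SNAMP.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Span(NamedTuple):
    """Hypothetical, simplified span: one unit of work done by one service."""
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: float


def communication_paths(spans: List[Span]) -> Dict[Tuple[str, str], float]:
    """Map each (caller, callee) edge to the average callee duration in ms."""
    by_id = {span.span_id: span for span in spans}
    durations: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id else None
        # An edge exists only when the parent span belongs to another service.
        if parent is not None and parent.service != span.service:
            durations[(parent.service, span.service)].append(span.duration_ms)
    return {edge: sum(values) / len(values) for edge, values in durations.items()}


spans = [
    Span("a", None, "gateway", 120.0),
    Span("b", "a", "orders", 80.0),
    Span("c", "b", "db", 30.0),
    Span("d", "a", "orders", 100.0),
]
paths = communication_paths(spans)
```

Here `paths` contains two edges: `gateway` to `orders` and `orders` to `db`. A graph view like the one in the Web Console could be drawn from such an edge map.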

Applications should be instrumented to report the necessary information to SNAMP. Instrumentation libraries can be found at Maven Central using groupID. Third-party instrumentation libraries are also supported:
 * OpenZipkin
 * Apache HTrace

Elasticity management
The elasticity manager is the SNAMP component responsible for automatic provisioning and decommissioning of cluster nodes. Its behavior is based on scaling policies. One or more scaling policies can be associated with a cluster. The decision process is based on fuzzy logic. Each participating policy has its own vote weight, and the elasticity manager executes the voting process periodically. The voting result is one of three possible decisions: enlarge the cluster, shrink the cluster or do nothing. A scaling policy can be based on a health check, a handwritten script or a range of values associated with some metric. Due to the flexibility of the decision process, it is possible to define several strategies for scaling:
 * All-of strategy means that all scaling policies must vote for changing the capacity of the cluster
 * Any-of strategy means that it is enough for at least one of the scaling policies to vote for changing the capacity of the cluster
 * Majority strategy means that the majority of scaling policies must vote for changing the capacity of the cluster

Moreover, it is also possible to assign custom weights for each scaling policy.
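
One way to picture the voting process is as a weighted sum of votes compared against a threshold, where each strategy is just a different threshold. The sketch below assumes exactly that; it is a plain weighted sum and not SNAMP's fuzzy-logic implementation. The names `Decision`, `threshold_for` and `decide` are hypothetical, and opposing votes are assumed to cancel each other out.

```python
from enum import Enum
from typing import List, Tuple


class Decision(Enum):
    ENLARGE = 1
    SHRINK = -1
    NOTHING = 0


def threshold_for(strategy: str, policy_count: int) -> float:
    """Translate a strategy name into a threshold, assuming a weight of 1 per policy."""
    if strategy == "all-of":
        return float(policy_count)
    if strategy == "any-of":
        return 1.0
    if strategy == "majority":
        return float(policy_count // 2 + 1)
    raise ValueError(f"unknown strategy: {strategy}")


def decide(votes: List[Tuple[Decision, float]], threshold: float) -> Decision:
    """Sum the weighted votes; scale only if the total reaches the threshold."""
    total = sum(decision.value * weight for decision, weight in votes)
    if total >= threshold:
        return Decision.ENLARGE
    if total <= -threshold:
        return Decision.SHRINK
    return Decision.NOTHING


# Three policies with equal weight: two vote to enlarge, one abstains.
votes = [(Decision.ENLARGE, 1.0), (Decision.ENLARGE, 1.0), (Decision.NOTHING, 1.0)]
results = {
    strategy: decide(votes, threshold_for(strategy, len(votes)))
    for strategy in ("all-of", "any-of", "majority")
}
```

With these votes, the all-of strategy does nothing, while any-of and majority both enlarge the cluster. Custom weights fit the same model: passing unequal weights and an explicit threshold to `decide` lets one policy outvote several others.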

The elasticity manager uses the underlying cluster or cloud management platform to send provisioning and decommissioning commands. This can be OpenStack, Kubernetes or VMware ESXi.

Web Console
The Web Console is used for visualizing metrics as charts, visualizing communication paths between services as a graph, and monitoring the cluster. Using the Web Console for visualization is optional, because SNAMP integrates with other tools such as Grafana.

Architecture
The SNAMP platform consists of the following components:
 * Resource Connector is responsible for communication between SNAMP and a service in the IT landscape. It encapsulates the communication protocol and exposes telemetry data to SNAMP in a unified way. For example, JMX Connector can be used to control Java applications using the JMX protocol.
 * Gateway exposes information collected from all resource connectors to the outside using the selected protocol. For example, SNMP Gateway can expose telemetry data obtained from all resource connectors using the SNMP protocol.
 * Supervisor controls a group of resources. It provides health monitoring, elasticity management and automatic discovery of resources.

Combining different gateways and resource connectors makes it possible to transform telemetry data from one protocol to another. Each component can be customized using Groovy-based scripts. Custom components can be written in any JVM-compatible language.
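
The division of labor between resource connectors and gateways can be sketched as two small interfaces: connectors read data in a unified form, and a gateway re-exposes whatever all connectors provide. The example is a conceptual illustration in Python, not SNAMP code. All class names, the `read_attributes` and `expose` methods, and the text output format are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ResourceConnector(ABC):
    """Hides a source protocol and returns attributes in a unified form."""

    @abstractmethod
    def read_attributes(self) -> Dict[str, float]:
        ...


class Gateway(ABC):
    """Exposes data from all connectors through one target protocol."""

    def __init__(self, connectors: Dict[str, ResourceConnector]) -> None:
        self.connectors = connectors

    @abstractmethod
    def expose(self) -> List[str]:
        ...


class StaticConnector(ResourceConnector):
    """Stand-in for a protocol-specific connector such as a JMX connector."""

    def __init__(self, values: Dict[str, float]) -> None:
        self._values = dict(values)

    def read_attributes(self) -> Dict[str, float]:
        return dict(self._values)


class TextGateway(Gateway):
    """Stand-in for a protocol-specific gateway; renders plain text lines."""

    def expose(self) -> List[str]:
        lines = []
        for resource, connector in sorted(self.connectors.items()):
            for name, value in sorted(connector.read_attributes().items()):
                lines.append(f"{resource}.{name}={value}")
        return lines


gateway = TextGateway(
    {"orders-service": StaticConnector({"heap_used": 512.0, "threads": 40.0})}
)
lines = gateway.expose()
```

Because the gateway only depends on the unified `read_attributes` view, swapping the connector or the gateway changes the source or target protocol without touching the other side.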

Features

 * Integration with third-party visualization and monitoring tools: Grafana, Nagios, SSH
 * Collecting telemetry data using the following protocols and technologies: Spring Actuator, OpenZipkin spans (from Kafka and HTTP), HTTP, JMX, Modbus, rsh, stdout from command-line tools, SSH
 * Exposing telemetry data using the following protocols: XMPP (chat bot), SNMPv2/SNMPv3, HTTP, NRDP, NSCA, syslog, data streaming to InfluxDB
 * Elasticity management supports OpenStack Senlin
 * Groovy scripting

Alternatives
An alternative solution can be built from a combination of software components:
 * Monitoring and visualization: Nagios, Grafana
 * End-to-end tracing: OpenZipkin
 * Automatic scaling: automatic scaling of EC2 resources in AWS, OpenStack Heat

Jolokia offers a JMX-to-HTTP bridge that can be hosted inside a standalone Java program, a Java EE application server or an OSGi environment.