
Accure Momentum

Accure Momentum (http://accureanalytics.com/momentum) is a big data platform for machine learning, natural language processing (NLP), and the Internet of Things (IoT).

Momentum enables enterprises to process large volumes of data at high speed and to derive actionable insights in a fraction of the time and at a fraction of the cost.

With Momentum you can perform advanced analytics: text mining, natural language processing, statistical programming, machine learning, and predictive science.

Functionality

1.   Installation and Cluster Management

A web-based, UI-driven setup and installation wizard simplifies the creation of clusters with hundreds of nodes. The following UI-based tools are available:

a.     Cluster setup

b.     Cluster monitoring

c.      Cluster management - scale up and down

d.     Service monitoring and management – start and stop

2.   High Scale and Superfast ETL

The UI-driven ETL loads data in multiple formats from multiple sources and transforms it with aggregations and joins that may not be possible in other ETL systems. The output of one transformation can be used as the input to another. Data is processed in parallel across worker nodes, and throughput scales linearly with the number of nodes in the cluster.
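The chaining described above (a join, then an aggregation, then a second transformation that consumes the first one's output) can be sketched in plain Java. This is only an illustration under stated assumptions: Momentum's ETL is UI driven and distributed, while this sketch runs on one machine with a parallel stream, and the class, method, and field names (`EtlChainSketch`, `Sale`, `totalByRegion`, `regionsAbove`) are hypothetical and not part of Momentum.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names come from Momentum.
class EtlChainSketch {

    // Illustrative input row: a customer id and a sale amount.
    static class Sale {
        final int customerId;
        final double amount;

        Sale(int customerId, double amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Transformation 1: inner-join sales to a customer -> region lookup,
    // then sum the amounts per region. The stream is processed in parallel.
    static Map<String, Double> totalByRegion(List<Sale> sales, Map<Integer, String> regionByCustomer) {
        return sales.parallelStream()
                .filter(s -> regionByCustomer.containsKey(s.customerId)) // drop unmatched rows
                .collect(Collectors.groupingBy(
                        s -> regionByCustomer.get(s.customerId),
                        TreeMap::new,
                        Collectors.summingDouble(s -> s.amount)));
    }

    // Transformation 2: consumes the output of transformation 1 and keeps
    // the regions whose total exceeds the threshold.
    static List<String> regionsAbove(Map<String, Double> totals, double threshold) {
        return totals.entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, String> regions = new HashMap<>();
        regions.put(1, "east");
        regions.put(2, "west");
        List<Sale> sales = Arrays.asList(new Sale(1, 10.0), new Sale(2, 20.0), new Sale(1, 5.0));
        Map<String, Double> totals = totalByRegion(sales, regions);
        System.out.println(totals);
        System.out.println(regionsAbove(totals, 16.0));
    }
}
```

The second method takes the map produced by the first, which mirrors the idea that one transformation's output feeds the next.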

3.   Machine Learning

A machine learning implementation designed for data scientists and business analysts lets you build analytics without writing a single line of code. All machine learning algorithms run in parallel on the distributed cluster for speed. Momentum also supports machine learning and prediction on streaming data in real time.

4.   Natural Language Processing

Momentum provides out-of-the-box support for text analytics, including sentiment analysis, concept categorization, and other NLP algorithms.

5.   Realtime Analytics

With Momentum you can develop high-performance, stream-based analytics suitable for applications such as the Internet of Things (IoT) and clickstream analysis. The UI-based, scalable system makes stream processing faster and easier.

6.   IoT Support

Momentum is an IoT platform. It can be configured to register millions of IoT devices and to collect and analyze massive amounts of sensor data at high speed.

7.   Visualization

Most BI tools can connect to Momentum data storage through a JDBC/ODBC connector and visualize analytics results.

8.   Custom Analytics

Build custom analytics in Java or R without dealing with the complexities of platform-specific APIs.

Features

5.   Input Data Source

Momentum supports the following data sources for ETL input:

a.     Relational databases (RDBMS): MySQL, Oracle, DB2, and Postgres

b.     NoSQL: Cassandra, MongoDB, and HBase

c.      Structured files: CSV, TSV, text, XML, and JSON

d.     Unstructured files: text

e.     HDFS

f.       Other sources can be easily added

6.   Output Data Storage

Supported storage or sink types are:

a.     HDFS

b.     MongoDB

c.      Solr

7.   Built-in Functions

Momentum supports the following analytics functions:

1.    Mathematical Functions

a.     round, floor, ceil, ceiling

b.     rand, exp, ln, log, log2, pow

c.      sqrt, hex, unhex, abs, pmod

d.     sin, asin, cos, acos, tan, atan

e.     degrees, radians

f.       positive, negative, sign

g.     e, pi

2.    Aggregation Functions

a.    count, sum

b.    avg, min, max, variance, var_pop, var_samp

c.     stddev_pop, stddev_samp

d.    covar_pop, covar_samp, corr

e.    percentile, percentile_approx

f.      histogram_numeric, collect_set

3.    Date Functions

a.    from_unixtime, unix_timestamp, to_date

b.    year, month, day, hour, minute, second, weekofyear

c.     datediff, date_add, date_sub

d.    from_utc_timestamp, to_utc_timestamp

4.    String Functions

a.    ascii, concat, concat_ws

b.    context_ngrams

c.     find_in_set, format_number

d.    get_json_object

e.    in_file, instr, locate

f.      length, lower, lpad, rpad, upper, trim, ltrim, rtrim, str_to_map

g.     parse_url

h.     printf

i.       regex_extract, regex_replace, repeat, reverse

j.       sentences, space, substr, translate

5.    Conditional Functions

a.     if

b.    COALESCE

c.     CASE .. WHEN .. THEN .. END
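The aggregation list above includes both population and sample variants (`var_pop` and `var_samp`, `stddev_pop` and `stddev_samp`). The source does not define them, so the following sketch assumes the usual statistical definitions: the population form divides the sum of squared deviations by n, and the sample form divides by n - 1. The class and method names are hypothetical and are not Momentum's API.

```java
// Hypothetical sketch of the assumed semantics; not Momentum code.
class AggregateSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sum of squared deviations from the mean, shared by both variants.
    static double sumSquaredDeviations(double[] xs) {
        double m = mean(xs);
        double total = 0.0;
        for (double x : xs) {
            total += (x - m) * (x - m);
        }
        return total;
    }

    // Population variance: divide by n.
    static double varPop(double[] xs) {
        return sumSquaredDeviations(xs) / xs.length;
    }

    // Sample variance: divide by n - 1 (requires at least two values).
    static double varSamp(double[] xs) {
        return sumSquaredDeviations(xs) / (xs.length - 1);
    }

    static double stddevPop(double[] xs) {
        return Math.sqrt(varPop(xs));
    }

    static double stddevSamp(double[] xs) {
        return Math.sqrt(varSamp(xs));
    }

    public static void main(String[] args) {
        double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("var_pop  = " + varPop(values));
        System.out.println("var_samp = " + varSamp(values));
    }
}
```

For the eight values in `main`, the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4 while the sample variance is 32 / 7.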

8.   Machine Learning Algorithms

Supported algorithms for machine learning are:

1.     Logistic Regression – both binary and multinomial logistic regressions

2.     Linear Regression

3.     Streaming Linear Regression

4.     K-Means Clustering

5.     Recommendation Engine through Collaborative Filtering using Alternating Least Squares (ALS)

6.     Naïve Bayes

a.     Multinomial Naïve Bayes

b.     Bernoulli Naïve Bayes
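To make one of the listed algorithms concrete, here is a minimal single-machine sketch of K-Means clustering (Lloyd's algorithm) on one-dimensional points. It is an assumption-laden illustration only: Momentum runs its algorithms in parallel on a cluster, and nothing here reflects its actual implementation or API. The class name, the method names, and the choice to pass initial centroids explicitly are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical single-machine sketch of K-Means; not Momentum's implementation.
class KMeansSketch {

    // Runs Lloyd's algorithm on 1-D points and returns the final centroids.
    static double[] fit(double[] points, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sums = new double[centroids.length];
            int[] counts = new int[centroids.length];

            // Assignment step: attach each point to its nearest centroid.
            for (double p : points) {
                int best = nearest(centroids, p);
                sums[best] += p;
                counts[best]++;
            }

            // Update step: move each centroid to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double updated = sums[c] / counts[c];
                if (updated != centroids[c]) {
                    changed = true;
                }
                centroids[c] = updated;
            }
            if (!changed) {
                break; // converged: no centroid moved
            }
        }
        return centroids;
    }

    // Index of the centroid closest to p (ties go to the lower index).
    static int nearest(double[] centroids, double p) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centroids = fit(points, new double[] {0, 5}, 100);
        System.out.println(Arrays.toString(centroids));
    }
}
```

With the points in `main` and starting centroids 0 and 5, the centroids settle at the means of the two visible groups, 2 and 11.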

9.   Supported NLP

Momentum provides the following machine-learning-based NLP support out of the box:

1.     Tokenization

2.     Sentence segmentation

3.     Part-of-speech (POS) tagging

4.     Named entity recognition (NER)

5.     Concept categorization

6.     Sentiment Analysis
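The first two items in the list, tokenization and sentence segmentation, can be illustrated with the JDK's `java.text.BreakIterator`. This is a rule-based stand-in chosen for brevity, not the machine-learning-based models the source describes, and the class and method names are hypothetical. Rule-based segmentation can mis-split text around abbreviations, which is one reason trained models are often preferred.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch using the JDK's rule-based BreakIterator; not Momentum's NLP.
class TextSegmentationSketch {

    // Splits text into sentences.
    static List<String> sentences(String text) {
        return segments(BreakIterator.getSentenceInstance(Locale.ENGLISH), text, false);
    }

    // Splits text into word tokens, dropping whitespace and punctuation.
    static List<String> tokens(String text) {
        return segments(BreakIterator.getWordInstance(Locale.ENGLISH), text, true);
    }

    private static List<String> segments(BreakIterator iterator, String text, boolean wordsOnly) {
        List<String> out = new ArrayList<>();
        iterator.setText(text);
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            String piece = text.substring(start, end).trim();
            if (piece.isEmpty()) {
                continue; // whitespace between words
            }
            if (wordsOnly && !Character.isLetterOrDigit(piece.charAt(0))) {
                continue; // punctuation
            }
            out.add(piece);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Momentum is fast. It scales well.";
        System.out.println(sentences(text));
        System.out.println(tokens(text));
    }
}
```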

10. BI Integration

Most BI tools that support JDBC/ODBC-based connectors will work with Momentum. The following BI tools have been tested with Momentum:

1.     Tableau

2.     Qlik

3.     Pentaho

4.     Jasper

5.     MicroStrategy

6.     SpagoBI

11. Java Programming Interface

Momentum allows Java programmers to develop custom analytics by implementing a single Java interface. The details of the interface are as follows.

Interface name: AccureProcessor

Return type: Map

Method name: process(String input, String delimiter)

input: This is a record of the input data set

delimiter: This is an optional delimiter that lets programmers split delimited input records (such as CSV and TSV)
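A minimal sketch of what an implementation might look like follows. The interface declaration is reconstructed from the description above, so its package, its generic type parameters (`Map<String, Object>` here), and the way Momentum handles a missing delimiter are assumptions. `FieldSplitProcessor` and the `field0`, `field1`, and `fieldCount` keys are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Reconstructed from the description above. The real interface's package and
// generic parameters are not documented here; Map<String, Object> is an assumption.
interface AccureProcessor {
    Map<String, Object> process(String input, String delimiter);
}

// Hypothetical implementation: splits one input record into named fields.
class FieldSplitProcessor implements AccureProcessor {

    @Override
    public Map<String, Object> process(String input, String delimiter) {
        Map<String, Object> result = new LinkedHashMap<>();

        // The delimiter is optional: treat null or empty as "do not split" (assumption).
        String[] fields = (delimiter == null || delimiter.isEmpty())
                ? new String[] {input}
                : input.split(Pattern.quote(delimiter), -1); // quote so "|" or "." are literal

        for (int i = 0; i < fields.length; i++) {
            result.put("field" + i, fields[i].trim());
        }
        result.put("fieldCount", fields.length);
        return result;
    }

    public static void main(String[] args) {
        AccureProcessor processor = new FieldSplitProcessor();
        System.out.println(processor.process("alice, 42, NY", ","));
    }
}
```

Quoting the delimiter with `Pattern.quote` matters because `String.split` interprets its argument as a regular expression, and common delimiters such as `|` are regex metacharacters.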

12. R Integration

Momentum allows statistical and mathematical programming through a web interface that runs R programs behind the scenes.

13. Apache Pig Integration

Programmers can write Pig Latin code and execute Pig programs directly from the web browser.

14. Cluster Technology

Momentum is a cluster-based technology with a master-slave architecture, as shown in the diagram above.

The workbench, which hosts the management console, is the main terminal that users interact with to work with Momentum. The master manages the workers, allocates tasks, and negotiates the resources needed to run those tasks. Workers perform the actual processing tasks in parallel.
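The division of labor described above can be sketched on a single machine with a thread pool. In this illustration one method plays the master (it partitions the data, hands out tasks, and combines results) and pool threads play the workers. It is an analogy under stated assumptions, not Momentum's cluster code: real workers are separate nodes, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-machine analogy of the master/worker pattern; not Momentum code.
class MasterWorkerSketch {

    // The "master": partitions the data, assigns one task per partition to
    // the worker pool, and combines the partial results.
    static long sumInParallel(List<Integer> data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.size() + workers - 1) / workers; // ceiling division
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 0; start < data.size(); start += chunk) {
                List<Integer> partition = data.subList(start, Math.min(start + chunk, data.size()));
                // Each task is the unit of work a "worker" thread executes.
                tasks.add(() -> {
                    long partial = 0;
                    for (int value : partition) {
                        partial += value;
                    }
                    return partial;
                });
            }

            long total = 0;
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get(); // collect each worker's partial sum
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(sumInParallel(data, 4));
    }
}
```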