User:OlenaSherbinin/sandbox

Toolsverse ETL Framework is a Java-based, open-source ETL (Extract-Transform-Load) engine with pluggable database drivers. Its source code is available under the LGPL license. It was originally developed as a database migration tool. ETL Framework is currently part of a suite that includes command-line, desktop and Web tools as well as server components. The core is embeddable and fairly compact.

Implementation details
An ETL scenario, which describes the extract-transform-load process, is written in a declarative XML-based language. At runtime, the engine translates the scenario into database-specific code (for example, PL/SQL for Oracle) and executes it efficiently. Pluggable database and metadata discovery drivers act as middleware between the translation layer and the runtime. Data sources can be streamed to destinations, with transformations in between if required. High-level transformation algorithms (for example, de-normalization and duplicate removal) are included, and new ones can be added as pluggable zero-configuration modules. ETL Framework supports parallelism on all levels: ETL scenarios can be executed in parallel, sources can be extracted in parallel, data can be loaded into destinations in parallel, etc.
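The streaming idea described above (rows flow from a source to a destination, with a transformation such as duplicate removal in between) can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (`StreamingSketch`, `stream`, `removeDuplicates`) are hypothetical and are not part of the ETL Framework API, and in-memory lists stand in for real data sources.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Illustrative only: hypothetical names, not the ETL Framework API. */
class StreamingSketch {

    /** Pulls rows one at a time from the source and passes accepted rows to the destination. */
    static <R> int stream(Iterator<R> source, Predicate<R> keep, Consumer<R> destination) {
        int loaded = 0;
        while (source.hasNext()) {
            R row = source.next();      // the loop itself holds only the current row
            if (keep.test(row)) {
                destination.accept(row);
                loaded++;
            }
        }
        return loaded;
    }

    /** A duplicate-removing filter: accepts a row only the first time it is seen. */
    static <R> Predicate<R> removeDuplicates() {
        Set<R> seen = new HashSet<>();
        return seen::add;               // Set.add returns false for a repeated row
    }

    public static void main(String[] args) {
        List<String> source = List.of("a,1", "b,2", "a,1", "c,3");
        List<String> destination = new ArrayList<>();
        int loaded = stream(source.iterator(), removeDuplicates(), destination::add);
        System.out.println(loaded + " rows loaded: " + destination);
    }
}
```

Note that this particular duplicate filter remembers every distinct row it has seen, so its memory use grows with the number of distinct rows even though the streaming loop itself does not buffer data.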

Main features
Connectivity
 * JDBC
 * ODBC (Windows only)
 * Excel ODBC (Windows only)
 * Excel (*.xls)
 * Excel (*.xlsx)
 * Delimited text files
 * Fixed length text files
 * XML
 * XML with XSL transformation
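To make the difference between the two text formats in the list above concrete, here is a minimal sketch of how a delimited line and a fixed-length line might be cut into fields. The class `TextRecordSketch` and its methods are hypothetical helpers written for this illustration, not the framework's connectors; quoting and escaping in delimited files are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: hypothetical helpers, not the ETL Framework connectors. */
class TextRecordSketch {

    /** Splits a delimited line; the -1 limit keeps trailing empty fields. No quoting support. */
    static List<String> parseDelimited(String line, char delimiter) {
        String regex = Pattern.quote(String.valueOf(delimiter));
        return Arrays.asList(line.split(regex, -1));
    }

    /** Cuts a fixed-length line into fields of the given widths and trims the padding. */
    static List<String> parseFixedLength(String line, int... widths) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int width : widths) {
            // Clamp both ends so a short line yields empty fields instead of an exception.
            int from = Math.min(start, line.length());
            int to = Math.min(start + width, line.length());
            fields.add(line.substring(from, to).trim());
            start += width;
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseDelimited("42|Alice|NY", '|'));
        System.out.println(parseFixedLength("42   Alice     NY", 5, 10, 2));
    }
}
```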

Scalability
 * ETL code is translated into efficient database-specific code (PL/SQL for Oracle, T-SQL for MS SQL Server, etc.)
 * Streaming allows running with a very limited memory footprint
 * ETL Framework supports parallelism on all levels: scenarios can be executed in parallel, sources can be extracted in parallel, data can be loaded into destinations in parallel, etc.
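Extracting several sources in parallel and then joining the results, as the last item above describes, might look like the following in standard Java. This is a sketch under assumptions: `ParallelExtractSketch` and `extractAll` are hypothetical names, each source is modelled as a plain `Supplier`, and the framework's own scheduling may work differently.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Illustrative only: hypothetical names, not the ETL Framework API. */
class ParallelExtractSketch {

    /** Forks one task per source, then joins them all and returns results in source order. */
    static <T> List<T> extractAll(List<Supplier<T>> sources, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<T>> forks = sources.stream()
                    .map(source -> CompletableFuture.supplyAsync(source, pool)) // fork
                    .collect(Collectors.toList());
            return forks.stream()
                    .map(CompletableFuture::join)                               // join
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Supplier<List<String>>> sources = List.of(
                () -> List.of("customer 1", "customer 2"),
                () -> List.of("order 10"));
        System.out.println(extractAll(sources, 2));
    }
}
```

All tasks are submitted before any result is awaited, so the sources run concurrently (up to the pool size) while the returned list still follows the order in which the sources were given.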

Extensibility
 * Core components of the ETL Framework are free and open source
 * All essential modules are implemented as drop-in, zero-configuration plugins
 * Source code and JavaDoc for the core components are available for download

ETL engine
 * XML based scenario language
 * Extract data from multiple sources and load into multiple destinations with manual and automatic field mapping
 * All connectivity options supported (JDBC, XML, XML transformation, text, Excel)
 * Stream unlimited data sets from the source to destination
 * All data types supported including CLOBs and BLOBs with automatic (or manual) conversion between source and destination databases (data sources)
 * Extract and load each data set in parallel with forks and joins
 * Execute multiple ETL scenarios in parallel
 * Inner scenarios with conditional and in-loop execution
 * Automatic table creation based on the source data set specification
 * Per field functions in SQL and JavaScript
 * Support for automatic primary/foreign key generation with mapping to old primary/foreign key
 * Validation using JavaScript
 * Conditional sources and destinations
 * Conditional (IF-THEN-ELSE) execution
 * Automatic exception handling
 * Automatic Insert/Update/Delete/Merge
 * In-line SQL in scenarios
 * Transformation using JavaScript
 * Matrix transformations
 * Regex transformation
 * XSL transformation
 * Pre/post/inline extract and load tasks
 * OS command execution
 * File-based tasks (file system, FTP and SFTP supported)
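The combination of manual and automatic field mapping mentioned in the list above can be illustrated with a small sketch. The matching rule used here (manual entries take precedence, remaining fields are matched by case-insensitive name) is an assumption made for the example; the source does not specify how the framework resolves mappings, and `FieldMappingSketch` is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: the matching rule is an assumption, not the framework's documented behavior. */
class FieldMappingSketch {

    /**
     * Maps source field names to destination field names. Manual entries win;
     * other fields are matched by case-insensitive name; unmatched fields are skipped.
     */
    static Map<String, String> mapFields(List<String> sourceFields,
                                         List<String> destinationFields,
                                         Map<String, String> manual) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String source : sourceFields) {
            if (manual.containsKey(source)) {
                mapping.put(source, manual.get(source));   // manual mapping
                continue;
            }
            for (String destination : destinationFields) {
                if (destination.equalsIgnoreCase(source)) {
                    mapping.put(source, destination);      // automatic mapping by name
                    break;
                }
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = mapFields(
                List.of("ID", "NAME", "DOB"),
                List.of("id", "full_name", "dob"),
                Map.of("NAME", "full_name"));
        System.out.println(mapping);
    }
}
```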

Oracle-specific ETL functionality:
 * Use sequences to generate primary keys
 * Full PL/SQL support including anonymous PL/SQL blocks, inner functions, procedures, named variables, etc.
 * Use cursors as data sources
 * Extract using SQL*Plus and load using SQL*Loader (requires Oracle client)
 * Table copy using SQL*Plus COPY command (requires Oracle client)
 * Support for MERGE, exception handling, date+time conversion, temporary tables

DB2-specific ETL functionality:
 * Use sequences and auto-increment fields to generate primary keys
 * Full SQL PL support including functions, procedures, named variables, etc.
 * Use cursors as data sources
 * Extract and load using SYSPROC.ADMIN_CMD
 * Support for MERGE, exception handling, date+time conversion, temporary tables

MS SQL Server-specific ETL functionality:
 * Use auto-increment fields to generate primary keys
 * Full Transact-SQL support including functions, procedures, named variables, etc.
 * Use cursors as data sources
 * Extract and load using BCP (requires MS SQL Server client)
 * Support for exception handling, date+time conversion, temporary tables

Sybase-specific ETL functionality:
 * Use auto-increment fields to generate primary keys
 * Full T-SQL support including functions, procedures, named variables, etc.
 * Use cursors as data sources
 * Extract and load using BCP (requires Sybase Adaptive Server client)
 * Support for exception handling, date+time conversion, temporary tables

MySQL-specific ETL functionality:
 * Use auto-increment fields to generate primary keys
 * Full MySQL stored procedure language support including functions, procedures, named variables, etc.
 * Use cursors as data sources
 * Extract using SELECT ... INTO OUTFILE and load using LOAD DATA
 * Support for exception handling, date+time conversion, temporary tables

Informix-specific ETL functionality:
 * Use sequences and serial fields to generate primary keys
 * Full SPL support including functions, procedures, named variables, etc.
 * Use cursors as data sources
 * Extract and load using DBACCESS (requires Informix client)
 * Support for MERGE, exception handling, date+time conversion, temporary tables

PostgreSQL-specific ETL functionality:
 * Use sequences and serial fields to generate primary keys
 * Full PL/pgSQL support including functions, named variables, etc.
 * Use cursors as data sources
 * Extract and load using COPY
 * Support for exception handling, date+time conversion, temporary tables
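The database-specific sections above list sequence-based key generation for Oracle, DB2, Informix and PostgreSQL. As a small illustration of why database-specific translation is needed, the sketch below returns the standard "next sequence value" expression for each of these databases. The class `SequenceDialectSketch` and the sequence name in `main` are hypothetical; this is not the code the framework generates.

```java
/** Illustrative only: shows per-database SQL syntax, not the framework's generated code. */
class SequenceDialectSketch {

    enum Dialect { ORACLE, DB2, INFORMIX, POSTGRESQL }

    /** Returns the SQL expression that fetches the next value of the given sequence. */
    static String nextValue(Dialect dialect, String sequence) {
        switch (dialect) {
            case ORACLE:
            case INFORMIX:
                return sequence + ".NEXTVAL";
            case DB2:
                return "NEXT VALUE FOR " + sequence;
            case POSTGRESQL:
                return "nextval('" + sequence + "')";
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }

    public static void main(String[] args) {
        // "customer_seq" is a made-up sequence name used only for this example.
        for (Dialect dialect : Dialect.values()) {
            System.out.println(dialect + ": " + nextValue(dialect, "customer_seq"));
        }
    }
}
```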

Other open-source Java ETL frameworks

 * Pentaho
 * Talend Open Studio
 * Scriptella
 * JasperETL from JasperSoft
 * CloverETL
 * Apatar