User:Scholarlycoffee/Extract, Transform, Load (ETL) on cloud

The past decade has seen the growing popularity of cloud services and of companies offering them under models such as platform as a service (PaaS), infrastructure as a service (IaaS), and software as a service (SaaS). The key players in today's market, such as Amazon, Microsoft, and Google, offer a range of services to different industries and charge according to usage. This usage-based pricing and billing differs from that of traditional on-premises systems and is another factor driving many companies to adopt cloud computing. Today, ETL and the integration of data from new and emerging sources can be achieved relatively easily on cloud infrastructure. This page addresses ETL specifically: it compares traditional and cloud approaches and discusses the future of data integration and analytics.

Introduction
ETL (extract, transform, load) is the process of connecting to different types of data sources and extracting data from them, transforming that data according to business rules, and finally loading it into a data warehouse. Typical sources include OLTP databases, flat files, cloud services, and spreadsheets. Integrating these many sources of data gives decision-makers an overview of the business and helps them make data-driven decisions for the future. The ETL tools available on the market are a one-stop shop for the functionality required to build a data warehouse, and they also support other data projects such as data conversion, data migration, data cleaning, data governance, and data quality.
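The three ETL steps can be illustrated with a minimal sketch in Python. It assumes a hypothetical flat-file source (a small CSV export of sales orders), a few hypothetical business rules (drop cancelled orders, upper-case region names, store amounts as integer cents), and an in-memory SQLite database standing in for the data warehouse. Real ETL tools add scheduling, error handling, and connectors for many source types.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source: a small CSV export of sales orders.
SOURCE_CSV = """order_id,region,amount,status
1,north,120.50,complete
2,south,80.00,cancelled
3,north,42.25,complete
4,east,19.99,complete
"""


def extract(text):
    """Extract: read rows from the CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply example (assumed) business rules.

    Drops cancelled orders, upper-cases region names, and converts
    the amount string to integer cents.
    """
    out = []
    for row in rows:
        if row["status"] == "cancelled":
            continue
        out.append((
            int(row["order_id"]),
            row["region"].upper(),
            round(float(row["amount"]) * 100),
        ))
    return out


def load(records, conn):
    """Load: write the transformed records into a warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", records)
    conn.commit()


# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
totals = conn.execute(
    "SELECT region, SUM(amount_cents) FROM fact_sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('EAST', 1999), ('NORTH', 16275)]
```

Keeping extract, transform, and load as separate functions mirrors the way ETL tools separate source connectors, transformation logic, and target writers, so that any one stage can be swapped without touching the others.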

According to [citation needed], ETL processes can be modeled in different ways, including mapping expressions and guidelines, conceptual constructs, and the Unified Modeling Language (UML). Common examples of ETL tools include Informatica PowerCenter, IBM DataStage, Talend, and Ab Initio.
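The idea of modeling ETL with mapping expressions can be sketched as follows: each target column is defined declaratively by an expression over a source row, and a generic function evaluates the mapping. This is an illustrative sketch only. The mapping name `CUSTOMER_MAPPING`, the source field names, and the helper `apply_mapping` are all hypothetical and do not represent the notation of any particular ETL tool or modeling method.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]

# Hypothetical source-to-target mapping: each target column is defined by an
# expression over one source row. The mapping is data, not hard-coded logic.
CUSTOMER_MAPPING: Dict[str, Callable[[Row], Any]] = {
    "customer_id": lambda r: int(r["cust_no"]),
    "full_name": lambda r: f'{r["first_name"].strip()} {r["last_name"].strip()}',
    "country_code": lambda r: r["country"].strip().upper(),
    "is_active": lambda r: r["status"] == "A",
}


def apply_mapping(row: Row, mapping: Dict[str, Callable[[Row], Any]]) -> Row:
    """Evaluate every mapping expression against a single source row."""
    return {target: expr(row) for target, expr in mapping.items()}


source_row = {"cust_no": "0042", "first_name": " Ada ", "last_name": "Lovelace",
              "country": "gb", "status": "A"}
result = apply_mapping(source_row, CUSTOMER_MAPPING)
print(result)
# {'customer_id': 42, 'full_name': 'Ada Lovelace', 'country_code': 'GB', 'is_active': True}
```

Because the mapping is kept separate from the code that executes it, the same `apply_mapping` function can run any source-to-target specification, which is roughly what mapping-driven designs aim for.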

Traditional ETL vs. cloud ETL
When an enterprise migrates from an on-premises system to a cloud-based one, the traditional ETL jobs that formed the backbone of its data warehousing process also need to be moved to the cloud infrastructure. For example, many companies used traditional ETL tools such as Informatica PowerCenter and IBM DataStage to process their transaction data; once a company adopts the cloud, these jobs could be converted to AWS Glue jobs.

Another aspect of ETL on the cloud is the handling of big data projects. Diouf et al. (2018) summarize various scenarios and uses of ETL for big data on the cloud. Streaming data is becoming increasingly influential, and it has to be brought into the analytics environment to support different types of analytics, such as predictive and prescriptive analytics. Liu et al. (2014) provided a similar implementation, a tool called CloudETL, which processes big data with Hadoop and uses Hive as the data warehouse. Diouf et al. (2018) also stress the importance of parallel processing for achieving scalability with large-scale databases and big data. Parallelization techniques have been implemented efficiently in various ETL tools, both on-premises and in the cloud.
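The parallel-processing idea can be sketched by splitting the input into partitions, transforming each partition independently, and then combining the results. The sketch below is a simplified illustration under stated assumptions: records are plain integers for brevity, the transformation rule is hypothetical, and a thread pool stands in for the parallel engine. In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so large-scale ETL would instead rely on process pools or a distributed engine such as Apache Spark or Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(records: List[int], n_parts: int) -> List[List[int]]:
    """Split the input into n_parts contiguous partitions of near-equal size."""
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def transform_partition(part: List[int]) -> List[int]:
    """Hypothetical transformation applied independently to one partition:
    drop multiples of 3 and double the remaining values."""
    return [x * 2 for x in part if x % 3 != 0]


def parallel_transform(records: List[int], n_workers: int = 4) -> List[int]:
    """Transform all partitions concurrently and merge them in input order."""
    parts = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in submission order, so the merge is stable.
        results = list(pool.map(transform_partition, parts))
    return [x for part in results for x in part]


print(parallel_transform(list(range(1, 11)), n_workers=3))
# [2, 4, 8, 10, 14, 16, 20]
```

Because each partition is transformed without reference to the others, the parallel result matches a sequential run. This independence is what allows ETL engines to scale a transformation out across many workers.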

References