
= Standard Naming Convention for Tables, Workflows =

Consistent adherence to naming-convention best practices improves clarity and maintainability.

Naming Tables

Flow for data table names in Vertica
ods --> BU --> table type (dim/fact) --> dataset name

Example: ods_crf_fct_pos_payment
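
A minimal sketch of DDL following this flow is below; the columns and types are hypothetical, and only the table name (ods prefix, crf as the BU, fct as the table type, pos_payment as the dataset name) comes from the convention.

    -- Hypothetical columns; only the table name follows the documented flow
    CREATE TABLE ods_crf_fct_pos_payment (
        payment_id   INT,
        payment_date DATE,
        amount       NUMERIC(18, 2)
    );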

Informatica naming convention
Tables and views should have singular names, not plural.

Incorrect: rollup_types
Correct: rollup_type

Ventures points to remember:


 * ONL - data sourced from online transactions (VOX)
 * OPS - data sourced from the operational destination (VOX counter)

Flow for application name in Informatica
app --> grid --> source name

Flow for Parameter Sets in Informatica
ps --> subgroup name --> source name --> dim or fact --> table name

Example: ps_ven_vox_dim_item

ps_rtl_crf_dim_lookup

ps_pro_ecp_dim_mall

Examples of Workflows:


 * wf_master
 * wf_full_ingestion
 * wf_raw_to_stg
 * wf_stg_to_ods
 * wf_s3_to_landing

Database Code Formatting

 * Use upper case for all SQL keywords (including data types): SELECT, INSERT, UPDATE, WHERE, AND, OR, LIKE, INT, etc.
 * Indent code to improve readability.
 * Comment code blocks that are not easily understandable. Use single-line comment markers (--) and reserve multi-line comments (/* .. */) for blocking out sections of code.
 * Use one blank line to separate code sections.
 * Use spaces so that expressions read like sentences:
 * Incorrect: fillfactor=25
 * Correct  : fillfactor = 25
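
A short sketch applying these rules is below; the table and column names are hypothetical.

    -- Return active items added since 2020 (hypothetical table and columns)
    SELECT item_id,
           item_name
    FROM ods_crf_dim_item
    WHERE item_status = 'ACTIVE'
      AND created_date >= '2020-01-01';

    /* Blocked-out section kept for reference
    SELECT * FROM ods_crf_dim_item;
    */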

Naming Views
Prefix view names with v_

Example: v_dim_item
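
For instance, a view over the hypothetical dimension table used above could be defined as:

    -- Hypothetical view following the v_ prefix convention
    CREATE VIEW v_dim_item AS
    SELECT item_id,
           item_name
    FROM ods_crf_dim_item;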

Naming Stored Procedures and Functions
Prefix stored procedure names with p_ and function names with f_

Example: p_get_all_dimensions

f_calculate_target
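
As a hedged illustration, a simple Vertica SQL function following the f_ convention might look like the sketch below; the parameters and formula are invented for the example, and stored procedure syntax varies by Vertica version, so only the function form is shown.

    -- Hypothetical function following the f_ prefix convention
    CREATE FUNCTION f_calculate_target(actual NUMERIC, uplift NUMERIC)
    RETURN NUMERIC
    AS
    BEGIN
        RETURN actual * (1 + uplift);
    END;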

Naming for HDFS
Format: Opco_BU_TableType_DatasetName

Example:

Parameter Sets


 * A parameter set is an object in the Model repository that contains a set of parameters and parameter values to run mappings and workflows.
 * When you create a parameter set, you choose a mapping or workflow to use the parameters.
 * After you choose a mapping or workflow, you can manually enter parameters in the parameter set or you can select parameters that are already in the repository for the mapping or the workflow.
 * You can use parameter sets for different situations. For example, you might use a specific parameter set when you run a workflow in a test environment.
 * You use a parameter set with a mapping, Mapping task, or workflow.
 * You can add one or more parameter sets to an application when you deploy the application. You can add a parameter set to multiple applications and deploy them.
 * To use a parameter set with a workflow or mapping, you must add the parameter set to the application when you deploy the workflow or mapping.

Control File


 * The Data Integration Service accesses control files when it runs mappings that generate columns for flat file sources based on control files. When the Data Integration Service runs the mapping, it fetches metadata from the control file of the flat file source.
 * We create control files to define the schema for the staging area and for Raw.

= Quality Checks =


 * The parameter set name, dataset name, and control file names should be the same everywhere
 * While scheduling, all the workflows and parameters should be included in the workflow
 * The app name should be the same everywhere
 * The hash prefix should be the same as in the control file
 * Infasvc should have a proper folder structure
 * The order of the hash IDs for each table should be the same every time, even when multiple ingestions are done; if copy commands are used, the same order should be reflected there (see the sketch after this list)
 * Date-time and date formats should always be checked against the source
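
A hedged sketch of the copy-command point: spelling out the column list in a Vertica COPY statement pins the load order across ingestions. The table, columns, and file path below are hypothetical.

    -- Hypothetical COPY; the explicit column list keeps the hash id order stable
    COPY ods_crf_fct_pos_payment (hash_id, payment_id, payment_date, amount)
    FROM '/raw/data/pos_payment/2020-01-01/pos_payment.csv'
    DELIMITER ',';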

= Manual ingestion Script Without using Informatica (used for S3 to Raw) =


 * Dataset name - name of the dataset to be ingested
 * Processed Date - creates a directory named with the processing date in Raw, which helps us differentiate data that has already been processed from data that still needs to be processed
 * File Regex - matches a regular expression against the directory contents, used when we need to pick one specific file or a set of files sharing the same pattern
 * Convert UTF - as we receive different types of files (csv, txt, etc.), we use UTF to convert them to a unified text format
 * Character encoding - used to encode special characters to UTF format

= Informatica Workflow =


 * 1) Link for the Informatica Workflow
 * 2) Link for the required ETL BPFs

= Profiling on Informatica Analyst =

Informatica Analyst is a web-based application client that analysts can use to analyze, cleanse, standardize, profile, and score data in an enterprise. You can perform column and rule profiling, scorecarding, and bad record and duplicate record management.

Use the Analyst tool to accomplish the following tasks:


 * Run profiles. Create and run profiles to analyze the structure and content of enterprise data and identify strengths and weaknesses. After you run a profile, you can view the rows from the profile results. You can also add profiled columns to scorecards and add column values to reference tables.
 * Create rules in profiles. Create and apply rules within profiles. A rule is reusable business logic that defines conditions applied to data when you run a profile. Use rules to further validate the data in a profile and to measure data quality progress.
 * Score data. Create scorecards to score the valid values for any column or the output of rules. Scorecards display the value frequency for columns in a profile as scores. Use scorecards to measure and visually represent data quality progress. You can also view trend charts to view the history of scores over time.
 * Manage reference data. Create and update reference tables that analysts and developers use in data quality standardization and validation rules. Create, edit, and import data quality dictionary files as reference tables. Create reference tables to establish relationships between source data and valid and standard values. Developers use reference tables in Standardizer and Lookup transformations in the Developer tool.
 * Manage bad records and duplicate records. Filter records, edit records, and set record status. Merge groups of duplicate records from a consolidated record table into a single record.

Below are the steps to create a Profile in Analyst

 * Setting up Informatica Analyst
 * Creating Data Objects
 * Viewing Data Objects
 * Creating Custom Profiles
Setting up Informatica Analyst


 * 1) Login to jump servers where Informatica is hosted
 * 2) Start a Google Chrome browser. In the Address field, enter the URL for Informatica Analyst: http://air1ahciapvn395:8085/analyst/
 * 3) Login using the credentials: username = CA, password = ca
 * 4) The Analyst tool opens on the Start workspace.


Creating Data Objects

A table object contains the metadata for a relational database source in the Analyst tool. Use tables to profile source data. When you add a table, the Analyst tool uses a database connection to connect to the source database to extract metadata.


 * 1) In the Analyst tool, click New > Table Data Object. The Add Tables wizard appears
 * 2) Select the VerticaAWS Connection
 * 3) Click Next. Unselect Show Default Schema Only to show all schemas associated with the selected connection
 * 4) Select the table that you want to add or enter a table name in the search box and click Go to search by table name
 * 5) Click the Properties View to view the properties and column metadata for the table. Or, click the Data Preview View to view the columns and data for the table.
 * 6) Click Next. The wizard displays the table to add to your folder or project.
 * 7) Click Finish.

View the Data Object Properties

 * 1) Click Open to open the Library workspace.
 * 2) In the Library workspace, click Data Objects in the Assets panel. A list of data objects appears in the Data Objects panel.
 * 3) Click on the required data object. The Data Preview panel appears with the data retrieved from the data object.
 * 4) The Analyst tool displays the first 100 rows of the data object.
 * 5) Click Properties. The Properties panel displays the name, type, description, and location of the data object. You can also see the column names and column properties for the data object.

Create Custom Profiles on the Table Data Objects

 * 1) In the Navigator, select the project (preferably MuSigmaGrid_Dev) where you want to create the profile
 * 2) Click Actions > New Profile. The New Profile Wizard appears and offers the option to create a custom profile. Click Next.
 * 3) Enter a name (Naming Convention : PF_*name*_PF) and a description for the profile.
 * 4) In the Folders panel, select the folder where you want to create the profile. The Analyst tool displays the project that you selected and shared projects that contain folders where you can create the profile. The profile objects in the folder appear in the Profiles panel. Click Next.
 * 5) In the Sources panel, select the data object that you want to create a profile for. The Columns panel displays the columns and datatype of the columns in the data object. Click Next.
 * 6) In the Select Columns panel, select the columns you want to profile. The columns include any rules you applied to the profile. The Analyst tool lists the name, datatype, precision, and scale for each column. Optionally, select Name to include all columns.
 * 7) In the Sampling Options panel, configure the sampling options.
 * 8) In the Drilldown Options panel, configure the drilldown options.
 * 9) Click Save to create the profile or click Save & Run to create the profile and then run the profile.

Once the profile runs, you can export it by clicking on Action > Export Data

= List of Data tables =


 * 1) Link for the Data tables
 * 2) CRF Source Details

= Folder structure =

Raw
raw --> data --> dataset name --> processing_date --> file name

Landing
landing --> data --> dataset name --> processing_date --> file name

= Significance of connectivity =

ODBC

 * Open Database Connectivity
 * Language independent connectivity system.
 * Faster and compatible with all the database systems.

JDBC

 * Java Database Connectivity
 * Platform independent connectivity system.
 * Slower but compatible with all the database systems.

As ODBC is language independent, it doesn't need to convert SQL queries to any other programming language, whereas JDBC needs to convert SQL queries to Java and then process them, which makes it comparatively slower.