
THE ESSENCE OF DATA MANAGEMENT IN BUSINESS INTELLIGENCE

Frederick Osei fros@itu.dk IT University of Copenhagen

Abstract

As a business intelligence graduate who has read a number of books and articles on BI, including Kimball's The Data Warehouse Lifecycle Toolkit, I have come to the conclusion that the backbone of BI is data management.

Data management ensures data integrity and availability through methodologies such as data warehousing, cleansing, profiling, stewardship, modeling and definition. Effective business decisions rely on data accuracy and reliability.

Introduction

High-quality data is essential to a company's ability to understand its customers. Customer data that is riddled with errors (e.g., incorrect addresses or other personal information, or misspelled customer names), inconsistent (lacking a single, standardized format), redundant (multiple records for the same customer), or outdated will undermine that ability. After all, how can a company understand a customer if it doesn't know where the customer lives or how to spell the customer's name? And if a company cannot understand its customers, it will have problems serving them according to their needs, preferences, goals, and the like.

Equally important, companies will have limited success up-selling and cross-selling to customers without accurate and up-to-date customer information at their fingertips. They will have difficulty identifying high-value customers and segmenting customers for promotions and campaigns. Moreover, poor data quality increases the costs of acquiring and retaining customers. If a company has two records for a single customer, the cost of sending a promotion to that customer doubles, and the duplicated mailing itself could irritate the customer and cost the company loyalty and goodwill.
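To make the duplicate-record problem concrete, here is a minimal sketch, assuming customer records with hypothetical `customer_id`, `name`, and `address` fields. It normalizes formatting (case, punctuation, spacing) so that two spellings of the same customer collapse onto one key, and it shows how the duplicate inflates a per-piece mailing cost. Exact matching on normalized keys is the simplest possible approach; real customer matching usually needs fuzzy matching as well.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical customer record; the field names are illustrative only.
    customer_id: int
    name: str
    address: str


def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace,
    so that formatting differences do not hide duplicate records."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(customers: list[Customer]) -> list[Customer]:
    """Keep the first record seen for each normalized (name, address) key."""
    seen: dict[tuple[str, str], Customer] = {}
    for c in customers:
        key = (normalize(c.name), normalize(c.address))
        if key not in seen:
            seen[key] = c
    return list(seen.values())


def mailing_cost(customers: list[Customer], cost_per_piece: float) -> float:
    """Every record receives one mailing, so a duplicate is paid for twice."""
    return len(customers) * cost_per_piece


records = [
    Customer(1, "Anna Jensen", "Main Street 1, Springfield"),
    Customer(2, "anna  jensen", "Main Street 1 Springfield"),
    Customer(3, "Lars Holm", "High Road 10, Riverton"),
]
unique = deduplicate(records)
print(len(records), len(unique))  # 3 2
print(mailing_cost(records, 0.5), mailing_cost(unique, 0.5))  # 1.5 1.0
```

Keeping the first record is an arbitrary choice for the sketch; a stewardship process would normally decide which record survives and merge the others into it.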

The same holds true for data warehouses, data marts, data repositories, and so forth. Each depends on the quality of the data that populates it. If poor or inaccurate data populates a data warehouse, then poor or inaccurate information comes back out. A long-standing acronym sums up this state of affairs: GIGO, "Garbage In, Garbage Out."
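One common way to act on GIGO is a validation gate in the load process: staged rows are checked against rules before they reach the warehouse, and rows that fail are quarantined rather than loaded. The sketch below assumes hypothetical `customer_id`, `amount`, and `order_date` fields and three illustrative rules; it is not a standard rule set.

```python
from __future__ import annotations

from datetime import date


def validate(row: dict) -> list[str]:
    """Return the rule violations for one staged row (empty list if clean).

    The three rules below are hypothetical examples.
    """
    errors: list[str] = []
    if not row.get("customer_id"):
        errors.append("missing customer_id")
    amount = row.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    order_date = row.get("order_date")
    if not isinstance(order_date, date) or order_date > date.today():
        errors.append("order_date missing or in the future")
    return errors


def load_gate(rows: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split staged rows into rows fit to load and quarantined rows with reasons."""
    accepted: list[dict] = []
    rejected: list[tuple[dict, list[str]]] = []
    for row in rows:
        errors = validate(row)
        if errors:
            rejected.append((row, errors))
        else:
            accepted.append(row)
    return accepted, rejected


staged = [
    {"customer_id": 1, "amount": 120.0, "order_date": date(2020, 5, 1)},
    {"customer_id": None, "amount": 80.0, "order_date": date(2020, 5, 2)},
    {"customer_id": 3, "amount": -5, "order_date": date(2020, 5, 3)},
]
accepted, rejected = load_gate(staged)
for row, errors in rejected:
    print(row["customer_id"], errors)
```

Only the first staged row passes; the other two are held back with a reason attached, which gives data stewards something concrete to fix at the source instead of letting the errors flow into reports.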

What is Data Quality?

A good theoretical starting point is the set of definitions from Wang and Strong's study of data quality dimensions (Wang and Strong, 1996).

Dimensions of Data Quality

1. ACCESS SECURITY - data cannot be accessed by competitors, data are of a proprietary nature, access to data can be restricted, secure

2. FLEXIBILITY - adaptable, flexible, extendable, expandable

3. ACCESSIBILITY - accessible, retrievable, speed of access, available, up-to-date

4. INTERPRETABILITY - interpretable

5. ACCURACY - data are certified error-free, accurate, correct, flawless, reliable, errors can be easily identified, the integrity of the data, precise

6. OBJECTIVITY - unbiased, objective

7. APPROPRIATE AMOUNT OF DATA - the amount of data is appropriate to the task at hand

8. RELEVANCY - applicable, relevant, interesting, usable

9. BELIEVABILITY - believable

10. REPRESENTATIONAL CONSISTENCY - data are continuously represented in the same format, are consistently represented, consistently formatted, data are compatible with previous data

11. COMPLETENESS - the breadth, depth and scope of information contained in the data

12. REPUTATION - the reputation of the data source, the reputation of the data

13. CONCISE - well-presented, concise, compactly represented, well-organized, aesthetically pleasing, form of presentation, well formatted, format of the data

14. TIMELINESS - age of data

15. COST EFFECTIVENESS - cost of data accuracy, cost of data collection, cost effective

16. TRACEABILITY - well-documented, easily traced, verifiable

17. EASE OF OPERATION - easily joined, easily changed, easily updated, easily downloaded/uploaded, data can be used for multiple purposes, ‘manipulatable’, easily aggregated, easily reproduced, easily integrated, easily customized

18. VALUE ADDED - data provide competitive advantage, data add value to operations

19. EASE OF UNDERSTANDING - easily understood, clear, readable

20. VARIETY OF DATA AND DATA SOURCES - a variety of data and data sources are available
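The dimensions above are qualitative, but a few of them can be turned into simple ratios that are tracked over time. The sketch below is one possible operationalization, an assumption rather than anything prescribed by Wang and Strong: completeness as the share of required values that are filled in, representational consistency as the share of values matching one agreed format (ISO 8601 dates here), and timeliness as the share of records updated within a chosen window. The field names, the format, and the 90-day window are hypothetical.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical "agreed format" for dates stored as text: ISO 8601 (YYYY-MM-DD).
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def completeness(rows: list[dict], fields: list[str]) -> float:
    """Share of required field values that are present and non-empty."""
    total = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) not in (None, ""))
    return filled / total if total else 1.0


def consistency(rows: list[dict], field: str, pattern: re.Pattern) -> float:
    """Share of non-empty values in `field` that match the agreed format."""
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def timeliness(rows: list[dict], field: str, today: date, max_age_days: int) -> float:
    """Share of records whose `field` date is at most max_age_days old."""
    dates = [r[field] for r in rows if r.get(field) is not None]
    if not dates:
        return 0.0
    fresh = sum(1 for d in dates if (today - d).days <= max_age_days)
    return fresh / len(dates)


customers = [
    {"name": "Anna Jensen", "email": "anna@example.com",
     "signup": "2021-03-04", "updated": date(2024, 1, 10)},
    {"name": "Lars Holm", "email": "",
     "signup": "04/03/2021", "updated": date(2022, 6, 1)},
    {"name": "Mette Berg", "email": "mette@example.com",
     "signup": "2021-07-19", "updated": date(2023, 12, 20)},
]

print(round(completeness(customers, ["name", "email"]), 2))  # 0.83
print(round(consistency(customers, "signup", ISO_DATE), 2))  # 0.67
print(round(timeliness(customers, "updated", date(2024, 2, 1), 90), 2))  # 0.67
```

Dimensions such as believability, reputation, or objectivity do not reduce to a ratio like this and are usually assessed by asking the people who consume the data.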

Data Democracy

Data democracy matters in data management because when data circulates among only a few people in an organization, and others who need to work with that data are denied access, it becomes extremely difficult for people to coordinate in managing it.

Data variations

Data variation is one of the most dangerous threats to every organization's quest to increase productivity. Without proper data management, the marketing department of an organization may claim that it is doing well, while the figures held by the finance department of the same organization say otherwise.

Data must be available to those who need access to it, and it must be well managed. The final data must also be made available to all decision makers in the organization, so that everyone knows, and accepts, whether the organization is doing well or not.
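One way to reduce this kind of disagreement is to compute shared figures once, from a single agreed source and with a single documented definition, and to check each department's reported number against that reference. The sketch below assumes a hypothetical definition of revenue (invoiced amount minus refunds) and illustrative department figures; it only shows the shape of such a reconciliation check.

```python
from __future__ import annotations

from decimal import Decimal


def revenue(orders: list[dict]) -> Decimal:
    """Hypothetical shared definition: invoiced amount minus refunds,
    summed over the single agreed source of orders."""
    return sum((o["invoiced"] - o["refunded"] for o in orders), Decimal("0"))


def reconcile(reported: dict[str, Decimal], reference: Decimal,
              tolerance: Decimal = Decimal("0.01")) -> dict[str, Decimal]:
    """Return each department whose reported figure differs from the
    reference by more than `tolerance`, mapped to the size of the gap."""
    return {dept: value - reference
            for dept, value in reported.items()
            if abs(value - reference) > tolerance}


orders = [
    {"invoiced": Decimal("1000.00"), "refunded": Decimal("0.00")},
    {"invoiced": Decimal("500.00"), "refunded": Decimal("120.00")},
]
# Illustrative figures: marketing reports gross invoices and ignores refunds.
reported = {"marketing": Decimal("1500.00"), "finance": Decimal("1380.00")}
print(reconcile(reported, revenue(orders)))  # {'marketing': Decimal('120.00')}
```

`Decimal` is used instead of `float` so that monetary sums compare exactly; the gap that the check reports (here the unrecorded refunds) is what the two departments then need to agree on.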

Conclusion

Historically, businesses have often failed to pay sufficient attention to data quality. This has hurt the return on investment in information systems and the effectiveness of mergers and acquisitions. Supply chains and distribution channels configured to optimize e-business and customer service tend to produce fragmented information architectures, which place a premium on building an intelligent data architecture and on maintaining high data quality. Improving data quality can reduce business costs and increase revenue. A systematic approach to the problem must address the data architecture and the targeted deployment of technologies that enhance data quality. As the sheer amount of information available to Business Intelligence (BI) applications has grown and BI capabilities have become more sophisticated, organizations are striving to shorten time-to-information for business users. Yet data storage and retrieval decisions are increasingly made on the basis of cost rather than business need, and the resulting compromises affect business performance.

References

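•	Kimball, R. et al. The Data Warehouse Lifecycle Toolkit.

•	PLATON. Master Data Management. www.platon.net

•	Wang, R. Y. and Strong, D. M. (1996). Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems, 12(4), 5-33.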