This process of cleansing and conforming data changes the data on its way from the source system(s) to the data warehouse, and it can also be used to identify and record errors in the data. That information can then be used to fix how the source system(s) work.
Good quality source data depends on a "data quality culture" that must be initiated at the top of the organization. It is not just a matter of implementing strong validation checks on input screens, because no matter how strong those checks are, users can often still circumvent them.
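To illustrate why input-screen validation alone is not enough, the hypothetical check below enforces a strict phone-number format yet is trivially satisfied by a well-formed filler value; the rule, pattern, and values are assumptions for illustration only.

```python
import re

# Hypothetical input-screen rule: a syntactically strict US-style
# phone format. A user can satisfy it with a well-formed but
# meaningless filler value, which is how such checks get circumvented.
PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")

def is_valid_phone(value: str) -> bool:
    """True if the value matches the NNN-NNN-NNNN pattern."""
    return bool(PHONE.match(value))

real = is_valid_phone("555-867-5309")    # plausible number passes
filler = is_valid_phone("999-999-9999")  # filler value also passes
junk = is_valid_phone("unknown")         # only obviously wrong input fails
```

This is why the nine-step guide below starts with culture and process, not with stricter screens.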
A nine-step guide for organizations that wish to improve data quality:
Declare a high level commitment to a data quality culture
Drive process reengineering at the executive level
Spend money to improve the data entry environment
Spend money to improve application integration
Spend money to change how processes work
Promote end-to-end team awareness
Promote interdepartmental cooperation
Publicly celebrate data quality excellence
Continuously measure and improve data quality
Data Cleansing System
The essential job of this system is to find a suitable balance between fixing dirty data and keeping the data as close as possible to the original data in the source production system. This is a challenge for the ETL architect.
The system should offer an architecture that can cleanse data, record quality events and measure/control quality of data in the data warehouse.
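A minimal sketch of such an architecture, under assumed names and a toy rule: a cleansing function standardizes a country code when it can, and otherwise leaves the value untouched and records a quality event (a stand-in for an error-event table in the warehouse), so the data stays close to the source while the problem is still measured.

```python
from datetime import datetime, timezone

# Assumed reference set; in practice this would come from a
# conformed dimension or master data.
KNOWN_COUNTRIES = {"US", "DE", "DK", "GB"}

error_events = []  # stand-in for an error-event table

def cleanse_country(row_id, raw):
    """Standardize a country code, or record a quality event."""
    value = (raw or "").strip().upper()
    if value in KNOWN_COUNTRIES:
        return value
    # Record the quality event but keep the data close to the source.
    error_events.append({
        "row_id": row_id,
        "column": "country",
        "raw_value": raw,
        "rule": "unknown_country_code",
        "logged_at": datetime.now(timezone.utc).isoformat(),
    })
    return raw  # left untouched so the source can be investigated

clean = cleanse_country(1, " us ")  # fixable: becomes "US"
bad = cleanse_country(2, "XX")      # unfixable: kept as-is, event logged
```

Counting rows in the error-event log over time gives the measure/control part of the architecture almost for free.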
A good start is to perform a thorough data profiling analysis, which helps define the required complexity of the data cleansing system and also gives an idea of the current data quality in the source system(s).
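A data profiling pass can be as simple as computing, per column, the row count, null count, distinct-value count, and most frequent values. The sketch below does this over toy in-memory rows (the data and column names are assumptions); in the example, the distinct count already exposes an inconsistency ("US" vs. "us") that the cleansing system would have to handle.

```python
from collections import Counter

# Assumed toy sample of source rows.
rows = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": None},
    {"customer_id": 3, "country": "us"},
    {"customer_id": 4, "country": "US"},
]

def profile(rows, column):
    """Basic profile of one column: nulls, cardinality, top value."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top": Counter(non_null).most_common(1),
    }

p = profile(rows, "country")
# One null, and two distinct spellings of the same country.
```

Real profiling tools add pattern analysis, cross-column dependencies, and referential checks, but even counts like these indicate how much cleansing logic the ETL system will need.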