Data Quality Requirements

Submitted By LeelaRatna
Words 3437
Pages 14
A framework to implement Data Cleaning in Enterprise Data Warehouse for Robust Data Quality
Kamran Ali
Department of Computer Sciences
Shaheed Zulfikar Ali Bhutto Institute of Science and Technology
Islamabad, Pakistan
kamranali100@hotmail.com

Mubeen Ahmed Warraich
Department of Computer Sciences
Shaheed Zulfikar Ali Bhutto Institute of Science and Technology
Islamabad, Pakistan
mubeen_warraich@yahoo.com

Abstract— Every day, every hour, every minute, and every second, trillions of bytes of data are generated by enterprises, especially in the telecom sector. Business executives and managers have long wanted convenient, interactive access to that data so that they can make the best possible decisions for business profit. A data warehouse is the only viable solution that can turn that goal into reality. Future decision making depends on the availability of correct information, which in turn depends on the quality of the underlying data.
Quality data can only be produced by cleaning the data before it is loaded into the data warehouse. Correct data, in turn, is essential for well-informed and reliable decision making. The framework proposed in this paper enforces robust data quality to ensure consistent and correct loading of data into the data warehouse, which is necessary for disciplined, accurate, and reliable data analysis, data mining, and knowledge discovery.
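The abstract does not say which rules the framework applies, so the following is only a minimal Python sketch of the general idea of cleaning before loading: rows in a staging area are checked against simple validation rules, rows with no violations are passed on for loading, and the rest are set aside together with the reasons they failed. The `Record` fields (`subscriber_id`, `phone`, `call_date`), the functions `validate` and `stage_for_load`, and the rules themselves are hypothetical, chosen to echo the telecom example, and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Record:
    # Hypothetical staging-area row; field names are illustrative only.
    subscriber_id: str
    phone: str
    call_date: str  # expected in ISO "YYYY-MM-DD" form


def validate(rec: Record) -> List[str]:
    """Return the rule violations for one row (an empty list means clean)."""
    errors = []
    if not rec.subscriber_id.strip():
        errors.append("missing subscriber_id")
    # Ignore common separators, then require 7 to 15 digits (assumed rule).
    digits = rec.phone.replace(" ", "").replace("-", "")
    if not digits.isdigit() or not 7 <= len(digits) <= 15:
        errors.append("invalid phone")
    try:
        date.fromisoformat(rec.call_date)
    except ValueError:
        errors.append("invalid call_date")
    return errors


def stage_for_load(
    rows: List[Record],
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split staged rows into (loadable, rejected) before anything reaches the warehouse."""
    loadable, rejected = [], []
    for rec in rows:
        problems = validate(rec)
        if problems:
            rejected.append((rec, problems))
        else:
            loadable.append(rec)
    return loadable, rejected


if __name__ == "__main__":
    batch = [
        Record("S1", "0300-1234567", "2006-05-01"),
        Record("", "abc", "2006-13-01"),
    ]
    ok, bad = stage_for_load(batch)
    print(len(ok), "row(s) ready to load")
    for rec, reasons in bad:
        print("rejected:", reasons)
```

Keeping rejected rows with their reasons, rather than silently dropping them, lets someone correct and reload them later; whether the authors' framework handles rejects this way is not stated in this excerpt.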

Significant amounts of time and money are consequently spent on data cleaning, the task of detecting and correcting errors in data. Data cleaning deals with dirty data in the data warehouse in order to maintain high data quality. The principle of data cleaning is to find and rectify errors and inconsistencies in the data.
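To illustrate "find and rectify errors and inconsistencies," here is a small Python sketch that standardizes inconsistent representations of the same value (extra spacing and mixed capitalization in names, alternative city codes, two date layouts) and then drops the duplicates that those corrections expose. The field names, the `CITY_CODES` mapping, the accepted `DATE_FORMATS`, and the helper functions are hypothetical assumptions for this example, not rules from the paper.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical lookup that maps inconsistent source codes to one canonical value.
CITY_CODES = {"isb": "Islamabad", "islamabad": "Islamabad", "lhr": "Lahore", "lahore": "Lahore"}
# Date layouts assumed to occur in the source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def clean_name(raw: str) -> str:
    # Collapse runs of whitespace, trim the ends, and normalize capitalization.
    return re.sub(r"\s+", " ", raw).strip().title()


def clean_city(raw: str) -> Optional[str]:
    # Unknown codes become None so they can be flagged for manual review.
    return CITY_CODES.get(raw.strip().lower())


def clean_date(raw: str) -> Optional[str]:
    # Try each known layout; return ISO "YYYY-MM-DD", or None if nothing matches.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Correct every row, then keep only the first copy of each corrected row."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = (clean_name(row["name"]), clean_city(row["city"]), clean_date(row["joined"]))
        if fixed not in seen:  # duplicates may only become visible after correction
            seen.add(fixed)
            cleaned.append(dict(zip(("name", "city", "joined"), fixed)))
    return cleaned
```

For example, a row `{"name": "  ahmed   raza ", "city": "ISB", "joined": "2006-05-01"}` and a row `{"name": "Ahmed Raza", "city": "islamabad", "joined": "01/05/2006"}` differ as raw values but become identical after cleaning, so only one survives. This sketch uses exact matching on the corrected values; duplicate detection in practice is often approximate rather than exact.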

One study estimates the combined cost of bad data at over US$30 billion in 2006 alone [5]. As business operations rely more
