Knowledge Discovery in Medical Databases Leveraging Data Mining

Submitted By hajas
Words 35271
Pages 142
Abstract

The goal of this master’s thesis is to identify and evaluate the data mining algorithms that are commonly implemented in modern Medical Decision Support Systems (MDSS). Such systems are used in healthcare units all over the world. These institutions store large amounts of medical data, which may contain relevant medical information hidden in patterns buried among the records.

Within the research, several popular MDSSs are analysed to determine the data mining algorithms they use most often. Three algorithms have been identified: Naïve Bayes, Multilayer Perceptron and C4.5. Before the main analyses, the algorithms are calibrated: several configurations are tested to determine the best settings for each algorithm. Afterwards, a final comparison ranks the algorithms by their performance. The evaluation is based on a set of performance metrics. The analyses are conducted in WEKA on five UCI medical datasets: breast cancer, hepatitis, heart disease, dermatology and diabetes.
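The comparison step described above can be illustrated with a minimal sketch. It assumes a binary classification task and four common metrics (accuracy, precision, recall, F1); the abstract does not name the metrics actually used, so this choice is an assumption. The names `ConfusionCounts`, `metrics` and `rank_by` are hypothetical, and the counts are invented toy values, not results from the thesis or from WEKA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts for one classifier on one dataset."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1, returning 0.0 when undefined."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rank_by(results: dict[str, ConfusionCounts], key: str = "f1") -> list[str]:
    """Order classifier names from best to worst on the chosen metric."""
    return sorted(results, key=lambda name: metrics(results[name])[key], reverse=True)


# Hypothetical counts, invented for illustration only; not results from the thesis.
toy = {
    "classifier_a": ConfusionCounts(tp=40, fp=10, fn=10, tn=40),
    "classifier_b": ConfusionCounts(tp=45, fp=5, fn=5, tn=45),
    "classifier_c": ConfusionCounts(tp=30, fp=20, fn=20, tn=30),
}

ranking = rank_by(toy, key="f1")
print(ranking)
```

Ranking on a single metric is the simplest choice. A study that uses several metrics would need some rule for combining them, which this sketch deliberately leaves open.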

The analyses have shown that it is very difficult to name a single data mining algorithm as the most suitable for medical data. The results obtained for the three algorithms were very similar. However, the final evaluation of the outcomes singled out Naïve Bayes as the best classifier for the given domain, followed by the Multilayer Perceptron and C4.5.

Keywords: Naïve Bayes, Multilayer Perceptron, C4.5, medical data mining, medical decision support

Chapter 1: Introduction to the Study
Introduction
Thesis Structure
Study Overview
Background of the research
Focus Area & Motivation
Aims and Objectives
Research Problems
Motivation and Challenges
Thesis Outline
Intellectual Challenge
Justification for the Research
Methodology
Conclusion Chapter 1:
