Data Mining In Computer Science

CHAPTER 2
DATA MINING TECHNIQUE OVERVIEW

2.1 Introduction

In the 21st century, as more and more systems move online, databases have grown to terabytes in size. Within this mass of data, the information that matters has to be identified. People have always looked for patterns: a farmer recognizes patterns of growth in the field, a bank recognizes the earning and spending patterns of a customer, and politicians seek patterns in voter opinion. Such large volumes of data can be put to use for business growth or for scientific discovery. The process of discovering patterns and relationships in data using analysis tools is called data mining. The simplest form of data mining is as follows:

1. Describing
…
Data mining helps us take appropriate decisions at the appropriate time in order to increase business profit. It is closely related to another important area of computer science research, namely machine learning. Machine learning is the field in which a machine learns from past data and makes informed, efficient decisions about the future. In a number of applications, for example optical character recognition, the past data has to be assembled in the form of training patterns. These training patterns are usually chosen so that the machine can take an appropriate decision when a previously unseen pattern presents itself. They are generally expressed as features extracted from the data. In data mining, creating such patterns is generally not required, because the data from which knowledge is to be discovered already exists. We do, however, have to be able to extract effective features from this data, so a decision can be …
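The idea of training patterns expressed as features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record fields (`monthly_income`, `monthly_spend`), the labels, and the choice of a nearest-neighbour rule as the decision procedure. It is not the method of this chapter, only one simple way a machine could decide on a previously unseen pattern.

```python
import math

def extract_features(record):
    """Turn a raw record into a numeric feature vector.

    The two fields used here are hypothetical, chosen for illustration.
    """
    return (record["monthly_income"] / 1000.0, record["monthly_spend"] / 1000.0)

def nearest_label(training_patterns, features):
    """Return the label of the training pattern closest to `features`."""
    closest = min(training_patterns, key=lambda p: math.dist(p[0], features))
    return closest[1]

# Past data, assembled as (raw record, label) pairs; values are made up.
raw_training = [
    ({"monthly_income": 5000, "monthly_spend": 1000}, "saver"),
    ({"monthly_income": 5200, "monthly_spend": 1500}, "saver"),
    ({"monthly_income": 3000, "monthly_spend": 2900}, "spender"),
    ({"monthly_income": 4000, "monthly_spend": 3800}, "spender"),
]
training_patterns = [(extract_features(r), label) for r, label in raw_training]

# A previously unseen pattern is classified by its nearest training pattern.
unknown = {"monthly_income": 4500, "monthly_spend": 4000}
predicted = nearest_label(training_patterns, extract_features(unknown))
print(predicted)
```

The quality of the decision depends almost entirely on `extract_features`: if the chosen features do not separate the classes, no decision rule built on top of them will.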
In data mining, the technique used to solve a problem depends on the type of problem. Some techniques are more suitable than others in terms of search cost and prediction error. A classification tree, for instance, is not suited to problems with true decision boundaries between the classes. Michalski and Kaufman describe the applicability of machine learning and multistrategy methodology to data mining. The multistrategy approach is used for conceptual data exploration, that is, for finding high-level concepts and descriptions in data. Noise in the data is one of the challenges [53].
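Comparing techniques by prediction error can be sketched as follows. The two techniques here, a majority-class baseline and a one-split classification tree (a "stump") on a single numeric feature, are assumptions picked for brevity, and the data is invented. The point is only the procedure: fit each technique on the learning set and measure its error on data it has not seen.

```python
from collections import Counter

def error_rate(predict, examples):
    """Fraction of (x, y) examples that `predict` labels incorrectly."""
    wrong = sum(1 for x, y in examples if predict(x) != y)
    return wrong / len(examples)

def fit_majority(train):
    """Baseline: always predict the most common label in the learning set."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def fit_stump(train):
    """One-split tree: pick the threshold with the lowest training error."""
    best = None
    for t in sorted({x for x, _ in train}):
        left = [y for x, y in train if x < t]
        right = [y for x, y in train if x >= t]
        if not left or not right:
            continue  # a split must leave examples on both sides
        a = Counter(left).most_common(1)[0][0]
        b = Counter(right).most_common(1)[0][0]
        rule = lambda x, t=t, a=a, b=b: a if x < t else b
        err = error_rate(rule, train)
        if best is None or err < best[0]:
            best = (err, rule)
    return best[1]

# Invented learning set and held-out set of (feature value, label) pairs.
train = [(1, "no"), (2, "no"), (3, "no"), (4, "no"),
         (6, "yes"), (7, "yes"), (8, "yes")]
held_out = [(2.5, "no"), (3.5, "no"), (5.5, "yes"), (6.5, "yes"), (9, "yes")]

majority_error = error_rate(fit_majority(train), held_out)
stump_error = error_rate(fit_stump(train), held_out)
print(majority_error, stump_error)
```

On this toy data the stump has the lower held-out error, but it still misclassifies the example at 5.5, which falls in the gap the learning set never covered. Which technique wins depends on the problem, so the comparison has to be repeated for each new dataset.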

The other challenges are:

1. The learning dataset may or may not represent the actual distribution pattern.
2. The learning data may be incomplete: the values of some attributes may be unknown or missing.
3. The learning set may be distributed, meaning that the learning database is a collection of datasets that are brought together, and the patterns within them need to be identified.
4. Learning from continuously evolving concepts. Some datasets, particularly those related to human behavior such as a user's interest in choosing books, change over a period of …
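Challenges 2 and 3 in the list above can be made concrete with a short sketch. It assumes two hypothetical source datasets (`branch_a`, `branch_b`) with invented attributes, and it fills missing values with the attribute mean. Mean imputation is only one of several possible strategies and is used here as an assumption, not as the approach recommended by this chapter.

```python
def merge_datasets(*datasets):
    """Bring several separately held datasets together into one learning set."""
    merged = []
    for dataset in datasets:
        merged.extend(dataset)
    return merged

def impute_mean(records, attribute):
    """Fill unknown or missing values of `attribute` with the mean of known ones."""
    known = [r[attribute] for r in records if r.get(attribute) is not None]
    if not known:
        return [dict(r) for r in records]  # nothing to estimate the mean from
    mean = sum(known) / len(known)
    filled = []
    for r in records:
        value = r.get(attribute)
        filled.append({**r, attribute: mean if value is None else value})
    return filled

# Hypothetical distributed sources; one value is None, one key is absent.
branch_a = [{"age": 30, "books": 4}, {"age": None, "books": 2}]
branch_b = [{"age": 50, "books": 6}, {"books": 1}]

learning_set = impute_mean(merge_datasets(branch_a, branch_b), "age")
ages = [r["age"] for r in learning_set]
print(ages)
```

Merging before imputing means the mean is estimated from all sources together. If the sources follow different distributions (challenge 1), imputing per source may be the safer choice.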
