
Intro to Data Mining


Submitted By mac2jen
Assignment 1: Introduction
Miles A. Cabanos
Data Mining - 10303 CAP4770

Q1: Present an example where data mining is crucial to the success of a business. What data mining function does this business need? Can they be performed alternatively by data query processing or simple statistical analysis?
The best example I can think of is Amazon.com. One of the functions the business uses is characterization: it keeps track of what types of products customers buy, and the site then suggests similar items. No, I do not think these functions could be performed by data query processing or simple statistical analysis, because a query only retrieves what is explicitly asked for, while the patterns of interest here are not known in advance.
Q2: Define each of the following data mining functionalities: characterization, discrimination, association and correlation analysis, classification, prediction and clustering. Give examples of each data mining functionality, using a real-life database with which you are familiar.
• Characterization: Summarizes the general characteristics of a target class of data that shares similarities or other requested specific information. (Example: Data from Amazon.com customers who buy superhero books can be used to determine which age group buys superhero books. Amazon can then suggest other, similar books.)
• Discrimination: Compares or contrasts the features of one class of data against another. (Example: Amazon could find out which types of customers buy more superhero books compared to biographies.)
• Association and correlation analysis: Finds how certain types of data are associated with each other, in the form of patterns, relationships, or correlations. (Example: Amazon could find out what types of items sell together, such as people who buy a George Foreman grill cookbook also buying a George Foreman grill.)
• Classification: Uses data whose class labels are known to build a model of what distinguishes each class, so that labels can be assigned to data whose class is unknown. (Example: Amazon could create classes of customers by age group, sex, etc. that buy action or science fiction books.)
• Prediction: Uses gathered data to estimate future or unknown outcomes. (Example: Amazon collects data from the past two years of Christmas sales of a certain item. It can then try to predict sales and the number of items to keep in stock.)
• Clustering: Groups raw data that does not yet have classifications or labels, based on similarities or patterns the data shares. (Example: Amazon's new data may not yet be classed, but groups can be found through clusters of similar attributes.)
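As a rough illustration of the association analysis example above, the following Python sketch counts how often pairs of items appear in the same order and reports the support and confidence of each "buys A, also buys B" rule. The order data, the `pair_rules` name, and the 0.5 support threshold are all made up for illustration; this is not how Amazon actually does it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders; each set is the items in one customer's basket.
orders = [
    {"grill", "grill cookbook"},
    {"grill", "grill cookbook", "apron"},
    {"grill", "apron"},
    {"novel"},
]

def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules 'a -> b'."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for b in baskets for item in b)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of all baskets with both items
        if support < min_support:
            continue
        # Confidence: of the baskets with the left-hand item,
        # the fraction that also contain the right-hand item.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules

rules = pair_rules(orders)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every order containing the cookbook also contains the grill, so the rule "grill cookbook -> grill" has confidence 1.0, while the reverse rule is weaker.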
Q3: What are the major challenges of mining a huge amount of data (such as billions of tuples) in comparison with mining a small amount of data (such as a few hundred tuple data set)?
If I understand correctly, the major challenges are efficiency and scalability: the algorithms have to cope with the volume of data while still running quickly enough to be usable. With huge amounts of data, the data sometimes has to be mined in partial pieces. With smaller amounts of data, algorithms are more easily configured to run efficiently and give quicker performance.
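The idea of mining data in partial pieces can be sketched in Python as follows. This is a minimal example under assumed conditions: the records arrive as a stream of item names, and the `chunks` and `count_in_chunks` helpers are hypothetical names chosen for illustration. Each chunk is counted separately and the partial counts are merged, so only one chunk has to be held in memory at a time.

```python
from collections import Counter
from itertools import islice

def chunks(records, size):
    """Yield successive lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def count_in_chunks(records, size=1000):
    """Count item frequencies one chunk at a time, merging partial counts."""
    total = Counter()
    for chunk in chunks(records, size):
        total.update(Counter(chunk))  # merge this chunk's counts into the total
    return total

# Hypothetical stream of purchased item names (a generator, never fully in memory).
stream = ("grill" if i % 3 == 0 else "novel" for i in range(10))
counts = count_in_chunks(stream, size=4)
print(counts)
```

Simple counts merge cleanly across chunks; many other mining tasks need more care when the data is split, which is part of why scalability is a challenge.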
