Statistical Learning and Data Mining

Submitted By natakpaper
Project Proposal
Statistical Learning and Data Mining

Overview: Efficient asset allocation through statistical learning methods, and a comparison of methods for creating an index-tracking exchange-traded fund (ETF).
Datasets:
The datasets are taken from the website of the book “Statistics and Data Analysis for Financial Engineering” by David Ruppert, which is listed as one of the references for this course. The two datasets chosen are:
1. Stock_FX_Bond.csv
2. Stock_FX_Bond_2004_to_2006.csv

The data includes the volumes and adjusted closing prices for GM, F, UTX, CAT, MRK, PFE, MSFT, IBM, C and XOM. The data also contains the volumes and adjusted closing prices for the S&P 500 index. The data set also includes treasury rates for different maturities and rates on corporate bonds as well as foreign exchange rates for the period of 1987 to 2006.
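
As a minimal sketch of the first processing step, adjusted closing prices can be turned into log returns before any modelling. The column names (`Date`, `GM_AC`, `F_AC`) and the three rows of prices below are hypothetical stand-ins rather than values taken from the actual files, and `log_returns` and `load_returns` are illustrative helper names.

```python
import csv
import io
import math

# Toy stand-in for a few rows of a price file. The column names and all
# prices are hypothetical and only illustrate the expected layout.
SAMPLE_CSV = """Date,GM_AC,F_AC
d1,10.0,20.0
d2,11.0,19.0
d3,12.1,19.0
"""


def log_returns(prices):
    """Return the log returns ln(p_t / p_{t-1}) for a list of prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def load_returns(text, columns):
    """Parse CSV text and compute log returns for each requested column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {col: log_returns([float(row[col]) for row in rows]) for col in columns}


returns = load_returns(SAMPLE_CSV, ["GM_AC", "F_AC"])
for name, series in returns.items():
    print(name, [round(r, 4) for r in series])
```

For the real files, the in-memory string would be replaced by the opened CSV file, and the list of columns by whichever adjusted-close columns the files actually contain.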

Objectives:
1. Optimal portfolios for various levels of risk

Conventional investors seek the highest return, often expressed as alpha (the return earned in excess of a benchmark), at a level of risk they are comfortable with. Hence, for any given level of risk, we can define the portfolio that generates the maximal expected return. In this project, we aim to identify the composition of the portfolios that achieve this objective.
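
The idea of maximising return at a chosen risk level can be illustrated with a small sketch for two assets. It assumes a long-only portfolio, measures risk as the standard deviation of return, and uses a simple grid search instead of a proper optimiser. All input numbers and the function names `portfolio_stats` and `best_weight` are hypothetical.

```python
import math


def portfolio_stats(w, mu1, mu2, sd1, sd2, rho):
    """Expected return and standard deviation of a two-asset portfolio.

    w is the weight on asset 1; the remaining 1 - w goes to asset 2.
    """
    mean = w * mu1 + (1 - w) * mu2
    var = (w * sd1) ** 2 + ((1 - w) * sd2) ** 2 + 2 * w * (1 - w) * rho * sd1 * sd2
    return mean, math.sqrt(var)


def best_weight(max_risk, mu1, mu2, sd1, sd2, rho, steps=1000):
    """Grid-search the weight on asset 1 that maximises expected return
    subject to portfolio standard deviation <= max_risk.

    Returns (weight, expected_return, risk), or None if no weight on the
    grid satisfies the risk cap.
    """
    best = None
    for i in range(steps + 1):
        w = i / steps
        mean, sd = portfolio_stats(w, mu1, mu2, sd1, sd2, rho)
        if sd <= max_risk and (best is None or mean > best[1]):
            best = (w, mean, sd)
    return best


# Hypothetical inputs: asset 1 is riskier but has the higher expected return.
choice = best_weight(max_risk=0.10, mu1=0.10, mu2=0.04, sd1=0.20, sd2=0.05, rho=0.0)
print(choice)
```

Repeating the search over a range of `max_risk` values traces out the set of best portfolios, one per risk level. With more than two assets, a quadratic programming solver would be the more usual choice than a grid.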

Existing models such as CAPM, along with other forms of regression, will be compared against additional methods not covered in the class, in order to identify the better methods of portfolio creation. We will use learning tools and models to predict the rate of return and the risk of each stock, which will allow us to build portfolios suited to different needs. We will carry out uncertainty analysis using resampling techniques and will attempt to use Bayesian methods as well. Performance will be tested using the future rates of return on these portfolios.
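
A rough sketch of how the CAPM regression and the resampling-based uncertainty analysis might fit together: the stock's excess returns are regressed on the market's excess returns by ordinary least squares, and a bootstrap over paired observations gives a percentile interval for beta. The data are synthetic, the function names are illustrative, and the plain i.i.d. bootstrap used here ignores any serial dependence in real return series.

```python
import random


def capm_fit(stock_excess, market_excess):
    """OLS fit of stock excess returns on market excess returns.

    Returns (alpha, beta), the intercept and slope of the regression.
    """
    n = len(market_excess)
    mean_m = sum(market_excess) / n
    mean_s = sum(stock_excess) / n
    cov = sum((m - mean_m) * (s - mean_s) for m, s in zip(market_excess, stock_excess))
    var = sum((m - mean_m) ** 2 for m in market_excess)
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta


def bootstrap_beta_interval(stock_excess, market_excess, n_boot=500, seed=0):
    """95% percentile interval for beta from resampling (stock, market) pairs."""
    rng = random.Random(seed)
    n = len(market_excess)
    betas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ms = [market_excess[i] for i in idx]
        if len(set(ms)) < 2:
            continue  # skip degenerate resamples with zero market variance
        ss = [stock_excess[i] for i in idx]
        betas.append(capm_fit(ss, ms)[1])
    if not betas:
        raise ValueError("no usable bootstrap resamples")
    betas.sort()
    k = len(betas)
    return betas[int(0.025 * k)], betas[min(k - 1, int(0.975 * k))]


# Synthetic excess returns: true beta of 1.2 plus Gaussian noise (made-up values).
demo_rng = random.Random(42)
market = [demo_rng.gauss(0.0, 0.01) for _ in range(250)]
stock = [1.2 * m + demo_rng.gauss(0.0, 0.005) for m in market]

alpha_hat, beta_hat = capm_fit(stock, market)
low, high = bootstrap_beta_interval(stock, market)
print(f"beta = {beta_hat:.3f}, 95% bootstrap interval = ({low:.3f}, {high:.3f})")
```

The same resampling loop can wrap any other estimator, for example portfolio weights, to see how sensitive the result is to the particular sample.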

2. Creation of an index tracking exchange traded instrument

An ETF is an
