Analytics Text Mining

Submitted By necropol
Task A: Text Clustering - 2 Clusters
After running Text Cluster with two clusters, the following results were obtained:
Table 1. Cluster Summary
Cluster  Weight  Frequency  RMSSTD    Descriptive Terms
1        0.8     2248       0.124345  +action +good +plot characters effects movies pretty +movie real +year first +old +few +end films +character +feel +watch +cast +director
2        0.2     551        0.094437  +battle +history +man +stone alexander angelina anthony battles colin farrell historical hopkins hours jolie men oliver scenes stone troy +life

Table 2. Cluster-Specific Means
Cluster  Rat_10scl (mean)  Useful (mean)  RevLen_Words (mean)
1        6.121             0.388          241.981
2        5.461             0.413          277.301

Table 3. Cluster-Specific Genre Distribution
Cluster  Thriller  Romance  Action  Drama   Comedy  Animation  Sum
1        11.65%    5.43%    40.39%  25.71%  15.97%  0.93%      100%
2        0.36%     0.36%    13.07%  86.21%  0%      0%         100%

Description of the clusters:
Cluster 1:
This cluster contains the larger number of observations: 2248 of the 2799 reviews processed, or 80% of the total. Its RMSSTD (root-mean-square standard deviation) of 0.124 is higher than that of Cluster 2, which shows that this cluster is more heterogeneous; that is, its reviews are more varied and less consistent.
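RMSSTD summarises how tightly the documents in a cluster sit around the cluster centre: the larger the value, the more spread out the cluster. The sketch below assumes the usual textbook definition, the square root of the pooled within-cluster variance across all input variables; the exact inputs and formula used by the clustering tool may differ, and the coordinates are invented purely for illustration.

```python
import math

def rmsstd(rows):
    """Root-mean-square standard deviation of one cluster.

    rows: list of equal-length numeric vectors, one per document.
    Pools the squared deviations of every variable, divides by
    p * (n - 1), and takes the square root.
    """
    n = len(rows)
    p = len(rows[0])
    if n < 2:
        return 0.0  # a single document has no spread
    total_ss = 0.0
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        total_ss += sum((x - mean) ** 2 for x in col)
    return math.sqrt(total_ss / (p * (n - 1)))

# Hypothetical 2-D document coordinates for a tight and a loose cluster.
tight = [[0.10, 0.20], [0.12, 0.22], [0.11, 0.19]]
loose = [[0.10, 0.20], [0.40, -0.10], [-0.20, 0.50]]
print(round(rmsstd(tight), 4), round(rmsstd(loose), 4))
```

A lower value, as for Cluster 2, means its documents lie closer to their centre, which is what "more homogeneous" refers to.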
Overall, the list of terms displayed under ‘Descriptive Terms’ shows a wide variety of terms. Terms like ‘plot’, ‘characters’, ‘cast’, ‘director’ and ‘watch’ suggest that the reviews in this cluster focus on the type and quality of a movie as shaped by its characters and plot. Although we cannot say definitively what the overall view of all the reviewers in this cluster is, we can say that the reviews are mostly about the all-round quality of the movies. This is consistent with the higher RMSSTD, which shows that the reviews in this cluster are more scattered.
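The idea behind ‘Descriptive Terms’ can be illustrated with a deliberately simple, assumed scoring rule: a term scores highly when it appears in a larger share of a cluster's documents than of the collection as a whole. The reviews and the helper `descriptive_terms` are hypothetical, and the clustering tool's own term selection (which may involve stemming and term weighting) is likely different.

```python
from collections import Counter

def descriptive_terms(docs, labels, cluster, top_n=3):
    """Rank terms by how much more often they occur in one cluster than overall.

    Score = (share of the cluster's documents containing the term)
            - (share of all documents containing the term).
    """
    in_cluster = [set(d.lower().split()) for d, c in zip(docs, labels) if c == cluster]
    everywhere = [set(d.lower().split()) for d in docs]
    df_cluster = Counter(t for words in in_cluster for t in words)
    df_all = Counter(t for words in everywhere for t in words)
    scores = {
        t: df_cluster[t] / len(in_cluster) - df_all[t] / len(everywhere)
        for t in df_cluster
    }
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]

# Hypothetical reviews with cluster labels already assigned.
docs = [
    "great plot and good characters",
    "the cast and director did a good job",
    "plot was thin but the characters were fun",
    "epic battle scenes and real history",
    "the battle for troy felt long",
]
labels = [1, 1, 1, 2, 2]
print(descriptive_terms(docs, labels, 1))           # ['characters', 'good', 'plot']
print(descriptive_terms(docs, labels, 2, top_n=1))  # ['battle']
```

Common words such as ‘and’ and ‘the’ score low because they are frequent everywhere, which is why cluster-specific terms rise to the top.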
‘Rat_10scl’ is the rating given to a movie on a 10-point scale. The mean rating in cluster 1 is 6.121 (Table 2), which is higher than that of cluster 2 (5.461), so reviewers in this cluster rated their movies higher on average.
‘Useful’ is the ratio of the number of people who found a review useful to the total number of people who read it. Its mean for cluster 1 is 0.388, lower than the 0.413 of cluster 2, so a smaller proportion of readers found the reviews in cluster 1 useful. Note that these variables describe the clusters rather than define them: the clusters are formed from the text of the reviews, and lower usefulness and higher ratings are characteristics that the reviews in cluster 1 turn out to share.
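Because the cluster labels come from the text clustering, variables such as ‘Useful’ and ‘Rat_10scl’ are profiled afterwards by averaging them within each cluster. A minimal sketch of that profiling step, assuming each review carries a count of readers who found it useful and a total reader count (the vote counts and helper names here are hypothetical):

```python
def useful_ratio(found_useful, total_readers):
    """Share of readers who found a review useful (0.0 if nobody rated it, by assumption)."""
    return found_useful / total_readers if total_readers else 0.0

def cluster_means(values, labels):
    """Mean of a per-review value within each cluster label."""
    sums, counts = {}, {}
    for v, c in zip(values, labels):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

# Hypothetical reviews: (readers who found it useful, total readers) and cluster label.
votes = [(3, 10), (5, 10), (2, 4), (9, 20)]
labels = [1, 1, 2, 2]
useful = [useful_ratio(u, t) for u, t in votes]
print(cluster_means(useful, labels))
```

The same `cluster_means` helper works unchanged for ratings or any other per-review number.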
‘RevLen_Words’ is the number of words in a review. The average review length in cluster 1 is 241.981 words, lower than the 277.301 words in cluster 2, so reviews in cluster 1 are shorter on average.
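Review length can be profiled in the same way. This sketch assumes that ‘RevLen_Words’ is a plain whitespace word count, which the source does not specify, and uses made-up reviews:

```python
from statistics import mean

def review_length(text):
    """Number of whitespace-separated words in a review (assumed definition)."""
    return len(text.split())

# Hypothetical (review text, cluster label) pairs.
reviews = [
    ("A solid action movie with a weak plot.", 1),
    ("Fun effects, but the characters felt flat.", 1),
    ("Long and historically loose, yet the battle scenes are impressive.", 2),
]

by_cluster = {}
for text, cluster in reviews:
    by_cluster.setdefault(cluster, []).append(review_length(text))

avg_len = {c: mean(lengths) for c, lengths in by_cluster.items()}
print(avg_len)  # {1: 7.5, 2: 10}
```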
Table 3 shows the percentage distribution of genres within each cluster. For example, action is the most common genre in cluster 1 (40.39%), followed by drama (25.71%). Another important observation is that cluster 1 contains movies from all six genres listed, whereas cluster 2 contains no comedy or animation movies.
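Table 3 is a within-cluster distribution: each genre count is divided by the size of its cluster. As a check, the sketch below uses genre counts for cluster 2 back-calculated from its percentages and its size of 551 reviews; those counts are therefore approximate reconstructions, not values reported in the source.

```python
def genre_distribution(counts):
    """Convert raw genre counts for one cluster into within-cluster percentages."""
    total = sum(counts.values())
    return {genre: round(100 * n / total, 2) for genre, n in counts.items()}

# Counts back-calculated from Table 3 and the cluster size of 551 (approximate).
cluster2 = {"thriller": 2, "romance": 2, "action": 72, "drama": 475,
            "comedy": 0, "animation": 0}
dist = genre_distribution(cluster2)
print(dist)
```

Dividing by the cluster size rather than by all 2799 reviews is what makes each row of Table 3 sum to 100%.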
From these observations, we can label Cluster 1 as Scattered/Action, where ‘scattered’ means that the reviews here cover more varied categories than those in cluster 2.

Cluster 2:
This cluster contains only 20% of the observations, a total of 551 reviews. Its RMSSTD of 0.094 is lower than that of cluster 1, which shows that cluster 2 is more homogeneous; its reviews are more consistent. The ‘Descriptive Terms’ include ‘battle’, ‘history’, ‘angelina’, ‘jolie’, ‘colin’, ‘farrell’ and ‘troy’, which strongly suggest that this cluster mainly contains reviews of movies about historical battles, which generally involve more drama.
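The Weight column in Table 1 is each cluster's share of all processed reviews. A quick check of the arithmetic using the frequencies from Table 1:

```python
frequencies = {1: 2248, 2: 551}    # review counts per cluster, from Table 1
total = sum(frequencies.values())  # total reviews processed
shares = {c: n / total for c, n in frequencies.items()}
print(total, {c: round(s, 3) for c, s in shares.items()})  # 2799 {1: 0.803, 2: 0.197}
```

Rounded to one decimal place, these shares give the 0.8 and 0.2 weights shown in Table 1.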
This cluster received a lower average rating than cluster 1, but a larger proportion of readers found its reviews useful: the mean ‘Rat_10scl’ is 5.461 and the mean ‘Useful’ is 0.413. The average review length in this cluster (277.301 words) is also higher than that of cluster 1.
Table 3 shows that drama is the dominant genre in cluster 2, a clear majority at 86.21%. Action is second at 13.07%.
From these observations, we can label Cluster 2 as Consistent/Drama.
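The labelling reasoning used for both clusters can be written down as a small rule: call a cluster ‘Consistent’ if its RMSSTD is the lower of the two and ‘Scattered’ otherwise, then append its most common genre. The function `label_cluster` is a hypothetical encoding of that rule of thumb for the two-cluster case, not part of the clustering tool:

```python
def label_cluster(rmsstd, genre_pct, other_rmsstd):
    """Combine relative spread with the dominant genre (hypothetical two-cluster rule)."""
    spread = "Consistent" if rmsstd < other_rmsstd else "Scattered"
    dominant = max(genre_pct, key=genre_pct.get)  # genre with the highest percentage
    return f"{spread}/{dominant.capitalize()}"

# RMSSTD values from Table 1 and genre percentages from Table 3.
genres1 = {"thriller": 11.65, "romance": 5.43, "action": 40.39,
           "drama": 25.71, "comedy": 15.97, "animation": 0.93}
genres2 = {"thriller": 0.36, "romance": 0.36, "action": 13.07,
           "drama": 86.21, "comedy": 0.0, "animation": 0.0}
print(label_cluster(0.124345, genres1, 0.094437))  # Scattered/Action
print(label_cluster(0.094437, genres2, 0.124345))  # Consistent/Drama
```

A rule like this only formalises the interpretation; the final labels still rest on reading the descriptive terms.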
