Big Data

Submitted By neworiental
The Situation of Big Data Technology
Yu Liu
International American University
BUS 530: Management Information Systems
Matthew Keogh
2015 Summer 2 - Section C

Introduction
In this paper, I list the main technologies related to big data. Following the life cycle of data processing, big data technology can be divided into data collection and pre-processing, data storage and management, data analysis and data mining, data visualization, and data privacy and security.
Why I chose big data as a topic
My major is computer science, and I have taken a few courses on data mining. More and more big data positions are appearing on job search websites such as Monster.com, and I am planning to learn mainstream big data technologies such as Hadoop. I therefore chose big data as the topic of my midterm paper.
Big data in Google
Google's big data analytics applications include customer sentiment analysis, risk analysis, product recommendation, message routing, customer churn prediction, classification of legal documents, email content filtering, political tendency forecasting, species identification and others. It is said that big data generates $23 million every day for Google. Some typical applications are as follows:
Based on MapReduce, Google's traditional applications include data storage, data analysis, log analysis, search quality and other analytical applications.
Based on the Dremel system, Google introduced BigQuery, its data analysis software and service. With BigQuery, Google has started selling online data analysis services.
Based on statistical algorithms, the Google search engine provides services such as spelling correction and statistical machine translation.
Google Instant predicts the likely search while the user is still typing and shows the results immediately.
Development of big data technologies
Data collection and pre-processing
There are three main sources for data collection: management information systems, web information systems and physical information systems.
Management information systems are the information systems inside enterprises and institutions, such as transaction processing systems and office automation systems. They mainly support operations and management for a particular group of users. Data are entered by users or produced by system processing, and they are usually structured.
Web information systems include social networks, social media, search engines and so on. They mainly construct a virtual space and provide information and social services to users. Data are generated by online users.
Physical information systems are information systems attached to physical objects and physical processes, such as real-time monitoring and real-time detection. They are mainly used for production control, process control, environmental protection and so on. Data are generated by embedded sensors and can be physical, chemical or biological measurements as well as audio, video and other multimedia data.
Different data sets may have different structures, such as files, XML, trees, lists and tables. Data with these different structures must be integrated and consolidated into a new data set before it can be analyzed and processed further.
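The integration step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sample sources, the field names `sensor_id` and `value`, and the helper names `from_csv`, `from_xml` and `integrate` are all hypothetical and do not come from any particular system.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample sources with different structures (assumptions).
CSV_SOURCE = "sensor_id,value\ns1,20.5\ns2,19.0\n"
XML_SOURCE = "<readings><reading id='s3'>21.25</reading></readings>"


def from_csv(text):
    """Yield records from a tabular (CSV) source."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"sensor_id": row["sensor_id"], "value": float(row["value"])}


def from_xml(text):
    """Yield records from a tree-structured (XML) source."""
    for node in ET.fromstring(text).iter("reading"):
        yield {"sensor_id": node.get("id"), "value": float(node.text)}


def integrate(*sources):
    """Merge all sources into one list that shares a single schema."""
    return [record for source in sources for record in source]


dataset = integrate(from_csv(CSV_SOURCE), from_xml(XML_SOURCE))
```

Each source gets its own small adapter that converts records into the shared schema, so a new source format only requires one more adapter function.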
Data storage and management
The most widely used technologies for data storage and management today are distributed file systems and distributed databases. Distributed file systems store data on distributed devices or nodes connected through a network. The distributed systems used by several big companies are:
Company | Distributed system
Google | GFS and Bigtable
Amazon | Dynamo
Microsoft | Azure
Yahoo | PNUTS
Distributed databases spread data across multiple database management systems. NoSQL ("non-SQL", also read as "not only SQL") databases provide a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in traditional relational databases. NoSQL systems are designed to improve performance and scalability. According to the data model, NoSQL databases can be divided into three categories: key-value databases, document databases and graph databases. Key-value databases include BigTable, Dynamo, HBase, Gemfire, Redis and Cassandra (BigTable, HBase and Cassandra are often classified separately as wide-column stores). Document databases include MongoDB and Couchbase. Graph databases include Neo4j.
Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.
MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
Hive is a SQL engine built on MapReduce. Its basic principle is to accept a SQL statement, parse it and translate it into multiple MapReduce tasks, which then carry out the basic SQL operations.
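To make the MapReduce model and the Hive translation idea more concrete, the following single-process Python sketch mimics how a query such as SELECT page, COUNT(*) ... GROUP BY page could be expressed as map, shuffle and reduce steps. It is not Hadoop or Hive code: the sample rows and the function names `map_phase`, `shuffle` and `reduce_phase` are assumptions chosen for illustration, and a real cluster would run these phases on many machines.

```python
from collections import defaultdict

# Hypothetical table rows of (user, page); an assumption for illustration.
ROWS = [("ann", "home"), ("bob", "home"), ("ann", "cart"), ("cy", "home")]


def map_phase(rows):
    """Map: emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user, page in rows:
        yield page, 1


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's values, here COUNT(*) as a sum of ones."""
    return {key: sum(values) for key, values in groups.items()}


page_counts = reduce_phase(shuffle(map_phase(ROWS)))
```

Because each map call looks at one row and each reduce call looks at one key, both phases can be distributed across nodes without changing the logic.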
Data analysis and mining
Traditional data analysis software scales poorly, while Hadoop's own analysis features are weak. To address both problems, IBM researchers are working on integrating R with Hadoop. R is open source software for statistical analysis. With a deep integration of R and Hadoop, data can be computed and processed concurrently, and Hadoop gains the ability to perform deep analysis.
Other researchers are working on integrating Weka (software similar to R that is used for machine learning and data mining) with MapReduce. After a deep integration, Weka running on a MapReduce cluster can handle much more data than before, its performance improves considerably through concurrent computation, and MapReduce gains the capability for deep analysis.
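One way to picture how a statistical computation can run concurrently over partitioned data is to compute small partial results per data block and merge them afterwards. The Python sketch below does this for the mean and the population standard deviation. It is only an analogy under stated assumptions: it uses threads on one machine rather than R, Weka or a Hadoop cluster, and the names `partial_stats`, `merge`, `mean_and_std` and the sample `BLOCKS` are hypothetical.

```python
import functools
import math
from concurrent.futures import ThreadPoolExecutor


def partial_stats(block):
    """Per-block summary: (count, sum, sum of squares)."""
    return len(block), sum(block), sum(x * x for x in block)


def merge(a, b):
    """Combine two partial summaries by element-wise addition."""
    return tuple(x + y for x, y in zip(a, b))


def mean_and_std(blocks):
    """Compute mean and population standard deviation from block summaries."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, blocks))
    n, total, total_sq = functools.reduce(merge, partials)
    mean = total / n
    variance = total_sq / n - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))


# Hypothetical data already split into blocks (an assumption).
BLOCKS = [[2.0, 4.0], [4.0, 4.0], [5.0, 5.0, 7.0, 9.0]]
mean, std = mean_and_std(BLOCKS)
```

The sum-of-squares formula is used here because its partial results merge by simple addition; it can lose precision on large values, so a production system would likely prefer a numerically stable merge.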
Data visualization
To implement data visualization, software transforms data into images that users can observe and analyze. Users then combine these images with their own background knowledge to understand the meaning and characteristics of the data. Users can also change the settings of the visualization system to examine different aspects of the data. Visualization can quickly and effectively simplify and refine data streams and helps users filter large amounts of data.
There are four basic technologies in data visualization: data streaming, task parallelism, pipeline parallelism and data parallelism.
Data streaming divides the data into separate blocks and processes them in turn. It is the main approach when the data is much larger than the available computing resources. It can handle data of any scale, but it usually requires a very long processing time, which makes interactive analysis difficult.
Task parallelism is the parallel processing of multiple independent tasks. This method requires an algorithm that decomposes the task into subtasks, and it requires multiple computing resources.
Pipeline parallelism splits the work into successive stages and lets the stages run simultaneously, each on a different data sub-block.
Data parallelism is the parallel processing of the data sub-blocks themselves. This method can reach a very high degree of parallelism and scales well as computing nodes are added.
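As a rough illustration of data streaming, the Python sketch below builds a histogram (a typical input for a chart) while holding only one block of values in memory at a time. The function names `blocks` and `streamed_histogram`, the block size and the sample input are assumptions made for this example, not part of any visualization product.

```python
def blocks(values, block_size):
    """Yield successive blocks so that only one block is held at a time."""
    block = []
    for value in values:
        block.append(value)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # emit the final, possibly shorter block
        yield block


def streamed_histogram(values, bin_width, block_size=1000):
    """Count values per bin, processing the input one block after another."""
    counts = {}
    for block in blocks(values, block_size):
        for value in block:
            bin_start = (value // bin_width) * bin_width
            counts[bin_start] = counts.get(bin_start, 0) + 1
    return counts


# Hypothetical input: an iterator, so the full data set is never materialized.
hist = streamed_histogram(iter(range(10)), bin_width=5, block_size=3)
```

The blocks are processed strictly in turn, which matches the description above: memory use stays bounded, but total run time grows with the size of the data. Handing the blocks to several workers instead would turn the same structure into data parallelism.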
Data privacy and security
File access control restricts the operations that can be performed on the data and solves the data security problem to a certain extent. Encrypting the infrastructure keeps the storage devices that hold big data safe, but it does not solve the essential problem of data security.
Anonymization is another security technology. It can be applied to all types of data and applications. Its algorithms are highly versatile and can guarantee the authenticity of the data.
Data encryption guarantees the authenticity and reversibility of the data and is nondestructive. It provides a high level of privacy protection and is mainly applied to distributed data mining and operations. However, its cost is very high, so it cannot be applied to all big data.
Reversible replacement algorithms guarantee the authenticity of the data and are very efficient. They are normally applied to privacy protection in large-scale data center systems.
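The paper does not specify a particular reversible replacement algorithm, so the following Python sketch shows just one simple possibility: sensitive values are swapped for random tokens, and a lookup table that must itself be kept secret allows the originals to be restored. The class name `ReversibleReplacer`, its methods and the sample value are hypothetical.

```python
import secrets


class ReversibleReplacer:
    """Swap sensitive values for random tokens; the tables must stay secret."""

    def __init__(self):
        self._forward = {}   # original value -> token
        self._backward = {}  # token -> original value

    def replace(self, value):
        """Return the token for a value, creating one on first use."""
        if value not in self._forward:
            token = secrets.token_hex(8)
            while token in self._backward:  # avoid an unlikely collision
                token = secrets.token_hex(8)
            self._forward[value] = token
            self._backward[token] = value
        return self._forward[value]

    def restore(self, token):
        """Return the original value for a previously issued token."""
        return self._backward[token]


replacer = ReversibleReplacer()
token = replacer.replace("alice@example.com")  # hypothetical sensitive value
original = replacer.restore(token)
```

The same value always maps to the same token, so analyses that only need to group or join on the field still work on the replaced data. The replacement is only a dictionary lookup, which is consistent with the efficiency claimed above.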
Conclusion
Big data is changing our lives profoundly. Data will keep growing in both size and complexity, so our analysis systems and solutions will need the support of the latest technologies as well as more professionals who devote their efforts to the field. As an IT professional, I hope to contribute to big data research or applications in the near future.
