Executive Summary

Working with big data

By Sara Royster

This article introduces readers to big data: what it is, examples of its use, the challenges of the field, and what it takes to analyze large data sets.

Technological advances have made it possible to collect data sets containing exabytes of information. These data sets are referred to as big data, meaning data sets on which conventional statistical methods of analysis cannot be used. Information in these data sets can be either structured or unstructured. Structured data encompasses information that can easily be organized into categories, such as financial figures. Unstructured data includes things that can't easily be categorized, such as customer comments on a feedback form or product reviews on a website.

Analysis of such large data sets is a difficult task left to data scientists. These are people with statistical knowledge, an understanding of computer programming, and familiarity with the field they are studying. They are responsible for looking through data and removing errors that would complicate their analysis, as well as for interpreting the results that the analysis yields. On average, data scientists make $76,270 per year, which is double the national median income.

Big data is collected and analyzed in many different fields of business. Analysis of product reviews can help predict which products people want improved and what characteristics they desire. Interpretation of the data collected in the Human Genome Project could give rise to medications designed for a specific person. Tracking what people post on social media and what products they buy or look at online could allow for more personalized advertising. These are just a few of the ways in which large data sets are being used.

Working with big data presents a number of challenges. It is difficult to separate usable information from unusable information, especially with unstructured data. A person interested in overcoming these challenges would need an in-depth background in statistics and an understanding of computer programming. As in most lines of work, a data scientist also needs to be interested in the work, able to communicate ideas effectively, and able to work well in a team.