The Evolution of Big Data
The concept of big data has been around for a long time. One of the earliest examples of humans storing large amounts of data in a central location was the Great Library of Alexandria in Egypt. The library was established sometime between 285 and 246 BC and is thought to have been destroyed during the Palmyrene invasion between 270 and 275 AD. Fast forward to the 21st century, and we collect, manage, and analyze data faster than ever before, not to mention in more complex ways. This is where the problem of big data comes in.
What is Big Data?
Big data refers to large volumes of structured, semi-structured, and unstructured data. It arrives in greater volume, at a faster rate, in a wider variety of file formats, and from a wider variety of sources than structured data alone. The term ‘big data’ dates back to the late 1990s and is commonly credited to NASA researchers Michael Cox and David Ellsworth, who used it in their 1997 paper, Application-Controlled Demand Paging for Out-of-Core Visualization. They used the term to describe the challenge of processing and visualizing vast amounts of data from supercomputers.
In 2001, data and analytics expert Doug Laney published the paper 3D Data Management: Controlling Data Volume, Velocity, and Variety, establishing the three primary components still used today to describe big data: Volume (the size of the data), Velocity (the speed at which data grows), and Variety (the range of data types and sources the data comes from).
The History of Big Data
The history of data, and of big data, is a long and storied one. World War 2 brought many advances in technology, made primarily to serve military purposes. Over time, though, those advances became useful to the commercial sector and eventually to the general public, as personal computing became a viable option for everyday consumers.
1940s to 1989 – Data Warehousing and Personal Desktop Computers
The origins of electronic storage can be traced back to the Electronic Numerical Integrator and Computer (ENIAC), often described as the world’s first programmable, general-purpose electronic digital computer. It was designed for the U.S. Army during World War 2 to solve numerical problems, such as calculating artillery firing tables. In 1954, Bell Labs completed TRADIC, one of the first transistorized computers, and by the early 1960s International Business Machines (IBM) was selling transistorized machines such as the IBM 1401, which helped computing branch out of the military and serve more general commercial purposes.
One of the first personal desktop computers to feature a Graphical User Interface (GUI) was the Lisa, released by Apple Computer in 1983. Throughout the 1980s, companies like Apple and IBM released a wide range of personal desktop computers, while Microsoft supplied widely used software such as MS-DOS and Windows. This led to a surge in people buying personal computers and using them at home for the first time. Thus, electronic storage was finally available to the masses.
1989 to 1999 – Emergence of the World Wide Web
Between 1989 and 1993, British computer scientist Sir Tim Berners-Lee created the fundamental technologies that power what we now know as the World Wide Web: HyperText Markup Language (HTML), the Uniform Resource Identifier (URI), and the Hypertext Transfer Protocol (HTTP). Then, in April 1993, CERN placed the underlying World Wide Web software in the public domain, making it free for anyone to use.
As a result, individuals, businesses, and organizations that could afford an internet service could go online and share data with other internet-enabled computers. As more devices gained access to the internet, the amount of information people could access and share grew explosively.
2000s to 2010s – Controlling Data Volume, Social Media and Cloud Computing
During the early 2000s, companies such as Amazon, eBay, and Google generated large amounts of web traffic, along with a combination of structured and unstructured data. Amazon also launched a beta version of AWS (Amazon Web Services) in 2002, which opened the Amazon.com platform to all developers. By 2004, over 100 applications had been built for it.
AWS then relaunched in 2006, offering a wide range of cloud infrastructure services, including Simple Storage Service (S3) and Elastic Compute Cloud (EC2). The public launch attracted customers such as Dropbox, Netflix, and Reddit, which were eager to become cloud-enabled and had all partnered with AWS before 2010.
Social media platforms like MySpace, Facebook, and Twitter also drove a rise in unstructured data, including shared images, audio files, animated GIFs, videos, status posts, and direct messages.
With so much unstructured data being generated at an accelerating rate, these platforms needed new ways to collect, organize, and make sense of it. This led to the creation of Hadoop, an open-source framework designed specifically to store and process big data sets, and to the adoption of NoSQL databases, which made it possible to manage unstructured data, that is, data that does not conform to a relational database model. With these new technologies, companies could collect large amounts of disparate data and then extract meaningful insights for more informed decision making.
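Hadoop's processing layer is built around the MapReduce programming model: a map step turns each input record into key/value pairs, the framework groups those pairs by key, and a reduce step combines each group. The sketch below is a minimal, single-process illustration of that model in plain Python, not Hadoop's actual API; the sample posts and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for clarity.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sample of unstructured "status posts"; not real data.
POSTS = [
    "big data needs new tools",
    "new tools for big data",
    "data data everywhere",
]


def map_phase(post: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one post."""
    for word in post.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single total."""
    return key, sum(values)


def word_count(posts: Iterable[str]) -> dict[str, int]:
    # Here everything runs in one process; on a cluster, map and reduce
    # calls would be spread across many machines.
    mapped = (pair for post in posts for pair in map_phase(post))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    counts = word_count(POSTS)
    # Top three words by count, ties broken alphabetically.
    print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
    # prints [('data', 4), ('big', 2), ('new', 2)]
```

Because each map call looks at only one record and each reduce call looks at only one key, both steps can run in parallel on separate machines, which is what made the model a good fit for very large, loosely structured data sets.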
2010s to now – Optimization Techniques, Mobile Devices and IoT
In the 2010s, the biggest challenges facing big data were the rapid spread of mobile devices and the IoT (Internet of Things). Suddenly, millions of people worldwide were walking around with small, internet-enabled devices in the palms of their hands, able to access the web, communicate wirelessly with other internet-enabled devices, and upload data to the cloud. According to Domo’s 2017 Data Never Sleeps report, we were generating 2.5 quintillion bytes of data daily.
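To put the Domo figure in perspective, here is a quick conversion, assuming the short-scale quintillion (10^18) and decimal storage units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes):

```latex
\begin{aligned}
2.5\ \text{quintillion bytes/day} &= 2.5 \times 10^{18}\ \text{bytes/day} = 2.5\ \text{EB/day} \\
\frac{2.5 \times 10^{18}\ \text{bytes}}{86\,400\ \text{s}} &\approx 2.89 \times 10^{13}\ \text{bytes/s} \approx 29\ \text{TB/s}
\end{aligned}
```

In other words, roughly 29 terabytes of new data every second.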
The rise of mobile devices and IoT devices also led to new types of data being collected, organized, and analyzed. Some examples include:
- Sensor Data (data collected by internet-enabled sensors to provide valuable, real-time insight into the inner workings of a piece of machinery)
- Social Data (publicly available social media data from platforms like Facebook and Twitter)
- Transactional Data (data from online web stores including receipts, storage records, and repeat purchases)
- Health-Related Data (from heart rate monitors, patient records, and medical history)
With this information, companies could now dig deeper than ever into previously unexplored details, such as customer buying behavior and machinery maintenance frequency and life expectancy.
The Future of Big Data Solutions
While the future of big data is not entirely clear, current trends and predictions can shed light on how big data will be managed in the near future. By far the most prominent trends are AI (Artificial Intelligence) and automation, both of which are streamlining database management and big data analysis, making it easier to convert raw data into meaningful insights for key decision makers.
Whether a company wants to collect consumer information or business analytics, big data analytics tools can help it keep up with the rapidly growing volume of data, turn raw data into useful information and knowledge, support decision making, and improve its ability to predict future outcomes.
Another major challenge for big data is ethics. Over the years, national and international legislation has standardized how companies and individuals can collect data and use the data they retrieve. Regulations like the GDPR (General Data Protection Regulation) make it crystal-clear that customer privacy is a top priority, so companies and individuals must take data privacy seriously if they are to operate legally and avoid major fines. Tools designed specifically to comply with such regulations can help companies stay compliant and protect sensitive customer and employee data.