
Currently, the world is producing 16.3 zettabytes of data a year. According to IDC, by 2025 that amount will rise tenfold, to 163 zettabytes a year. But how big, exactly, is a zetta?
To grasp how much data scientists and managers have to handle every single day, compare it with something familiar: Earth’s atmosphere, the Solar System or the Milky Way. According to NASA, Earth’s atmosphere has a mass of approximately five zettagrams. So for every gram of gas around our planet, we currently produce a bit more than three bytes of data each year. By 2025, more than 30 bytes of data will be generated every year for each gram of air around the globe.
Distances between stars or planets are usually measured in astronomical units (AU), one AU being the mean distance between the Sun and the Earth: about 150 million kilometers, or 150 gigameters. So if there were one byte of information for every meter between the Sun and the Earth, there would be enough space for Windows Vista and a few useful apps (though not many). The distance between the Sun and Pluto is about six terameters, so one byte for every meter between the Sun and Pluto would give six terabytes of storage – a bit more than Wikipedia’s SQL dataset in January 2010.
According to NASA, the Milky Way is about 1,000 zettameters across. Assuming every meter could hold a byte, it would take about six years at the projected 2025 rate to fill the galaxy’s entire diameter with data.
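The arithmetic behind these comparisons is easy to check. A quick sketch, using only the figures quoted above:

```python
# Back-of-the-envelope check of the analogies above (figures from the text).
ZETTA = 10 ** 21

data_2017 = 16.3 * ZETTA          # bytes produced per year (IDC)
data_2025 = 163 * ZETTA           # projected bytes per year by 2025 (IDC)

atmosphere_grams = 5 * ZETTA      # mass of Earth's atmosphere, ~5 zettagrams (NASA)
bytes_per_gram_2017 = data_2017 / atmosphere_grams   # ~3.3 bytes per gram of air
bytes_per_gram_2025 = data_2025 / atmosphere_grams   # ~32.6 bytes per gram of air

milky_way_meters = 1000 * ZETTA   # Milky Way diameter, ~1,000 zettameters (NASA)
years_to_fill = milky_way_meters / data_2025         # ~6.1 years at the 2025 rate

print(round(bytes_per_gram_2017, 1),
      round(bytes_per_gram_2025, 1),
      round(years_to_fill, 1))    # -> 3.3 32.6 6.1
```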
That being the magnitude of the world’s data, is it any surprise that data scientists and businesses are seeking ways to manage the amount of data they’re dealing with?
1. Spark – firing up big data in business
The people who manage and harvest big data say Apache Spark is their software of choice. According to MicroStrategy’s data, Spark is considered “important” by 77% of the world’s enterprises, and critical by 30%. Spark’s importance and popularity are growing throughout the industry: in 2017 it surpassed MapReduce as the most popular infrastructure solution for handling big data. Given that, learning how to leverage Spark for big data management pays off for engineers and data scientists alike.

2. Real-time data processing – challenge in a batch
Modern data science is not only about gaining insight, but about gaining it fast. Every industry benefits from getting information in real time, both to optimize existing processes and to develop new ones. The ability to react while an event is still happening is crucial in maintenance (preventing breakdowns), marketing (knowing when to reach out to someone) and quality control (getting things right on the production line). Currently, internet marketing is the best playground for data streaming: 40% of marketers name real-time data as a key tool in augmenting their marketing. In Real-Time Bidding (RTB), digital-ad impressions are sold at automated auctions, and both the buyer and the seller need a platform that provides delay-free, up-to-the-second data. What’s more, internet analytics relies on real-time data to build heatmaps, map digital customers’ journeys and gather behavioral data. None of this is achievable with traditional, batch-based processing. Spark makes it easy by unifying batch and streaming, enabling seamless conversion between the two modes of processing.

3. From academia to business – productizing the models
AI and machine learning were once little more than academic playthings: the models were too unstable and unreliable to handle business challenges, and integrating them into an enterprise environment was tricky. Machine learning models, commonly trained in Python or R, often prove hard to integrate with an existing application built with, say, Java. The Spark framework makes this integration easy, as it provides support for Scala, Java, Python and R. It enables you to run a machine learning model right inside the data management solution to harvest insight in a faster, automated way. With productized models, AI is projected to increase labor productivity by 40%. It’s no surprise, then, that 72% of US business leaders consider AI a “business advantage”.