The Logistics Nightmare of “Cheap” Data Storage
Data used to be expensive. Really expensive.
Some early computers could only store kilobytes of data, and adding more capacity was cost-prohibitive. In the early 2000s, a 16MB USB flash drive would set you back at least $20. Hard drives with gigabytes of storage capacity were a dream.
Today, you can walk into any retailer that sells electronics and get a 128GB USB flash drive for the same price. Hard drives hold terabytes. Organizations have realized the enormous potential value of collecting data: it can be analyzed to discover new insights, make data-backed decisions, and identify trends that boost efficiency and innovation. Given the negligible price of storage, they have begun historizing much of the data they produce for future use.
But we pay for cheap data storage in other ways. What’s the trade-off? Simple: extracting useful insights from the data is incredibly expensive.
The problem with this dynamic is that everyone was told data is cheap and to just store everything. This has created what I like to call a data black hole, where data enters but is never retrieved again.
Edge Computing
But why is it never seen again? Again, the answers are simple. It can be very difficult to locate the data you need, or the organizational scheme may be convoluted and inconsistent. And don’t forget the very human factor: maybe whoever came before simply named it something else, whether deliberately or by accident.
Data storage has had plenty of time to mature and be perfected at scale. In that same time, a new, more powerful resource has become cheap: compute power itself. A Raspberry Pi, for example, is a very capable computer that costs about $35. For roughly the same money per month, a cloud provider will host a computer for you.
Compute power can and should be deployed far more widely across the shop floor, just like data collection. With computing this affordable, it can be deployed strategically to process data at the source of its generation. This is achieved by integrating each piece of equipment first with a compute device, often called an edge node.
The benefit of edge computing is that it provides both compute power and data storage that’s designated for specific equipment, which allows that equipment to be decoupled from the rest of the manufacturing site. The equipment works locally, and the edge node then takes care of sending data to the right place. If something happens to the data’s intended destination, the edge node will simply store that data until it can be safely sent.
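To make the idea concrete, here is a minimal sketch of what an edge node’s main loop might look like in Python. It is illustrative only: read_sensor(), the endpoint URL, and the field names are all assumptions standing in for whatever interface and central repository a real site uses.

    # Edge-node sketch: poll the equipment locally and forward each reading
    # to a central repository. read_sensor(), the URL, and the field names
    # are hypothetical placeholders.
    import time
    import requests

    HISTORIAN_URL = "http://historian.example.com/api/telemetry"  # assumed endpoint

    def read_sensor():
        # Stand-in for the real equipment interface (OPC UA, Modbus, serial, ...)
        return {"equipment": "BR-101", "temperature_c": 37.1, "ph": 7.2}

    while True:
        reading = read_sensor()
        reading["timestamp"] = time.time()
        try:
            # The edge node, not the equipment, decides where the data goes.
            requests.post(HISTORIAN_URL, json=reading, timeout=5)
        except requests.RequestException:
            pass  # store-and-forward buffering on failure is sketched later
        time.sleep(5)

Because the equipment only ever talks to its local edge node, it stays decoupled from everything downstream, exactly the property described above.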
Preventing the Data Black Hole
So how can adding compute on top of equipment prevent a data black hole?
It comes down to taking the disparate data formats produced by the plethora of equipment and adding a model on top, so the data is standardized and contextualized, and therefore valuable, the instant it is stored.
For example, a bioreactor has many synonyms and acronyms (BRX, BioRX, SUB, Wave Bioreactor, etc.). If each system that stores bioreactor data uses a different synonym, it will be hard to find and compile all the information needed to compare bioreactors to each other. However, if the whole organization agrees on a particular vernacular, it becomes simple to piece things together.
The same concept applies to data storage. With a proper data model in place, a data black hole never forms in the first place. The data is much easier for an end user to understand, and can therefore be queried and shared between people and systems far more easily.
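As a rough sketch of what that looks like in practice, the Python below maps equipment synonyms to one canonical name and wraps every raw reading in a consistent, contextualized record before it is stored. The alias table and record fields are illustrative assumptions, not a standard:

    # Sketch: normalize equipment synonyms to one canonical name and wrap
    # each raw reading in a consistent, contextualized record before storage.
    # The aliases and record fields are illustrative assumptions.
    from datetime import datetime, timezone

    ALIASES = {
        "brx": "bioreactor",
        "biorx": "bioreactor",
        "sub": "bioreactor",
        "wave bioreactor": "bioreactor",
    }

    def canonical_type(name: str) -> str:
        key = name.strip().lower()
        return ALIASES.get(key, key)

    def contextualize(equipment_id: str, equipment_type: str, raw: dict) -> dict:
        return {
            "equipment_id": equipment_id,
            "equipment_type": canonical_type(equipment_type),  # one agreed vernacular
            "site": "site1",
            "recorded_at": datetime.now(timezone.utc).isoformat(),
            "values": raw,
        }

    record = contextualize("BR-101", "BioRX", {"temperature_c": 37.1, "ph": 7.2})
    # record["equipment_type"] == "bioreactor" no matter which synonym was used,
    # so a query like "all bioreactor data" works across the whole organization.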
Data Contextualization is Key
Another key benefit of adding compute power is the added data-buffering capability. The computer can be set up to detect whether there is a live connection to the central data repository. If the connection fails, it can store the data locally and send it once the connection is reestablished. This reduces the risk of data being lost when equipment becomes disconnected from the network, a concern on the plant floor for as long as electronic data has been collected, and one that can be reduced to practically nothing using compute on the edge.
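One way to sketch that store-and-forward behavior, assuming a local SQLite file as the buffer and the same hypothetical endpoint as before:

    # Sketch of store-and-forward buffering: if the central repository is
    # unreachable, queue the record in a local SQLite file and retry later.
    import json
    import sqlite3
    import requests

    HISTORIAN_URL = "http://historian.example.com/api/telemetry"  # assumed endpoint
    db = sqlite3.connect("edge_buffer.db")
    db.execute("CREATE TABLE IF NOT EXISTS buffer (id INTEGER PRIMARY KEY, payload TEXT)")

    def send(record: dict) -> bool:
        try:
            requests.post(HISTORIAN_URL, json=record, timeout=5).raise_for_status()
            return True
        except requests.RequestException:
            return False

    def send_or_buffer(record: dict) -> None:
        if not send(record):
            db.execute("INSERT INTO buffer (payload) VALUES (?)", (json.dumps(record),))
            db.commit()

    def flush_buffer() -> None:
        # Called periodically, or whenever the connection is known to be back.
        for row_id, payload in db.execute("SELECT id, payload FROM buffer ORDER BY id").fetchall():
            if not send(json.loads(payload)):
                break  # still offline; try again later
            db.execute("DELETE FROM buffer WHERE id = ?", (row_id,))
            db.commit()

Calling send_or_buffer() for every new reading and flush_buffer() on a timer gives the “store until it can be safely sent” behavior described above.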
The contextualization and buffering of data will enable manufacturing personnel to make more informed decisions. The data will be centrally available for anyone in the organization to use, and easy to find because it is contextualized. It unlocks possibilities like monitoring equipment remotely, notifying users when an alarm occurs, and doing what this article opened with: storing the data, because it’s cheap.
Try It in Parallel!
When exploring your next equipment integration project, consider giving this methodology a try to see what benefits it can provide to your organization. It can even be run in parallel with the traditional methods, so the two can be compared directly. The results may surprise you!

