The number of companies building data architectures has exploded in the 2020s and that grow is unlikely to slow down soon. That’s because more data is available than ever before.
According to a 2023 BCG study “the volume of data generated approximately doubled from 2018 to 2021 to about 84 ZB.”
1. What is Big Data, and how can it help you?
“Big” in Big Data is not just about the size of the data. According to “The Cloud Data Lake” by Rukmani Gopalan [O’Reilly, 2023], we can describe data with the “six Vs”:
- Volume: amount of data generated and stored.
- Variety: wide range of data sources and formats. Structured data (from relational database), semi-structured data (logs, CSV, XML, and JSON), unstructured data (emails, documents, and PDFs), binary data (images, audio, video).
- Velocity: speed at which data is generated and processed. Batch processing and streaming processing (real-time).
- Veracity: accuracy and reliability of data. Unreliable or incomplete sources can damage the quality of the data.
- Variability: consistency of data in terms of its format, quality, and meaning.
- Value: usefulness and relevance of data.

2. Data Maturity
It’s important for companies to understand where they are in their journey to use data compared to other companies. This is called Data Maturity.
Many industries use the term Digital Transformation: it refers to how companies embed technologies across their business to drive fundamental change in the way they get values out of data and how they operate and deliver value to customers. Digital Transformation can be broken into 4 stages, called the enterprise data maturity stages: they describe the level of development and sophistication an organization has reached in managing, utilizing, and deriving value from its data. This model is a way to assess an organization’s data management capabilities and readiness for advanced analytics, artificial intelligence, and other data-driven initiatives.
- Reactive. Company has data scattered all over. This situation is called spreadmart (”spreadsheet datamart”): informal, decentralized collection of data. Spreadmart suffers from data inconsistency, lack of governance, limited scalability, and inefficiency.
- Informative. Company starts to centralize data, making analysis and reporting much easier. Data is usually not very scalable. Most companies are at stage 2, especially if their infrastructure is still on-prem.
- Predictive. Companies have moved to the cloud and have built a system that can handle larger quantities of data, different types of data, and data is ingested more frequently. Decisions are improved by integrating machine learning to make decisions in real-time.
- Transformative. Companies can handle any data, no matter size, speed, or size.