In today's data-rich society, we have the luxury of storing all data, both old and new, in a single repository and going with the flow. Until recent years, the process of storing and sorting data followed the ETL (extract, transform, load) design philosophy, which led to transforming and summarising diverse data sets in order to populate data marts and data warehouses.
This process limited the storage of excess data: each attribute or entity in the data warehouse or data mart was carefully thought through, justified, and its usage clearly articulated. In other words, the definitive value of the stored data was determined up front.
The newer concept of a data lake, by contrast, allows vast amounts of diverse, raw data to be collected and stored under the presumption that, in the future, it will be needed to solve problems and answer questions we have not yet thought of; this is the perceived value of data. When the need arises, the data lake should be able to organise the required data, trace where it came from, and define its value.
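The contrast above is often described as schema-on-write (the warehouse) versus schema-on-read (the lake). The following is a minimal sketch of that distinction; all record fields and function names are hypothetical, chosen only to illustrate the two approaches, not any particular product's API.

```python
# Hypothetical raw events; "extra" is an attribute with no justified use yet.
raw_events = [
    {"user": "alice", "amount": "19.99", "ts": "2021-03-01", "extra": {"device": "ios"}},
    {"user": "bob", "amount": "5.00", "ts": "2021-03-02"},
]

# Schema-on-write (ETL / warehouse): transform up front so that only the
# attributes whose usage was justified survive the load step.
def etl_load(events):
    rows = []
    for e in events:
        rows.append({
            "user": e["user"],
            "amount": float(e["amount"]),  # cast to the agreed type
        })  # "ts" and "extra" are discarded at load time
    return rows

# Schema-on-read (data lake): store everything raw, untouched.
def lake_store(events):
    return list(events)

# Structure is imposed only at read time, per question asked.
def lake_query(lake, question):
    if question == "total_spend":
        return sum(float(e["amount"]) for e in lake)
    raise ValueError(f"unknown question: {question}")

warehouse = etl_load(raw_events)
lake = lake_store(raw_events)
print(warehouse[0])                        # only the curated attributes remain
print(lake_query(lake, "total_spend"))     # answer derived from raw data on demand
```

The trade-off mirrors the text: the warehouse cannot answer a question about `extra` because that attribute was never loaded, while the lake retains it at the cost of deferring all cleaning and typing to query time.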