Data Vault is a modeling technique that can be used in data warehouse projects almost regardless of the project methodology, whereas Data Vault 2.0 (DV2.0) claims to be a data warehousing methodology in its own right.
Data Vault was developed by Dan Linstedt in the 1990s. He published his new approach to modeling data for an enterprise data warehouse (EDW) on July 1st, 2002. He followed Bill Inmon, who also modeled his data warehouse with normalized tables. In addition, Linstedt combined a star schema modeling approach with his 3NF tables. Other modeling techniques used by Linstedt include minimally redundant data sets and information hubs, which included business function keys. The components of Data Vault 1.0 are:
A Hub holds at least the primary or business keys. These are the keys used in daily business transactions, such as "customer number", "employee number" or "item number".
A Link is the physical equivalent of a many-to-many relationship, so it carries the business or surrogate keys of two or more Hubs. A Link also has a timestamp that records when the relationship between those Hubs was first established.
A Satellite stores all descriptive attributes of the Hubs and Links, together with a start date and an end date. These attributes give the keys and relationships their context.
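The three components can be pictured as simple record types. The following is only a minimal sketch in Python, not an official Data Vault definition: the class and field names (`business_key`, `hub_keys`, `first_seen`, `parent`, `attributes`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Hub:
    """A Hub row: holds (at least) a business key."""
    business_key: str  # e.g. a customer number or item number


@dataclass(frozen=True)
class Link:
    """A Link row: a relationship between two or more Hubs."""
    hub_keys: Tuple[str, ...]  # keys of the related Hubs
    first_seen: datetime       # when the relationship was first recorded

    def __post_init__(self):
        # A relationship needs at least two participants.
        if len(self.hub_keys) < 2:
            raise ValueError("a Link must reference at least two Hubs")


@dataclass(frozen=True)
class Satellite:
    """A Satellite row: descriptive attributes of a Hub or Link."""
    parent: Union[Hub, Link]
    attributes: Dict[str, str]
    start_date: datetime
    end_date: Optional[datetime] = None  # None means "still current"


# Hypothetical example data.
customer = Hub(business_key="C-1001")
item = Hub(business_key="I-42")
purchase = Link(
    hub_keys=(customer.business_key, item.business_key),
    first_seen=datetime(2024, 1, 1),
)
customer_details = Satellite(
    parent=customer,
    attributes={"name": "Ada"},
    start_date=datetime(2024, 1, 1),
)
```

Keeping the keys in the Hub and the descriptive data in the Satellite means that a changed attribute only ever produces a new Satellite row, while the Hub and Link rows stay untouched.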
Unlike Data Vault 1.0, Data Vault 2.0 is a data warehousing methodology. The new aspects of this approach are the use of reference tables, improved use of Links, and the use of hash keys as surrogate keys. Data Vault 2.0 can also be implemented on a NoSQL database. Whereas Data Vault 1.0 was mostly concerned with the relationship between Hubs and Satellites, Data Vault 2.0 focuses more on 3NF modeling to build up fixed datasets or more historical data. Besides the modeling and technical improvements, there are also some project management improvements. Data Vault now supports agile project management such as Scrum, with iterations of two or three weeks. It also provides templates for clarifying your business rules and for how to use and implement them. On the architectural side, Data Vault 2.0 now provides a Tier 3 / Tier 2 architecture and a managed self-service BI offering known as Operational Data Vault.
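The idea of a hash key as a surrogate key can be sketched as follows. This is an illustrative example under stated assumptions, not a prescribed Data Vault 2.0 routine: the function name `hash_key`, the normalisation (trim and upper-case), the `|` delimiter and the choice of SHA-256 are all assumptions made here.

```python
import hashlib


def hash_key(*business_keys: str, delimiter: str = "|") -> str:
    """Derive a deterministic surrogate key from one or more business keys."""
    # Assumption: keys are trimmed and upper-cased so that formatting
    # differences between source systems do not produce different hashes.
    normalised = [key.strip().upper() for key in business_keys]
    # The delimiter keeps ("A", "B") distinct from ("AB",).
    payload = delimiter.join(normalised).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# A Hub key is hashed from a single business key,
# a Link key from the business keys of all participating Hubs.
customer_hk = hash_key("C-1001")
purchase_hk = hash_key("C-1001", "I-42")
```

Because the key is computed from the business key alone, it can be derived independently in every load process without first looking up a sequence-generated identifier.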
Start by modeling the Hubs. Next, build the Link entities to establish the relationships between the Hubs. Finally, create the Satellites to give context to the business keys of the Hubs and to the relationships and transactions represented by the Links.
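The order described above (Hubs, then Links, then Satellites) can be illustrated with a small in-memory SQLite schema. This is a sketch only: the table and column names (`hub_customer`, `hub_item`, `link_customer_item`, `sat_customer`) are hypothetical, and plain business keys are used instead of surrogate keys to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce references to the Hubs

# Step 1: Hubs hold the business keys.
conn.execute("CREATE TABLE hub_customer (customer_number TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE hub_item (item_number TEXT PRIMARY KEY)")

# Step 2: a Link relates the Hubs and records when the relation was first set.
conn.execute("""
    CREATE TABLE link_customer_item (
        customer_number TEXT NOT NULL REFERENCES hub_customer(customer_number),
        item_number     TEXT NOT NULL REFERENCES hub_item(item_number),
        first_seen      TEXT NOT NULL,
        PRIMARY KEY (customer_number, item_number)
    )
""")

# Step 3: a Satellite adds descriptive attributes with start and end dates.
conn.execute("""
    CREATE TABLE sat_customer (
        customer_number TEXT NOT NULL REFERENCES hub_customer(customer_number),
        start_date      TEXT NOT NULL,
        end_date        TEXT,
        name            TEXT,
        PRIMARY KEY (customer_number, start_date)
    )
""")

# List the tables in the order in which they were created.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY rowid"
    )
]
```

Links and Satellites reference the Hubs, which is why the Hubs have to exist first; with foreign keys enforced, a Link row for an unknown customer number is rejected.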
Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault by W.H. Inmon, Dan Linstedt (ISBN: 978-0-12-802044-9)
Building a Scalable Data Warehouse with Data Vault 2.0 by Dan Linstedt, Michael Olschimke