The snow flake schema is a specific type of a dimensional data model used in data warehouses. The model is a normalized structure, which means that redundant data is not stored in the dimension table, but is stored in more tables – in the snowflake to help with performance1).
The characteristic of a dimensional data model is a set of detailed business facts surrounded by multiple dimensions that describe those facts2). Dimensional data models are consist of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single part primary key that corresponds exactly to one of the components of the multipart key in the fact table. This standard structure of a dimensional data model is often called a Star Schema3). The snowflake schema is a specific dimensional data model which has been developed by normalizing dimension tables in the star schema. The method of normalizing dimension tables is often called “snowflaking”4). While in the star schema the dimension tables are directly connected to the fact table, in the snowflake schema one or more dimension tables do not join directly to the fact table but must be joined through other dimension tables5). A dimension is “snowflaked” when the low cardinality fields in the dimension have been removed to separate tables and linked back into the original table with artificial keys6). An example of a “snowflaked” product dimension table is shown in following picture7):
“Snowflaking” or normalization of the dimension tables can be achieved in a few different ways. By “snowflaking” the contents and the normal usage of each dimension table has to be examine. The following options indicate the different methods which may be considered for normalization of the dimension tables:
When all the dimension tables have been completely normalized, the resultant structure resembles a snowflake with the fact table in the middle8).
Advantages of the snow flake schema:
Disadvantages of the snow flake schema:
“Snowflaking” is not generally recommended in a data warehouse environment. Query performance takes the highest significance in a data warehouse and “snowflaking” makes the performance worse10).
In some situations when the technique of “snowflaking” is properly used, the performance and user understandability can be improved11). For example improvement of reporting performance for some business intelligence query tools, which prefer snowflake designs.
“Snowflaking”provide opportunities to separate out a set of attributes and form a subdimension in a database12).
Snowflake schemas support ease of dimension maintenance because they are partially or fully normalized. They reduce storage requirements, since dimensions have high volumes of data13).