Talk:Slowly changing dimension

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

I don't believe that SCD type six really exists and it is not because what the article is describing is incorrect. In fact the example described in the SCD type 6 is perfectly valid however I do not believe that there is a case where you would need this and it is not a result of lower or different granularity events being aggregated and merged. This is in effect something that can occur quite often and it has to be handled very carefully.

Imagine we have the example that the article is describing; a supply chain process that goes something like order->shipment->delivery. In this case, and based on what the article describes, the order, shipment and delivery can each in effect record different details of the supplier at the time each event occurs. We are in effect looking at three different facts with the same descriptive dimensions but each can have their own dimension values in time.

A better way to address this dimensional modeling problem would be:

Lets say that we use SCD type 2 for the supplier dimension:


Supplier_key Supplier_Code Supplier_Name Supplier_State version


001 ABC Phlogistical Supply Company CA 0

002 ABC Phlogistical Supply Company IL 1

We would have 3 conceptual facts:

Order Fact – Linked to one set of supplier dimension keys Shipment Fact - Linked to another set of supplier dimension keys Delivery Fact - Linked to a final set of supplier dimension keys

At least conceptually, regardless of the implementation, there are 3 different facts. If we wanted to merge these three facts, that are measuring different points of the same supply-chain process, we have to consider the different (but shared) dimensions to become roles.

The Order-Shipment-Delivery Fact would include 3 different roles for the supplier dimension: Order Supplier, Shipment Supplier and Delivery Supplier. In this way the querying becomes extremely easy and flexible. There would be no need to ask the user to use timestamp data or to implement the foreign keys using business keys.

This article just seems to be a bit of a mish-mash of different editor's theories and terminologies. As an encyclopedia article it should only be discussing reliable notable research, eg. Kimball. Anything non-attributed to such research does not belong here. Read WP:NOR to see why this is the case. Remy B 08:12, 7 November 2007 (UTC)[reply]


Some additional citations (e.g. Kimball, Haughey, Kalido) have been added to ensure this is not considered as "original research" so hopefully Remy B's criticism is no longer valid - AJD, Feb 08

Major Revisions to Type 6 SCDs[edit]

I have made some major revisions to the section on Type 6 / Hybrid SCDs, and added a section on Type 2 / Type 6 Fact Implementation. I believe the Type 6 description I gave reflects Kimball well, and will try to add references when I have time and the correct book in front of me.

I'd appreciate help with the fact implementation references, particularly for the "both surrogate and natural key" section. I didn't make this stuff up out of whole cloth, but I'll be darned if I can remember where I learned it.

The article probably still needs additional citations, but can we remove the "Cleanup" note at the beginning? Coolelle (talk) 20:17, 12 October 2010 (UTC)[reply]

Suggest combining or making one subsection of other[edit]

Type 2 / Type 6 Fact Implementation and Type 6 Hybrid sections are describing the same method, and even the example shown matches in both cases. The first simply shows the evolution of the dimension as new history is added over time. The Implementation section shows how facts are related to their point-in-time dimension entries. These techniques are part of the same method, the former being relevant to how you handle the changes in the dimension, and the latter being how you assign FK in the fact table and query them.

It probably should also be noted that the main reason for this approach, is because it eliminates the usage of date ranges or functions. Thus it is much easier to query, performs better, and any querying tool designed to work with data warehouses will understand how to query the relationship, because it is no more complicated than any other join between a fact and dimension. 207.156.50.129 (talk) 23:09, 13 February 2012 (UTC)[reply]

New PK in SCD type 2 (and other types)?[edit]

I wonder about the description of SCD type 2. When the supplier moves to IL, his PK changes to '124'. From then on, nothing in the table tells me that this is actually the same supplier as '123' who used to be located in CA (unless "supplier code" is actually the key - which should be explained in the article). I wonder if this is correct and/or useful. The SCD entry in German language describes this differently (same PK but different version number or validity date) which looks more convincing to me. —Preceding unsigned comment added by Oliver.schoisswohl (talkcontribs) 13:53, 13 October 2010 (UTC)[reply]

Good point about natural vs. surrogate keys. I made some small changes to the Type 1 and Type 2 entries to recitfy this. Let me know what you think. Coolelle (talk) 20:02, 13 October 2010 (UTC)[reply]

Type 6 Attribution[edit]

I am adding a citation needed tag to the statement of the origin of the term "Type 6" - there is no citation currently given and Margy Ross from the Kimball Group has disputed the accuracy of this in a class she gave today. Please provide a citation. Alucard (Dr.) | Talk 22:44, 14 December 2011 (UTC)[reply]

New source for definition of SCD types other than 1, 2, 3[edit]

On Tue 05 Feb 2013, the Kimball Group published a new "Design Tip" written by Margy Ross with the title "Design Tip #152 Slowly Changing Dimension Types 0, 4, 5, 6 and 7" [1] in order to clarify and standardize the usage of SCD types other than 1, 2, and 3. The article presents information from the forthcoming third edition of their dimensional design book "The Data Warehouse Toolkit" due July 2013.

The definition of these types (0, 4, 5, 6 and 7) differs significantly from the various different definitions that have appeared through the years, some of which appear in the present (Wed 06 Feb 2013) version of this Wikipedia article.

Suggested changes to this Wikipedia article:

1. Revise the introductory section to more accurately define the topic. Ralph Kimball et al (articles and books) and Chris Adamson (books) both provide excellent explanations.

2. Revise/replace the affected sections with content from the above referenced article. (Kimball originally introduced the concept and terminology of "slowly changing dimensions" [SCD] and their numbering. The current content of the Wikipedia article does not reflect broadly accepted usage and terminology... or at least not as broadly accepted and used as the original 1, 2, and 3 definitions.)

3. Retain the existing affected content, without a number label or with a corrected number label if applicable, IF there are appropriate references. The assignment of the SCD numbers in particular is questioned.

4. No doubt there are other changes that would enhance the article.

Since I am a Wikipedia "edit" newbie, and therefore not up to speed on the technical details and social protocols of article editing, I should not take on the task of making these changes myself at this time. However, I am willing to work with a patient and friendly mentor.

Jpoc.yoda (talk) 22:16, 6 February 2013 (UTC)[reply]

References

Possibly Incorrect Example[edit]

The example in section "Type 2 surrogate key with type 3 attribute" may be incorrect (unless I'm missing something). The text states: "The following SQL retrieves, for each fact record, the current supplier state and the state the supplier was located in at the time of the delivery:

SELECT
  delivery.delivery_cost,
  supplier.supplier_name,
  supplier.historical_state,
  supplier.current_state
FROM delivery
INNER JOIN supplier
  ON delivery.supplier_key = supplier.supplier_key

" But if we look at the table in question, we see that this is unlikely.

Supplier_Key Supplier_Code Supplier_Name Current_State Historical_State Start_Date End_Date Current_Flag
123 ABC Acme Supply Co CA CA 01-Jan-2000 21-Dec-2004 N
124 ABC Acme Supply Co IL CA 22-Dec-2004 03-Feb-2008 N
125 ABC Acme Supply Co NY IL 04-Feb-2008 31-Dec-9999 Y

Suppose I have a Delivery with Supplier Key 123. Then the join will return CA for both historical_state and current_state. But the "current" supplier state should be NY (since that is the current_state in the row with Current_flag set to Y).

Half-open intervals instead of closed intervals for SCD Type 2[edit]

The Type 2 and Type 6 examples appear to show start and end dates as closed intervals - the effective date range of the rows includes both the specified start and specified end dates. Using closed intervals creates a logical gap between each successive range, i.e. the gap between the end date/time of one row and the start date/time of the next, for example, the gap between 2004-12-21 00:00:00 and 2004-12-22 00:00:00. Ralph Kimball and others (including ISO 8601) advocate using half-open intervals, not closed intervals:

"The end-effective-datetime of a Type 2 dimension record must be exactly equal to the begin-effective-datetime of the next change for that dimension member"[1]

This means that the effective date/time range for a dimension row should be understood to be startDate <= t < endDate and not startDate <= t <= endDate.

Also:

"The most current dimension record must have an end-effective-datetime equal to a fictitious datetime far in the future." i.e. not a null (ibid.)

The article ought to be updated to use half-open intervals as a matter of sound practice and as a more general solution (closed intervals can be workable in special cases where the technical precision of the date/time is known and never less than the interval between successive timestamps but that is an exception rather than the general rule).

Oradium (talk) 13:53, 26 April 2019 (UTC)[reply]

References

  1. ^ Kimball, Ralph. "Slowly Changing Dimensions, Part 2". kimballgroup.com. Retrieved 26 April 2019.

The term "dimension" is never defined[edit]

Reading through the article, I have a hard time understanding what is meant by "dimension". A brief explanation of it, with examples, would go a long way.

What is the difference between Type 2 and "Pure Type 6"?[edit]

Looking at the examples for type 2 and for "Pure type 6" they appear to me to be identical. Am I missing something, or is there a problem with the descriptions?

Type 5 is confusingly explained[edit]

The section about "Type 5" is very hard to follow. It talks about "mini-dimension" and "base dimension" which weren't clearly defined nor mentioned anywhere else in the article. It also doesn't show any concrete example like in the previous sections. Pdfrod (talk) 10:43, 4 December 2023 (UTC)[reply]