Junk Dimension Design

  • Hello Eveyone,

    I am struck with Junk dimension.

    My target table structure is as below.

    Junk_Dim

    ----------

    JunkID -- PK

    JunkTypeID -- PK

    JunkCode

    JunkName

    JunkTypeName

    Below values need to be propagated to the Junk Dimesnion.

    SubscriptionStatusSubscriptionStatus

    4Contract End

    3Contract Cancelled

    1Contract Made

    2Contract Start

    CustomerStatusCustomerStatus

    1Customer Start

    2Customer End

    PaymentCodePaymentDesc

    1Settled Late

    2Settled On Time

    3Invoiced

    4Past Due

    5Not Invoiced

    -1N/A

    DocumentTypeDocument

    0Blank

    1Payment

    2Invoice

    3Credit Memo

    4Finance Charge Memo

    5Reminder

    6Refund

    CaseTypeCaseType

    0Unknown

    1Car

    2Alarm

    3Home

    4Security

    TransportedTocategoryIdTransportedTo

    0N/A

    1Non agreement garage

    2Place of work or business

    3Auto Salvage / Car Junk Yard

    4Home address

    5(non agreement garage)

    6Police

    7Agreement garage directly

    8Agreement garage via Comp

    9Goverment (Trafikketaten)

    I have gone through the forums but unable to come up with a solution that would be best way to fill the above Junk dimension structure with the values mentioned.

    Any help is really appreciated.

  • kewlguy13 (7/15/2016)


    Hello Eveyone,

    I am struck with Junk dimension.

    My target table structure is as below.

    Junk_Dim

    ----------

    JunkID -- PK

    JunkTypeID -- PK

    JunkCode

    JunkName

    JunkTypeName

    Below values need to be propagated to the Junk Dimesnion.

    SubscriptionStatusSubscriptionStatus

    4Contract End

    3Contract Cancelled

    1Contract Made

    2Contract Start

    CustomerStatusCustomerStatus

    1Customer Start

    2Customer End

    PaymentCodePaymentDesc

    1Settled Late

    2Settled On Time

    3Invoiced

    4Past Due

    5Not Invoiced

    -1N/A

    DocumentTypeDocument

    0Blank

    1Payment

    2Invoice

    3Credit Memo

    4Finance Charge Memo

    5Reminder

    6Refund

    CaseTypeCaseType

    0Unknown

    1Car

    2Alarm

    3Home

    4Security

    TransportedTocategoryIdTransportedTo

    0N/A

    1Non agreement garage

    2Place of work or business

    3Auto Salvage / Car Junk Yard

    4Home address

    5(non agreement garage)

    6Police

    7Agreement garage directly

    8Agreement garage via Comp

    9Goverment (Trafikketaten)

    I have gone through the forums but unable to come up with a solution that would be best way to fill the above Junk dimension structure with the values mentioned.

    Any help is really appreciated.

    ...with the values mentioned.

    I see no values mentioned.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • These are the possible values:

    SubscriptionStatusSubscriptionStatus

    4Contract End

    3Contract Cancelled

    1Contract Made

    2Contract Start

    CustomerStatusCustomerStatus

    1Customer Start

    2Customer End

    PaymentCodePaymentDesc

    1Settled Late

    2Settled On Time

    3Invoiced

    4Past Due

    5Not Invoiced

    -1N/A

    DocumentTypeDocument

    0Blank

    1Payment

    2Invoice

    3Credit Memo

    4Finance Charge Memo

    5Reminder

    6Refund

    CaseTypeCaseType

    0Unknown

    1Car

    2Alarm

    3Home

    4Security

    TransportedTocategoryIdTransportedTo

    0N/A

    1Non agreement garage

    2Place of work or business

    3Auto Salvage / Car Junk Yard

    4Home address

    5(non agreement garage)

    6Police

    7Agreement garage directly

    8Agreement garage via Comp

    9Goverment (Trafikketaten).

    I tried something like this:

    Select 100001 Junk, 1 JunkTypeID , 'Contract Made' JunkCode, 'SubscriptionStatus' JunkName, 'SubscriptionStatus' JunkTypeName

    Union all

    Select 100002 Junk, 2 JunkTypeID , 'Contract Start' JunkCode, 'SubscriptionStatus' JunkName, 'SubscriptionStatus' JunkTypeName

    Union all

    Select 100003 Junk, 3 JunkTypeID , 'Contract Cancelled' JunkCode, 'SubscriptionStatus' JunkName, 'SubscriptionStatus' JunkTypeName

    Union all

    Select 100004 Junk, 4 JunkTypeID , 'Contract End' JunkCode, 'SubscriptionStatus' JunkName, 'SubscriptionStatus' JunkTypeName

    Union all

    Select 100005 Junk, 1 JunkTypeID , 'Customer Start' JunkCode, 'CustomerStatus' JunkName, 'CustomerStatus' JunkTypeName

    Union all

    Select 100006 Junk, 2 JunkTypeID , 'Customer End' JunkCode, 'CustomerStatus' JunkName, 'CustomerStatus' JunkTypeName

    Union all

    Select 100007 Junk, 1 JunkTypeID , 'Blank' JunkCode, 'DocumentType' JunkName, 'DocumentType' JunkTypeName

    Union all

    Select 100008 Junk, 2 JunkTypeID , 'Payment' JunkCode, 'DocumentType' JunkName, 'DocumentType' JunkTypeName

    Union all

    Select 100009 Junk, 3 JunkTypeID , 'Invoice' JunkCode, 'DocumentType' JunkName, 'DocumentType' JunkTypeName

    Any advice is this a good idea?

  • What if you have a table which have multiple combinations of these lookup?

    Something like this :

    Invoice # : ABC1230092M1

    PaymentCode : PaymentCode

    DocumentType : Credit Memo

    Transported: Home address

    If this is happening then you will be ending up multiple join of you Junk_Dimension in the fact, which defeats the purpose of a junk_dimension

    You need to know how many kind of possible combination you could have in your data.

  • Your design is for an EAV table, not a junk dimension. A junk dimension is derived from the Cartesian product of your attributes so it would look more like

    JunkID

    SubscriptionStatus

    CustomerStatus

    PaymentDesc

    Document

    CaseType

    TransportedTo

    Obviously, you don't need to include all possible values from your Cartesian product. For instance CustomerEnd probably won't need to be paired with every value of TransportTo.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks Guys for the quick response!

    Looking at the target table structure, I could see that Cross Join doesn't fit. That's where I was confused.

    I will try with the EAV approach and see how it is used in Fact table.

    For now I don't see any requirement for different combinations.

    For EAV:

    Is the below logic a good idea? Any comments?

    Select 100001 Junk, 1 JunkTypeID , 'Contract Made' JunkCode, 'SubscriptionStatus' JunkName, 'SubscriptionStatus' JunkTypeName

    Union all

    Select 100002 Junk, 2 JunkTypeID , 'Contract Start' JunkCode, 'SubscriptionStatus' JunkName, 'SubscriptionStatus' JunkTypeName

    Union all

    Select 100003 Junk, 3 JunkTypeID , 'Contract Cancelled' JunkCode, 'SubscriptionStatus' JunkName, 'SubscriptionStatus' JunkTypeName

    Union all

    Select 100004 Junk, 4 JunkTypeID , 'Contract End' JunkCode, 'SubscriptionStatus' JunkName, 'SubscriptionStatus' JunkTypeName

    Union all

    Select 100005 Junk, 1 JunkTypeID , 'Customer Start' JunkCode, 'CustomerStatus' JunkName, 'CustomerStatus' JunkTypeName

    Union all

    Select 100006 Junk, 2 JunkTypeID , 'Customer End' JunkCode, 'CustomerStatus' JunkName, 'CustomerStatus' JunkTypeName

    Union all

    Select 100007 Junk, 1 JunkTypeID , 'Blank' JunkCode, 'DocumentType' JunkName, 'DocumentType' JunkTypeName

    Union all

    Select 100008 Junk, 2 JunkTypeID , 'Payment' JunkCode, 'DocumentType' JunkName, 'DocumentType' JunkTypeName

    Union all

    Select 100009 Junk, 3 JunkTypeID , 'Invoice' JunkCode, 'DocumentType' JunkName, 'DocumentType' JunkTypeName

  • kewlguy13 (7/15/2016)


    Thanks Guys for the quick response!

    Looking at the target table structure, I could see that Cross Join doesn't fit. That's where I was confused.

    I will try with the EAV approach and see how it is used in Fact table.

    For now I don't see any requirement for different combinations.

    For EAV:

    Is the below logic a good idea? Any comments?

    An EAV table suffers all the same problems as having separate dimensions with none of the benefits. You still need to store a separate ID for each attribute in the fact table. You still need multiple joins to get the attribute values. You have trouble creating indexes on multiple attributes.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks Drew for the comments!

    If I had to design the Junk dimension with these values, what do you suggest.

    What would be Junk Dimension structure?

    Logic: I was thinking about CROSS JOIN.

    Any ideas?

    Below is the complete list of values:

    SubscriptionStatusSubscriptionStatus

    4Contract End

    3Contract Cancelled

    1Contract Made

    2Contract Start

    CustomerStatusCustomerStatus

    1Customer Start

    2Customer End

    Payment CodePayment Desc

    1Settled Late

    2Settled On Time

    3Invoiced

    4Past Due

    5Not Invoiced

    -1N/A

    Document TypeDocument

    0Blank

    1Payment

    2Invoice

    3Credit Memo

    4Finance Charge Memo

    5Reminder

    6Refund

    InvoiceTypeType

    1Contract

    2Assistance

    3Other

    CaseTypeCaseType

    0Unknown

    1Car

    2Alarm

    3Home

    4Security

    TransportedTocategoryIdTransportedTo

    0N/A

    1Non agreement garage

    2Place of work or business

    3Auto Salvage / Car Junk Yard

    4Home address

    5(non agreement garage)

    6Police

    7Agreement garage directly

    8Agreement garage via Comp

    9Goverment (Trafikketaten)

    SubscriptionTypeSubscriptionTypeName

    0Car

    1Home

    2Business

    3First Aid

    4Payment Subscription

    5Miscellaneous

    DocumentStatusDocumentStatus

    1Open

    2Posted

  • kewlguy13 (7/15/2016)


    Thanks Drew for the comments!

    If I had to design the Junk dimension with these values, what do you suggest.

    What would be Junk Dimension structure?

    Logic: I was thinking about CROSS JOIN.

    Any ideas?

    I've already given you the best guess I can give based on the information provided. For anything more refined, I would want to look at the distribution of various combinations, particularly looking for degree of correlation. From that you may find that several of the attributes have a strong enough correlation that they deserve their own dimension.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Check the following Mystery or Junk data warehouse dimensions . This might help you understand the working.

    Remember you are in the best situation to understand the requirements/scenario and know what Should be done.

    Hope it helps.

  • Thanks Guys!

    I will get back to you.

  • I would not recommend a junk dimension for such a large combination of items. I've used these a few times for a few combinations of yes/no, and once for four questions with answers 0-9. The latter makes for more than 10K possible combinations as I recall. But at least they were related to the same survey. Why don't you want these as separate dimensions?

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply