Data Distribution Query Assistance

  • Hello Community,

    I have couple samples below showing how to determine the data distribution of a record in a field.

    For example the following code will show that record, 'FIESTA' in the 'model' field represents 59.52% of all the records

    SELECT
    vehicles.model
    ,COUNT(*) AS cnt
    ,CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS DEC(5, 2)) AS distribution
    FROM dbo.vehicles
    GROUP BY vehicles.model
    ORDER BY distribution DESC

    Likewise the following code will show that record '1.7' in the 'engine_size' field represents 15.08% of all records in the 'engine_size' field

    SELECT
    vehicles.engine_size
    ,COUNT(*) AS cnt
    ,CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS DEC(5, 2)) AS distribution
    FROM dbo.vehicles
    GROUP BY vehicles.engine_size
    ORDER BY distribution DESC

    Can someone let me know how to combine the code samples above to allow me to get the distribution of both the 'model' and 'engine_size'? So that the query results similiar the following screen shot:

    distribution

    Below is SAMPLE data was used to create the queries above. (PLEASE NOTE ITS JUST SAMPLE DATA - so please don't get crazy on me because the sample data doesn't fit the exact requirements used in a Production database!!!!!)

     

    CREATE TABLE vehicles (
    registration varchar(50),
    make varchar(50),
    model varchar(50),
    engine_size float)

    INSERT vehicles VALUES
    ('JjFw5a0','SKODA','OCTAVIA',1.8),
    ('VkfCDpZ','FORD','FIESTA',1.7),
    ('5E93ZEq','SKODA','OCTAVIA',1.3),
    ('L2PPN0m','FORD','FIESTA',1.1),
    ('9xKghxp','FORD','FIESTA',1.5),
    ('WHShdBm','FORD','FIESTA',1.4),
    ('TNRHyy7','NISSAN','QASHQAI',1.2),
    ('6RNX0XG','SKODA','OCTAVIA',1.4),
    ('tJ9bOD8','FORD','FIESTA',1.1),
    ('ablFUSC','FORD','FIESTA',1),
    ('4B7RLYL','MERCEDED_BENZ','E CLASS',1.3),
    ('tlJiwVY','FORD','FIESTA',1),
    ('Fb9lcvG','FORD','FIESTA',1.4),
    ('nW4lqBC','FORD','FIESTA',1.6),
    ('LggTmL5','HYUNDAI','I20',1),
    ('2mGgSjS','FORD','FIESTA',1.1),
    ('IDvOzcM','FORD','FIESTA',1.3),
    ('JefpXK2','FORD','FIESTA',1.5),
    ('0h1uWfZ','MERCEDED_BENZ','E CLASS',1.4),
    ('ylBoGbV','MERCEDED_BENZ','E CLASS',1.7),
    ('XzoILDK','VAUXHALL','CORSA',1.8),
    ('Xhocs1Z','FORD','FIESTA',1.5),
    ('Lh2yWGa','KIA','RIO',1.5),
    ('hM5GWA0','FORD','FIESTA',1.3),
    ('PbpxkFt','FORD','FIESTA',1.7),
    ('SDHWV2r','FORD','FIESTA',1.2),
    ('n83Je2D','FORD','FIESTA',1.8),
    ('sDN0gex','FORD','FIESTA',1.2),
    ('7EICOZY','KIA','RIO',1.5),
    ('PUuMmIH','FORD','FIESTA',1),
    ('HiBwSg2','FORD','FIESTA',1.8),
    ('1yk1vDm','KIA','RIO',1.7),
    ('cMpH72R','HYUNDAI','I20',1.1),
    ('ZgQL0gt','MERCEDED_BENZ','E CLASS',1.3),
    ('jhpamQG','KIA','RIO',1.1),
    ('pk0lU2F','VAUXHALL','CORSA',1.4),
    ('fDCUeq1','FORD','FIESTA',1.1),
    ('ono5QFC','FORD','FIESTA',1.7),
    ('VohWwGR','FORD','FIESTA',1.5),
    ('Hih8dKc','SUZUKI','SWIFT',1.2),
    ('D2RNn3h','SUZUKI','SWIFT',1.2),
    ('QaYQulE','FORD','FIESTA',1.1),
    ('xmQPxAG','FORD','FIESTA',1.8),
    ('vmTqkTO','FORD','FIESTA',1.2),
    ('lvUtVUA','MERCEDED_BENZ','E CLASS',1),
    ('SFoj00d','FORD','FIESTA',1),
    ('9S6wrWV','MERCEDED_BENZ','E CLASS',1),
    ('0SBnW0z','FORD','FIESTA',1.1),
    ('HnDHdfj','MERCEDED_BENZ','E CLASS',1),
    ('RV7q947','FORD','FIESTA',1.4),
    ('JZqCtTg','FORD','FIESTA',1.7),
    ('XVgBwgi','FORD','FIESTA',1.8),
    ('iqJDsIF','FORD','FIESTA',1.6),
    ('CMbpRFa','FORD','FIESTA',1.6),
    ('vF7K5Xg','SUZUKI','SWIFT',1.1),
    ('3j6XGDH','FORD','FIESTA',1.5),
    ('ommqugM','FORD','FIESTA',1.1),
    ('LMQkPnw','NISSAN','QASHQAI',1.4),
    ('1dKgcdd','FORD','FIESTA',1.5),
    ('hC8BxiP','MERCEDED_BENZ','E CLASS',1.1),
    ('wLTWol7','FORD','FIESTA',1.6),
    ('TY8ChYN','FORD','FIESTA',1.6),
    ('Gw1CpI8','FORD','FIESTA',1.4),
    ('L4OPAJq','FORD','FIESTA',1.1),
    ('6TyYpfi','NISSAN','QASHQAI',1.6),
    ('ozoOcGL','FORD','FIESTA',1.4),
    ('6IME19U','FORD','FIESTA',1.4),
    ('BxpmJO5','FORD','FIESTA',1.4),
    ('0zc2n5A','FORD','FIESTA',1.3),
    ('FqbBZE2','FIAT','500',1.7),
    ('2EkTOTz','FORD','FIESTA',1.4),
    ('fNBvIvg','MERCEDED_BENZ','C CLASS',1.2),
    ('u5j4R4S','KIA','RIO',1.4),
    ('zpWaUZo','FORD','FIESTA',1.1),
    ('FQPVQYc','NISSAN','QASHQAI',1.7),
    ('8RBQADq','KIA','RIO',1.7),
    ('TOz2bcT','HYUNDAI','I20',1.7),
    ('jebhCex','FORD','FIESTA',1.3),
    ('cdHA1gL','FORD','FIESTA',1.2),
    ('FoaN4AT','FORD','FIESTA',1.7),
    ('atGn288','FORD','FIESTA',1.5),
    ('es8VNdW','FIAT','500',1.3),
    ('hDWoMXa','KIA','RIO',1.4),
    ('Q9C6Br1','KIA','RIO',1.5),
    ('mFSy4aF','FORD','FIESTA',1.6),
    ('bbbKnrM','SKODA','OCTAVIA',1.5),
    ('qY7lz6I','FORD','FIESTA',1),
    ('8Ch2OeU','VAUXHALL','CORSA',1.3),
    ('dcWsjJv','VAUXHALL','CORSA',1.3),
    ('bnnoBPg','SKODA','OCTAVIA',1.8),
    ('mvDyYkK','FORD','FIESTA',1.4),
    ('KpWDYap','FORD','FIESTA',1.3),
    ('7EK9K4z','FORD','FIESTA',1.3),
    ('ZPLHtlP','FORD','FIESTA',1.6),
    ('4EpYeSB','FORD','FIESTA',1.6),
    ('O1eZ20M','FORD','FIESTA',1),
    ('WfVntKk','FORD','FIESTA',1.7),
    ('6VlkBdi','FORD','FIESTA',1.1),
    ('hFQfKjk','KIA','RIO',1.4),
    ('3Y4njNP','KIA','RIO',1),
    ('3UuNqG0','FORD','FIESTA',1.7),
    ('qpvMYAu','FORD','FIESTA',1.1),
    ('NCYJUqx','FORD','FIESTA',1.3),
    ('M0AvWzg','FORD','FIESTA',1.6),
    ('XbVmtFf','FORD','FIESTA',1.3),
    ('l8qZy0H','SKODA','OCTAVIA',1.3),
    ('EDUbxaU','MERCEDED_BENZ','E CLASS',1.6),
    ('nWLd82o','FORD','FIESTA',1.7),
    ('4AkoyWx','FORD','FIESTA',1),
    ('nOoO25v','FORD','FIESTA',1.3),
    ('VAm5aV8','NISSAN','QASHQAI',1.4),
    ('zbd3cie','FORD','FIESTA',1.5),
    ('hyAN71W','NISSAN','QASHQAI',1),
    ('FxACHDf','FIAT','500',1.7),
    ('wOZdaeV','FORD','FIESTA',1.6),
    ('gfxZl99','VAUXHALL','CORSA',1.1),
    ('06HhwEJ','SKODA','OCTAVIA',1.7),
    ('PCTgYiG','KIA','RIO',1.7),
    ('U54WXZQ','KIA','RIO',1.6),
    ('FHgrRiF','FORD','FIESTA',1.6),
    ('R3jP73p','SKODA','OCTAVIA',1.5),
    ('etVPKX9','SUZUKI','SWIFT',1.1),
    ('BE3yReB','FORD','FIESTA',1.7),
    ('zXmX878','FORD','FIESTA',1.6),
    ('wdM3P2m','FORD','FIESTA',1.7),
    ('tb727BM','FORD','FIESTA',1.1)

     

     

     

  • For the FIESTA, your desired engine size is 1.7. Please define the grouping logic you wish to apply which selects 1.7 rather than any of the other FIESTA engine sizes.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi Phil,

    Thanks for reaching out.

    I'm afraid my knowledge of the type of grouping logic needed to achieve the output I showed above is limited.

    I was hoping that someone could come up with a few suggestions

  • I can probably help. But you need to help explain, in English, how your desired results are to be obtained.

    In your sample data, for example, the FIESTA has multiple engine sizes (1.1, 1.3, 1.6, 1.7, 1.8).

    Your desired results include only 1.7.

    I want to know why you chose only 1.7 and eliminated all of the other engine sizes from the results. What is the logic?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi Phil

    Thanks for your patience. Let me try and explain.

    The following query

    SELECT

    vehicles.engine_size

    ,COUNT() AS cnt

    ,CAST(COUNT() 100.0 / SUM(COUNT()) OVER () AS DEC(5, 2)) AS distribution

    FROM dbo.vehicles

    GROUP BY vehicles.engine_size

    ORDER BY distribution DESC

    gives me an output as follows:

    engine_size

    The following query:

     SELECT
    vehicles.model
    ,COUNT(*) AS cnt
    ,CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS DEC(5, 2)) AS distribution
    FROM dbo.vehicles
    GROUP BY vehicles.model
    ORDER BY distribution DESC

    gives me the following output:

    model

    The queries use the same dataset

    I would like to combine both queries such that they give me something like the following output:

     

    distribution

    I hope that makes things a little clearer

  • No wait, I made a mistake in the output image...I need to change it.

  • Sorry,

    I need the output to look like the following:

     

    distribution

  • Hi Phil,

    I friend provided the following suggestion (which works)

     ;
    with M as
    (
    select model,
    count(*) as model_cnt,
    count(*) * 100.0 / (select count(*) from vehicles) as model_distribution
    from vehicles
    group by model
    ),
    MN as
    (
    select *, row_number() over (order by model_cnt desc) as rn
    from M
    ),
    S as
    (
    select engine_size,
    count(*) as engine_size_cnt,
    count(*) * 100.0 / (select count(*) from vehicles) as engine_size_distribution
    from vehicles
    group by engine_size
    ),
    SN as
    (
    select *, row_number() over (order by engine_size_cnt desc) as rn
    from S
    )
    select model, engine_size, model_cnt, engine_size_cnt,
    cast( model_distribution as decimal(30, 2)),
    cast( engine_size_distribution as decimal(30, 2))
    from MN
    full outer join SN on SN.rn = MN.rn

    However, I build my queries using a Query Builder application which doesn't recognize the WITH method. I understand from previous post the WITH method is used to build CTE's.

    Therefore, I was wondering if you could tweak the code to work without the WITH method?

     

     

     

  • I'm not familiar with Query Builder. Can it use temp tables or subqueries?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi Phil,

    The actual application is called dbForge.

    Yes, it can handle subqueries and temp tables.

     

    Cheers

  • OK, try this (I used a temp table, so you need to rename #vehicles accordingly)

    SELECT
    model
    , engine_size
    , MN.model_cnt
    , SN.engine_size_cnt
    , CAST(MN.model_distribution AS DECIMAL(30, 2))
    , CAST(SN.engine_size_distribution AS DECIMAL(30, 2))
    FROM
    (
    SELECT
    model
    , model_cnt = COUNT(*)
    , model_distribution = COUNT(*) * 100.0 / (SELECT COUNT(*) FROM #vehicles)
    , rn = ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC)
    FROM #vehicles
    GROUP BY model
    ) MN
    FULL OUTER JOIN
    (
    SELECT
    engine_size
    , engine_size_cnt = COUNT(*)
    , engine_size_distribution = COUNT(*) * 100.0 / (SELECT COUNT(*) FROM #vehicles)
    , rn = ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC)
    FROM #vehicles
    GROUP BY engine_size
    ) SN
    ON SN.rn = MN.rn;

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi Phil,

    This worked like a dream.

    Now, this leads me on to the more complicated question related to this question - that is of Partitioning....

    The output lets more know that model, FIESTA and engine_size, 1.7 take represent large portion of the data in their respective fields. I would like to somehow repartition the table such that there is a better spread of the data across the fields model and engine_size. Can you help me with that?

    Please note, I aware I'm not explaining myself very well ... but this is due to my lack of understanding of Partitioning. So, if you feel this isn't something you would like to help me with, I totally understand. However, I hope you can at the very least point me in the right direction.

  • Are you talking about modifying the existing data, or adding new rows of data, to achieve this? Out of interest, can you explain why you would want to do this?

    Either way, this is not something I've done before (as I always work with data which is externally generated in some way).

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply