Using SQL Server and R Services for analyzing Sales data (Part 3)

tomaz.kastrun

SSCrazy

Points: 2141
January 16, 2017 at 11:32 pm

#317771

Comments posted to this topic are about the item Using SQL Server and R Services for analyzing Sales data (Part 3)
Tomaž Kaštrun | twitter: @tomaz_tsql | Github: https://github.com/tomaztk | blog: https://tomaztsql.wordpress.com/

RonKyle SSC-Dedicated Points: 31551 · Answer 1

This looks like a great article. I'm going to try this over the weekend on my home SQL Server 2016 sandbox. If all works as you've laid out, I'll come back and give you a top rating. Thanks for diving into this.

tomaz.kastrun SSCrazy Points: 2141 · Answer 2

Much appreciated, especially your opinion and how you will apply this.
The code works fine with the WideWorldImporters/WideWorldImportersDW databases.

Best, Tomaž

RonKyle SSC-Dedicated Points: 31551 · Answer 3

I'll be testing against that database. I have it set up, but have only started some R work. This gives me something concrete to follow and try.

tomaz.kastrun SSCrazy Points: 2141 · Answer 4

Great.
If you have any other questions, just post them here so we can discuss them!

Jonathan Mallia SSCertifiable Points: 5192 · Answer 5

Amazing article!

Can you please shed some light on confidence and support in the last part of the article: how they are related, directly or indirectly, and what difference it makes when you adjust their values in arules?

Thanks a lot!
Jon

Jonathan Mallia SSCertifiable Points: 5192 · Answer 6

Hi,

I was wondering why you did not use the CustomerKey when you created the cluster in:
dist(Sales[,c(1,3,5)])

Wouldn't it have been more effective to cluster by customer rather than by ProductGroup only?

Thanks in advance for the explanation!

tomaz.kastrun SSCrazy Points: 2141 · Answer 7

Hi,

CustomerKey is just a running ID for each customer in the database. In this case, clustering is done on the attributes of the customers (the observations), and CustomerKey is not an attribute that describes or reveals any information about the customer. If it were included, it could only add misleading noise relative to the real, natural attributes.
Customer attributes can be business information (number of transactions, value of invoices, basket values, business type) or demographic information (area, city, country, age, etc.). All of these attributes describe customers. CustomerKey, on the other hand, neither describes the customer nor relates to the customer in any meaningful way. It is just a database identifier.

ProductGroup can be added because it describes the products a customer is buying or selling. But if all the customers buy all the products, it is worth rethinking whether, and how, to include such an attribute.

I hope that makes it clearer.
Best, Tomaž
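
The point about identifiers can be illustrated with a small distance calculation. This is only a sketch in plain Python, not the article's R code: the three customers, their values, and the column layout (key, quantity, total value) are made up for illustration. It compares Euclidean distances, the default metric of R's dist(), with and without the key column.

```python
import math

# Hypothetical customers: (customer_key, quantity, total_value).
# The values are invented for illustration; they are not from WideWorldImporters.
customers = [
    (1, 10, 250.0),
    (2, 40, 900.0),
    (9001, 10, 250.0),  # same buying behaviour as customer 1, very different key
]


def distance(a, b, columns):
    """Euclidean distance between two rows, using only the given column indexes."""
    return math.dist([a[i] for i in columns], [b[i] for i in columns])


first, second, third = customers

# Attributes only: customers 1 and 9001 behave identically, so they coincide.
attrs_same = distance(first, third, columns=(1, 2))
attrs_diff = distance(first, second, columns=(1, 2))

# Key included: the identical pair is now further apart than the different pair.
key_same = distance(first, third, columns=(0, 1, 2))
key_diff = distance(first, second, columns=(0, 1, 2))

print("attributes only:", attrs_same, attrs_diff)
print("with key:       ", key_same, key_diff)
```

With the key included, the two customers who behave identically end up further apart than the two who behave differently, so any clustering built on those distances would group by ID range rather than by behaviour.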

tomaz.kastrun SSCrazy Points: 2141 · Answer 8

Jonathan Mallia - Saturday, January 21, 2017 6:49 AM
Amazing article!
Can you please shed some light on confidence and support in the last part of the article: how they are related, directly or indirectly, and what difference it makes when you adjust their values in arules?
Thanks a lot!
Jon

Hi,
Both support and confidence help to identify relevant relationships between the left-hand side (LHS) and the right-hand side (RHS) of a rule. The left-hand side is read as "IF item A" and the right-hand side as "THEN item B and item C".
Written as a rule: {A} => {B,C}. Imagine this is our rule. Support tells you how often the rule is found in the dataset. If support is 0.123, items A, B and C appear together in 12.3% of all the transactions in the dataset.
Confidence tells you how often the rule holds when its left-hand side is present. If confidence for our rule is 0.99, then in 99% of the transactions in which a customer bought item A, the customer also bought items B and C.

Best, Tomaž
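
As a worked example of the two measures for the rule {A} => {B,C}, here is a minimal sketch in plain Python. It does not use or reproduce the arules API; the five baskets and the helper names support and confidence are assumptions made up for illustration.

```python
# Hypothetical baskets, one set of items per transaction (made-up data).
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
    {"C"},
]


def support(itemset, transactions):
    """Share of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Share of the transactions containing `lhs` that also contain `rhs`."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


lhs, rhs = {"A"}, {"B", "C"}
rule_support = support(lhs | rhs, transactions)  # 2 of the 5 baskets hold A, B and C
rule_confidence = confidence(lhs, rhs, transactions)  # 2 of the 3 baskets with A

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
# prints: support=0.40 confidence=0.67
```

The two measures are linked: confidence is the support of the whole rule divided by the support of its left-hand side. Rule miners such as arules take minimum support and minimum confidence thresholds, so raising either one generally returns fewer rules, and lowering them returns more rules, including weaker ones.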

Jonathan Mallia SSCertifiable Points: 5192 · Answer 9

Thank you Tomaz,

it's a lot clearer with your explanation.

Many thanks.