January 16, 2017 at 11:32 pm
Comments posted to this topic are about the item Using SQL Server and R Services for analyzing Sales data (Part 3)
Tomaž Kaštrun | twitter: @tomaz_tsql | Github: https://github.com/tomaztk | blog: https://tomaztsql.wordpress.com/
January 17, 2017 at 7:53 am
This looks like a great article. I'm going to try this over the weekend on my home SQL Server 2016 sandbox. If all works as you've laid out, I'll come back and give you a top rating. Thanks for diving into this.
January 17, 2017 at 12:39 pm
RonKyle - Tuesday, January 17, 2017 7:53 AMThis looks like a great article. I'm going to try this over the weekend on my home SQL Server 2016 sandbox. If all works as you've laid out, I'll come back and give you a top rating. Thanks for diving into this.
Much appreciated. Especially your opinion and how you will apply this.
Code is working fine with WideWorldImporters/WideWorldImportersDW database.
Best, Tomaž
Tomaž Kaštrun | twitter: @tomaz_tsql | Github: https://github.com/tomaztk | blog: https://tomaztsql.wordpress.com/
January 17, 2017 at 12:43 pm
I'll be testing against that database. I have it set up, but have only started some R work. This gives me something concrete to follow and try.
January 17, 2017 at 11:14 pm
RonKyle - Tuesday, January 17, 2017 12:43 PMI'll be testing against that database. I have it set up, but have only started some R work. This gives me something concrete to follow and try.
Great.
If you have any other questions, just post them here so we can discuss them!
Tomaž Kaštrun | twitter: @tomaz_tsql | Github: https://github.com/tomaztk | blog: https://tomaztsql.wordpress.com/
January 21, 2017 at 6:49 am
Amazing article!
Can you please shed some light on confidence and support for the last part of the article, how they are related directly or indirectly, and what difference does it make when you adjust their values in arules ?
Thanks a lot!
Jon
January 21, 2017 at 6:58 am
Hi,
I was wondering why you did not use the CustomerKey when you created the cluster in:dist(Sales[,c(1,3,5)])
Wouldn't it have been more effective to cluster by customer rather than by productgroup only?
Thanks in advance for the explanation!
January 21, 2017 at 8:35 am
Jonathan Mallia - Saturday, January 21, 2017 6:58 AMHi,I was wondering why you did not use the CustomerKey when you created the cluster in:
dist(Sales[,c(1,3,5)])
Wouldn't it have been more effective to cluster by customer rather than by productgroup only?
Thanks in advance for the explanation!
Hi,
customerkey is just a running ID for each of the customers in the database. In this case, Clustering is done on the attributes of the customers (observation), and CustomerKey is not an attribute that would describe or unveil any information about the customer. If it would be included, it can only create dis-information in relation to other real/natural attributes.
Attribute for customer can be: business information: number of transactions created, value of invoices, basket values, business type; demographic information: area, city, country, age, etc. All these attributes describe customers. CustomerKey on the other hand, does not describe customer, nor is anyhow related to customer. it is just a database identifier.
ProductGroup can be added, because it describes products customer is buying/selling. But if you have all the customers buying all the products, it might also be a good to rethink if you want to include it / how you want to include such attribute.
Hope I made it more understanding.
Best, Toamaž
Tomaž Kaštrun | twitter: @tomaz_tsql | Github: https://github.com/tomaztk | blog: https://tomaztsql.wordpress.com/
January 21, 2017 at 9:02 am
Jonathan Mallia - Saturday, January 21, 2017 6:49 AMAmazing article!Can you please shed some light on confidence and support for the last part of the article, how they are related directly or indirectly, and what difference does it make when you adjust their values in arules ?
Thanks a lot!
Jon
Hi,
Both support and confidence are important to identify and find relevant relationship between left hand side (LHS) and right hand side (RHS). Left hand side is interpreted as IF item A.... and Right hand side as THEN item B and item C.
Or shown graphically {A} => {B,C}. Imagine, this is our rule. To this rule, support represents, how many times this rules was found in the dataset. If support is 0.123, this means that this rules appeared 12,3% out of all the rules in dataset.
Confidence will tell you, how many times this rules has been proven as True. If confidence for our rules is 0.99, this means that in 99% of the dataset containing all the rules, customers that bought item A will in 99% times also buy item B and C.
Best, Tomaž
Tomaž Kaštrun | twitter: @tomaz_tsql | Github: https://github.com/tomaztk | blog: https://tomaztsql.wordpress.com/
January 21, 2017 at 12:54 pm
tomaz.kastrun - Saturday, January 21, 2017 9:02 AMJonathan Mallia - Saturday, January 21, 2017 6:49 AMAmazing article!Can you please shed some light on confidence and support for the last part of the article, how they are related directly or indirectly, and what difference does it make when you adjust their values in arules ?
Thanks a lot!
JonHi,
Both support and confidence are important to identify and find relevant relationship between left hand side (LHS) and right hand side (RHS). Left hand side is interpreted as IF item A.... and Right hand side as THEN item B and item C.
Or shown graphically {A} => {B,C}. Imagine, this is our rule. To this rule, support represents, how many times this rules was found in the dataset. If support is 0.123, this means that this rules appeared 12,3% out of all the rules in dataset.
Confidence will tell you, how many times this rules has been proven as True. If confidence for our rules is 0.99, this means that in 99% of the dataset containing all the rules, customers that bought item A will in 99% times also buy item B and C.Best, Tomaž
Thank you Tomaz,
it's a lot clearer with your explanation.
Many thanks.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply