loop and query CSV files in a folder using union all query to form resultant table on server

Question

Post reply

loop and query CSV files in a folder using union all query to form resultant table on server

sifar786

Right there with Babe

Points: 750
More actions
June 27, 2014 at 12:07 pm

#308670

Hi,
I am trying to run a UNION ALL query in SQL SERVER 2014 on multiple large CSV files - the result of which i want to get into a table in SQL Server. below is the query which works in MSAccess but not on SQL Server 2014:
SELECT * INTO tbl_ALLCOMBINED FROM OPENROWSET
(
'Microsoft.JET.OLEDB.4.0' , 'Text;Database=D:\Downloads\CSV\;HDR=YES',
'SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,
''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],
IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week
FROM [DATABASE_SPAIN_EURO.CSV] as t, ( SELECT t3.[Level],t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description],
min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as first_week_on_sale,
max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as last_week_on_sale,
(max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) -
min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)))+1 as weeks_on_sale
FROM [DATABASE_SPAIN_EURO.CSV] as t3
WHERE t3.[Sales Value with Innovation] is NOT NULL
and t3.[Sales Value with Innovation] <>0
and t3.[Level]=''Item''
GROUP BY t3.[Level], t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description]
) as t2
WHERE
t.[Level] = t2.[Level]
and t.[Category] = t2.[Category]
and t.[Manufacturer] = t2.[Manufacturer]
and t.[Brand] = t2.[Brand]
and t.[Description] = t2.[Description]
and t.[Sales Value with Innovation] is NOT NULL
and t.[Sales Value with Innovation] <>0
and t2.first_week_on_sale >=1
and t2.weeks_on_sale <=52
UNION ALL
SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,
''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],
IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week
FROM [DATABASE_FRANCE_EURO.CSV] as t, ( SELECT t3.[Level],t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description],
min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as first_week_on_sale,
max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as last_week_on_sale,
(max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) -
min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)))+1 as weeks_on_sale
FROM [DATABASE_FRANCE_EURO.CSV] as t3
WHERE t3.[Sales Value with Innovation] is NOT NULL
and t3.[Sales Value with Innovation] <>0
and t3.[Level]=''Item''
GROUP BY t3.[Level], t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description]
) as t2
WHERE
t.[Level] = t2.[Level]
and t.[Category] = t2.[Category]
and t.[Manufacturer] = t2.[Manufacturer]
and t.[Brand] = t2.[Brand]
and t.[Description] = t2.[Description]
and t.[Sales Value with Innovation] is NOT NULL
and t.[Sales Value with Innovation] <>0
and t2.first_week_on_sale >=1
and t2.weeks_on_sale <=52
')
What i need is:
1] to create the resultant tbl_ALLCOMBINED table
2] transform this table using PIVOT command with following transformation as shown below:
PAGEFIELD: set on Level = 'Item'
COLUMNFIELD: Sale_Week (showing 1 to 52 numbers for columns)
ROWFIELD: sCOUNTRY, sCHAR, CATEGORY, MANUFACTURER, BRAND, DESCRIPTION, EAN (in this order)
DATAFIELD: 'Sale Value with Innovation'
3] Can the transformed form show columnfields >255 columns i.e. if i want to show all KPI values in datafield?
P.S: the CSV's contain the same number of columns and datatype but the columns are >100, so i dont think it will be feasible to use a stored proc to create a table specifying that number of columns.
can anyone please help me with a solution asap?

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 1

Quick question, can you provide some sample data?

😎

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 2

I am trying to run a UNION ALL query in SQL SERVER 2014 on multiple large CSV files - the result of which i want to get into a table in SQL Server. below is the query which works in MSAccess but not on SQL Server 2014:

what error messages are you getting?

what is the reasoning for doing union and calcs on the csvs?

maybe better to import each csv to a single staging table in SQL and work from there?

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 3

is this correct....

UNION ALL

SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,

''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],

IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,

((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),

((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week

FROM [DATABASE_FRANCE_EURO.CSV] as t,

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

sifar786 Right there with Babe Points: 750 More actions · Answer 4

the above queries work fine in sql server now, but donot know how to automate job schedule to scan network folder for csv files and run those queries on the csv files.

i think i should not go ahead with PIVOTING the 2 tables that are produced after running the above queries on the csv file, but rather work with those 2 tables themselves as they contain all the KPI's starting after column 'EAN'.

BTW, as per your earlier suggestion, i am reading on cubes and think they might be the right approach - however, donot know how to create a cube from a single denormalized table containing everything, but not having a primary key.

please advice.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 5

sifar786 (7/5/2014)
the above queries work fine in sql server now, but donot know how to automate job schedule to scan network folder for csv files and run those queries on the csv files.
i think i should not go ahead with PIVOTING the 2 tables that are produced after running the above queries on the csv file, but rather work with those 2 tables themselves as they contain all the KPI's starting after column 'EAN'.
BTW, as per your earlier suggestion, i am reading on cubes and think they might be the right approach - however, donot know how to create a cube from a single denormalized table containing everything, but not having a primary key.
please advice.

here is one method for bringing files into SQL.........

http://www.mssqltips.com/sqlservertip/2874/loop-through-flat-files-in-sql-server-integration-services/

in one of your previous posts you mentioned passing parameters back to a SQL sproc from excel.....

http://sqlwithmanoj.wordpress.com/2009/09/24/create-parameterized-excel-refreshable-reports/

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

sifar786 Right there with Babe Points: 750 More actions · Answer 6

Thanks for the links mate. 🙂

BTW, i am currently looking at this Cube Tutorial. But it doesnt explain how to create and deploy cubes made from a single table. In my case, the 2 tables, viz., tbl_BrandYearWise & tbl_ItemWeekwise tables.

Any ideas on how to create cubes from a single denormalized table ?

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 7

sifar786 (7/6/2014)
Thanks for the links mate. 🙂
BTW, i am currently looking at this Cube Tutorial. But it doesnt explain how to create and deploy cubes made from a single table. In my case, the 2 tables, viz., tbl_BrandYearWise & tbl_ItemWeekwise tables.
Any ideas on how to create cubes from a single denormalized table ?

google says...

http://stackoverflow.com/questions/14756846/ssas-cube-from-a-flat-table

http://social.msdn.microsoft.com/Forums/sqlserver/en-US/f8771384-faa4-471e-bb9f-f90971af4a0a/want-to-create-cube-using-single-table

btw....are you doing any aggregation at all...ie SUM/AVG of the KPis...or grouping by Year/Month etc....or is this just transposing week rows to columns for readability in excel?......

also can you explain the rationale around this part of your code...

IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,

((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),

((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

sifar786 Right there with Babe Points: 750 More actions · Answer 8

[font="Verdana"]Hi,

If you see the CSV's they dont have a Sale_week column. I am adding it to the query, also the sCountry, sChar (Character language), iYear columns. Each CSV contains 3 yrs (156 weeks) data for a particular Country & Char. So there are 2 CSV's for say, FRANCE viz., DATABASE_FRANCE_EURO.CSV & DATABASE_FRANCE_GLOBAL.CSV. likewise such CSV's for other countries. Also, in each csv fields are same, however Level field contains 2 types of data, Brand level & Item level. Brand level contains consolidated summary totals of Items in each brand. Item level is the detail breakup.

what i am doing is clubbing all the CSV data together using UNION ALL and also adding the above columns to create 2 outputs. The Brand level output - we are only looking at the years, so 3 yrs(156 weeks) as per data (iYear column). The Item level output - we want to fit the 3yrs (156 weeks) data into a 52 week window. The idea is - any Innovation (Description field or SKU) will run only for not more than 52 weeks from the week of launch. So in above SQL code, i am just applying rules:

1] if any Innovation that starts from 1st week of launch and <=52, then it may have had some weeks in previous years. So shift it to column 52-count of sku+1 (Sale_week).

2] else any Innovation that does not start from 1st week of launch and <=52 then just number its column sequentially. (Sale_week).

Exceptions: if you filter on Level='Item', you will see some sku's are having >52 weeks data & 0 & blanks. So i am excluding 0 & blanks & just picking up the maximum sku (<=52).

Outputs: dropdown Filters are on Country Category, Manufacturer, Brand and aggregation done after applying each such filter.

Select:

Your CountrySPAIN

Your CategoryJUICES

Your ManufacturerA.J. BARD

Your BrandRUBICON

Calculations:

Data is grabbed from ItemWeekWise table

Country

Count the number of SKUs in the country with more than 13 weeks of data

For the selected week x compute cumulative values for below KPIs for each of the SKU in the country:

1. Sales Value with Innovation

2. Sales Units with Innovation

3. Weighted Distribution

Based on the above data calculate for each SKU:

4.Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

5.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

"Find the below percentiles for each of the above 5 KPIs

90

50

10"

"Find the percent rank of the below values using the cumulative array for the respective KPI:

Cumulative Value Sales (entered by the user)

Cumulative Unit Sales (entered by the user)

Weighted Distribution (entered by the user)

Values Sales Velocity (entered by the user)

Unit Sales Velocity (entered by the user)"

Category

Count the number of SKUs in the category with more than 13 weeks of data

For the selected week x compute cumulative values for below KPIs for each of the SKU in the category:

1. Sales Value with Innovation

2. Sales Units with Innovation

3. Weighted Distribution

based on the above data calculate for each SKU:

4.Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

5.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

"Find the below percentiles for each of the above 5 KPIs

90

50

10"

"Find the percent rank of the below values using the cumulative array for the respective KPI:

Cumulative Value Sales (entered by the user)

Cumulative Unit Sales (entered by the user)

Weighted Distribution (entered by the user)

Values Sales Velocity (entered by the user)

Unit Sales Velocity (entered by the user)"

Manufacturer

Count the number of SKUs in the manufacturer with more than 13 weeks of data

For the selected week x compute cumulative values for below KPIs for each of the SKU under the manufacturer:

1. Sales Value with Innovation

2. Sales Units with Innovation

3. Weighted Distribution

based on the above data calculate for each SKU:

Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

4.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

"5.Find the below percentiles for each of the above 5 KPIs

90

50

10"

"Find the percent rank of the below values using the cumulative array for the respective KPI:

Cumulative Value Sales (entered by the user)

Cumulative Unit Sales (entered by the user)

Weighted Distribution (entered by the user)

Values Sales Velocity (entered by the user)

Unit Sales Velocity (entered by the user)"

Brand

Count the number of SKUs in the brand with more than 13 weeks of data

For the selected week x compute cumulative values for below KPIs for each of the SKU under the brand:

1. Sales Value with Innovation

2. Sales Units with Innovation

3. Weighted Distribution

based on the above data calculate for each SKU:

4.Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

5.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

"Find the below percentiles for each of the above 5 KPIs

90

50

10"

"Find the percent rank of the below values using the cumulative array for the respective KPI:

Cumulative Value Sales (entered by the user)

Cumulative Unit Sales (entered by the user)

Weighted Distribution (entered by the user)

Values Sales Velocity (entered by the user)

Unit Sales Velocity (entered by the user)"

This is just one output sheet, there are 5 more output sheets. LOL 🙂

Let me know if something is not clear.[/font]

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 9

think you are going to be busy here.....:w00t::w00t:

there are some regular posters on here that are far more experienced than I in BI...hopefully one will pick this up as well.

out of interest....when you have all the data together for each country...how many distinct products (descriptions) do you have?

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

sifar786 Right there with Babe Points: 750 More actions · Answer 10

Once i pivot the tbl_ItemWeekwise, i get around ~61,000 distinct sku's.

if you can refer my post to anyone with BI know-how that would be faster.

Thanks.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 11

sifar786 (7/6/2014)
[font="Verdana"]Hi,
Outputs: dropdown Filters are on Country Category, Manufacturer, Brand and aggregation done after applying each such filter.
Select:
Your CountrySPAIN
Your CategoryJUICES
Your ManufacturerA.J. BARD
Your BrandRUBICON
Calculations:
Data is grabbed from ItemWeekWise table
Country
Count the number of SKUs in the country with more than 13 weeks of data
For the selected week x compute cumulative values for below KPIs for each of the SKU in the country:
1. Sales Value with Innovation
2. Sales Units with Innovation
3. Weighted Distribution
Based on the above data calculate for each SKU:
4.Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution
5.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution
"Find the below percentiles for each of the above 5 KPIs
90
50
10"
"Find the percent rank of the below values using the cumulative array for the respective KPI:
Cumulative Value Sales (entered by the user)
Cumulative Unit Sales (entered by the user)
Weighted Distribution (entered by the user)
Values Sales Velocity (entered by the user)
Unit Sales Velocity (entered by the user)"
[/font]

do have an example of the output you are looking to provide?

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day