-
How to acquire specific columns from CSV
Hi all,
I'm quite new to Lavastorm. I have been acquiring data from multiple CSV files, each about 4 GB in size. They take a very long time to load, but I only need 4 of the 200 available columns in each CSV. The 4 columns I am interested in have the same names and data types in every file. I am hoping the load time will be shorter if I select fewer columns. Any advice would be appreciated.
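For comparison outside Lavastorm, selecting a handful of columns by name from a CSV can be sketched with Python's standard `csv` module. This is a minimal sketch, not Lavastorm functionality; the function name `read_columns` and the sample column names are hypothetical. Note that `csv.reader` still tokenizes every field of every row; only the wanted fields are kept.

```python
import csv
import io
from typing import Iterable, Iterator, List, Sequence


def read_columns(lines: Iterable[str], wanted: Sequence[str]) -> Iterator[List[str]]:
    """Yield only the `wanted` columns (hypothetical names) from CSV text.

    The header row is used once to map names to positions. Every data row
    is still fully tokenized by csv.reader; only the selected fields are kept.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Position of each requested column; raises ValueError if a name is missing.
    positions = [header.index(name) for name in wanted]
    for row in reader:
        yield [row[i] for i in positions]


if __name__ == "__main__":
    # Hypothetical sample standing in for one of the large files.
    sample = io.StringIO(
        "id,region,amount,note,date\n"
        '1,north,10.5,"a, b",2020-01-01\n'
        "2,south,7.25,c,2020-01-02\n"
    )
    for rec in read_columns(sample, ["id", "amount", "date"]):
        print(rec)
```

For a real file, pass the object returned by `open(path, newline="")` instead of the `StringIO` sample; rows are streamed, so only the selected fields are held in memory.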
-
Hi,
There is currently no way to extract only specific columns in the delimited and CSV nodes.
I have raised this as an enhancement request in our system, as issue #5343.
However, for a delimited or CSV format, I think you will find that most of the time is spent parsing the data rather than outputting it.
For fixed-format files, skipping certain fields means you don't need to parse them at all, so there this would give a significant performance improvement.
However, for a format in which every field has to be parsed anyway to determine where each field sits within a record, I don't think it would make much difference.
Regards,
Tim.
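The parsing point above can be illustrated with a minimal Python sketch (not Lavastorm code). The function names, field layout and offsets are hypothetical, and the delimited parser is deliberately simplified: it ignores quoted fields. In the fixed-width case each wanted field is a direct slice at a known offset, so the other fields are never examined; in the delimited case the position of field N is only known after scanning past every separator before it.

```python
from typing import List, Sequence, Tuple


def extract_fixed(line: str, spans: Sequence[Tuple[int, int]]) -> List[str]:
    # Fixed-width: each wanted field is a direct slice at a known
    # (start, end) offset; characters of skipped fields are never read.
    return [line[start:end].strip() for start, end in spans]


def extract_delimited(line: str, wanted: Sequence[int], sep: str = ",") -> List[str]:
    # Delimited: a field's position depends on every separator before it,
    # so the line must be scanned from the start up to the last wanted field.
    # Simplified: quoted fields containing the separator are NOT handled.
    line = line.rstrip("\r\n")
    last = max(wanted)
    fields: List[str] = []
    current: List[str] = []
    index = 0
    for ch in line:
        if ch == sep:
            fields.append("".join(current))
            current = []
            index += 1
            if index > last:
                break  # all fields up to the last wanted one are collected
        else:
            current.append(ch)
    else:
        fields.append("".join(current))  # final field has no trailing separator
    return [fields[i] for i in wanted]


if __name__ == "__main__":
    # Hypothetical layout: id (0-4), region (4-14), amount (14-21).
    print(extract_fixed("0001north     0010.50", [(0, 4), (14, 21)]))
    print(extract_delimited("1,north,10.50\n", [0, 2]))
```

Even with the early `break`, the delimited version has to walk through every character before the last wanted field, which is why skipping columns in a CSV saves far less than in a fixed-format file.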
-
Thank you, Tim, I will give this a go. I have to load 12 files of approx. 4 GB each and join them to another 12 files of about the same size, and with 200 columns in each set of files, this becomes very time-consuming. I hope there will be an enhancement soon.