Thread: How to acquire specific columns from CSV

  1. #1
    New Contributor
    Join Date: Nov 2013
    Location: Melbourne
    Posts: 4

    How to acquire specific columns from CSV

    Hi all,
    I am quite new to Lavastorm. I have been acquiring data from multiple CSV files, each about 4 GB in size. Loading them takes a very long time, but I only need 4 of the 200 available columns in each CSV. The 4 columns I am interested in have the same names and data types in every file. I am hoping the load time will be shorter if I select fewer columns. Any advice would be appreciated.
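    If the files can be pre-processed outside Lavastorm, one option is to trim each CSV down to the needed columns before loading it. The sketch below is a minimal, tool-independent illustration using Python's standard `csv` module; it is not a Lavastorm node or feature, and the helper names `select_columns` and `write_subset` are hypothetical. Note that it still parses every field of every row and only discards the unwanted ones afterwards.

```python
# Hypothetical pre-processing helpers for illustration only;
# this is not a Lavastorm node or API.
import csv
import io
from typing import Iterable, Iterator, List, TextIO


def select_columns(src: TextIO, wanted: List[str]) -> Iterator[List[str]]:
    """Yield the header and then each row, keeping only the `wanted` columns."""
    reader = csv.reader(src)
    header = next(reader)
    # Look up each wanted column by name; raises ValueError if a name is missing.
    indices = [header.index(name) for name in wanted]
    yield list(wanted)
    for row in reader:
        # csv.reader has already tokenized every field of this row;
        # we simply keep the ones we need.
        yield [row[i] for i in indices]


def write_subset(rows: Iterable[List[str]], dst: TextIO) -> None:
    """Write the selected rows back out as CSV."""
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    sample = io.StringIO(
        "id,name,amount,date,extra\n"
        "1,Ann,10.5,2013-11-01,x\n"
        "2,Bob,7.0,2013-11-02,y\n"
    )
    out = io.StringIO()
    write_subset(select_columns(sample, ["id", "amount"]), out)
    print(out.getvalue(), end="")  # only the id and amount columns are printed
```

    For real files, open both input and output with `open(path, newline="")`, as the `csv` module documentation recommends. Because the parsing work is the same either way, any saving would come from Lavastorm reading a much smaller file afterwards, so it is worth timing on one file before converting all of them.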

  2. #2
    Lavastorm Employee
    Join Date: Aug 2009
    Location: Cologne
    Posts: 513

    Hi,

    There is currently no way to specify which columns to extract in the delimited and CSV nodes.
    I have raised this as an enhancement request within our system, as issue #5343.

    However, I think that for a delimited or CSV format, you will find that the majority of time is spent on parsing the data as opposed to outputting it.
    For fixed-format files, fields you skip do not need to be parsed at all, so column selection would give a significant performance improvement there.

    However, for a format where all of the fields have to be parsed anyway to determine where each field sits within a record, I don't think it would make much difference.
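    The difference between the two formats can be illustrated with a small, tool-independent Python sketch. This is not how Lavastorm's nodes are implemented; the function names are hypothetical, and the delimited version assumes common double-quote CSV quoting on a single-line record. With fixed-width data the field's offset is known in advance, so it can be sliced directly; with delimited data every character before the wanted field must be scanned to find the real separators.

```python
# Hypothetical helpers for illustration only; not part of any Lavastorm API.
from typing import List


def fixed_width_field(record: str, start: int, width: int) -> str:
    # Offsets are known in advance, so the field is sliced directly
    # without examining any of the other fields in the record.
    return record[start:start + width].rstrip()


def delimited_field(record: str, index: int, sep: str = ",", quote: str = '"') -> str:
    # The position of field `index` is not known in advance: every character
    # before it must be scanned so that separators inside quoted values are
    # not mistaken for field boundaries. Assumes a single-line record.
    field = 0
    in_quotes = False
    current: List[str] = []
    i = 0
    while i < len(record):
        ch = record[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(record) and record[i + 1] == quote:
                    # A doubled quote inside a quoted value is a literal quote.
                    if field == index:
                        current.append(quote)
                    i += 1
                else:
                    in_quotes = False
            elif field == index:
                current.append(ch)
        elif ch == quote:
            in_quotes = True
        elif ch == sep:
            if field == index:
                return "".join(current)
            field += 1
        elif field == index:
            current.append(ch)
        i += 1
    if field == index:
        return "".join(current)
    raise IndexError(f"record has only {field + 1} fields")
```

    For example, `delimited_field('a,"x,y",c', 1)` has to walk past the comma inside the quotes before it can return `x,y`, whereas `fixed_width_field` touches only the characters of the requested field.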


    Regards,
    Tim.

  3. #3
    New Contributor
    Join Date: Nov 2013
    Location: Melbourne
    Posts: 4

    Thank you, Tim, I will give this a go. I have to load 12 files of approximately 4 GB each and join them to another 12 files of about the same size, and with 200 columns in each category this becomes very time-consuming. I hope there will be an enhancement soon.
