Thread: How to update self.outputs[0].metadata, or customise the core::Delimited File node

  1. #1

    I am building a library node that reads input files with or without a head. For the files without a head, I use a python node to read in data rows; and for the files with a head, I use the core::Delimited File node. After that, I use a core::Cat node to concatenate the two streams as the output (there is only one stream that has data).

    This works fine for files that do have a head. However, it produces two errors for files without a head:
    1. Because there is no input to the Delimited File node, its output is empty (no metadata). That causes a failure in the core::Cat node.
    2. The Delimited File node is inside a composite, which defines FieldDelimiter and RecordDelimiter parameters. For files without a head, these two parameters are not needed and thus not defined. The Delimited File node triggers a “parameter <…> is not defined” error.

    To avoid these two errors, I am trying to build a Python node that implements the same function as the Delimited File node (if there is a workaround, I would rather not build this node).

    In building the node, I have some difficulty in using and updating self.outputs[0].metadata.

    I added code under initialize(self) to ensure the output pin has metadata even if the input to the node has no data, as below:

    def initialize(self):
        ''' Called at node initialization, before first pump.'''
        super(BrainNode, self).initialize()

        om = self.newMetadata()
        om.append("FileName", "string")
        self.outputs[0].metadata = om

    However, after that I could not change the metadata. In the pump(self, quant) function, I wanted to add output columns to the metadata using the names read from the first row of the input data.
    • I added self.outputs[0].metadata.append("Column1", "string") inside the pump function. I noticed that len(self.outputs[0].metadata) became 2 after that, but Column1 did not appear in the output.
    • When I appended Column1 to another variable, say om2, and then set self.outputs[0].metadata = om2, I received a "metadata already defined" error.
    Last edited by mzhao; 08-11-2010 at 08:56 AM.

  2. #2
    Lavastorm Employee

    Hey,

    First, I'm a bit confused about how this is meant to work...
    It is probably worth your while posting an example graph.

    You say:

    "For files without a head, these two parameters are not needed and thus not defined. The Delimited File node triggers a “parameter <…> is not defined” error."
    If the delimited file is to be used, it clearly needs these parameters defined.
    So is it the case that you don't want to use the delimited file node in this situation?

    And in this case, the delimited file node will also have no input?

    Therefore, my guess is that you don't actually want it to execute in this case, so you should set it to be disabled.

    In which case, are you really looking for a Cat function, or a bypass?
    I.e. is there any case with both "files with a head" and "files without a head"? If not, then what you should do is set one node to be enabled if using a head and the other to be disabled, and vice versa.

    Then pass these to a bypass node.

    How exactly you want these nodes to be enabled/disabled depends on how these should be implemented. It could be based on a parameter in the composite node. Alternatively, if it's based on the number of records in an input, then a path enabler or something similar is probably the best bet. If it's based on the existence of an input "Input", then you can use the {{^Input>>^}} notation in the enabled parameter.

    Regarding the python code, once the metadata is assigned to an output, it cannot be modified. This is a sensible requirement, since you don't want to go changing the metadata midway through writing to an output.

    However, you can modify the metadata at any point prior to setting it to an output, and you only need to set the metadata prior to writing the first record. So if you want to modify the metadata after reading the first row of input data, you shouldn't make the call
        self.outputs[0].metadata = om
    until the metadata is finalized based on the input from the first row of data.
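
The advice above (build the metadata object completely, then assign it to the output exactly once) can be illustrated with a small stand-alone sketch. It does not use the real node API: the Metadata and Output classes are hypothetical stand-ins that only mimic the assign-once behaviour described in this thread, and HeaderReader, its constructor arguments, and write() are likewise invented for illustration.

```python
class Metadata(list):
    """Hypothetical stand-in for the object returned by self.newMetadata()."""

    def append(self, name, type_name):
        # Store each column as a (name, type) pair.
        super().append((name, type_name))


class Output:
    """Hypothetical stand-in for an output pin whose metadata is assign-once."""

    def __init__(self):
        self._metadata = None
        self.records = []

    @property
    def metadata(self):
        return self._metadata

    @metadata.setter
    def metadata(self, value):
        # Mirrors the "metadata already defined" error reported in the thread.
        if self._metadata is not None:
            raise RuntimeError("metadata already defined")
        self._metadata = list(value)

    def write(self, record):
        if self._metadata is None:
            raise RuntimeError("metadata must be set before the first record")
        self.records.append(record)


class HeaderReader:
    """Illustrative node: column names come from the first input row."""

    def __init__(self, file_name, lines, delimiter=","):
        self.file_name = file_name
        self.lines = iter(lines)
        self.delimiter = delimiter
        self.output = Output()

    def pump(self):
        # Build the metadata in a local variable; nothing is assigned yet.
        om = Metadata()
        om.append("FileName", "string")

        # Extend it from the header row, if the input has one.
        header = next(self.lines, None)
        if header is not None:
            for name in header.split(self.delimiter):
                om.append(name.strip(), "string")

        # Assign once, after the metadata is final and before the first write.
        self.output.metadata = om
        for line in self.lines:
            self.output.write([self.file_name] + line.split(self.delimiter))
```

With an empty input, pump() still assigns the single FileName column, so a downstream node would see metadata even when no rows arrive. In a real node the same ordering would apply inside pump(self, quant), with initialize() no longer assigning the metadata.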

  3. #3

    Hi Tim,

    Thank you very much for your reply. All your suggestions are very useful, though I haven't worked out the solution yet.

    I attach the graph here, together with a few test files. You have to run the graph on a Unix server because the Python node uses the Unix tail command (that is the very reason for the Python node; otherwise I could use the existing core::Tail node, which is too slow when tailing a large file).
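
For readers without the attachment, calling the Unix tail command from Python could look roughly like the sketch below. This is not the code from the attached graph; the function name tail_lines and its parameters are assumptions made for illustration, and it needs a system where tail is on the PATH.

```python
import subprocess


def tail_lines(path, count=10):
    """Return the last `count` lines of the file at `path` via Unix tail."""
    result = subprocess.run(
        ["tail", "-n", str(count), path],
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError if tail fails
    )
    return result.stdout.splitlines()
```

Delegating to tail avoids reading the whole file in Python, which is the motivation given above for not using a slower node on large files.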

    1. Bypass: this looks like exactly the function I need. Its help text reads, "The output of a bypass will be the first input that can be satisfied." However, I couldn't get it to work based on my understanding. Why can it not pass the python node "Get Tail No Head" when it has no output?

    2. Parameter: It doesn't work in this case. In the Delimited File node, if I set enabled to {{^FileHead^}}, then for input without a head the Bypass node won't pass (its input pin was red). I saw earlier that when you disabled a node, the node taking input from it still executed. Perhaps this is designed only for some nodes?

    3. Metadata: I understand it better now. But I think I should do this only when Bypass does not work.

  4. #4

    Hi Tim,

    I worked it out based on the information you provided. Attached is the graph that does what I need.

    Thanks a lot for your assistance.
    Last edited by mzhao; 08-18-2010 at 01:25 AM. Reason: Update the latest progress

  5. #5
    Lavastorm Employee

    The FieldNames Parameter

    Originally posted by mzhao:
    "I am building a library node that reads input files with or without a head."

    Dear Ming,

    How are you? It's been a while since we last spoke. Hope that all is going well with you.

    I happen to have just read this post--I just got back from holiday and am slowly getting back in the swing of things--and I wanted to point out a feature of the Delimited File node that is easy to overlook.

    From what you have said above, this composite seems to be designed to handle situations in which either the input file has a list of column names in the first line or the names are provided manually. From looking at the node you have posted, it seems that the user will know the format of the input file a priori and be able to set these column names manually in the node. It also seems that in the situation where no header is provided, all data will be in a single column.

    From those observations, I suggest that you look at the FieldNames parameter of the Delimited File node. If you provide a list of column names in this parameter, the Delimited File node will output every row in the input file, matching up the input data with the appropriate column names. However, if you leave this parameter blank, then the node will behave as you expect and take the column names from the first line of the file. Please also note that in situations where there is only one input column, the FieldDelimiter and RecordDelimiter still need to be provided even though only the RecordDelimiter is used.

    You can see this behavior in action in the attached example.
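
The FieldNames behaviour described above can also be modelled in a few lines of plain Python. This is only a sketch of the semantics as stated in this post, not the node's implementation; read_delimited and its argument names are hypothetical.

```python
def read_delimited(text, field_delimiter, record_delimiter, field_names=""):
    """Split text into records and pair each record with column names.

    field_names models the comma-separated FieldNames parameter: when it is
    blank, the first record supplies the names; otherwise every record is
    treated as data.
    """
    # Drop empty records, e.g. the one after a trailing record delimiter.
    records = [r for r in text.split(record_delimiter) if r]

    if field_names.strip():
        names = [n.strip() for n in field_names.split(",")]
    elif records:
        names = records.pop(0).split(field_delimiter)
    else:
        names = []

    return [dict(zip(names, r.split(field_delimiter))) for r in records]


# Header taken from the first line (field_names left blank).
with_head = read_delimited("name|age\nAnn|30\n", "|", "\n")

# No header in the file: names supplied, every row is output as data.
without_head = read_delimited("row one\nrow two\n", "|", "\n", "Line")
```

In the single-column case the field delimiter never matches anything, yet it is still passed in, which parallels the note above that both delimiters must be provided.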

    The trick to identifying this feature in our documentation is in the explanation of the FieldNames parameter. The documentation says, "Comma separated list of fields. By default, uses the first line of input file." The key is the phrase "by default". This indicates that providing this value inverts the default behavior of this node. Unfortunately, due to the length of the help, it's kinda up to you to figure out exactly how the behavior changes. Luckily, our documentation and our nodes are very consistent so once you figure out how this node behaves, the rest of the Input nodes (and maybe even a few other nodes) behave in very much the same manner with very similar documentation.

    Hope that helps!

    Rocco
    Last edited by rpigneri; 08-30-2010 at 06:03 PM. Reason: Edited for Wording
