
Thread: Retrieving multiple attributes from an XML file

  1. #1
    Lavastorm Employee
    Join Date
    Jun 2009
    Location
    UK
    Posts
    18

    When using the XML parser node to extract data I’m able to retrieve all data elements from a file with the following content:

    <rootElement creationDate="2013-10-13" sequenceNumber="5" elementCount="2">
    <dataElement1> Value1 </dataElement1>
    <dataElement2> Value2 </dataElement2>
    </rootElement>

    As seen above the file contains a “header” record with multiple "root" attributes. Using the @attributeHandler logic (mentioned in the documentation) I’m able to retrieve one attribute, for example:

    @attributeHandler('/CPI/@sequenceNumber')
    def TypeHandler(attr):
        data['SequenceNumber_'] = attr.nodeValue

    When I try to retrieve more than one attribute with similar code, I get the error “The attributeHandler decorator can be used on only one function”. Is there a straightforward way to retrieve all attributes?

    Many thanks

    Henk
    Henk Thomas

  2. #2
    Lavastorm Employee
    Join Date
    Aug 2009
    Location
    Cologne
    Posts
    513

    Hi,

    Someone might have better information on how to get this working. However, from what I understand, you can't use multiple attributeHandler decorators. The XMLpy File tutorial states:
    "the attributeHandler can only be used on a single function"
    Therefore, I think the best way around this is to process the rootElement element itself and retrieve the attributes and sub-elements from it.

    If rootElement is very large (e.g. contains a lot of sub-elements), then a different approach may be necessary to ensure that the entire contents of rootElement are not loaded into memory.

    However, assuming that this is not a problem, then you should be able to use the following code to read all the data from the XML snippet you have attached:

    Code:
    def Initialize():
        metadata = {}
        metadata['SequenceNumber_'] = 123
        metadata['CreationDate_'] = "123"
        metadata['ElementCount_'] = 1
        metadata['Value1'] = "V1"
        metadata['Value2'] = "V2"
        setMetadata(metadata)
        
    @elementHandler('/rootElement')
    def RootHandler(element):
        data = {}
        data['SequenceNumber_'] = element.sequenceNumber
        data['CreationDate_'] = element.creationDate
        data['ElementCount_'] = element.elementCount
        data['Value1'] = element.dataElement1
        data['Value2'] = element.dataElement2
        outputRecord(data,0)

    In this sample, the Initialize method simply sets the metadata so that the correct types are used for each output field.
    The RootHandler method then handles the rootElement element.
    The sequenceNumber, creationDate and elementCount attributes are read from this element, as are the sub-elements dataElement1 and dataElement2.
    The record is then written with this data.
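
    For anyone who wants to try the same idea outside the node, here is a minimal sketch using only Python's standard library (xml.etree.ElementTree). It is not XMLpy File node code: the function name read_root_record and the SAMPLE constant are made up for illustration, and it loads the whole document into memory, so it carries the same caveat about very large files.

```python
import xml.etree.ElementTree as ET

# Sample taken from the first post, with plain ASCII quotes.
SAMPLE = """<rootElement creationDate="2013-10-13" sequenceNumber="5" elementCount="2">
<dataElement1> Value1 </dataElement1>
<dataElement2> Value2 </dataElement2>
</rootElement>"""


def read_root_record(xml_text):
    """Return one flat record holding the root attributes and child element text.

    Hypothetical helper for illustration; it parses the whole document at once.
    """
    root = ET.fromstring(xml_text)
    record = {
        "SequenceNumber_": int(root.get("sequenceNumber")),
        "CreationDate_": root.get("creationDate"),
        "ElementCount_": int(root.get("elementCount")),
    }
    # Each direct child becomes one field, keyed by its tag name.
    for child in root:
        record[child.tag] = (child.text or "").strip()
    return record


record = read_root_record(SAMPLE)
print(record)
```

    All three attributes come from the one element object, which is why a single handler on rootElement is enough.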


    Hope this helps some,

    Tim.

  3. #3
    Lavastorm Employee
    Join Date
    Jun 2009
    Location
    UK
    Posts
    18

    Hi Tim,

    Many thanks for your reply. I’ve taken your suggestion and tried several variations. I can retrieve the root element (one record), but I then fail to extract the data elements using the same node (which should result in two records). I then realized that the sample file provided has a slightly different structure from the one described in the specification (and from what I initially posted):

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <rootElement creationDate="2013-01-31" sequenceNumber="15" elementCount="2">                  
                     <dataGroup>        
                               <dataElement1> abcdef </dataElement1>
                               <dataElement2> 123456 </dataElement2>
                               <dataElement3> a1zqs2xw </dataElement3>
                     </dataGroup>       
                     <dataGroup>        
                               <dataElement1> xwystu </dataElement1>
                               <dataElement2> 78692 </dataElement2>
                               <dataElement3> c3dev4fr </dataElement3>
                     </dataGroup>
    </rootElement>
    Here, elementCount contains the number of “Orders” data elements.

    The number of data elements can be quite significant. As you already mentioned, loading the entire content just to extract the root element's attributes is undesirable, so I guess this could be an issue. That is why the attributeHandler logic looked interesting, but as mentioned it can only be used on a single function.

    As far as I can see the same structure is used for all files, and the “rootElement” or header data is always found on the second line of each file. Hopefully this could simplify a possible solution.
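
    To illustrate why a fixed header position helps, here is a rough sketch in plain Python (standard library only, not XMLpy File node code) that reads the root attributes and stops as soon as the root start tag has been seen, so the data elements are never built in memory. The name read_header and the shortened SAMPLE are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET

# Shortened version of the sample above; only one dataGroup is kept.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rootElement creationDate="2013-01-31" sequenceNumber="15" elementCount="2">
  <dataGroup>
    <dataElement1> abcdef </dataElement1>
  </dataGroup>
</rootElement>"""


def read_header(source):
    """Return the root element's attributes without parsing the whole file.

    Hypothetical helper: `source` is a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("start",)):
        # The first "start" event is the root element, and its attributes
        # are already available here. Returning stops the parse early.
        return dict(elem.attrib)
    return {}


header = read_header(io.BytesIO(SAMPLE))
print(header)
```

    The attribute values come back as strings, so sequenceNumber and elementCount would still need converting to integers.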

    Attached is a small graph containing the node that I have used for testing.

    Many thanks,

    Henk
    Attached Files
    Last edited by henk01; 04-03-2013 at 02:23 PM.
    Henk Thomas

  4. #4
    Lavastorm Employee
    Join Date
    Sep 2009
    Location
    Boston, Massachusetts, USA
    Posts
    50

    Dear Henk,

    You have unfortunately run into a limitation in the underlying engine that parses XML for the XMLpy File node. While you can typically use multiple attributeHandler function decorators (that fancy word is the technical term for the "@attributeHandler" thing) on a single function to parse multiple XML attributes, the Amara engine cannot handle attributeHandlers that point to different attributes on the same element.

    In light of that, I have attached a graph that outputs these three attributes each via their own XMLpy File node and then joins them all together into a single record. From what you have written above, it sounds like you want to then join these attributes with each record that is output from each dataGroup. The attached graph performs that join as well.
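
    The graph itself is in the attachment. As a rough illustration of the end result only (header attributes repeated on every dataGroup record), here is a sketch in plain Python using the standard library. It is a different technique from the graph: a single streaming pass rather than separate nodes joined together. It is not XMLpy File node code, and the name iter_records is made up for the example.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rootElement creationDate="2013-01-31" sequenceNumber="15" elementCount="2">
  <dataGroup>
    <dataElement1> abcdef </dataElement1>
    <dataElement2> 123456 </dataElement2>
    <dataElement3> a1zqs2xw </dataElement3>
  </dataGroup>
  <dataGroup>
    <dataElement1> xwystu </dataElement1>
    <dataElement2> 78692 </dataElement2>
    <dataElement3> c3dev4fr </dataElement3>
  </dataGroup>
</rootElement>"""


def iter_records(source):
    """Yield one dict per dataGroup, each carrying the root attributes.

    Hypothetical helper: `source` is a binary file-like object.
    """
    header = {}
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "rootElement":
            # Attributes are available as soon as the start tag is read.
            header = dict(elem.attrib)
        elif event == "end" and elem.tag == "dataGroup":
            record = dict(header)  # copy the header onto this record
            for child in elem:
                record[child.tag] = (child.text or "").strip()
            yield record
            elem.clear()  # drop the processed group's children to limit memory use


records = list(iter_records(io.BytesIO(SAMPLE)))
for rec in records:
    print(rec)
```

    Because each group is cleared once it has been output, memory use stays roughly flat even when the file holds a large number of dataGroup elements.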

    Hopefully, this should be pretty close to your goal. If not, feel free to write back.

    Hope that helps,

    Rocco
    Attached Files

  5. #5
    Lavastorm Employee
    Join Date
    Jun 2009
    Location
    UK
    Posts
    18

    Thanks Rocco,

    I just thought I'd double-check whether there was some kind of useful trick "under the hood". The attached solution will work and I'll adopt and tweak the logic for the reading graph.


    Many thanks,

    Henk
    Henk Thomas

  6. #6
    Lavastorm Employee
    Join Date
    Aug 2009
    Location
    Cologne
    Posts
    513

    Hey,

    An old post I know...
    But just to let you know, there is a new XML node available in the Lavastorm Analytic Library v 2.16 called "XML Data" which allows you to retrieve this sort of format with no configuration.

    See attached example graph.

    Regards,
    Tim.
    Attached Files
