Thread: Retrieve data from a string using Regular Expression

  1. #1

    Hi,

    I have a large string of text (c. 32,000 characters). Using a regular expression, I would like to extract specific data from the text into its own column. For example, I need to retrieve the date from the following sample text:

    Update Notes.
    The most recent update is listed first.

    Note Date =30/04/2018 16:20:15
    PLN01.02 B END

    I know what regular expression syntax is required; however, I need to know how to get the date (and other data) out into its own field.

    I hope this makes sense!

    Thanks

    James

  2. #2
    Lavastorm Employee

    Assuming your text is in a field named 'Data', the following Filter node would extract the Date and its component day/month/year values to separate output fields:

    Code:
    node:Filter
    bretype:core::Filter
    editor:sortkey=5ba5087f376e5859
    input:@40fd2c74167f1ca2/=Static_Data.40fe6c55598828e5
    output:@40fd2c7420761db6/=
    prop:Script=<<EOX
    ## Specify the Regex pattern
    _pattern = "([0-3]\\d)/([0-1]\\d)/([1-2]\\d{3})"
    
    ## Search the input field named 'Data'
    matchesList = regexMatch(Data, _pattern)	
    
    ## How many matches were found?
    NumMatches = len(matchesList)
    
    ## Define the default for a 'not found' situation
    wantedText = str(null)
    theDay = str(null)
    theMonth = str(null)
    theYear = str(null)
    
    if NumMatches > 0 then {
    	## Get the whole match string for the first match
    	## Note: regexMatch() and regexMatchI() return a list of all the matches found.
    	## Each match is itself a list: the first item is the entire matched substring and
    	## subsequent items are the capture groups, i.e. the parts of the pattern in
    	## parentheses. E.g. for "^(.*foo )(bar .*)$", the first capture group is at
    	## index 1 of the list and contains the text that matched .*foo 
    	wantedText = matchesList.getItem(0).getItem(0)
    	theDay = matchesList.getItem(0).getItem(1)
    	theMonth = matchesList.getItem(0).getItem(2)
    	theYear = matchesList.getItem(0).getItem(3)
    } 
    
    emit *, wantedText, theDay, theMonth, theYear
    
    EOX
    editor:XY=530,170
    end:Filter
    Also see the documentation on the regexMatch() and regexMatchI() functions in the help.
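
    For readers without the tool to hand, the list-of-lists result described in the script comments can be sketched in Python. This is only an analogue under stated assumptions: regex_match_all is a hypothetical helper written for this illustration (it is not a Lavastorm function), and the sample text is the snippet from the first post.

```python
import re

# Same pattern as the Filter node: day/month/year, one capture group each.
PATTERN = re.compile(r"([0-3]\d)/([0-1]\d)/([1-2]\d{3})")


def regex_match_all(text, pattern=PATTERN):
    """Hypothetical helper: return a list of matches, where each match is
    a list [whole_match, group1, group2, ...]."""
    return [[m.group(0), *m.groups()] for m in pattern.finditer(text)]


# Sample text taken from the original question.
sample = (
    "Update Notes.\n"
    "The most recent update is listed first.\n"
    "\n"
    "Note Date =30/04/2018 16:20:15\n"
    "PLN01.02 B END\n"
)

matches = regex_match_all(sample)

# Defaults for the 'not found' case, mirroring the Filter script.
wanted_text = the_day = the_month = the_year = None
if matches:
    # Index 0 is the whole match; indexes 1-3 are the capture groups.
    wanted_text, the_day, the_month, the_year = matches[0]

print(wanted_text, the_day, the_month, the_year)
```

    As in the Filter script, only the first match is used; the remaining entries of the list would hold any later dates found in the text.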

  3. #3

    That's perfect, works a charm, thank you!

    (In reply to awilliams1024's post, #2 above.)
