Thread: RegexSubstitute with group identifier followed by number

    I am processing text that contains some dates written like 12MAY16, and I want to use regexSubstitute to convert them to the form 12MAY2016.

    DocNotes = DocNotes.regexSubstitute("(\\d{2})([A-Z]{3})(\\d{2})\\b","$1$2 20$3") is the closest I've come, but that adds a space e.g. 12MAY 2016.

    Then I tried DocNotes.regexSubstitute("(\\d{2})([A-Z]{3})(\\d{2})\\b","$1$2[2]0$3").replace("[2]","2")

    That seems to do the trick but it feels a bit kludgey. There must be a way to escape from the regex group identifier though, e.g. $1$2\20$3?
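
    The difficulty is that the literal "20" runs straight into the group number, so "$220" can be read as a reference to a higher-numbered group instead of group 2 followed by "20". Many regex engines provide a delimited group reference for this case. As an illustration only, here is the same substitution in Python's re module, where \g<2> is the delimited form. Whether BRAINscript's regexSubstitute accepts an equivalent syntax is not confirmed anywhere in this thread, and the function name expand_year is made up for the example.

```python
import re

# Same pattern as in the post: two digits, three capitals, two digits, word boundary.
DATE_2DIGIT_YEAR = re.compile(r"(\d{2})([A-Z]{3})(\d{2})\b")


def expand_year(text):
    """Insert the century "20" before a two-digit year (hypothetical helper)."""
    # \g<2> closes the group reference explicitly, so the literal "20"
    # that follows cannot be read as part of the group number.
    return DATE_2DIGIT_YEAR.sub(r"\g<1>\g<2>20\g<3>", text)


print(expand_year("Seen 12MAY16, follow-up 03JUN16."))
# prints: Seen 12MAY2016, follow-up 03JUN2016.
```

    Dates that already carry a four-digit year are left alone, because the word boundary cannot match between the "20" and the digits that follow it.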

    Is there a general reference on how regex works in BRAINscript?

    Just to forestall any questions on this, the text I am processing is free text in which dates are generally formatted like 12MAY2016. Later I locate all of these with regexMatch and convert them to date variables, but first I need to clean them up so they are all in the same format.
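
    The later step described here (find every cleaned-up date, then convert it to a date value) can be sketched as follows. This is Python rather than BRAINscript, and the names FULL_DATE, MONTHS and find_dates are hypothetical. It assumes English three-letter month abbreviations in capitals and skips anything that is not a real calendar date.

```python
import re
from datetime import date

# Assumed month abbreviations; the thread only shows "MAY".
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}

# Two-digit day, three capitals, four-digit year, delimited by word boundaries.
FULL_DATE = re.compile(r"\b(\d{2})([A-Z]{3})(\d{4})\b")


def find_dates(text):
    """Return a list of date objects for every DDMMMYYYY token in text."""
    found = []
    for day, month, year in FULL_DATE.findall(text):
        if month not in MONTHS:
            continue  # three capitals that are not a month, e.g. "12ABC2016"
        try:
            found.append(date(int(year), MONTHS[month], int(day)))
        except ValueError:
            continue  # impossible date such as 31FEB2016
    return found


print(find_dates("Admitted 12MAY2016, discharged 03JUN2016."))
```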

    gmullin (Lavastorm Employee, joined May 2014):


    I didn't change your regex, but what if you just split it into two parts?

    _DocNotes1 = DocNotes.regexSubstitute("(\\d{2})([A-Z]{3})(\\d{2})\\b","$1$2")
    _DocNotes2 = DocNotes.regexSubstitute("(\\d{2})([A-Z]{3})(\\d{2})\\b","20$3")

    _DocNotes = _DocNotes1 + _DocNotes2

    emit _DocNotes as DocNotes
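
    One thing to check with the two-part approach: if regexSubstitute returns the whole field with each match replaced, which is how the original post uses it on free text, then both halves carry the surrounding text and the concatenation repeats it. The split comes out cleanly only when the field holds nothing but the date. A sketch of that behavior, using Python's re.sub as a stand-in for regexSubstitute (an assumption; no BRAINscript is run here, and split_and_join is a hypothetical name):

```python
import re

# Same pattern as in the thread.
PATTERN = r"(\d{2})([A-Z]{3})(\d{2})\b"


def split_and_join(text):
    """Mimic the two-part substitution followed by string concatenation."""
    part1 = re.sub(PATTERN, r"\1\2", text)  # keeps day and month, drops the year
    part2 = re.sub(PATTERN, r"20\3", text)  # keeps "20" plus the two-digit year
    return part1 + part2


print(split_and_join("12MAY16"))
# prints: 12MAY2016
print(split_and_join("Seen 12MAY16 ok"))
# prints: Seen 12MAY okSeen 2016 ok
```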

    stonysmith (Lavastorm Employee, joined Nov 2006, Grapevine Tx):


    The regex engine in LAE is built on PCRE. There is documentation available at

