RegEx Questions

Viewing 7 posts - 1 through 7 (of 7 total)
Author Posts
October 30, 2009 at 5:13 pm #718

NickDMax
Member

I was trying to update some of the wiki pages on regular expressions and found some oddities. First, though, a question:

#1. What is the base for PN’s Find/Replace RegEx? Boost::XPressive? Boost::Regex?

Now, the RegEx seems to support lookahead and lookbehind well enough:

‘(.*)(?=ton)’ — seems to work.

‘(?<=Apple)(.*)’ — seems to work.

But the (?!…) and (?<!…) negative lookahead and lookbehind groups really seem to be buggy. They work in some situations but seem to fall apart when combined with greedy quantifiers.

‘(?<!H)(.*)’ will still match ‘Hello’ and

‘(?<!H)([elo]*)’ does not seem to match anything (for example, in Bello, Xello, etc.).

‘(.)(?!e)’ works fine

‘(.*)(?!e)’ matches every line (this sort of makes sense, since the regex matches up to the end-of-line characters), but even things like ‘(.*)(?!e$)’ fail.
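
As a cross-check, here is a minimal sketch of the same patterns in Python's re module. That module is a stand-in chosen for illustration, not the engine PN uses, so XPressive's results may differ. In Python, each result follows from ordinary backtracking: a negative lookbehind succeeds at the start of a line because nothing precedes it, [elo]* can match the empty string, and a greedy .* can run to the end of the line, where no 'e' follows.

```python
import re

# Python's re is used as a stand-in here; it is not the engine PN uses.

# At offset 0 nothing precedes the match, so (?<!H) succeeds and .* takes the whole word.
whole_word = re.search(r'(?<!H)(.*)', 'Hello').group(1)

# [elo]* may match zero characters, so the first match is empty, at the 'B'.
empty_match = re.search(r'(?<!H)([elo]*)', 'Bello').span()

# A single character not followed by 'e': 'H' is rejected, 'e' is accepted.
single_char = re.search(r'(.)(?!e)', 'Hello').group(1)

# Greedy .* reaches the end of the line, where no 'e' (and no 'e$') can follow.
greedy = re.search(r'(.*)(?!e$)', 'apple').group(1)

# One way to reject lines ending in 'e': look behind from the end-of-line anchor.
not_ending_in_e = [w for w in ('apple', 'Hello') if re.search(r'^(.*)(?<!e)$', w)]

print(whole_word, empty_match, single_char, greedy, not_ending_in_e)
```

If XPressive gives the same results, the behaviour may be standard lookaround semantics rather than a library bug.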

So I am looking for information on the RegEx library used so that I can get a feel for what should be updated in the Wiki.

October 31, 2009 at 5:32 pm #16857

simon
Key Master

Hi Nick,

We’re using Boost::XPressive for the library, and they’ve been very responsive with bugs in the past. It’s worth trying with just that library (if you’re C++ friendly) to check whether it’s an XPressive bug rather than a PN bug.

Simon.

November 2, 2009 at 5:13 pm #16858

NickDMax
Member

I had read on your blog that you were looking into XPressive, so that is what I assumed you were using. I will try it out in XPressive directly and let you know what I find. I do happen to be C++ friendly and roughly familiar with Boost::XPressive. This information should also help me improve the wiki documentation.

November 3, 2009 at 2:55 pm #16859

NickDMax
Member

Just a note: don’t try to use the regex “\Q.*\E”; it locks up. In fact, from my tests I would say don’t use * or + inside a quoted sequence. The following works: “\Q([.])\E” will find any occurrence of “([.])” without having to use the regex “\(\[\.\]\)”. The problem only seems to affect * and + inside the quoted sequence.
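
For comparison, Python's re module has no \Q...\E syntax, but re.escape produces the same kind of literal pattern. The sketch below uses Python only to illustrate the idea of quoting; it says nothing about how XPressive implements quoted sequences.

```python
import re

text = 'hello .* world! ([.])'

# re.escape backslash-escapes the metacharacters so the pattern matches literally,
# which is what a \Q...\E quoted sequence is meant to do.
star_pattern = re.escape('.*')
group_pattern = re.escape('([.])')

found_star = re.search(star_pattern, text).group(0)
found_group = re.search(group_pattern, text).group(0)

print(found_star, found_group)
```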

To make sure it was not a PN problem, I tested with the following, which locks up:

#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
    std::string hello( "hello .* world!" );

    // \Q...\E is a quoted sequence; the backslashes are doubled
    // because this is a C++ string literal.
    std::cout << "compile regex" << std::endl;
    sregex rex = sregex::compile( "\\Q.*\\E" );

    std::cout << "Get Iterator" << std::endl;
    sregex_iterator cur( hello.begin(), hello.end(), rex );
    sregex_iterator end;

    // Print every match found in the test string.
    std::cout << "begin search" << std::endl;
    while( cur != end ) {
        smatch const &what = *cur;
        std::cout << "found: " << what[0] << '\n';
        ++cur;
    }

    return 0;
}

This locks up at the compile stage, which explains why PN locks up when given that regex.

Hopefully we can get XPressive to address this because, oddly enough, searching for a regular expression inside code is something that may come up within a Programmer’s Notepad.

However, Simon, I would like to say that the addition of XPressive is outstanding! It would be great if we could use some of the features like named groups and perhaps plug-in formatting (especially PyPN scripts, if that would be possible).

I will try to update the wiki throughout this week (I think the bulk is already done).

November 3, 2009 at 3:22 pm #16860

NickDMax
Member

I created a Boost ticket #3586 for this.

November 3, 2009 at 3:22 pm #16861

simon
Key Master

Ah nice, I hadn’t come across quoted sequences before – useful, shame about the bug!

Named groups would be nice to support for search and replace, they’re used all over PN for the tool output parsing and the like.
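
To make the named-group idea concrete, here is a small sketch in Python's re module. The (?P<name>...) and \g<name> spellings are Python's; the syntax PN or XPressive would accept in the Find/Replace box is not established in this thread, and the sample tool-output line is made up.

```python
import re

# Hypothetical compiler output line, invented for illustration.
line = 'main.cpp:42: error: expected ;'

# Named groups label each captured piece instead of relying on positions.
pattern = re.compile(r'(?P<file>[^:]+):(?P<line>\d+): (?P<message>.*)')
match = pattern.match(line)
location = (match.group('file'), int(match.group('line')))

# In a replacement string, \g<name> refers back to a named group.
swapped = pattern.sub(r'\g<message> (\g<file>, line \g<line>)', line)

print(location, swapped)
```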

Could you explain the plugin formatting further? I’m guessing you want some way to hand off the formatting of expressions in the replace box to a plugin? A scenario example would make this more clear.

November 4, 2009 at 8:12 am #16862

NickDMax
Member

Well, for example, today I needed to create a bunch of constants in some code. I generally like my constants to have all-capital names. So I used a regex like this to build the constants from an exported Excel table:

Search: ^(\w+).*$

Replace: Public Const \1 as String = "\1"

Then I had to go and highlight each identifier and hit Ctrl-Shift-U to make it upper case.

It would have been nice to be able to have the replace string as something like:

Public Const $(u1) as String = "\1"

I *think* that XPressive may actually support the \U and \L formatting escapes using the format_perl formatter, but none of my attempts to get this to work have been successful, and I can’t find an example.

If it is not feasible to add a little more functionality to the replacement format, then allowing PyPN to get search results as an object would let a PyPN script perform the actual replacement. Right now one would have to use the Python RegEx module, and I suppose there is nothing really wrong with that approach.
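
As a rough sketch of that fallback, Python's re.sub accepts a function as the replacement, which allows the captured name to be upper-cased. The function name make_constant and the sample rows are made up for illustration, and how a PyPN script would read and write the editor text is not shown here.

```python
import re

def make_constant(match):
    """Build one constant declaration, upper-casing only the identifier."""
    name = match.group(1)
    return f'Public Const {name.upper()} as String = "{name}"'

# Made-up sample rows standing in for the exported table.
rows = "alpha, 1, foo\nbeta_two 2"

# re.MULTILINE lets ^ and $ match at every line, like a per-line search in the editor.
result = re.sub(r'^(\w+).*$', make_constant, rows, flags=re.MULTILINE)

print(result)
```

Using a function rather than a replacement string is what makes the case change possible, since Python's replacement templates have no upper-case escape of their own.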

BTW: The Boost bug report has been resolved, so in Boost 1.41 the error should be gone. You were right, those guys are on top of things.
