Dandello | Forum Administrator, YaBB Modder | Posts: 2235 | Location: The Land of YaBB | Joined: Feb 12th, 2014
Re: SVN 2.6.11 branch | Reply #30 | Mar 22nd, 2015 at 2:03pm

Monni wrote on Mar 22nd, 2015 at 1:59pm:
> Clipping or truncating is bad. It should just leave out the post contents if the post is too long. Clipping or truncating can cause issues in parsing HTML when closing tags get clipped out but opening tags don't.

Very true; the truncating function is iffy.

Perfection is not possible. Excellence, however, is excellent.

Dandello | Reply #31 | Mar 23rd, 2015 at 3:14pm

Okay, rolled back to the 'preformatted, stripped text only' version of the e-mails. We'll come back to this when we start seriously looking at 2.6.2.

Code (Perl):
    $thismessage =~ s/<.*?>//g;

in Post.pm should be okay in the short term because YaBB creates very simple HTML. So catching the extra (unmatched) angle brackets after running through FromHTML should solve most of the current angle-bracket issues in e-mails.

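A minimal sketch of what that substitution does on the kind of simple markup YaBB produces. It assumes the text has already been through FromHTML (not reproduced here), and strip_tags_naive is a hypothetical wrapper for illustration, not a function in Post.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: remove anything that looks like an HTML tag
# using the same lazy pattern as the Post.pm line.
sub strip_tags_naive {
    my ($text) = @_;
    $text =~ s/<.*?>//g;    # each match ends at the first '>' after a '<'
    return $text;
}

# The kind of simple markup YaBB generates
my $html = '<b>Hello</b> world<br />';
print strip_tags_naive($html), "\n";    # prints "Hello world"
```
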
Monni | Language | Posts: 413 | Location: Kaarina, Finland | Joined: Jul 16th, 2014
Reply #32 | Mar 23rd, 2015 at 6:26pm

Dandello wrote on Mar 23rd, 2015 at 3:14pm:
> ... $thismessage =~ s/<.*?>//g; in Post.pm should be okay in the short term because YaBB creates very simple HTML. ...

The problem with that is that . matches any character, including a stray '<', so the match can run on to the next tag's '>' and strip everything in between.

Dandello | Reply #33 | Mar 23rd, 2015 at 7:41pm

That's the code everybody uses as an example of what works with very simple HTML tags. Plus, *? is a 'lazy' (non-greedy) quantifier in Perl: it matches as few characters as possible, so each match stops at the first '>' after the '<' instead of running on to the last '>' in the text.

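To see the difference the lazy quantifier makes, here is a small sketch comparing the greedy <.*> with the lazy <.*?> on the same string. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .* runs to the end of the string, then backtracks to the LAST '>'
(my $greedy = $html) =~ s/<.*>//g;

# Lazy: .*? grows one character at a time and stops at the FIRST '>'
(my $lazy = $html) =~ s/<.*?>//g;

print "greedy: '$greedy'\n";    # greedy: ''
print "lazy:   '$lazy'\n";      # lazy:   'bold and italic'
```
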
Monni | Reply #34 | Mar 23rd, 2015 at 7:53pm

Dandello wrote on Mar 23rd, 2015 at 7:41pm:
> ... *? is a 'lazy' or 'non-greedy' quantifier in Perl. ...

It works when there are no stray < or > characters. But if there is even one stray bracket, it doesn't. It also fails miserably when there are no characters at all between < and >, i.e. <>, which is an alternative way to write !=.

http://regexr.com/ is the tool I use to check regex patterns for bugs.

(Attached screenshots: regex1.png, regex2.png)

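A sketch reproducing the kinds of failure described, using the lazy pattern: a stray '<' in running text, an empty <>, and a '>' inside a quoted attribute value. strip_lazy is a hypothetical helper and the sample strings are invented, so this is not a copy of what the regexr screenshots showed.

```perl
use strict;
use warnings;

# Hypothetical helper applying the lazy tag-stripping pattern
sub strip_lazy {
    my ($t) = @_;
    $t =~ s/<.*?>//g;
    return $t;
}

# A stray '<' makes the match run on to the next tag's '>'
print strip_lazy('if a < b then <i>c</i>'), "\n";    # prints: if a c

# '<>' (an alternative spelling of '!=') matches with nothing in between
print strip_lazy('x <> y'), "\n";                    # prints: x  y

# A '>' inside a quoted attribute value ends the match too early
print strip_lazy('<a title="x>y">link</a>'), "\n";   # prints: y">link
```
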
Dandello | Reply #35 | Mar 23rd, 2015 at 9:01pm

Which means that the issue with autolinked URLs will break it. Looking at how that section of code evolved, I think we can probably remove the

Code:
    $thismessage =~ s/<.*?>//g;

lines while leaving the

Code:
    $thismessage =~ s/\[.*?\]//g;

that's a few lines above them, because the chances of an errant ']' in YaBB are smaller than those of an errant '>'. (Plus that's what has been used, supposedly successfully, for ages in PM notifications.)

The supposed best solution is to use something like HTML::Parser to remove the HTML tags.

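For comparison, a sketch of the YaBBC-side substitution on made-up input. It has the same weakness as the HTML one: a stray '[' makes the match run on to the next ']'. strip_yabbc is a hypothetical name, and this sketch does not use HTML::Parser.

```perl
use strict;
use warnings;

# Hypothetical helper: lazy removal of YaBBC-style [tag] markup
sub strip_yabbc {
    my ($t) = @_;
    $t =~ s/\[.*?\]//g;
    return $t;
}

print strip_yabbc('[b]bold[/b] and [url=http://example.com]link[/url]'), "\n";
# prints: bold and link

# An errant '[' swallows text up to the next ']'
print strip_yabbc('array[0 and [i]x[/i]'), "\n";    # prints: arrayx
```
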
Monni | Reply #36 | Mar 23rd, 2015 at 9:39pm

Dandello wrote on Mar 23rd, 2015 at 9:01pm:
> ... The supposed best solution is to use something like HTML::Parser to remove the HTML tags.

It isn't much trouble to fix both YaBB and HTML tag detection, as long as we know which characters can appear at the start of a tag and which can appear at the end. This is how I changed the HTML tag detection to detect everything except comments, because those have very specific restrictions that make the regex very long.

Dandello | Reply #37 | Mar 24th, 2015 at 12:11am

Um, this doesn't test out as a good regex in Perl:

Code:
    ~<(([a-z]+|[A-Z]+)( ?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>~

However, this does:

Code (Perl):
    ~</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>~

and it checks out at http://www.regexplanet.com/advanced/perl/index.html (The code is from O'Reilly's Regular Expressions Cookbook.)

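For readability, here is the Cookbook pattern laid out with Perl's /x modifier so each part can carry a comment. This is a sketch assuming the inner alternation is wrapped in a non-capturing (?: ... ) group, which is what makes its parentheses balance; because /x ignores literal whitespace outside character classes, the layout spaces do not change what it matches.

```perl
use strict;
use warnings;

# Cookbook-style tag pattern, written with /x so each part can be commented.
# No literal whitespace is needed in the pattern itself.
my $tag_re = qr{
    < /?                          # '<', optionally followed by '/' (end tag)
    ( [A-Za-z] (?> [^\s>/]* ) )   # capture group 1: tag name, starts with a letter
    (?>                           # atomic: never backtrack into the attributes
        (?:
            (?> [^>"']+ )         # run of ordinary characters
          | "[^"]*"               # double-quoted value, may contain '>'
          | '[^']*'               # single-quoted value, may contain '>'
        )*
    )
    >                             # closing '>'
}x;

my $html = '<p class="class">>>>test<<<</p><br />';
(my $text = $html) =~ s/$tag_re//g;
print "$text\n";    # prints ">>>test<<<"
```
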
Monni | Reply #38 | Mar 24th, 2015 at 11:08am

Dandello wrote on Mar 24th, 2015 at 12:11am:
> However, this does: ... and it checks out at http://www.regexplanet.com/advanced/perl/index.html (The code is from O'Reilly's Regular Expressions Cookbook.)

It's still wrong. The initial / can't be independently optional: a tag has either an initial / or a final /, but not both. It's possible that it chokes for you because it still treats / as a special character even though the pattern uses ~ as the delimiter. Alternative version with escaped / :

Code:
    s/<(([a-z]+|[A-Z]+)( ?\/?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>//g

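A quick sketch of the point about the initial and final /: the Cookbook pattern lets both appear in one tag, while the escaped pattern above only allows a leading / in the end-tag branch and a trailing / in the start-tag branch. $cookbook and $monni are just illustrative variable names, and the Cookbook pattern is written with the (?: group its parentheses need.

```perl
use strict;
use warnings;

# Cookbook-style pattern: the leading '/' and a trailing '/' in the
# attribute part are allowed independently of each other
my $cookbook = qr{</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>};

# Escaped pattern from the post above: leading '/' only in the end-tag
# branch, trailing '/' only in the start-tag branch
my $monni = qr/<(([a-z]+|[A-Z]+)( ?\/?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>/;

# Anchor both patterns so each test string must be one whole tag
for my $tag ('<br/>', '</br>', '</br/>') {
    printf "%-7s cookbook: %-8s monni: %s\n", $tag,
        ($tag =~ /\A$cookbook\z/ ? 'match' : 'no match'),
        ($tag =~ /\A$monni\z/    ? 'match' : 'no match');
}
```
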
Dandello | Reply #39 | Mar 24th, 2015 at 1:35pm

Code:
    Sequence (?/...) not recognized in regex; marked by <-- HERE in m/(?isx)<(([a-z]+|[A-Z]+)( ?/ <-- HERE ?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>/

Code:
    Sequence (?\...) not recognized in regex; marked by <-- HERE in m/(?isx)<(([a-z]+|[A-Z]+)( ?\ <-- HERE /?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>/

It's the '?/?' that chokes. You can test it out for Perl here: http://www.regexplanet.com/advanced/perl/index.html

Dandello | Reply #40 | Mar 24th, 2015 at 1:47pm

And so long as

Code (Perl):
    </?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>

is set to global, it does work to remove just the tags from code like this:

Code (HTML):
    <p class="class">>>>test<<<</p><br />

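A sketch showing why the global flag matters: without /g, s/// stops after the first match, so only the opening <p> tag is removed. The pattern is written with the (?: group its parentheses need; the variable names are illustrative.

```perl
use strict;
use warnings;

my $tag_re = qr{</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>};
my $html   = '<p class="class">>>>test<<<</p><br />';

(my $once = $html) =~ s/$tag_re//;     # no /g: only the first tag is removed
(my $all  = $html) =~ s/$tag_re//g;    # /g: every tag is removed

print "$once\n";    # prints ">>>test<<<</p><br />"
print "$all\n";     # prints ">>>test<<<"
```
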
Monni | Reply #41 | Mar 24th, 2015 at 2:19pm

Dandello wrote on Mar 24th, 2015 at 1:35pm:
> Sequence (?/...) not recognized in regex; ... It's the '?/?' that chokes.

So it doesn't like the literal space character. That can be fixed by using \s instead.

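A likely explanation for those errors, offered as an assumption: the (?isx) in the tester's error messages suggests the pattern was compiled with the x (extended) modifier, which ignores literal whitespace in a pattern. Under /x the space in ( ?/? disappears, so Perl no longer sees an optional space there and the pattern fails to compile (in the messages above it was read as the start of a (?...) construct), while the same pattern without /x, as in a normal s///, compiles fine. The sketch checks both forms; compiles is a hypothetical helper. Note that \s also matches tabs and newlines; [ ]? or an escaped space would keep the literal-space meaning under /x.

```perl
use strict;
use warnings;

# Pattern as first posted, with a literal space before the first '?'
my $with_space = '<(([a-z]+|[A-Z]+)( ?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>';

# The same pattern with \s? in place of the literal space
my $with_ws = '<(([a-z]+|[A-Z]+)(\s?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>';

# Hypothetical helper: 1 if the pattern string compiles, optionally under /x
sub compiles {
    my ($pat, $use_x) = @_;
    my $ok = $use_x
        ? eval { my $re = qr/$pat/x; 1 }
        : eval { my $re = qr/$pat/;  1 };
    return $ok ? 1 : 0;
}

for my $case ([ 'literal space', $with_space ], [ '\s?', $with_ws ]) {
    my ($label, $pat) = @$case;
    printf "%-13s  plain: %-5s  /x: %s\n", $label,
        compiles($pat, 0) ? 'ok' : 'error',
        compiles($pat, 1) ? 'ok' : 'error';
}
```
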
Dandello | Reply #42 | Mar 24th, 2015 at 2:47pm

Okay. This one works according to http://www.regexplanet.com/advanced/perl/index.html

Code (Perl):
    <(([a-z]+|[A-Z]+)(\s/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>

and so does the one from Regular Expressions Cookbook, page 426.

Monni | Reply #43 | Mar 24th, 2015 at 3:36pm

Dandello wrote on Mar 24th, 2015 at 2:47pm:
> Okay. This one works according to http://www.regexplanet.com/advanced/perl/index.html ...

If you leave out the ? after \s, it fails on tags like <br/>, but works for <br />. On the other hand, YaBB chokes on \s when replying and strips off the \. It's tricky because that first alternative has to cover the short cases (nothing, a single character, or two characters), since the next alternative only matches two or more characters: its first character must be a non-letter so the tag name can't continue in mixed case.

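A sketch of the difference the ? after \s makes, using the two versions of the pattern from this thread with / escaped. Besides <br/>, requiring the whitespace also rejects a bare <br>, because nothing else in the start-tag branch can match zero characters. The variable names are illustrative.

```perl
use strict;
use warnings;

# \s/? : whitespace required before the optional self-closing '/'
my $ws_required = qr/<(([a-z]+|[A-Z]+)(\s\/?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>/;

# \s?/? : whitespace optional too, so this branch can also match nothing
my $ws_optional = qr/<(([a-z]+|[A-Z]+)(\s?\/?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>/;

# Anchor both patterns so each test string must be one whole tag
for my $tag ('<br>', '<br/>', '<br />') {
    printf "%-7s  \\s/?: %-8s  \\s?/?: %s\n", $tag,
        ($tag =~ /\A$ws_required\z/ ? 'match' : 'no match'),
        ($tag =~ /\A$ws_optional\z/ ? 'match' : 'no match');
}
```
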
Dandello | Reply #44 | Mar 24th, 2015 at 4:13pm

Okay, according to http://www.regexplanet.com/advanced/perl/index.html

Code (Perl):
    </?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>

and

Code (Perl):
    <(([a-z]+|[A-Z]+)(\s?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>

behave the same even on some fairly tricky (and wrong) HTML. And YaBB thoroughly trashes what's inside the code div when converting to e-mail text. Tomorrow or so I'll look at true HTML e-mails again.

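A side-by-side sketch of the two patterns on the sample from earlier in the thread, plus two extra inputs invented for illustration. On the sample they agree, but they part ways on a '>' inside a quoted attribute value and on tag names containing a digit such as <h1>, which the second pattern does not match. strip_with and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $cookbook = qr{</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>};
my $monni    = qr/<(([a-z]+|[A-Z]+)(\s?\/?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>/;

# Hypothetical helper: globally remove whatever the given pattern matches
sub strip_with {
    my ($re, $t) = @_;
    $t =~ s/$re//g;
    return $t;
}

for my $html ('<p class="class">>>>test<<<</p><br />',
              '<a title="x>y">link</a>',
              '<h1>Title</h1>') {
    print "input:    $html\n";
    print "cookbook: ", strip_with($cookbook, $html), "\n";
    print "monni:    ", strip_with($monni, $html), "\n";
}
```
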