SVN 2.6.11 branch (Read 14718 times)

Dandello
Forum Administrator
YaBB Modder
I love YaBB 2.7!
Posts: 2234
Location: The Land of YaBB
Joined: Feb 12th, 2014
Gender: Female
Mood: Annoyed
Zodiac sign: Virgo
Re: SVN 2.6.11 branch
Reply #30 - Mar 22nd, 2015 at 2:03pm
Monni wrote on Mar 22nd, 2015 at 1:59pm:
Clipping or truncating is bad... It should just leave out the post contents if the post is too long... Clipping or truncating can cause problems when parsing HTML if a closing tag gets clipped out but its opening tag doesn't...

Very true - the truncating function is iffy.
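
To make the risk concrete, here is a minimal sketch of fixed-length clipping versus leaving the body out. The 45-character limit and the helper names `clip_message` and `omit_if_too_long` are made up for this illustration; they are not YaBB's actual code.

```perl
use strict;
use warnings;

# Hypothetical limit, chosen only for this illustration.
my $max_len = 45;

# Naive clipping: cut at a fixed length, wherever that happens to land.
sub clip_message {
    my ($html) = @_;
    return length($html) > $max_len ? substr($html, 0, $max_len) : $html;
}

# The alternative suggested in the quote: drop the body when it is too long.
sub omit_if_too_long {
    my ($html) = @_;
    return length($html) > $max_len ? '' : $html;
}

my $post = '<b>An important notice</b> followed by <i>more text</i>';

my $clipped = clip_message($post);
my $omitted = omit_if_too_long($post);

# Count opening and closing tags to show the imbalance after clipping.
my $open  = () = $clipped =~ /<[a-z]+>/g;
my $close = () = $clipped =~ m{</[a-z]+>}g;

print "clipped: $clipped\n";
print "opening tags: $open, closing tags: $close\n";
print "omitted: '$omitted'\n";
```

The clipped string keeps the opening `<i>` but loses its `</i>`, which is the kind of unbalanced markup the quote warns about.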
  

Perfection is not possible. Excellence, however, is excellent.
Dandello
Reply #31 - Mar 23rd, 2015 at 3:14pm
Okay - rolled back to the 'preformatted, stripped text only' version of the e-mails. We'll come back to this when we start seriously looking at 2.6.2.

The
Code (Perl)
    $thismessage =~ s/<.*?>//g;

in Post.pm should be okay in the short term because YaBB creates very simple HTML. So catching the extra (unmatched) angle brackets after running through FromHTML should solve most of the current angle-bracket issues in e-mails.
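
As a rough illustration of what that substitution does to simple markup, here is a small sketch. The sample message is invented and is not taken from Post.pm.

```perl
use strict;
use warnings;

# Invented sample input, standing in for a message after FromHTML.
my $thismessage = '<b>Hello</b> world<br />See <a href="http://example.com/">this</a>.';

# Remove anything that looks like <...>, taking the shortest match each time.
$thismessage =~ s/<.*?>//g;

print "$thismessage\n";
```

Note that the `<br />` disappears without leaving a space or newline behind, so words on either side of it run together.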
  

Monni
Language
Min izāmō
Posts: 413
Location: Kaarina, Finland
Joined: Jul 16th, 2014
Gender: Male
Mood: Frustrated
Zodiac sign: Pisces
Re: SVN 2.6.11 branch
Reply #32 - Mar 23rd, 2015 at 6:26pm
The problem with that is that the .* part can also match ">", which results in everything between two HTML tags being stripped.
  
Dandello
Reply #33 - Mar 23rd, 2015 at 7:41pm
That's the code everybody uses as an example of what works with very simple HTML tags. Plus
Code
*?

is a 'lazy' or 'non-greedy' quantifier in Perl. In this case it matches as few characters as possible after the '<', so each match ends at the first '>' it reaches rather than the last one.
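
A quick side-by-side of the greedy and lazy forms on well-formed input may help here. This is only a sketch with an invented sample string.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .* runs on to the LAST '>' in the string.
(my $greedy = $html) =~ s/<.*>//g;

# Lazy: .*? stops at the FIRST '>' after each '<'.
(my $lazy = $html) =~ s/<.*?>//g;

print "greedy: '$greedy'\n";
print "lazy:   '$lazy'\n";
```

The greedy version wipes out the whole line, while the lazy version removes only the four tags.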
  

Monni
Reply #34 - Mar 23rd, 2015 at 7:53pm
It works when there are no stray < or >... But if there is even one stray >, it doesn't... It also fails miserably if there are no other characters between < and >, which is an alternative way to write !=.

http://regexr.com/ is the tool I use to check regex patterns for bugs...
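
The stray-bracket cases can be reproduced outside the online tester with a few invented strings. This is a sketch only. In these samples the real damage comes from a stray `<`, which swallows text up to the next `>`, and from `<>` used as an operator; a lone `>` that sits outside any tag is simply left in the text.

```perl
use strict;
use warnings;

# Invented sample messages, not taken from the forum code.
my @samples = (
    'if a < b then <b>stop</b>',    # stray '<' before a real tag
    'a <> b means a != b',          # '<>' used as an operator
    '1 > 0 is <b>true</b>',         # stray '>' outside any tag
);

my @stripped;
for my $text (@samples) {
    (my $copy = $text) =~ s/<.*?>//g;
    push @stripped, $copy;
    print "'$text' -> '$copy'\n";
}
```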
  

regex1.png ( 22 KB | 141 Downloads )
regex2.png ( 20 KB | 135 Downloads )
Dandello
Reply #35 - Mar 23rd, 2015 at 9:01pm
Which means that the issue with autolink URLs will break it. Looking at how that section of code evolved, I think we can probably remove the
Code
$thismessage =~ s/<.*?>//g;

lines while leaving the
Code
$thismessage =~ s/\[.*?\]//g;

that's a few lines above it, because the chance of an errant ']' in YaBB is smaller than that of an errant '>'. (Plus that's what's been used, supposedly successfully, for ages in PM notifications.)

The supposed best solution is to use something like HTML::Parser to remove the HTML tags.
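
For comparison, here is a sketch of the square-bracket substitution on YaBBC-style markup. Both sample strings are invented; the second shows that ordinary bracketed text is removed as well, which is the "errant bracket" risk in a milder form.

```perl
use strict;
use warnings;

# Invented samples using ordinary YaBBC-style tags.
my @samples = (
    '[b]Release notes[/b] see [url=http://example.com/]the list[/url]',
    'array[0] holds the [i]first[/i] item',
);

my @stripped;
for my $thismessage (@samples) {
    (my $copy = $thismessage) =~ s/\[.*?\]//g;
    push @stripped, $copy;
    print "$copy\n";
}
```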


  

Monni
Reply #36 - Mar 23rd, 2015 at 9:39pm
It is not much trouble to fix both YaBB and HTML tag detection as long as we know which characters can appear at the start of a tag and which can appear at the end of a tag... This is how I changed the HTML tag detection to detect everything except comments, because those have very specific restrictions that make the regex very long...
  
Dandello
Reply #37 - Mar 24th, 2015 at 12:11am
Um, this doesn't test out as a good regex in Perl.
Code
~<(([a-z]+|[A-Z]+)( ?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>~

However, this does:
Code (Perl)
~</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>~

and it checks out at http://www.regexplanet.com/advanced/perl/index.html (The code is from O'Reilly's Regular Expressions Cookbook.)
  

Monni
Reply #38 - Mar 24th, 2015 at 11:08am
It's still wrong... The initial / can't be optional... A tag has either an initial / or a final /, but not both...

It is possible that it chokes for you because it still treats / as a special character even though the pattern uses ~ as the delimiter...

Alternative version with escaped / :
Code
 s/<(([a-z]+|[A-Z]+)( ?\/?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>//g
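
On the delimiter question, here is a short sketch showing the same pattern written once with `/` delimiters and escaped slashes and once with `~` delimiters and bare slashes. The sample string is invented, and this says nothing about how the online tester wraps the pattern.

```perl
use strict;
use warnings;

my $html = '<p>one</p><br /><br/>two';

# With '/' as the delimiter, every literal slash in the pattern is escaped.
(my $with_slash = $html) =~ s/<(([a-z]+|[A-Z]+)( ?\/?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>//g;

# With '~' as the delimiter, the slashes can be written as-is.
(my $with_tilde = $html) =~ s~<(([a-z]+|[A-Z]+)( ?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>~~g;

print "slash delimiters: '$with_slash'\n";
print "tilde delimiters: '$with_tilde'\n";
```

Both forms compile in plain Perl and give the same result here, which suggests the failure in the tester has a different cause than the slashes.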
  

  
Dandello
Reply #39 - Mar 24th, 2015 at 1:35pm
Code
Sequence (?/...) not recognized in regex; marked by <-- HERE in m/(?isx)<(([a-z]+|[A-Z]+)( ?/ <-- HERE ?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>/ 


Code
Sequence (?\...) not recognized in regex; marked by <-- HERE in m/(?isx)<(([a-z]+|[A-Z]+)( ?\ <-- HERE /?|[^a-zA-Z<>][^<>]*[^\/<>]\/?)|\/([a-z]+|[A-Z]+))>/
  



It's the '?/?' that chokes.

You can test it out for Perl here: http://www.regexplanet.com/advanced/perl/index.html
  

Dandello
Reply #40 - Mar 24th, 2015 at 1:47pm
And so long as
Code (Perl)
</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>

is set to global, it does work to remove just the tags from code like this:
Code (HTML)
<p class="class">>>>test<<<</p><br />
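
That check can be repeated in a plain script. The sketch below writes the pattern out in full with `\s` and `(?:(?>...)`, which is an assumption about how it reads in the book, and runs it globally over the sample line above.

```perl
use strict;
use warnings;

# Tag pattern from the post, written out in full (assumed spelling).
my $tag = qr{</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>};

my $html = '<p class="class">>>>test<<<</p><br />';

# Apply globally so every tag is removed, not just the first one.
(my $text = $html) =~ s/$tag//g;

print "$text\n";
```

Only the three real tags are removed; the stray `>` and `<` characters around `test` are left alone because a `<` must be followed by a letter (or `/` and a letter) to count as a tag.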

  

Monni
Reply #41 - Mar 24th, 2015 at 2:19pm
Dandello wrote on Mar 24th, 2015 at 1:35pm:
It's the '?/?' that chokes.

So it doesn't like the literal space character... that can be fixed by using \s instead.
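
A likely explanation, going by the `(?isx)` prefix visible in the error messages: the tester compiles the pattern with the `x` flag, under which literal whitespace in a pattern is ignored, so `( ?` no longer reads as "group, optional space". The sketch below uses a cut-down pattern (an assumption made for brevity) to show that the literal-space form compiles without `/x`, fails with it, and that `\s` works in both.

```perl
use strict;
use warnings;

# Patterns kept as plain strings so they can be compiled with and without /x.
my $literal_space = '<br( ?/?)>';     # literal space, as in the first attempt
my $escaped_space = '<br(\s?/?)>';    # \s is not literal whitespace, so /x keeps it

# Each eval yields 1 on success; a failed compile dies and yields undef.
my $plain_ok = eval { qr/$literal_space/;  1 } ? 1 : 0;
my $x_ok     = eval { qr/$literal_space/x; 1 } ? 1 : 0;
my $fixed_ok = eval { qr/$escaped_space/x; 1 } ? 1 : 0;

print "literal space without /x: ", ($plain_ok ? 'compiles' : 'fails'), "\n";
print "literal space with /x:    ", ($x_ok     ? 'compiles' : 'fails'), "\n";
print "\\s with /x:               ", ($fixed_ok ? 'compiles' : 'fails'), "\n";

# The \s form still matches all three spellings of the tag.
my $fixed   = qr/$escaped_space/x;
my @matched = grep { $_ =~ $fixed } ('<br>', '<br/>', '<br />');
print "matched: @matched\n";
```

The exact error text for the `/x` failure differs between Perl versions, so the sketch only checks whether compilation succeeds.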
  
Dandello
Reply #42 - Mar 24th, 2015 at 2:47pm
Okay. This one works according to http://www.regexplanet.com/advanced/perl/index.html
Code (Perl)
<(([a-z]+|[A-Z]+)(\s/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))> 



and so does the one from Regular Expressions Cookbook page 426.
  

Monni
Reply #43 - Mar 24th, 2015 at 3:36pm
If you leave out the ? after \s, it fails on tags like <br/>, but works for <br />...

On the other hand... YaBB chokes on \s when replying and strips off the \. It's tricky because that part has to match either 1 or 2 characters, since the next rule only matches 3 characters or more, as it has to check that the start of the tag isn't mixed-case. ;)
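
The difference between `\s` and `\s?` can be checked directly. This sketch compiles both variants of the pattern and tries them against three spellings of a line break; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Same pattern twice; the only difference is \s versus \s? in the first branch.
my $with_required_space = qr{<(([a-z]+|[A-Z]+)(\s/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>};
my $with_optional_space = qr{<(([a-z]+|[A-Z]+)(\s?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>};

my %result;
for my $tag ('<br />', '<br/>', '<br>') {
    $result{$tag} = [
        ($tag =~ $with_required_space ? 1 : 0),
        ($tag =~ $with_optional_space ? 1 : 0),
    ];
    printf "%-7s required-space: %d  optional-space: %d\n", $tag, @{ $result{$tag} };
}
```

With the required `\s`, only `<br />` matches; in this run a bare `<br>` is missed as well as `<br/>`. With `\s?` all three match.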
  
Dandello
Reply #44 - Mar 24th, 2015 at 4:13pm
Okay, according to http://www.regexplanet.com/advanced/perl/index.html

Code (Perl)
</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>

and
Code (Perl)
<(([a-z]+|[A-Z]+)(\s?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>

behave the same even on some fairly tricky (and wrong) HTML.
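
A side-by-side run of the two patterns is easy to script. In the sketch below the names `$cookbook_re`, `$handwritten_re` and `strip_with` are hypothetical, and the samples are invented apart from the first, which is the test line from earlier in the thread. The two agree on the first two samples; on a tag name containing a digit, such as `<h1>`, they appear to differ, because the second pattern only allows letters in the name.

```perl
use strict;
use warnings;

# The two patterns from the post, with the first written out in full.
my $cookbook_re    = qr{</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>};
my $handwritten_re = qr{<(([a-z]+|[A-Z]+)(\s?/?|[^a-zA-Z<>][^<>]*[^/<>]/?)|/([a-z]+|[A-Z]+))>};

# Hypothetical helper: return a copy of $html with every match of $re removed.
sub strip_with {
    my ($re, $html) = @_;
    (my $text = $html) =~ s/$re//g;
    return $text;
}

my @samples = (
    '<p class="class">>>>test<<<</p><br />',
    '<b>bold</b> a < b',
    '<h1>Title</h1>',
);

for my $html (@samples) {
    printf "%-40s cookbook: '%s'  handwritten: '%s'\n",
        $html, strip_with($cookbook_re, $html), strip_with($handwritten_re, $html);
}
```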

And YaBB thoroughly trashes what's inside the code div when converting to e-mail text.

Tomorrow or so I'll look at true html e-mails again.
  
