So in short, you want all HTML tags stripped except bold and italic.
What about entities such as `&lt;` (<) and `&gt;` (>), or high-bit characters like the euro sign (€) or the trademark sign (™)? Should those be stripped as well, or should they be decoded?
You’re a Mac guy, right? How about BBEdit, or its free version (TextWrangler)? If you need to use the regular expression within Servoy (presumably), that will at least head you in the right direction. I know regular expressions pretty well, but not HTML. BBEdit, though, is widely used by web designers and the like, and I’m sure it will get you going.
```
/*
===============================================================================
Based on stripHTML (ASP script) by James Crooke
http://www.jamescrooke.co.uk/articles/regular-expression-asp-strip-tags/
Adapted for Servoy by Robert J.C. Ivens, ROCLASI Software Solutions
===============================================================================
*/
var sHTML = arguments[0],       // the HTML to clean
    sAllowTags = arguments[1],  // comma-separated list of tags to keep, e.g. "b,i,br"
    aMatches = null,
    sTagName = "";

// Normalise the allow-list to ",b,i,br," so ",b," cannot match inside ",br,"
sAllowTags = ("," + utils.stringReplace(sAllowTags || "", " ", "") + ",").toLowerCase();

// Collect every tag (opening, closing, comments); [\s\S] also crosses \r\n line breaks
aMatches = sHTML.match(/<[\s\S]+?>/g);
if ( aMatches )
{
    for ( var i = 0 ; i < aMatches.length ; i++ )
    {
        // Extract the bare tag name: "</B class='x'>" becomes "B"
        sTagName = aMatches[i].replace(/<(\/?)(\w+)[^>]*>/,"$2");
        sTagName = "," + sTagName.toLowerCase() + ",";
        // Remove the tag (every occurrence of it) unless it is in the allow-list
        if ( utils.stringPatternCount(sAllowTags, sTagName) == 0 )
        {
            sHTML = utils.stringReplace(sHTML, aMatches[i], "");
        }
    }
}
return sHTML;
```
Call this method like so:
```
sHTML = myMethodName(sHTML, "b,i,br");
```
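If you want to try the same allow-list idea outside Servoy (in a browser console or Node, say), a minimal sketch in plain JavaScript could look like this. `stripTags` is a hypothetical name, and it assumes no Servoy `utils` object: it only uses the standard `String.replace` with a callback. Like the Servoy method, it treats comments and doctypes as tags to remove.

```javascript
// Hypothetical plain-JavaScript version of the allow-list stripper; it uses
// only standard String/RegExp methods, no Servoy `utils` object.
function stripTags(html, allowTags) {
  // Normalise the allow-list to ",b,i,br," so ",b," cannot match inside ",br,".
  var allowed = ("," + (allowTags || "").replace(/\s/g, "") + ",").toLowerCase();
  // [^>]+ also matches line breaks, so tags split over several lines are caught.
  return html.replace(/<[^>]+>/g, function (tag) {
    // Pull the bare tag name out of "<b>", "</B>" or "<br/>".
    var m = /^<\/?([a-zA-Z][\w:-]*)/.exec(tag);
    // Comments, doctypes and anything without a tag name are always removed.
    if (!m) {
      return "";
    }
    return allowed.indexOf("," + m[1].toLowerCase() + ",") !== -1 ? tag : "";
  });
}

console.log(stripTags("<p>Some <b>bold</b> and <i>italic</i><br/>text</p>", "b,i,br"));
// prints: Some <b>bold</b> and <i>italic</i><br/>text
```

Deciding per tag inside the callback means each tag is looked at exactly once, instead of collecting all matches first and then replacing them in a second pass.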
But this still doesn't solve your HTML entity decoding problems, and embedded CSS stylesheets are not filtered out either.
Maybe for a next version ;)
Hope this helps.
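On the entity question: one possible approach, sketched here under assumptions, is to decode entities after the tags have been stripped, so a decoded `&lt;` can no longer be mistaken for the start of a tag. `decodeEntities`, `codePointToString` and the `NAMED_ENTITIES` table are hypothetical; the table covers only a handful of names, and HTML defines many more. The sketch avoids `String.fromCodePoint` on the assumption that Servoy's Rhino engine may not support ES2015 additions.

```javascript
// Hypothetical, deliberately small table: HTML defines many more named entities.
var NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: "\"", apos: "'", nbsp: "\u00A0",
  euro: "\u20AC", trade: "\u2122", copy: "\u00A9", reg: "\u00AE"
};

// Build a string from a code point without String.fromCodePoint (ES3-friendly).
function codePointToString(cp) {
  if (cp <= 0xFFFF) {
    return String.fromCharCode(cp);
  }
  cp -= 0x10000; // encode astral characters as a UTF-16 surrogate pair
  return String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
}

// Decode &#8364;, &#x20AC; and the named entities in the table above;
// anything unknown or out of range is left exactly as written.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, function (entity, body) {
    if (body.charAt(0) === "#") {
      var isHex = body.charAt(1) === "x" || body.charAt(1) === "X";
      var cp = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return cp <= 0x10FFFF ? codePointToString(cp) : entity;
    }
    // Named entities are case-sensitive (&Eacute; differs from &eacute;).
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : entity;
  });
}

console.log(decodeEntities("5 &lt; 6, &euro;10 or &#x20AC;10, Acme&trade;"));
// prints: 5 < 6, €10 or €10, Acme™
```

Because the regex consumes each entity in a single pass, double-encoded text such as `&amp;lt;` decodes once to `&lt;` rather than all the way to `<`.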
Okay, if you want to filter out the complete header (title, meta tags, etc.) of a webpage, plus any embedded scripts and stylesheets, add the following code right after the variable declarations:
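```
// [\s\S]*? also crosses line breaks (. does not) and stops at the first closing tag
sHTML = sHTML.replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, "");     // Strip the whole header
sHTML = sHTML.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, ""); // Strip any embedded JavaScript
sHTML = sHTML.replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, "");   // Strip any embedded stylesheets
```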