Bug in Dynamicweb.Core.Helpers.StripHtml

Kenneth Radoor

I don't know if this is a bug or a feature, but if the string passed to the method is not valid HTML, it will throw an out-of-memory exception.

If the string contains a < without a matching >, it becomes an endless loop.

If the string is "this will fail <3 with more", it fails.

Using StripHtmlAlternative will not fail, but it removes everything after the < sign.

It's not easy to strip HTML, and it is even harder if the string is not valid HTML, but maybe it would be more foolproof to use "HTML Agility Pack".

 

/Kenneth

Replies

 
Nicolai Pedersen

Hi Kenneth

That is the exact reason why there are 2 of them.

HTML Agility Pack will not solve that; it will just make it slow and complicated. You just need to decide what kind of behavior you want, and you are very welcome to use HTML Agility Pack in your code instead.

BR Nicolai

 
Kenneth Radoor

I know this is really, really hard, and I will see if I can come up with a solution.

It was just a heads-up about a possible issue in the DW core causing an out-of-memory exception.

StripHtmlAlternative will not throw that exception, but it will strip everything after the first unmatched <.

 

/Kenneth
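
The truncation described above can be reproduced with a small sketch. It assumes StripHtmlAlternative is nothing more than the single Regex.Replace call quoted later in this thread; the class name RegexStripDemo and the method name Strip are made up for illustration and are not part of Dynamicweb.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only.
public static class RegexStripDemo
{
    // Same pattern as the StripHtmlAlternative code posted in this thread:
    //   <[^>]*>  removes a complete tag
    //   <.*$     removes an unmatched "<" and everything after it on the line
    public static string Strip(string html)
    {
        return Regex.Replace(html, "<[^>]*>|<.*$", "", RegexOptions.IgnoreCase);
    }
}
```

With this pattern, Strip("this will fail <3 with more") returns "this will fail " (trailing space included), while Strip("<b>bold</b> text") returns "bold text". The second alternative of the pattern is what discards the rest of the text after an unmatched <.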

 
Nicolai Pedersen

Hi Kenneth

OK, thanks. Here is the code - feel free to rewrite it.

public static string StripHtml(string html)
{
    while (html.Contains("<"))
    {
        int index = html.IndexOf("<");
        int indexEnd = html.IndexOf(">", index);

        //We have a starttag, but no end tag...
        if (indexEnd == 0)
        {
            html = html.Substring(0, index - 1);
        }
        else if (index >= 1)
        {
            html = html.Substring(0, index) + html.Substring(indexEnd + 1);
        }
        else
        {
            //The tag is in the beginning of the string
            html = html.Substring(indexEnd + 1);
        }
    }

    return html;
}

public static string StripHtmlAlternative(string html)
{
    return Regex.Replace(html, "<[^>]*>|<.*$", "", RegexOptions.IgnoreCase);
}
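
One possible rewrite, offered as a sketch and not as the official fix: in .NET, String.IndexOf returns -1 (not 0) when nothing is found, so the `indexEnd == 0` branch in the posted StripHtml never runs. An unmatched < then falls through to a branch that either makes the string longer or leaves it unchanged on every pass, which would explain the endless loop and the out-of-memory exception. The version below scans the input once with an advancing position, so it always terminates. The names HtmlStripSketch and StripHtmlKeepUnmatched, and the choice to keep an unmatched < as literal text, are assumptions made for illustration; they are not Dynamicweb behavior.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class HtmlStripSketch
{
    // Removes every "<...>" pair; a "<" with no later ">" is kept as text.
    public static string StripHtmlKeepUnmatched(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        var result = new StringBuilder(html.Length);
        int pos = 0;

        while (pos < html.Length)
        {
            int open = html.IndexOf('<', pos);
            // IndexOf returns -1 when nothing is found.
            int close = open < 0 ? -1 : html.IndexOf('>', open + 1);

            if (open < 0 || close < 0)
            {
                // No further complete tag: copy the rest unchanged and stop.
                result.Append(html, pos, html.Length - pos);
                break;
            }

            // Copy the text before the tag, then skip past the tag.
            result.Append(html, pos, open - pos);
            pos = close + 1;
        }

        return result.ToString();
    }
}
```

With this sketch, "this will fail <3 with more" comes back unchanged and "<b>bold</b> and 1 < 2" becomes "bold and 1 < 2". One limitation: a stray < that is followed later by any > is still treated as the start of a tag, so "a < b and <b>x</b>" becomes "a x". Deciding how to handle that case is the kind of behavior choice mentioned earlier in the thread.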

 
Olga Shedko
This post has been marked as an answer

Hello Kenneth,

The responsible developer has investigated this problem carefully and found a solution. TFS #57373 has been created for it. It will be fixed in the next hotfix release.

Thank you.

Best regards,

Olga | QA

 

Votes for this answer: 1

 
