Bug in Dynamicweb.Core.Helpers.StripHtml

Kenneth Radoor

Posted on 09/11/2018 00:23:50

I don't know if this is a bug or a feature, but if the string passed to the method is not valid HTML, it will throw an out-of-memory exception.

If the string contains a < without a matching >, it becomes an endless loop.

For example, the string "this will fail <3 with more" fails.

Using StripHtmlAlternative will not fail, but it removes everything after the < sign.

Stripping HTML is not easy, and it is even harder if the string is not valid HTML, but maybe it would be more foolproof to use "HTML Agility Pack".

/Kenneth

Replies

Nicolai Pedersen

Posted on 09/11/2018 10:31:23

Hi Kenneth

That is the exact reason why there are 2 of them.

HTML Agility Pack will not solve that; it will just make it slow and complicated. You just need to decide what kind of behavior you want, and you are very welcome to use HTML Agility Pack in your own code instead.

BR Nicolai

Kenneth Radoor

Posted on 09/11/2018 11:22:40

I know this is really hard, and I will see if I can come up with a solution.

It was just a heads-up about a possible issue in the DW core causing an out-of-memory exception.

StripHtmlAlternative will not throw that exception, but it will strip everything after the first unmatched <.

/Kenneth
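The behavior described above can be reproduced with a small sketch. It assumes the regex pattern from the StripHtmlAlternative source posted further down in this thread; the class name RegexStripDemo and the brackets around the output are only there for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexStripDemo
{
    // Same pattern as the StripHtmlAlternative source quoted in this thread:
    // remove complete <...> tags, or everything from an unmatched '<' to the end.
    public static string StripHtmlAlternative(string html)
    {
        return Regex.Replace(html, "<[^>]*>|<.*$", "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Well-formed markup: only the tags are removed.
        Console.WriteLine("[" + StripHtmlAlternative("<p>Hello <b>world</b></p>") + "]");
        // prints "[Hello world]"

        // Unmatched '<': the rest of the string is dropped, including "3 with more".
        Console.WriteLine("[" + StripHtmlAlternative("this will fail <3 with more") + "]");
        // prints "[this will fail ]"
    }
}
```

So the regex version terminates on invalid input, at the cost of losing any text that follows a stray <.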

Nicolai Pedersen

Posted on 09/11/2018 12:08:14

Hi Kenneth

OK, thanks. Here is the code; feel free to rewrite it.

public static string StripHtml(string html)
{
    while (html.Contains("<"))
    {
        int index = html.IndexOf("<");
        int indexEnd = html.IndexOf(">", index);

        //We have a starttag, but no end tag...
        if (indexEnd == 0)
        {
            html = html.Substring(0, index - 1);
        }
        else if (index >= 1)
        {
            html = html.Substring(0, index) + html.Substring(indexEnd + 1);
        }
        else
        {
            //The tag is in the beginning of the string
            html = html.Substring(indexEnd + 1);
        }
    }

    return html;
}

public static string StripHtmlAlternative(string html)
{
    return Regex.Replace(html, "<[^>]*>|<.*$", "", RegexOptions.IgnoreCase);
}
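A likely reading of why StripHtml runs out of memory: IndexOf returns -1, not 0, when nothing is found, so the "no end tag" branch is never taken. With "this will fail <3 with more", indexEnd is -1 and index is 15, so the second branch computes html.Substring(0, index) + html.Substring(0), which appends the whole string to its own prefix. The string grows on every pass and always still contains a <.

Below is a minimal sketch of a loop that always terminates. It is not the fix that Dynamicweb shipped; the names HtmlStripSketch and StripHtmlSafe are hypothetical, and dropping the text after an unmatched < is an assumption chosen to match StripHtmlAlternative.

```csharp
using System;
using System.Text;

public static class HtmlStripSketch
{
    // Hypothetical helper, not the Dynamicweb implementation.
    // Removes <...> spans; text from an unmatched '<' to the end is dropped.
    public static string StripHtmlSafe(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        var result = new StringBuilder(html.Length);
        int position = 0;

        while (position < html.Length)
        {
            int index = html.IndexOf('<', position);
            if (index < 0)
            {
                // No further tags: keep the remaining text and stop.
                result.Append(html, position, html.Length - position);
                break;
            }

            // Keep the text in front of the tag.
            result.Append(html, position, index - position);

            int indexEnd = html.IndexOf('>', index);
            if (indexEnd < 0)
            {
                // Start tag without end tag: IndexOf signals this with -1.
                break;
            }

            // Continue scanning after the closing '>'.
            position = indexEnd + 1;
        }

        return result.ToString();
    }
}
```

Because position only moves forward and the input is never rebuilt, the loop ends after at most one pass over the string. StripHtmlSafe("this will fail <3 with more") returns "this will fail " and StripHtmlSafe("<b>bold</b> text") returns "bold text". If a stray < should be kept as literal text instead, the unmatched branch could append the rest of the string before breaking.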

Olga Shedko

Posted on 12/11/2018 03:49:11

This post has been marked as an answer

Hello Kenneth,

The responsible developer has investigated this problem and found a solution. TFS #57373 has been created for it, and the fix will be included in the next hotfix release.

Thank you.

Best regards,

Olga | QA

Votes for this answer: 1

Kristian Kirkholt

Posted on 11/12/2018 15:14:14

Hi Kenneth

The problem regarding #57373 "problem in Dynamicweb.Core.Helpers.StripHtml" has now been resolved in Dynamicweb version 9.5.6.

To upgrade, please choose this version from the download page:

http://doc.dynamicweb.com/releases-and-downloads/releases

Let me know if you need any more help with this.

Kind Regards
Dynamicweb Support
Kristian Kirkholt
