How do I read a website's content in C#?
Here is how you would do it using the HtmlAgilityPack (the snippets assume using directives for HtmlAgilityPack and System.Linq). First, your sample HTML:
var html = "<html>\r\n<body>\r\nbla bla </td><td>\r\nbla bla \r\n</body>\r\n</html>";
Load it up (as a string in this case):
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
If you are getting it from the web, it is similar:
var web = new HtmlWeb();
var doc = web.Load(url);
Now select only the text nodes that contain something other than whitespace, and trim them:
var text = doc.DocumentNode.Descendants()
    .Where(x => x.NodeType == HtmlNodeType.Text && x.InnerText.Trim().Length > 0)
    .Select(x => x.InnerText.Trim());
You can get this as a single joined string if you like:
var joined = string.Join(" ", text);
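Putting the steps together, here is a minimal sketch of how they might be wrapped in one helper. It assumes the HtmlAgilityPack NuGet package is referenced and that System.Windows.Forms is not, so the unqualified HtmlDocument name is unambiguous. The class and method names (HtmlTextExtractor, GetTextNodes, ExtractText, ExtractTextFromUrl) are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class; the name is an assumption, not a library type.
public static class HtmlTextExtractor
{
    // Returns the trimmed text of every text node that is not pure whitespace.
    public static IEnumerable<string> GetTextNodes(HtmlDocument doc)
    {
        return doc.DocumentNode.Descendants()
            .Where(x => x.NodeType == HtmlNodeType.Text && x.InnerText.Trim().Length > 0)
            .Select(x => x.InnerText.Trim());
    }

    // Parses an HTML string and joins its text nodes with single spaces.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return string.Join(" ", GetTextNodes(doc));
    }

    // Downloads the page at the given URL and joins its text nodes the same way.
    public static string ExtractTextFromUrl(string url)
    {
        var web = new HtmlWeb();
        return string.Join(" ", GetTextNodes(web.Load(url)));
    }
}
```

With this in place, HtmlTextExtractor.ExtractText(html) covers the string case and HtmlTextExtractor.ExtractTextFromUrl(url) covers the web case. Keeping the node selection in its own method means both entry points share the same filtering.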