Extract the HTML from a page loaded in TWebBrowser

How can I get the HTML from a web page that I loaded in TWebBrowser? I want to clip some web contents.


You can use the Document property - it has a lot of interesting properties:

  • Document.All
  • Document.bgColor
  • Document.Body.innerHTML
  • Document.Body.Style.overflowX
  • Document.Body.Style.overflowY
  • Document.Body.Style.zoom
  • Document.cookie
  • Document.documentElement.innerHTML
  • Document.documentElement.innerText
  • Document.FileSize
  • Document.Frames
  • Document.Images
  • Document.LastModified
  • Document.Links
  • Document.Location.Protocol
  • Document.ParentWindow
  • Document.ParentWindow.ScrollBy(iX: Integer; iY: Integer)
  • Document.Selection
  • Document.Title
  • Document.URL

of which Document.Body.innerHTML will serve our purpose (innerText would return only the visible text, without the markup). The only limitation of this solution is that it gives us the HTML as the web browser displays it, which may be different from what 'View Source' in Internet Explorer would show. If the original HTML file included JavaScript that generates content dynamically, like this:

<script language='JavaScript'>
document.write('Hello Visitor');
</script>

then Body.innerHTML will contain the generated output 'Hello Visitor' rather than the file exactly as the server sent it. To get the original file, you need to look in the browser cache or use something other than TWebBrowser.
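
As one possible way of "using something other than TWebBrowser", the sketch below downloads the unmodified file with URLDownloadToFile from the UrlMon unit. The URL and the file name are placeholders chosen for illustration, and the helper name DownloadOriginal is hypothetical; treat this as a starting point rather than a tested recipe.

```pascal
program FetchOriginal;
{$APPTYPE CONSOLE}

uses
  Windows, UrlMon;

// Hypothetical helper: saves the page at URL to FileName exactly as the
// server delivers it, without running any scripts. Returns True on success.
function DownloadOriginal(const URL, FileName: string): Boolean;
begin
  Result := URLDownloadToFile(nil, PChar(URL), PChar(FileName), 0, nil) = S_OK;
end;

begin
  // Placeholder URL and file name, not taken from the original tip
  if DownloadOriginal('http://www.example.com/', 'original.html') then
    WriteLn('Saved original.html')
  else
    WriteLn('Download failed');
end.
```

The saved file can then be read into a string and processed in the same way as the innerHTML value.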

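// tested with Delphi 6, should work in Delphi 5 as well
// requires MSHTML in the uses clause (declares IHTMLDocument2)
procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Document: IHTMLDocument2;
  s: string;
begin
  // extract the day's total earnings etc
  Document := WebBrowser1.Document as IHTMLDocument2;
  // guard against a missing document or body
  if Assigned(Document) and Assigned(Document.body) then
  begin
    s := Document.body.innerHTML;
    // process this string to extract contents
  end;
end;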
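
The "process this string" step depends on the page, but a simple approach is to cut out the text between two known markers. The helper below is a minimal sketch of that idea; the name ExtractBetween and the sample markup are assumptions made for illustration. It uses only Pos and Copy, so it should also compile in older Delphi versions that lack PosEx.

```pascal
program ExtractDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: returns the text between the first StartMark and the
// next EndMark in S, or '' if either marker is missing.
function ExtractBetween(const S, StartMark, EndMark: string): string;
var
  StartPos, EndPos: Integer;
  Rest: string;
begin
  Result := '';
  StartPos := Pos(StartMark, S);
  if StartPos = 0 then
    Exit;
  // keep only the text that follows the start marker
  Rest := Copy(S, StartPos + Length(StartMark), MaxInt);
  EndPos := Pos(EndMark, Rest);
  if EndPos = 0 then
    Exit;
  Result := Copy(Rest, 1, EndPos - 1);
end;

begin
  // Sample markup invented for this example
  WriteLn(ExtractBetween('<TD id=total>12.50</TD>', '<TD id=total>', '</TD>'));
end.
```

Inside the DocumentComplete handler the call would look like ExtractBetween(s, '<TD id=total>', '</TD>'). Note that Pos is case sensitive and that Internet Explorer may normalize the markup it returns in innerHTML (for example, upper-case tag names), so it is worth inspecting the actual string before choosing the markers.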

