Extract the HTML from a page loaded in TWebBrowser


How can I get the HTML from a web page that I loaded in TWebBrowser? I want to clip some of the page's content.


You can use the Document property - it has a lot of interesting properties:

  • Document.All
  • Document.bgColor
  • Document.Body.innerHTML
  • Document.Body.Style.overflowX
  • Document.Body.Style.overflowY
  • Document.Body.Style.zoom
  • Document.cookie
  • Document.documentElement.innerHTML
  • Document.documentElement.innerText
  • Document.FileSize
  • Document.Frames
  • Document.Images
  • Document.LastModified
  • Document.Links
  • Document.Location.Protocol
  • Document.ParentWindow
  • Document.ParentWindow.ScrollBy(iX: Integer; iY: Integer)
  • Document.Selection
  • Document.Title
  • Document.URL

of which Body.innerHTML will serve our purpose. The only limitation of this solution is that it gives us the HTML as the web browser holds it after rendering, which may differ from what 'View Source' in Internet Explorer would show. If the original HTML file included JavaScript that generates content dynamically, like this:

<script language='JavaScript'>
document.write('Hello Visitor');
</script>

then the code below will return the rendered content, including the generated output 'Hello Visitor', rather than the original file as it was written. To get the original file, look in the browser cache or use something other than TWebBrowser.

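// tested with Delphi 6, should work in Delphi 5 as well
// requires MSHTML in the uses clause (for IHTMLDocument2)

procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  Document: IHTMLDocument2;
  s: string;
begin
  // extract the day's total earnings etc
  Document := WebBrowser1.Document as IHTMLDocument2;
  // body is nil for documents without a body element (e.g. framesets)
  if Assigned(Document) and Assigned(Document.body) then
  begin
    s := Document.body.innerHTML;
    // process this string to extract contents
  end;
end;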

2006-01-24, 10:59:02
[hidden] from United Kingdom  
Piotr Borowski
2006-04-14, 01:16:36
anonymous from Vietnam  
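function GetBrowserHtml(const webBrowser: TWebBrowser): String;
var
  strStream: TStringStream;
  adapter: IStream; // IStream and IPersistStreamInit come from the ActiveX unit
  browserStream: IPersistStreamInit;
begin
  strStream := TStringStream.Create('');
  try
    browserStream := webBrowser.Document as IPersistStreamInit;
    adapter := TStreamAdapter.Create(strStream, soReference);
    browserStream.Save(adapter, True); // write the loaded document into strStream
    Result := strStream.DataString;
  finally
    strStream.Free;
  end;
end;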

2006-04-14, 01:17:44
anonymous from Vietnam  
Trần quốc Trung
2007-02-08, 05:08:25
anonymous from Austria  
The body element exists only in documents that actually have one, so the first script fails on documents without a body.
2007-02-08, 05:14:07
anonymous from Austria  
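// GetDocumentHtml is only an illustrative name; requires MSHTML in the uses clause
function GetDocumentHtml(Browser: TWebBrowser): WideString;
var
  Doc: IHTMLDocument3;
begin
  Doc := Browser.Document as IHTMLDocument3;
  Result := Doc.documentElement.innerHTML;
end;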
... and that's all, folks! No long code in the style of a C++ programmer, just plain and simple Delphi.

2013-08-02, 02:57:58
anonymous from India  
TWebBrowser.Document gives access to the loaded HTML document. But I am using TIWURLWindow in place of TWebBrowser. Can anyone tell me which property I can use in place of Document, because it does not work with TIWURLWindow?

