
I have a lot of experience programming MSHTML at a low level, and I keep seeing questions about how to use MSHTML to parse HTML and then access elements through the DOM.


Well, here it is. I use IMarkupServices, which MSHTML provides. There is no need for an IOleClientSite or any sort of embedding. I think this is just about as light as anyone can get.


In future articles, I will concentrate on reusing MSHTML in other areas of programming, such as using MSHTML as an editor.


This code uses simple COM calls and nothing more. It can be easily adapted for ATL, MFC and VB, among other languages. Please don't ask me to provide samples in other languages. To build it you need the IE SDK.



/******************************************************************
 * ParseHTML.cpp
 *
 *  ParseHTML: Lightweight UI-less HTML parser using MSHTML
 *
 *  Note: This is for accessing the DOM only. No image download,
 *        script execution, etc...
 *
 *  8 June 2001 - Asher Kobin (asherk@pobox.com)
 *
 *  THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY
 *  OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
 *  LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR
 *  FITNESS FOR A PARTICULAR PURPOSE.
 *
 *******************************************************************/

// The header names were lost from this copy of the listing; these two
// are the ones the code below requires (Win32/COM and MSHTML).
#include <windows.h>
#include <mshtml.h>

// The markup to parse.
OLECHAR szHTML[] = OLESTR("Hello World!");

int __stdcall WinMain(HINSTANCE hInst,
                      HINSTANCE hPrev,
                      LPSTR lpCmdLine,
                      int nShowCmd)
{
  IHTMLDocument2 *pDoc = NULL;

  CoInitialize(NULL);

  // Create a bare MSHTML document: no host, no client site, no UI.
  CoCreateInstance(CLSID_HTMLDocument,
                   NULL,
                   CLSCTX_INPROC_SERVER,
                   IID_IHTMLDocument2,
                   (LPVOID *) &pDoc);

  if (pDoc)
  {
    IPersistStreamInit *pPersist = NULL;

    pDoc->QueryInterface(IID_IPersistStreamInit,
                         (LPVOID *) &pPersist);

    if (pPersist)
    {
      IMarkupServices *pMS = NULL;

      // Initialize the document to an empty state before using it.
      pPersist->InitNew();
      pPersist->Release();

      pDoc->QueryInterface(IID_IMarkupServices,
                           (LPVOID *) &pMS);

      if (pMS)
      {
        IMarkupContainer *pMC = NULL;
        IMarkupPointer *pMkStart = NULL;
        IMarkupPointer *pMkFinish = NULL;

        pMS->CreateMarkupPointer(&pMkStart);
        pMS->CreateMarkupPointer(&pMkFinish);

        // Parse the string into a new markup container; the two
        // pointers are positioned around the parsed content.
        pMS->ParseString(szHTML,
                         0,
                         &pMC,
                         pMkStart,
                         pMkFinish);

        if (pMC)
        {
          IHTMLDocument2 *pNewDoc = NULL;

          // Request IID_IHTMLDocument2 so the IID matches the type of
          // pNewDoc (the listing originally passed IID_IHTMLDocument).
          pMC->QueryInterface(IID_IHTMLDocument2,
                              (LPVOID *) &pNewDoc);

          if (pNewDoc)
          {
            // do anything with pNewDoc, in this case
            // get the body innerText.

            IHTMLElement *pBody = NULL;
            pNewDoc->get_body(&pBody);

            if (pBody)
            {
              BSTR strText = NULL;

              pBody->get_innerText(&strText);
              pBody->Release();

              // SysFreeString accepts NULL, so this is safe even if
              // get_innerText failed.
              SysFreeString(strText);
            }

            pNewDoc->Release();
          }

          pMC->Release();
        }

        if (pMkStart)
          pMkStart->Release();

        if (pMkFinish)
          pMkFinish->Release();

        pMS->Release();
      }
    }

    pDoc->Release();
  }

  CoUninitialize();

  return 0;
}
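
The listing releases every interface by hand, which is why it nests so deeply. When the code is adapted for ATL, a smart pointer such as ATL's `CComPtr` normally takes over that bookkeeping. The sketch below only illustrates the idea in portable C++ so it can be compiled without the IE SDK. It is not part of the original article, and `FakeUnknown`, `ScopedRelease`, `fake_query` and `refs_after_scope` are hypothetical names invented for this example. In real code you would probably use `CComPtr` rather than roll your own.

```cpp
// Hypothetical stand-in for a COM interface. Only the two
// reference-counting methods are modeled; real interfaces derive
// from IUnknown.
struct FakeUnknown {
    unsigned long refs = 1;  // the creator holds one reference
    unsigned long AddRef()  { return ++refs; }
    unsigned long Release() { return --refs; }  // a real object deletes itself at zero
};

// Hypothetical minimal owner: calls Release() exactly once when it
// goes out of scope, so no manual Release() call is needed.
template <typename T>
class ScopedRelease {
public:
    ScopedRelease() = default;
    ~ScopedRelease() { reset(); }
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

    // Address to hand to an out-parameter, in the style of
    // QueryInterface or CreateMarkupPointer. Drops any old pointer first.
    T** put() { reset(); return &p_; }

    T* get() const { return p_; }
    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

    void reset() {
        if (p_) {
            p_->Release();
            p_ = nullptr;
        }
    }

private:
    T* p_ = nullptr;
};

// Hypothetical factory that mimics the out-parameter convention:
// it adds a reference and returns the pointer through 'out'.
inline void fake_query(FakeUnknown& obj, FakeUnknown** out) {
    obj.AddRef();
    *out = &obj;
}

// Acquires a reference inside a scope and returns the count that is
// left on 'obj' after the owner has been destroyed.
inline unsigned long refs_after_scope(FakeUnknown& obj) {
    {
        ScopedRelease<FakeUnknown> p;
        fake_query(obj, p.put());
        // use p-> here; leaving the scope releases the reference
    }
    return obj.refs;
}
```

With an owner like this, each `if (pX) { ... pX->Release(); }` pair in the listing could shrink to a declaration plus a check, and an early return would no longer leak a reference.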