excel-vba page automatically - Scraping data from website using vba





5 Answers

you can use winhttprequest object instead of internet explorer as it's good to load data excluding pictures n advertisement instead of downloading full webpage including advertisement n pictures those make internet explorer object heavy compare to winhttpRequest object.

with code examples

Im trying to scrape data from website: http://uk.investing.com/rates-bonds/financial-futures via vba, like real-time price, i.e. German 5 YR Bobl, US 30Y T-Bond, i have tried excel web query but it only scrapes the whole website, but I would like to scrape the rate only, is there a way of doing this?




Scraping data from website using vba

There are several ways of doing this. This is an answer that I write hoping that all the basics of Internet Explorer automation will be found when browsing for the keywords "scraping data from website", but remember that nothing's worth as your own research (if you don't want to stick to pre-written codes that you're not able to customize).

Please note that this is one way, that I don't prefer in terms of performance (since it depends on the browser speed) but that is good to understand the rationale behind Internet automation.

1) If I need to browse the web, I need a browser! So I create an Internet Explorer browser:

Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")

2) I ask the browser to browse the target webpage. Through the use of the property ".Visible", I decide if I want to see the browser doing its job or not. When building the code is nice to have Visible = True, but when the code is working for scraping data is nice not to see it everytime so Visible = False.

With appIE
    .Navigate "http://uk.investing.com/rates-bonds/financial-futures"
    .Visible = True
End With

3) The webpage will need some time to load. So, I will wait meanwhile it's busy...

Do While appIE.Busy
    DoEvents
Loop

4) Well, now the page is loaded. Let's say that I want to scrape the change of the US30Y T-Bond: What I will do is just clicking F12 on Internet Explorer to see the webpage's code, and hence using the pointer (in red circle) I will click on the element that I want to scrape to see how can I reach my purpose.

5) What I should do is straight-forward. First of all, I will get by the ID property the tr element which is containing the value:

Set allRowOfData = appIE.document.getElementById("pair_8907")

Here I will get a collection of td elements (specifically, tr is a row of data, and the td are its cells. We are looking for the 8th, so I will write:

Dim myValue As String: myValue = allRowOfData.Cells(7).innerHTML

Why did I write 7 instead of 8? Because the collections of cells starts from 0, so the index of the 8th element is 7 (8-1). Shortly analysing this line of code:

  • .Cells() makes me access the td elements;
  • innerHTML is the property of the cell containing the value we look for.

Once we have our value, which is now stored into the myValue variable, we can just close the IE browser and releasing the memory by setting it to Nothing:

appIE.Quit
Set appIE = Nothing

Well, now you have your value and you can do whatever you want with it: put it into a cell (Range("A1").Value = myValue), or into a label of a form (Me.label1.Text = myValue).

I'd just like to point you out that this is not how works: here you post questions about specific coding problems, but you should make your own search first. The reason why I'm answering a question which is not showing too much research effort is just that I see it asked several times and, back to the time when I learned how to do this, I remember that I would have liked having some better support to get started with. So I hope that this answer, which is just a "study input" and not at all the best/most complete solution, can be a support for next user having your same problem. Because I have learned how to program thanks to this community, and I like to think that you and other beginners might use my input to discover the beautiful world of programming.

Enjoy your practice ;)




This question asked long before. But I thought following information will useful for newbies. Actually you can easily get the values from class name like this.

Sub ExtractLastValue()

Set objIE = CreateObject("InternetExplorer.Application")

objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600

objIE.Visible = True

objIE.Navigate ("https://uk.investing.com/rates-bonds/financial-futures/")

Do
DoEvents
Loop Until objIE.readystate = 4

MsgBox objIE.document.getElementsByClassName("pid-8907-last")(0).innerText

End Sub

And if you are new to web scraping please read this blog post.

Web Scraping - Basics

And also there are various techniques to extract data from web pages. This article explain few of them with examples.

Web Scraping - Collecting Data From a Webpage




you can use winhttprequest object instead of internet explorer as it's good to load data excluding pictures n advertisement instead of downloading full webpage including advertisement n pictures those make internet explorer object heavy compare to winhttpRequest object.




Website data table scraper

If you want to use Excel functions to navigate the tables why not dump the tables first onto a worksheet this code works for me

Option Explicit

Sub Scraper()
    Dim appIE As Object

    ' As per http://.com/questions/27066963/scraping-data-from-website-using-vba

    Set appIE = CreateObject("internetexplorer.application")

    With appIE
       .Navigate "http://uk.investing.com/rates-bonds/financial-futures" 'Sample page
       .Visible = True
    End With

    Do While appIE.Busy
        DoEvents
        Application.Wait (Now + TimeValue("0:00:01")) 'If page not open, wait a second befor trying again
    Loop

    'Debug.Print TypeName(appIE.document)

    Dim doc As Object 'MSHTML.HTMLDocument
    Set doc = appIE.document

    '* appIE busy is good but you need to wait for the whole document to completely load and initialise so use this
    While doc.readyState <> "complete"
        DoEvents
    Wend

    '* we can select all the tables because they share the same CSS class name
    Dim tablesSelectedByClass As Object 'MSHTML.HTMLElementCollection
    Set tablesSelectedByClass = doc.getElementsByClassName("genTbl")

    '* you can change this, it was just convenient for me to add sheets to my workbook
    Dim shNewResults As Excel.Worksheet
    Set shNewResults = ThisWorkbook.Worksheets.Add

    Dim lRowCursor As Long  '* this controls pasting down the sheet
    lRowCursor = 1

    Dim lTableIndexLoop As Long
    For lTableIndexLoop = 0 To tablesSelectedByClass.Length - 1

        Dim tableLoop As Object 'MSHTML.HTMLTable
        Set tableLoop = tablesSelectedByClass.Item(lTableIndexLoop)

        If LenB(tableLoop.ID) > 0 Then  '* there are some extra nonsense tables, this subselects

            Dim sParentColumn As String, objParentColumn As Object ' MSHTML.HTMLSemanticElement
            Set objParentColumn = FindMyColumn(tableLoop, sParentColumn) '* need to understand is table on left hand or right hand side

            Dim vHeader As Variant: vHeader = Empty
            If sParentColumn = "leftColumn" Then
                '* tables on the left have a preceding H3 element with the table's description
                Dim objH3Headers As Object
                Set objH3Headers = objParentColumn.getElementsByTagName("H3")
                vHeader = objH3Headers.Item(lTableIndexLoop).innerText
            Else
                '* tables on the right have a hidden attribute we can use
                vHeader = tableLoop.Attributes.Item("data-gae").Value
                If Len(vHeader) > 3 Then
                    vHeader = Mid$(vHeader, 4)
                    Mid$(vHeader, 1, 1) = Chr(Asc(Mid$(vHeader, 1, 1)) - 32)
                End If
            End If

            '* tables on the right do not have column headers
            Dim bHasColumnHeaders As Boolean
            bHasColumnHeaders = (tableLoop.ChildNodes.Length = 2)

            Dim vTableCells() As Variant   '* this will be our table data container which we will paste in one go
            Dim lRowCount As Long: lRowCount = 0
            Dim lColumnCount As Long: lColumnCount = 0
            Dim lDataHeadersSectionIdx As Long: lDataHeadersSectionIdx = 0
            Dim objColumnHeaders As Object: Set objColumnHeaders = Nothing

            If bHasColumnHeaders Then

                Set objColumnHeaders = tableLoop.ChildNodes.Item(0).ChildNodes.Item(0)

                lRowCount = lRowCount + 1

                lDataHeadersSectionIdx = 1
            Else
                lDataHeadersSectionIdx = 0
            End If

            Dim objDataRows As Object 'MSHTML.HTMLElementCollection
            Set objDataRows = tableLoop.ChildNodes.Item(lDataHeadersSectionIdx).ChildNodes
            lColumnCount = objDataRows.Item(0).ChildNodes.Length

            lRowCount = lRowCount + objDataRows.Length

            ReDim vTableCells(1 To lRowCount, 1 To lColumnCount) As Variant

            '* we have them get the column headers
            Dim lColLoop As Long
            If bHasColumnHeaders Then
                For lColLoop = 1 To lColumnCount
                    vTableCells(1, lColLoop) = objColumnHeaders.ChildNodes.Item(lColLoop - 1).innerText
                Next
            End If

            '* get the data cells
            Dim lRowLoop As Long
            For lRowLoop = 1 To lRowCount - VBA.IIf(bHasColumnHeaders, 1, 0)
                For lColLoop = 1 To lColumnCount
                    vTableCells(lRowLoop + VBA.IIf(bHasColumnHeaders, 1, 0), lColLoop) = objDataRows.Item(lRowLoop - 1).ChildNodes.Item(lColLoop - 1).innerText
                Next
            Next

            '* paste our table description
            shNewResults.Cells(lRowCursor, 1).Value2 = vHeader
            lRowCursor = lRowCursor + 1

            '* paste our table data
            shNewResults.Cells(lRowCursor, 1).Resize(lRowCount, lColumnCount).Value2 = vTableCells
            lRowCursor = lRowCursor + lRowCount + 1
        End If

    Next

End Sub

Function FindMyColumn(ByVal node As Object, ByRef psColumn As String) As Object
    '* this code ascends the DOM looking for "column" in the id of each node
    While InStr(1, node.ID, "column", vbTextCompare) = 0 And Not node.ParentNode Is Nothing
        DoEvents
        Set node = node.ParentNode
    Wend
    If InStr(1, node.ID, "column", vbTextCompare) > 0 Then
        Set FindMyColumn = node
        psColumn = CStr(node.ID)
    End If


End Function

By the way, if you trade a lot the brokers get rich and you get poor, brokerage charges really impact in long run.




Related