Python and BeautifulSoup, not finding 'a'

2 Answers

Using Python string methods:

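for item in html.split("<strong>"):
    # Only look at chunks that contain the target link.
    if "class" in item and "inlinesave action" in item:
        # Keep everything after href=" and cut at the closing ">.
        url_with_junk = item.split('href="')[1]
        m = url_with_junk.index('">')
        print(url_with_junk[:m])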
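
The same string-splitting idea could be wrapped in a small function. This is only a sketch for Python 3: the function name find_inlinesave_hrefs and the SAMPLE string are made up for illustration (SAMPLE is a shortened stand-in shaped like the HTML in the question, not the real page), and html.unescape is used to turn &amp; back into & in the extracted URL.

```python
from html import unescape


def find_inlinesave_hrefs(html):
    """Return href values from chunks that mention class "inlinesave action"."""
    hrefs = []
    for item in html.split("<strong>"):
        if 'class="inlinesave action"' not in item or 'href="' not in item:
            continue
        # Everything after the first href=" up to the next double quote is the URL.
        url_with_junk = item.split('href="', 1)[1]
        end = url_with_junk.find('"')
        if end == -1:
            continue  # unterminated attribute; skip rather than guess
        hrefs.append(unescape(url_with_junk[:end]))
    return hrefs


# Shortened, made-up sample in the shape of the question's HTML.
SAMPLE = (
    '<a rel="nofollow" class="taggedlink " href="">Some title</a>'
    '<span class="saverem"><em class="bookmark-actions">'
    '<strong><a class="inlinesave action" '
    'href="/save?title=Example&amp;jump=%2Fdux">SAVE</a></strong>'
    '</em></span>'
)

if __name__ == "__main__":
    print(find_inlinesave_hrefs(SAMPLE))
```

Note that this takes the first href in each chunk, so it relies on the target link being the first tag after each <strong>, as it is in the snippet from the question. It is a quick workaround, not a real HTML parser.
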

Here's a piece of HTML code (from Delicious):

<a rel="nofollow" class="taggedlink " href="" >Generate Secure Links with Anonymous Referers &amp; Anti-Bot Protection</a>
<span class="saverem">
  <em class="bookmark-actions">
    <strong><a class="inlinesave action" href="/save?;title=Generate%20Secure%20Links%20with%20Anonymous%20Referers%20%26%20Anti-Bot%20Protection&amp;jump=%2Fdux&amp;key=fFS4QzJW2lBf4gAtcrbuekRQfTY-&amp;original_user=dux&amp;copyuser=dux&amp;copytags=web+apps+url+security+generator+shortener+anonymous+links">SAVE</a></strong>

I'm trying to find all the links where class="inlinesave action". Here's the code:

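import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path (assumed)

sock = urllib2.urlopen('')
html = sock.read()
soup = BeautifulSoup(html)
tags = soup.findAll('a', attrs={'class':'inlinesave action'})
print(len(tags))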

But it doesn't find anything!

Any ideas?


You might make some headway using pyparsing:

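from pyparsing import makeHTMLTags, withAttribute

htmlsrc = """<h4>... etc."""

# makeHTMLTags returns (opening tag, closing tag) expressions; keep the opener.
atag = makeHTMLTags("a")[0]
# Only accept <a> tags whose class attribute is exactly "inlinesave action".
atag.setParseAction(withAttribute(("class","inlinesave action")))

for result in atag.searchString(htmlsrc):
    print(result.href)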

This gives (long output truncated with "..."):