
I want to extract some text from a webpage. The text is inside span tags with itemprop="name". How do I specify itemprop="name" in a BeautifulSoup call to get the values?


<span itemprop="name">Stanford University</span>

1 Answer

Best answer

You can pass itemprop="name" as a keyword argument to the find_all() function to find all span tags whose itemprop attribute is "name".

Here is an example:

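from bs4 import BeautifulSoup
import urllib.request as ur

# Placeholder: replace with the actual URL of the page
url = "full_url_of_the_webpage"
# Send a browser-like User-Agent header with the request
req = ur.Request(url, None, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
rs = ur.urlopen(req)

soup = BeautifulSoup(rs, 'html.parser')
# Find every <span> whose itemprop attribute is "name" and print its text
for sp in soup.find_all('span', itemprop="name"):
    print(sp.text)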

To run this code, replace "full_url_of_the_webpage" with the actual URL of the page.
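
To try the find_all() lookup without a network request, here is a minimal sketch that parses an inline HTML string instead of a live page. The first span is the one from the question; the other elements and the variable names are made up for illustration. It also shows two equivalent ways to express the same filter, the attrs dictionary and a CSS selector, assuming a reasonably recent BeautifulSoup 4 install.

```python
from bs4 import BeautifulSoup

# Sample markup: the first span is from the question, the rest is
# hypothetical and only there to show what gets filtered out.
html = """
<div>
  <span itemprop="name">Stanford University</span>
  <span itemprop="address">Stanford, CA</span>
  <p itemprop="name">Not a span</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword-argument form, as used in the answer.
by_keyword = [sp.get_text(strip=True)
              for sp in soup.find_all("span", itemprop="name")]

# attrs-dictionary form; handy when the attribute name is not a valid
# Python identifier (for example, one containing a hyphen).
by_attrs = [sp.get_text(strip=True)
            for sp in soup.find_all("span", attrs={"itemprop": "name"})]

# CSS selector form.
by_css = [sp.get_text(strip=True)
          for sp in soup.select('span[itemprop="name"]')]

print(by_keyword)
```

All three lists should contain only the text of the span with itemprop="name"; the span with a different itemprop value and the p tag are skipped.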