Extracting from BeautifulSoup4 and storing as list elements in Python


dil_bert

Hello dear community,

I am currently working on a small Python program that extracts data with BS4 and stores it as list elements.

As I am fairly new to Python, I need some help with that. I'm trying to write a very simple spider for web crawling. Here's my first approach:
I need to fetch the data from this page: http://europa.eu/youth/volunteering/evs-organisation_en

First, I view the page source to find the HTML elements: view-source:https://europa.eu/youth/volunteering/evs-organisation_en
I have to extract data wrapped in multiple HTML tags from the above-mentioned webpage using BeautifulSoup4.
I have to store all of the extracted data in a list, with each extracted item as a separate list element.

Here is the HTML content structure:

<div class="view-content">
            <div class="row is-flex"></span>
                 <div class="col-md-4"></span>
            <div class </span>
  <div class= >
    <h4 Data 1 </span>
          <div class= Data 2</span>
            <p class=
    <i class=
     <strong>Data 3 </span>
</p>    <p class= Data 4 </span>
          <p class= Data 5 </span>
                  <p><strong>Data 6</span>
        <div class=</span>
      <a href="Data 7</span>
  </div>
</div>
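
One way the structure above might be turned into list elements is sketched below. It runs on a hypothetical inline sample rather than the live page: only the class names view-content, row is-flex and col-md-4 come from the post, the inner class names are omitted because they are not shown, and the helper name extract_cards is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the structure in the post. The inner
# class names are not shown there, so none are assumed here.
SAMPLE_HTML = """
<div class="view-content">
  <div class="row is-flex">
    <div class="col-md-4">
      <div>
        <h4>Data 1</h4>
        <div>Data 2</div>
        <p><i></i><strong>Data 3</strong></p>
        <p>Data 4</p>
        <p>Data 5</p>
        <p><strong>Data 6</strong></p>
        <div><a href="Data 7"></a></div>
      </div>
    </div>
  </div>
</div>
"""


def extract_cards(html):
    """Return one list per col-md-4 block: its text fragments, then its link targets."""
    page_soup = BeautifulSoup(html, "html.parser")
    cards = []
    for card in page_soup.find_all("div", class_="col-md-4"):
        # stripped_strings yields every non-empty text node, whitespace trimmed.
        fields = list(card.stripped_strings)
        # "Data 7" sits in an href attribute, so it is collected separately.
        fields.extend(a["href"] for a in card.find_all("a", href=True))
        cards.append(fields)
    return cards


cards = extract_cards(SAMPLE_HTML)
print(cards)
```

Keeping one inner list per col-md-4 block keeps the fields of each organisation together. If a single flat list is preferred, the inner lists could be concatenated afterwards.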


Well, an approach would be:

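from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = 'http://europa.eu/youth/volunteering/evs-organisation_en'

# Download the page; the context manager closes the connection.
with urlopen(my_url) as client:
    page_html = client.read()
page_soup = BeautifulSoup(page_html, "html.parser")

# Note: the structure above shows <div> elements, not <td>, so this may be empty.
cc = page_soup.find_all("td", {"class": ""})

# Print at most the first 10 matches without risking an IndexError.
for i, cell in enumerate(cc[:10]):
    print(cell.text, i)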

I guess I need some slight changes to the code in order to get the thing working.


Code to extract:

for data in page_soup.find_all('span', class_=""):
    print(data.text)

This should give an output:

data = [ele.text for ele in page_soup.find_all('span', {'class': 'NormalTextrun'})]
print(data)


Output: [' Data 1 ', ' Data 2 ', ' Data 3 ' and so forth]
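
A flat list like the one above might be built by passing several tag names to find_all at once. This is a minimal sketch on a hypothetical snippet: only the tag names h4 and p are taken from the structure in the post. It also shows that a search for span returns nothing when the markup contains no span elements, which may be why the attempt above comes back empty.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; only the tag names come from the structure in the post.
html = "<div><h4> Data 1 </h4><p> Data 4 </p><p> Data 5 </p></div>"
page_soup = BeautifulSoup(html, "html.parser")

# There are no <span> elements in this snippet, so this search finds nothing.
spans = page_soup.find_all("span")

# find_all accepts a list of tag names; get_text(strip=True) trims whitespace.
data = [el.get_text(strip=True) for el in page_soup.find_all(["h4", "p"])]

print(spans)
print(data)
```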

Question: I need help with the extraction part.


I'd love to hear from you.

Yours, dilbert