Extracting from BeautifulSoup4 and storing as list elements in Python

dil_bert · January 29, 2018

Hello dear community,

I am currently working on a little Python program that extracts data with BS4 and stores it as list elements in Python.

As I am fairly new to Python, I need some help with that. I'm trying to write a very simple spider for web crawling. Here's my first approach:
I need to fetch the data from this page: http://europa.eu/youth/volunteering/evs-organisation_en

First, I viewed the page source to find the relevant HTML elements: view-source:https://europa.eu/youth/volunteering/evs-organisation_en
I have to extract data wrapped within multiple HTML tags from the above-mentioned webpage using BeautifulSoup4.
I have to store all of the extracted data in a list, with each extracted value as a separate list element.

Here is the HTML content structure:

<div class="view-content">
  <div class="row is-flex">
    <div class="col-md-4">
      <div class
        <div class=
          <h4 Data 1
          <div class= Data 2
          <p class=
            <i class=
            <strong>Data 3
          </p>
          <p class= Data 4
          <p class= Data 5
          <p><strong>Data 6
          <div class=
            <a href="Data 7
        </div>
      </div>

Well, one approach would be:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import urllib

my_url ='http://europa.eu/youth/volunteering/evs-organisation_en'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

cc = page_soup.findAll("td",{"class":""})

for i in range(10):
    print(cc[0+i].text, i)
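
One thing worth checking in the attempt above: the structure shown uses `div`, `h4` and `p` elements rather than table cells, so `findAll("td", ...)` may return an empty list, in which case `cc[0+i]` raises an `IndexError`. Below is a minimal sketch that iterates over whatever was found instead of a fixed range. It uses a made-up inline snippet rather than the live page, and the tag names are an assumption based on the structure in the post.

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for the downloaded page (assumption).
html = '<div class="view-content"><h4>Data 1</h4><p>Data 2</p></div>'
page_soup = BeautifulSoup(html, "html.parser")

# There is no <td> in this snippet, so the result is empty.
cells = page_soup.find_all("td")

# Searching for the tags that are actually present returns matches.
found = page_soup.find_all(["h4", "p"])

# Slice instead of indexing with range(10), so a short result cannot fail.
for i, tag in enumerate(found[:10]):
    print(tag.get_text(strip=True), i)
```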

I guess I need some slight changes to the code in order to get it working.

Code to extract:

for data in elem.find_all('span', class_=""):

This should give the output:

data = [ele.text for ele in page_soup.find_all('span', {'class':'NormalTextrun'})]
print(data)

Output: [' Data 1 ', ' Data 2 ', ' Data 3 ' and so forth]
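
A minimal sketch of the extraction step, assuming each entry sits in a `div` with class `col-md-4` as in the structure above. The sample HTML and its inner class names (`field`, `location`, `links` and so on) are made up for illustration, because the real ones are cut off in the post and have to be read from the page source. The function name `extract_items` is hypothetical as well.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the structure in the post; the inner
# class names are placeholders, not the real page's class names.
SAMPLE_HTML = """
<div class="view-content">
  <div class="row is-flex">
    <div class="col-md-4">
      <h4>Data 1</h4>
      <div class="field">Data 2</div>
      <p class="location"><i class="icon"></i><strong>Data 3</strong></p>
      <p class="a">Data 4</p>
      <p class="b">Data 5</p>
      <p><strong>Data 6</strong></p>
      <div class="links"><a href="Data 7"></a></div>
    </div>
  </div>
</div>
"""

def extract_items(html):
    """Return the text pieces and link targets of every entry as one flat list."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for card in soup.find_all("div", class_="col-md-4"):
        # stripped_strings yields every non-empty text node, whitespace trimmed
        items.extend(card.stripped_strings)
        # link targets live in the href attribute, not in the text
        items.extend(a["href"] for a in card.find_all("a", href=True))
    return items

data = extract_items(SAMPLE_HTML)
print(data)
```

Note that any visible link text would also end up in the list through `stripped_strings`; the sample link is left empty to keep the sketch short.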

Question: I need help with the extraction part.

I'd love to hear from you.

Yours, dilbert
