dil_bert

Extracting from BeautifulSoup4 and storing as list elements in Python


Hello dear community,


I am currently working on a small Python program that extracts data with BeautifulSoup4 (BS4) and stores it as list elements.


As I am fairly new to Python, I need some help with that. I'm trying to write a very simple spider for web crawling. Here is my first approach:
I need to fetch the data from this page: http://europa.eu/youth/volunteering/evs-organisation_en

First, I view the page source to find the relevant HTML elements: view-source:https://europa.eu/youth/volunteering/evs-organisation_en
I have to extract data wrapped in multiple HTML tags from the above-mentioned page using BeautifulSoup4.
I have to store all of the extracted data in a list, with each extracted value as a separate list element.

Here is the HTML content structure (abbreviated):

<div class="view-content">
            <div class="row is-flex"></span>
                 <div class="col-md-4"></span>
            <div class </span>
  <div class= >
    <h4 Data 1 </span>
          <div class= Data 2</span>
            <p class=
    <i class=
     <strong>Data 3 </span>
</p>    <p class= Data 4 </span>
          <p class= Data 5 </span>
                  <p><strong>Data 6</span>
        <div class=</span>
      <a href="Data 7</span>
  </div>
</div>
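
To make the goal concrete, here is a minimal sketch of the extraction, assuming each entry on the page sits in its own `div.col-md-4` inside `div.view-content`, as in the structure above. The sample markup, the inner class names (`field`, `location`, `topic`, `pic`, `links`), the `/data-7` link and the helper name `extract_cards` are all made up for illustration, because the real class names are cut off in the snippet; it runs on an inline string, not on the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the outer structure follows the post, but the
# inner class names and the link target are invented for illustration.
SAMPLE_HTML = """
<div class="view-content">
  <div class="row is-flex">
    <div class="col-md-4">
      <h4>Data 1</h4>
      <div class="field">Data 2</div>
      <p class="location"><i class="icon"></i><strong>Data 3</strong></p>
      <p class="topic">Data 4</p>
      <p class="pic">Data 5</p>
      <p><strong>Data 6</strong></p>
      <div class="links"><a href="/data-7">Read more</a></div>
    </div>
  </div>
</div>
"""


def extract_cards(html):
    """Return one list of values per card (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    cards = []
    for card in soup.select("div.view-content div.col-md-4"):
        # Keep every non-empty text node except the link text itself.
        values = [
            s.strip()
            for s in card.find_all(string=True)
            if s.strip() and s.parent.name != "a"
        ]
        # Store the link target (the href) as the last element, if present.
        link = card.find("a", href=True)
        if link is not None:
            values.append(link["href"])
        cards.append(values)
    return cards


cards = extract_cards(SAMPLE_HTML)
print(cards)
```

Each card becomes its own list, so a page with many entries yields a list of lists; flattening that into one list is a single extra comprehension if a flat list is preferred.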


Well, one approach would be:

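from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = 'http://europa.eu/youth/volunteering/evs-organisation_en'

# Download the page; the context manager closes the connection afterwards.
with urlopen(my_url) as client:
    page_html = client.read()

page_soup = BeautifulSoup(page_html, "html.parser")

# Collect <td> cells using an empty class filter (my first guess at a selector).
cc = page_soup.find_all("td", {"class": ""})

# Print at most the first ten matches; slicing avoids an IndexError
# when fewer than ten elements are found.
for i, cell in enumerate(cc[:10]):
    print(cell.text, i)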
I guess I need some slight changes to the code to get this working.


Code to extract:

for data in elem.find_all('span', class_=""):
    print(data.text)

This should give the following output:

data = [ele.text for ele in page_soup.find_all('span', {'class': 'NormalTextrun'})]
print(data)


Output: [' Data 1 ', ' Data 2 ', ' Data 3 ', and so forth]
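
As a small illustration of the list comprehension above, here is a sketch that runs on an inline string instead of the live page. The markup is made up; only the `NormalTextrun` class name comes from the snippet above. It also shows how `get_text(strip=True)` removes the surrounding spaces that `.text` keeps, and how to join the values with commas if a single string is wanted.

```python
from bs4 import BeautifulSoup

# Made-up markup; only the class name "NormalTextrun" is taken from the post.
html = (
    '<p>'
    '<span class="NormalTextrun"> Data 1 </span>'
    '<span class="NormalTextrun"> Data 2 </span>'
    '<span class="Other">skip me</span>'
    '</p>'
)
page_soup = BeautifulSoup(html, "html.parser")

spans = page_soup.find_all("span", class_="NormalTextrun")

# .text keeps the surrounding whitespace exactly as it appears in the markup.
raw = [s.text for s in spans]

# get_text(strip=True) trims leading and trailing whitespace.
clean = [s.get_text(strip=True) for s in spans]

# One comma-separated string, if that is preferred over a list.
joined = ", ".join(clean)

print(raw)
print(clean)
print(joined)
```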

My question: I need help with the extraction part.


I would love to hear from you.

Yours, dilbert
