Data Scraping in Python Using Beautiful Soup

Beautiful Soup is a Python library for scraping data by traversing a page's DOM tree. Today, I will only discuss how to scrape data from static websites. For websites that load their content dynamically, a suggested approach is to use Selenium together with Beautiful Soup.

In the terminal, I ran the following for Python 3:

pip3 install beautifulsoup4

pip3 install requests

pip3 install bs4

I chose to scrape the AMC theater website for demonstration purposes ONLY. The only data I will try to get is the movie titles (e.g. Ant-Man and the Wasp, Hotel Transylvania 3, Skyscraper). Since this is only a demonstration, I don’t want to work with a lot of data, so I will only get the names of the movies in the slideshow on the main page. There are fewer than 20 movies shown.

slideshow on the main page

In my file, I save the URL by writing the following code:

Line 5 of my code parses the website into a format that Beautiful Soup can understand.
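Here is a minimal sketch of what this setup might look like. It is not necessarily my exact file: the URL constant is an assumption, and the helper names parse_html and fetch_soup are made up for illustration. The parsing step mentioned above corresponds to the BeautifulSoup(...) call.

```python
import requests
from bs4 import BeautifulSoup

# Assumed address of the theater's main page; adjust it to the page you want.
URL = "https://www.amctheatres.com/"


def parse_html(html):
    """Turn raw HTML text into a tree that Beautiful Soup can search."""
    # "html.parser" is the parser bundled with Python, so nothing extra to install.
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url=URL):
    """Download a page and parse it (hypothetical helper, performs a real request)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return parse_html(response.text)


# Usage (commented out because it needs network access):
# soup = fetch_soup()
# print(soup.title)
```

Keeping the download and the parsing in separate functions makes it easy to try the parsing step on a saved HTML string without hitting the site every time.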

When I inspect Hotel Transylvania 3: Summer Vacation, I see that it’s in an <li> tag.

Just to check that I get any data back, I print all the content in <li> tags. Below you can see the overabundance of info. That’s not exactly what I want. Let’s get more specific.
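A rough sketch of that first check, using find_all to grab every <li> tag. The SAMPLE_HTML string is a made-up stand-in for the downloaded page so the snippet runs offline; the real page has far more <li> tags than this.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the site's real markup).
SAMPLE_HTML = """
<ul>
  <li><a href="/movies">Movies</a></li>
  <li class="slide"><div><a aria-label="Skyscraper" href="#"></a></div></li>
  <li><a href="/food">Food and Drinks</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching tag in document order.
li_tags = soup.find_all("li")

for li in li_tags:
    print(li)
```

Even in this tiny sample, menu entries come back alongside the movie, which is why filtering on the tag name alone is too broad.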

Upon closer inspection I can see that each movie is in an <li> tag with the class name “slide”. I also notice that within the <li> tag there’s a <div> tag. Within the <div> tag, there’s an <a> tag that has the name of the movie! In the example below, the movie is Skyscraper.

In my Python file, I comment out the original print statement that included all the <li> tags. I first find all the <li> tags with the specific class “slide” and call them slideElements. Then I loop through the slideElements to check whether the <a> tag has the attribute “aria-label”.
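The steps above can be sketched roughly like this. The SAMPLE_HTML string and the slide_titles helper are made up for illustration, and I am assuming the movie title is the value of the “aria-label” attribute; check this against what you see in the inspector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the structure described above:
# <li class="slide"> -> <div> -> <a aria-label="...">
SAMPLE_HTML = """
<ul>
  <li class="nav-item"><a href="/movies">Movies</a></li>
  <li class="slide"><div><a aria-label="Hotel Transylvania 3: Summer Vacation" href="#"></a></div></li>
  <li class="slide"><div><a aria-label="Skyscraper" href="#"></a></div></li>
  <li class="slide"><div><a href="#">No label here</a></div></li>
</ul>
"""


def slide_titles(soup):
    """Collect the aria-label of the first labelled <a> inside each slide."""
    titles = []
    # class_ (with the underscore) is how Beautiful Soup filters on the class attribute.
    slide_elements = soup.find_all("li", class_="slide")
    for slide in slide_elements:
        # attrs={"aria-label": True} matches only <a> tags that have the attribute.
        link = slide.find("a", attrs={"aria-label": True})
        if link is not None:
            titles.append(link["aria-label"])
    return titles


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
titles = slide_titles(soup)
for title in titles:
    print(title)
```

Checking for None before reading the attribute keeps the loop from crashing on a slide whose <a> tag has no “aria-label”.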

And when we run the file, here’s the result! Just what we wanted.
