Beautiful Soup is a Python library for scraping data by traversing the DOM tree. Today, I will only discuss how to scrape data from static websites. For websites that load their content dynamically, a suggested approach is to use Selenium together with Beautiful Soup.
In the terminal, I ran the following for Python 3:
pip3 install beautifulsoup4
pip3 install requests
pip3 install bs4
I chose to scrape the AMC theater website for demonstration purposes ONLY. The only data I will be trying to get is the title of each movie (e.g. Ant-Man and the Wasp, Hotel Transylvania 3, Skyscraper). Since this is for demonstration only, I don’t want to work with a lot of data, so I will only get the names of the movies in the slideshow on the main page. There are fewer than 20 movies shown.
In my file, I save the URL by writing the following code:
Line 5 of my code parses the page's HTML into a tree that Beautiful Soup can search.
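As a rough idea of what these two steps can look like, here is a minimal sketch, not the exact file. The URL, the helper name fetch_soup, and the tiny sample page are assumptions for illustration. The sample page is there so the parsing step can be tried without hitting the real site.

```python
import requests
from bs4 import BeautifulSoup

# Assumed address of the AMC home page (not taken from the original file).
URL = "https://www.amctheatres.com/"


def fetch_soup(url):
    """Download a page and parse it into a BeautifulSoup tree (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors such as 404 or 500
    # "html.parser" ships with Python, so no extra parser needs to be installed.
    return BeautifulSoup(response.text, "html.parser")


# Offline demonstration: parse a small hand-written page instead of the live site.
sample_html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")
print(soup.title.string)
```

To run it against the live page, you would call fetch_soup(URL) and use the returned object in place of soup.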
When I inspect Hotel Transylvania 3: Summer Vacation in the browser, I see that it is inside an <li> tag.
Just to check whether I get any data back, I want to print all the content inside <li> tags. Below you can see the overabundance of info. That’s not exactly what I want. Let’s get more specific.
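Printing every <li> tag can be sketched as below. The sample HTML is a made-up stand-in for the real page (an assumption), but it shows why the output is too broad: navigation items come back along with the movie slides.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (assumption for illustration).
sample_html = """
<ul>
  <li class="nav-item"><a href="/food">Food and Drinks</a></li>
  <li class="slide"><div><a aria-label="Skyscraper" href="#">Get Tickets</a></div></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns every matching tag in document order.
all_items = soup.find_all("li")
for item in all_items:
    print(item)
```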
Upon closer inspection, I can see that each movie is in an <li> tag with the class name “slide”. I also notice that within the <li> tag there’s a <div> tag. Within the <div> tag, there’s an <a> tag that has the name of the movie! In the example below, the movie is Skyscraper.
In my Python file, I comment out the original print statement that included all the <li> tags. I first find all the <li> tags with the specific class of “slide” and call them slideElements. Then I loop through all the slideElements and check whether the <a> tag inside each one has the attribute “aria-label”, which holds the movie title.
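A minimal sketch of that loop, again on made-up sample HTML rather than the live page (the markup and the titles list are assumptions for illustration). I use the snake_case name slide_elements here, which is the usual Python spelling of slideElements.

```python
from bs4 import BeautifulSoup

# Made-up markup that mirrors the structure described above (assumption).
sample_html = """
<ul>
  <li class="nav-item"><a href="/food">Food and Drinks</a></li>
  <li class="slide"><div><a aria-label="Skyscraper" href="#">Get Tickets</a></div></li>
  <li class="slide"><div><a aria-label="Hotel Transylvania 3: Summer Vacation" href="#">Get Tickets</a></div></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class_ has a trailing underscore because "class" is a reserved word in Python.
slide_elements = soup.find_all("li", class_="slide")

titles = []
for slide in slide_elements:
    link = slide.find("a")  # first <a> tag anywhere inside this slide
    if link is not None and link.has_attr("aria-label"):
        titles.append(link["aria-label"])

for title in titles:
    print(title)
```

Checking has_attr before indexing avoids a KeyError on any slide whose link does not carry an aria-label.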
And when we run the file, here’s the result! Just what we wanted.