Now that we know why we need web scraping, let us get started. Today we will scrape some very simple data using beginner-friendly Python libraries.
We will need two Python libraries, requests and bs4. First, let us install them using pip, and then import them.
You can install them together or one by one:
pip install requests
pip install bs4
They are already installed on my machine, so pip reports ‘Requirement already satisfied’. On yours, the installation might take a few seconds.
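After installing, the imports look like this. The package is imported as bs4, and the parser class we use from it is BeautifulSoup. Printing the versions is only an optional sanity check that both installs worked.

```python
# requests sends HTTP requests; bs4 provides the BeautifulSoup HTML parser
import requests
import bs4
from bs4 import BeautifulSoup

# Optional sanity check: both libraries import and report their versions
print("requests", requests.__version__)
print("beautifulsoup4", bs4.__version__)
```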
Let us say we want to build a program that extracts the number of Covid-19 cases from the worldometers.info website. For that, we need the URL of the page that has the data we want. We then use the requests library to send an HTTP request, and the website responds with the page's source code.
We save the source code in a variable and then use the bs4 library to parse it and extract the required data.
We use BeautifulSoup, a class in the bs4 library, which takes the source_code we received as a response from the website.
Now we will use selectors to find the data we are looking for.
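A minimal sketch of fetching and parsing the page. It assumes the Covid-19 page lives at https://www.worldometers.info/coronavirus/ and uses Python's built-in "html.parser"; the helper names fetch_source and make_soup are hypothetical, chosen for illustration. Fetching needs network access, so the actual calls are shown as comments.

```python
import requests
from bs4 import BeautifulSoup

# Assumed URL of the Covid-19 page on worldometers.info
URL = "https://www.worldometers.info/coronavirus/"


def fetch_source(url: str) -> str:
    """Hypothetical helper: send an HTTP GET request and return the HTML source code."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error for 4xx/5xx responses
    return response.text


def make_soup(source_code: str) -> BeautifulSoup:
    """Hypothetical helper: parse the HTML source into a searchable BeautifulSoup object."""
    return BeautifulSoup(source_code, "html.parser")


# Usage (needs network access):
# source_code = fetch_source(URL)
# soup = make_soup(source_code)
# print(soup.title.get_text())          # the page title
# print(soup.select_one("title").text)  # same element, found with a CSS selector
```

Passing "html.parser" explicitly avoids depending on an optional parser such as lxml and stops BeautifulSoup from warning that it had to guess a parser.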
Parsing the data
The total number of cases is easy to get: it appears in the page title, so we can read the title and split it to get the number.
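A minimal sketch of the title approach. The sample title is hypothetical: its wording and numbers are made up to illustrate a "Label: N Cases and ..." pattern, and the real page title may be worded differently, so adjust the split to match what you actually see. On a real page, the title string would come from soup.title.get_text().

```python
def cases_from_title(title: str) -> int:
    """Return the first number after the colon in a title shaped like
    'Some Label: 123,456 Cases and 7,890 Deaths - Site' (assumed format)."""
    # Keep only the text after the first colon: " 123,456 Cases and ..."
    after_colon = title.split(":", 1)[1]
    # The first whitespace-separated word is the case count, e.g. "123,456"
    number_text = after_colon.split()[0]
    # Drop the thousands separators before converting to an int
    return int(number_text.replace(",", ""))


# Hypothetical title with made-up numbers, for illustration only
sample_title = "Coronavirus Update (Live): 123,456,789 Cases and 1,234,567 Deaths - Example Site"
print(cases_from_title(sample_title))  # 123456789
```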
Now let us say we are interested in the number of recovered cases, which we cannot scrape from the title. So let us look for the data in the HTML source code, or inspect the element in the browser, and note the tag and class of the element that holds it.
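To make this concrete, here is a sketch that runs against a small, hypothetical HTML snippet instead of the live page. The class name maincounter-number and the numbers are assumptions for illustration; use the tag and class you actually see in the inspector. find() takes a tag name and, through the class_ keyword (named that way because class is a Python keyword), a CSS class.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup with three counters that share one tag and class.
# "maincounter-number" and the numbers are assumptions for illustration only.
SAMPLE_HTML = """
<div><h1>Coronavirus Cases:</h1><div class="maincounter-number"><span>1,000</span></div></div>
<div><h1>Deaths:</h1><div class="maincounter-number"><span>50</span></div></div>
<div><h1>Recovered:</h1><div class="maincounter-number"><span>900</span></div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() matches by tag name and class, and returns only the FIRST matching element
counter = soup.find("div", class_="maincounter-number")
value = counter.get_text(strip=True)
print(value)  # 1,000
```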
This is not what we wanted: it has returned the number of total cases. Looking at the source code, we can see that both elements have the same tag name and class, so the program simply gave us the first matching value it encountered.
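One way to confirm this diagnosis is to list every element with that tag and class using find_all(), again on a hypothetical snippet (made-up class name and numbers). The result contains several matches in document order, which is why find() stopped at the first one.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: several counters share one tag and class (assumed names/numbers)
SAMPLE_HTML = """
<div><h1>Coronavirus Cases:</h1><div class="maincounter-number"><span>1,000</span></div></div>
<div><h1>Deaths:</h1><div class="maincounter-number"><span>50</span></div></div>
<div><h1>Recovered:</h1><div class="maincounter-number"><span>900</span></div></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns EVERY matching element, in the order they appear in the document
counters = soup.find_all("div", class_="maincounter-number")
values = [counter.get_text(strip=True) for counter in counters]
print(len(counters), values)  # 3 ['1,000', '50', '900']
```

Which position holds a given counter depends on the real page's layout, so check the source before relying on an index.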
Now that you are familiar with scraping, it is your job to find out how to scrape the number of recovered cases. With the same approach, you can scrape many websites. Thank you, and happy scraping!