Hi Pythonistas!
Today we will learn how to scrape HTML tables from a web page with 3 lines of code. We have already published a post on web scraping; this time we are using pandas. Note that pandas.read_html needs an HTML parser installed, either lxml or BeautifulSoup4 together with html5lib. We are going to print the first table from this link: https://www.stackscale.com/blog/most-popular-programming-languages/. Let’s dive into the code.
Code
import pandas
dfs = pandas.read_html("https://www.stackscale.com/blog/most-popular-programming-languages/")
print(dfs[0])  # print only the first table; dfs is a list holding every table found on the page
Output
           0                            1                                       2
0   Position  PYPL ranking September 2022  Stack Overflow’s Developer Survey 2022
1         #1                       Python                              JavaScript
2         #2                         Java                                HTML/CSS
3         #3                   JavaScript                                     SQL
4         #4                           C#                                  Python
5         #5                        C/C++                              TypeScript
6         #6                          PHP                                    Java
7         #7                            R                              Bash/Shell
8         #8                   TypeScript                                      C#
9         #9                           Go                                     C++
10       #10                        Swift                                     PHP
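We can see that the data we printed matches the table on the website. Notice that the table's own header row shows up as row 0, because pandas has numbered the columns 0, 1, 2 instead of naming them. Now we can work with this data like any other DataFrame.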
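As a small sketch of what "working with this data" could look like, the snippet below promotes that first row to column names and then looks up where Python ranks in each list. To keep it runnable without a network connection, it rebuilds a few rows of the output by hand; the variable names raw, table, pypl_col and so_col are made up for this illustration, and in the real script raw would simply be dfs[0].

```python
import pandas

# A few rows copied from the printed output; in the real script this
# would be dfs[0] as returned by pandas.read_html (assumption for the demo).
raw = pandas.DataFrame([
    ["Position", "PYPL ranking September 2022", "Stack Overflow’s Developer Survey 2022"],
    ["#1", "Python", "JavaScript"],
    ["#2", "Java", "HTML/CSS"],
    ["#3", "JavaScript", "SQL"],
    ["#4", "C#", "Python"],
])

# Drop the first row from the data and use its values as column names.
table = raw.iloc[1:].reset_index(drop=True)
table.columns = raw.iloc[0].tolist()

# Example query: find the position of Python in each ranking.
pypl_col = "PYPL ranking September 2022"
so_col = "Stack Overflow’s Developer Survey 2022"
python_pypl = table.loc[table[pypl_col] == "Python", "Position"].iloc[0]
python_so = table.loc[table[so_col] == "Python", "Position"].iloc[0]

print(python_pypl, python_so)  # prints: #1 #4
```

If you would rather get named columns straight away, pandas.read_html also accepts header=0, which should treat the first row as the header while parsing.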
Hope this helps you a lot. Please share your valuable suggestions with afsal@parseltongue.co.in.
Happy New Year, everyone!