Read Tables from Any Webpage with 3 Lines of Code

Posted by Afsal on 30-Dec-2022

Hi Pythonistas!

Today we will learn how to scrape HTML tables from any page with 3 lines of code. We have already published a post on web scraping; please click here to visit it. This time we are using pandas, whose read_html function parses every <table> on a page into a DataFrame (it needs an HTML parser such as lxml installed). We are going to print the first table from https://www.stackscale.com/blog/most-popular-programming-languages/. Let’s dive into the code.

Code

import pandas
dfs = pandas.read_html("https://www.stackscale.com/blog/most-popular-programming-languages/")  # returns a list of DataFrames, one per <table>
print(dfs[0])  # print only the first table; the others are in dfs[1], dfs[2], ...
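To see what read_html actually returns without hitting the network, here is a minimal sketch that parses a small, made-up HTML string instead of a live page. It assumes pandas plus an HTML parser (lxml, or BeautifulSoup4 with html5lib) are installed, and the table contents are invented for illustration. The HTML is wrapped in StringIO because recent pandas versions deprecate passing literal HTML strings directly.

```python
from io import StringIO

import pandas as pd

# A small, made-up page with two tables, standing in for a real webpage.
html = """
<html><body>
  <table>
    <tr><th>Position</th><th>Language</th></tr>
    <tr><td>#1</td><td>Python</td></tr>
    <tr><td>#2</td><td>Java</td></tr>
  </table>
  <table>
    <tr><th>Year</th><th>Event</th></tr>
    <tr><td>2022</td><td>Example</td></tr>
  </table>
</body></html>
"""

# read_html parses every <table> it finds and returns a list of DataFrames.
tables = pd.read_html(StringIO(html))
print(len(tables))  # 2

# match= keeps only the tables whose text matches the given string or regex.
languages = pd.read_html(StringIO(html), match="Language")[0]
print(languages)
```

The match argument could be handy on pages with several tables, where you want a specific one rather than whichever happens to come first.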

Output

           0                            1                                       2
0   Position  PYPL ranking September 2022  Stack Overflow’s Developer Survey 2022
1         #1                       Python                              JavaScript
2         #2                         Java                                HTML/CSS
3         #3                   JavaScript                                     SQL
4         #4                           C#                                  Python
5         #5                        C/C++                              TypeScript
6         #6                          PHP                                    Java
7         #7                            R                              Bash/Shell
8         #8                   TypeScript                                      C#
9         #9                           Go                                     C++
10       #10                        Swift                                     PHP

The printed data matches the table on the website. From here we can work with it like any other pandas DataFrame.
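In the output above the column labels are just 0, 1 and 2, and the real headers sit in row 0. Here is a minimal sketch of one way to tidy that up. It uses a hand-copied subset of the rows so it runs without a network connection, and the query at the end is only an example of what you might do next.

```python
import pandas as pd

# A few rows copied from the output above; the header text ended up in row 0.
df = pd.DataFrame([
    ["Position", "PYPL ranking September 2022", "Stack Overflow’s Developer Survey 2022"],
    ["#1", "Python", "JavaScript"],
    ["#2", "Java", "HTML/CSS"],
    ["#3", "JavaScript", "SQL"],
])

# Promote the first row to column labels and drop it from the data.
df.columns = df.iloc[0]
df = df.iloc[1:].reset_index(drop=True)
df.columns.name = None

# Turn "#1", "#2", ... into integers so the ranking can be sorted or compared.
df["Position"] = df["Position"].str.lstrip("#").astype(int)

# Example query: where does the PYPL ranking place Python?
pypl_rank = df.loc[df["PYPL ranking September 2022"] == "Python", "Position"].item()
print(df)
print(pypl_rank)
```

Alternatively, passing header=0 to pandas.read_html should take care of the header step at read time.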

Hope this helps you a lot. Please share your valuable suggestions with afsal@parseltongue.co.in.

Happy New Year, everyone!