BeautifulSoup Web Scraping Using Python

 

Web Scraping With Python

Learn web scraping with Python: a practical introduction


What does it mean to scrape the web? 

Web scraping is the automated extraction of information and data from a website using bots. Unlike screen scraping, which copies only the pixels displayed onscreen, web scraping extracts the underlying HTML code and, with it, the data that the site serves from its database. The scraper can then reproduce that content elsewhere.

In this article on Web Scraping with Python, you'll learn the basics of web scraping and see a demonstration of how to extract data from a website.

What is the use of web scraping?

Web scraping is a technique for extracting large amounts of data from websites. It is not illegal in and of itself, but it should be done ethically, for example by respecting a site's terms of service and its robots.txt file. Done correctly, web scraping helps us make the most of the internet.

How do you scrape data from a website?

When you run the web scraping code, it sends a request to the URL you specified. The server sends the data in response to your request, allowing you to read the HTML or XML page. The code then parses the HTML or XML page, locating and extracting the data.

You must follow these simple steps to extract data using web scraping with Python:

  1. Locate the URL you want to scrape.
  2. Inspect the page.
  3. Locate the information you want to extract and write the code.
  4. Execute the code to get the results.
  5. Save the data in the appropriate format.
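
The five steps above can be sketched as one small script. This is a minimal sketch rather than a definitive implementation: the function names (fetch_html, extract_links, to_csv) and the sample HTML are hypothetical, and the code assumes the data you want sits in ordinary 'a' tags.

```python
import csv
import io

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Steps 1 and 4: request the page and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_links(html):
    """Step 3: parse the HTML and collect (text, href) pairs from 'a' tags."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (tag.get_text(strip=True), tag["href"])
        for tag in soup.find_all("a", href=True)
    ]


def to_csv(rows):
    """Step 5: serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "href"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample page, used here so the sketch runs without a network call.
SAMPLE_HTML = '<p><a href="/first">First</a> <a href="/second">Second</a></p>'

rows = extract_links(SAMPLE_HTML)
print(to_csv(rows))
```

To scrape a live page, pass the result of fetch_html(url) to extract_links instead of SAMPLE_HTML, and write the CSV text to a file.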

Let's move on to the code.

Python libraries we use for web scraping
  • BeautifulSoup
  • Requests
Let's take a look at how to install these libraries on our computer. We can install the BeautifulSoup library by typing the command

pip install beautifulsoup4

After installing that, we can install Requests by typing the following command

pip install requests

After installing the libraries, type the following commands in the Python console

from bs4 import BeautifulSoup
import requests

Next, let's look at how to fetch a URL. First, store the target URL in a variable

url = "yourtargeturl"

Then send a request to that URL and display the response

response = requests.get(url)

response

This produces output like the following

<Response [200]>

A status code of 200 means the request succeeded: the URL is valid and we are allowed to access the page.
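
Displaying the response works in the console, but a script usually checks the status in code. The sketch below shows what Requests provides for this: the status_code and ok attributes and the raise_for_status() method. To avoid a network call, it builds Response objects by hand and sets their status codes, which is purely for illustration; in real use the object comes from requests.get(url). The helper name describe is hypothetical.

```python
import requests


def describe(response):
    """Return a short, human-readable summary of an HTTP response."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    except requests.exceptions.HTTPError:
        return f"failed ({response.status_code})"
    return f"success ({response.status_code})"


# Hand-built responses for illustration only; normally use requests.get(url).
found = requests.Response()
found.status_code = 200

missing = requests.Response()
missing.status_code = 404

print(describe(found), found.ok)
print(describe(missing), missing.ok)
```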

Next, we want to extract the source code of the webpage. We can do that with the following command

source = response.text

source

We store the source code of the webpage in a variable called "source". Typing "source" in the console prints it.

To make it easy to navigate the data structure of the webpage, we pass the source code to BeautifulSoup. We can do it by typing the command

soup = BeautifulSoup(source, 'html.parser')
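
Once the soup object exists, the parsed page can be navigated like a tree. The sketch below uses a small hypothetical HTML string in place of a downloaded page and shows a few common BeautifulSoup accessors: soup.title, find(), find_all(), and get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the "source" variable from response.text.
source = """
<html>
  <head><title>Example listing</title></head>
  <body>
    <h1 id="main">Results</h1>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(source, "html.parser")

page_title = soup.title.string          # text inside the <title> tag
heading = soup.find("h1", id="main")    # first <h1> with id="main"
first_paragraph = soup.find("p")        # find() returns only the first match
paragraph_count = len(soup.find_all("p"))

print(page_title)
print(heading.get_text())
print(first_paragraph.get_text())
print(paragraph_count)
```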

tags = soup.find_all('a')

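This finds every 'a' (anchor) tag on the page. The loop below prints the href attribute of each tag, which is the address the link points to.

for tag in tags:
    print(tag.get('href'))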
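
Links on a page are often relative (for example "/about"), and some 'a' tags have no href at all, in which case tag.get('href') returns None. A minimal sketch of handling both cases, assuming a hypothetical base URL and sample HTML, uses urljoin from the standard library:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical values for illustration; use your own URL and response.text.
base_url = "https://example.com/articles/"
source = """
<a href="/about">About</a>
<a href="page2.html">Next</a>
<a href="https://other.example.org/">External</a>
<a name="top">No link here</a>
"""

soup = BeautifulSoup(source, "html.parser")

absolute_links = []
for tag in soup.find_all("a"):
    href = tag.get("href")  # None when the tag has no href attribute
    if href is None:
        continue
    # urljoin resolves relative paths against the page's own URL.
    absolute_links.append(urljoin(base_url, href))

for link in absolute_links:
    print(link)
```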

Let's look at scraping titles

titles = soup.find_all("a", {"class": "result-title"})

for title in titles:
    print(title.text)

This finds every 'a' tag whose class is 'result-title' and prints its text.
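
The same lookup can be written in a few equivalent ways. The sketch below, run against a hypothetical HTML snippet, compares the attribute dictionary used above with the class_ keyword argument and with a CSS selector passed to select().

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; the class name matches the example above.
source = """
<ul>
  <li><a class="result-title" href="/item/1">Red bicycle</a></li>
  <li><a class="result-title featured" href="/item/2">Oak desk</a></li>
  <li><a class="nav" href="/next">Next page</a></li>
</ul>
"""

soup = BeautifulSoup(source, "html.parser")

by_dict = soup.find_all("a", {"class": "result-title"})
by_keyword = soup.find_all("a", class_="result-title")  # "class" is a reserved word
by_selector = soup.select("a.result-title")             # CSS selector syntax

titles = [tag.get_text(strip=True) for tag in by_dict]
print(titles)
```

All three forms also match a tag that carries additional classes, such as the second link in the sample.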
