Web Scraping using Python – Scraping Unsplash Photos

In this tutorial, we will learn how to scrape websites using Python and the Selenium module. The same script and technique can be adapted to scrape most websites. The script works for all pages on unsplash.com.

In the following sections we will write a Python script that scrapes the download links of the first 10 photos from a given category on Unsplash and stores them in a text file.

What is Web Scraping?

Web Scraping Procedure

Web scraping a page involves fetching it and then extracting data from it. Fetching is the downloading of a page, which is what a browser does when you view the page. Web crawling, which fetches pages for later processing, is therefore a main component of web scraping. Once a page is fetched, extraction can take place: its content may be parsed, searched, reformatted, copied into a spreadsheet, and so on. Web scrapers typically take something out of a page to use it for another purpose somewhere else. An example would be finding and copying names and phone numbers, or companies and their URLs, to a list (contact scraping).
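
The two stages can be sketched with nothing but the Python standard library. This is a minimal illustration and not the Selenium approach used in the rest of this tutorial; the names LinkCollector, fetch and extract_links, and the sample markup, are made up for the example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def fetch(url):
    """Fetching: download the raw HTML of a page."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html):
    """Extraction: pull the link text and target out of the HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical page content, standing in for the result of fetch(url)
sample_html = (
    '<p><a href="https://example.com">Example Co</a> and '
    '<a href="https://example.org">Example Org</a></p>'
)
print(extract_links(sample_html))
```

Here fetch covers the downloading step and extract_links covers the extraction step. Pages that build their content with JavaScript generally need a real browser instead, which is why the rest of this post uses Selenium.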


Prerequisites

  1. Selenium. If it is not installed, run “pip install selenium” to install the latest version.
  2. The Firefox browser.
  3. geckodriver.exe, placed in the folder where you are writing the Python script. Selenium needs this driver to control Firefox.
  4. Python 3.x.x

If you don’t have geckodriver yet, download it from the official repository: https://github.com/mozilla/geckodriver/releases
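
Before going further, a quick check like the one below can confirm that the driver is where the script expects it. This is only a sketch: the function name find_geckodriver is made up, and it assumes the driver is named geckodriver.exe (Windows) or geckodriver (other systems).

```python
import shutil
from pathlib import Path


def find_geckodriver(folder="."):
    """Return the path to geckodriver if it is in `folder` or on PATH, else None."""
    for name in ("geckodriver.exe", "geckodriver"):
        candidate = Path(folder) / name
        if candidate.is_file():
            return str(candidate)
    # Fall back to whatever the system PATH provides
    return shutil.which("geckodriver")


location = find_geckodriver()
print(location if location else "geckodriver not found; download it from the releases page")
```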

The folder should contain your Python script alongside geckodriver.exe.

Checking the Configuration

Download the full configuration from my GitHub account.

Copy and run the following code:

from selenium import webdriver

browser = webdriver.Firefox()
url = "https://unsplash.com/search/photos/mountains/"
browser.get(url)

If you run into any errors, please comment below. I will be happy to help. 😁

If everything went well, a Firefox window will open and load the given URL.

Basic Web Scraping Script Output

Planning Our Script

Before we start, go to the website and inspect the page source. You will notice that every download link has title = “Download photo”. We will use this to separate the download links from the other links on the page. This will be our flow for developing the script:

  1. Search for all ‘a’ tags.
  2. Filter the tags having title = “Download photo”.
  3. Save the links in a text file
  4. Voila!! We are done
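
The flow above can be tried out on a small piece of markup before involving the browser. This sketch uses the standard-library HTML parser instead of Selenium; the class DownloadLinkParser, the function save_links, the file name sample_links.txt and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class DownloadLinkParser(HTMLParser):
    """Steps 1 and 2: visit every <a> tag and keep the 'Download photo' ones."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("title") == "Download photo" and attributes.get("href"):
            self.links.append(attributes["href"])


def save_links(html, path):
    """Step 3: write one matching link per line to a text file."""
    parser = DownloadLinkParser()
    parser.feed(html)
    with open(path, "w") as link_file:
        for link in parser.links:
            print(link, file=link_file)
    return parser.links


# Made-up markup for illustration; the real page is far larger
sample_html = """
<a href="/photos/abc" title="View photo">view</a>
<a href="https://example.com/photos/abc/download" title="Download photo">download</a>
<a href="https://example.com/photos/xyz/download" title="Download photo">download</a>
"""
print(save_links(sample_html, "sample_links.txt"))
```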

Writing Our Script

Download the full configuration from my GitHub account.

Code

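from selenium import webdriver
from selenium.webdriver.common.by import By


def view_webpage(link_file):
    """Write the href of every 'Download photo' link on the page to link_file."""
    try:
        # Recent Selenium releases dropped find_elements_by_tag_name,
        # so we use find_elements with a By locator instead
        anchors = browser.find_elements(By.TAG_NAME, 'a')
    except Exception as err:
        print('Could not read the page:', err)
        return
    for elem in anchors:
        try:
            if elem.get_attribute('title') == 'Download photo':
                print(elem.get_attribute('href'), file=link_file)
        except Exception:
            # The element may have changed while the page was updating
            print("No data in Element")


browser = webdriver.Firefox()
search_term = "mountains/"
url = "https://unsplash.com/search/photos/" + search_term
browser.get(url)

# Open the file in append mode; the with block closes it and saves it to disk
with open("links.txt", mode="a+") as link_file:
    view_webpage(link_file)

# Close the browser once the links are saved
browser.quit()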
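
The script above fetches every ‘a’ tag and checks the title in Python. A possible alternative, offered here as a sketch and not as part of the original script, is to let the browser do the filtering with a CSS attribute selector and stop after a fixed number of links. The function name collect_download_links and the limit parameter are made up for this example; the string "css selector" is the value behind Selenium's By.CSS_SELECTOR constant, so By.CSS_SELECTOR can be passed instead.

```python
def collect_download_links(browser, limit=10):
    """Return at most `limit` hrefs from links titled 'Download photo'."""
    # "css selector" is the locator strategy string behind By.CSS_SELECTOR
    anchors = browser.find_elements("css selector", 'a[title="Download photo"]')
    hrefs = (anchor.get_attribute("href") for anchor in anchors)
    # Skip anchors without an href and keep only the first `limit` links
    return [href for href in hrefs if href][:limit]
```

With the browser object from the script, collect_download_links(browser) would return a list of links that can then be written to the text file one per line.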

Output

Voila!! It worked. The links are now saved in links.txt.
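
Because the script opens links.txt in append mode, running it more than once adds the same links to the file again. Below is a small sketch for reading the file back without duplicates, assuming one link per line; the function name unique_links is made up for this example.

```python
def unique_links(path="links.txt"):
    """Return the links in the file without duplicates or blank lines, in order."""
    seen = set()
    result = []
    with open(path) as link_file:
        for line in link_file:
            link = line.strip()
            if link and link not in seen:
                seen.add(link)
                result.append(link)
    return result
```

Calling unique_links() after a few runs of the script returns each download link once, in the order it was first saved.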


Stay tuned for my upcoming blog post at pyblog.in for the improved version of the script. The new script will let you download as many photos as you want and will support multi-threading.

If you get stuck anywhere, feel free to comment below. I will be happy to help. 😁

This blog post is for educational purposes only.
