Deploy containers globally in a few clicks

This Engineering Education (EngEd) Program is supported by Section. Instantly deploy containers across multiple cloud providers all around the globe.

Try It For Free

How to Build an Amazon Price Tracker using Python, Selenium, and Twilio

January 5, 2022

Checking regularly if the price of a particular product has gone down can be a bit tiring. This tutorial will guide readers on how to build an automated price tracker in Python.

We will use Selenium and BeautifulSoup libraries to retrieve the prices of a particular searched product and then compare it with the price that you expect the item to fall below.

If any of the prices are in your expected range, Twilio will send an SMS to notify you that there are products in your price range. The SMS will also contain the number of products in your price range.

We will also need a job that will run our script daily. Heroku scheduler enhances this process. This tutorial will show you how to set up a scheduled job with the Heroku scheduler. You can find the code on GitHub.

Prerequisites

To follow along with this tutorial, you will need:

  • Python installation.
  • A free Twilio account.
  • A basic understanding of web scraping.

Introduction to Selenium and Beautiful Soup

Selenium is an excellent tool for browser automation, automation testing, web scraping, as well as interacting with web pages. For example, it allows a program to interact with a browser, and scrap a web page. We will use Selenium combined with Beautiful Soup for web scraping.

Beautiful Soup is a Python package for parsing HTML and XML documents. We will use it to scrape a list of shoes on Amazon. Selenium web driver uses a real web browser to access a website. This activity simulates an ordinary user browsing instead of a bot. This is beneficial because some websites restrict unauthenticated users from web scraping activities.

Searching for a product on Amazon

The first step is to install Selenium, Webdriver manager, and Beautiful Soup using the below command:

$ pip install selenium beautifulsoup4 webdriver-manager

Next, create a Python file for your code and paste the imports below.

from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium import webdriver

If you search for a product on Amazon, you will notice that the search term is always embedded into the site’s URL. We can use this information to create a function that will search for air force 1. This is how the search link for air force 1 will appear in the browser: https://www.amazon.com/s?k=air+force+1&ref=nb_sb_noss_2

Let’s write a function to generate a URL for the search:

def get_url(search_text):
"""Generate a url from search text"""

    url = f"https://www.amazon.com/s?k={search_text}&ref=nb_sb_noss_1"

    # add page query for pagination

    url += "&page{}"

    return url

Extracting price from search results

To extract the price, we need to right-click on the Amazon search results page, and then click on Inspect and navigate to the Elements tab. Look for an HTML tag that is unique to each of the items. This will ease the extraction process.

When you click on a tag in the Elements tab, you will see that it highlights portions of the web page where the tag is located. We have to look even deeper to find the tag and element that uniquely identifies the prices. Based on the available fields, it appears that the tag <data-component-type> with its value of s-search-result is a good option to identify the item.

If you go even further into the tag, you will see a <span> tag that displays the prices, as shown here. We will use this attribute in our code.

To extract the prices, add the following code to your project:

def extract_record(single_item):
"""Extract and return data from a single item in the search"""

# because some products don't have prices you have to

# use try-except block to catch AttributeError

    try:

# Get product prices from page HTML

        price_parent = single_item.find(span, a-price)

        price = price_parent.find(span, a-offscreen).text

    except AttributeError:

        return

    return price

Retrieving all prices in the search results

To fetch all the prices, we will dive into the HTML elements again and retrieve the tag <data-component-type> with its value of "s-search-result".

Next, paste the code below.

def main(search_term, max_price):
"""This function will accept the search term and the maximum price you are expecting the product to be"""

# startup the webdriver

    options=Options()
    
  options.headless = True #choose if we want the web browser to be open when doing the crawling 
  driver = webdriver.Chrome('/home/muhammed/Desktop/dev/blog-repo/twilioXseleniumXpython/chromedriver',options=options)
    prices_list=[] # this will hold the list of prices

    url = get_url(search_term) # takes the search term to get_url() function above.

    for page in range(1, 5):
    
    """For loop to get each item in the first 5 pages of the search"""

        driver.get(url.format(page))
        soup = BeautifulSoup(driver.page_source, "html.parser") #retrieve and parse HTML text.
        results = soup.find_all("div", {"data-component-type": "s-search-result"}) #get all the attributes of each item

        for item in results:
            record = extract_record(item) #takes each item to extract_record() function above to get the prices
            if item:
                prices_list.append(item)

The above code will go through the first 5 pages of the search results and fetch the prices.

Fetching prices that meet the specified budget

To retrieve prices that are less than or equal to the budget, you have to use a loop statement. The above code will display prices. However, those figures have characters such as $ and , which makes it impossible to do a comparison.

To eliminate these symbols, add the following code in the main() function:

new_prices= [s.replace("$", "").replace(",","") for s in prices_list] # goes through the `price_list` and eliminate the symbols
new=[] #this will contain the new list of prices ready for comparison.

prices_float = [float(i) for i in new_prices]  # converts the value from a string to float so that the comparison can be done

for i in prices_float:

“””For loop to handle the comparison”””

    if i <= max_price:

        new.append(i)

Adding Twilio Programmable SMS

Access your Twilio credentials

Your Account SID and Auth Token will enable you to connect to the Twilio API. These credentials can be found on Twilio’s console page.

Note: Your account SID and Auth Token must always be hidden!

Handling the alert

In this section, we will need the Twilio package for Python which allows one to use the Twilio Programmable SMS API to send and receive SMS messages.

Run the command below to install the Twilio package locally:

$ pip install twilio

Next, import the Twilio client using the following code:

from twilio.rest import Client

Next, navigate to the Twilio console page and get your Twilio phone number.

Then, paste the below code in to the main() function:

client.messages.create(

# To send SMS to mobile phone

to="+2348888888", # Your phone number, don’t forget to add the country code. This should be hidden

from_="+14688888", # Your Twilio phone number. This should be hidden

body=f"There are {len(new)} air force 1s within budget, ${max_price}" # message that will be sent to your mobile phone

)

Testing

To test the application, just input your price range, the item you want to search, then run the program and it will do the rest. Add the values that will be passed in the main("air force 1", 300) function at the end of your code and then run it.

An SMS will be sent to your phone.

Note that I built this project using a free Twilio phone number that’s why it has Sent from Twilio trial account in the text.

Adding a scheduled job

At the moment, we have to constantly go to the terminal to run the script which can be a bit annoying. Well, in this section I will show you how to create a scheduled job with Heroku so that Heroku can run your script automatically.

To do this, follow the steps below:

Create an account with Heroku if you don’t have one already. Then install the Heroku-CLI on your local machine.

Next, create a requirements.txt file. This is where we will put a list of all the dependencies required to run the project. In the requirements.txt file, paste the statement below.

selenium==4.1.0
beautifulsoup4==4.10.0
webdriver-manager==3.5.2
twilio==7.3.2
bs4==0.0.1

Then, create another file runtime.txt where you will specify your Python’s version, as shown below:

python-3.8.10

Navigate to Heroku Dashboard and create an app for the project by clicking the New button on the dashboard page. Remember to give it a name. Next, log in to Heroku-CLI with $heroku login then initialize a git repository for your project.

Link the project to a remote repository on Heroku. You can do that using the following commands:

$ git init
$ heroku git:remote -a <name-of-your-heroku-app>

Navigate to the Heroku dashboard and click on your app then on the “Add buildpack” button. Next, press the Python button to add it to the build pack. You will need to add buildpacks for the Chromedriver and Headless Google Chrome.

Click on the “Add buildpack” button then paste the following links and save changes:

The buildpacks are:

build-pack.png

We can now publish the application by committing code to the repository and deploying it to Heroku using Git.

$ git add .
$ git commit -m "initial commit"
$ git push heroku master

If it runs successfully, you will see the following output in your terminal:

heroku-push.png

On your terminal, run $ heroku run bash. This command allows us to work with the project that we just deployed on Heroku. To check if you are on the right track, you can proceed to run the script.

To add the scheduler, on the Heroku Dashboard, click on the Resources tab then press on the Find more add-ons button:

scheduler

Then press on Install Heroku Scheduler and follow the prompts to complete the process.

install-scheduler

Once you have added the Scheduler, click on it then on the Create job button. You will be directed to a form where you will fill in the time you want your code to execute and also the actual command that should be executed.

heroku-job

Conclusion

In this tutorial, you have learned how to scrape prices of items from Amazon, as well as how to use Twilio Programmable SMS. We also built software that alerts you in case certain prices drop. It counts the number of products that are within your price range and sends an alert to your phone daily.

Hopefully, you should be able to integrate the knowledge gained from this tutorial into your future projects.

Happy coding!


Peer Review Contributions by: Wanja Mike