Tested Python PHP Login Script & HTML Parser

This week’s sample code is a simple Python PHP login script and HTML parser (all in one) that you can expand beyond its initial purpose into a fully functional program whenever needed!

Attachment Content

The zip file you can download at the end of this post contains one file:

  1. stackoverflow-login-and-parse.py: A script to log into Stackoverflow and print the user’s avatar image address.

Before You Start Using The Script

Prior to its first run, you need to add your user’s credentials to the code.
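The attached script isn’t reproduced in this post, so the lines below are only a hypothetical sketch of where credentials typically go: two constants near the top of the file. The names EMAIL and PASSWORD, and the optional SO_EMAIL / SO_PASSWORD environment variables, are assumptions, not necessarily what stackoverflow-login-and-parse.py uses.

```python
import os

# Hypothetical names: the attached script may call these something else.
# Either type your Stackoverflow login in place of the placeholders, or set
# the SO_EMAIL / SO_PASSWORD environment variables before running.
EMAIL = os.environ.get("SO_EMAIL", "you@example.com")
PASSWORD = os.environ.get("SO_PASSWORD", "your-password")

# Warn if the placeholders were left untouched.
if EMAIL == "you@example.com" or PASSWORD == "your-password":
    print("Fill in EMAIL and PASSWORD before running the script.")
```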

If you don’t have an account yet, you can create one on the sign-up page and set your avatar via Gravatar.

General Overview

The goal is to demonstrate how you can create a PHP login script that doubles as an HTML parser.

To keep things as simple as possible, the script sends a POST request containing all the necessary login data to Stackoverflow’s website.

The second part searches for the user’s avatar image to show that the user has been authenticated. In this example, the code yields one result, which is displayed in the console running the sample.

PHP Login Script

In order to interact with a URL, the sample uses Python’s requests module. You can find its documentation on the Requests project website.

What’s nice about this module is that once you start a session, you can request any page you want with either a GET or a POST method, passing all the necessary parameters just as a browser would. Because the session keeps the cookies the website sets, a successful login carries over to every request made afterwards.
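As a minimal sketch of that idea (not taken from the attached script), the snippet below creates a requests.Session, sets a cookie by hand to stand in for the one a real login response would set, and then prepares a GET and a POST without sending them, so you can see the session attach the cookie automatically. The cookie name acct and both URLs are hypothetical.

```python
import requests

# A Session reuses cookies and default headers across requests, which is
# what lets a login carry over to every page fetched afterwards.
session = requests.Session()

# Stand-in for the cookie a real login response would set.
# The cookie name "acct" is hypothetical.
session.cookies.set("acct", "example-token", domain="stackoverflow.com")

# Prepare (but do not send) a GET with query parameters and a POST with
# form data, to inspect what the session would put on the wire.
get_req = session.prepare_request(
    requests.Request("GET", "https://stackoverflow.com/questions", params={"tab": "newest"})
)
post_req = session.prepare_request(
    requests.Request("POST", "https://stackoverflow.com/search", data={"q": "beautifulsoup"})
)

print(get_req.url)                # https://stackoverflow.com/questions?tab=newest
print(get_req.headers["Cookie"])  # acct=example-token
print(post_req.body)              # q=beautifulsoup

# In a real run you would call session.get(...) / session.post(...),
# which prepare and send the request in one step.
```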

Simple as it may seem, it can be tricky to figure out what data to send in the POST request, and to which URL.

The easiest way to be sure you are sending the request to the correct address, with the correct data, is to use a network analysis tool: it lets you inspect the POST request your browser sends and copy its details into your program. A step-by-step example for Google Chrome follows.

Getting the POST data

Using Google Chrome, you can open the network analysis tool by pressing F12. Select the Network tab (number 1 in the next image), start recording (either by clicking the circle or by pressing CTRL-E, number 2) and enable Preserve Log so you don’t miss anything if the page reloads or gets redirected (number 3).

Now that all traffic is being captured, fill in the user’s credentials and click the Log In button. From that point on, every request Chrome makes is logged, so it’s just a matter of finding the POST request that carries the credentials you entered. In our case, it’s the request named Login (number 1 in the next image). Go to its Headers tab (number 2) and you’ll immediately see the URL you’ll use in your program (number 3).

Finally, scroll down to see all the data being sent. POST parameters are key-value pairs, so in Python you just need to put all the relevant information inside a dictionary. In this example, you only need the email and password keys (labeled Data in the next image), but keep in mind that every website can have its own implementation.
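Turned into code, that step could look like the sketch below. It assumes the login URL is https://stackoverflow.com/users/login and that the form fields are named email and password, as in this example; confirm both against your own capture, since sites change them. The helper names login_payload and login are hypothetical. The last lines only prepare the request, without sending it, so you can inspect the form-encoded body requests builds from the dictionary.

```python
import requests

# Assumption: URL and field names as seen in the Network tab for this example.
LOGIN_URL = "https://stackoverflow.com/users/login"


def login_payload(email, password):
    """Form fields the login POST carries (names as captured in DevTools)."""
    return {"email": email, "password": password}


def login(session, email, password):
    """POST the credentials through the session so the login cookies are kept."""
    response = session.post(LOGIN_URL, data=login_payload(email, password))
    response.raise_for_status()
    return response


# Offline check: prepare the same request without sending it.
prepared = requests.Request(
    "POST", LOGIN_URL, data=login_payload("user@example.com", "s3cret")
).prepare()
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # email=user%40example.com&password=s3cret
```

With real credentials, calling login(requests.Session(), EMAIL, PASSWORD) performs the POST. On many sites a failed login still returns a 200 status, which is why checking the page content afterwards, as the script does with the avatar, is a more reliable confirmation.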

This is all the information you need here, but some websites require more than one GET/POST request to log in. Once you get the hang of the network analysis tool, it’s just a matter of writing code that follows the same steps your browser does!
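A common example of such an extra step is a login form that embeds a hidden token (for instance, a CSRF token) that must be posted back together with the credentials. The sketch below is not part of the attached script: it reads the hidden fields with beautifulsoup4, the parser described in the next section, and merges them with the credentials. The field name fkey, the helper names and the sample HTML are all hypothetical; use whatever hidden fields your own capture shows.

```python
from bs4 import BeautifulSoup


def hidden_fields(html):
    """Return name -> value for every <input type="hidden"> on the page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["name"]: tag.get("value", "")
        for tag in soup.find_all("input", attrs={"type": "hidden"})
        if tag.has_attr("name")
    }


def build_payload(login_page_html, email, password):
    """Hidden fields from the login page plus the user's credentials."""
    payload = hidden_fields(login_page_html)
    payload.update({"email": email, "password": password})
    return payload


# Hypothetical login page; "fkey" stands in for whatever token your site uses.
sample_page = """
<form action="/users/login" method="post">
  <input type="hidden" name="fkey" value="abc123">
  <input type="email" name="email">
  <input type="password" name="password">
</form>
"""

print(build_payload(sample_page, "user@example.com", "s3cret"))
# {'fkey': 'abc123', 'email': 'user@example.com', 'password': 's3cret'}

# Two-step flow with a requests.Session (not run here):
#   page = session.get(LOGIN_URL)
#   session.post(LOGIN_URL, data=build_payload(page.text, EMAIL, PASSWORD))
```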

HTML Parser

Now that your program has logged in, it can access any page your account can. To confirm that the login worked, the script looks for your user’s avatar image.

Unlike my Perl sample, which used regular expressions to read a file, the script’s HTML parser uses the beautifulsoup4 module to simplify analyzing the entire HTML page. You can find its documentation on the Beautiful Soup website.

This module is really amazing! It lets you search an entire HTML page for specific tags and gives you access to every attribute of each tag it finds.

For the image search, the code iterates through all img tags using a for loop. In each iteration, it looks for the word avatar in the tag’s class attribute and, if found, prints the tag’s src attribute.
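As a small illustration on a hypothetical HTML fragment (not a real Stackoverflow page), find_all searches the whole document for the requested tags, and each tag’s attrs holds all of its attributes as a dictionary. Note that Beautiful Soup treats class as a multi-valued attribute and returns it as a list of class names rather than a single string.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of a logged-in page.
html = """
<div class="topbar">
  <a href="/users/1/me" title="Profile">me</a>
  <img class="avatar small" src="https://example.com/me.png" alt="me">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() searches the whole document; here, for <a> and <img> tags.
for tag in soup.find_all(["a", "img"]):
    # tag.attrs maps every attribute name of the tag to its value.
    print(tag.name, sorted(tag.attrs))

img = soup.find("img")
print(img["src"])          # https://example.com/me.png
print(list(img["class"]))  # ['avatar', 'small']  (class is multi-valued)
```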
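A sketch of that loop, written against a hypothetical HTML fragment rather than a live Stackoverflow page, is shown below; avatar_urls is a hypothetical helper name, not necessarily what the attached script calls it. Because class comes back as a list of class names, the check looks for the word avatar inside each name, so a class such as s-avatar--image (an assumed example) also matches. On the real site the class names may differ, so adjust the search term to what the page actually uses.

```python
from bs4 import BeautifulSoup


def avatar_urls(html):
    """Return the src of every <img> whose class mentions 'avatar'."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        # class is multi-valued, so Beautiful Soup gives a list of names;
        # tags without a class attribute fall back to an empty list.
        classes = img.get("class", [])
        if any("avatar" in name for name in classes) and img.has_attr("src"):
            urls.append(img["src"])
    return urls


# Hypothetical page fragment; a real run would pass session.get(...).text.
page = """
<img class="logo" src="/logo.svg">
<img class="s-avatar--image" src="https://www.gravatar.com/avatar/abc?s=48">
<img src="/spacer.gif">
"""

for url in avatar_urls(page):
    print(url)  # https://www.gravatar.com/avatar/abc?s=48
```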

It’s that easy!

Final Words

The attached Python PHP login script and HTML parser may be used and modified as you wish, except for commercial use.

If you need some help, don’t forget to leave your questions in the comments. Good luck!

Download Attachments
