WordPress plugin downloader

Recently I wrote some code to compile a list of all WordPress plugins in the plugin directory on wordpress.org. A second script then fetches each plugin's zip file and extracts it into a directory. I thought this might be useful to some of you, so I'm sharing the source code.

Code

Disclaimer: This is really quick & dirty code. You should neither write code like this nor try to learn something from it ;)

The source code can be cloned from GitHub. The requirements are minimal:

  • Python3
  • Requests module (pip install requests)

The repo contains the following two scripts:

wp-plugins.py

This script walks through the plugin tag pages and compiles a list of all plugins linked from them. The result is a file called plugins.txt containing the following information for each plugin:

  • Name
  • URL
  • Install count

Each line holds one plugin, with the fields separated by ||, for example: NAME||URL||INSTALLCOUNT
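If you want to process plugins.txt further in Python, each line can be split back into its three fields. The sketch below is illustrative and not part of the repository: it assumes one plugin per line, and the names Plugin, parse_line and read_plugins are made up for this example. The install count is kept as raw text because its exact format isn't specified here.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Plugin:
    name: str
    url: str
    installs: str  # raw text; the install-count format is not specified


def parse_line(line: str) -> Plugin:
    """Split one NAME||URL||INSTALLCOUNT line into its three fields."""
    # Split from the right so a stray "||" inside a name stays part of the name.
    name, url, installs = line.rstrip("\n").rsplit("||", 2)
    return Plugin(name.strip(), url.strip(), installs.strip())


def read_plugins(path: str) -> Iterator[Plugin]:
    """Yield a Plugin for every non-empty line of a plugins.txt-style file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield parse_line(line)
```

For example, `for p in read_plugins("plugins.txt"): print(p.name, p.installs)` prints every plugin with its install count. Since plugins.txt contains duplicates, wrap the results in a `set()` if you need each plugin only once (the dataclass is frozen, so instances are hashable).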

wp-download.py

This script reads plugin URLs, one per line, from a file called urls.txt, then downloads each plugin and extracts it into the folder downloaded/. To create urls.txt from plugins.txt, you can use the following one-liner:

cut -d"|" -f 3 plugins.txt | sort -u > urls.txt

or the cleaner approach:

grep -oP '(?<=\|\|).*(?=\|\|)' plugins.txt | sort | uniq > urls.txt
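The download step itself can be sketched in a few lines. This is not the code of wp-download.py, just a minimal version of the same idea under two assumptions: the entries in urls.txt are plugin pages of the form https://wordpress.org/plugins/<slug>/, and https://downloads.wordpress.org/plugin/<slug>.zip serves the latest release of each plugin. It uses the standard library's urllib.request instead of Requests so it runs without extra dependencies; the function names are hypothetical.

```python
import io
import os
import urllib.request
import zipfile
from urllib.parse import urlparse

# Assumption: latest-release archives live at <DOWNLOAD_BASE><slug>.zip
DOWNLOAD_BASE = "https://downloads.wordpress.org/plugin/"


def plugin_slug(page_url: str) -> str:
    """Return the last non-empty path segment, e.g. 'akismet'."""
    parts = [p for p in urlparse(page_url.strip()).path.split("/") if p]
    if not parts:
        raise ValueError(f"no slug in URL: {page_url!r}")
    return parts[-1]


def zip_url(slug: str) -> str:
    """Build the (assumed) download URL for a plugin slug."""
    return f"{DOWNLOAD_BASE}{slug}.zip"


def extract_zip(data: bytes, dest: str) -> list[str]:
    """Extract an in-memory zip archive into dest and return its member names."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
        return zf.namelist()


def download_all(urls_file: str = "urls.txt", dest: str = "downloaded") -> None:
    """Download and extract every plugin listed in urls_file."""
    os.makedirs(dest, exist_ok=True)
    with open(urls_file, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            slug = plugin_slug(line)
            try:
                with urllib.request.urlopen(zip_url(slug), timeout=60) as resp:
                    extract_zip(resp.read(), dest)
            except (OSError, zipfile.BadZipFile) as exc:
                # Keep going: one missing or broken plugin should not stop the run.
                print(f"skipping {slug}: {exc}")


if __name__ == "__main__":
    download_all()
```

Plugin archives usually contain a single top-level folder named after the slug, so each plugin should end up in its own subdirectory of downloaded/. URLError and HTTPError are subclasses of OSError, so network failures are caught by the same handler.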

Another method

The advantage of these two scripts is that they download the latest version of each plugin. However, there's probably an easier way to download all WordPress plugins: directly from the SVN repository.

It might even be a better idea to compile the list of plugin URLs from the index page instead of scraping all the tag pages. But that's left as an exercise for the reader.
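One possible starting point for that exercise is sketched below. It assumes the index page in question is the root listing of the plugin SVN repository at https://plugins.svn.wordpress.org/, that this listing is plain HTML with one `<li><a href="slug/">` entry per plugin, and that each slug corresponds to a page at https://wordpress.org/plugins/<slug>/. All three are assumptions, and the function names are hypothetical. The SVN listing may also include plugins that have since been closed, so its count can differ from the directory's.

```python
import re
import urllib.request

# Assumption: root of the plugin SVN repository, one directory per plugin slug.
SVN_ROOT = "https://plugins.svn.wordpress.org/"

# Assumption: listing entries look like <li><a href="akismet/">akismet/</a></li>
_ENTRY = re.compile(r'<li><a href="([^"/]+)/">')


def slugs_from_index(html: str) -> list[str]:
    """Extract plugin slugs from the directory listing, skipping the parent link."""
    return [slug for slug in _ENTRY.findall(html) if slug != ".."]


def plugin_page_url(slug: str) -> str:
    """Build a plugin page URL in the (assumed) shape used by urls.txt."""
    return f"https://wordpress.org/plugins/{slug}/"


def write_urls(path: str = "urls.txt") -> int:
    """Fetch the index, write one plugin URL per line and return the count."""
    with urllib.request.urlopen(SVN_ROOT, timeout=300) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    slugs = sorted(set(slugs_from_index(html)))
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(plugin_page_url(s) + "\n" for s in slugs)
    return len(slugs)


if __name__ == "__main__":
    print(write_urls(), "plugin URLs written")
```

Because the output already contains unique, sorted URLs, the sort/uniq step from the one-liners isn't needed here.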

Numbers

The deduplicated list of URLs is around 30,000 lines long. I read somewhere that there are around 40,000 plugins in total, so the tag pages apparently miss some tags or plugins. Judging from the plugins downloaded so far, I estimate the total disk space at around 40 to 50 GB.
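As a sanity check, 40 to 50 GB spread over about 30,000 plugins works out to roughly 1.3 to 1.7 MB per plugin. One hypothetical way to arrive at such an estimate is to average the size of the plugins downloaded so far and extrapolate. The sketch assumes each plugin sits in its own top-level directory under downloaded/; the function names are made up for this example.

```python
import os


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def estimate_total_bytes(dest: str, total_plugins: int) -> float:
    """Extrapolate the average size of the plugin folders in dest to total_plugins."""
    plugins = [entry for entry in os.scandir(dest) if entry.is_dir()]
    if not plugins:
        raise ValueError("nothing downloaded yet")
    used = sum(dir_size(entry.path) for entry in plugins)
    return used / len(plugins) * total_plugins
```

For example, `estimate_total_bytes("downloaded", 30000) / 1e9` gives the projected total in (decimal) gigabytes. The estimate is only as good as the sample: if the plugins downloaded first are unusually large or small, the projection is off accordingly.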

Data

I've included some *.example data files in the repository:

  • plugins.txt.example: Output of wp-plugins.py. It contains duplicates.
  • popular.txt.example: Output of a modified version of wp-plugins.py, listing the most popular plugins.
  • urls.txt.example: List of unique URLs of WordPress plugins.

Analysing the plugins

Since you have the PHP source code in the downloaded/ folder, you can do some static code analysis. I didn't dig into that topic too much, but I found grepbugs.com very useful. By running some SQL-injection greps over the whole folder, I quickly identified a bunch of vulnerabilities.
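The rules used by grepbugs.com aren't reproduced here. As a rough, hypothetical illustration of what such a grep looks for, the sketch below flags lines in .php files where a `$wpdb` query method is called on the same line as a request superglobal such as `$_GET`, and skips lines that go through `$wpdb->prepare(`. It is a crude line-based heuristic with both false positives and false negatives, so every hit needs manual review; the pattern and function name are made up for this example.

```python
import os
import re

# Hypothetical heuristic (not grepbugs' rule set): a $wpdb query method and a
# request superglobal on the same line.
SQLI = re.compile(
    r"\$wpdb->(?:query|get_results|get_row|get_var|get_col)\s*\(.*"
    r"\$_(?:GET|POST|REQUEST|COOKIE)"
)


def grep_sqli(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every suspicious line below root."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    # Lines using prepare() are skipped as (probably) parameterized.
                    if SQLI.search(line) and "prepare(" not in line:
                        hits.append((path, lineno, line.strip()))
    return hits
```

Usage: `for path, lineno, text in grep_sqli("downloaded"): print(f"{path}:{lineno}: {text}")`. Input that is copied into a variable first and used in a query later won't be caught, which is one reason dedicated analyzers go further than single-line greps.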

There's also a static code analyzer for PHP called RIPS, but I didn't test it.

Another interesting tool is Sgrep, which seems to make grepping code easier, but my first attempt to compile it failed with errors and I'm not in the mood to debug that right now.
