Recently I wrote some code to compile a list of all WordPress plugins listed in the plugin directory on wordpress.org. A second script then fetches each plugin's zip file and extracts it into a directory. I thought this might be useful to some of you, so I'm sharing the source code.
Disclaimer: This is really quick & dirty code. You should neither write code like this nor try to learn something from it ;)
The source code can be cloned from GitHub. The requirements are minimal:
- Requests module (pip install requests)
The repo contains the following two scripts:
The first script (wp-plugins.py) goes through the list of plugin tags and compiles a list of the plugins linked from each tag page. The result is a file called plugins.txt with the following information about each plugin:
- Install count
All entries are separated by ||. For example:
The second script reads plugin URLs line by line from a file urls.txt, then downloads and extracts each plugin to the folder downloaded/. To create urls.txt, you can use the following one-liner:
cut -d"|" -f 3 plugins.txt | sort | uniq > urls.txt
or the cleaner approach:
grep -oP '(?<=\|\|).*(?=\|\|)' plugins.txt | sort | uniq > urls.txt
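If you'd rather stay in Python, the same extraction can be sketched as below. This is only a rough equivalent of the one-liners, and it assumes the URL is the second of the ||-separated fields (which is what the shell commands imply). The function names unique_urls and write_urls are made up for this example and are not part of the repository.

```python
def unique_urls(lines, sep="||", url_index=1):
    """Return the sorted, de-duplicated URL column of plugins.txt-style lines.

    Assumption: the URL is the second ||-separated field (index 1).
    """
    urls = set()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= url_index:
            continue  # skip lines that do not have enough fields
        url = fields[url_index].strip()
        if url:
            urls.add(url)
    return sorted(urls)


def write_urls(src="plugins.txt", dst="urls.txt"):
    """Read src, write one unique URL per line to dst, return the URL count."""
    with open(src, encoding="utf-8") as fh:
        urls = unique_urls(fh)
    with open(dst, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")
    return len(urls)
```

Calling write_urls() in the directory that contains plugins.txt should produce a urls.txt comparable to the one from the shell commands.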
The advantage of the two scripts is that they download the latest version of each plugin. However, there's probably an easier and more convenient way to download all WordPress plugins: directly from the SVN repository.
It might even be a better idea to compile the list of plugin URLs from the index page instead of scraping all the tag pages. But that's left as an exercise for the reader.
The unique list of URLs is around 30k lines long. I read somewhere that there are around 40,000 plugins in total, so the tag pages miss some tags or plugins. Judging from what I saw while downloading, I estimate the total disk space needed at around 40-50 GB.
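For reference, the download-and-extract step can be sketched roughly like this. It is not the code from the repository: it uses only the standard library instead of Requests so that it is self-contained, the function names are made up, and it assumes you already have a direct URL to a plugin's zip file (how the real script gets from a plugin URL to the zip is not shown here).

```python
import io
import os
import urllib.request
import zipfile


def extract_zip_bytes(data, dest_dir="downloaded"):
    """Extract an in-memory zip archive into dest_dir, skipping unsafe paths."""
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            target = os.path.realpath(os.path.join(dest_root, member))
            # Refuse entries that would end up outside dest_dir ("zip slip").
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            archive.extract(member, dest_root)
            extracted.append(member)
    return extracted


def download_plugin(zip_url, dest_dir="downloaded", timeout=30):
    """Fetch a zip file from zip_url (assumed to be a direct link) and extract it."""
    with urllib.request.urlopen(zip_url, timeout=timeout) as response:
        data = response.read()
    return extract_zip_bytes(data, dest_dir)
```

Keeping the extraction separate from the download makes it easy to try the unpacking on a local zip file before hammering the server with 30k requests.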
I've included some *.example data files in the repository:
- plugins.txt.example: Output of wp-plugins.py. It contains duplicates.
- popular.txt.example: Output of a modified wp-plugins.py script with the most popular plugins.
- urls.txt.example: List of unique URLs to WordPress plugins.
Analysing the plugins
Since you have the PHP source code in the downloaded/ folder, you can do some static code analysis. I didn't dig into that topic too much, but I found grepbugs.com to be very useful. Using some SQL injection greps on the whole folder, I quickly identified a bunch of vulnerabilities.
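To give an idea of what such a grep looks like, here is a small Python sketch that walks the folder and flags lines where a $wpdb query method is called with a superglobal on the same line. The pattern is my own illustration, not one taken from grepbugs.com, and scan_php is a made-up name. It will produce false positives and miss plenty of real issues, so treat the hits as candidates for manual review.

```python
import os
import re

# Illustrative pattern only: a $wpdb query call that mentions a superglobal
# ($_GET, $_POST, $_REQUEST, $_COOKIE) on the same line.
PATTERN = re.compile(
    r"\$wpdb->(?:query|get_results|get_row|get_var|get_col)\s*\("
    r".*\$_(?:GET|POST|REQUEST|COOKIE)\b"
)


def scan_php(root="downloaded"):
    """Return (path, line number, line) for every matching line in *.php files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            # Plugins use all kinds of encodings, so replace undecodable bytes.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if PATTERN.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits
```

Looping over scan_php("downloaded") and printing each tuple gives output similar to grep -rn.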
There's also a static code analyzer for PHP called RIPS, but I didn't test it.
Another interesting tool is Sgrep, which seems to make grepping code easier, but my first attempt at compiling it failed with some errors and I'm not in the mood to debug this right now.