Generating random, alphanumeric, profanity-free codes using pthreads in PHP

A friend of mine recently forwarded me an offer he received for the generation of 50 million random codes: alphanumeric, 10 characters long, unique and not containing any profane words. Price tag: same as a brand new mid-range car. Lol. Hold my beer, I’ll do this. 🙂

Not long after I wrote the first line, the SQLite table was filling with codes in a breeze... until I cranked up the profanity check from a couple of test bad words to a real-world list of around 2,500 of the finest German swear words. That absolutely killed the performance, while CPU utilization did not even hit the 30% mark. Being too lazy to rewrite everything in, e.g., Java, I started looking at ways to bring multi-threading to PHP. This led me to pthreads, a project providing multi-threading based on POSIX threads. Motivation follows action, action follows laziness and voilà: the code generator is now able to utilize all available processing power. Combined with a few tweaks to the bad word dictionary, this dramatically reduced the time needed to finish the job. A test run on my old i7 4something took two and a half hours (using this English profanity list and requiring a minimum Shannon entropy of, theoretically, 2.2 bits per character).
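To make the moving parts concrete, here is a minimal single-threaded sketch of the generate-and-filter loop, not the project's actual code. The function names, the 62-character alphabet and the case-insensitive substring check are assumptions for illustration; only random_int(), the 10-character length and the 2.2-bit entropy threshold come from the text above.

```php
<?php
// Hypothetical helpers; alphabet and matching strategy are assumptions.

// Build one random code from the alphabet using the CSPRNG-backed random_int().
function randomCode(
    int $length = 10,
    string $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
): string {
    $maxIndex = strlen($alphabet) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, $maxIndex)];
    }
    return $code;
}

// Shannon entropy in bits per character, based on character frequencies.
function shannonEntropy(string $text): float
{
    $length = strlen($text);
    if ($length === 0) {
        return 0.0;
    }
    $entropy = 0.0;
    foreach (count_chars($text, 1) as $count) {
        $p = $count / $length;
        $entropy -= $p * log($p, 2);
    }
    return $entropy;
}

// Case-insensitive substring check against every entry of the bad word list.
function containsBadWord(string $code, array $badWords): bool
{
    foreach ($badWords as $word) {
        if ($word !== '' && stripos($code, $word) !== false) {
            return true;
        }
    }
    return false;
}

// Keep drawing until a code passes both the profanity and the entropy check.
function generateCleanCode(array $badWords, float $minEntropy = 2.2): string
{
    do {
        $code = randomCode();
    } while (containsBadWord($code, $badWords) || shannonEntropy($code) < $minEntropy);
    return $code;
}
```

The inner loop of containsBadWord() runs once per dictionary entry for every candidate code, which is why a list of a few thousand words dominates the run time and why spreading the work across threads pays off.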

The whole project and its output can be downloaded below. Make sure to install pthreads first. The script is configured in Config.php. Also note that pthreads projects can only be run via the CLI.

A few lessons learned:

* Use a language with native multi-threading in the first place when you need to solve highly repetitive tasks.
* Use random_int() instead of rand(). rand() does not generate cryptographically secure values and will quickly lead to duplicate codes.
* Create objects that need to be passed into a pthreads worker in the calling context and keep a reference to them. Objects created inside a thread's constructor will be destroyed to avoid memory issues.
* Combining multiple SQL INSERTs into one transaction takes far less time than inserting rows one by one.
* Having an idea of the statistical probability of hitting a duplicate code or an unwanted word helps balance the effort spent on avoiding them. Keep in mind that every constraint makes a code easier to guess.
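Two of these points can be sketched in a few lines. The snippet assumes the pdo_sqlite extension, a hypothetical table named codes and a 62-character alphabet; none of these are confirmed by the project itself. insertCodes() wraps a whole batch in one transaction, and duplicateProbability() uses the usual birthday approximation for drawing uniformly from a fixed code space.

```php
<?php
// Hypothetical helpers; table name and schema are assumptions.

// Insert a batch of codes inside a single transaction.
// INSERT OR IGNORE lets the PRIMARY KEY silently drop duplicates.
function insertCodes(PDO $db, array $codes): void
{
    $statement = $db->prepare('INSERT OR IGNORE INTO codes (code) VALUES (:code)');
    $db->beginTransaction();
    try {
        foreach ($codes as $code) {
            $statement->execute([':code' => $code]);
        }
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}

// Birthday approximation: chance of at least one duplicate when drawing
// $count codes uniformly from $spaceSize possible codes.
function duplicateProbability(float $count, float $spaceSize): float
{
    return 1.0 - exp(-$count * ($count - 1.0) / (2.0 * $spaceSize));
}

// Usage with an in-memory database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE codes (code TEXT PRIMARY KEY)');
insertCodes($db, ['aB3dE5gH7j', 'Zz9yX8wV7u', 'aB3dE5gH7j']); // third entry is a duplicate
echo 'Rows stored: ', $db->query('SELECT COUNT(*) FROM codes')->fetchColumn(), PHP_EOL;
printf("Duplicate chance: %.4f%%\n", 100 * duplicateProbability(5e7, 62.0 ** 10));
```

Under those assumptions, 50 million unconstrained 10-character codes have roughly a 0.15% chance of containing even a single duplicate, so a uniqueness constraint in the database is cheap insurance and does not need to be a performance concern.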

DOWNLOAD: CodeGenerator Project
DOWNLOAD: 50 million codes (1.1 GB, zipped)

The easiest way to send basic HTTP POST or GET requests using PHP

The easiest way to send basic HTTP POST or GET requests is to use PHP’s built-in file_get_contents() function in conjunction with HTTP context options.
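A minimal sketch of that approach could look like the following. The function names, the 30-second timeout and the form-encoded POST body are assumptions for illustration, and allow_url_fopen must be enabled for file_get_contents() to open URLs.

```php
<?php
// Hypothetical helpers; names, timeout and encoding are assumptions.

// Build the options array expected by stream_context_create().
function buildHttpContextOptions(string $method, array $params = [], array $headers = []): array
{
    $method = strtoupper($method);
    $http = ['method' => $method, 'timeout' => 30];
    if ($method === 'POST') {
        // Send parameters as a form-encoded request body.
        $headers[] = 'Content-Type: application/x-www-form-urlencoded';
        $http['content'] = http_build_query($params);
    }
    $http['header'] = implode("\r\n", $headers);
    return ['http' => $http];
}

// Send a GET or POST request and return the response body.
function httpRequest(string $method, string $url, array $params = [], array $headers = []): string
{
    if (strtoupper($method) === 'GET' && $params !== []) {
        // GET parameters go into the query string.
        $separator = strpos($url, '?') === false ? '?' : '&';
        $url .= $separator . http_build_query($params);
    }
    $context = stream_context_create(buildHttpContextOptions($method, $params, $headers));
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Request to $url failed");
    }
    return $body;
}

// Example call (placeholder URL):
// echo httpRequest('POST', 'https://example.com/api', ['name' => 'value'], ['Accept: application/json']);
```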

Further reading:

file_get_contents
HTTP context options

Stepscout: A tool to search for jobs and apartments at the same time

I think living near one’s workplace is a great benefit for a good work-life balance. If you have been living in your current city for a while, you probably already know where the best places to live and work are. But if you are starting to search for one or the other, you might ask yourself which neighborhood is best for keeping the daily commute to a minimum. To answer this question I built Stepscout. It leverages the Stepstone and ImmobilienScout24 APIs to find jobs and apartments in one go and marks them together on a Google Map. This visualization helps to find job clusters, apartment clusters or, ideally, job-apartment clusters.

The trick in this project was to use Google’s Places API to look up the latitude and longitude of every company in the Stepstone response. Even though Stepstone’s response JSON contains fields for geographic coordinates, they are mostly empty or filled with generic values.

If you are living in Germany, you can check out the tool here or get a first impression below.

Boilerplate for a basic PHP cURL POST or GET request with parameters on Apache

cURL is a library for transferring data using various protocols – in this case most importantly HTTP POST and GET. PHP installed on a Linux distribution or as part of XAMPP uses libcurl. If you haven’t enabled cURL yet, open your php.ini and remove the semicolon at the beginning of this line:

You will find the location of your php.ini in the output’s first line when running

on the command line or by using the XAMPP control panel on a Windows machine. Click the ‘Config’ button next to the Apache module and select ‘PHP (php.ini)’ from the context menu. Save the changes and restart Apache – either by pressing ‘Stop’ & ‘Start’ on the XAMPP control panel or by using the Linux command line:

If cURL for PHP isn’t installed, run

 prior to the step above.

You’ll find further information on how to use cURL here: http://php.net/manual/en/book.curl.php
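Both questions (which php.ini is in use, and whether cURL is actually loaded) can also be answered from PHP itself. This is a small sketch with a hypothetical function name; note that the CLI and Apache may load different php.ini files, so run it in the environment you care about.

```php
<?php
// Hypothetical helper: report the loaded php.ini and the cURL status.
function curlStatus(): array
{
    $iniFile = php_ini_loaded_file(); // path as string, or false if none is loaded
    $loaded = extension_loaded('curl');
    return [
        'ini' => $iniFile === false ? null : $iniFile,
        'curl' => $loaded,
        // curl_version() only exists when the extension is loaded.
        'libcurl' => $loaded ? curl_version()['version'] : null,
    ];
}

$status = curlStatus();
echo 'Loaded php.ini: ', $status['ini'] ?? '(none)', PHP_EOL;
echo 'cURL extension: ', $status['curl'] ? 'enabled' : 'not enabled', PHP_EOL;
if ($status['libcurl'] !== null) {
    echo 'libcurl version: ', $status['libcurl'], PHP_EOL;
}
```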

This boilerplate wraps cURL in a simple function with four parameters: request type, url, parameters and headers. The first snippet contains comments for every step. The second snippet is exactly the same code but without any comments.
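As a rough illustration of such a wrapper, here is a minimal sketch with the same four parameters. It is not the original boilerplate: the function names, the 30-second timeout and the form-encoded POST body are assumptions.

```php
<?php
// Hypothetical helpers; names, timeout and encoding are assumptions.

// Translate request type, URL, parameters and headers into cURL options.
function buildCurlOptions(string $method, string $url, array $params = [], array $headers = []): array
{
    $method = strtoupper($method);
    $query = http_build_query($params);
    $options = [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_HTTPHEADER => $headers,   // e.g. ['Accept: application/json']
        CURLOPT_TIMEOUT => 30,
    ];
    if ($method === 'POST') {
        $options[CURLOPT_URL] = $url;
        $options[CURLOPT_POST] = true;
        $options[CURLOPT_POSTFIELDS] = $query; // form-encoded body
    } elseif ($method === 'GET') {
        // Append parameters to the query string.
        $separator = strpos($url, '?') === false ? '?' : '&';
        $options[CURLOPT_URL] = $query === '' ? $url : $url . $separator . $query;
    } else {
        throw new InvalidArgumentException("Unsupported request type: $method");
    }
    return $options;
}

// Execute the request and return the response body.
function curlRequest(string $method, string $url, array $params = [], array $headers = []): string
{
    $handle = curl_init();
    curl_setopt_array($handle, buildCurlOptions($method, $url, $params, $headers));
    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($handle));
    }
    return $body;
}

// Example call (placeholder URL):
// echo curlRequest('GET', 'https://example.com/api', ['q' => 'php'], ['Accept: application/json']);
```

Keeping the option building separate from the execution makes the request setup easy to inspect without touching the network.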

Commented boilerplate:

 

And the raw template without comments: