I’m pretty proud of this one since is probably the first piece of code that I’m sharing.
I recently needed to implement stemming for a search angine in multiple languages. Stemming means reducing the derivated words in a language’s dictionary to the same form (ie. install, installing, installed would all become “instal”). It’s not something very strict; you just apply a set of rules so you match words as better as possible, withouth mixing different meanings on the same “stem”.
I started looking for implementations for each language until I reached German: none found (probably I should’ve searched in german).
I knew about the Porter stemming algorithm and so, I used its algorithm for the German language to create this class.
There is also a PHPUnit test that passes but that doesn’t mean the stemming is perfect, is just accurate “enough” – it can’t be perfect!