Hyphenate texts with PHP

Hyphenation is not widely used on the internet.

But sometimes you need it in browser-based applications.

Just think of a PDF file created on the fly by an application in which the text is hyphenated in very strange ways or not at all, all because the algorithm may hyphenate English text one way or another but certainly not German, French or other languages.

But it is possible with a bit of LaTeX (don’t worry, you need no knowledge of it whatsoever).

A long time ago, Franklin Mark Liang wrote a thesis titled “Word Hy-phen-a-tion by Com-put-er”, and the algorithm it describes is the one I have adapted to PHP.

I would not have stumbled upon it without the code from Mathias Nater <mnater@mac.com>, who adapted the algorithm for JavaScript.

So here you can get a hyphenation algorithm for PHP based on the TeX hyphenation files.
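For the curious, here is a minimal sketch of the idea behind Liang's pattern matching. It is not the code of the Hyphenator itself: the function names `parsePattern` and `hyphenateWord` are made up for this illustration, the handful of patterns is only a tiny sample, and the sketch assumes plain ASCII words. Each pattern carries digits between its letters, the word is wrapped in dots to mark its boundaries, the highest digit wins at every position, and an odd digit allows a break.

```php
<?php
// Tiny illustrative pattern set; real TeX pattern files hold thousands
// of entries per language.
$patterns = ['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n'];

/**
 * Split a pattern such as "hy3ph" into its letters ("hyph") and the digit
 * in front of each letter plus one after the last: [0, 0, 3, 0, 0].
 */
function parsePattern(string $pattern): array
{
    $letters = '';
    $points = [0];
    foreach (str_split($pattern) as $char) {
        if (ctype_digit($char)) {
            $points[count($points) - 1] = (int) $char;
        } else {
            $letters .= $char;
            $points[] = 0;
        }
    }
    return [$letters, $points];
}

/**
 * Hyphenate a single ASCII word (hypothetical helper, not the Hyphenator API).
 */
function hyphenateWord(
    string $word,
    array $patterns,
    string $hyphen = '-',
    int $leftMin = 2,
    int $rightMin = 2
): string {
    // Dots mark the word boundaries, as in the TeX pattern files.
    $padded = '.' . strtolower($word) . '.';
    // $scores[$j] belongs to the gap in front of character $j of $padded.
    $scores = array_fill(0, strlen($padded) + 1, 0);

    foreach ($patterns as $pattern) {
        [$letters, $points] = parsePattern($pattern);
        $offset = 0;
        // Visit every occurrence of the pattern's letters in the word.
        while (($pos = strpos($padded, $letters, $offset)) !== false) {
            foreach ($points as $i => $value) {
                $scores[$pos + $i] = max($scores[$pos + $i], $value);
            }
            $offset = $pos + 1;
        }
    }

    $result = '';
    $length = strlen($word);
    for ($i = 0; $i < $length; $i++) {
        // An odd score allows a break in front of letter $i, as long as
        // enough letters remain on both sides.
        if ($i >= $leftMin && $length - $i >= $rightMin && $scores[$i + 1] % 2 === 1) {
            $result .= $hyphen;
        }
        $result .= $word[$i];
    }
    return $result;
}

echo hyphenateWord('hyphenation', $patterns), "\n"; // prints "hy-phen-ation"
```

With only these nine patterns the break before "tion" is suppressed by the even 4 in "hena4"; a full pattern file may well decide differently.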

Basically, the Hyphenator is used as shown in the following example.

include_once 'Org/Heigl/Hyphenator.php';

// Create an instance for the locale of your choice.
// Note that a hyphenation file has to be present
// in the folder /Org/Heigl/Hyphenator/parsedFiles/
// for that locale!
$hyphenator = Org_Heigl_Hyphenator::getInstance('de_DE');

// Which character shall be used as the hyphenation character.
// This defaults to the soft hyphen (character code 173).
$hyphenator->setHyphen('-')
    // How many characters have to stay to the right of the
    // hyphenation character.
    ->setRightMin(2)
    // Which characters are treated in a special way.
    ->setSpecialChars('äüöß');

$string = 'This is the String you want to be hyphenated';

$hyphenatedString = $hyphenator->hyphenate($string);

Alternatively you can simply use

$hyphenatedString = Org_Heigl_Hyphenator::parse ( $string );

2 thoughts on “Hyphenate texts with PHP”

  1. I like this very much and would like to use it on my page. Unfortunately, I experience a problem with HTML tags in texts sometimes being destroyed by the hyphenator. Is there a way to prevent this?

    1. Currently the Hyphenator expects plain text to hyphenate. Its aim is to hyphenate texts for PDF creation. So there is no way of preventing hyphenation of HTML tags other than setting the thresholds so high that even ‘textarea’ will not be hyphenated. But that can only be a workaround.

      I could think of a way to provide a ‘markup-hyphenation’-method of some kind in a later version of the Hyphenator.

      But on the other hand you could have a look at http://code.google.com/p/hyphenator, which can apply hyphenation to HTML pages (and is the inspiration for this Hyphenator).
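Until something like that exists, one possible workaround outside the Hyphenator is to split the markup at its tags and pass only the text in between to the hyphenation call. The following is a rough sketch under that assumption: `hyphenateOutsideTags` is a made-up helper, not part of the Hyphenator, and the simple regular expression will not cope with a `>` inside attribute values, with script or style contents, or with HTML entities.

```php
<?php
/**
 * Apply $hyphenate only to the text between tags and leave the tags alone.
 * $hyphenate is any callable that takes a string and returns a string.
 * (Hypothetical helper for illustration, not part of the Hyphenator.)
 */
function hyphenateOutsideTags(string $html, callable $hyphenate): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured tags in the result, so
    // text chunks sit at even indexes and tags at odd indexes.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $index => $part) {
        if ($index % 2 === 0 && $part !== '') {
            $parts[$index] = $hyphenate($part);
        }
    }
    return implode('', $parts);
}

// Stand-in for the real hyphenation call, used here only to make the
// effect visible without needing the library or its pattern files.
$standIn = function (string $text): string {
    return strtoupper($text);
};

echo hyphenateOutsideTags('<textarea name="x">some text</textarea>', $standIn), "\n";
// prints "<textarea name="x">SOME TEXT</textarea>"
```

With the Hyphenator from the post, the callable could simply wrap `$hyphenator->hyphenate($text)`.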
