Recently, I've found myself having to generate dynamic report pages in PDF format using PHP. Working with PDF files has been a pain in the ass at best, so when the time came to wrap PDF reports into the new version of my company's software, I decided to build my own PDF class to encapsulate another library. The basic library choices were Zend_Pdf, since we are already using Zend Framework, or FPDF, which is what the current version uses. I ended up choosing Zend_Pdf since we got it for free without further includes, but honestly the underlying library doesn't matter much, so long as it provides a means of drawing images and text.
So, the main problem faced when generating PDF reports is layout. That is to say, there is no layout management in PDF libraries. What you are given is a blank canvas to draw on. There are no niceties like tables or container classes that you get in HTML. Since I was laying out very tabular data, I really wanted some way to manage the layout in a generic way without having to code for each specific report.
To achieve this effect, I built essentially four classes: PDF, Table, Row, Column. They exist in the following relationships: a PDF consists of tables. Tables consist of rows. Rows consist of columns. Columns have text inside them. Tables store an array of rows, rows store an array of columns, and columns store their individual text and options. By implementing iterators for tables and rows, it is very easy to loop over the structure to place the rows and columns. To make your classes iterable, have them implement IteratorAggregate and provide a getIterator function that returns an ArrayIterator built from your inner array:
class Pdf_Row implements IteratorAggregate
{
    ...
    /**
     * Allows for iteration over individual columns
     *
     * @return Iterator
     */
    public function getIterator()
    {
        return new ArrayIterator($this->_cols);
    }
    ...
}
And you can then put your rows in a foreach loop to loop over your columns.
By having addRow() return the newly created row, and addCol() return $this, you get a nice fluent API. To populate a new table, all you need to do is this:
$table->addRow()
->addCol('Column 1')
->addCol('Column 2');
To add a second row, simply repeat that process. This allows PDF generation in a style already familiar to all HTML developers, the table. There is a ton of customization that could be done here, basically implementation of all the same options you get with an HTML table, but for my own purposes, all I really needed was the ability to make text wrap, which I will describe a little later. So, I now have a table structure I can loop over and do the layout. But I still just have a blank canvas, so I'm going to have to employ some math.
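To make the structure concrete, here is a minimal sketch of how the model classes could look. Only Pdf_Row and its getIterator() come from the code above; Pdf_Table, Pdf_Column, countCols(), and every method body here are assumptions for illustration, not my production code.

```php
<?php
// Minimal sketch of the model classes. Pdf_Table, Pdf_Column, and the method
// bodies are assumptions for illustration.

class Pdf_Column
{
    private $_text;
    private $_bold;

    public function __construct($text, $bold = false)
    {
        $this->_text = (string) $text;
        $this->_bold = (bool) $bold;
    }

    public function getText() { return $this->_text; }
    public function isBold()  { return $this->_bold; }
}

class Pdf_Row implements IteratorAggregate
{
    private $_cols = array();

    // Returns $this so addCol() calls can be chained.
    public function addCol($text, $bold = false)
    {
        $this->_cols[] = new Pdf_Column($text, $bold);
        return $this;
    }

    public function countCols() { return count($this->_cols); }

    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->_cols);
    }
}

class Pdf_Table implements IteratorAggregate
{
    private $_rows = array();

    // Returns the new row (not the table) so addCol() can follow directly.
    public function addRow()
    {
        $row = new Pdf_Row();
        $this->_rows[] = $row;
        return $row;
    }

    // Highest column count among all rows; 0 for an empty table.
    public function getMaxCols()
    {
        $max = 0;
        foreach ($this->_rows as $row) {
            $max = max($max, $row->countCols());
        }
        return $max;
    }

    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->_rows);
    }
}

$table = new Pdf_Table();
$table->addRow()->addCol('Column 1')->addCol('Column 2');
$table->addRow()->addCol('Only one');

// Nested foreach works because both classes implement IteratorAggregate.
foreach ($table as $row) {
    foreach ($row as $col) {
        echo $col->getText(), "\n";
    }
}
```

The one design point worth noting is that addRow() hands back the row rather than the table, which is what lets the addCol() calls hang directly off it.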
We need to know a few things to do the layout:
For each table, what is the highest column count for an individual row?
What size paper are we using? (Usually A4, but not always).
We are working under the assumption that we are building reports on paper, so we will want margins around the page. How big do we want our margins?
What is the size of the font we are using?
Once we have these, we can begin to lay out our table. We will use Zend_Pdf's drawText function to put our text on the page. One thing I found odd when working with Zend_Pdf was that the upper left corner is at (0, $pageHeight), as opposed to (0, 0). What this means is that, in order to traverse down the page, we have to subtract from the line pointer (tend towards 0) instead of adding to it. Another thing to realize is that Zend_Pdf uses points for its units, not pixels, so all measurements have to be converted into points.
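The "subtract to move down" idea is easier to see with numbers. This is a small sketch, not part of Zend_Pdf: inchesToPoints() and lineY() are hypothetical helpers I made up for illustration, and the page height is the A4 value in points.

```php
<?php
// Sketch only: these function names are hypothetical, not Zend_Pdf API.

// There are 72 points per inch.
function inchesToPoints($inches)
{
    return $inches * 72;
}

// Y coordinate of the Nth text line (0-based) when the origin is at the
// bottom-left: start at the top margin and subtract one line height per line.
function lineY($pageHeight, $margin, $lineHeight, $lineIndex)
{
    return $pageHeight - $margin - ($lineHeight * $lineIndex);
}

$pageHeight = 842;                // A4 height in points
$margin     = inchesToPoints(1);  // 72
$fontSize   = 10;

echo lineY($pageHeight, $margin, $fontSize, 0), "\n"; // 770
echo lineY($pageHeight, $margin, $fontSize, 1), "\n"; // 760
```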
So let's begin our layout. We will be using A4 paper size, Times_Roman font, size 10.
We start by creating a Zend_Pdf object, which is the base class we will be working with.
// Create the PDF object.
$pdf = new Zend_Pdf();
We want a 1 inch margin around the page. There are 72 points per inch, so the default margin is 72 points.
if(empty($this->_margin)) {
    $this->setMargin(72);
}
Initialize the page counter to zero, and create the first page.
// First page
$currentPage = 0;
$pdf->pages[$currentPage] = $pdf->newPage($this->_paperSize);
Now, we need to track the cursor, so we store it in $currentHeight. As mentioned above, the top left corner is not (0,0), but is actually (0, 842). The bottom right corner is at (595, 0).
// Move pointer to the top of the page.
$currentHeight = $this->_maxHeight;
Here I am adding a header image to the very top of the page. Think of it like putting your company's letterhead on. $this->_headerImage contains the path to the image to load.
if(!empty($this->_headerImage)) {
$image = Zend_Pdf_Image::imageWithPath($this->_headerImage);
As mentioned earlier, all units are in points, and we retrieve the image dimensions in pixels, so we need to convert them. Since there are 72 points per inch, and roughly 96 pixels per inch on a typical screen, we get a points-to-pixels ratio of 72:96, which is 3:4, or 0.75. Multiplying a pixel measurement by 0.75 therefore gives us points.
// Convert from pixels to points.
$height = $image->getPixelHeight() * 0.75;
$width = $image->getPixelWidth() * 0.75;
To place the image, I wanted it centered on the X axis, so I had to find two things: the width of the image and the total width of the page. Centering is achieved by finding the difference, and adding half of that distance to the left side of the image as a margin. So if the page is 100x100pts, and the image is 50x50pts, half the difference is 25pts, so the image would be centered at the top from (25,100) to (75,50).
// Parameters go in: Left, Bottom, Right, Top : X1, Y2, X2, Y1
// The offset is how far to shift the image right from 0 to achieve centering on the X axis.
$offset = ($this->_maxWidth - $width) / 2;
$x1 = $offset + 0; $y1 = $this->_maxHeight - $this->_margin;
$x2 = $offset + $width; $y2 = $y1 - $height;
// Draw the header.
$pdf->pages[$currentPage]->drawImage($image, $x1, $y2, $x2, $y1);
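The conversion and centering arithmetic can be checked on its own, without a PDF library. This is a rough sketch: pixelsToPoints() and centeredBox() are hypothetical helper names, and 96 pixels per inch is the same rough assumption as above, not something read from the image file.

```php
<?php
// Sketch with hypothetical helper names; 96 pixels per inch is an assumption.

function pixelsToPoints($pixels, $pixelsPerInch = 96)
{
    return $pixels * 72 / $pixelsPerInch;
}

// Returns array(left, bottom, right, top) for an image centered horizontally
// with its top edge at $top, in a bottom-left-origin coordinate system.
function centeredBox($pageWidth, $top, $imgWidth, $imgHeight)
{
    $offset = ($pageWidth - $imgWidth) / 2;
    return array($offset, $top - $imgHeight, $offset + $imgWidth, $top);
}

// The 100x100 pt page and 50x50 pt image from the example above.
list($x1, $y2, $x2, $y1) = centeredBox(100, 100, 50, 50);
echo "$x1,$y1 to $x2,$y2\n"; // 25,100 to 75,50

// A 400x120 px image on an A4 page, placed below a 72 pt top margin.
$w = pixelsToPoints(400); // 300 pt
$h = pixelsToPoints(120); // 90 pt
print_r(centeredBox(595, 842 - 72, $w, $h));
```

The return order (left, bottom, right, top) matches the order the coordinates are passed to drawImage in the code above.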
We want to move the current line pointer down below the image we just drew. We also want a small buffer between the image and the first line of text, so we move down by double the line height to give us some padding.
$currentHeight = $y2 - ($this->_fontSize*2);
} else {
// If no header, set the first line below the margin.
$currentHeight = $this->_maxHeight - $this->_margin;
}
Now we come to the layout of the tables themselves. To calculate the width of each individual cell in the table, we take the highest column count of any row and divide the writable area (page width minus both margins) by it. This gives us the width of each column, i.e. 595 - 72 (left margin) - 72 (right margin) = 451 points of writable area. If our largest row has 6 columns, then each column is about 75.17 points wide. Therefore, we begin at the left margin, and simply move 75.17 points to the right for each subsequent column.
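As a quick check of that arithmetic, here is a small sketch that computes the left edge of every column. columnPositions() is a hypothetical helper written for this example, and it assumes at least one column.

```php
<?php
// Sketch only; columnPositions() is a hypothetical helper and assumes
// $maxCols is at least 1.

// Left edge (x coordinate) of each column, dividing the writable width evenly.
function columnPositions($pageWidth, $margin, $maxCols)
{
    $colWidth = ($pageWidth - 2 * $margin) / $maxCols;
    $positions = array();
    for ($i = 0; $i < $maxCols; $i++) {
        $positions[] = $margin + $i * $colWidth;
    }
    return $positions;
}

// A4 width, 1 inch margins, 6 columns.
$positions = columnPositions(595, 72, 6);
foreach ($positions as $x) {
    echo round($x, 2), "\n";
}
```

The first column starts at the margin (72), and the last one starts at roughly 447.83, which leaves exactly one column width before the right margin at 523.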
foreach($this->_model as $table) {
// Find the highest column count for all rows.
$maxCols = $table->getMaxCols();
foreach($table as $row) {
$x = $this->_margin;
$numTempLines comes into play when you have to wrap the text in a cell, which I will explain below.
$numTempLines = 0;
foreach($row as $col) {
Set the font we are using to draw the text in this cell.
if($col->isBold() == true) {
$font = $this->_fontBold;
} else {
$font = $this->_font;
}
This offset is the width of our cell as explained above.
// How far to move it on the X axis for the next column.
$offset = ($this->_maxWidth - ($this->_margin*2)) / $maxCols;
Here we pass in the column text to see if it needs to be wrapped.
// Wrap the text if necessary
$text = $this->_wrapText($col->getText(), $offset, $font, $this->_fontSize);
// Set the font to be used.
$pdf->pages[$currentPage]->setFont($font, $this->_fontSize);
Store the current height so we don't lose it.
$tempHeight = $currentHeight;
When we wrap the text, we have to allow room below the current line to accommodate the new lines created in the wrapping process. To do that, we need to insert those rows under us. We store the number of new rows we created in $numTempLines so that, once we are done drawing this row, we can move the line pointer by that many rows down, instead of just one, so that we aren't drawing the next row on top of our wrapped text.
// If there is more than one line returned from the wrap...
if(count($text) > $numTempLines) {
$numTempLines = count($text);
}
// Draw the text.
foreach($text as $line) {
$pdf->pages[$currentPage]->drawText($line, $x, $tempHeight);
$tempHeight -= $this->_fontSize;
}
Move the x-axis cursor to the next cell. This is how we keep columns lined up.
// Move the x-axis cursor.
$x += $offset;
}
If we had to wrap any columns, we move the line pointer down by as many lines as the tallest cell in the row needed. If not, we move it one line down.
// Did we have to wrap any columns? If so, move the next row that much.
if($numTempLines > 0) {
$currentHeight -= ($this->_fontSize * $numTempLines);
} else {
$currentHeight -= $this->_fontSize;
}
Because we are dealing with dynamic data, we won't always know how many lines we have to insert, so we need to start a new page when we hit the bottom page boundary. We perform this check every time we move the line pointer; otherwise we risk drawing text into the bottom margin or off the page entirely. When the line pointer gets down to the bottom margin, we increment the page counter, reset the line pointer to the top margin, and continue drawing.
// Wrap the page if necessary.
if($currentHeight <= $this->_margin) {
$currentPage++;
$pdf->pages[$currentPage] = $pdf->newPage($this->_paperSize);
$currentHeight = $this->_maxHeight - $this->_margin;
}
}
}
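The cursor and page-break logic can be tried out without drawing anything. The following is a sketch under stated assumptions: paginate() is a hypothetical helper that mirrors the loop above, taking the number of lines each row needs and reporting where each row would start.

```php
<?php
// Sketch only: paginate() is a hypothetical helper that mimics the cursor
// logic without drawing anything.

// $rowLineCounts: number of text lines each row needs after wrapping.
// Returns, per row, array(page index, y of the row's first line).
function paginate(array $rowLineCounts, $pageHeight, $margin, $lineHeight)
{
    $page = 0;
    $y = $pageHeight - $margin;
    $placed = array();

    foreach ($rowLineCounts as $lines) {
        $placed[] = array($page, $y);

        // A row always advances at least one line, even if it is empty.
        $y -= $lineHeight * max(1, $lines);

        // Start a new page once the cursor reaches the bottom margin.
        if ($y <= $margin) {
            $page++;
            $y = $pageHeight - $margin;
        }
    }
    return $placed;
}

// Tiny page: 100 pt tall, 20 pt margins, 10 pt lines.
$placed = paginate(array(1, 3, 2, 1), 100, 20, 10);
foreach ($placed as $p) {
    echo "page {$p[0]}, y {$p[1]}\n";
}
```

One thing this makes visible: the check runs after a row has been placed, so a row with many wrapped lines that starts near the bottom can still run into the margin. Checking before drawing, using the row's line count, would avoid that.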
Finally, all our tables have been drawn, so we just have to write it out to a file, and we're done.
// Save it.
$pdf->save('test.pdf');
}
So that's how we take a blank canvas and use math to lay a table on top of it. Now, to build a new report, all I have to do is this:
$table->addRow()
->addCol('Name')
->addCol($someName)
->addCol('Date')
->addCol($someDate);
$table->addRow()
->addCol('Age')
->addCol($someAge)
->addCol('Phone')
->addCol($somePhone);
And I get a 2 row, 4 column table layout, properly spaced and with text wrapping if needed. Further, if I decide down the road to go with a different PDF library, all I have to change is the build function above, swapping its drawing calls for the new library's. The interface stays the same. (We would also want to find out where the new library puts the top-left corner of the page, whether at (0,0) or at (0,height).)
Now, I promised to discuss how wrapping was achieved, so here it is:
/**
* Wraps the given text to the colWidth provided.
*
* @param string text - The text to wrap
* @param int colWidth - The width of a column
* @param object font - The font to use.
* @param int fontSize - The font size in use.
*
* @return array - An array of wrapped text, one line per row.
*/
private function _wrapText($text, $colWidth, $font, $fontSize)
{
$characters = array();
Obviously, if the string is empty, we are done here.
if(strlen($text) == 0) {
return array();
}
What we want here is the character code of each character in the string, pushed into an array. Note that str_split works on bytes, so this assumes single-byte (ASCII or Latin-1) text.
// Collect information on each character.
$characters = array_map('ord', str_split($text));
Now, to find out how wide our characters are, we need information about the font being used. This is stored in the Zend_Pdf font object we are drawing with, so we can query it. Glyphs are the font's internal representations of characters, so first we need the glyph number for each character. Once we have those, we can ask for the width of each glyph, which gives us an array of widths, one per character. Finally, those widths are not in points, so we have to do some conversion. The only number the font gives us is the units per em, so we have to make do with that. A quick function call gives us that number.
// Find out the units being used for the current font.
$glyphs = $font->glyphNumbersForCharacters($characters);
$widths = $font->widthsForGlyphs($glyphs);
$units = $font->getUnitsPerEm();
Armed with those numbers, we can do some math. Yay! A quick array_sum gives us the total width of the string in glyph units (for lack of a better term). Dividing by the units per em converts that into ems (is that even the plural?), and multiplying by the point size of the font converts ems into points. We round to the nearest integer, and we now have the length of the string in points.
// Calculate the length of the string.
$length = intval(((array_sum($widths) / $units) * $fontSize) + 0.5);
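To see the unit conversion with actual numbers, here is a small stand-alone sketch. stringWidthInPoints() is a hypothetical helper, and the glyph widths and units-per-em value are illustrative numbers I picked for the example, not values read from a real font file.

```php
<?php
// Sketch only: stringWidthInPoints() is a hypothetical helper, and the glyph
// widths below are illustrative values, not read from a real font file.

function stringWidthInPoints(array $glyphWidths, $unitsPerEm, $fontSize)
{
    // glyph units -> ems -> points
    return (array_sum($glyphWidths) / $unitsPerEm) * $fontSize;
}

$widths     = array(944, 278, 278, 278); // one wide glyph and three narrow ones
$unitsPerEm = 1000;

$points = stringWidthInPoints($widths, $unitsPerEm, 10);
echo round($points, 2), "\n";    // 17.78
echo (int) round($points), "\n"; // 18
```

The sum is 1778 glyph units, which is 1.778 em, which at a 10 point font size is about 17.78 points.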
Having the length of the whole thing is nice, but we also want the average length of a single character. This way, l and W don't throw us off too badly, because both are given the same allotment of space.
// Find out the average length of an individual character.
$avg = intval(($length / strlen($text)) + 0.5);
Now we need to decide how many characters can fit on a single line in the cell. We take the total width of the cell and divide by the average character width, rounding down to a whole number, and thus we know how many characters fit per line.
// How many characters to wrap at, given the size of the cell.
// The max() calls guard against a zero average width or a zero wrap width.
$numToWrap = max(1, (int) floor($colWidth / max(1, $avg)));
PHP has a built in function wordwrap, which will give you a wrapped string back if you provide an initial string and the number of characters to wrap at. Since that's what we just calculated, we are now armed to wrap the text. We want it in the form of an array though, hence the explode call.
$newText = explode("\n", wordwrap($text, $numToWrap, "\n"));
Finally, we return our newly wrapped text. Congratulations, you've just implemented text wrapping.
return $newText;
}
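If you want to play with the wrapping logic without a font object, here is a stand-alone sketch of the same averaging approach. wrapByAverageWidth() is a hypothetical function for illustration: it takes the already-measured width of the string in points as an argument instead of querying a font, and it keeps the average as a float rather than rounding it.

```php
<?php
// Sketch only: wrapByAverageWidth() is a hypothetical stand-alone version of
// the approach, taking the measured string width as an argument instead of
// querying a font object.

function wrapByAverageWidth($text, $colWidth, $textWidthInPoints)
{
    if (strlen($text) == 0) {
        return array();
    }
    // Without a usable measurement, fall back to a single unwrapped line.
    if ($textWidthInPoints <= 0) {
        return array($text);
    }

    // Average width of one character, kept as a float to avoid rounding to 0.
    $avg = $textWidthInPoints / strlen($text);

    // Whole characters that fit on one line; never fewer than one.
    $perLine = max(1, (int) floor($colWidth / $avg));

    return explode("\n", wordwrap($text, $perLine, "\n"));
}

// 43 characters measured at 215 pt (5 pt each) in a 75 pt wide cell
// gives 15 characters per line.
$lines = wrapByAverageWidth('The quick brown fox jumps over the lazy dog', 75, 215);
print_r($lines);
```

Keep in mind that wordwrap only breaks at spaces unless you pass true as its fourth argument, so a single word longer than the cell will still overflow with the call used here.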
A note on changing libraries when doing text wrapping: we are using the Zend_Pdf font methods glyphNumbersForCharacters and widthsForGlyphs. If the new library doesn't provide similar information, the best you can do is guess at the average character width. It could be measured empirically, if needed.
So, I now have a MUCH easier to work with wrapper for the Zend_Pdf library, which can be generically applied to many different situations. There is still a lot I could add to it, add more options to each cell, etc, which I might add as I come up with a need. It's kinda a pain working with a blank canvas at first, until you realize that it actually gives you incredible flexibility.
Doing a project like this definitely makes you appreciate what the people who wrote your HTML renderer went through.