PHP and MongoDB: A ‘find $exists’ Mess of My Own

File this one under “Easy to Overlook … Duh”

I’m using PHP with MongoDB to process CSV files, and I want to find all the documents that do not have a particular field. (Inbound, I’m naming the fields a particular way. I’m also using Ruby inbound; don’t ask, at least not yet…)

I want all documents where a field called UKNOWN_0064 does not exist, so I write:

$m = new Mongo();
$orders_in = $m->$mongo_db->$mongo_coll->find(array("UKNOWN_0064"=>array('$exists'=>'false')));

But I get everything that does have UKNOWN_0064 and none that don’t!

Anyway, you guessed it … the problem is

'$exists'=>'false'

Really needs to be

'$exists'=>false
And so the correct code is:

$m = new Mongo();
$orders_in = $m->$mongo_db->$mongo_coll->find(array("UKNOWN_0064"=>array('$exists'=>false)));

And here I had been feeling so smart for remembering to use single quotes to pass $exists – and yet I was so blinded by my own ego that I didn’t notice I had enclosed the boolean false in quotes too.
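The root of the mess is easy to demonstrate in plain PHP, no Mongo connection needed: a non-empty string like 'false' is truthy, so the filter effectively says $exists => true. A minimal sketch (the field name is from this post; no server interaction is shown):

```php
<?php
// The string 'false' is non-empty, so it is truthy -- the filter
// below effectively reads the same as '$exists' => true.
var_dump((bool) 'false');  // bool(true)
var_dump((bool) false);    // bool(false)

// The two filters are therefore opposites:
$wrong = array("UKNOWN_0064" => array('$exists' => 'false')); // matches docs that HAVE the field
$right = array("UKNOWN_0064" => array('$exists' => false));   // matches docs that lack it
```

That one pair of quotes flips the meaning of the whole query, which is exactly the behavior described above.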

On a side note, if anyone knows a nicer way to play with MongoDB than all this array business, let me know. It’s clunky to me.

The Core Problem With Magento

I stumbled across this bit while looking for the Mythical Man-Month, actually. This passage describes exactly what’s wrong with Magento. I suspect eBay will fix it – or buy yet a third commerce company:

“[D]evelopers are sometimes tempted to bypass the RDBMS, for example by storing everything in one big table with two columns labelled key and value. While this entity-attribute-value model allows the developer to break out from the rigid structure imposed by a relational database, it loses out on all the benefits, since all of the work that could be done efficiently by the RDBMS is forced onto the application instead. Queries become much more convoluted, the indexes and query optimizer can no longer work effectively, and data validity constraints are not enforced. Such designs rarely make their way into real world production systems, however, because performance tends to be little better than abysmal, due to all the extra joins required.” — (“Inner-Platform Effect“)

If that doesn’t describe the exact problem, I don’t know what does. Everything else can be fixed: documentation can be improved, extensions can become simpler to create, and bugs get shaken out with every release.

I suspect the inner core of Magento will have to be migrated to a VM-based solution, and much of the database normalized, before it can be used effectively as, or as part of, eBay’s X.commerce platform. Read about how Twitter was forced to abandon Ruby and go to Scala for many of its requirements here: Twitter on Scala. I see no reason not to go this route: the product is clearly destined to be a SaaS offering.

EAV modeling does provide a flexible way of giving meta-data shape and form to the wide variety of saleable products in the world. It’s an attempt to be an all-describing, all-encompassing, flexible solution for everything, everywhere.
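To make the tradeoff concrete, here’s a hypothetical sketch (the table shape and attribute names are illustrative, not Magento’s actual schema) of the work an EAV store pushes onto the application: instead of one row per product, you get one row per (entity, attribute) pair and have to pivot it yourself.

```php
<?php
// Rows as they might come back from a query like:
//   SELECT entity_id, attribute, value FROM product_eav;
// (hypothetical table -- one row per attribute, not per product)
$eavRows = array(
    array('entity_id' => 1, 'attribute' => 'sku',   'value' => 'ABC-123'),
    array('entity_id' => 1, 'attribute' => 'price', 'value' => '19.99'),
    array('entity_id' => 2, 'attribute' => 'sku',   'value' => 'XYZ-789'),
);

// Pivot into one record per entity -- work a normalized table
// would have given us for free, with indexes and constraints intact.
$products = array();
foreach ($eavRows as $row) {
    $products[$row['entity_id']][$row['attribute']] = $row['value'];
}
```

Every lookup, filter, and validation that the RDBMS could have done now happens in loops like this one, which is the “abysmal performance” the quote warns about.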

Alas, it’s heavy. In this particular case it’s in PHP (argue amongst yourselves if you must). And applying EAV to things like customers, SKUs, and orders was – and is – a mistake; hence we have constructs like the “flat” tables that must be built and indexed just to make the thing perform reasonably well.

Why not write an abstraction layer that can build normalized database table structures and preserve the performance advantages? That seems doable to me. Perhaps the era of always running interpreted code – instead of compiling to an intermediate form – has spoiled us to the point that we’d sacrifice performance rather than wait for a build (of said tables). Or do we simply not have the notion because we’ve never compiled or built before?

Some things are immutable. The Internet detests that notion. But performance and reliability often favor things that stay the same, even if we have to wait a little for the compiler.

Convert CSV to XLS (Excel) in PHP

Luckily for me, Ignatius Teo has done all of the heavy lifting; I just had to add a little code around it. Here’s the excel.php class:

When faced with legacy systems, it’s nearly inevitable you’ll come across csv files. In fact, when you’re moving data from anywhere to anywhere, you’re likely to run into csv files.

The challenge I faced here was providing a list of new incoming customers on a daily basis, so that their data could be reviewed in a simpler manner – a spreadsheet to check that fields such as customer classification and email address were being filled in appropriately.

The code shown here is really simple. You include the excel.php class; by including that file you register a stream wrapper for the xlsfile protocol – that’s the last thing excel.php does.
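If stream wrappers are new to you, here’s a minimal sketch of the mechanism (the class and protocol names here are made up for illustration, not the actual excel.php internals): once a class is registered for a protocol, fopen()/fwrite()/fclose() on that protocol are routed to the class instead of the filesystem.

```php
<?php
// Illustrative stream-wrapper class -- NOT the real excel.php code.
class DemoXlsStream {
    public $context;
    private $buffer = '';

    public function stream_open($path, $mode, $options, &$opened_path) {
        return true;            // accept any path under our protocol
    }
    public function stream_write($data) {
        $this->buffer .= $data; // accumulate whatever gets fwrite()-ten...
        return strlen($data);
    }
    public function stream_close() {
        // ...excel.php would convert its buffer to XLS here, on close.
    }
}
stream_wrapper_register('demofile', 'DemoXlsStream');

// fopen() on "demofile://" now goes to DemoXlsStream, not the disk:
$fh = fopen('demofile://report.xls', 'wb');
$written = fwrite($fh, 'hello');
fclose($fh);
```

That’s the whole trick excel.php relies on: the “xlsfile://” prefix later in the code is just selecting this wrapper.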

After that, it’s all you. In the working example below, I am running the PHP program from the command line. Argument 1 is the input file, a tab-separated CSV file, and argument 2 is the XLS file to be written. The code expects the first row to be column headings.

First, we read the first row to set the column headers. Then we read each line and form an associative array that will be an element of the Big Array ($xlsArray) that contains the data that will become the XLS file.

Finally, we write the Big Array, and *poof* we’re done!

/* Get the excel.php class here: */
require_once "excel.php";

$inputFile = isset($argv[1]) ? $argv[1] : null;
$xlsFile   = isset($argv[2]) ? $argv[2] : null;
if( empty($inputFile) || empty($xlsFile) ) {
    die("Usage: " . basename($argv[0]) . " in.csv out.xls\n");
}

$fh = fopen($inputFile, "r");
if( !is_resource($fh) ) {
    die("Error opening $inputFile\n");
}

/* Assuming that first line is column headings */
if( ($columns = fgetcsv($fh, 1024, "\t")) == false ) {
    die("Error, couldn't get header row\n");
}
$numColumns = count($columns);

/* Now read each of the rows, and construct a
   big array that holds the data to be Excel-ified: */
$xlsArray = array();
$xlsArray[] = $columns;
while( ($rows = fgetcsv($fh, 1024, "\t")) !== FALSE ) {
    $rowArray = array();
    for( $i = 0; $i < $numColumns; $i++ ) {
        $key = $columns[$i];
        $val = isset($rows[$i]) ? $rows[$i] : "";
        $rowArray[$key] = $val;
    }
    $xlsArray[] = $rowArray;
}
fclose($fh);

/* Now let the excel class work its magic. excel.php
   has registered a stream wrapper for "xlsfile://"
   and that's what triggers its 'magic': */
$xlsFile = "xlsfile://" . $xlsFile;
$fOut = fopen($xlsFile, "wb");
if( !is_resource($fOut) ) {
    die("Error opening $xlsFile\n");
}
fwrite($fOut, serialize($xlsArray));
fclose($fOut);

Adventures in SOAP and WSDL

Got my first real SOAP challenge: I’ve got to integrate Cybersource’s payment system into our newish RHEL5 servers. The challenge is that our OS is now 64-bit, and we had been using their 32-bit Simple Order API in the form of PHP and Perl extensions.

Bzzt. No go. Wisely, Cybersource has decided to go the Web Services direction, so the implementation isn’t all that tricky. Although Cybersource says the SOAP kit requires PHP 5.2.1 and above, 5.1.6 seems to be working just fine for me – at least against their test servers.

Cybersource’s docs say you should run php -i | grep soap to see if your PHP is SOAP-enabled. Don’t do that. Instead, run yum install php-soap. Same for php-xml. Install those two and you should be fine.

Another thing I found challenging was figuring out what functions and types are available in the Cybersource interface. There is no documentation for that – only WSDL. So I wrote a quick-n-dirty ditty to dump the functions and types from their WSDL.
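A sketch of such a ditty, using PHP’s built-in SoapClient – __getFunctions() and __getTypes() are real SoapClient methods, but the WSDL URL below is a placeholder, not Cybersource’s actual endpoint:

```php
<?php
// Placeholder WSDL URL -- substitute the service you're inspecting.
$wsdlUrl = 'https://example.com/service.wsdl';

try {
    $client = new SoapClient($wsdlUrl, array('exceptions' => true));

    echo "Functions:\n";
    foreach ($client->__getFunctions() as $fn) {
        echo "  $fn\n";   // e.g. "ReplyMessage runTransaction(RequestMessage $req)"
    }

    echo "Types:\n";
    foreach ($client->__getTypes() as $type) {
        echo "  $type\n"; // struct definitions derived from the WSDL's schema
    }
} catch (SoapFault $e) {
    // A malformed or unreachable WSDL surfaces here, which also makes
    // this a crude validity check for WSDL you're writing yourself.
    echo "WSDL error: " . $e->getMessage() . "\n";
}
```

This requires the php-soap extension mentioned above; the SoapFault branch is what lets a page like this double as a basic WSDL error-checker.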

Of course, 30 minutes later I found out I was going to have to cobble together a SOAP client for a Gift/Loyalty Card service that we’re implementing.

The long and short of it is, I couldn’t find an easy way to get at the WSDL details. It’s probably been invented somewhere else, but I needed the practice anyway, so here’s the MXWest gimme-gimme page that will dump a WSDL for you. It might even be helpful if you’re writing your own WSDL, as it should complain about errors.

Bang it here for Gimme-Gimme a WSDL Dump.

P.S. Yes, I stole Firefox’s background for this little fellow. But it had balloons, and my company is sponsoring the New Jersey Festival of Ballooning this weekend. Tweet me if you’re there @MXWest and stop by the booth!