The USA's healthcare.gov site and LAMP

The USA's health care exchange site, healthcare.gov, has had well-publicized initial woes.

The New York Times has said one of the problems was the government's choice of DBMS, namely MarkLogic. A MarkLogic employee has said that "If the exact same processes and analysis were applied to a LAMP stack or an Oracle Exa-stack, the results would have likely been the same."

I don't know why he picked Exastack for comparison, but I too have wondered whether things would have been different if the American government had chosen a LAMP component (MySQL or MariaDB) as a DBMS, instead of MarkLogic.

What is MarkLogic?

The company is a software firm founded in 2001 based in San Carlos California. It has 250 employees. The Gartner Magic Quadrant classes it as a "niche player" in the Operational DBMS Category.

The product is a closed-source XML DBMS. The minimum price for a perpetual enterprise license is $32,000 but presumably one would also pay for support, just as one does with MySQL or MariaDB.

There are 250 customers. According to the Wall Street Journal "most of its sales come from dislodging Oracle Corp."

One of the customers, since 2012 or before, is CMS (the Centers for Medicare and Medicaid), which is a branch of the United States Department of Health and Human Services. CMS is the agency that built the healthcare.gov online portal.

Is MarkLogic responsible for the woes?

Probably MarkLogic is not the bottleneck.

It's not even the only DBMS that the application queries. There is certainly some contact with other repositories during a get-acquainted process, including Oracle Enterprise Identity management -- so one could just as easily blame Oracle.

There are multiple other vendors. USA Today mentions Equifax, Serco, Optum/QSSI, and the main contractor
CGI Federal.

A particular focus for critics has been a web-hosting provider, Verizon Terremark. They have been blamed for some of the difficulties and will eventually be replaced by an HP solution. HP also has a fairly new contract for handling the replication.

Doubtless all the parties would like to blame the other parties, but "the Obama administration has requested that all government officials and contractors involved keep their work confidential".

It's clear, though, that the site was launched with insufficient hardware. Originally it was sharing machines with other government services. That's changed. Now it has dedicated machines.

But the site cost $630 million so one has to suppose they had money to buy hardware in the first place. That suggests that something must have gone awry with the planning, and so it's credible what a Forbes article is saying, that the government broke every rule of project management.

So we can't be sure because of the government confidentiality requirement, but it seems unlikely that MarkLogic will get the blame when the dust settles.

Is MarkLogic actually fast?

One way to show that MarkLogic isn't responsible for slowness, would be to look for independent confirmations of its fastness. The problem with that is MarkLogic's evaluator-license agreement, from which I quote:

...
MarkLogic grants to You a limited, non-transferable, non-exclusive, internal use license in the United States of America
...
[You must not] disclose, without MarkLogic's prior written consent, performance or capacity statistics or the results of any benchmark test performed on Software
...
[You must not] use the Product for production activity,
...
You acknowledge that the Software may electronically transmit to MarkLogic summary data relating to use of the Software

-- http://developer.marklogic.com/products

These conditions aren't unheard of in the EULA world, but they do have the effect that I can't look at the product at all (I'm not in the United States), and others can look at the product but can't say what they find wrong with it.

So it doesn't really matter that Facebook got 13 million transactions/second in 2011, or that the HandlerSocket extension for MySQL got 750,000 transactions/second with a lot less hardware. Possibly MarkLogic could do better. And I think we can dismiss the newspaper account that MarkLogic "continued to perform below expectations, according to one person who works in the command center." Anonymous accounts don't count.

So we can't be sure because of the MarkLogic confidentiality requirements, but it seems possible that MarkLogic could outperform its SQL competitors.

Is MarkLogic responsible for absence of High Availability?

High Availability shouldn't be an issue.

At first glance the reported uptime of the site -- 43% initially, 90% now -- looks bad. After all, Yves Trudeau surveyed MySQL High Availability solutions years ago and found even the laggards were doing 98%. Later the OpenQuery folks reported that some customers find "five nines" (99.999%) is too fussily precise so let's just round it to a hundred.

At second glance, though, the reported uptime of the site is okay.

First: The product only has to work in 36 American states and Hawaii is not one of them. That's only five time zones, then. So it can go down a few hours per night for scheduled maintenance. And uptime "exclusive of scheduled maintenance" is actually 95%.

Second: It's okay to have debugging code and extra monitoring going on during the first few months. I'm not saying that's what's happening -- indeed the fact that they didn't do a 500-simulated-sites test until late September suggests they aren't worry warts -- but it is what others would have done, and therefore others would also be below 99% at this stage of the game.

So, without saying that 90 is the new 99, I think we can admit that it wouldn't really be fair to make a big deal about some LAMP installation that has higher availability than healthcare.gov.

Is it hard to use?

MarkLogic is an XML DBMS. So its principal query language is XQuery, although there's a section in the manual about how you could use SQL in a limited way.

Well, of course, to me and to most readers of this blog, XQuery is murky gibberish and SQL is kindergartenly obvious. But we have to suppose that there are XML experts who would find the opposite.

What, then, can we make out of the New York Times's principal finding about the DBMS? It says:

"Another sore point was the Medicare agency’s decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem."

-- New York Times November 23 2013

Well, of course, to me and to most readers of this blog, the CGI officials were right because it really is unfamiliar -- they obviously had people with experience in IBM DB2, Microsoft SQL Server, or Oracle (either Oracle 12c or Oracle MySQL). But we have to suppose that there are XML experts who would find the opposite.

And, though I think it's a bit extreme, we have to allow that it's possible the problems were due to sabotage by Oracle DBAs.

Yet again, it's impossible to prove that MarkLogic is at fault, because we're all starting off with biases.

Did the problem have something to do with IDs?

I suspect there was an issue with IDs (identifications).

It starts off with this observation of a MarkLogic feature: "Instead of storing strings as sequences of characters, each string gets stored as a sequence of numeric token IDs. The original string can be reconstructed using the dictionary as a lookup table.”

It ends with this observation from an email written on September 27 2013 by a healthcare.gov worker: "The generation of identifiers within MarkLogic was inefficient. This was fixed and verified as part of the 500 user test."

Of course that's nice to see it was fixed, but isn't it disturbing that a major structural piece was inefficient as late as September?

Hard to say. Too little detail. So the search for a smoking gun has so far led nowhere.

Is it less reliable?

Various stories -- though none from the principals -- suggest that MarkLogic was chosen because of its flexibility. Uh-oh.

The reported quality problems are "one in 10 enrollments through HealthCare.gov aren't accurately being transmitted" and "duplicate files, lack of a file or a file with mistaken data, such as a child being listed as a spouse."

I don't see how the spousal problem could have been technical, but the duplications and the gone-missings point to: uh-oh, lack of strong rules about what can go in. And of course strong rules are something that the "relational" fuddy-duddies have worried about for decades. If the selling point of MarkLogic is in fact leading to a situation which is less than acceptable, then we have found a flaw at last. In fact it would suggest that the main complaints so far have been trivia.

This is the only matter that I think looks significant at this stage.

How's that hopey-changey stuff working out for your Database?

The expectation of an Obama aide was: "a consumer experience unmatched by anything in government, but also in the private sector."

The result is: so far not a failure, and nothing that shows that MarkLogic will be primarily responsible if it is a failure.

However: most of the defence is along the lines of "we can't be sure". That cuts both ways -- nobody can say it's "likely" that LAMP would have been just as bad.

, December 8, 2013. Category: MySQL / MariaDB, NoSQL.

About pgulutzan

Co-author of four computer books. Software Architect at MySQL/Sun/Oracle from 2003-2011, and at HP for a little while after that. Currently with Ocelot Computer Services Inc. in Edmonton Canada.