A blast from my past, only it never went away

Recently I had occasion to take a quick look at MUMPS. I remembered the name from my undergraduate days, but not what it was about – some database thing? Yeah. It’s a database with an integrated language, developed in the 1960s.

MUMPS-the-language seems archaic. MUMPS-the-database is a NoSQL database from before relational databases were invented, capable of handling big data since before big data was a thing. Systems based on it and evolved from it run enormous health care records and financial systems, and have been doing so for a long time. And it looks like one can now use the database without having to (directly) use the language.

And because testing is what I do, that led me to think a bit about testing that kind of system. It would be a different domain from the carrier-grade telecoms systems I’ve been working with, but there would be some commonalities, particularly the importance of stress testing and negative scenarios, and I’m sure there would be elusive intermittent bugs to hunt. I’m used to the idea that “this causes a crash once in umpteen gazillion times? That means it happens 3 times a week, so find it!”. I could get into that.

SQGNE talk on Nov 8

I go to the Software Quality Group of New England talks sometimes. I was originally a little disappointed in this one, “Avoid Embarrassing Deployments To Guarantee Continuous Availability”: I was expecting to hear advice about preventing regressions from getting deployed, and what I got was a description of the tools one company uses to try to do that.

Then I thought more about it, and decided that the speaker was taking it for granted that the advice was “have a good set of automated regression tests, run them early and often” and “you need to know how good your tests are, what is and is not currently broken, and what the trends are”. If your response to that is “I wish I had that information”, you might be daunted by the prospect of getting it. He was showing us that an organization can get all that information; it isn’t that hard: you need tools like this and this and this, and you put them together. Here’s a working example.

Sometimes that’s the kick someone needs in order to start solving a problem for their own environment.
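For what it’s worth, the “know what’s broken and which way it’s trending” part doesn’t need anything fancy. I’m not going to try to reproduce the speaker’s actual tool chain; this is just a toy sketch in Python, assuming a made-up results format (one “testname PASS|FAIL” line per test, one file per nightly run), to show the shape of the idea:

    #!/usr/bin/env python3
    # Toy sketch: roll nightly regression results up into a trend table.
    # The input format is invented: one "test_name PASS|FAIL" line per test,
    # one file per nightly run, named results-YYYY-MM-DD.txt.
    import csv
    import glob
    import os

    def summarize(path):
        """Count passes and failures in one results file."""
        passed = failed = 0
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) < 2:
                    continue
                if parts[1] == "PASS":
                    passed += 1
                elif parts[1] == "FAIL":
                    failed += 1
        return passed, failed

    def build_trend(results_dir, out_csv):
        """Write one row per nightly run, so "what is broken and which way is
        it trending" is a table anyone can look at, not tribal knowledge."""
        rows = []
        for path in sorted(glob.glob(os.path.join(results_dir, "results-*.txt"))):
            passed, failed = summarize(path)
            total = passed + failed
            rate = 100.0 * passed / total if total else 0.0
            rows.append([os.path.basename(path), passed, failed, "%.1f%%" % rate])
        with open(out_csv, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["run", "passed", "failed", "pass_rate"])
            writer.writerows(rows)

    if __name__ == "__main__":
        build_trend("nightly-results", "regression-trend.csv")

A real version would read whatever your test runner actually emits and feed a dashboard rather than a CSV, but the point stands: this information is not hard to collect, and once collected it is hard to ignore.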

Understand what you are and are not simulating

If a customer has a history of upgrade problems, we might decide to collect some configurations from live sites, load them up in our lab, and practice the upgrade before doing the real one – that should help, right?

It does help some. There are two parts to an upgrade’s success or failure: there’s “does the upgrade process run ok, with nothing aborting, no process crashing, and no system rebooting when it isn’t supposed to”, and then there’s “afterwards, do all the call/data flows that used to work still work”.

Our lab is really not set up to do a complete simulation of the customer environment – one that uses the customer IP interfaces unchanged, with test equipment that can run calls using the IPs of actual peer nodes in that network, much less one that produces call flows (message sequences) that look like the ones presented to the live system. We load the customer database and then add some additional configuration so the test tools can run test calls using the local lab network addresses. This does pretty well for finding problems with the upgrade process itself – stuff like “this system has some corrupted configuration data that needs to be fixed before we try to upgrade” and “this system is using feature X, which needs to be disabled before the upgrade and re-enabled afterwards”.

But if the upgrade on the live system runs ok and some calls fail afterwards, this approach would not be expected to have caught that during the practice run. The calls we had up during the practice upgrade do not look much like the calls that come into the live system. The test calls use just a few basic call flows, have a simple set of INVITE headers, and are processed by the basic configuration settings used for a lab network. The live calls have more varied headers and message sequences, and those headers and call flows interact with the customer’s network-specific configuration settings – which can get somewhat, um, exotic.

And this testbed looks attractive for investigating the failed-calls problem, because it already has the customer database loaded, but it may not really be good for that. Could we adapt the configuration and test tools for the problem investigation? Maybe, but a different testbed, perhaps one already used for ongoing support for this customer, might be better.

Building a simulation of a customer’s live traffic is hard: you have to know what their call flows and headers look like, and some customers don’t themselves have a complete picture of that information. And then you have to program the test tools … I don’t know if building such a simulation would be worthwhile. For a customer whose call flows we understood well (and whose traffic didn’t have huge variation), maybe it would produce some insight into how to prevent those post-upgrade failures.

It would probably be fun to build.
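If I ever did take a crack at it, the first step would be characterizing the live traffic – how many distinct INVITE shapes and call flows are really out there. Here’s a rough, hypothetical sketch of that step in Python, assuming the captured SIP messages have already been dumped to plain-text files (one message per file, which is not how you’d really collect them, but it keeps the sketch short):

    #!/usr/bin/env python3
    # Rough sketch: how varied are the live INVITEs, really?
    # Assumes a hypothetical capture layout: a directory of plain-text files,
    # one SIP message per file.
    import glob
    import os
    from collections import Counter

    def header_profile(message_text):
        """Return the sorted set of header names in one SIP message, ignoring
        the values -- enough to see how much header variety there is."""
        names = set()
        for line in message_text.splitlines()[1:]:   # line 0 is the request line
            if not line.strip():                     # blank line ends the headers
                break
            if ":" in line and not line[0].isspace():
                names.add(line.split(":", 1)[0].strip().lower())
        return tuple(sorted(names))

    def profile_captures(capture_dir):
        """Count distinct INVITE header profiles across the capture directory."""
        profiles = Counter()
        for path in glob.glob(os.path.join(capture_dir, "*.txt")):
            with open(path, errors="replace") as f:
                text = f.read()
            if text.startswith("INVITE "):
                profiles[header_profile(text)] += 1
        return profiles

    if __name__ == "__main__":
        profiles = profile_captures("captured-invites")
        print(len(profiles), "distinct INVITE header profiles")
        for profile, count in profiles.most_common(5):
            print(count, ", ".join(profile))

How many distinct profiles there are, and how lopsided the distribution is, gives a first estimate of how much variety the test tool scenarios would have to reproduce – and the same kind of profiling would be needed for the message sequences, which is the harder part.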

From the blog-hiatus period: A Reminder

Code written by testers does not ship. That doesn’t mean one ought to ignore basic software engineering principles.

We’d lost a person from our group to a layoff. “Here, Meredith, run this person’s module for the current release’s final regression; the test environment is all set up.” Fine, no problem. All test cases passed.

Now run that module as early regression on the next release, and get it packaged and submitted so it can join the set of automated test modules that get run by the regression test team. First try: some test case failures. Oops, feature interaction; I’ll have to do some maintenance as well as the packaging. OK, that happens.

I then made several distressing discoveries. My lost coworker had not been the first person responsible for this module, and I think what happened was that she had inherited a mess from her inadequately trained and/or inadequately supervised predecessor, and had made it work for the initial release of the feature but had not had time to fix it properly.

A lot of variables were hardcoded instead of being read from the in-house automated testing framework, so the module was tied to a particular testbed. I started fixing that … and discovered another reason the module was tied to this testbed: the module, which tested a media feature, had been implemented using a special version of sipp, built to include a specially hacked version of an unsupported sipp feature. Both the test procedure and the verification logic depended on that special sipp, which was running on the sipp server used for this testbed.

The test case routines did not always implement the test cases as written; for example, there were test cases that called for media in both directions on the test call, and the routine produced only one-way media. A few of the test case routines were implemented as written, but the test case itself was wrong (and the wrongness was not due to the feature interaction mentioned above) – I suspect the Dev reviewer(s) had read the test case titles and not always the actual content. And the module was not well written: there were major chunks of duplicated code that should have been abstracted.
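To show what I mean by the hardcoding problem (the names and the config file below are invented – the real framework was in-house and looked nothing like this), the difference is roughly this:

    #!/usr/bin/env python3
    # Illustration only: invented names, not our in-house framework.
    # The point is the difference between baking testbed details into the
    # module and asking the framework for them.
    import json

    # What I found (roughly): the testbed baked into the module.
    SIPP_SERVER = "10.1.2.3"                        # only runs on one testbed
    SIPP_BINARY = "/usr/local/bin/sipp-media-hack"  # the specially built sipp

    # What it should look like: the testbed described in one place
    # (testbed.json here stands in for whatever the framework provides),
    # so the same module runs anywhere.
    def load_testbed(config_path="testbed.json"):
        """Read testbed parameters instead of hardcoding them."""
        with open(config_path) as f:
            return json.load(f)

    def sipp_command(testbed, scenario):
        """Build a sipp invocation from testbed parameters: -sf names the
        scenario file, -i the local address, then the target host:port."""
        return [
            testbed["sipp_binary"],   # stock sipp, not the hacked build
            "-sf", scenario,
            "-i", testbed["local_ip"],
            "%s:%d" % (testbed["target_ip"], testbed["target_port"]),
        ]

    if __name__ == "__main__":
        testbed = load_testbed()
        print(" ".join(sipp_command(testbed, "uac_basic.xml")))

The dependence on the specially hacked sipp was the same kind of problem, just buried deeper in the test procedure and verification logic.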

So my little packaging task turned into a major refactoring project. I dealt with the feature interaction, made the module able to run in any testbed, made it use standard sipp, eliminated the code duplication, fixed the incorrect test cases, and made notes about the additional changes needed for two-way media if and when that was wanted (my manager and I having decided that my doing that was not worth the additional time it would require).

Sigh.

I suppose I get the weird stuff because I can cope with the weird stuff.

I hope that first guy got a job where he got better training, or maybe found a new line of work.