Data cubing is a fantastic topic. The systems which do real-time cubing are incredibly finicky. It's genuinely not a simple problem to solve.
Having written a couple of systems which do it in the past and now working with one at Space Command HQ, I'm quite pleased that the Big Brains at SCHQ have, basically, produced exactly what I did. An in-memory database of terrifying complexity, astonishingly bad error reporting and a tendency towards alienese gobble-de-gook when it comes to config and query languages.
So I've come to a conclusion.
Implement SQL in them.
Now.
Really, I know it looks like a pain, and that your system doesn't really need it; that you really just need a nice simple scripting system and you can just bolt a little bit to it here and a little bit to it there.
And before you know it, what was a simple little scripting language has turned into something that's the size of a medium battlecruiser, hovers in the air, glows with an alien purple light and explodes at the slightest provocation.
SQL is a faffy bastard, exactly because people spent a long time hammering it into a shape where it fits exactly into its distorted little universe. So, when you need a query language, just bite the bullet and implement some subset of SQL. You'll end up implementing something of that order of complexity anyway, you might as well set out down the right path rather than having to hammer all your own glowing purple components from scratch.
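One pragmatic way to get "some subset of SQL" without hammering out a parser from scratch is to embed SQLite in-memory. This is a sketch of the idea, not anything the post prescribes; the table and data are made up for illustration:

```python
import sqlite3

# Hypothetical cube table: an in-memory SQLite database stands in
# for the query layer of a cubing system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cube (region TEXT, product TEXT, sales INT)")
db.executemany("INSERT INTO cube VALUES (?, ?, ?)",
               [("north", "widget", 10), ("north", "sprocket", 5),
                ("south", "widget", 7)])

# A typical cube-style rollup, expressed in plain SQL rather than
# alienese gobble-de-gook.
for row in db.execute(
        "SELECT region, SUM(sales) FROM cube GROUP BY region ORDER BY region"):
    print(row)
```

You get grouping, filtering and aggregation for free, and every engineer who boards your ship already knows the query language.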
It's like most other things: when you need a config language, really just embed a Lua interpreter or a Python interpreter.
Because your "nice simple" config language will tend to just get bigger and bigger and bigger. Like those alien tendril things we're always clearing out of the bilges. Only instead of the slightly comical but ultimately horrifying staring orange eyeballs, config languages sprout things like macro expansions. If you're in a super smart environment, you also get cute little Lambda expressions with which to make the complexity explode.
You also get really weird variable expansion rules. One of the languages at SCHQ simply can't express the percent symbol because none of the several uses for percent have escapes back to a literal percent...
You might as well just start out with a turing complete language, into which you can plug query systems and so on. Rather than having a makefile which invokes python to create some config files full of lambdas which expand to macros which expand to settings for things... just use python to start with.
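A minimal sketch of the "just use Python to start with" idea. All the names here (SETTINGS, expanded_targets) are made up for illustration, not from any real system at SCHQ:

```python
# Settings are plain Python data -- no bespoke parser required.
SETTINGS = {
    "output_dir": "build",
    "targets": ["alpha", "beta"],
}

def expanded_targets(settings):
    # "Macro expansion" is just a function call, with real scoping,
    # real error messages -- and yes, a literal percent is just "%".
    return [f"{settings['output_dir']}/{t}.img" for t in settings["targets"]]

if __name__ == "__main__":
    print(expanded_targets(SETTINGS))
```

The point of the sketch: everything the home-grown config language grows towards (variables, expansion, conditionals, lambdas) is already there, debugged, and documented.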
Part of the reason for this is the YAGNI approach. Everyone starts with flat text files and smoothly transitions to all the pain of maintaining a TC language without ever passing a point where someone says "No, stop! Just use an existing language!"
Obviously the point to do that, then, is before you do ANYTHING more complicated than a genuinely flat text file. The minute you start considering something like shell variable expansions, it's time to stop and do it properly.
Katie the Engineer
An innocent engineer trapped on a strange alien world.
Friday 3 August 2012
Thursday 31 March 2011
More Making Make Make Sense
One of the most annoying things in the observable universe is the tendency of make, when given makefiles written by cadet-level engineers, to silently "not do" things.
The canonical way of provoking this behaviour is lists of things that you produce with one or more $(wildcard) which you then use later on -- a typical "gather-and-work" operation.
Problem: what happens if your wildcard operations don't find anything? Because (say) a previous stage named them something different to what was expected. If you're lucky, you'll get a message saying that "cp myoutputs/" is expecting a second argument.
If you're unlucky (like I've just been) this'll happen inside a makefile invoked inside a makefile which is inside a chroot which is making a YUM repo so that stuff can be installed in a filetree which can be packed into a virtual machine image.... and the actual outcome is that after thirty hours of building junk there's something missing from your fresh VM. It's worked all the way down the build of course, because the makefile has no constraint saying that there has to be something in that list.
Wouldn't it be lovely if the makefile did some sanity checking about these things?
Here's how we can do this in a nice handy functional-style way.
sanity_not_empty_list = $(if $($1),@true, @echo "";echo "";echo "EMPTY LIST: $1"; echo ""; echo ""; false )

MOOCOWS := some stuff

moocows:
	$(call sanity_not_empty_list,MOOCOWS)
	$(call sanity_not_empty_list,MOOCOWS2)
Moocows are, of course, named for the extinct species of Old Earth. Test cases are always best named for those extinct species; partially because there's so many of them.
When the "moocows" target is attempted, the second sanity check will halt with a big banner error message. Brilliantly, it tells you which variable isn't set, which is a damn sight more useful than an error telling you cp doesn't have enough args. And it's WAY better than something simply ignoring your missing files.
Friday 18 March 2011
Makefile Madness (probably part 1)
I've visited a number of alien planets now, and one of the things that always astonishes me is the way people will abuse Makefiles.
On some worlds where the happy optimistic colonists have accidentally built a fascist totalitarian regime, they'll have carefully churned out acres and acres of paperwork about the way that code has to be written. This often includes details down to the number of spaces one must place between tokens and how many blank lines must be in files and so on, and carries sanctions of death or worse for violation. These people still allow their makefiles to look like the demented native amoeba-analogues wrote them; the glorious code of the Spacefaring People's Republic will still be built by a ramshackle collection of recursive, copy-and-pasted makefiles full of the most eyewatering convolutions[1].
So, here's the first of a cut-out-and-keep guide on how to avoid your Makefiles being dodgy.
It's a simple rule: "Make has a looping construct. Don't use shell loops; use Make's."
I'm forever seeing makefiles which carefully construct a list of stuff -- often files -- and then in a target do something like this;
foo-target:
	for i in $(foo-target-inputs); do dosomestuffto $$i; done
Now, the problem with this is that if anything goes wrong, it tells you at best that it failed while building foo-target. In general, though, it will just succeed -- as in the example given. Why? Because a shell for loop returns the exit status of its last iteration, so failures earlier in the loop are swallowed.
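A tiny shell demonstration of the swallowing (the commands here are toys, not from any real makefile). The test fails on the second iteration, but the loop as a whole still reports success:

```shell
# [ "$i" != b ] fails when i=b, but the loop's exit status is that
# of the last command executed (i=c, which succeeds), so the whole
# recipe line "works".
for i in a b c; do [ "$i" != b ]; done
echo "loop exit status: $?"   # prints "loop exit status: 0"
```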
So when Make has completed executing foo-target, it may or may not have built it and there's no trivial way to tell.
Usually, what happens on alien worlds at the far edge of civilised space, is that some bright cadet figures that this might fail and hence adds checks to subsequent steps where they all look for the file they expect to exist and bail if it doesn't...
Honest to goodness, if you are in that place, it's time to stop digging.
I want to emphasise this isn't the ONLY thing wrong with this approach; it's not trivially possible (for instance) to run one single part of that "dosomestuffto" cycle. Even if you get it to fail, make will halt with an error report saying that the loop system as a whole failed. It won't tell you where.
Cadets, Junior Grades of the Starfleet Corps of Software Engineering, of course, will at this point add voluminous amounts of "Starting to do X!!", "Finished X!!!" type logging. Which makes the logs bigger, which means compiles take up more space... and more time to wade through when doing failure analysis... and then people get into the whole "what can I prefix my log messages with to make them stand out best" competition that ultimately ends up with your compilation scrolling ASCII-art banners at you.
{It's amusing how often engineering cadet practices ultimately end up in the "try and be the loudest voice" trap.}
So what should people be doing?
Well, the simplest way of doing this is to realise you have the list of things to do, and make will quite happily iterate it for you.
foo-target: $(patsubst %,%.dosomestuffto,$(foo-target-inputs))

%.dosomestuffto:
	dosomestuffto $*
That's the iteration nicely taken care of. Purists will argue that there's a problem here, because every time foo-target is asked for, the dostuffs will happen. That's true, but crucially it was true to start with -- solving that is actually a different problem.
If something fails, it will fail on a commandline with the variables substituted so you can even tell which dostuff operation failed. Your starship crew will thank you for that small bit of plumbing in the way starship crew always thank their engineers -- by finding another triviality to whine like hell about.
If you're really cunning about this, you can then start putting flagfiles into place. The obvious thing to do is store the logfile;
%.dosomestuffto: %.dosomestuffto.log
	@true

%.dosomestuffto.log: %
	dosomestuffto $* >$*.dosomestuffto.log 2>&1 || ( cat $*.dosomestuffto.log; false )
The bracketty junk at the end means (if I've typed it from memory right) that if the dostuff returns false, the second part will be evaluated which will dump the logfile and then fail, meaning the line fails. If the first part works lazy evaluation will skip the second bit and we just carry on.
Even better, you can test out a single one of the operations by saying "make myfile.dosomestuffto" and it will run it on its own.
The moral of this story: if you have a stack of systems which include Turing-complete languages (make, in this case) reasoning about other Turing-complete languages (the shell commands), get the one at the top of the stack to do as much of the work as possible, because it's always easier to talk to it than through it at one of the worker languages.
[1] The exception being anywhere that uses Qt. Sometimes they'll try and use makefiles, but it's the case that you can hypertravel to an alien world using Qt and from orbit make the single correct prediction that their build system will at best involve make being invoked by shonky scripting of some sort, but more usually not involve lazy compilation at all. The reason for this is that although make is completely capable of doing both the dependency graph generation AND solving it, the recipes for this are non-obvious and for some reason the Qt people seem to actively discourage you from doing it; I personally suspect this is because they've always seen their development environment as some sort of "Visual Studio for the rest of the world" and are following the same strategy of pretending you get a "compile" button and that's the only option. Certainly as soon as you realise you don't have to use their "IDE", you don't, and that's a moment as bright as when all three suns rise at the same time, so they've got some justification in not telling people how to leave.
Sunday 13 March 2011
Process Drivers
I've never fully understood how companies on alien planets end up with the internal processes they do. At some point they don't have processes, then the processes arrive. I've never been there at the time to watch the actual hatching but it must be fascinating to observe, possibly from behind thick glass shields.
Either they "just happen" and later are formalised, or they're actually designed; but in both cases somebody has to think about them before they're The Law. Which makes it odd that they end up the way they do.
Take $CURRENT_MISSION at $CURRENT_ALIEN_PLANET, for instance.
I've definitely got to do the mission. Nooooo question at all of that. It's actually quite a decent mission involving the hitting of much strange alien technology with the Big Hammer, so aside from normal engineering laziness it's OK.
Problem is that we have to get the work through the change management system. Now, I'm not opposed to the idea of these in general. The concept is that in order to stop people changing stuff at random, all the work has to be vetted. It's a nice way to broadcast each team's workings to other areas of the project; and to allow them to veto things if it'll (say) stomp on stuff they already have in progress.
However. I'm the one who has to get the work through this process. Well, my response is that if I'm the one who wants this doing the most, we shouldn't be doing it ('cos I don't want to do the work) and if I'm not the person who wants this doing the most then... shouldn't THEY be pushing it through these reviews?
It's supposed to work that you turn up, pitch the work, make a case for it and the board says yea or nay as to whether the benefits outweigh the risks. But in this case we're definitely doing it -- to the point where we're already doing the actual engineering work.
And even better -- the people who want the work doing are ON the change management board.
The work proposal is written by someone who doesn't want to do the work in order to convince people who want the work done and have already decided to have the work done to decide again to have the work done because the work is already being done and we want to put an approval stamp on it...
The work proposal is not even an artifact of the process, because we've already decided the outcome, so I can't entirely see what it represents, apart from using up time which increases the risk that the actual work (also being done by your innocent engineer here) doesn't get done in time.
This process must have made sense to someone at some point. And like I say the actual concept isn't bad.
But this is stressing it to the limits of credibility, because somewhere along the line the actors and the roles have got mixed up in people's heads.
Of course, your innocent engineer here was thick enough to try and mention all this and everyone in mission command looked at me like I'd done one of those alien acid wees and burnt a hole through the floor.
Sigh.