Codecraft: 2009

Thursday, April 16, 2009

Science and Its Misapplication

Science is observation.

It is the systematic search for knowledge – a striving to understand the reality that we live in. It’s exciting and fun in the same way that it is exciting and fun for a young child to discover the environment that she is placed in – touching, feeling, and tasting everything. The skill with which we observe is at a much higher level, but the basic process and motivation are the same. We want to know what is going on around us.

Science is limited.

Because science is observation, the only knowledge that we can attain using it is knowledge that can be obtained through first hand observation and inference. Anything else is not science. If you hear your science professor tell you that the Earth is round and believe him, this is not science – even though she is a science professor. If your mother tells you that the stove top is hot and you believe her – this is not science.

Science is not enough.

It is sheer arrogance to think that through direct observation we can get a hold of enough knowledge to make good decisions about how to live. To re-use a previous example: if when you were a child and your mother told you the stove top was hot – the best approach is not the empirical one. Similarly, you don’t need to directly observe the effects of a severe car accident to believe those who tell you that it is important to wear a seat-belt when riding in or driving a car. Imagine how many problems we would have if before we made a life decision we had to systematically observe the exact same situation – it would be paralyzing – and ridiculous.

Science evolves.

All of these things should be fairly self-evident – I think most people would agree that science is about observation and that there is knowledge out there that isn’t observable. However, I think a lot of people miss that scientific understanding about a particular bit of knowledge evolves and is often times found to be downright wrong – to say that again with a more positive angle – scientifically obtained knowledge is constantly refining, drawing closer to the actual state of reality. Take gravity for instance – Newton was pretty sure he figured that one out – but it turns out there are situations where gravity doesn’t work like he thought. Or take tree rings – trees are typically expected to add a single ring for every year that they are living – however conditions have been observed where trees either add extra rings or don’t add any rings – contradicting previously held assumptions.

So to finish – it is important to remember that when thinking through life’s big issues: scientific observation is a phenomenal tool that we’ve developed however it is not the Inerrant Complete Answer to Everything™. Also, when thinking through a particular issue, if it is important to you to use science, it is not “more scientific” to believe someone whose profession is science as opposed to believing someone of another profession – both are faith.

Tuesday, March 31, 2009

INT v. GUID - The Battle Rages On

So a friend of mine (and probably countless other geeks) continue to debate the data type that is best used for "surrogate" primary keys in database tables....a sarcastic comment on Facebook from yours truly sparked a bloody battle in the larger war.. ok I'm being dramatic - but it did kick off a good conversation... and brought out some of the underlying issues surrounding this oh-so-important debate....the discussion follows:

---

Brian Sneddon Looks like Twitter's record ids for storing tweets is around 1.4 billion. Hope they're using bigint already because the clock is ticking...

Kyle Rowland at 3:46pm March 30
guid

Brian Sneddon at 4:43pm March 30
If you use a guid you have no way to compare the time a record was created relative to other records unless you add a timestamp field. Then if you want to be able to search off it you need to index it. Can be quite a high cost versus just using something like a bigint.

Kyle Rowland at 4:46pm March 30
Right - but you're adding multiple purpose to the key - this is kind of a hack - at bare minimum it's unclear. Justifying it in terms of processing cost is like arguing that a GUI takes too much computing cost and we should therefore use only commandline prompts (yeah, i know i know...if only)... it's the same thing though, you're trading clarity for nominal processing cost.

Brian Sneddon at 4:56pm March 30
I can see your perspective, but I don't agree that it's a hack or unclear. I think having an autoincrementing primary key is a self-evident mechanism for both establishing comparative chronology as well as ensuring a unique reference to the row. I don't think there's anything wrong with taking self-evident and consistent behavior and making logical conclusions based on it. If I store bank balance information and want to use it not only to determine a customer's balance but to also make a business decision as to what tier I consider a customer to be in if the tier is consistently determined by the range of their account balance, I wouldn't see a problem in using that account banance for multiple purposes.

Kyle Rowland at 5:52pm March 30
There's nothing *wrong* with it per se - it's just better not to tie multiple meanings together like that. The bank balance thing is a perfect example - sure you could store it like that, but what if the business rule changes...in your world that rule is now embedded as an assumption of the code that is reading the data. If the rule changes you have to change whatever code was making assumptions about the rule. If you separated them out into two distinct fields (say a balance field and a tier field) - it would be very clear what was going on, and if your business rule changed it would require minimal changes to the client code. It's not a right or wrong thing - it's a readable, maintainable or not thing..

Brian Sneddon at 6:31pm March 30
To compare apples with apples in my example, the tier never changes. If it would change then it would make more sense to store it separately. In this example a tier is fundamentally tied to a specific balance range and is immutable. Not the best real world example, I admit. However using an autoincrementing primary key still allows you to both establish uniqueness as well as provide comparative chronology in both an efficient and straightforward manner. I think the argument that it's not clear could be taken ad absurdum to claim somewhat subjectively that we should abandon any complex datatypes simply because there might be some confusion as to how they should be used. There's a balance you strike here between efficiency and simplicity, and while any point on the spectrum isn't inherently wrong there will be disagreement about the necessity of leaning in either direction. I just doubt that anyone who works with database apps would actually be confused by it.

Brian Sneddon at 6:35pm March 30
Then to get into database implementations, when inserting into an index it is cheaper to insert into the end of the index as in typical tree implementations it requires much less maintenance of the tree in the process. With guid primary keys, which is of course what this stemmed from (no pun intended), you're constantly inserting into various points in the tree and can incur a lot of maintrnance cost in the process. I just don't see a whole lot of value to using guid vs a sufficiently large integer when you need a separate primary key.

Kyle Rowland at 8:16am March 31
Of course it's subjective - this is where software is an art. I'm not saying using an int primary key is evil - just that it slightly less clear (IMHO). In general, I like to avoid doubling up on meaning - this is just one design decision - but multiply it by 1000 and depending on the collection of individual choices one of two things emerges: an elegant, modular system that is a joy to maintain or that other thing. Both work, but I know which I'd prefer to work on. So to get back to the main point - there are really two arguments that you've proposed for using int for a primary key:

1) optimized database access
2) the convenient side effect of having chronology

They're both valid... however, until you show me some evidence that #1 even makes a difference in the real world, I don't believe it should be taken in to account - and I don't mean benchmarks, I mean in the context of a real, well written app. And really in total, I'd still prefer clarity over both of these...

Kyle Rowland at 8:18am March 31
Also re: The Mythical Immutable Requirement.... I don't believe it exists. In either case here or at all.

Brian Sneddon at 10:32am March 31
Check http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-81922 for an explanation as to why UUID can hurt performance. I know you said you don't want a benchmark, but you know I don't have access to any real-world scenerios. ;) However based on the simplicity of the benchmark at the end (i.e. bulk loading into a table) you can see that as the amount of data you need to insert rises the performance impact becomes clearly visible. This impact might not be noticable in smaller apps, but would certainly manifest as it grew.

Brian Sneddon at 10:37am March 31
of course at the same time Brian Aker's tests using UUID don't show that same performance degredation. So I guess the results aren't so clear. The great debate will live on! :)

Kyle Rowland at 11:46am March 31
yes it will :)

Saturday, March 7, 2009

Waiting to Sail

I find myself sitting in my car waiting to board a boat to Kingston - just looking around.

it is one of those times where it just seems to sink in, the insanely beautiful elegance of Creation...

The elegance of everything that the Lord Jesus placed in motion and entrusted to us is matched only by its unbelievable scale.

Looking at the water and the chaotic symmetry with which it interacts with the wind, and how the water all seems to like to stay together....that some objects stay on top of it while others are underneath.... I'm not a ph.d. In physics or biology, but all the moving pieces that are in play and doing their thing according to the Master's plan is crazy..

And that's just the water...there's also the blue sky with it's perfectly manipulated reflections and refractions of the sun's light....

And of course there are the people that are walking around the dock.....with their uncountable number of cellular (biological not telephone) operations happening constantly.... That go nearly flawlessly at all times.......

The Creator's hand is so evident and so powerful in all this, I can't help but feel small. And grateful...and in Love with the Master... Jesus.

Sent from my Verizon Wireless BlackBerry

Codecraft

Thursday, April 16, 2009

Science and Its Misapplication

Tuesday, March 31, 2009

INT v. GUID - The Battle Rages On

Saturday, March 7, 2009

Waiting to Sail

Items of Interest

Blog Archive

About Me