Saturday, September 22, 2007

Talk about a word until it makes no sense.  Why would you do this?  This is much of what philosophy seems to consist of: attempting to reach the absolute definition of a word with zero ambiguity.  While this is a neat intellectual exercise, there is a threshold past which it ceases to be useful for most humans.  Most words can be defined in a short phrase.  This suffices to give us a mental picture of what a word is meant to convey.  In fact, passing this mental threshold might be somewhat harmful.  There be dragons there.

We can draw a parallel to programming or to program design.  Consider a database schema or a UML diagram of a painstakingly designed class hierarchy.  Each quantum of information neatly defined in terms of other types, just as a word[1] is defined by other words.  Each word defined to the point of lowest ambiguity.  At least this is the intention of the designer if one follows the typical methodologies for such things.

However, is there a need to leave ambiguity in the system?  Can we equate ambiguity to abstraction?  On some level abstraction and ambiguity are linked. 

Consider the need that most organisms have for ambiguity in their perception of their environment.  Humans do not see the full spectrum of light.  We do not hear all frequencies of sound.  The human eye cannot distinguish every possible shade of color.  If you say a restroom is for women, your brain can picture the type of person that may enter that room.  The general class of women is sufficient for the needs of blocking males from entering the women’s restroom.  Our brains have the ability to remove unneeded information so that we can concentrate on what is important.

This is the crux of the issue.  This is abstraction.  Instead of dealing with the billions of women on the planet, we can deal with the one class of women.  We abstract away the complexity and gain ambiguity.  If we say that all women have long hair, we over-specify the general case of women and lose the power of abstraction.

What information is important in our system, and how precisely do we really have to specify it?  Is it enough that we know that something is a series of characters forming a string?  Is a person’s name a series of fields like prefix, first name, middle name, last name, and another series of suffixes?  Or is it simply one string of characters?

Let’s examine the first case.  In a large organization there is often a design or architecture team.  This team is typically obligated to define an organization’s data.  The team will most likely choose the multi-field, very explicit representation of a person’s name.  After all, what is their job but to minimize ambiguity and maximize specificity?

In the second case, a lone hacker with a deadline may decide that one long database field is enough to store the name.  This is the most expedient choice, and it gets the job done.

Why does the lone hacker’s system survive longer and cause fewer problems?  Consider the first case again.  Most likely we will have to define a suffix table where every known suffix (Jr, Sr, MD, CPA) is defined.  After all, a user must now choose the suffix from a dropdown.  The prefix must be entered as well, so we will need another table to define the prefixes.  What happens with a person who has two middle names?  Do we define another column?
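To make the contrast concrete, here is a minimal Python sketch of the two designs.  The class names (StructuredName, SimpleName), the field names, and the contents of the suffix table are all hypothetical and chosen only for illustration.  It is a sketch of the idea, not a recommendation for how either system should actually be built.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup table that the committee design has to maintain.
KNOWN_SUFFIXES = {"Jr", "Sr", "MD", "CPA"}


@dataclass
class StructuredName:
    """Committee design: every part of the name gets its own field."""
    first: str
    last: str
    prefix: Optional[str] = None
    middle: Optional[str] = None  # one slot only: a second middle name has nowhere to go
    suffix: Optional[str] = None

    def __post_init__(self):
        # The suffix must come from the lookup table, just like the dropdown.
        if self.suffix is not None and self.suffix not in KNOWN_SUFFIXES:
            raise ValueError(f"unknown suffix: {self.suffix!r}")


@dataclass
class SimpleName:
    """Lone hacker design: one free-form string."""
    full: str


# The simple design accepts whatever the user types.
a = SimpleName("Dr. Mary Ann Louise Smith-Jones PhD")

# The structured design rejects a suffix nobody thought to put in the table.
try:
    StructuredName(first="Mary", last="Smith-Jones", suffix="PhD")
    rejected = False
except ValueError:
    rejected = True
```

Every new suffix, prefix, or extra middle name means a change to the structured design, while the one-string design does not care.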

It appears that even in the simplest data point definition we have run headfirst into mushrooming complexity.  How long do you think the design meetings will take JUST to determine how to define a name?  I’ve been in on this and it can take a surprisingly long time.

The lone hacker has written most of his system by the time the committee has determined the proper representation of each data point in the system.

The fact is that ambiguity, or rather the ability to handle ambiguity, is a feature of a system, not a flaw.  Many times the ability to handle ambiguous data from a user or external system saves a monumental amount of redesign and rework.  An example would be a company that decides to open a foreign branch and now must handle names in a different format.  Do we need to redesign our system or does it just work?

A caveat here is interfacing with other (broken) systems that require the data to be split into small pieces and fed in.  A partner’s system requires first and last names.  Is this a problem?

It can be.  One solution is to parse the name field.  Ninety percent of the names will probably be first and last name only.  This is because most users are lazy and only want to enter the minimum amount of data.  Some self-absorbed types are going to enter every credential in the list, including their Boy Scout merit badges.  Now we may have to kick this data back to a human to review and correct.  Last I checked, data entry was cheap and programming was expensive.  A great solution?  Maybe, maybe not.  How much money did your design team burn by arguing over what should be in the prefix and suffix table?
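A rough sketch of that parse-and-kick-back approach might look like the following.  The function names (split_name, prepare_for_partner) and the two-word rule are assumptions made for illustration; a real partner feed would have its own rules.

```python
def split_name(full_name):
    """Return (first, last) for the easy two-word case, or None if a human should look."""
    parts = full_name.split()
    if len(parts) == 2:
        return parts[0], parts[1]
    return None


def prepare_for_partner(names):
    """Split the names we can; queue everything else for manual review."""
    ready, needs_review = [], []
    for name in names:
        result = split_name(name)
        if result is None:
            needs_review.append(name)
        else:
            ready.append(result)
    return ready, needs_review


ready, needs_review = prepare_for_partner(
    ["Jane Doe", "Dr. John Q. Public Jr. MD CPA", "Prince"]
)
# ready holds the names that split cleanly;
# needs_review holds the ones a data entry person has to fix by hand.
```

The code stays trivial, and the hard cases go to the cheap resource instead of the expensive one.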

In the age of search, ambiguity becomes more useful.  If we can search on any field in the database, then we no longer have to use the last name to organize a list of people.  Searching for them may work much better.  Or it may not, and you find that you want separate first and last name fields.
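As a small illustration of searching instead of organizing by last name, here is a sketch using Python’s built-in sqlite3 module with an in-memory database.  The person table, its columns, and the find_people helper are hypothetical.

```python
import sqlite3

# In-memory database with a single free-form name column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [("Jane Doe",), ("Dr. John Q. Public Jr.",), ("Maria de la Cruz",)],
)


def find_people(conn, term):
    """Substring search on the whole name.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    Note that % and _ inside term act as wildcards in this simple sketch.
    """
    rows = conn.execute(
        "SELECT name FROM person WHERE name LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


matches = find_people(conn, "public")
```

No part of the name has to be labeled as first, middle, or last for the search to find the person.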

Of course, if your requirements call for separate fields because they must be fed to another legacy system, then by all means specify two fields.  You have a valid reason to do so, and it increases the coherency of the system.  My point is about over-specification in the absence of requirements.

In the absence of requirements, don’t over-specify.  Creating unneeded complexity in a system does not benefit anyone.  Ambiguity, as a facet of abstraction, provides the designer a way to deal with complexity or at least to mitigate it.

 

[1] In fact, languages like Forth or Factor call the smallest unit of execution a “word.”  The word in such a language is equivalent to a function in the more “mainstream” languages.
