Thread: Database on the Brain

  1. #1
    Join Date
    Dec 2005
    Posts
    14,315

    Database on the Brain

    I know of no such effort underway, but about twenty years ago, I cut my database teeth on a copy of dBase III+ that I taught myself to use over a weekend. Ever since, I've been bitten by the db (database) bug, and have come to believe that if everyone learned from the outset, from grade school on, how to properly organize information, the ridiculous duplication of information in this world would be drastically reduced, while its use would lend itself to far greater degrees of automation than are currently available.

    The world where information is messily stored in text-only format would be gone. The ability to search on related information would appear. Cross-referencing would no longer be a massively labor-intensive task, but would become a background function, capitalizing on the nature and organization of information itself.

    More to follow. Feel free to jump in at any time.

  2. #2
    Join Date
    Jul 2003
    Posts
    13,886
    There's something to this, but I have difficulty seeing it as a complete solution. For example, 'related information' is sometimes in-your-face obvious, but other times totally subjective.

    I'd add that the initial creation of those links could become a specialized job in the future. Writers, note.

  3. #3
    Join Date
    Dec 2005
    Posts
    14,315

    The Need to Organize Information

    Most of you already know the basics of proper database design, and don't even know it. As I walk you through the basics, you'll realize, "Hey, this is pretty simple!"

    Let's take school, for example, and assume the simplest case, where each high school student attends but one school in the district. Each student has six classes. Each class may be taught by one or more teachers.

    Where does this leave us? It appears as if the information is becoming hopelessly complicated already. Each student can remember their own schedule, as can the teachers. But the homeroom teachers can't remember the schedules of all of their students, nor can a student remember the schedules of all their teachers. (Well, possibly so in each case, but that would be the rather difficult exception, rather than the rule.)

    That's when the information becomes a nightmare.

    Can you imagine keeping track of all of that using text documents? It can be done (I've seen it), but it's labor-intensive. Yet this was done throughout modern school systems decades before the advent of the relational database. Simple questions like "how many students does Mr. K have in his fifth period class," or "what is the ratio of boys to girls?" were fairly straightforward, but labor-intensive, requiring a manual tally of either all the school's student records, or a summary tally of the tallies done by all the homeroom teachers.

    Questions like, "What's the percentage difference in the grades between boys and girls in math and science courses?" were, well, out of the question. The only time these sorts of questions were definitively answered was when university or government researchers came in and pored over student records to find the answers to these and other questions. Since the most labor-intensive part was going through each student's record, they'd amass a bunch of questions, find out just what pieces of information were relevant, then gather those from each record before proceeding to the next. They wouldn't start crunching numbers until back at their research centers.

    Back then, it took many researchers weeks to answer those questions. Today, with access to a database containing that information, I can answer those questions in a few minutes.
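    As a rough sketch of what "a few minutes" can look like, here is the school example in Python's built-in sqlite3 module. The table layout, column names, and the handful of sample rows (Ann, Bob, Ms. L, and so on) are all invented for illustration; only Mr. K and his fifth period class come from the post, and a real district's schema would certainly differ.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, sex TEXT);
CREATE TABLE teachers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE classes  (id INTEGER PRIMARY KEY, subject TEXT, period INTEGER);
-- A class may have one or more teachers, so teaching gets its own table.
CREATE TABLE class_teachers (
    class_id   INTEGER REFERENCES classes(id),
    teacher_id INTEGER REFERENCES teachers(id),
    PRIMARY KEY (class_id, teacher_id)
);
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(id),
    class_id   INTEGER REFERENCES classes(id),
    grade      REAL,
    PRIMARY KEY (student_id, class_id)
);
""")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(1, "Ann", "F"), (2, "Bob", "M"), (3, "Cid", "M"), (4, "Dee", "F")])
conn.executemany("INSERT INTO teachers VALUES (?, ?)", [(1, "Mr. K"), (2, "Ms. L")])
conn.executemany("INSERT INTO classes VALUES (?, ?, ?)", [(1, "math", 5), (2, "science", 3)])
conn.executemany("INSERT INTO class_teachers VALUES (?, ?)", [(1, 1), (2, 2)])
conn.executemany("INSERT INTO enrollments VALUES (?, ?, ?)",
                 [(1, 1, 90), (2, 1, 80), (3, 1, 70), (1, 2, 85), (4, 2, 95)])

# "How many students does Mr. K have in his fifth period class?"
mr_k_fifth_period = conn.execute("""
    SELECT COUNT(*)
    FROM enrollments e
    JOIN classes c         ON c.id = e.class_id
    JOIN class_teachers ct ON ct.class_id = c.id
    JOIN teachers t        ON t.id = ct.teacher_id
    WHERE t.name = 'Mr. K' AND c.period = 5
""").fetchone()[0]

# Average grade by sex across math and science courses.
avg_grade_by_sex = conn.execute("""
    SELECT s.sex, AVG(e.grade)
    FROM enrollments e
    JOIN students s ON s.id = e.student_id
    JOIN classes c  ON c.id = e.class_id
    WHERE c.subject IN ('math', 'science')
    GROUP BY s.sex
    ORDER BY s.sex
""").fetchall()

print(mr_k_fifth_period)
print(avg_grade_by_sex)
```

    Once the records are in tables like these, each new question is another short query rather than another pass through every student's file.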

    Many people may not realize it, but the vast savings in time and money computers brought to the table with respect to storing, organizing, and retrieving this information is what largely drove the development of computer technology from the 1950s through the 1970s. From the 1980s through about the year 2000, the driving force involved word processing, and to a lesser extent, storing and retrieving information on personal computers, while the high-end shifted focus into numerical modeling. Since about 2000, the driving force behind PCs has been gaming. Think about it. Most of my efforts involve reading and writing information in text format. There's nothing along these lines I'm doing on my 1 GHz, DuoCore that I couldn't do on my 10 MHz XP clone, including going online and message boarding.

    Today, however, information is booming. Wikipedia might be taughted as a tremendous success, but consider the cost: How many man-hours times what hourly rate would it have cost them to build such a collection of volumes if they'd paid people to do it? At more than 2.6 million articles, with a wag of around 5 hrs per article, times $15 an hour, that would come to $195,000,000. I dare say that given all the additional effort that's gone into it, including multiple revisions by many different authors, that figure is low by at least one order of magnitude, possibly as much as two orders of magnitude.

    But while cross-referenced, Wikipedia is a data nightmare. It's highly useful as an encyclopedia, but it's incapable of generating basic descriptive statistics.

    For example, how many subspecies of Britain and Ireland's surf clam are native to New Zealand? Looking up surf clam tells you nothing, as does looking up New Zealand. Even following the various links won't help you. If you're from New Zealand, you might have guessed it's the tuatua, but how many subspecies of tuatua are there? Looking up "tuatua" on Wikipedia reveals there are indeed three.

    The missing piece of information, however, comes only from having detailed knowledge of New Zealand. Without that missing piece, the question is impossible to answer using Wikipedia. But it's easy to answer using a taxonomy database, in which we'd simply do a join where "common name" equals "surf clam" and "location" equals "New Zealand."
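    A minimal sketch of that join, again using Python's sqlite3. The two tables, their column names, and every row in them are placeholders made up for this example (including the subspecies labels); only the shape of the query and the answer of three reflect what is described above.

```python
import sqlite3

# Hypothetical taxonomy tables; all names and rows are illustrative placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxa (
    id          INTEGER PRIMARY KEY,
    label       TEXT,
    common_name TEXT,
    taxon_rank  TEXT
);
CREATE TABLE native_ranges (
    taxon_id INTEGER REFERENCES taxa(id),
    location TEXT,
    PRIMARY KEY (taxon_id, location)
);
""")
conn.executemany("INSERT INTO taxa VALUES (?, ?, ?, ?)", [
    (1, "tuatua subspecies A (placeholder)", "surf clam", "subspecies"),
    (2, "tuatua subspecies B (placeholder)", "surf clam", "subspecies"),
    (3, "tuatua subspecies C (placeholder)", "surf clam", "subspecies"),
    (4, "British/Irish surf clam (placeholder)", "surf clam", "species"),
    (5, "unrelated New Zealand bird (placeholder)", "kiwi", "species"),
])
conn.executemany("INSERT INTO native_ranges VALUES (?, ?)", [
    (1, "New Zealand"), (2, "New Zealand"), (3, "New Zealand"),
    (4, "Britain"), (4, "Ireland"),
    (5, "New Zealand"),
])

# Join on common name and location; no local knowledge of "tuatua" is needed.
nz_surf_clam_subspecies = conn.execute("""
    SELECT COUNT(*)
    FROM taxa t
    JOIN native_ranges r ON r.taxon_id = t.id
    WHERE t.common_name = 'surf clam'
      AND t.taxon_rank  = 'subspecies'
      AND r.location    = 'New Zealand'
""").fetchone()[0]

print(nz_surf_clam_subspecies)
```

    The query never mentions the word "tuatua"; the link between the common name and the location is carried by the data itself.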

    And this example introduces us to the term, "taxonomy," which is the science of classification. This science grew out of the need to organize information so that we could:

    1. Make sense of it. Understanding that boys do better in math than girls, and that girls do better in language than boys is but one example.
    2. Use it to achieve a practical purpose. Real-time data fed through numerical modeling simulations is used to predict severe thunderstorm activity and to warn people about possible tornados.

    Since this thread is about using databases for taxonomy, which is a science, it should appear in a non-astronomy, science section. However, none exists, which is why it's here in the catch-all OTB.

    Regardless, we're all aware that our society is awash in information, but in scarce supply of various means to make rapid, often highly automated use of that information towards the common good of all people.

  4. #4
    Join Date
    Sep 2008
    Posts
    5,892
    Mugs! All this just to make a point?
    Last edited by PraedSt; 2008-Nov-01 at 06:16 PM.

  5. #5
    Join Date
    Dec 2005
    Posts
    14,315

    Information is Self-Organizing

    All information has several key qualities

    Uniqueness: Think about it - individuals are thought to be "unique." Thus, we never speak of someone in the plural. I am me, not my brother or my father. I may be related to them, but they are them, not me. This extends to groups of information. My mother, father, brother, and I make up a single immediate family that is also unique. We are related to other families.

    Relatedness: Also, information is usually associated with other information. Thus, I may have a school I attend, while my father works for a company, which has thousands of similar people working for it, each with their own families.

    This brings us to the nature of relations. There are three types:

    One to one. For each US Social Security Number, there is one, and only one, person. The one to one relationship is stated as 1:1.

    One to many: Each person can have many bank accounts. Thus, for each SSN, there may be many account numbers. This is stated as 1:∞.

    Many to many: A bank can have many customers, yet a customer can also have many banks. This is stated as ∞:∞.
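    One common way to express these three relation types in a relational schema is sketched below with Python's sqlite3. The table names, the obviously fake SSNs, and the made-up bank codes are assumptions for illustration only: a UNIQUE foreign key gives 1:1, a plain foreign key gives 1:∞, and a separate pairing table gives ∞:∞.

```python
import sqlite3

# Hypothetical tables and fake identifiers, invented for this sketch.
# Note: SQLite enforces UNIQUE and PRIMARY KEY here; foreign keys are only
# enforced if PRAGMA foreign_keys = ON is set, which this sketch does not rely on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT);
-- 1:1: UNIQUE on the foreign key means one SSN per person, one person per SSN.
CREATE TABLE ssns (
    ssn       TEXT PRIMARY KEY,
    person_id INTEGER NOT NULL UNIQUE REFERENCES people(person_id)
);
-- 1:many: many account rows may point at the same SSN.
CREATE TABLE accounts (
    account_no TEXT PRIMARY KEY,
    ssn        TEXT NOT NULL REFERENCES ssns(ssn)
);
CREATE TABLE banks (aba TEXT PRIMARY KEY, name TEXT);
-- many:many: one row per (bank, customer) pairing.
CREATE TABLE bank_customers (
    aba TEXT REFERENCES banks(aba),
    ssn TEXT REFERENCES ssns(ssn),
    PRIMARY KEY (aba, ssn)
);
""")
conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Pat"), (2, "Sam")])
conn.executemany("INSERT INTO ssns VALUES (?, ?)",
                 [("000-00-0001", 1), ("000-00-0002", 2)])
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("ACCT-1", "000-00-0001"), ("ACCT-2", "000-00-0001")])
conn.executemany("INSERT INTO banks VALUES (?, ?)",
                 [("ABA-A", "Bank A"), ("ABA-B", "Bank B")])
conn.executemany("INSERT INTO bank_customers VALUES (?, ?)",
                 [("ABA-A", "000-00-0001"), ("ABA-B", "000-00-0001"),
                  ("ABA-A", "000-00-0002")])

# 1:1 check: a second SSN for the same person violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO ssns VALUES ('000-00-0003', 1)")
    second_ssn_allowed = True
except sqlite3.IntegrityError:
    second_ssn_allowed = False

count = "SELECT COUNT(*) FROM {} WHERE {} = ?"
accounts_for_pat = conn.execute(count.format("accounts", "ssn"), ("000-00-0001",)).fetchone()[0]
banks_for_pat = conn.execute(count.format("bank_customers", "ssn"), ("000-00-0001",)).fetchone()[0]
customers_of_bank_a = conn.execute(count.format("bank_customers", "aba"), ("ABA-A",)).fetchone()[0]

print(second_ssn_allowed, accounts_for_pat, banks_for_pat, customers_of_bank_a)
```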

    In database theory and design, we find four things to be true:

    1. There are exceedingly few 1:1 relationships. Even with the person and the SSN, we find in practice that the person is represented by several variables, including first name, last name, middle initial, address, phone numbers, etc. Thus, a more appropriate example of a 1:1 relationship might be the SSN of a husband and the SSN of his wife. This would only hold true in societies where polygamy is outlawed. While that may not stop the practice, it does allow database designers to represent marital relationships simply, via two people's SSNs.

    2. There are way too many ∞:∞ relationships. Each bank has an ABA routing number that's unique to that bank. Each person has an SSN that's unique to that individual. When you list the data of all people and their banks, or all banks and their people (same thing), the relationship is ∞:∞. In practice, I find that most ∞:∞ relationships are the result of failing to reduce data elements to their lowest common denominator.

    3. The third thing we find is that after reducing all data elements to their lowest common denominator, all remaining ∞:∞ relationships indicate side-wise pairings between trees.

    4. For data sets without any ∞:∞ relationships, they're said to be "lone trees." Those with one or more ∞:∞ relationships are said to be part of a forest.
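    One way to read "reducing data elements to their lowest common denominator" is ordinary normalization: splitting a flat people-and-banks listing into one table per kind of thing plus a table of pairings. The plain-Python sketch below assumes that reading; the flat rows, fake SSNs, and bank codes are invented for illustration.

```python
# A flat listing of people and their banks: names repeat on every row.
# All values are fake and exist only for this sketch.
flat_rows = [
    ("000-00-0001", "Pat", "ABA-A", "Bank A"),
    ("000-00-0001", "Pat", "ABA-B", "Bank B"),
    ("000-00-0002", "Sam", "ABA-A", "Bank A"),
]

# Each person and each bank is stored once, keyed by its unique identifier.
people = {ssn: name for ssn, name, _, _ in flat_rows}
banks = {aba: bank_name for _, _, aba, bank_name in flat_rows}

# What remains of the many-to-many relationship is just the list of pairings.
pairings = sorted({(ssn, aba) for ssn, _, aba, _ in flat_rows})

print(people)
print(banks)
print(pairings)
```

    After the split, each person relates to the pairings 1:∞ and each bank relates to the pairings 1:∞, so the only ∞:∞ left is the pairing list that links the two sides.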

    Those of you familiar with Microsoft's Active Directory understand the use of these terms...

    The point of the above is that the information itself, through the logical ways in which it can and cannot be paired, is what organizes the information, not some arbitrary choice on the part of the person who is organizing the data. If the latter happens, there will be reality on one hand and, on the other, a database which doesn't interact well with, or reflect, that reality, which doesn't do anyone any good. But the good database developer seeks to understand the nature of the information itself, how it relates with other information, and in so doing, properly organize that data in a manner which does reflect reality.

  6. #6
    Join Date
    Dec 2005
    Posts
    14,315
    Quote Originally Posted by PraedSt
    Mugs! All this just to make a point?
    The point is that this stuff really isn't rocket science if I can explain the basics in a few posts.

    And if it's that easy, why isn't it being taught from day one in the schools?

    The third point is that because it's not integrated into modern education, few people learn even the basics, which results in far too many people unnecessarily duplicating the work of many others. This parasitic effect costs governments, companies, consumers, and citizens alike trillions of dollars annually.

    Case in point: About ten months ago I walked into an office and discovered some otherwise bright people verifying information contained in a Word document before copying and pasting small parts of that data to another Word document.

    This took each of them about four hours, every day. It was their "job," as in, "Well, that's our job, here, to consolidate this information and provide it to our customers." Furthermore, they are one of more than a hundred such departments which perform the same function.

    Let's see... 2 people x 4 hrs / day x 100 departments x 250 work days / yr = 200,000 misspent man-hours a year. At an effective $35 an hour, that's $7,000,000 a year. $7 Million annually.
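    The same back-of-the-envelope estimate, written out in Python so the inputs are easy to change. The variable names are just labels chosen for this sketch; the figures are the ones stated above.

```python
# Inputs as stated in the estimate above.
people_per_department = 2
hours_per_day = 4
departments = 100
work_days_per_year = 250
effective_hourly_cost = 35  # dollars per hour

# Total hours spent per year across all departments.
hours_per_year = (people_per_department * hours_per_day
                  * departments * work_days_per_year)

# Annual cost of those hours.
annual_cost = hours_per_year * effective_hourly_cost

print(f"{hours_per_year:,} man-hours a year")
print(f"${annual_cost:,} a year")
```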

    In two weeks I'd created a database which automated their tasks except for about half an hour to input the information which changed daily.

    Back to point #3: Why didn't any one of the bright, well-educated 200 individuals spot this ridiculous waste and simply fix it?

    What's up with that?

    It's simply because they weren't brought up to think in terms of how information is inherently organized. As a result, when the opportunity clearly presented itself, instead of implementing a time-saving solution, they went forth with the old "paper and pencil trick," or in this case, the electronic equivalent of it, by using a text editor rather than a database.

  7. #7
    Join Date
    Sep 2008
    Posts
    5,892
    Oh I see. Sorry, Mugs, I thought you were having a dig!
    Quote Originally Posted by mugaliens
    Since this thread is about using databases for taxonomy, which is a science, it should appear in a non-astronomy, science section. However, none exists, which is why it's here in the catch-all OTB.
    As for this:
    Quote Originally Posted by mugaliens
    Back to point #3: Why didn't any one of the bright, well-educated 200 individuals spot this ridiculous waste and simply fix it?

    What's up with that?
    Often just bad management in my experience. But there might be other reasons. For example, I sometimes use 'paper and pencil' for some tasks, however repetitive or tedious, because it clarifies things in my head.

    Also, you computer genius, sorry for bringing up one of my threads here, but have you had any experience with cloud computing? If you have, I would appreciate a factoid or two.
    Last edited by PraedSt; 2008-Nov-01 at 09:22 PM. Reason: Spelling

  8. #8
    Join Date
    Jan 2002
    Location
    The Valley of the Sun
    Posts
    9,954
    I also have the db bug. I started using dBase II in 1986 when I took over running and maintaining someone else's programs. In 1988 I converted them to FoxBase+ and then wrote my company's inventory, invoicing, and accounting systems all in FoxBase+. Before that, the boss had been using general purpose commercial programs that never quite did everything he wanted them to do. I wouldn't have wanted to write that software in any general purpose programming language. They wouldn't be nearly as useful. Those programs are still in use today.

  9. #9
    Join Date
    Dec 2005
    Posts
    14,315
    PraedSt - I'll stroll on over to your cloud computing thread in just a moment...

    Chuck - cool! I flirted with dBase II, but never did learn it. I used FoxBase+ to create a database used by one of the services to keep track of targeting information. For some reason, the project manager (read, "Mr. I don't know a thing about databases") was "sold" on FoxBase+, meaning someone had sold it to him, and he'd better use it. I had to use their software, and they had a copy of dBase III+, so I wrote it in III+, but he said, "I want it in FoxBase - it's better!"

    So I cracked the plastic wrapping and got to work. The commands were almost identical, as they, like Clipper, were based on the dBase series. But I recall several nights of banging my head against the wall trying to figure out why the program code (it was simple stuff, really) wasn't producing the right answers. I finally figured it out, though, and he was happy.

  10. #10
    Join Date
    Jan 2002
    Location
    The Valley of the Sun
    Posts
    9,954
    FoxBase+ was supposed to work like dBase III but faster, but I never had dBase III to compare them. It certainly worked faster than dBase II. It came with a utility to convert dBase III programs to FoxBase+, but since I didn't have dBase III I had to manually convert dBase II code to FoxBase+. That didn't turn out to be a major problem.

  11. #11
    Join Date
    Apr 2005
    Posts
    11,545
    Quote Originally Posted by mugaliens
    Today, however, information is booming. Wikipedia might be taughted as a tremendous success, but consider the cost:
    That brought me up short, until I realized you must've meant "touted", right?
    But while cross-referenced, Wikipedia is a data nightmare. It's highly useful as an encyclopedia, but it's incapable of generating basic descriptive statistics.

    For example, how many subspecies of Britain and Ireland's surf clam are native to New Zealand? Looking up surf clam tells you nothing, as does looking up New Zealand. Even following the various links won't help you. If you're from New Zealand, you might have guessed it's the tuatua, but how many subspecies of tuatua are there? Looking up "tuatua" on Wikipedia reveals there are indeed three.
    I googled "surf clam"+"New Zealand" site:en.wikipedia.org and it returned (in the top five) a link to the wiki page on tuatua, with the description "Tuatua is the Māori language name for three subspecies of surf clam native to New Zealand:"
    Since this thread is about using databases for taxonomy, which is a science, it should appear in a non-astronomy, science section. However, none exists, which is why it's here in the catch-all OTB.
    ? BAUT does have the General Science forum.
    Quote Originally Posted by mugaliens
    And if it's that easy, why isn't it being taught from day one in the schools?
    Day one being kindergarten?
    Back to point #3: Why didn't any one of the bright, well-educated 200 individuals spot this ridiculous waste and simply fix it?

    What's up with that?
    Some people never learn.

  12. #12
    Join Date
    Dec 2005
    Posts
    14,315
    Quote Originally Posted by hhEb09'1
    That brought me up short, until I realized you must've meant "touted", right?
    Yes.

    I googled "surf clam"+"New Zealand" site:en.wikipedia.org and it returned (in the top five) a link to the wiki page on tuatua, with the description "Tuatua is the Māori language name for three subspecies of surf clam native to New Zealand:"
    Good for you!

    ? BAUT does have the General Science forum.
    General Science (which covers non-S&A topics) exists as a sub-forum of S&A. That makes about as much sense as selling hamburgers at a hot-dog stand. It should be a sub-forum of a non-S&A forum.

    Day one being kindergarten?
    Why not? We teach readin', writin', and 'rithmahtick....

    Some people never learn.
    You say that facetiously. The issue remains - some 200 people, many of whom were taught how to fix the problem in college, did nothing. What they weren't taught to do is to think critically. Many of them often stated, "this is stupid," but no one did anything about it. Had they been taught how to properly organize information from the start, it would have been second nature to them, rather than an exercise for some final exam.

  13. #13
    Join Date
    Apr 2005
    Posts
    11,545
    Quote Originally Posted by mugaliens
    Good for you!
    I just pointed that out because it seemed like an example of something you said couldn't be done.
    General Science (which covers non-S&A topics) exists as a sub-forum of S&A. That makes about as much sense as selling hamburgers at a hot-dog stand. It should be a sub-forum of a non-S&A forum.
    Still, it's for non-S&A topics ("From aardvarkology to zymurgism, anything scientific that isn't astronomy can be discussed here."). That's contrary to what you said earlier ("However, none exists,"). I was just trying to be helpful.
    Why not? We teach readin', writin', and 'rithmahtick....
    What would we leave out?

    I run into this all the time. People think the world would be an obviously better place if their particular expertise were emphasized more in the school system. At the PhD level, a geophysics professor insisted that grad students did not have to take geology courses but geology students should take as much geophysics as possible. A petrochemistry professor complained that some of the geophysics students had an appalling lack of knowledge about geology (a PhD candidate defending his thesis on seismic prospecting seemed to think that oil deposits were vast voids under the earth), and I sympathized, but I pointed out that he often discouraged his students from taking geophysics. He looked at me blankly; of course they didn't have to take geophysics.
    You say that facetiously.
    Yes...
    The issue remains - some 200 people, many of whom were taught how to fix the problem in college, did nothing. What they weren't taught to do is to think critically. Many of them often stated, "this is stupid," but no one did anything about it. Had they been taught how to properly organize information from the start, it would have been second nature to them, rather than an exercise for some final exam.
    Of course, teaching critical thinking is problematic, since it would require students to be suspicious of their teachers.

    What we need is a new paradigm.

  14. #14
    Join Date
    Dec 2005
    Posts
    14,315
    Quote Originally Posted by hhEb09'1
    I just pointed that out because it seemed like an example of something you said couldn't be done.
    I said it wasn't straightforward. I never said it was impossible.

    I was just trying to be helpful
    Thanks.

    What would we leave out?

    I run into this all the time....
    (sigh) Me, too. I'm not so sure if it's that we need to add more curricula, or if we simply need to modify existing curricula.

    What we need is a new paradigm.
    Exactly!
