Big Data Thoughts Bastille Day 2013


    Living on Pleasant Hill Time Neither aught nor naught: Secret of Big Data July 17, 2013

    Copyright 2013, David M. Sherr. Annals of a Running Dog

    Preface

    Fifteen years ago today, Bonnie and I entered Paris by train to all the great excitement, congestion and gendarmes carrying Uzis.

    In honor of Bastille Day, the "La Marseillaise" scene from perhaps the most famous film of all time, Casablanca.

    Today, I have been dealing with Big Data for forty years and counting.

    July 14, 2013, Bastille Day (Liberté, Fraternité, Égalité)

    In the mid-1970s, I processed several 9-track EBCDIC tapes from the US Agriculture Department with the HANES (Health and Nutrition Examination Survey) data. HANES I was the dietary-recall, longitudinal-history, stratified-sample study of people's food habits.

    I used PL/1 to process the tapes because I could specify guarded commands (the ON construct) and have them execute semi-automatically. In this way I could accommodate missing data easily and distinguish the Naught, or Null, value. This was handled differently from Zero, which was the default value of a Missing Value if one did not trap and process it differently.
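A minimal Python sketch of the same idea, not the original PL/1: a blank field is kept as a missing value (None) instead of silently defaulting to zero. The field layout, the helper name `parse_field`, and the sample values are hypothetical, chosen only for illustration.

```python
from statistics import mean

def parse_field(raw):
    """Return an int for a numeric field, or None when the field is blank (missing)."""
    text = raw.strip()
    if text == "":
        return None  # missing: trapped here instead of defaulting to zero
    return int(text)

# Hypothetical fixed-width records: one 4-character intake field per respondent.
records = [" 120", "   0", "    ", " 300"]

values = [parse_field(r) for r in records]
present = [v for v in values if v is not None]

# Letting missing default to zero drags the average down; trapping it does not.
mean_missing_as_zero = mean(0 if v is None else v for v in values)
mean_missing_trapped = mean(present)

print(values)
print(mean_missing_as_zero, mean_missing_trapped)
```

The genuine zero in the second record still counts as data; only the blank third record is excluded.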

    In the mid-80s Bonnie became a Nutrient Data Base Researcher for Campbell Soup and became a SAS expert at using the HANES II data. I left Data Analysis of food consumption to her until, during my 1996-99 interregnum from financial systems employment, from Oct 1998 to March 1999, I converted a Campbell Soup DOS/BASIC-based PC app called MENUSCAN into an interactive Web app.

    The Secret of Big Data Analysis

    And, then as now, the secret of analyzing this data is handling that which was "Neither aught nor naught." One may mean "Neither aught nand naught," but "nand" is not an English word; it is the Boolean construct. Not (aught or naught) is the specific meaning, so it is some non-zero number. This is, as my friend Clarke maintains, "separating the pepper from the fly specks." Only he is a bit more colorful in his language.
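The Boolean reading can be sketched in Python. Here "aught" is taken to mean a literal zero and "naught" a missing value (None); that mapping is an assumption drawn from the surrounding text, and the function names and sample values are made up.

```python
def is_aught(v):
    """True when the value is a literal zero."""
    return v is not None and v == 0

def is_naught(v):
    """True when the value is missing."""
    return v is None

def neither_aught_nor_naught(v):
    """NOR of the two tests: not (aught or naught)."""
    return not (is_aught(v) or is_naught(v))

# Hypothetical sample values
sample = [0, None, 7, -3, 0.0, 2.5]
kept = [v for v in sample if neither_aught_nor_naught(v)]
print(kept)  # [7, -3, 2.5]
```

NAND, not (aught and naught), would be true for every value here, since no value is both zero and missing. Only the NOR reading filters anything out.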

    The point is that handling the data requires understanding the full range of possible data values as well as the case of no data at all. Zero is not always the absolute value 0 but can be the middle point of any distribution of data.

    Median is the best example, like the median 2011 income in the US being $26,364: just as many above as below that income.
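A small Python illustration of both points, using made-up incomes instead of the real figures: the median splits the data evenly, and subtracting it gives a "zero" that sits at the middle of the distribution.

```python
from statistics import mean, median

# Hypothetical incomes, including one very large value
incomes = [12_000, 18_000, 26_000, 31_000, 40_000, 55_000, 900_000]

mid = median(incomes)
avg = mean(incomes)

# Re-center so that zero marks the middle of the distribution
centered = [x - mid for x in incomes]
below = sum(1 for d in centered if d < 0)
above = sum(1 for d in centered if d > 0)

print(mid, avg)      # the single large value pulls the mean above the median
print(below, above)  # equal counts on either side of the re-centered zero
```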

    [And now I go to last week's news of Farm Bills and Food Stamps. Tsk!]

    Median is much more descriptive than Average since the Uber Rich skew the Average up to $51,560. Imagine the 0.1% of the population (300,000 people) receiving the difference between Median and Average, viz., $25,000 × 300,000,000 = $7.5 × 10^12, or $7.50 trillion. This equates to about $22.5 million each on average for the Uber Rich ($7.5 trillion × 90% ÷ 300,000, since they received about 90% of the increase in income). This is not wealth, but annual income. The Wealth is, well, Uber.
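The arithmetic in this paragraph can be checked in a few lines of Python. The inputs are the rounded figures quoted above (a $25,000 gap, 300,000,000 people, a top 0.1%, a 90% share); the variable names are illustrative, and the premise that the gap flows to the top group is the essay's own simplification.

```python
gap_per_person = 25_000          # average minus median, rounded
population = 300_000_000
top_group = population // 1_000  # 0.1% of the population
share_pct = 90                   # share attributed to the top group

# Integer arithmetic keeps the large dollar amounts exact
total_gap = gap_per_person * population
per_top_person = total_gap * share_pct // 100 // top_group

print(f"{total_gap:,}")
print(f"{per_top_person:,}")
```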

    Let them eat cake! How much is enough?

    https://www.facebook.com/david.sherr.5/posts/10200556961625694
    http://en.wikipedia.org/wiki/NOR_gate
    http://en.wikipedia.org/wiki/Nand_gate
    http://www.emeraldinsight.com/content_images/fig/0670330504013.png