Sunday, March 1, 2020

TOPOLOGY - dealing with too much information?

-  2644 - TOPOLOGY  -   dealing with too much information?  Topology is the mathematical study of the properties of geometric forms that do not change under continuous transformations such as bending or stretching.  It is the mathematical study of shapes.  How can information have shapes?  Well, graphs, bar charts, and PowerPoint presentations offer thousands of ways to shape information.
---------------------   2644  - TOPOLOGY  -   dealing with too much information?
-  In this information age we are drowning in data yet thirsting for knowledge. 
-  Combining math, information science, and software, scientists are starting to study patterns and shapes in their vast collections of data.  Math is great because it is not limited to three dimensions.
-  We have trouble visualizing higher dimensions, but math treats them as business as usual.  Visualize a cube connecting 8 corners.  OK, now visualize a tesseract, a four-dimensional hypercube connecting 16 corners.  Topology can mathematically describe these shapes in many dimensions without flinching.
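Why does the corner count double with each added dimension?  The corners of an n-dimensional unit hypercube are exactly the lists of n zeros and ones, so there are 2^n of them, and two corners share an edge when they differ in exactly one coordinate.  Here is a small Python sketch of that idea; the function names are made up for illustration.

```python
from itertools import combinations, product


def hypercube_corners(n):
    """Return the corners of the n-dimensional unit hypercube as 0/1 tuples."""
    return list(product((0, 1), repeat=n))


def hypercube_edges(n):
    """Return the pairs of corners that differ in exactly one coordinate."""
    corners = hypercube_corners(n)
    return [
        (a, b)
        for a, b in combinations(corners, 2)
        if sum(x != y for x, y in zip(a, b)) == 1
    ]


for n, name in [(3, "cube"), (4, "tesseract")]:
    print(f"{name}: {len(hypercube_corners(n))} corners, {len(hypercube_edges(n))} edges")
# cube: 8 corners, 12 edges
# tesseract: 16 corners, 32 edges
```

Nothing in the code cares that n stops at 4; asking for n = 100 is just as easy to write, even if no one can picture the result.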
-  To illustrate this concept, a study was done on diabetes.  Each patient was described by five numbers: four metabolic measurements and one weight-related measurement.  So, in effect, each patient was a point in 5 dimensions.  When the shape of the data was studied, it became clear that there were two distinct clusters of data, two distinct types of the disease.  They became known as Type 1 and Type 2 diabetes.
-  You can see how complexity quickly rises as we go from 5 dimensions to 100 dimensions.  Yet, in the topology of the data we can still find distinct shapes.  Learning how to do this analysis could bring new insights into our understanding of the data.  This science is coming of age none too soon.
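The review does not say exactly how the shape of the diabetes data was found.  As a rough, hypothetical illustration of the simplest topological question you can ask of such data, "how many separate pieces does it have?", here is a Python sketch: link any two points closer than a chosen distance eps, then count the connected pieces (topologists call this count the zeroth Betti number at scale eps).  The points below are invented, not patient measurements, and the eps values are arbitrary choices.  The same code works unchanged in 5 or 100 dimensions.

```python
import math


def count_clusters(points, eps):
    """Count the connected pieces of the graph that links any two points
    closer than eps (straight-line distance), in any number of dimensions."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)  # merge the two pieces

    return len({find(i) for i in range(len(points))})


# Hypothetical 5-dimensional points (made-up numbers, not real data):
# two tight groups far apart from each other.
patients = [
    (1.0, 1.1, 0.9, 1.0, 1.2),
    (1.1, 1.0, 1.0, 0.9, 1.1),
    (0.9, 1.0, 1.1, 1.0, 1.0),
    (5.0, 5.2, 4.9, 5.1, 5.0),
    (5.1, 4.9, 5.0, 5.0, 5.2),
]

print(count_clusters(patients, eps=1.0))   # 2
print(count_clusters(patients, eps=20.0))  # 1
```

Notice that the answer depends on eps: at a modest scale the two groups stay apart, at a huge scale everything merges into one piece.  Persistent homology, a core tool of topological data analysis, tracks how counts like this change as eps grows instead of betting on a single value.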
-  Think of the Large Hadron Collider, the giant particle accelerator near Geneva, Switzerland.  It measures the 40,000,000 particle collisions that occur every second.  The detectors are expected to process only 25% of 1% (0.25%) of the collision data, but even that will generate 2,000 Petabytes of data per year.
-  How would you like to analyze that?  The Library of Congress has 29,000,000 books, which together contain only 15,000 Gigabytes.  So the Hadron Collider detectors are processing the equivalent of over 130,000 Libraries of Congress every year.
-  To analyze this data, scientists need high-speed, massively parallel computers.  They will be trying to discover new physics or new particles.  So they need to subtract all known physics from the data and see what is left.  It is like trying to find a particular sequence of words in the books of 130,000 Libraries of Congress.
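The Library of Congress comparison can be checked with a few lines of Python, using only the figures quoted in this review and decimal units (1 Gigabyte = 10^9 bytes, 1 Petabyte = 10^15 bytes).

```python
# Decimal units, as defined in this review.
GB = 10**9
PB = 10**15

lhc_bytes_per_year = 2_000 * PB          # detector output per year, from the text
library_of_congress_bytes = 15_000 * GB  # text of 29,000,000 books, from the text

libraries_per_year = lhc_bytes_per_year / library_of_congress_bytes
print(f"{libraries_per_year:,.0f} Libraries of Congress per year")  # 133,333

# "25% of 1%" is one quarter of one percent.
kept_fraction = 0.25 * 0.01
print(f"{kept_fraction:.2%} of the collision data")  # 0.25%
```

So "over 130,000" is right: about 133,000 Libraries of Congress a year.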
-  In 1995 the National Archives had 57 Gigabytes of data in electronic storage.  In 2004 it had 1.9 Terabytes, and this year it had reached 5.1 Terabytes by July.  All these Bytes are making me dizzy.
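Those figures imply very fast growth.  A minimal Python sketch of the average yearly rate from 1995 to 2004, assuming steady compound growth (an assumption; real growth was surely uneven) and 1 Terabyte = 1,000 Gigabytes:

```python
import math


def annual_growth_rate(start, end, years):
    """Average yearly growth rate, assuming steady compound growth."""
    return (end / start) ** (1 / years) - 1


# National Archives electronic holdings, in Gigabytes (figures from the text).
rate = annual_growth_rate(start=57, end=1_900, years=2004 - 1995)
doubling_years = math.log(2) / math.log(1 + rate)

print(f"about {rate:.0%} per year, doubling every {doubling_years:.1f} years")
# about 48% per year, doubling every 1.8 years
```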
-  A 3x5 photograph is 100,000 bytes, or 100 Kilobytes.
-  A floppy disk is 1,400,000 bytes, or 1.4 Megabytes.
-  A Gigabyte is 1,000,000,000 bytes, equivalent to a pickup truck filled with books.
-  A Terabyte is 1,000,000,000,000 bytes.  The records gathered for the 9/11 Commission Report contained 1.2 Terabytes of information.  That is equivalent to 60,000 trees made into paper and printed, and that is just for the first copy.
-  A Petabyte is 1,000,000,000,000,000 bytes.  All the material ever printed in the world to date amounts to 200 Petabytes.  The National Archives preserves all White House records and 2% of other federal records.  By 2022 it will hold 347 Petabytes of information.
-  An Exabyte is 1,000,000,000,000,000,000 bytes.  All the words ever spoken by human beings amount to 5 Exabytes.  (And that is just the women.  The average woman says 7,000 words a day; the average man, 2,000.  By 5 o’clock I have used up all my words for the day.)
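Each step on that ladder is 1,000 times the one before (these are the decimal units; computer memory is often counted in powers of 1,024 instead).  A small, hypothetical Python helper that turns a raw byte count into the largest unit that keeps the number at or above 1, applied to figures from this review:

```python
# Decimal byte units, as used in this review: each step is 1,000 times the last.
UNITS = ["bytes", "Kilobytes", "Megabytes", "Gigabytes",
         "Terabytes", "Petabytes", "Exabytes"]


def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the number >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the number drops below 1,000,
        # or at the biggest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(human_readable(100_000))            # 100 Kilobytes (3x5 photograph)
print(human_readable(1_400_000))          # 1.4 Megabytes (floppy disk)
print(human_readable(1_200_000_000_000))  # 1.2 Terabytes (9/11 Commission records)
print(human_readable(200 * 10**15))       # 200 Petabytes (all printed material)
print(human_readable(5 * 10**18))         # 5 Exabytes (all words ever spoken)
```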
-  Let there be no data without records.  Let there be no records without analysis.  Let there be no analysis without a decision.  Let there be no decision without action.  Let there be no action without data measuring the results.  Let there be no data without records…
-   You get the idea.  How do you decide what actions to take with all this data?
-  Topological analysis may be the only answer.  A picture is worth a thousand words.  To deal with this problem we need a few hundred pictures, each worth a Petabyte.
I have just read more bytes than I can swallow.
-   March 1, 2020          605          2644
-----  Comments appreciated, and pass it on to whomever is interested. ----
---   Some reviews are at:  -------------- ----- 
--  email feedback, corrections, request for copies or Index of all reviews
---  to:  ------  ------  “Jim Detrick”  -----------
