What’s this thing they call Big Data?

Two apparently unrelated occurrences in my life yesterday turned out to share more synergy than I first observed.

One of these things has been puzzling me for several months already: what is this thing they call Big Data, really? A mass of data generated so fast that it cannot be practically managed by conventional methods. So, how much can be? As technology advances, tomorrow we can manage more data than we could yesterday, so what is Big Data today might not be tomorrow. Of course, for the same reason, tomorrow we will also be able to produce data faster than we could yesterday.

The other thing is how my taxi driver yesterday had problems with his touch-screen system for accepting taxi calls, and how I talked with him about the way things are typically built: not perfect, but sufficiently good.

Now, to show the connection between these two, I first explain a couple of concepts: the NP-completeness mindset of an engineer, and fuzzy logic.

Computer scientists are traditionally engineer-minded, and in essence mathematicians. Mathematicians, and especially computer scientists, have been enthusiastic about whether or not a problem is NP-complete since 1971.[1] The question concerns whether a problem can be solved easily, or whether finding a solution may need hard and tedious work with no known shortcut, even when a proposed answer is easy to check. This is fascinating to engineers, who like to find clear-cut solutions to any and all problems. Of course, this drive motivated engineers long before the concept of NP-completeness. People like to have confirmation and certainty.

A bigger pop-hype than NP-completeness has been fuzzy logic, introduced in 1965 by Lotfi Zadeh.[2] Not everything is just either-or; there are values in between. It might be more efficient to build, for example, washing machines which can consider a load of laundry not just dirty or clean, but possibly also something in between, and take action based on that. I remember people thinking in the 1990s that fuzzy logic would solve everything and be used especially in the making of artificial intelligence.
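The washing machine idea can be sketched in a few lines. This is only a toy illustration, not how any real machine is programmed: the soil scale from 0 to 1, the three categories, the triangular membership shapes and the wash times are all my own assumptions, chosen to show how "something in between" turns into an action.

```python
# Toy fuzzy controller for a washing machine.
# All names, shapes and numbers here are illustrative assumptions.

# Hypothetical wash time (minutes) that each category would call for on its own.
WASH_MINUTES = {"clean": 10.0, "medium": 30.0, "dirty": 90.0}


def memberships(soil: float) -> dict[str, float]:
    """Return how strongly a soil level (0.0..1.0) belongs to each category.

    A binary machine would answer 'clean' or 'dirty'. Here a load can be
    partly both, with degrees between 0 and 1.
    """
    soil = min(max(soil, 0.0), 1.0)  # clamp to the assumed 0..1 scale
    return {
        "clean": max(0.0, 1.0 - 2.0 * soil),
        "medium": 1.0 - abs(2.0 * soil - 1.0),
        "dirty": max(0.0, 2.0 * soil - 1.0),
    }


def wash_minutes(soil: float) -> float:
    """Blend the per-category wash times, weighted by degree of membership."""
    degrees = memberships(soil)
    total = sum(degrees.values())
    return sum(degrees[name] * WASH_MINUTES[name] for name in degrees) / total


if __name__ == "__main__":
    for level in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(level, memberships(level), wash_minutes(level))
```

A load with soil level 0.25 is half "clean" and half "medium", so it gets a wash time halfway between the two, instead of being forced into one box.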

Now, in practice, fuzzy logic is a more mature logic than binary logic. Most things in reality are not binary. For a child, a cake might appear absolutely horrible and unacceptable if the cherry on its top falls off. An effort is only successful if it succeeds perfectly; otherwise it's a failure. A mature adult can see things more openly. The cake is not perfect, but absolutely great nevertheless.

Most manufacturing does not aim for perfection, but for products that are sufficiently good. So it goes with the touch-screen system of my taxi driver. Apparently he succeeds in accepting more calls than he fails in. The system is an improvement for his business. Yet it is not perfect, and when it fails, it causes a big negative reaction. It makes the driver hate it. The system is not perfect because designing it to be perfect would make it excessively expensive. It is much cheaper to build things that work sufficiently well, and the additional quality would not be worth the cost. The products are optimised for return on investment (ROI).

Everything in life does this. Our lifespan is not eternal – it's sufficiently long. Our eyesight is not perfect – it's sufficiently good. A tiger isn't able to catch every prey it stalks – only sufficiently many. It is not a simple question of whether or not. It's fuzzy. The tiger could even just manage to maim the prey a little before it gets away to lick its wounds, and the prey might then keep on thriving, or die of the infection and be eaten by scavengers. So did the tiger kill it, or not?

The “conventional methods” that fail Big Data appear to be binary methods. To manage the data means complete and fault-free control over the data, where any margin of error counts as failure. Big Data methods are therefore methods similar to fuzzy logic, where even 0.9 is sufficiently good, not just 1.0. Big Data should rather ask the users whether they want to know how often the tiger feels like it killed the prey, or how often the prey feels like the tiger killed it. It might want the users to give up on finding the complete answer to each case.
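One way to picture a "sufficiently good" answer is sampling: instead of inspecting every record, look at a random sample and accept a small error. The sketch below is only my illustration of that idea, not a description of any particular Big Data tool. The hunt records, the 30% success rate, the sample size and the function name estimate_rate are all made up for the example.

```python
import random


def estimate_rate(records, is_hit, sample_size, seed=0):
    """Estimate the share of records for which is_hit is true.

    Only sample_size randomly chosen records are inspected, so the answer
    is approximate, but the cost no longer grows with the full data set.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(records, sample_size)
    hits = sum(1 for record in sample if is_hit(record))
    return hits / sample_size


# Made-up data: 100,000 hunts, of which exactly 30% ended in a catch.
hunts = [i % 10 < 3 for i in range(100_000)]

# The "binary" approach: inspect everything, get the complete answer.
exact = sum(hunts) / len(hunts)

# The "sufficiently good" approach: inspect 2,000 hunts and accept some error.
approx = estimate_rate(hunts, lambda caught: caught, sample_size=2_000)

if __name__ == "__main__":
    print("exact:", exact)
    print("approx:", approx)
```

The estimate will rarely equal the exact figure, but it lands close to it after looking at only a fiftieth of the data, which is the trade the paragraph above describes.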

I don’t know how much of me is the childish engineer wanting the complete explanation of what is Big Data and what isn’t, and how much of me is a philosopher just posing the question in an effort to see what sorts of answers and further questions it brings forth. At the least, I need to find a sufficiently good solution for talking about Big Data in my scientific writing.

[1] https://en.wikipedia.org/wiki/NP-completeness

[2] https://en.wikipedia.org/wiki/Fuzzy_logic
