The whole Big Data is greater than the sum of its parts
How to win in the application economy by finding business value in information and potential data.
“The Imitation Game,” the current Oscar-nominated film about code breaker Alan Turing, reminds us that since the early days of computers we have thought of “data” as something organized into rows and columns, something that could be structured. A newspaper or book clearly contained information, but it wasn’t really data because to a computer it appeared to have a random organization; it was unstructured and couldn’t be processed usefully by computer programs.
The revolution underlying the application economy is the emergence of new tools, and enough processing power, to glean value from unstructured information, thus turning it into data. We call it “Big Data” because this revolution gives access to much more data than we had before. The code has changed forever.
There is a myth that “Big Data analytics” is all about NoSQL databases and unstructured data. In fact, a lot of the clever analysis that companies are doing with high-volume transactional data is achieved with structured data alone. This is Big Data too. The two archetypal cases most used today are:
• Recommendation engines: used in real time or post-sale to suggest additional purchase options based on what other, similar buyers have bought.
• Fraud detection: usually run in real time to alert on unusual patterns in behavioral data (access point, transaction time, purchase type, etc.) that might indicate fraud.
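To make the first case concrete, here is a minimal sketch of a recommendation engine that works on structured purchase data alone. It simply counts which items appear together in the same basket; the basket contents and the function names `build_co_purchase_counts` and `recommend` are invented for illustration, and a production engine would use far more data and a more refined similarity measure.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # De-duplicate items within a basket so each pair is counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def recommend(co_counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    neighbours = co_counts.get(item, Counter())
    return [other for other, _ in neighbours.most_common(top_n)]


# Hypothetical purchase history (illustrative data only).
baskets = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "mouse"],
    ["laptop", "mouse", "keyboard"],
    ["phone", "phone case"],
]

co_counts = build_co_purchase_counts(baskets)
print(recommend(co_counts, "laptop", top_n=1))
```

With this toy data, buyers of a laptop are pointed to the mouse, because that pair shows up in three baskets. The same counting idea scales out naturally, which is why cheap parallel compute matters so much for this use case.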
The revolution here rests simply on the availability of lots of compute power. Complex queries that used to take hours can now be processed in seconds, or at least quickly enough to deliver real-time value.
In the mainframe world, IBM achieved this by offloading complex DB2 SQL queries to its DB2 Accelerator, the ex-Netezza device that attaches directly to the mainframe as an extension and receives data and processing instructions at lightning speed, in a way that is transparent to the applications spawning the requests. The downside is that this technology is expensive and only for the well-heeled elite. Hadoop democratized large-scale compute power by making it available through massively parallel use of commodity servers. The cloud providers democratized access to it with IaaS and SaaS offerings at low prices under a pay-for-use business model.
Many innovative and valuable analyses are also being done using unstructured data alone. For instance, by analyzing text data such as tweets, Facebook posts and emails sent to customer service, companies might spot emerging problems that they could build products to solve. At a minimum, they might be able to validate or eliminate new ideas and “fail fast,” as the axiom of lean innovation teaches us. Doing so earlier than the competition is all that’s needed; whoever has the best data scientist wins!
However, the most immediate ways to augment existing business processes or create new ones probably come from combining the two types of data, structured and unstructured. The analysis of social data or other person-based feeds described above is called “sentiment analysis.” By combining data from the product catalog with sales data and a sentiment analysis, companies can quickly get an early grasp of the shape and size of customer dissatisfaction. This allows product managers to make changes and then use the same data sources as a feedback loop to see whether the investment fixed the customer issues.
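As a rough sketch of how such a combination might look, the example below joins a product catalog and sales figures (structured) with customer posts (unstructured) to rank products by negative posts per 1,000 units sold. Everything in it is an assumption made for illustration: the word lists stand in for a real sentiment model, the posts are assumed to be already matched to a product ID, and the names `sentiment_score` and `dissatisfaction_report` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical word lists; a real system would use a trained sentiment model.
NEGATIVE = {"broken", "slow", "refund", "terrible", "disappointed"}
POSITIVE = {"love", "great", "fast", "excellent", "happy"}


def sentiment_score(text):
    """Crude score: positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def dissatisfaction_report(catalog, sales, posts):
    """Rank products by negative posts per 1,000 units sold.

    catalog: {product_id: name}, sales: {product_id: units sold},
    posts: list of (product_id, text), assumed already matched to products.
    """
    negative_posts = defaultdict(int)
    for product_id, text in posts:
        if sentiment_score(text) < 0:
            negative_posts[product_id] += 1

    report = []
    for product_id, name in catalog.items():
        units = sales.get(product_id, 0)
        negatives = negative_posts[product_id]
        rate = negatives / units * 1000 if units else 0.0
        report.append((name, units, negatives, rate))
    # Most dissatisfaction per unit sold comes first.
    report.sort(key=lambda row: row[3], reverse=True)
    return report


# Illustrative data only.
catalog = {"P1": "Router X", "P2": "Camera Y"}
sales = {"P1": 2000, "P2": 500}
posts = [
    ("P1", "Love the Router X, setup was fast"),
    ("P2", "Camera Y arrived broken, want a refund"),
    ("P2", "So disappointed with the Camera Y"),
    ("P2", "Great pictures"),
]

report = dissatisfaction_report(catalog, sales, posts)
for name, units, negatives, rate in report:
    print(f"{name}: {negatives} negative posts, {rate:.1f} per 1,000 units")
```

Running the same report again after a product change gives the feedback loop: if the rate for a product falls, the fix is probably working.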
The increasingly important use case of fraud detection can also be hugely enriched by unstructured data. Logs of movement through the Internet, or other infrastructure, can reveal deeper patterns than transaction origin alone. Activity on social media might indicate buying (or other) behavior patterns preceding a fraud, or help illuminate correlations with post-fraud selling activity. These are just two simple examples.
Some analysts tell us that 80 percent of the data a company has today is unstructured data. As the IoT becomes a reality, unstructured data will become more like 99.9 percent of the data a company has. The winners in the application economy will be those that can find business value in all this information and potential data.
A survey from last December found that 67 percent of large companies are in production with Big Data analytics.
Although this is on the high side for survey results, the chances are that if you are not doing it yet, your competition is.
You had better start playing the imitation game quickly and start busting the new code of Big Data for yourself.