This year we had a HUGE number of entries from many different industries for the Best Use of Data Science in a Large Company Award powered by FTI Consulting. This year’s Large Company finalists are Bank of Ireland, Vodafone, Depuy Synthes, Microsoft and Core Media, whose teams represented them in force at the presentation finals on August 17th in Croke Park.
Martin Perry, Senior Data Professional at Microsoft, led his team on presentation day and has written this great piece on Big Data Insight to provide a better understanding of what drives companies to use Big Data and where this data comes from.
Sources of Big Data
The business opportunity of Big Data lies in extracting value. The volume of data produced by business is expanding exponentially, and this growth is a significant trend for IT and business in 2017. Notably:
- 90% of the world’s data was created in the last two years.
- Every two days we now create as much information as we did from the dawn of civilization up until 2003.
- Currently, the volume of data created by U.S. companies alone each year is enough to fill ten thousand Libraries of Congress.
- Data production will be 44 times greater in 2020 than it was in 2009.
Businesses have access to mountains of data, ranging from transaction logs captured within databases, to customer data within their CRM systems, to publicly available data from the Web and social networking sites.
Extracting value from such data is a challenge, with more than 37.5% of large organizations saying that analyzing big data is their biggest challenge. Yet analyzing these large data sets is a source of competitive advantage. Through data-driven discovery of actionable information, data can provide insight such as:
- Sales growth from analyzing customers’ preferences to products and services
- Productivity growth from collecting and analyzing transactional data leading to business process improvement.
- A retailer using big data has the potential to increase its operating margin by more than 60 percent.
New Sources of Data and the Digitization of Existing Data Sources
Multiple sources are responsible for the explosive growth of accessible data. Some of these are entirely new data sources, while others reflect a change in the access to and availability of data that was already being generated. There is huge growth in the creation of digital representations of existing data in industries such as:
- Transportation – Sensor data are being generated at an accelerating rate from fleet GPS transceivers
- Logistics, retail, & utilities – the use of RFID (radio-frequency identification) tag readers and smart meters
- Telecommunications – cell phones generate substantial data records including geolocation elements
- Health care – an industry that is moving quickly to electronic medical records and images, for uses such as short-term public health monitoring and long-term research programs
- Government – examples are digitizing public records such as censuses, energy usage, budgets, Freedom of Information Act documents, and law enforcement reporting.
- Entertainment media – digital recording, production, and delivery of content have had a huge impact. In addition, user behavior data is collected to better personalize offerings.
- Life sciences – Low-cost gene sequencing generates tens of terabytes of information that can be analyzed for genetic variations and potential treatment effectiveness.
Big Data – A Working Definition
Commercially available spreadsheet software (such as Excel and OpenOffice Calc) has a maximum of just over 1 million rows per sheet (1,048,576). Google Sheets, the online spreadsheet application, can only handle 2 million cells, so the maximum number of rows is dictated by the number of columns used (e.g. a four-column sheet can only handle 500,000 rows).
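As a rough illustration of that cell-limit arithmetic, the short Python sketch below divides the 2 million cell cap quoted above by the number of columns in use. The function name `max_rows` and the constant `CELL_LIMIT` are made up for this example, and the cap is simply the figure from the text, not a value read from any spreadsheet product.

```python
# Illustrative only: the 2,000,000-cell cap is the figure quoted in the text.
CELL_LIMIT = 2_000_000


def max_rows(columns: int, cell_limit: int = CELL_LIMIT) -> int:
    """Return the most rows a sheet can hold when every row uses `columns` cells."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # Whole rows only, so use integer division.
    return cell_limit // columns


# Show how quickly the row budget shrinks as columns are added.
for columns in (1, 4, 8, 20):
    print(f"{columns:>2} columns -> {max_rows(columns):,} rows")
```

Under that assumption, even a modest 20-column export runs out of room at 100,000 rows, well short of the volumes discussed in this article.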
However, volume is not the only component of Big Data. The five V’s model, shown in Figure 1‑1, is an effective way to define it:
Figure 1‑1 Five V Model for Defining Big Data
- Volume: Big Data comes in one size: big. Terabytes and petabytes of information are easily amassed.
- Velocity: Data is often time sensitive, both in how fast it is produced and in how fast it must be processed to meet demand. There is increasing demand for technology that can analyze data as it is being produced.
- Variety: Big Data extends beyond structured data to include unstructured data. Previously, enterprises focused on structured data that fitted neatly into tables or relational databases, such as financial data. Now, with 80% of the world’s data being unstructured (text, images, messages, social media conversations, photos, sensor data, video or voice recordings), handling this variety has become more important.
- Veracity: Purity of information is critical. Quality and accuracy remain essential requirements.
- Value: Data must deliver value, for example by helping with:
  - understanding and targeting customers
  - understanding and optimizing business processes
  - improving healthcare and public health
  - improving science and research
Data, big or small, has no value in itself unless it can provide insight that leads to action and thus to desired outcomes. The key to unlocking the value of the data is asking the right questions of it. See Figure 1‑2.
Figure 1‑2 Insight Leads to Desired Outcomes
According to Gartner, ‘organizations that successfully integrate high-value, diverse new information types and sources into a coherent information management infrastructure will outperform their industry peers financially by more than 20 percent’.
Are you making the most of your data sources?
Be sure to get your ticket for this year’s DatSci Awards taking place on 21st September 2017 in Croke Park, Dublin, Ireland.
It will be a unique awards event that gives you the opportunity to connect with over 400 leading Data Science professionals and to learn from your peers in the Data Science community!