By Dr. Kandis Y. Wyatt, PMP
Faculty Member, Transportation and Logistics
Big data – copious amounts of information that is too large for humans to process easily – has become a part of our everyday lives. With the explosion of the internet, cell phone apps and social media sites, the amount of information that technology can collect exceeds our current capacity to store it.
Big data is useful because it enables us to gain a better understanding of our world and the people in it. However, what types of data should be collected and stored? And how much data about the people in our society should we be able to access?
Types of Big Data
Big data is particularly useful in business, where it is an important driver of decision-making. It can be divided into three categories: structured, unstructured, and semi-structured.
Structured data comprises nearly 20% of all big data. It is typically stored in a database and used as an information source about customers, processes, and staff.
Unstructured data has no predefined format. It includes information gathered from website content, social media posts and text messages. Semi-structured data falls between the two: it does not follow the rigid format of a database table, but it contains tags or markers that help organize it, such as the sender, recipient and date fields of an email.
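To make the three categories more concrete, here is a minimal Python sketch. The customer records, review text and social media posts are invented for illustration, and JSON is used as one common example of a semi-structured format. Structured data can be queried directly by column, semi-structured data can be queried by key (even when some records lack that key), and unstructured text needs extra processing, such as a keyword search, before it yields anything useful.

```python
import json

# Structured (hypothetical data): every record has the same fixed fields,
# like rows in a database table.
customers = [
    {"customer_id": 1, "name": "Ana", "country": "US"},
    {"customer_id": 2, "name": "Ben", "country": "CA"},
]

# Unstructured (hypothetical data): free text with no predefined fields.
review = "Loved the fast shipping, but the box arrived dented."

# Semi-structured (hypothetical data): JSON keys organize the content,
# but records do not all share the same fields.
posts_json = """
[
  {"user": "ana", "text": "Great product!", "tags": ["review"]},
  {"user": "ben", "text": "Where is my order?", "location": "Toronto"}
]
"""
posts = json.loads(posts_json)

# Structured data can be filtered directly by column.
us_customers = [c["name"] for c in customers if c["country"] == "US"]

# Semi-structured data can be queried by key, but a key may be missing.
locations = [p.get("location", "unknown") for p in posts]

# Unstructured text needs extra processing, here a simple keyword search.
mentions_shipping = "shipping" in review.lower()

print(us_customers)       # ['Ana']
print(locations)          # ['unknown', 'Toronto']
print(mentions_shipping)  # True
```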
Scientists, health officials, politicians and tax collectors use big data daily. For instance, health officials can analyze public health data to see where disease outbreaks are occurring and determine which communities require help the most.
Big Data Storage Categories
Big data, and the storage it demands from individuals and companies, is commonly described using the “five Vs” – volume, velocity, variety, veracity and value:
- Volume – the amount of data to be stored
- Velocity – the speed at which data is generated, sent, processed or archived
- Variety – the range of data types and formats, such as structured, unstructured and semi-structured data
- Veracity – the accuracy of the data when it is collected and stored
- Value – who will access the data and what benefit it could provide an individual or company
Big Data Benefits
There are many benefits to using big data. Databases can be structured to share information, which can benefit individuals as well as communities. For instance, organizations like Amazon use big data to determine your shopping preferences and suggest products or services you may want to buy.
The data collected about you can also be tailored to your individual circumstances, such as your age, gender and country. A pharmacist, for example, can consult a database of all your prescriptions and determine whether a certain combination of drugs would be harmful to you.
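A pharmacist’s check can be pictured as comparing every pair of a patient’s prescriptions against a list of known harmful combinations. In this minimal Python sketch, the drug names and the `HARMFUL_PAIRS` table are hypothetical placeholders, not real clinical data; actual pharmacy systems rely on curated drug interaction databases.

```python
from itertools import combinations

# Hypothetical table of drug pairs assumed to interact badly.
# These names are placeholders, not real drugs or real interactions.
HARMFUL_PAIRS = {
    frozenset({"drug_a", "drug_c"}),
    frozenset({"drug_b", "drug_d"}),
}

def find_interactions(prescriptions):
    """Return every pair in a patient's prescriptions that appears in HARMFUL_PAIRS."""
    flagged = []
    # Check each unique pair of drugs exactly once.
    for first, second in combinations(sorted(set(prescriptions)), 2):
        if frozenset({first, second}) in HARMFUL_PAIRS:
            flagged.append((first, second))
    return flagged

patient_prescriptions = ["drug_a", "drug_b", "drug_c"]
print(find_interactions(patient_prescriptions))  # [('drug_a', 'drug_c')]
```

Using `frozenset` for each pair means the order in which drugs were prescribed does not matter.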
Big data can also be used to increase public safety. For instance, transportation officials can analyze vehicle crash statistics by time of day and accident location. Combining multiple databases could help a community pinpoint when and where the deadliest crashes occur so that proactive measures can be taken to prevent future accidents.
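The crash analysis described above amounts to joining records from two sources and totaling fatalities by place and time of day. The sketch below assumes two hypothetical datasets (crash records keyed by a location ID, and a separate table of location names) and hypothetical time-of-day bands. It is an illustration of the idea, not any agency’s actual method.

```python
from collections import Counter

# Hypothetical crash records from one database: (location_id, hour of day, fatalities).
crashes = [
    ("L1", 23, 1), ("L1", 22, 2), ("L2", 8, 0),
    ("L1", 23, 1), ("L3", 17, 1), ("L2", 9, 0),
]

# Hypothetical location names from a second database.
locations = {"L1": "Route 9 and Main St", "L2": "Oak Ave", "L3": "Harbor Bridge"}

def time_band(hour):
    """Group hours into broad parts of the day (assumed bands)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"

# Total fatalities for each (place, time band) combination,
# joining the two datasets on location_id.
fatalities = Counter()
for location_id, hour, deaths in crashes:
    fatalities[(locations[location_id], time_band(hour))] += deaths

# The deadliest combination is a candidate for proactive measures.
worst = fatalities.most_common(1)[0]
print(worst)  # (('Route 9 and Main St', 'night'), 4)
```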
Like Amazon, streaming platforms such as Netflix and Hulu collect user preferences and recommend what customers might like to watch next. This strategy improves each customer’s experience, which can lead to higher customer retention.
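Recommendation systems at companies like Netflix are far more sophisticated, but the basic idea can be illustrated with one simple technique: suggest titles that viewers with overlapping tastes have already watched. The viewing histories and the `recommend` function below are hypothetical, and overlap-weighted scoring is just one approach among many.

```python
from collections import Counter

# Hypothetical viewing histories: user -> set of titles watched.
histories = {
    "ana": {"Space Docs", "Ocean Life", "Chef Duel"},
    "ben": {"Space Docs", "Ocean Life"},
    "cam": {"Space Docs", "Mystery Manor"},
    "dee": {"Chef Duel", "Mystery Manor"},
}

def recommend(user, histories, top_n=2):
    """Suggest titles that viewers with overlapping tastes watched but `user` has not."""
    seen = histories[user]
    scores = Counter()
    for other, titles in histories.items():
        if other == user:
            continue
        overlap = len(seen & titles)  # number of titles the two viewers share
        if overlap == 0:
            continue  # no shared tastes, so skip this viewer
        for title in titles - seen:
            scores[title] += overlap  # weight by how similar the tastes are
    return [title for title, _ in scores.most_common(top_n)]

print(recommend("ben", histories))  # ['Chef Duel', 'Mystery Manor']
```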
Another benefit of big data is fraud prevention. If a customer’s credit card is suddenly used in an unfamiliar location, or in several distant locations within a short time, the credit card company can detect the likely unauthorized use and check with the cardholder.
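One simple rule a card issuer might apply is an “impossible travel” check: if two purchases are so far apart in distance and so close together in time that no traveler could make the trip, the later one is flagged. The sketch below assumes hypothetical transaction records with coordinates and timestamps, and an assumed maximum speed of 900 km/h (roughly airliner speed). Real fraud detection combines many more signals than this single rule.

```python
import math
from datetime import datetime

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_impossible_travel(transactions, max_speed_kmh=900):
    """Return IDs of transactions that could not be reached from the previous
    purchase without traveling faster than max_speed_kmh (an assumed threshold)."""
    flagged = []
    ordered = sorted(transactions, key=lambda t: t["time"])
    for prev, curr in zip(ordered, ordered[1:]):
        km = distance_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
        if km == 0:
            continue  # same place: no location-based concern
        hours = (curr["time"] - prev["time"]).total_seconds() / 3600
        speed = km / hours if hours > 0 else float("inf")
        if speed > max_speed_kmh:
            flagged.append(curr["id"])
    return flagged

# Hypothetical card activity: two purchases in New York, then one in London 30 minutes later.
transactions = [
    {"id": 1, "time": datetime(2024, 1, 5, 9, 0), "lat": 40.71, "lon": -74.01},
    {"id": 2, "time": datetime(2024, 1, 5, 12, 30), "lat": 40.71, "lon": -74.01},
    {"id": 3, "time": datetime(2024, 1, 5, 13, 0), "lat": 51.51, "lon": -0.13},
]
print(flag_impossible_travel(transactions))  # [3]
```

A flagged transaction would not be declined automatically in this sketch; it simply marks the purchase for the kind of follow-up with the cardholder described above.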
Drawbacks to Big Data
Big data also has drawbacks. It can be time-consuming to collect, and building the databases to store it can be equally time-consuming.
Storage is another limitation. Computers and servers have finite capacity, and even cloud-based resources have limits on how much data they can hold.
In addition, data needs to be collected and stored consistently. Databases must stay interconnected so they can work together, and the data stored in them must be documented transparently.
Access to the data is another concern. Organizations must consider the security measures that protect their databases from hackers, insider theft and malicious software such as spyware.
There are ethical concerns as well. If someone is a smoker, for instance, data could be used to predict that person’s life expectancy, hospitalizations and risk of smoking-related illnesses.
An insurance company could then make business decisions based on that information. But what if that person decides to quit smoking? Should the former smoker’s insurance still be affected?
In the tech world, there is a saying: “Garbage in, garbage out.” In other words, algorithms and predictions built on data can only be as accurate as the data they use. Biased data, for example, can lead businesses to draw misleading conclusions and make the wrong business decisions.
The value of big data also depends on personal interpretation and cultural norms. Screening data selectively makes it easier to comprehend, but vital data could be overlooked if the person screening it doesn’t consider it useful. For this reason, ethics, risk assessments and quality evaluations should guide the methods used to select and screen data.
Ethical Considerations for the Future
As the amount of readily available data exponentially increases, we will need more guidelines to ensure data is used for beneficial and ethical purposes. Companies that regularly collect big data about their customers will need to be more transparent and tell customers what information is being collected about them. Similarly, the citizens of a community should also be informed of what information is gathered about them and how it will be used.