By Dr. Kandis Y. Wyatt, PMP
Faculty Member, Transportation and Logistics
What do data scientists do? According to Masters in Data Science, “Data scientists are big data wranglers, gathering and analyzing large sets of structured and unstructured data. A data scientist’s role combines computer science, statistics, and mathematics.”
Masters in Data Science also notes that data scientists “analyze, process, and model big data then interpret the results to create actionable plans for companies and other organizations. Data scientists are analytical experts who utilize their skills in both technology and social science to find trends and manage data. They use industry knowledge, and contextual understanding to uncover solutions to business challenges.”
The Type of Work Data Scientists Perform
A data scientist’s work typically involves making sense of messy, unstructured big data. This data is often gathered from sources such as smartphones, social media feeds, tablets, smart appliances, and emails.
A data scientist also analyzes big data, distinguishes valid data from invalid data, and extracts important insights from it. The data scientist can also use algorithms (i.e., step-by-step procedures, often built on mathematical formulas) to analyze the data and make predictions about the future. For instance, data scientists can analyze big data to identify trends and improve mobile app and website user experiences.
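As a rough illustration of the two tasks described above, separating valid from invalid records and then projecting a trend, here is a minimal Python sketch. The records, the `session_minutes` field, and the validity rule are all hypothetical and chosen only for illustration. A real project would define validity from its own data and would likely use a more robust model than a straight line.

```python
from statistics import mean

def is_valid(record):
    """Hypothetical rule: session length must be a non-negative number."""
    value = record.get("session_minutes")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value >= 0

def fit_trend(points):
    """Fit a least-squares line y = slope * x + intercept to (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up weekly app usage records, including two unusable entries.
records = [
    {"week": 1, "session_minutes": 10.0},
    {"week": 2, "session_minutes": 12.0},
    {"week": 3, "session_minutes": None},   # missing value
    {"week": 4, "session_minutes": 16.0},
    {"week": 5, "session_minutes": 18.0},
    {"week": 6, "session_minutes": -5.0},   # impossible value
]

# Step 1: keep only the valid records.
valid_points = [(r["week"], r["session_minutes"]) for r in records if is_valid(r)]

# Step 2: fit a trend line and use it to predict a future week.
slope, intercept = fit_trend(valid_points)
predicted_week_7 = slope * 7 + intercept

print(f"Valid records: {len(valid_points)} of {len(records)}")
print(f"Trend: {slope:.2f} minutes per week")
print(f"Predicted session length in week 7: {predicted_week_7:.1f} minutes")
```

The validation step runs before the trend fit on purpose: if the missing and negative values were left in, the fitted line would either fail or point in a misleading direction.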
As the volume of big data grows exponentially, technical experts are needed to parse, analyze, and store it. These individuals should also be able to extract key data and apply it to organizational principles, strategic planning, and the company’s mission and vision.
What Education and Skills Do Data Scientists Need?
Most of the early data scientists came to the position by happenstance: a company had a need, and they filled it without formal training. However, the growth of big data and the recognition of its organizational value have driven the need for standardized big data processing. These changes have led to the creation of formal data scientist positions within organizations, as well as the need for uniform instruction and formal education to prepare the next generation of data scientists.
As more colleges and universities offer degrees in data science, the curriculum typically includes a mix of math, statistics, computer programming, and business analytics. In addition to undergraduate, graduate, and doctoral degrees, there are also certifications in data science for individuals who already have a degree in a related field.
In addition to formal education, data scientists need both hard and soft skills. For instance, a data scientist must have hard skills such as statistical analysis, quantitative reasoning, computer programming, and data visualization. The soft skills a data scientist needs include oral and written communication and presentation skills.
There Is a Need for Diversity among Data Scientists
One of the biggest challenges for data scientists is making sure the data is accurate and free of inherent biases. For example, most of the earliest data scientists were male. When engineers were working to determine initial airbag specifications for vehicle manufacturers, they applied data for the average height and weight of a male in the driver’s seat and passenger seat.
As a result, early airbag deployment data indicated that the level of protection was lower when a woman was in the driver’s seat or a child was in the passenger seat. In some cases, women and children were injured when the airbags deployed.
This problem was caused by an inherent bias of the algorithm based on the data scientists’ preconceived notions of the “standard” height and weight of a driver and passenger. To avoid such cases of inherent bias, it’s important to have more diversity among data scientists.
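One simple safeguard against this kind of bias is to check whether the data used for a design reflects the people who will actually be affected by it. The Python sketch below compares the share of each group in a sample with the share expected in the target population and flags groups that fall short. The group names, the reference shares, and the 10-percentage-point tolerance are hypothetical values used only for illustration; they are not actual airbag or population figures.

```python
from collections import Counter

def representation_gaps(sample_groups, reference_shares, tolerance=0.10):
    """Return groups whose share of the sample falls short of the expected
    share by more than `tolerance` (expressed as a fraction, e.g. 0.10)."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if expected - observed > tolerance:
            gaps[group] = (observed, expected)
    return gaps

# Hypothetical design sample: nine adult men and one adult woman, no children.
sample = ["adult_male"] * 9 + ["adult_female"] * 1

# Hypothetical shares of vehicle occupants in the target population.
reference = {"adult_male": 0.4, "adult_female": 0.4, "child": 0.2}

gaps = representation_gaps(sample, reference)
for group, (observed, expected) in gaps.items():
    print(f"{group}: {observed:.0%} of sample, expected about {expected:.0%}")
```

A check like this does not remove bias on its own, but it makes a gap visible early, before a model or specification is built on data that leaves out part of the population.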
Data Scientists Benefit Society
Ultimately, the insights that data scientists uncover can benefit society. From health to education to even the Olympics, big data and data scientists can provide organizations with the information they need to make data-driven decisions.