Filling the data scientist gap, part 3: Challenges and solutions for the road ahead

By Seth DeLand

Product Marketing Manager

MathWorks

May 01, 2018




Enabling domain experts to perform data science has obvious benefits for the business, but it does not happen overnight. As organizations begin to put data analytics tools in the hands of their domain experts, challenges can arise, including showing the value of data analytics to those who are skeptical. Being ready to address these challenges will keep projects moving forward and keep the critics at bay.

Learning curve for new technology

Challenge: Innovation in the data analytics space moves very fast, and each new piece of technology has its own learning curve. In many cases, the original technology is developed by computer scientists for an audience that also has very strong programming skills. These software packages are implemented in many different programming languages, so the learning curve is very steep for those who do not write code full-time.

Solution: Engineers with domain knowledge should look for tools that let them get up and running quickly, preferably within computational platforms they already know. Point-and-click apps like those found in MATLAB can serve as an easy starting point for these engineers. Beyond that, a programmatic interface is typically required to fine-tune analytics for robustness and accuracy. If businesses are serious about data analytics, they should also look for training courses that help engineers ramp up much faster than they would through trial and error.

The large amount of cutting-edge research in the data analytics space creates a wave of potentially disruptive new technologies. In the wake of that wave, successful tools emerge that are general enough for engineers with domain knowledge to use.

Engineer or data scientist: Who does what?

Challenge: Organizations are trying to determine who the right team is to do this work. Data scientists often have strong backgrounds in machine learning but may be unfamiliar with the ins and outs of the business and its products. Engineering and science groups know the business and its products but may not be experienced with machine learning.

Solution: A common compromise is to pair engineers who have domain knowledge with data scientists to leverage each group's strengths, but this is often not possible because there are far more domain experts than data scientists. Another solution is to adopt tools that simultaneously lower the bar for machine learning (for the domain experts) and provide flexibility and extensibility (for the data scientists). In practice, this means adopting a tool that offers both a graphical interface (i.e., apps) and a programming language.

Even as data science groups grow within organizations, the data science work will continue to be done by both engineers with domain knowledge and data scientists. Both will play an important role in the successful adoption of data analytics by the business, so creating an environment where they can collaborate is key.

Where does an analytic end up?

Challenge: A successfully developed analytic or machine learning model has limited value to the business if it cannot be integrated with the business’s systems, products, and services. This could mean integrating the analytic with servers maintained by the IT organization, or deploying the analytic to embedded devices (such as edge nodes in an Internet of Things system).

Traditionally, the analytic is developed in a tool that is suitable for research and development but not for running the analytic in production, so the analytic must be recoded in a different programming language before it can be deployed. This recoding typically takes several weeks to months and can introduce bugs.

Solution: Platforms for developing analytics offer ways to package the algorithm to run in different production environments. Look for a tool that provides integration paths and application servers for use with common IT systems and that also targets embedded devices. For example, MATLAB provides deployment paths for integrating analytics with programming languages commonly used in IT systems (e.g., Java and .NET), as well as for converting analytics to standalone C code that can run on embedded devices. Both of these deployment options are accessed through point-and-click interfaces, making them appealing to engineers with domain knowledge. By automating the conversion of the analytic for production systems, these tools significantly reduce the time needed for design iterations.

Technologies that enable domain experts to apply machine learning and other data analytics techniques to their work are here to stay. They provide exciting opportunities for engineering teams to innovate in both their design workflows and the products they create. The shortage of data scientists does not appear likely to be resolved anytime soon, and domain experts will play a crucial role in filling this gap. Their knowledge of the business and the products it produces positions them well to find innovative ways to apply data analytics technologies.

Seth DeLand is application manager at MathWorks for data analytics. Before that, he was product manager for optimization products. Prior to MathWorks, Seth earned his BS and MS in mechanical engineering from Michigan Technological University.

The MathWorks, Inc.

www.mathworks.com

@MATLAB

LinkedIn: www.linkedin.com/company/the-mathworks_2

Facebook: www.facebook.com/MATLAB



