What is a Microsoft Data Scientist?

One of the hottest IT job titles being bandied around at the moment is Data Scientist, a relatively new role that spans a wide range of technical abilities. A nice succinct definition of a Data Scientist is "a person who analyzes digital data to assist in making business decisions". The role has evolved from that of the Data Analyst or Business Intelligence (BI) Analyst through the application of technologies such as Machine Learning, Big Data and statistics. But what does it take to become a Microsoft Data Scientist (MDS)? This post looks at the technologies and tools that define the MDS role and gives you links to a couple of free courses to get you started as an MDS.

Core fundamentals for an MDS include querying relational data with SQL, analyzing and presenting data using Excel and/or Power BI (specializing in at least one), and a basic understanding of statistics. If you took statistics in college, that is the level we are talking about.
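As a small illustration of the SQL and statistics fundamentals, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical `sales` table with made-up data held in an in-memory SQLite database (not SQL Server or any Microsoft product); it queries the relational data with a `GROUP BY` and then computes college-level descriptive statistics on the same column.

```python
import sqlite3
import statistics

# Build a small in-memory relational table (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("North", 80.0), ("South", 200.0),
     ("South", 150.0), ("South", 100.0)],
)

# Query the relational data with SQL: total and average sales per region.
rows = conn.execute(
    "SELECT region, SUM(amount), AVG(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
for region, total, avg in rows:
    print(f"{region}: total={total:.1f}, average={avg:.1f}")

# Basic descriptive statistics on the same column.
amounts = [amount for (amount,) in conn.execute("SELECT amount FROM sales")]
print(f"mean={statistics.mean(amounts):.1f}, "
      f"sample stdev={statistics.stdev(amounts):.1f}")
conn.close()
```

The same aggregate query would look much the same in T-SQL against SQL Server; SQLite is used here only so the sketch runs without any setup.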

Core Data Science skills include the ability to take that understanding of statistics and data and apply it in a statistical scripting language to analyze the data. The two most popular languages for this purpose are R and Python, and the MDS should be proficient in at least one of them. A basic understanding of Machine Learning (ML) is also a core skill. ML is, broadly, the capability of computers to learn without being explicitly programmed; its main uses are predictive analysis, identification of outliers (such as threats or malfunctions) and pattern recognition in speech, text and faces. Microsoft's tooling for ML is Azure ML Studio, which allows you to build ML experiments with supporting datasets, integrate with R, Python and SQL, and turn your ML models into web services.
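To give a feel for what "manipulating data in a scripting language" looks like for one of the uses mentioned above, outlier identification, here is a minimal Python sketch. Note that it uses a simple statistical z-score rule rather than a trained ML model, and it does not use Azure ML Studio; the `find_outliers` function, the threshold of 2.0 and the sensor readings are all hypothetical choices for illustration.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score (distance from the mean, measured
    in population standard deviations) exceeds the given threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical sensor readings; the 95.0 spike might indicate a malfunction.
readings = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 95.0, 20.3]
print(find_outliers(readings))
```

A fixed z-score cut-off is only a baseline: it assumes roughly bell-shaped data and is itself skewed by extreme values, which is one reason ML-based anomaly detection is used on larger, messier datasets.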

Applying your Data Science skills requires proficiency in R or Python. It also requires the ability to build complex experiments in Azure ML Studio, which gives you a process design surface for integrating data, the scripting languages mentioned earlier and many different statistical algorithms to create sophisticated ML models. Azure ML also provides the elastic scaling that ML models require. Other areas of specialization in applying Data Science with Microsoft tooling are building predictive models with Spark on Azure HDInsight and developing intelligent applications, including use of the Microsoft Cognitive Services APIs.

If you want to get started with Microsoft's Data Science tooling, here are a couple of free edX courses. A complete Microsoft Certified Data Scientist curriculum is planned for the future.

DAT204x: Introduction to R for Data Science

DAT203.1x: Data Science Essentials
