Global AI and Data Science


Who does what on a Data Science project?

By AYRTON DIDHIER MONDRAGON MEJIA posted Thu June 18, 2020 09:46 AM

  

My name is @AYRTON DIDHIER MONDRAGON MEJIA. I am a test engineer for enterprise Power Systems servers, and I’m currently working on some projects related to the Data Science field. I started learning about Data Science almost three years ago; my first projects were school assignments. Two years ago, I received an invitation to work as a volunteer with the DDP team, where we started learning new technologies like Cassandra, MongoDB, Scala and Django, all with the aim of building the architecture for the DDP project that we described in the first entry of this blog. For me, the most relevant part of the project’s initial learning phase was that all of us started to work with real data as Data Analysts, one of the different Data Science roles that we are going to talk about in this post.

Every Data Science project has different roles to play. It is as if you and your team were trying to organize a barbecue party. Probably all of you know how to cook the meat or where to buy it, but not all of you are experts. There is somebody who is the expert at cooking it, and somebody else who knows the best place to buy it. The same happens on a Data Science project: probably all the team members know how to do every step in the process, but not all of them are experts in each part of it.

The different Data Science roles that you can find in almost every project are:

  • Data Engineer

  • Data Analyst

  • Data Scientist

  • Data Visualization Expert

  • Business Intelligence, Analytics & Reporting

  • Database Administrator/Database Architect

  • Data Science Platforms & Tools Developer

  • Modeling Analyst

  • Machine Learning Engineer

In this post, I’ll walk you through the best-known ones. The first role I’m going to talk about is the Data Analyst. Statistics and math are not must-have skills for this role. A Data Analyst is very good at retrieving and handling structured data and at creating visualizations, and moderate programming and computer science skills are useful. The main responsibility of this role is to find trends in the data and present what has been found. A Data Analyst is very good with SQL and other structured data tools, and also with some visualization dashboards.
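To make the Data Analyst's "find trends with SQL" task a bit more concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. The `sales` table, its columns, and the sample numbers are all made up for illustration; they are not data from the DDP project.

```python
import sqlite3

# Hypothetical sample data: (month, amount) pairs invented for this sketch.
SAMPLE_SALES = [
    ("2020-01", 120.0), ("2020-01", 80.0),
    ("2020-02", 150.0), ("2020-02", 90.0),
    ("2020-03", 300.0),
]


def monthly_sales_trend(rows):
    """Load (month, amount) rows into an in-memory table and total them per month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # A typical analyst query: aggregate, then order by time to expose the trend.
        cursor = conn.execute(
            "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for month, total in monthly_sales_trend(SAMPLE_SALES):
        print(month, total)
```

In a real project, the same kind of query would usually run against an existing database, and its result would feed a dashboard or chart rather than a `print` call.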

The second role I am going to describe is the Data Engineer. A Data Engineer has the knowledge to build and maintain the infrastructure needed for big data databases, and has strong knowledge of distributed systems like Hadoop and Spark. The main focus of this role is to develop, deploy, manage and optimize data pipelines that transform and transfer data so the Data Scientist can query it. A Data Engineer has very good programming skills and knowledge of NoSQL databases. To better explain what a Data Engineer does, I found this analogy: imagine you are trying to make a carrot cake. To do that, you need carrots. The Data Engineer is the one responsible for harvesting, cleaning and storing the carrots, so they are ready to use in your cake.
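The carrot analogy maps quite naturally onto a small extract, clean and store pipeline. The sketch below is only an illustration under simple assumptions: the raw data is a short CSV string, the function names `harvest`, `clean` and `store` are hypothetical and borrowed from the analogy, and an in-memory SQLite database stands in for the distributed storage a real Data Engineer would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: some rows are deliberately dirty.
RAW_CSV = """id,weight_grams
1, 61.5
2,
3,abc
4,72
"""


def harvest(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Transform: keep rows whose weight parses as a number, drop the rest."""
    cleaned = []
    for row in rows:
        try:
            weight = float(row["weight_grams"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric weight
        cleaned.append((int(row["id"]), weight))
    return cleaned


def store(records, conn):
    """Load: write the cleaned records into a table ready for querying."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS carrots "
        "(id INTEGER PRIMARY KEY, weight_grams REAL)"
    )
    conn.executemany("INSERT INTO carrots VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run the three stages in order: harvest, clean, store."""
    store(clean(harvest(raw_text)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, connection)
    print(connection.execute("SELECT * FROM carrots ORDER BY id").fetchall())
```

Keeping each stage as its own function makes it easier to test the cleaning rules separately from the storage layer, which matters once the pipeline grows.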

Another important role is the Data Science Platforms & Tools Developer, which can also be found under the title Software Engineer. The person in this role is responsible for creating cloud solutions and jobs that run in Scala or Hadoop. The solutions created by this role help the Data Scientist and the Data Engineer integrate the flow of information. This person also creates packages and libraries that all the other roles can use to make their jobs easier.

Now, I’m going to talk about the Data Scientist role. It has become one of the most popular job roles these days, but what does a Data Scientist do? Well, a Data Scientist is the one who uses math and statistics, data manipulation, data preparation and data modeling, has experience with structured and unstructured databases, and is very good at visualization and at communicating the findings in the data. Since a Data Scientist knows the whole process, we might think this is the most important role on a Data Science project, but that is not true: the Data Scientist is just one piece of the entire world of Data Science.
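As a small taste of the "math, statistics and data modeling" side of the Data Scientist role, the sketch below fits a straight line to a handful of points with ordinary least squares and uses it to predict a new value. The function names and the sample numbers are invented for illustration; real projects would normally rely on a dedicated modeling library and far more data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical observations that happen to lie on y = 2x + 1.
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model, predict(model, 5))
```

The fitted line is only useful if the findings are then communicated clearly, which is where the visualization and storytelling skills come in.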

The last role I’m going to talk about is the Data Visualization Expert. The person in this role is very creative and is in charge of presenting the data in a pictorial way. This person is responsible for creating all the graphics showing the findings and predictions from the data, and uses those visualizations to communicate the results at a management or technical level as needed. A Data Visualization Expert is very good at working with dashboarding tools like Tableau and Cognos, and has very good storytelling skills.

I have given you a small sample of some of the best-known roles in a Data Science project. Each of those roles is a key part of every Data Science project and, as I mentioned before, projects are always made up of an interdisciplinary team with different responsibilities and skills. If you want to go deeper into each role, you can click on each role title to find very interesting and useful information about it.

What specific skills are needed for each role?

Now, I’m going to show you some other skills needed in the Data Science field. In the diagram below, you can see the skills you need and the roles they are related to.

Skills needed for Data Science Roles

In DDP, we started out working as Data Analysts, and right now we are working on acquiring the other skills needed to be a Data Scientist or Data Engineer. We are working with all the data we processed before, cleaning and processing it to create some predictive models and some useful visualizations to improve decision making.

So stay tuned, and don’t miss our next post, coming very soon.

Co-Authors
@SILVANA DE GYVES AVILA
@ISMAEL SOLIS MORENO
@GLORIA EVA ZAGAL DOMINGUEZ
@GONZALO SEBASTIAN AYALA MERCAPIDEZ

​​​​​