Why Data Science?
Data Science is an interdisciplinary field that combines computer science, math and statistics, and domain knowledge to derive insights from data. It sits at the intersection of data engineering and the scientific method, using scientific methods, processes, algorithms and systems to extract knowledge and insights from both structured and unstructured data.
The goal of Data Science is to transform data into knowledge that supports rational decisions, so that we can take actions that help us achieve our goals.
The Data Science Process
The general Data Science process works like this:
First, we find a question that we want to answer. It can be a hypothesis we want to test, a decision we want to make, or something we want to predict.
Second, we collect data for analysis. Sometimes this means designing an experiment to create new data; other times it means using data that already exists.
Third, we prepare the data for analysis, a process often referred to as data munging or data wrangling. We clean our data and transform it into a form suitable for analysis.
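As a rough illustration of data wrangling, here is a minimal sketch in plain Python. The raw records, field names and cleaning rules are hypothetical: it trims whitespace, normalizes inconsistent capitalization, converts text to numbers, and drops rows whose values are missing or unparseable. In practice, libraries such as pandas are commonly used for this step.

```python
# Hypothetical raw records, as they might arrive from a messy CSV export
raw_rows = [
    {"city": " Nairobi ", "temp_c": "24.5"},
    {"city": "nairobi", "temp_c": ""},      # missing value
    {"city": "Lagos", "temp_c": "31"},
    {"city": "LAGOS ", "temp_c": "n/a"},    # unparseable value
]

def parse_float(text):
    """Return text as a float, or None if it is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

def clean(rows):
    """Normalize city names, convert temperatures, and drop incomplete rows."""
    cleaned = []
    for row in rows:
        city = row["city"].strip().title()
        temp = parse_float(row["temp_c"].strip())
        if city and temp is not None:
            cleaned.append({"city": city, "temp_c": temp})
    return cleaned

print(clean(raw_rows))
```

Which rows to drop (rather than, say, fill with an average) is a judgment call that depends on the question being asked.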
Next, we create a model from our data, e.g. a numerical, visual, statistical or machine learning model. We then evaluate the model and finally deploy it into production, repeating this process for each problem in our backlog.
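Taking a statistical model as an example, the model-and-evaluate steps can be sketched in plain Python. This is a minimal illustration under stated assumptions: the dataset (hours studied vs. exam score) is made up, the model is a straight line fitted by ordinary least squares, and the evaluation metric is mean absolute error on held-out data. Real projects would typically use a library such as scikit-learn, and deployment is left out entirely.

```python
# Hypothetical data: (hours studied, exam score)
data = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70),
        (6, 73), (7, 79), (8, 82), (9, 88), (10, 91)]

# Hold out the last three points to evaluate the model on unseen data
train, test = data[:7], data[7:]

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_absolute_error(model, points):
    """Average absolute difference between predictions and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)

model = fit_line(train)
print("slope=%.2f intercept=%.2f" % model)
print("test MAE=%.2f" % mean_absolute_error(model, test))
```

Evaluating on points the model never saw during fitting gives a more honest estimate of how it will perform once deployed.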
General Skills in Data Science:
Data science requires statistics, computer science and math knowledge as general skills, which is no surprise. Analysis and machine learning are also at the heart of data scientist jobs. Getting insights from data is a primary function of data science, and machine learning, which is about building systems that learn patterns from data in order to make predictions, is in high demand.
Deep learning is a subset of machine learning, which in turn is a subset of AI. Deep learning is increasingly being used for machine learning tasks that other algorithms previously handled. I expect deep learning skills will be sought more explicitly in the future and that machine learning will become more synonymous with deep learning.
A brief look at the most common tech skills in Data Science:
Top-rated programming tools and software:
- Python is the most in-demand language. The popularity of this open-source language is widely recognized: it’s beginner friendly, with many support resources, and the vast majority of new data science tools are compatible with it. Python is the primary language for data scientists.
- R is not far behind Python. It once was the primary language for data science. The roots of this open source language are in statistics, and it’s still very popular with statisticians. Python or R is a must for virtually every data scientist position.
- SQL is also in high demand. SQL stands for Structured Query Language and is the primary way to interact with relational databases. SQL is sometimes overlooked in the data science world, but it’s a skill worth demonstrating mastery of if you’re planning to hit the job market.
- Excel – Excel is one of the most powerful and accessible tools for data analysis. It is arguably the most popular one, thanks in part to its built-in pivot tables. Another reason for its popularity is that you don’t need long periods of training before you can deliver simple data analysis in Excel.
- Hadoop and Spark – Up next are Hadoop and Spark, both open source tools from Apache for big data. Apache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to data sets.
- Tableau is next in demand. This analytics platform and visualization tool is powerful, easy to use, and growing in popularity. It has a free public version, but will cost you money if you want to keep your data private.
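To make the SQL item above a little more concrete, here is a minimal sketch using Python’s built-in sqlite3 module with an in-memory database. The sales table and its rows are hypothetical. The GROUP BY query produces a per-region summary similar to what a simple Excel pivot table would show.

```python
import sqlite3

# In-memory database with a hypothetical sales table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [
        ("North", "Laptop", 1200.0),
        ("North", "Phone", 800.0),
        ("South", "Laptop", 950.0),
        ("South", "Phone", 700.0),
        ("South", "Phone", 650.0),
    ],
)

# Number of sales and total amount per region, largest total first
query = """
    SELECT region, COUNT(*) AS n_sales, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, n_sales, total in rows:
    print(f"{region}: {n_sales} sales, total {total:.2f}")
conn.close()
```

The `?` placeholders let the database driver handle quoting, which is safer than building SQL strings by hand. The same query would run largely unchanged against other relational databases.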
In addition to these, learning tools such as RapidMiner can make it easier to get a job.
There is a huge demand for skilled professionals in Data Science. Companies are actively hiring for roles such as Data Scientist, Data Analyst, Big Data Engineer and Statistician. Not only are these roles handsomely paid, but a career in analytics also has much more to offer.
You need to regularly update your knowledge by reading content online and relevant books on trends in data science. Don’t be overwhelmed by the sheer amount of data flying around the internet; you have to know how to make sense of it all. Curiosity is one of the skills you need to succeed as a data scientist.
Essentially, you will be collaborating with your team members to develop use cases, so that you understand the business goals and the data required to solve problems. You will need to know the right approach to each use case, the data needed to solve the problem, and how to present the results in a way that everyone involved can easily understand.
I’m sure there are items I have missed, so if there’s a crucial skill or resource you think would be helpful to any data science hopefuls, feel free to share it in the comments below!