Intro to Data Science

In this post, we will talk about what data science is and who a data scientist is.

Data science is the art of deriving meaningful conclusions from available data. It is a collective term for a set of approaches and methods drawn from a variety of fields such as mathematics, statistics, computer science, and software development.

So why do companies need to apply these diverse methods? They do it to derive actionable intelligence from existing data, so that they can make better data-driven decisions that lead to profitability. These conclusions can help improve a company's business. Data science is also sometimes referred to as data-driven science.

Companies usually hold a mix of organized and unorganized data. This raw data needs to be cleaned and aligned before it becomes meaningful. When faced with a business or scientific problem, companies can then extract knowledge from this meaningful data. Data science is an emerging field marked by great uncertainty, rapid change and exciting opportunities.
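To make "cleaned and aligned" a little more concrete, here is a minimal sketch in plain Python. The records, the field names (company, pin_code, revenue) and the cleaning rules are all made up for illustration; real pipelines will differ and usually rely on dedicated libraries.

```python
# Hypothetical raw records; the field names and values are invented for this sketch.
raw_records = [
    {"company": "  Acme Corp ", "pin_code": "560001", "revenue": "1200"},
    {"company": "acme corp", "pin_code": "560001", "revenue": "1200"},  # duplicate of the first row
    {"company": "Globex", "pin_code": "", "revenue": "n/a"},            # missing values
    {"company": "Initech", "pin_code": "110001", "revenue": "850.5"},
]


def clean(records):
    """Normalize text, convert numbers, and drop incomplete or duplicate rows."""
    seen = set()
    cleaned = []
    for row in records:
        # Collapse stray whitespace and use one consistent casing for names.
        company = " ".join(row["company"].split()).title()
        pin_code = row["pin_code"].strip()
        try:
            revenue = float(row["revenue"])
        except ValueError:
            continue  # skip rows whose revenue is not a number
        if not pin_code:
            continue  # skip rows with no pin code
        key = (company, pin_code)
        if key in seen:
            continue  # skip rows already seen under the same company and pin code
        seen.add(key)
        cleaned.append({"company": company, "pin_code": pin_code, "revenue": revenue})
    return cleaned


cleaned_records = clean(raw_records)
for record in cleaned_records:
    print(record)
```

Which rows to drop and which to repair is a judgment call that depends on the business problem; this sketch simply discards anything incomplete.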

Hence data science can be defined as:

1. The science of studying business or scientific data.

2. An integration of statistics, computing technology, and domain knowledge, as shown in the Venn diagram below.

Source: Towards Data Science

There are two types of data:

Real data: Data that represents things in the real world, like a company name, address or pin code.

Virtual data: Data that does not represent things in the real world. A software virus, for example, is not an actual virus but malicious code developed by hackers.

We are in the midst of a deluge of both real data and virtual data.

Whether in the commercial world, social media, the internet, the sciences, healthcare or government, every sector is inundated with data.

Data science is a hot topic in almost every industry. Knowledge is found within all this information. Companies, governments and institutions are recognizing that their future success and survival are increasingly dependent on their ability to transform their data into information, insights and novel data products.

The industry is actively trying to recruit professionals to fill the resulting skills shortage.

Hence Harvard Business Review called data scientist the sexiest job of the 21st century.

Who is a data scientist?

A data scientist is a person who can leverage existing data sources to extract meaningful information and actionable insights. A data scientist must have solid computer programming skills and a solid understanding of mathematics, statistics and analytics methodologies. Usually, data science and "big data" go hand in hand. A data scientist must also have an excellent understanding of the functional domain in which they operate, and must be able to tell a good story with the data they have. A data scientist is someone "who is better at statistics than any software engineer and better at software engineering than any statistician."

Data scientists are in huge demand. Across job markets, the demand for data scientists far exceeds the supply. The sexiest job of the century requires a broad range of skills. So what skills are required to be a data scientist?

Skills required to become a data scientist

We can divide the skills required to become a data scientist into four major areas.

1. Math and statistics – Machine learning, statistical modeling, Bayesian inference, supervised learning (decision trees, random forests, logistic regression), unsupervised learning (clustering, dimensionality reduction), and optimization (gradient descent and its variants).

2. Programming and databases – Computer science fundamentals, a scripting language like Python, a statistical computing package like R, SQL and NoSQL databases, relational algebra, parallel databases and parallel query processing, MapReduce concepts, Hadoop and Hive/Pig.

3. Domain knowledge and soft skills – Passion for business, curiosity about data, the ability to influence without authority, a hacker mindset, problem solving, and being strategic, proactive, creative, innovative and collaborative.

4. Communication and visualization – The ability to engage with senior management, storytelling skills, translating data-driven insights into decisions and actions, visual art design, R packages like ggplot or lattice, and knowledge of visualization tools like Flare, D3.js or Tableau.

Applications of data science

Business Analytics and Prediction: Gathering information about how a company has performed over several decades can offer insight into the workings of the business and help build a model to gauge future performance.
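As a rough illustration of such a model, the sketch below fits a straight line to past yearly revenue with ordinary least squares and extends it one year ahead. The revenue figures are invented, and a straight-line trend is an assumption made only to keep the example short; real forecasting models are usually richer.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly revenue figures, in millions, for a hypothetical company.
years = [2019, 2020, 2021, 2022, 2023]
revenue = [10.0, 12.5, 13.5, 16.0, 18.0]

slope, intercept = fit_line(years, revenue)

# Extend the fitted trend one year past the data.
forecast_2024 = slope * 2024 + intercept
print(f"Estimated yearly growth: {slope:.2f}")
print(f"Forecast for 2024: {forecast_2024:.2f}")
```

The slope is the estimated average change in revenue per year, which is often as useful to a business as the forecast itself.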

Security: Information gathered from client logs is used to detect abnormal activity. Banks and other financial institutions use data mining and machine learning approaches to proactively stop fraudulent activity.
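The idea of flagging abnormal activity can be sketched with a simple statistical rule: mark any transaction whose amount lies far from the average. This is a deliberately simple stand-in for the data mining and machine learning approaches mentioned above, not a description of what banks actually run. The amounts and the threshold of 2 standard deviations are assumptions chosen for illustration.

```python
import statistics


def flag_outliers(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(amounts)
    stdev = statistics.stdev(amounts)  # sample standard deviation
    return [a for a in amounts if abs(a - mean) / stdev > threshold]


# Made-up transaction amounts taken from a hypothetical client log.
transactions = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 950.0, 43.5]

suspicious = flag_outliers(transactions)
print(suspicious)
```

One known weakness of this rule is that a large outlier inflates the standard deviation it is measured against, so on small samples a robust measure such as the median absolute deviation may be a better choice.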

Healthcare: Data science methods are used to better understand the patterns in which diseases like cancer occur, for example by analyzing information from X-ray images of affected individuals.

Government: Data science can be used by governments to reduce waste, combat cyber attacks, and protect sensitive data.

 
