Data Science Bookmarks

A collection of the links and references I have used for Data Science development and learning.

Basic Statistics - My Video references:

Statistics Jargon:

  • Outlier - an outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set.
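One common way to flag "distant" observations is the interquartile range (IQR) rule. The sketch below is only an illustration: the 1.5 multiplier, the function name `iqr_outliers`, and the sample readings are assumptions, not something taken from the references above.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Boolean mask keeps only the points beyond either fence.
    return data[(data < lower) | (data > upper)]

# Made-up measurements; 95 is far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)
```

Whether a flagged point should be dropped still depends on whether it reflects a measurement error or genuine variability.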

Pandas

  • Very useful for playing around with datasets/CSV files
  • Data frames and Series
  • Finding unique values
  • DataFrame.apply is a very useful function for performing row-wise or column-wise operations on a dataframe. Its other uses are worth checking too.
  • Pandas Dataframe apply usage
  • Data Cleaning is an important application of pandas
  • Matplotlib is intended to be a plotting library and pandas a data analysis library. However, in some cases their functionality overlaps.
  • Best Pandas functions
  • Understanding Described dataframe
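A small sketch of the pandas features listed above (unique values, apply, and describe). The dataframe and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data, not from any of the linked references.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "temp_c": [31.0, 38.5, 29.5, 33.0],
    "humidity": [60, 35, 65, 75],
})

# Unique values of a column, in order of first appearance.
cities = df["city"].unique()

# Column-wise apply (axis=0, the default): the function receives each column as a Series.
ranges = df[["temp_c", "humidity"]].apply(lambda col: col.max() - col.min())

# Row-wise apply (axis=1): the function receives each row as a Series.
df["temp_f"] = df.apply(lambda row: row["temp_c"] * 9 / 5 + 32, axis=1)

# describe() summarizes numeric columns: count, mean, std, min, quartiles, max.
summary = df.describe()

print(cities, ranges, summary, sep="\n\n")
```

For simple arithmetic like the Fahrenheit conversion, a vectorized expression such as `df["temp_c"] * 9 / 5 + 32` is usually faster than a row-wise apply; apply is shown here only to illustrate the axis argument.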

Numpy

  • np.random.rand(n) returns n random floats drawn uniformly from [0, 1). np.random.randint(n) returns a single random integer from 0 up to (but not including) n; pass size= to get an array.
  • Range_Arange_Diff
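The two NumPy notes above can be tried out with a short sketch. The seed value and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so repeated runs give the same numbers

floats = np.random.rand(3)               # 3 floats, uniform in [0, 1)
single_int = np.random.randint(10)       # one integer in [0, 10)
ints = np.random.randint(0, 10, size=5)  # 5 integers in [0, 10)

# range() is a lazy sequence of Python ints; np.arange() builds an ndarray.
py_range = range(0, 10, 2)
np_arange = np.arange(0, 10, 2)

# An ndarray supports element-wise arithmetic; a range object does not.
doubled = np_arange * 2

# np.arange accepts float steps, whereas range() raises TypeError for them.
quarter_steps = np.arange(0, 1, 0.25)

print(floats, single_int, ints, doubled, quarter_steps)
```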

Data Visualization

  • Datasets with nearly identical summary statistics can look very different when plotted. Francis Anscombe demonstrated this with four small datasets (Anscombe's quartet).
  • Dataviz
  • Data Storytelling
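To see Anscombe's point numerically, the sketch below computes the mean of x, the mean of y, and the x-y correlation for each of the four datasets. The values are the commonly published quartet; the dictionary layout and variable names are my own choices.

```python
import numpy as np

# Datasets I-III share the same x values; dataset IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

stats = {}
for name, (x, y) in quartet.items():
    x_arr = np.array(x, dtype=float)
    y_arr = np.array(y, dtype=float)
    corr = np.corrcoef(x_arr, y_arr)[0, 1]  # Pearson correlation of x and y
    stats[name] = (x_arr.mean(), y_arr.mean(), corr)
    print(f"{name}: mean x = {x_arr.mean():.2f}, mean y = {y_arr.mean():.2f}, r = {corr:.3f}")
```

The printed statistics come out nearly the same for all four datasets, yet a scatter plot of each one shows a clearly different shape, which is the argument for plotting data before trusting summaries.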

Pandas Long and Wide data format

  • Wide format is mostly used in pandas when operations are done on values in a column
  • Wide format is the data format for statistical modelling
  • .merge, .pivot, flatten, aggregate functions
  • Understanding Described dataframe
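A minimal sketch of moving between wide and long formats, plus the merge and aggregate steps mentioned above. The student scores and the `credits` table are hypothetical data made up for this example.

```python
import pandas as pd

# Wide format: one row per student, one column per subject.
wide = pd.DataFrame({
    "student": ["A", "B"],
    "math": [90, 75],
    "physics": [85, 80],
})

# Wide -> long: one row per (student, subject) pair.
long = wide.melt(id_vars="student", var_name="subject", value_name="score")

# Long -> wide again with pivot.
wide_again = long.pivot(index="student", columns="subject", values="score")

# "Flatten" the pivoted result back into ordinary columns.
flat = wide_again.reset_index()
flat.columns.name = None

# Aggregate on the long format: mean score per subject.
means = long.groupby("subject")["score"].mean()

# Merge extra per-subject information onto the long format.
credits = pd.DataFrame({"subject": ["math", "physics"], "credits": [4, 3]})
merged = long.merge(credits, on="subject")

print(long, flat, means, merged, sep="\n\n")
```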

Plotting

  • pyplot
  • [grouped-unstack-histogram](http://themrmax.github.io/2015/11/13/grouped-histograms-for-categorical-data-in-pandas.html)
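Going by the link title, a grouped histogram for categorical data can be built by counting with groupby and reshaping with unstack. The sketch below assumes that approach and uses made-up data; it may differ from what the linked post does. The plotting call is left as a comment so the snippet runs without a display.

```python
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({
    "species": ["cat", "cat", "dog", "dog", "dog", "cat"],
    "body_size": ["small", "large", "small", "small", "large", "small"],
})

# Count rows per (species, body_size) pair, then move body_size into columns.
counts = df.groupby(["species", "body_size"]).size().unstack(fill_value=0)
print(counts)

# With matplotlib installed, the following draws one group of bars per species:
# counts.plot(kind="bar")
```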

Machine learning

Linear Regression

Classification

Feature Selection

Gradient Descent

Clustering

Natural Language Processing

Written on March 28, 2017