1 Introduction
1.1 Prerequisites
1.2 Other books
1.3 Programming language
2 Foundations
2.1 Programming
2.1.1 Data types
2.1.2 Databases
2.1.3 Control structures
2.2 Linear algebra
2.2.1 Vector geometry
2.2.2 Matrix algebra
2.2.3 Calculus and eigenanalysis
2.3 Probability
2.3.1 Random variables
2.3.2 Joint distributions
2.3.3 Group-by
2.4 Statistics
2.4.1 Simulation studies
2.4.2 Sample mean distribution
2.4.3 Central limit theorem
2.4.4 Sample selection bias
2.4.5 Maximum likelihood
2.5 Algorithms
2.5.1 Measuring complexity
2.5.2 Paradigms
2.5.3 Practical computing
2.6 Optimization
2.6.1 Basic strategies
2.6.2 Iterative procedures
2.6.3 Linear and quadratic programming
2.6.4 Nelder-Mead
3 Linear models
3.1 Derivation and foundations
3.1.1 Derivation
3.1.2 Conditional expectation and predictions
3.1.3 Matrix algebra representation
3.1.4 Special models
3.2 Model selection and endogeneity
3.2.1 Maximum likelihood
3.2.2 Model selection criteria
3.2.3 Step-wise regression
3.2.4 A note on hypothesis testing
3.2.5 Endogeneity definition
3.2.6 Sources of endogeneity
3.3 Hypothesis testing
3.3.1 t-Testing
3.3.2 ANOVA
3.3.3 Heteroskedasticity
3.3.4 White correction
3.3.5 White test
3.3.6 Generalized least squares
3.3.7 Weighted least squares
3.4 Missing data
3.4.1 Missing at random
3.4.2 Dropping data points
3.4.3 Dropping variables
3.4.4 Imputation
3.4.5 Distributional modeling
3.4.6 EM algorithms
4 Limited Dependent Variables
4.1 Statistical theory
4.1.1 Linear probability and logistic models
4.1.2 Maximum likelihood estimation
4.1.3 Marginal effects
4.1.4 Latent variable interpretation and probit regression
4.1.5 Classification and decision theory
4.1.6 Generalized linear model
4.1.7 Poisson regression
4.1.8 Gamma regression
4.1.9 Multinomial logistic regression
4.1.10 Censored and truncated data
4.2 Coding guide
4.2.1 Linear probability model
4.2.2 Logistic regression
4.2.3 Margins and prediction
4.2.4 Probit regression
4.2.5 Poisson regression and others
4.2.6 Multinomial logit
4.2.7 Tobit
4.2.8 Pooled logit and conditional logit
4.2.9 Markov chains
5 Time-Dependent Data
5.1 Missing data
5.2 Statistical theory
5.2.1 Stationarity
5.2.2 ARMA models
5.2.3 Persistence and nonstationarity
5.2.4 Integrated processes
5.2.5 Spurious correlations
5.2.6 Cointegration
5.2.7 Newey-West corrected standard errors
5.2.8 VAR models
5.3 Coding guide
5.3.1 Preparing the data
5.3.2 Stationarity testing
5.3.3 Time series plots
5.3.4 Regression with time series
5.3.5 ARIMA models
5.3.6 VAR models
5.4 Simulation studies
5.4.1 Helper functions
5.4.2 Spurious regression
5.4.3 Stationary regression
5.4.4 KPSS simulation
5.4.5 ARIMA coefficients
5.4.6 ARIMA model selection
5.5 Panel Data
5.5.1 Structure
5.5.2 Pooled OLS
5.5.3 Fixed effects (within groups)
5.5.4 Between-group effect
5.5.5 Simpson’s paradox
5.5.6 Arellano correction
5.5.7 Large-T panels
6 Multivariate Analysis
6.1 Statistical theory
6.1.1 Rotation interpretation
6.1.2 Eigenanalysis
6.1.3 Interpreting principal components
6.1.4 Choosing correct number of PCs
6.1.5 Common factor models
6.1.6 Factor-augmented regression
6.1.7 Common factor identification
6.1.8 Principal component regression
6.2 Coding guide
6.2.1 PCA estimation
6.2.2 Eigenvalues and screeplot
6.2.3 Eigenvalue ratio test
6.2.4 Common factor models
6.2.5 Time series factor models
6.2.6 Testing common factors
6.2.7 Panel data factor models
7 Machine Learning
7.1 Tree-based models
7.1.1 Binary splitting
7.1.2 Recursive binary splitting
7.1.3 Tree pruning
7.1.4 Regression trees
7.1.5 Classification trees
7.1.6 Comparison with linear models
7.1.7 GLM trees
7.2 Coding guide
7.2.1 Titanic data desparse
7.2.2 Ctree procedure
7.2.3 Trees for continuous variables
7.2.4 GLM trees
7.3 Clustering
7.3.1 Supervised vs unsupervised
7.3.2 Clustering goal
7.3.3 k-means algorithm
7.3.4 k-means clustering
7.3.5 Within-group sum of squares
7.3.6 Eigenvalue ratio based estimators
7.3.7 Hierarchical clustering algorithm
7.3.8 Algorithmic complexity
7.3.9 Dendrogram
7.3.10 Applying clustering
References
Applied Computational Statistics