# Econometrics Tools

## OLS

To estimate an OLS regression, pass the `reg()` function at least three
arguments:

- The DataFrame that contains the data.
- The name of the dependent variable as a string.
- The name(s) of the independent variable(s) as a string (for one variable) or as a list of strings.

Following these arguments, there are a number of keyword arguments for various
other options. For example, the following code estimates a basic wage
regression with state-level clustering and fixed effects, weighting by the
variable `sample_wt`.

```
import pandas as pd
import econtools.metrics as mt

# Load a data file with columns 'ln_wage', 'educ', 'age', 'male',
# 'state', and 'sample_wt'
df = pd.read_csv('my_data.csv')

y = 'ln_wage'
X = ['educ', 'age', 'male']
fe_var = 'state'
cluster_var = 'state'
weights_var = 'sample_wt'

results = mt.reg(
    df,                   # DataFrame
    y,                    # Dependent var (string)
    X,                    # Independent var(s) (string or list of strings)
    fe_name=fe_var,       # Fixed-effects/absorb var (string)
    cluster=cluster_var,  # Cluster var (string)
    awt_name=weights_var  # Sample weights
)
```

Note that `reg()` does *not* automatically estimate a constant term. In order
to have a constant/intercept in your model, you can (a) add a column of ones
to your DataFrame, or (b) use the `addcons` keyword arg:

```
results = mt.reg(
    df,
    y,
    X,            # Does not include a constant/intercept
    addcons=True  # Adds a constant term
)
```
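Option (a), adding the column of ones by hand, can be sketched with plain
pandas. The column names here are hypothetical, and `'cons'` is just an
arbitrary name for the added column:

```python
import pandas as pd

# Toy DataFrame with hypothetical columns
df = pd.DataFrame({'ln_wage': [1.0, 1.5, 2.0], 'educ': [12, 14, 16]})

# Manually add a column of ones to serve as the intercept,
# then include it in the regressor list instead of passing addcons=True
df['cons'] = 1
X = ['educ', 'cons']
```

Both approaches yield the same fit; `addcons=True` simply saves the manual step.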

## Instrumental Variables

Estimating an instrumental variables model is very similar, but is done using
the `ivreg()` function. The order of arguments is also slightly different in
order to differentiate between the instruments, endogenous regressors, and
exogenous regressors. Other keyword options, such as `addcons`, `cluster`, and
so forth, are exactly the same as with `reg()`.

One additional keyword argument is `method`, which sets the IV method used to
estimate the model. Currently supported values are `'2sls'` (the default) and
`'liml'`.

```
# <Imports and loading data>
y = 'wage'           # Dependent var
X = ['educ']         # Endogenous regressor(s)
Z = ['treatment']    # Instrumental variable(s)
W = ['age', 'male']  # Exogenous regressor(s)
results = mt.ivreg(df, y, X, Z, W)
```
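The default `'2sls'` method corresponds to the standard two-stage least
squares formula. As an illustration of the estimator itself (simulated data,
not `ivreg()`'s internals), it can be computed by hand with NumPy:

```python
import numpy as np

# Simulate data where x is endogenous (correlated with the error u)
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                      # excluded instrument
w = rng.normal(size=n)                      # exogenous regressor
u = rng.normal(size=n)                      # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * x + 0.5 * w + u             # true betas: 2.0, 0.5, 1.0

X = np.column_stack([x, w, np.ones(n)])     # [endogenous, exogenous, const]
Z = np.column_stack([z, w, np.ones(n)])     # [instrument, exogenous, const]

# 2SLS: beta = (X' P_Z X)^{-1} X' P_Z y, with P_Z the projection onto Z
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
```

With a reasonably strong instrument, `beta` recovers the structural
coefficients, while plain OLS of `y` on `X` would be biased by the
correlation between `x` and `u`.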

## Returned Results

The regression functions `reg()` and `ivreg()` return a custom `Results`
object that contains the beta estimates, the variance-covariance matrix, and
other relevant info.

The easiest way to see regression results is the `summary` attribute, but
direct access to the estimates is also possible.

```
import pandas as pd
import econtools.metrics as mt
df = pd.read_stata('some_data.dta')
results = mt.reg(df, 'ln_wage', ['educ', 'age'], addcons=True)
# Print a nice summary of the regression results (a string)
print(results)
# Print DataFrame w/ betas, se's, t-stats, etc.
print(results.summary)
# Print only betas
print(results.beta)
# Print std. err. for `educ` coefficient
print(results.se['educ'])
# Print full variance-covariance matrix
print(results.vce)
```

The full list of attributes is given in the API documentation for the
`Results` object.

### F tests

`econtools.metrics` contains two functions for conducting F tests.

The first, `Ftest()`, is for simple, Stata-like tests for joint significance
or equality. It is a method on the `Results` object.

```
results = mt.reg(df, 'ln_wage', ['educ', 'age'], addcons=True)
# Test for joint significance
F1, pF1 = results.Ftest(['educ', 'age'])
# Test for equality
F2, pF2 = results.Ftest(['educ', 'age'], equal=True)
```

The second, `f_test()`, is for F tests of arbitrary linear combinations of
coefficients. The tests are defined by an `R` matrix and an `r` vector such
that the null hypothesis is \(R\beta = r\).
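As an illustration of the statistic being computed (not of `f_test()`'s exact
interface), the Wald-type F statistic for \(R\beta = r\) can be written with
NumPy. Here a toy OLS fit tests the single restriction \(\beta_1 = \beta_2\):

```python
import numpy as np

# Toy OLS fit of y on [x1, x2, const], with equal true coefficients
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, np.ones(n)])
y = 1.0 * x1 + 1.0 * x2 + 0.5 + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])
V = s2 * XtX_inv  # homoskedastic vcov estimate

# Null hypothesis R beta = r: here beta_1 - beta_2 = 0 (one restriction)
R = np.array([[1.0, -1.0, 0.0]])
r = np.array([0.0])

# F = (R b - r)' [R V R']^{-1} (R b - r) / q, q = number of restrictions
diff = R @ beta - r
F = (diff @ np.linalg.solve(R @ V @ R.T, diff)) / R.shape[0]
```

Under the null, `F` follows an F distribution with `q` and `n - k` degrees of
freedom.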

## Other Estimation Options

### Save memory by not computing predicted values

The `save_mem` flag can be used to reduce the memory footprint of the
`Results` object by not saving predicted values for the dependent variable
(`yhat`) and the residuals (`resid`), as well as the sample flag (`sample`).
Since these vectors are always size N (or bigger for `sample`), setting
`save_mem=True` can be very useful when running many regressions on large
samples.

### Check for collinear columns

The `check_colinear` flag can be used to check whether the list of regressors
contains any collinear variables. More technically, when `check_colinear` is
`True`, the regression function checks whether the regressor matrix X is full
rank. If it is not full rank, it figures out which columns are collinear and
prints the names of those columns to screen. It *does not* automatically drop
collinear columns.

Because these checks can be computationally expensive, `check_colinear`
defaults to `False`.
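The underlying full-rank check can be illustrated with NumPy. In this
hypothetical example, the third regressor is an exact linear combination of
the first two, so the regressor matrix is rank deficient:

```python
import numpy as np

# Hypothetical regressors where one column is a linear combination of others
rng = np.random.default_rng(2)
educ = rng.normal(size=100)
exper = rng.normal(size=100)
total = educ + exper  # exact linear combination of 'educ' and 'exper'

X = np.column_stack([educ, exper, total])

# A regression requires X to be full (column) rank; here it is not
rank = np.linalg.matrix_rank(X)
is_full_rank = rank == X.shape[1]
```

When the matrix is not full rank, as here, (X'X) is singular and the OLS
coefficients are not uniquely identified, which is why the flag reports the
offending columns rather than silently fitting.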

## Spatial HAC (Conley errors)

Spatial HAC standard errors (as in Conley (1999), Kelejian and Prucha (2007),
etc.) can be calculated by passing a dictionary with the relevant fields to
the `shac` keyword:

```
shac_params = {
    'x': 'longitude',  # Column in `df`
    'y': 'latitude',   # Column in `df`
    'kern': 'unif',    # Kernel name
    'band': 2,         # Kernel bandwidth
}
df = pd.read_stata('reg_data.dta')
results = mt.reg(df, 'lnp', ['sqft', 'rooms'],
                 fe_name='state',
                 shac=shac_params)
```

**Important:** The `band` parameter is assumed to be in the same units as `x`
and `y`. If `x` and `y` are degrees latitude/longitude, `band` should also be
in degrees. `econtools` does not do any advanced geographic distance
calculations here, just simple Euclidean distance.
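As a sketch of what a uniform kernel with a Euclidean bandwidth means (an
illustration only, not `econtools`' internal computation; whether the cutoff
is strict or inclusive is an assumption here), pairwise weights for three
hypothetical coordinate points can be computed like this:

```python
import numpy as np

# Hypothetical coordinates, both in degrees
x = np.array([0.0, 1.0, 3.0])  # longitude
y = np.array([0.0, 1.0, 0.0])  # latitude
band = 2.0                     # bandwidth, in the same units (degrees)

# Simple Euclidean distance between every pair of observations
dx = x[:, None] - x[None, :]
dy = y[:, None] - y[None, :]
dist = np.sqrt(dx ** 2 + dy ** 2)

# Uniform kernel: pairs within the bandwidth get weight 1, others 0
kernel = (dist < band).astype(float)
```

Here the first two points are about 1.41 degrees apart, so their covariance
term is retained, while the third point is more than 2 degrees from both and
contributes no cross terms.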