This is my personal notebook ^_^

pyviz notes

Solution for “JavaScript output is disabled in JupyterLab”:

  • conda install nodejs
  • jupyter labextension install jupyterlab_bokeh

NetCDF notes

  • Extract variable “SST” from in.nc $ ncks -v SST in.nc out.nc
  • Delete variable “lev” from in.nc $ ncks -C -O -x -v lev in.nc out.nc
  • Delete dimension “lev” from in.nc $ ncwa -a lev in.nc out.nc
  • Repack out.nc (e.g., after averaging out the level dimension with “ncwa”) $ ncpdq in.nc out.nc
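
These files can also be checked quickly from MATLAB (used later in these notes); a minimal sketch, assuming out.nc from the commands above contains a variable named SST:

% Inspect and read a NetCDF file from MATLAB (sketch; assumes out.nc and SST exist)
ncdisp('out.nc');              % print dimensions, variables, and attributes
sst = ncread('out.nc','SST');  % read the SST variable into a numeric array
size(sst)                      % check its dimensions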

Nonlinear Autoregressive Network in Matlab

% Solve an Autoregression Time-Series Problem with a NAR Neural Network
% Script generated by Neural Time Series app
% Created Mon Nov 13 17:19:59 EST 2017
%
% This script assumes this variable is defined:
%
%   simplenarTargets - feedback time series.

T = simplenarTargets;

% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. NTSTOOL falls back to this in low memory situations.
trainFcn = 'trainlm';  % Levenberg-Marquardt

% Create a Nonlinear Autoregressive Network
feedbackDelays = 1:2;
hiddenLayerSize = 10;
net = narnet(feedbackDelays,hiddenLayerSize,'open',trainFcn);

% Choose Feedback Pre/Post-Processing Functions
% Settings for feedback input are automatically applied to feedback output
% For a list of all processing functions type: help nnprocess
net.input.processFcns = {'removeconstantrows','mapminmax'};

% Prepare the Data for Training and Simulation
% The function PREPARETS prepares timeseries data for a particular network,
% shifting time by the minimum amount to fill input states and layer states.
% Using PREPARETS allows you to keep your original time series data unchanged, while
% easily customizing it for networks with differing numbers of delays, with
% open loop or closed loop feedback modes.
[x,xi,ai,t] = preparets(net,{},{},T);
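% With feedbackDelays = 1:2, preparets returns the first two targets in xi as
% initial input delay states, while x and t hold the remaining timesteps of T
% shifted past the two delays.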

% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'dividerand';  % Divide data randomly
net.divideMode = 'time';  % Divide up every value
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;


% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse';  % Mean squared error

% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','plotresponse', ...
  'ploterrcorr', 'plotinerrcorr'};


% Train the Network
[net,tr] = train(net,x,t,xi,ai);

% Test the Network
y = net(x,xi,ai);
e = gsubtract(t,y);
performance = perform(net,t,y)

% Recalculate Training, Validation and Test Performance
trainTargets = gmultiply(t,tr.trainMask);
valTargets = gmultiply(t,tr.valMask);
testTargets = gmultiply(t,tr.testMask);
trainPerformance = perform(net,trainTargets,y)
valPerformance = perform(net,valTargets,y)
testPerformance = perform(net,testTargets,y)

% View the Network
view(net)

% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotresponse(t,y)
%figure, ploterrcorr(e)
%figure, plotinerrcorr(x,e)

% Closed Loop Network
% Use this network to do multi-step prediction.
% The function CLOSELOOP replaces the feedback input with a direct
% connection from the output layer.
netc = closeloop(net);
%[xc,xic,aic,tc] = preparets(netc,{},{},T);
testa = T(end-10:end);
inputa = testa;
inputa(3:end) = num2cell(ones(1,length(inputa(3:end)))*nan);
[xc,xic,aic,tc] = preparets(netc,{},{},inputa);
yc = netc(xc,xic,aic);
perfc = perform(net,tc,yc);
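% yc holds one closed-loop prediction for each NaN-filled timestep of inputa
% (9 values here), produced by feeding the network's own outputs back as inputs.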
% Multi-step Prediction
% Sometimes it is useful to simulate a network in open-loop form for as
% long as there is known data T, and then switch to closed-loop to perform
% multistep prediction. Here the open-loop network is simulated on the known
% output series, then the network and its final delay states are converted
% to closed-loop form to produce predictions for 9 more timesteps.
inputb = T(end-10:end-9);
[x1,xio,aio,t] = preparets(net,{},{},inputb);
[y1,xfo,afo] = net(x1,xio,aio);
[netc,xic,aic] = closeloop(net,xfo,afo);
[y2,xfc,afc] = netc(cell(0,9),xic,aic);
% Further predictions can be made by continuing simulation starting with
% the final input and layer delay states, xfc and afc.

---------------------------------------------
%%Personal Notes%%
% Essentially, the above two procedures are the same: train an open-loop network,
% then use the closed-loop network to do the multi-step prediction.
% The only difference is: for the first one, we need to fill NaNs in order to do the prediction,
% while there is no need to do that for the second one. For example, for the first method, we
% need to use something like [.1 .2 NaN NaN NaN] as input in order to predict the last
% three values. For the second method, we only need to use [.1 .2] as input and use cell(0, 3)
% as the input of netc.
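% A minimal side-by-side sketch of the two procedures (assuming net, netc, and T
% from the script above; the variable names below are only for illustration):
%
% Method 1: give the closed-loop network the known values plus NaN placeholders.
known   = T(end-4:end-3);                  % two known values (feedbackDelays = 1:2)
tailNaN = [known, num2cell(nan(1,3))];     % e.g. {.1 .2 NaN NaN NaN}
[xc1,xic1,aic1] = preparets(netc,{},{},tailNaN);
yc1 = netc(xc1,xic1,aic1);                 % three predicted values
%
% Method 2: fill the delay states with the open-loop network on the known
% values only, then ask the closed-loop network for three more steps.
[xo,xio,aio] = preparets(net,{},{},known);
[~,xfo,afo]  = net(xo,xio,aio);
[netc2,xic2,aic2] = closeloop(net,xfo,afo);
yc2 = netc2(cell(0,3),xic2,aic2);          % three predicted values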


---------------------------------------------


% Step-Ahead Prediction Network
% For some applications it helps to get the prediction a timestep early.
% The original network returns predicted y(t+1) at the same time it is given y(t+1).
% For some applications such as decision making, it would help to have predicted
% y(t+1) once y(t) is available, but before the actual y(t+1) occurs.
% The network can be made to return its output a timestep early by removing one delay
% so that its minimal tap delay is now 0 instead of 1.  The new network returns the
% same outputs as the original network, but outputs are shifted left one timestep.
nets = removedelay(net);
[xs,xis,ais,ts] = preparets(nets,{},{},T);
ys = nets(xs,xis,ais);
stepAheadPerformance = perform(net,ts,ys)

Notes for the Elements of Statistical Learning

Ch2. Overview of supervised learning

Two simple approaches to prediction: least squares and nearest neighbors. The linear model makes huge assumptions about structure and yields stable but possibly inaccurate predictions. The k-nearest-neighbors method makes very mild structural assumptions: its predictions are often accurate but can be unstable.
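
As a reminder (a sketch in the book's usual notation, not verbatim from the text), the least squares fit and the k-nearest-neighbor fit are

$$\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}, \qquad \hat Y(x) = \frac{1}{k}\sum_{x_i \in N_k(x)} y_i,$$

where $N_k(x)$ is the neighborhood of the k training points closest to x.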

A large subset of the most popular techniques in use today are variants of these two simple procedures.

  • Kernel methods use weights that decrease smoothly to zero with distance from the target point, rather than the effective 0/1 weights used by k-nearest neighbors.
  • In high-dimensional spaces the distance kernels are modified to emphasize some variables more than others.
  • Local regression fits linear models by locally weighted least squares, rather than fitting constants locally.
  • Linear models fit to a basis expansion of the original inputs allow arbitrarily complex models.
  • Projection pursuit and neural network models consist of sums of nonlinearly transformed linear models.

The expected prediction error of a fitted model can be decomposed into two parts, variance and squared bias (plus an irreducible noise term), which gives rise to the bias-variance tradeoff.
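
In the usual notation, assuming $Y = f(X) + \varepsilon$ with $\mathrm{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, the expected prediction error at a point $x_0$ decomposes as

$$\mathrm{Err}(x_0) = \sigma^2 + \bigl[\mathrm{E}\hat f(x_0) - f(x_0)\bigr]^2 + \mathrm{E}\bigl[\hat f(x_0) - \mathrm{E}\hat f(x_0)\bigr]^2 = \text{irreducible error} + \text{Bias}^2 + \text{Variance}.$$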

More generally, as the model complexity increases, the variance tends to increase and the squared bias tends to decrease, and vice versa.

Ch3. Linear Methods for Regression

To test the hypothesis that a particular coefficient is zero, we can calculate its z-score. A large absolute z-score leads to rejection of this hypothesis; the z-score measures the effect of dropping that variable from the model.
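
In the book's notation (a sketch),

$$z_j = \frac{\hat\beta_j}{\hat\sigma\sqrt{v_j}},$$

where $v_j$ is the j-th diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}$ and $\hat\sigma$ estimates the noise standard deviation; under the null hypothesis $\beta_j = 0$, $z_j$ is distributed as $t_{N-p-1}$ (approximately standard normal for large N).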

To test for the significance of a group of coefficients simultaneously, we can use the F statistic.
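
For a larger model with $p_1 + 1$ parameters and a nested smaller model with $p_0 + 1$ parameters, the F statistic is

$$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(N - p_1 - 1)},$$

which measures the change in residual sum of squares per additional parameter, normalized by an estimate of the error variance.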

The Gauss-Markov theorem implies that the least squares estimator has the smallest mean squared error among all linear estimators with no bias.

Best-subset selection finds, for each k, the subset of size k that gives the smallest residual sum of squares.

Rather than search through all possible subsets, we can seek a good path through them. Forward-stepwise selection starts with the intercept, and then sequentially adds into the model the predictor that most improves the fit. It is a greedy algorithm.

Backward-stepwise selection starts with the full model, and sequentially deletes the predictor that has the least impact on the fit. The candidate for dropping is the variable with the smallest z-score.

Shrinkage Methods: Ridge (L2 norm), Lasso (L1 norm), Elastic Net (combines both). Ridge regression may be preferred because it shrinks smoothly, rather than in discrete steps. The lasso falls somewhere between ridge regression and best-subset regression, and enjoys some of the properties of each.
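
The penalized criteria, in one common parameterization (a sketch ignoring the unpenalized intercept; scaling conventions for the elastic-net mixing parameter differ between the book and packages such as glmnet):

$$\hat\beta^{\text{ridge}} = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2, \qquad \hat\beta^{\text{lasso}} = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda\|\beta\|_1,$$

while the elastic net uses a combined penalty of the form $\lambda\bigl(\alpha\|\beta\|_1 + (1-\alpha)\|\beta\|_2^2\bigr)$.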

Ch4. Linear Methods for Classification

Linear Discriminant Analysis (LDA) approaches the classification problem by assuming that the class-conditional probability density functions are Gaussian, with class-specific means and a common covariance matrix. Under this assumption, the Bayes-optimal solution is to predict a point as being from the second class if the log of the likelihood ratio is below some threshold T.
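
With a common covariance matrix $\Sigma$, class means $\mu_k$, and class priors $\pi_k$, the resulting rule classifies to the class with the largest linear discriminant function

$$\delta_k(x) = x^T\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log\pi_k.$$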

Logistic regression does not assume any specific shape for the densities in the space of predictor variables, but LDA does. Logistic regression fits its coefficients by maximizing the conditional likelihood of the class labels given the inputs, while LDA maximizes the full (joint) likelihood under its Gaussian assumption.
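
Both models in fact imply the same linear form for the log-posterior odds; they differ only in how the coefficients are estimated:

$$\log\frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)} = \alpha_{k0} + \alpha_k^T x.$$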

It is generally felt that logistic regression is a safer, more robust bet than the LDA model, relying on fewer assumptions.

Notes for Jupyter Notebook