notebook This is my personal notebook ^_^

R notes

These are my personal notes for learning R. Much of the content comes from online resources such as Coursera and DataCamp.

Basic R, R markdown, and git

Basic R

If you already have basic programming skills, cheat sheets will be quite useful. Here’s what I found:

R markdown

R Markdown is a very useful tool for putting notes together. More references can be found in the RStudio R Markdown reference and the official cheat sheet.

GitHub also has a very useful Markdown tutorial.

git

Create a new repository on the command line:

echo "# notes" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin https://github.com/xzenggit/notes.git
git push -u origin master

Or push an existing repository from the command line:

git remote add origin https://github.com/xzenggit/notes.git
git push -u origin master

GitHub’s guide for using git can be found here.

Some guides covering GitHub Pages and other interesting topics can also be found here.

The dplyr package provides very powerful functions for data manipulation:

  • filter() and slice()

  • arrange()

  • select() and rename()

  • distinct()

  • mutate() and transmute()

  • summarise()

  • sample_n() and sample_frac()

Here is a good introduction to dplyr.

For data.table, DataCamp has a good tutorial and cheat sheet, both of which are pretty straightforward.

The basic goal of exploratory data analysis is to understand the data better. You can use whatever tools you like to analyze the data and get a preliminary understanding of its structure, distribution, and so on.

Typically, people plot the distributions of the variables of interest, or cluster different variables, depending on the problem.

ggplot2 is a great graphics tool for exploring data. Please see:

Regression and statistical inference

Regression and statistical inference are more complicated topics. Here are some resources:

Machine learning and big data

Machine learning and big data are really the future of data science. There are tons of tutorials and courses about this. Here’s what I found:

GitHub with multiple accounts

I like GitHub. Really.

Sometimes I need to use two different accounts, and I don’t want the two accounts to be linked. I found this post, which explains clearly how to set up two accounts for GitHub.

It helped me a lot.
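One common way to keep two accounts separate is to give each account its own SSH key and its own host alias in ~/.ssh/config. This is an assumption on my part, and the post mentioned above may describe a different setup. The sketch below writes such a config to a scratch file so that nothing in the real ~/.ssh directory is touched. The alias names (github-personal, github-work) and the key file names are hypothetical.

```shell
# Sketch only: host aliases and key file names are hypothetical.
# The config goes to a scratch file, not to the real ~/.ssh/config.
ssh_config_demo="$(mktemp)"

cat > "$ssh_config_demo" <<'EOF'
# First account
Host github-personal
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_ed25519_personal
    IdentitiesOnly yes

# Second account
Host github-work
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_ed25519_work
    IdentitiesOnly yes
EOF

# Count the host aliases that were defined
grep -c '^Host ' "$ssh_config_demo"
```

With a config like this in place, a repo would pick its account through the alias in the remote URL, for example git@github-work:your-username/notes.git instead of git@github.com:your-username/notes.git.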

Notes for setting up GitHub Pages

The following is how I set up GitHub Pages.

Basically, I followed the steps given at github-pages. For user pages, you need to create a repo named username.github.io. For project pages, you need to create a branch called gh-pages in your project repo. The procedures are explained quite clearly on its help pages.
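For project pages, creating the gh-pages branch could look roughly like the sketch below. It uses a throwaway repo in a temporary directory, and the repo name my-project is hypothetical. In a real project you would run only the git checkout line inside your existing repo.

```shell
# Sketch only: a throwaway repo in a temp directory; "my-project" is hypothetical.
demo_dir="$(mktemp -d)"
git init --quiet "$demo_dir/my-project"
cd "$demo_dir/my-project"

# Create the gh-pages branch and switch to it
git checkout -b gh-pages

# Show the branch that is now checked out
current_branch="$(git symbolic-ref --short HEAD)"
echo "$current_branch"
```

Once the site files are committed on this branch, git push -u origin gh-pages publishes the branch to GitHub.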

There are some templates you can download at http://jekyllthemes.org. After downloading your favorite template, uncompress it into your local repo directory (the master branch for user pages, the gh-pages branch for project pages).

Then, change the line baseurl: in the _config.yml file to baseurl: http://your-username.github.io, substituting your own GitHub username for your-username. Or, for project pages, use baseurl: /repo-name. You can also change other settings as you like.

Finally, run git add ., git commit -m "new web", and git push to send everything to your GitHub repo. Go to your-username.github.io (for user pages) or your-username.github.io/project-name (for project pages), and you will find your own website. The greatest thing is it’s free!
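The baseurl edit can also be done from the command line instead of a text editor. This is a minimal sketch that works on a scratch _config.yml rather than a real one. The starting file contents are made up for illustration, and your-username is a placeholder.

```shell
# Sketch only: a scratch _config.yml with made-up contents; "your-username" is a placeholder.
site_dir="$(mktemp -d)"
printf 'title: My notes\nbaseurl: ""\n' > "$site_dir/_config.yml"

# Replace the whole baseurl line (written to a new file, then moved, to avoid sed -i differences)
sed 's|^baseurl:.*|baseurl: http://your-username.github.io|' \
    "$site_dir/_config.yml" > "$site_dir/_config.yml.new"
mv "$site_dir/_config.yml.new" "$site_dir/_config.yml"

# Show the resulting line
grep '^baseurl:' "$site_dir/_config.yml"
```

For project pages, the same command would use /repo-name as the replacement value.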

How to make your first post? Under the _posts directory, create a file named like 2015-08-21-post-title.md if you use markdown. Then put whatever you want in it following the markdown format, and push it back to your repo. You’ll get your first post.
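As a rough sketch, creating that first post from the command line might look like this. The site directory is a temporary stand-in for your local repo, and the front matter values (layout and title) are assumptions: Jekyll expects a front matter block at the top of each post, but which layout names exist depends on the template you chose.

```shell
# Sketch only: a temp directory stands in for the local site repo;
# the front matter values below are assumptions that depend on your template.
blog_dir="$(mktemp -d)"
mkdir -p "$blog_dir/_posts"

post_date="2015-08-21"
post_slug="post-title"
post_file="$blog_dir/_posts/${post_date}-${post_slug}.md"

# Front matter between the --- lines, then the post body in markdown
cat > "$post_file" <<'EOF'
---
layout: post
title: "Post title"
---

This is my first post, written in markdown.
EOF

# List the posts directory to confirm the file name
ls "$blog_dir/_posts"
```

After that, the usual git add, git commit, and git push publish the post.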

Check out the Jekyll docs. They have pretty good how-to instructions.