Showing posts with label R programming. Show all posts

20 July 2015

Series about decision tree


Dear Readers,

Good morning and Eid Mubarak to all Muslim brothers and sisters. The discussion in this post was originally held in Indonesian, along with some English quotations.

I met "Prana Ugi", a keen statistician from Medan (the capital of North Sumatra). He works a lot with SPSS on various subjects, which is no wonder, since he graduated from a mathematics and statistics program. Today we'll be covering decision trees. R code will be developed later based on this post (or these posts).

This first draft is still a Q&A session between Prana Ugi and me. We will develop the text shortly.

# First post 




The Q&A was:

Me:
Suppose there are many columns about disease history: can they produce low, medium, and high risk classes? So it is the data that says how large the risk is. That seems to be what the data I sent the other day look like.


Prana Ugi:
Pak Dasapta Erwin Irawan => from the Excel data you sent, I (Ugi) am confused about which columns (variables) are dependent and which are independent. Your Excel data run from column A (no) to column AE (lag1). Could you please add a note on which columns are dependent and which are independent?

Me:
In my opinion only the coordinates are independent. The other columns will depend on one another. My scenario is precisely to find out which one is the strongest.


# Second post



no comment


# Third post



Me:
Does the learning process use the principle of regression?

Prana Ugi:
Pak Dasapta Erwin Irawan => There are many algorithms for inducing (building) a decision tree, such as CART (C&RT), ID3, C4.5, SLIQ, SPRINT, QUEST, DTREG, THAID, CHAID, and so on. The CART classification method consists of two methods: regression trees and classification trees. If the dependent variable is categorical, CART produces a classification tree. If the dependent variable is continuous or numeric, CART produces a regression tree. In the figure that I (Ugi) posted, the dependent variable is categorical.

Prana Ugi:
Classification and Regression Trees (CART) is one of the methods or algorithms of the decision tree technique. CART is a nonparametric statistical method that can describe the relationship between a response variable (dependent variable) and one or more predictor variables (independent variables). According to Breiman et al. (1993), if the response variable is continuous, the method used is regression trees, whereas if the response variable is on a categorical scale, the method used is classification trees. I hope the discussion is useful, Pak.
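The distinction between classification trees and regression trees described above can be sketched in R with the rpart package. This is only an illustration on R's built-in `iris` and `mtcars` datasets, not the data discussed in this post; the object names `class_tree` and `reg_tree` are made up for the example.

```r
library(rpart)  # recommended package shipped with standard R installs

# Categorical response (Species is a factor) -> classification tree
class_tree <- rpart(Species ~ ., data = iris, method = "class")

# Continuous response (mpg is numeric) -> regression tree
reg_tree <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

# The fitted objects record which kind of tree was grown
print(class_tree$method)
print(reg_tree$method)

# Predictions are class labels for the first tree, numbers for the second
head(predict(class_tree, iris, type = "class"))
head(predict(reg_tree, mtcars))
```

If `method` is left out, rpart guesses it from the response: a factor gives a classification tree, a numeric vector a regression tree.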

# Fourth post



Data Mining: Concepts, Models and Techniques
By: Florin Gorunescu, Springer

no comment

# Fifth post


Prana Ugi:
Alhamdulillah, the manual calculation of Classification and Regression Tree (CART) with the ID3 (Iterative Dichotomiser 3) and C4.5 algorithms, using the entropy (impurity) criterion, is finished. The algorithm has been implemented successfully in Excel; now moving on to R. Hopefully there is a package for it; if not, I will write my own.

Me:
Try the following R packages:
- caret (https://cran.r-project.org/web/packages/caret/index.html),
- rpart (https://cran.r-project.org/web/packages/rpart/index.html), and
- tree (https://cran.r-project.org/web/packages/tree/index.html)
We'll keep updating this post.
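The entropy (impurity) criterion mentioned in the fifth post can be sketched in a few lines of base R. This is a minimal illustration, not Prana Ugi's Excel implementation; the vectors `risk` and `smoker` are hypothetical toy data, and the function names `entropy` and `info_gain` are made up for the example.

```r
# Shannon entropy (in bits) of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # skip empty classes to avoid log2(0)
  -sum(p * log2(p))
}

# Information gain from splitting `labels` by a categorical `feature`:
# parent entropy minus the size-weighted entropy of the child nodes
info_gain <- function(labels, feature) {
  weights <- table(feature) / length(feature)
  child   <- tapply(labels, feature, entropy)
  entropy(labels) -
    sum(as.numeric(weights[names(child)]) * as.numeric(child))
}

# Hypothetical toy data: risk class and one disease-history column
risk   <- c("high", "high", "low", "low", "low", "high")
smoker <- c("yes",  "yes",  "no",  "no",  "no",  "no")

print(entropy(risk))            # impurity before the split
print(info_gain(risk, smoker))  # reduction in impurity after the split
```

ID3 picks, at each node, the column with the largest information gain; C4.5 refines this by normalising the gain, which is not shown here.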

28 May 2015

1st Circular: Indonesia R meet up


Since it turns out that many people have been "revealed" as R users (from beginner to advanced level), it is time to plan an R meet-up. An example looks like this: http://r-users-group.meetup.com/.

The Indonesia R User community will hold the first Indonesia R Meet Up, with the theme R 4 All.

The terms of reference (TOR) are as follows:

  • Who may attend: anyone who is interested and has joined the R user group.
  • Who may submit an abstract: R users (no limit on competence), who must have joined the R user group.
  • What may be presented: any topic, as long as it uses R.

For now the topics are divided into only two:

  1. Natural sciences (including medicine and health)
  2. Social sciences (including economics)

  • Abstract format: 200 words, with at most 5 keywords, containing background, methods, results, conclusions, and recommendations. R code is submitted as an appendix.
  • Where to send it: post it on the R User Group Wall.

Suggestions about the organisation of the event can be written in the comments column.

Thank you.


09 May 2015

Upcoming 3rd Mini Workshop Intro to R



After a long hiatus, here's an update.

My tweets announcing the upcoming Intro to R Workshop.

A teaser in Prezi http://goo.gl/ZYIA3v

The 3rd Intro to R mini workshop, 25 May, ITB Central Library

1) installation
1.1 what and how to install,
1.2 how to upgrade,
1.3 basic syntax;

2) data prep
2.1 data entry,
2.2 formatting rows-cols,
2.3 dealing with NAs;

3) basic ops
3.1 data loading,
3.2 data structures,
3.3 missing data;

4) basic analysis
4.1 descriptive analysis,
4.2 basic plots,
4.3 pairs analysis,
4.4 regression analysis
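As a rough taste of the workshop topics listed above (data structures, missing data, descriptive analysis, basic plots, pairs analysis, and regression), here is a short sketch in base R. It uses R's built-in `airquality` dataset as a stand-in, not the workshop material itself.

```r
# Load a built-in dataset and look at its structure
data(airquality)
str(airquality)

# Missing data: count NAs per column, then keep complete rows only
na_per_column <- colSums(is.na(airquality))
print(na_per_column)
clean <- na.omit(airquality)

# Descriptive analysis and a basic plot
summary(clean$Ozone)
hist(clean$Ozone, main = "Ozone", xlab = "Ozone (ppb)")

# Pairs analysis: scatterplot matrix of the numeric variables
pairs(clean[, c("Ozone", "Solar.R", "Wind", "Temp")])

# Regression analysis: simple linear model of ozone on temperature
fit <- lm(Ozone ~ Temp, data = clean)
summary(fit)
```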







17 March 2015

hydrogeological analysis using open source tools: case Cikapundung River

Dear friends,

The following slides (in Rmd or PDF format) are from my recent talk at Sarasehan Geologi Populer, which was held by the Geological Survey of Indonesia. The talk covers various open source tools, with a focus on R, for geological and hydrogeological analysis. It describes part of my research on the interaction between groundwater and surface water, which I study by analysing water quality patterns. I used R in this research, and the slides contain some example R code. The objective of the talk is to raise awareness of open source apps and how they contribute to reproducibility in science.

You can view and download:
@dasaptaerwin


17 December 2014

Twitter data mining: first try out





Dear friends,

This post is my first attempt at using R to generate a word cloud from Twitter. Here we use R for the text mining process, with the search keys "bencana" (hazards) and "Indonesia" on Twitter. The two pictures above show the results for two periods: 5-15 Dec 2014 (left picture) and 5-17 Dec 2014 (right picture). The different results demonstrate how tweets change over time.

Code originally written by: Miga M. Julian, available here. I will convert it to Rmd later. 
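The counting step behind a word cloud can be sketched in base R as follows. This is not Miga M. Julian's code: the tweets below are made-up example strings, the stop word list is a hypothetical minimal one, and retrieving real tweets from Twitter is left out entirely.

```r
# Made-up example tweets standing in for real search results
tweets <- c("Bencana banjir di Indonesia",
            "Indonesia siaga bencana longsor",
            "Banjir Jakarta hari ini")

# Lower-case, split on anything that is not a letter, drop short tokens
words <- unlist(strsplit(tolower(tweets), "[^a-z]+"))
words <- words[nchar(words) > 2]

# Remove a (hypothetical, minimal) list of Indonesian stop words
stop_words <- c("hari", "ini")
words <- words[!words %in% stop_words]

# Word frequencies, most frequent first
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# With the wordcloud package installed, the table could be drawn with:
# wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)
```

Running the same counting on tweets collected over different periods is what makes the two clouds differ.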

#rforall #rstats

07 December 2014

Matrix aggregation (finalized post)

Dear friends,

This post has finally been written and posted at this link: Example of matrix aggregation in R. Unfortunately, I only wrote it in Indonesian. Very sorry for that.

After a long hiatus, here's my new post (in Bahasa Indonesia; it will be translated to English next) about matrix aggregation. This article is a collaboration between me and Ali Akbar Hakim (@osairisali), a fellow R user from the Faculty of Economics, Brawijaya University, Indonesia. His major is Economics, and he is currently finishing his undergraduate thesis on input-output economic analysis. The article is currently under final revision, but the following are a few snapshots of it.
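For readers who cannot follow the Indonesian article, here is a generic sketch of one common way to aggregate a square sector-by-sector matrix in R, using an aggregator matrix. It is not taken from the article: the matrix `Z`, the sector names, and the grouping in `groups` are all hypothetical.

```r
# Hypothetical 4-sector transactions matrix (rows sell to columns)
Z <- matrix(c(10, 5, 2, 1,
               4, 8, 3, 2,
               1, 2, 6, 3,
               2, 1, 4, 7),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

# Hypothetical grouping: sectors 1-2 form group A, sectors 3-4 group B
groups <- c("A", "A", "B", "B")

# Aggregator matrix S: one row per group, 1 where a sector belongs to it
S <- t(sapply(unique(groups), function(g) as.numeric(groups == g)))

# Aggregated matrix: sum rows and columns within each group
Z_agg <- S %*% Z %*% t(S)
print(Z_agg)

# Aggregation only regroups the flows, so the grand total is unchanged
stopifnot(sum(Z_agg) == sum(Z))
```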




29 October 2014

Some new papers in my repos: Hiatus in my post

Dear friends,

There will be a hiatus in my posts since I have been busy writing some papers. Some of them use R for the analysis. The full papers have been uploaded to my repos: Academia.edu and ResearchGate.net.

All papers will be presented at the upcoming ICMNS 2014 (International Conference on Math and Natural Sciences) at Institut Teknologi Bandung, Indonesia. The titles of the papers are:

  1. Spatial analysis of groundwater quality data using geoR and mgcv R-package   
  2. Groundwater and River Water Interaction at Ciromban and Cibeureum Riverbank, Tasikmalaya: Can We Solve Water Shortage?
  3. Groundwater and River Water Interaction on Cikapundung River: Revisited
  4. Revisiting hydrostratigraphy in Bandung-Soreang Groundwater Basin: a well-logs re-analysis
All papers mainly discuss groundwater characterisation based on hydrochemistry and multivariate analysis. I hope they give you another example of using both methods to identify groundwater systems in tropical areas.

Cheers,
@dasaptaerwin

16 October 2014

Level plot and bubble plot try out

Dear friends,

I've posted an example of a level plot and a bubble plot of a water quality dataset on RPubs. The objective of these plots is to compare element concentrations in groundwater and in river water. The river data are plotted in the lower right of the figure, making an elbow-like shape.

The plots were made using the following base code:

plotNO3 <- levelplot(NO3 ~ x*y, data=data,
          xlab="X coord", ylab="Y coord",
          main="Level plot NO3",
          col.regions=terrain.colors(100))

bubbleNO3 <- bubble(data, zcol="NO3",
          xlab="X coord", ylab="Y coord",
          main="Bubble plot NO3",
          scales=list(tck=0.5))

In this example I used the following packages:
  • lattice        # for plotting
  • dplyr          # for data manipulation
  • gridExtra      # for multiple plot
  • sp             # for spatial analysis
I used the gridExtra package to show multiple plots on one page, using the following code.

grid.arrange(bubbleSiO2, bubbleFe, bubbleMg, bubbleMn, ncol=2)
However, I haven't finished tweaking the code so that it shows the figure at the correct size.
I invite you to drop a comment or two.
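For anyone who wants to try the idea without my dataset, here is a self-contained sketch that uses only the lattice package. The data frame `wq` is synthetic (random numbers on a 10 x 10 grid, not real water quality data), and the side-by-side layout uses the `split` argument of lattice's print method instead of gridExtra.

```r
library(lattice)

# Synthetic stand-in for a water quality dataset on a regular grid
set.seed(1)
wq <- expand.grid(x = 1:10, y = 1:10)
wq$NO3 <- 5 + 0.3 * wq$x - 0.2 * wq$y + rnorm(nrow(wq), sd = 0.5)
wq$Fe  <- 2 + 0.1 * wq$y + rnorm(nrow(wq), sd = 0.2)

# One level plot per element
p_no3 <- levelplot(NO3 ~ x * y, data = wq,
                   xlab = "X coord", ylab = "Y coord",
                   main = "Level plot NO3",
                   col.regions = terrain.colors(100))
p_fe <- levelplot(Fe ~ x * y, data = wq,
                  xlab = "X coord", ylab = "Y coord",
                  main = "Level plot Fe",
                  col.regions = terrain.colors(100))

# Two plots on one page: split = c(column, row, n_columns, n_rows)
print(p_no3, split = c(1, 1, 2, 1), more = TRUE)
print(p_fe,  split = c(2, 1, 2, 1))
```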

Cheers,
@dasaptaerwin


10 October 2014

R warnings on Mac

Dear friends,



This is the oldest warning that I haven't blogged about.
It appeared when I ran R on my Mac for the first time, a few years ago.
The following warnings popped up when R was starting:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_PAPER failed, using "C"
The problem was a mismatch between the default encoding and language settings of the Mac and those of R.

I found the solution on Stack Overflow: type (or copy and paste) the following command

defaults write org.R-project.R force.LANG en_US.UTF-8

in your Mac terminal.

The warnings then disappear the next time you start R.
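To check from inside R which locale is actually in use, something like the following sketch may help. It only inspects the settings and tries the same `en_US.UTF-8` value for the current session; whether that locale name is available depends on the operating system, so the code checks the result instead of assuming success.

```r
# Show the locale settings R is currently using
current_locale <- Sys.getlocale()
print(current_locale)

# Look at one category from the warnings above
lc_ctype <- Sys.getlocale("LC_CTYPE")
print(lc_ctype)

# Try to switch the category for this session only.
# Sys.setlocale() returns "" (with a warning) if the request fails.
result <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_US.UTF-8"))
if (identical(result, "")) {
  message("en_US.UTF-8 is not available on this system")
} else {
  message("LC_CTYPE is now: ", result)
}
```

Unlike the `defaults write` command, a change made this way lasts only until R is closed.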

@dasaptaerwin

08 April 2014

[R] How to compile R package (Hydromad) on Mavericks

[R] Hey R, may I introduce you to Mavericks :-)

I noticed that R and Mavericks don't work well together. This is because Mavericks uses the new compiler from Xcode 5.x, which differs from the standard R package compiler (please CMIIW). As a result, the install.packages command won't work in this situation, including for Hydromad.

So for those of you who have mistakenly upgraded the OS to Mavericks, you have to introduce the right compiler for R packages to Mavericks, so that it is used whenever you run the install.packages command. Thanks to the StackOverflow Q&A.

First: Download Xcode 5 from the App Store. It only needs to be properly installed on your Mac; you don't have to run it.

Second: Run the "Terminal" window. The icon should be in your "Utility" program group. The one in the middle.

When you run it, a window like this will open.

Third: At the terminal prompt, type the following commands (press "return" after each line):

cd /usr/bin
sudo ln -fs clang llvm-gcc-4.2
sudo ln -fs clang++ llvm-g++-4.2

After that, close everything and restart your MBA.

Then you can try again to install the package in R from the tar.gz file with install.packages.

Hope it helps. It worked on my friend's MBA running Mavericks.
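A quick way to see from inside R whether the compiler names used in the terminal commands above can be found is sketched below. It only looks the names up on the search path; it does not change anything, and an empty string simply means that name was not found on your machine.

```r
# Compiler names taken from the terminal commands in this post
tools <- c("clang", "clang++", "llvm-gcc-4.2", "llvm-g++-4.2")

# Sys.which() returns the full path of each program, or "" if not found
paths <- Sys.which(tools)
print(paths)

# List the names that R cannot find
missing_tools <- names(paths)[paths == ""]
if (length(missing_tools) > 0) {
  message("Not found: ", paste(missing_tools, collapse = ", "))
}
```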

27 March 2014

[R] Installing Hydromad package on Mac OS X

I am sure that some previous questions about this have already been answered. I just thought this post (or re-post) might be useful for new users who run into this problem.



I use OS X Lion (10.7.2) and previously had problems installing the Hydromad package in the conventional way (described on the website). At first I thought it was completely installed after downloading ./hydromad_0.9-19.tar.gz from http://hydromad.catchment.org/src/contrib/, because "hydromad" was already listed in the package list. But it turned out not to be completely installed, since I kept getting errors after loading it.

So I posted this problem to the Statistics and R communities on Google Plus, and surprisingly +Jesse Hamner (from the University of North Texas) answered.

Based on his answer, I did the following steps:

1) Basically, Mac users have to download Xcode from the App Store (only for Mavericks users).

For those with an older OS X (which was also my case), you can go to https://developer.apple.com/xcode/ (but you have to spare some time to register as an Apple developer first, and of course you must have an Apple ID).

After that, you can easily choose the right Xcode version based on your OS, download it, install it, and reboot.

(the opening page, choose "download previous version of Xcode")


(after registering as an app developer and signing in with your Apple ID, you can choose the correct Xcode version)



2) Then you can start RStudio and install all the dependencies with: install.packages(c("zoo", "latticeExtra", "polynom", "car", "Hmisc", "reshape")) 

3) Then download the ./hydromad_0.9-19.tar.gz source file from http://hydromad.catchment.org/src/contrib/


4) Then, in RStudio, open the "Package Installer" dialog box, select install from "Archive Package File", and browse to the location of your ./hydromad_0.9-19.tar.gz. Click it.

After that, Hydromad should run properly.
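Steps 2 to 4 can also be done from the R console instead of the dialog box. The sketch below is an assumption about how one might do it, not what Jesse Hamner suggested: the helper `install_local_tarball` and the `~/Downloads` path are hypothetical, and the lines that would actually install something are left commented out.

```r
# Step 2: find which of the dependencies are not installed yet
deps <- c("zoo", "latticeExtra", "polynom", "car", "Hmisc", "reshape")
missing_deps <- setdiff(deps, rownames(installed.packages()))
print(missing_deps)
# install.packages(missing_deps)   # uncomment to install (needs internet)

# Steps 3-4: install a source package from a downloaded tar.gz file.
# Hypothetical helper: returns FALSE (invisibly) if the file is missing.
install_local_tarball <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(invisible(FALSE))
  }
  # repos = NULL tells R to install from the local file, not from CRAN
  install.packages(path, repos = NULL, type = "source")
  invisible(TRUE)
}

# Hypothetical download location; adjust to where you saved the file
# install_local_tarball("~/Downloads/hydromad_0.9-19.tar.gz")
```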

Thanks  +Jesse Hamner

03 March 2014

[R] How to cite R




{R-030314-citation}

You must know how to spell R, but do you know how to cite it?

The first lesson today is how to cite R. We often forget to list the software or packages we use in our papers and research. In hydrogeology we often use ArcGIS, ModFlow, and the most basic of all, Microsoft Excel. One (including me) shouldn't forget to cite them in all of our work.

Anyway, this is how to get the reference data for R. Read the last paragraph of the output; that is why we put this lesson first.

Type:

> citation()

Then press Enter, and the following lines will come up.

To cite R in publications use:

 R Development Core Team (2012). R: A language and environment for
 statistical computing. R Foundation for Statistical Computing,
 Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

A BibTeX entry for LaTeX users is

 @Manual{,
   title = {R: A Language and Environment for Statistical Computing},
   author = {{R Development Core Team}},
   organization = {R Foundation for Statistical Computing},
   address = {Vienna, Austria},
   year = {2012},
   note = {{ISBN} 3-900051-07-0},
 }

We have invested a lot of time and effort in creating R, please cite
it when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.
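The last line of that output mentions 'citation("pkgname")'. A short sketch of how that looks in practice is given below, using the lattice package as an example (any installed package name works the same way); the object names are made up for the illustration.

```r
# Citation for R itself
r_cite <- citation()
print(r_cite)

# BibTeX entry only, ready to paste into a .bib file
r_bib <- toBibtex(r_cite)
print(r_bib)

# Citation for a specific package, here lattice
pkg_cite <- citation("lattice")
print(pkg_cite)
```

The year and author shown depend on the R version installed, so the output will differ from the 2012 entry above.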


Dasapta Erwin Irawan 
Applied Geology Research Division
Faculty of Earth Science and Tech
Institut Teknologi Bandung, Indonesia
{@dasaptaerwin}