Posts tagged 'R 기초' (R Basics)

[Fast campus] 08. Utilities (Loops, Conditionals, User-Defined Functions)

2017. 7. 24. 19:36

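[Fast campus] Data Science with R, 1st cohort - Instructor 이부일

These are my notes for reviewing and organizing what I learned while studying R.

# printf() is available through the R.utils package

install.packages("R.utils")

library(R.utils)

1. Loops: for

# Use a loop when doing the same thing many times

# or similar things many times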

> for(i in 1:10){
+   cat("hello,", i, "\n")
+ }
hello, 1 
hello, 2 
hello, 3 
hello, 4 
hello, 5 
hello, 6 
hello, 7 
hello, 8 
hello, 9 
hello, 10

# `in` assigns each element of the sequence to i, one at a time

# 1:10 => the for loop runs once per element (10 times)

# Using printf("hello, %d \n", i) instead of cat prints the same result
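The same loop can be sketched with base R only, assuming R.utils is not installed: cat(sprintf(...)) stands in for printf(), and seq_along() loops over the positions of a vector. The vector `fruits` and the variable `greetings` are made-up names for illustration.

```r
# Base-R sketch: cat(sprintf(...)) plays the role of R.utils::printf().
fruits <- c("apple", "banana", "cherry")  # hypothetical example data

for (i in seq_along(fruits)) {
  # %d formats the integer position, %s the character element
  cat(sprintf("hello, %d: %s\n", i, fruits[i]))
}

# The loop variable can also take the elements themselves.
greetings <- character(0)
for (f in fruits) {
  greetings <- c(greetings, paste("hello,", f))
}
```

Looping over seq_along(fruits) rather than 1:length(fruits) avoids an unwanted pass through the loop when the vector is empty.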

# Multiplication table

> for(i in 1:9){
+   cat(i, "times table!", "\n")
+   for(j in 1:9){
+     cat(i, "x", j, "=", i*j, "\n")
+   }
+   cat("\n")
+ }

1 times table! 
1 x 1 = 1 
1 x 2 = 2 
1 x 3 = 3 
1 x 4 = 4 
1 x 5 = 5 
1 x 6 = 6 
1 x 7 = 7 
1 x 8 = 8 
1 x 9 = 9 
...

# The tables for 2 through 9 are printed in the same format

2. Conditionals

(1) if (condition) { statement } - runs the statement when the condition is TRUE

> x = c(100, 30, 60)
> for(i in 1:3){
+   if(x[i] > 50){
+     printf("%d Very large number!!! \n", x[i])
+   } 
+ }

100 Very large number!!! 
60 Very large number!!!
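Because comparison operators in R are vectorized, the same selection can also be sketched without indexing inside a loop. This is only an illustration in base R; the names `is_large` and `large` are made up here.

```r
# Sketch: select the large values with a logical vector instead of if().
x <- c(100, 30, 60)

is_large <- x > 50      # logical vector: TRUE FALSE TRUE
large    <- x[is_large] # keeps only the elements where the test is TRUE

for (v in large) {
  cat(sprintf("%d Very large number!!!\n", v))
}
```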

(2) if (condition) { statement 1, run when the condition is TRUE }
else { statement 2, run when the condition is FALSE }

> y  = 10
> if(y > 5){
+   print("Large!!")
+ } else{
+   print("Small!!")
+ }

[1] "Large!!"

(3) if (condition 1) { statement 1 }

else if (condition 2) { statement 2 }

else { statement 3 }

# More else if conditions can be added

> z = 7
> if(z>10){
+   print("Large")
+ } else if(z>5){
+   print("medium")
+ } else{
+   print("small")
+ }

[1] "medium"

3. User-defined functions

# function_name = function( ) { statements }

> hello = function(){
+   print("hello, world")
+   return("hello, fastcampus")
+ }

> hello()
[1] "hello, world"
[1] "hello, fastcampus"

> x = hello()
[1] "hello, world"
> x
[1] "hello, fastcampus"

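# Think about what return does: print() only displays a value on the console, while return() hands the value back to the caller, so after x = hello() the variable x holds "hello, fastcampus"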
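A minimal sketch of the difference between printing and returning, using a hypothetical copy of the function named `hello2` so that it does not overwrite anything defined above:

```r
# Sketch: print() is a side effect, return() is the value of the call.
hello2 <- function() {
  print("hello, world")        # shown on the console while the function runs
  return("hello, fastcampus")  # handed back to whoever called hello2()
}

x <- hello2()  # "hello, world" appears here, during the call

# x stores only the returned value, not what was printed
result_is_return_value <- identical(x, "hello, fastcampus")
```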

# Write a function that takes a number x and returns x*3

> triple = function(x){
+   if(mode(x) == "numeric"){
+  # if(is.numeric(x)) works as well
+     tmp = 3*x
+     return (tmp)
+   } else{
+     print("Please enter a number")
+   }
+ }

> triple(10)
[1] 30

> triple("10")
[1] "Please enter a number"
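As a variation, invalid input can raise an error instead of printing a message, which lets the calling code decide how to react. This is a sketch under that assumption, not the course's version; `triple_strict` and `msg` are hypothetical names.

```r
# Sketch: signal an error for non-numeric input instead of printing.
triple_strict <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")  # raises an error the caller can catch
  }
  3 * x                        # the last expression is the return value
}

triple_strict(10)          # a single number
triple_strict(c(1, 2, 3))  # works element-wise on a vector

# tryCatch() lets the caller handle the error and keep running
msg <- tryCatch(triple_strict("10"),
                error = function(e) conditionMessage(e))
```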

# Multiplication table: print the table for the given number

> gugudan = function(x){
+   if(is.numeric(x)){
+     printf("%d times table \n", x)
+     for(i in 1:9){
+       printf("%d x %d = %d \n", x,i,x*i)
+     }
+   } else{
+     print("Please enter a number")
+   }
+ }

> gugudan(3)
3 times table 
3 x 1 = 3 
3 x 2 = 6 
3 x 3 = 9 
3 x 4 = 12 
3 x 5 = 15 
3 x 6 = 18 
3 x 7 = 21 
3 x 8 = 24 
3 x 9 = 27

[Fast campus] 07. Data Handling 2

2017. 7. 20. 19:41

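[Fast campus] Data Science with R, 1st cohort - Instructor 이부일

These are my notes for reviewing and organizing what I learned while studying R.

6. Creating new variables

# data$variable_name = expression

# Compute BMI and add it as a new variable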

student$bmi = student$weight / ((student$height/100)^2)

# ifelse(condition, value when TRUE, value when FALSE)

# Add a new variable (age_group) that marks 30s and over / 20s and under

student$age_group = ifelse(student$age >= 30, "30s and over", "20s and under")

# Create age_group2 with three levels: early 20s, mid-20s, and 30s and over

student$age_group2 = ifelse(student$age >= 30, "30s and over",
                            ifelse(student$age >= 25, "mid-20s", "early 20s"))

# cut(data$variable_name, breaks = interval boundaries) => applies to numeric data

Mild obesity (class 1 obesity): 25 - 30

Overweight: 23 - 24.9

Normal: 18.5 - 22.9

Underweight: below 18.5

student$bmi_group = cut(student$bmi, 
                        breaks = c(0, 18.5, 23, 25, 30))

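# In the result, each interval is labeled with mixed brackets such as ( ]

# (18.5, 23] => greater than 18.5 and at most 23

# A round bracket ( ) excludes the endpoint

# A square bracket [ ] includes the endpoint

# Values of 0 or less and values above 30 become NA

# Now add the argument right = FALSE to the code above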

student$bmi_group = cut(student$bmi, 
                        breaks = c(0, 18.5, 23, 25, 30), 
                        right  = FALSE)

# With right = FALSE, the closed side of each interval moves to the left

# [18.5, 23) => 18.5 or more and less than 23

# Values of 30 or more become NA
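A small sketch of how the NA values at the top end could be avoided, assuming it is acceptable to add an open-ended last interval with Inf and to name the groups with labels =. The BMI values and the label text (in particular "obese 2+") are made up for illustration and are not from the course.

```r
# Sketch: cut() with an open-ended last interval and readable labels.
bmi <- c(17.2, 18.5, 22.9, 23, 27.4, 31)  # hypothetical BMI values

bmi_group <- cut(bmi,
                 breaks = c(0, 18.5, 23, 25, 30, Inf),
                 labels = c("underweight", "normal", "overweight",
                            "obese 1", "obese 2+"),
                 right  = FALSE)  # intervals are [lower, upper)

table(bmi_group)  # count of observations per group
```

With right = FALSE the value 18.5 falls in "normal" and 23 falls in "overweight", which matches the category table above.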

# Compute the mean of each row

score$avg = rowMeans(score[ , 2:6])

# For the mean of each column, use colMeans( )
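A self-contained sketch of both functions on a tiny made-up data frame (the column names `kor` and `eng` and the scores are hypothetical, since the course's `score` data is not shown):

```r
# Sketch: row means and column means on a small data frame.
score <- data.frame(id  = 1:3,
                    kor = c(90, 80, 70),
                    eng = c(100, 60, 80))

# Mean across the score columns for each student (each row)
score$avg <- rowMeans(score[, c("kor", "eng")])

# Mean of each subject (each column)
col_avg <- colMeans(score[, c("kor", "eng")])
```

Selecting the columns by name instead of by position keeps the code correct even if the column order changes.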

7. Modifying data values

# A data set called home

> home
# A tibble: 7 x 4
     id  room price  area
  <dbl> <dbl> <dbl> <dbl>
1     1     1    40     1
2     2     4 50000    55
3     3     3 20000    35
4     4     3 50000    35
5     5     1    43     2
6     6     4 50000    45
7     7     1   500    15
  

# Change the value 500

> home[home$price == 500, "price"] = 50
> home
# A tibble: 7 x 4
     id  room price  area
  <dbl> <dbl> <dbl> <dbl>
1     1     1    40     1
2     2     4 50000    55
3     3     3 20000    35
4     4     3 50000    35
5     5     1    43     2
6     6     4 50000    45
7     7     1    50    15
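When the column may contain missing values, a logical row index with NA can make a data.frame assignment fail. A hedged sketch of one way around this is to wrap the condition in which(), which keeps only the positions where the test is TRUE. The small `home` data frame below is made up for the example.

```r
# Sketch: replace a value safely when the column contains NA.
home <- data.frame(id    = 1:4,
                   price = c(40, 50000, 500, NA))  # hypothetical data

# which() drops the NA comparison and returns only the matching position
target <- which(home$price == 500)
home$price[target] <- 50
```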

8. Sorting data

(1) Sorting a vector: sort(vector, decreasing = )

# Default = ascending order

> money = c(45, 50, 40, 50, 50, 30, 500)
> sort(money)
[1]  30  40  45  50  50  50 500

> sort(money, decreasing = TRUE) 
[1] 500  50  50  50  45  40  30

(2) order(data$variable_name, decreasing = )

# sort works only on vectors

# For a data.frame, use the order function

> order(money)
[1] 6 3 1 2 4 5 7

# order returns the indices that would put the data in sorted order.

# Sorting a data.frame means rearranging its rows,

# so order goes in the row position of the subset.

# Height, descending

student[ order(student$height, decreasing = TRUE) , ]

# Gender descending / height descending

student[ order(student$gender, student$height, decreasing = TRUE ) , ]

# Gender ascending / height descending

# The two directions differ, so use - on the descending variable

student[ order(student$gender, -student$height) , ]

# Gender descending / height ascending

# - can be applied only to numeric data

student[ order(student$gender, -student$height, decreasing = TRUE ) , ]
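The mixed-direction sort can be checked on a tiny made-up data frame. This is a sketch with hypothetical values; the course's `student` data has more columns.

```r
# Sketch: gender ascending, height descending within each gender.
student <- data.frame(gender = c("M", "F", "M", "F"),
                      height = c(180, 160, 175, 165))  # hypothetical data

# order() returns row positions; the minus sign reverses the numeric key
idx    <- order(student$gender, -student$height)
sorted <- student[idx, ]  # rows rearranged, columns unchanged
```

Here `idx` lists the "F" rows first (taller one first), then the "M" rows (taller one first).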

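# Gender ascending / address descending

# How can character columns be sorted in different directions?

# - works only on numeric data, so order(gender, -address) fails on a plain data.frame (base R needs another route, such as order(..., decreasing = c(FALSE, TRUE), method = "radix"))

# The course uses the data.table package instead: convert the data to a data.table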

library(data.table)
studentDT = as.data.table(student)

# str(studentDT) shows that the object carries both classes, data.table and data.frame.

studentDT[ order(gender, -address) , ]

# Columns can be referenced directly by name

# This only prints the sorted result to the console; studentDT itself is unchanged

setorder(studentDT, gender, -address)

# setorder( ) sorts the data in place, so the sorted result is kept
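For comparison, here is a sketch of the same mixed-direction sort on character columns in base R, assuming R 3.3.0 or later, where order() accepts one decreasing flag per key when method = "radix". The data values are made up, and ASCII city names are used because radix ordering compares strings in the C locale.

```r
# Sketch: gender ascending, address descending, base R only.
student <- data.frame(gender  = c("M", "F", "M", "F"),
                      address = c("Busan", "Seoul", "Suwon", "Incheon"),
                      stringsAsFactors = FALSE)  # hypothetical data

# One decreasing flag per sort key; this requires method = "radix"
idx <- order(student$gender, student$address,
             decreasing = c(FALSE, TRUE), method = "radix")

sorted <- student[idx, ]
```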

※ Reference: data.frame vs data.table

http://using.tistory.com/81

# Check the data.table cheat sheet

# Study setkey(), J(), fread()

9. Combining data

(1) rbind(data1, data2, ... ) => stack data on top of each other

(2) merge(data1, data2, by=, all=, all.x=, all.y=)

> df4 = data.frame(id  = c(1, 2, 4, 7),
+                  age = c(10, 20, 40, 70))
> df5 = data.frame(id = c(1, 2, 3, 6, 10),
+                  gender = c("M", "M", "F", "M", "F"))
> df4;df5
  id age
1  1  10
2  2  20
3  4  40
4  7  70
  id gender
1  1      M
2  2      M
3  3      F
4  6      M
5 10      F

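# Four ways to merge

① inner join (intersection)

# merge joins exactly two data sets; three or more need a different approach

# by is the PK (Primary Key) => a value that uniquely identifies each row

# PK examples: person - resident registration number, employee - employee number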

> merge(df4, df5, by="id")
  id age gender
1  1  10      M
2  2  20      M

# outer join

② full join (union) - all=TRUE

> merge(df4, df5, by="id", all=TRUE)
  id age gender
1  1  10      M
2  2  20      M
3  3  NA      F
4  4  40   <NA>
5  6  NA      M
6  7  70   <NA>
7 10  NA      F

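# When printing a data.frame, R shows a missing numeric value as NA and a missing character or factor value as <NA>, so the two can be told apart.

③ left join - all.x=TRUE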

> merge(df4, df5, by="id", all.x=TRUE)
  id age gender
1  1  10      M
2  2  20      M
3  4  40   <NA>
4  7  70   <NA>

④ right join - all.y=TRUE

> merge(df4, df5, by="id", all.y=TRUE)
  id age gender
1  1  10      M
2  2  20      M
3  3  NA      F
4  6  NA      M
5 10  NA      F
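The four joins can be compared by their row counts, and three or more data frames can be joined by applying merge() repeatedly. The sketch below uses Reduce() for that, which is one possible approach in base R and not necessarily the one the course had in mind; `df6` and its `city` column are hypothetical.

```r
# Sketch: row counts of the four joins, plus a three-way inner join.
df4 <- data.frame(id = c(1, 2, 4, 7), age = c(10, 20, 40, 70))
df5 <- data.frame(id = c(1, 2, 3, 6, 10),
                  gender = c("M", "M", "F", "M", "F"))

n_inner <- nrow(merge(df4, df5, by = "id"))                # ids in both
n_full  <- nrow(merge(df4, df5, by = "id", all = TRUE))    # ids in either
n_left  <- nrow(merge(df4, df5, by = "id", all.x = TRUE))  # all ids of df4
n_right <- nrow(merge(df4, df5, by = "id", all.y = TRUE))  # all ids of df5

# Hypothetical third table; Reduce() merges the list pairwise, left to right
df6 <- data.frame(id = c(1, 2, 7), city = c("Seoul", "Busan", "Suwon"))
three_way <- Reduce(function(a, b) merge(a, b, by = "id"),
                    list(df4, df5, df6))
```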

10. Saving R data

(1) Saving to an external file

write.csv(student,
          file      = "fs/data/student.csv",
          row.names = FALSE)

# With row.names = FALSE, the row names are not written to the file

(2) Saving as R data

save(r_data, file = "file_location/file_name.RData")

(3) Loading R data

load(file = "file_location/file_name.RData")
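A round-trip sketch of both storage formats, written to temporary files so that it does not depend on the course's folder layout. The data values and path variables are made up for the example.

```r
# Sketch: write and read back a CSV file and an .RData file.
student <- data.frame(id = 1:3,
                      height = c(170.5, 180, 165.2))  # hypothetical data

csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

# CSV: plain text, readable by other programs
write.csv(student, file = csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# RData: stores the R object together with its name
save(student, file = rdata_path)
rm(student)                              # remove it to show that load() restores it
loaded_names <- load(file = rdata_path)  # returns the names of the restored objects
```

Note that load() recreates the object under its original name, so nothing needs to be assigned; the return value only lists what was restored.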


[Fast campus] 06. Data Handling 1

2017. 7. 20. 19:30

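[Fast campus] Data Science with R, 1st cohort - Instructor 이부일

These are my notes for reviewing and organizing what I learned while studying R.

Data Handling = Data Pre-processing

## Reading the data to use

# External data read into R is stored as a data.frame (readxl returns a tibble, which is also a data.frame).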

> student = readxl::read_excel(path = "fs/data/student.xlsx",
+                              sheet = "data",
+                              col_names = TRUE)
> student
# A tibble: 12 x 8
      id gender   age height weight  address
   <dbl>  <chr> <dbl>  <dbl>  <dbl>    <chr>
 1     1   남자    29    188     80   구파발
 2     2   남자    28    185     63     강동
 3     3   남자    25    172     63 압구정동
 4     4   여자    24    160     48     수원
 5     5   남자    26    180     80     용인
 6     6   남자    26    188     77     성수
 7     7   남자    59    170     63     수원
 8     8   여자    25    160     45     분당
 9     9   남자    27    178     78     강동
10    10   남자    31    181     95     의왕
11    11   여자    27    167     58     길동
12    12   남자    40    175     70     부산
# ... with 2 more variables: major <chr>,
#   company <chr>

1. Viewing the whole data set

(1) View(data)

View(student)

# Shows the data in a separate pop-up window

(2) data: prints to the Console

student

2. Viewing the data structure

(1) str(data)

str(student)

> str(student)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    12 obs. of  8 variables:
 $ id     : num  1 2 3 4 5 6 7 8 9 10 ...
 $ gender : chr  "남자" "남자" "남자" "여자" ...
 $ age    : num  29 28 25 24 26 26 59 25 27 31 ...
 $ height : num  188 185 172 160 180 188 170 160 178 181 ...
 $ weight : num  80 63 63 48 80 77 63 45 78 95 ...
 $ address: chr  "구파발" "강동" "압구정동" "수원" ...
 $ major  : chr  "경영정보학" "산업경영학" "경영학" "수학" ...
 $ company: chr  "NC소프트" "구글" "아마존" "SK" ...

# To see only the id of student => str(student$id)

3. Viewing part of the data

(1) head(data)

# Shows the first 6 rows

> head(student)
# A tibble: 6 x 8
     id gender   age height weight  address
  <dbl>  <chr> <dbl>  <dbl>  <dbl>    <chr>
1     1   남자    29    188     80   구파발
2     2   남자    28    185     63     강동
3     3   남자    25    172     63 압구정동
4     4   여자    24    160     48     수원
5     5   남자    26    180     80     용인
6     6   남자    26    188     77     성수
# ... with 2 more variables: major <chr>,
#   company <chr>

# head(student, n = 3)

# The number of rows to print can be specified

(2) tail(data)

# Shows the last 6 rows

# Same output format as head

# tail(student)

# tail(student, n = 3)

4. Data frame attributes

(1) Number of rows: nrow(data)

> nrow(student)
[1] 12

(2) Number of columns => number of variables: ncol(data)

> ncol(student)
[1] 8

(3) Row names: rownames(data)

# The result is character

> rownames(student)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8" 
 [9] "9"  "10" "11" "12"

(4) Column names => variable names: colnames(data)

# The result is character

> colnames(student)
[1] "id"      "gender"  "age"     "height" 
[5] "weight"  "address" "major"   "company"

(5) Dimensions: rows, columns: dim(data)

> dim(student)
[1] 12  8

> dim(student)[1] 
[1] 12

> dim(student)[2] 
[1] 8

(6) Dimension names: row names, column names: dimnames(data)

> dimnames(student)
[[1]]
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8" 
 [9] "9"  "10" "11" "12"

[[2]]
[1] "id"      "gender"  "age"     "height" 
[5] "weight"  "address" "major"   "company"

> dimnames(student)[1] # list
[[1]]
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8" 
 [9] "9"  "10" "11" "12"

> dimnames(student)[[1]] # vector
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8" 
 [9] "9"  "10" "11" "12"

> dimnames(student)[[1]][3] # 3rd value of the vector dimnames(student)[[1]]
[1] "3"

5. Slicing a data.frame

# data[row index, column index] => a data.frame has a 2-D structure of rows and columns

# Vectorization applies: the operation runs over every element without a for loop

(1) Columns

# data[ , index]

> student[ , 2]
# A tibble: 12 x 1
   gender
    <chr>
 1   남자
 2   남자
 3   남자
 4   여자
 5   남자
 6   남자
 7   남자
 8   여자
 9   남자
10   남자
11   여자
12   남자

Exercise. Get the even-numbered columns

student[ , seq(from = 2,
               to   = ncol(student),
               by   = 2)]

# For a vector, use length

# For a data.frame, use nrow, ncol

Exercise. Get the weight and height data

student[ , c("weight", "height")]

# The index is a character vector of column names, so combine the names with c( )

# Extracting variables whose names follow a certain pattern

# grep("pattern", strings)

# Positions of the variable names that contain the letter 'e'
grep("e", 
     colnames(student))

# Variable names that contain the letter 'e'
grep("e", 
     colnames(student), 
     value=TRUE) 

# Extract the columns whose names contain the letter 'e'
student[ , grep("e", 
                colnames(student), 
                value=TRUE)]

# Extract the columns whose names start with 'a' => ^a
student[ , grep("^a", 
                colnames(student), 
                value=TRUE)]

# Extract the columns whose names end with 't' => t$
student[ , grep("t$", 
                colnames(student), 
                value=TRUE)]

# Extract the columns whose names end with 't' or start with 'a'
student[ , grep("t$|^a", 
                colnames(student), 
                value=TRUE)]

# Learn regular expressions

# https://wikidocs.net/1669

# http://www.nextree.co.kr/p4327/
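The patterns above can be tried directly on the column names without loading the Excel file. The sketch below copies the eight names shown by colnames(student) into a plain vector; `cols`, `pos`, `hits`, and `flag` are made-up variable names.

```r
# Sketch: grep() and grepl() on the student column names.
cols <- c("id", "gender", "age", "height",
          "weight", "address", "major", "company")

pos  <- grep("e", cols)                # positions of names containing "e"
hits <- grep("e", cols, value = TRUE)  # the matching names themselves

# grepl() returns one TRUE/FALSE per name, handy for logical indexing
flag       <- grepl("t$|^a", cols)     # ends with "t" OR starts with "a"
t_or_a_col <- cols[flag]
```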

(2) Rows

# Get only the rows where gender is 여자 (female)
student.female = student[student$gender == "여자", ]

# Rows for people whose address is not 수원
student[student$address != "수원", ]

# Rows for people who weigh 50 or less
student[student$weight <= 50, ]

# People aged 30 or over AND 175 or taller
student[(student$age >= 30) & (student$height >= 175), ]

# People aged 30 or over OR 175 or taller
student[(student$age >= 30) | (student$height >= 175), ]
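One detail worth knowing about logical row filters: if a condition evaluates to NA, the data frame returns an extra row filled with NA. A minimal sketch with made-up data shows the difference between a raw logical index and the same index wrapped in which().

```r
# Sketch: filtering rows when a column contains NA.
student <- data.frame(age    = c(29, 31, NA, 40),
                      height = c(188, 181, 180, 172))  # hypothetical data

cond <- (student$age >= 30) & (student$height >= 175)  # FALSE TRUE NA FALSE

by_logical <- student[cond, ]         # keeps the match plus one all-NA row
by_which   <- student[which(cond), ]  # keeps only rows where cond is TRUE
```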

(3) Rows and columns

# Among people who are 170cm or taller and weigh 60kg or more,
# get the columns whose names contain the letter 'e'
student[(student$height >= 170) & (student$weight >= 60), 
        grep("e", colnames(student))]


[Fast campus] 05. Reading External Data

2017. 7. 19. 22:53

[Fast campus] Data Science with R, 1st cohort - Instructor 이부일

These are my notes for reviewing and organizing what I learned while studying R.

External data: txt, csv, excel (xls, xlsx)

1. Text data: txt

(1) Separator: a single space (blank, white space)

data_name = read.table(file   = "file_location/file_name.txt",
                       header = TRUE,
                       sep    = " ")

(2) Separator: comma (,)

data_name = read.table(file   = "file_location/file_name.txt",
                       header = TRUE,
                       sep    = ",")

(3) Separator: tab

data_name = read.table(file   = "file_location/file_name.txt",
                       header = TRUE,
                       sep    = "\t")
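The effect of sep can be tried without any files, because read.table() also accepts the data as a string through its text argument. The three strings below are made-up sample data holding the same table with different separators.

```r
# Sketch: the same table read with three different separators.
space_txt <- "id name score\n1 kim 90\n2 lee 85"
comma_txt <- "id,name,score\n1,kim,90\n2,lee,85"
tab_txt   <- "id\tname\tscore\n1\tkim\t90\n2\tlee\t85"

# text = supplies the content directly instead of a file path
by_space <- read.table(text = space_txt, header = TRUE, sep = " ")
by_comma <- read.table(text = comma_txt, header = TRUE, sep = ",")
by_tab   <- read.table(text = tab_txt,   header = TRUE, sep = "\t")
```

If sep does not match the file, the whole line usually ends up in a single column, so checking ncol() right after reading is a quick sanity test.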

2. CSV (Comma-Separated Values)

# A plain-text table format that Excel can open and save

data_name = read.csv(file   = "file_location/file_name.csv",
                     header = TRUE)

3. Excel: xls, xlsx

# Base R cannot read these files

# Install an add-on (a package) - install.packages("package_name")

# Load the package - library(package_name)

> install.packages("readxl")
package ‘readxl’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\JungChul\AppData\Local\Temp\RtmpG8Gy4u\downloaded_packages
> library(readxl)

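# The installation is complete when the successfully message appears after the install command

# Always load packages at the very top of the code (also note the install.packages command there, since the code may be run on another PC)

# When using a function from a package, write it as 'package_name::function'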

data_name = readxl::read_excel(path      = "file_location/file_name.xlsx",
                               sheet     = 1,  # sheet index or "sheet name"
                               col_names = TRUE)


[Fast campus] 04. Types of Data - Array, Data Frame (Data.Frame), List

2017. 7. 19. 22:46

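[Fast campus] Data Science with R, 1st cohort - Instructor 이부일

These are my notes for reviewing and organizing what I learned while studying R.

IV. Array

# Multidimensional

# An extension of vectors and matrices

# The vector properties (Recycling Rule, Vectorization) apply unchanged

# array(vector, dim=)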

> array(1:10, dim=10)
 [1]  1  2  3  4  5  6  7  8  9 10

# One number in dim

# The result is one-dimensional, like a vector

> array(1:10, dim=3:4)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8    1
[3,]    3    6    9    2

# dim is 3, 4 => two numbers

# The result is two-dimensional, like a matrix

> array(1:10, dim=c(3,4,2))
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8    1
[3,]    3    6    9    2

, , 2

     [,1] [,2] [,3] [,4]
[1,]    3    6    9    2
[2,]    4    7   10    3
[3,]    5    8    1    4

# dim is 3, 4, 2 => three numbers

# 3 rows, 4 columns, 2 layers => three-dimensional
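Indexing an array follows the same order as dim: row, column, layer. The sketch below uses 1:24 so that every cell holds a distinct value, and then shows the recycling rule on a 3 x 4 array built from only ten values. All variable names are made up for the example.

```r
# Sketch: indexing a 3-D array and observing recycling.
a <- array(1:24, dim = c(3, 4, 2))

one_value <- a[2, 3, 1]  # row 2, column 3, layer 1
layer2    <- a[, , 2]    # the whole second layer, a 3 x 4 matrix

# Only 10 values for 12 cells: the vector is reused from the start
r <- array(1:10, dim = c(3, 4))
recycled <- r[2, 4]      # 11th cell, filled with the 1st value again
```

Values fill the array column by column, which is why cell [2, 3, 1] is the 8th value.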

V. Data Frame (Data.Frame)

# Made of rows and columns: two-dimensional

# Can hold several data types

# However, each column holds only one data type

# data.frame(vector1, vector2, matrix, ...)

> id = 1:5
> gender = c("m", "m", "m", "f", "m")
> address = c("구파발", "강동", "압구정", "수원", "용인")
> survey = data.frame(id, gender, address)
> survey
  id gender address
1  1      m  구파발
2  2      m    강동
3  3      m  압구정
4  4      f    수원
5  5      m    용인

VI. List

# Often used to store the results of an analysis

# The most flexible data structure

# A list can hold vector, factor, matrix, array, data.frame, and list elements.

# list(vector, factor, matrix, array, data.frame, list, ...)

> v1 = 1:10
> v2 = 1:3
> v3 = c("ch", "nu", "lo")
> v4 = c(TRUE, FALSE)
> m1 = cbind(v1, v2)
> result = list(v1, v2, v3, v4, m1, survey)
> result
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[2]]
[1] 1 2 3

[[3]]
[1] "ch" "nu" "lo"

[[4]]
[1]  TRUE FALSE

[[5]]
      v1 v2
 [1,]  1  1
 [2,]  2  2
 [3,]  3  3
 [4,]  4  1
 [5,]  5  2
 [6,]  6  3
 [7,]  7  1
 [8,]  8  2
 [9,]  9  3
[10,] 10  1

[[6]]
  id gender address
1  1      m  구파발
2  2      m    강동
3  3      m  압구정
4  4      f    수원
5  5      m    용인

Slicing a list

# A list is sliced with [ ] or [[ ]]

> a1 = result[1]
> a2 = result[[1]]
> a1
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10

> a2
 [1]  1  2  3  4  5  6  7  8  9 10

> b1 = result[4]
> b2 = result[[4]]
> b1
[[1]]
[1]  TRUE FALSE

> b2
[1]  TRUE FALSE

# Single brackets: the result is always a list

# Double brackets: the result is the element itself, so it can be numeric, character, logical, list, and so on.
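The difference between the two bracket forms can be checked with is.list() and friends. The sketch below uses a small named list; the element names `nums`, `flags`, and `words` are made up, and naming the elements also allows access with $.

```r
# Sketch: [ ] keeps the list wrapper, [[ ]] extracts the element.
result <- list(nums  = 1:10,
               flags = c(TRUE, FALSE),
               words = c("ch", "nu", "lo"))

sub_list <- result[1]     # a list of length 1
element  <- result[[1]]   # the integer vector itself
by_name  <- result$flags  # same as result[["flags"]]

# [ ] can select several elements at once; [[ ]] selects exactly one
two_elements <- result[c("nums", "words")]
```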
