
You are currently reading a thread in /sci/ - Science & Math

File: r-project-logo[1].jpg (23 KB, 300x300)
I need a basic answer to a question about R; I'm just starting out.

Let's say I have a vector of integers called x. Normally, if I want to return certain elements, I can specify x[some range of elements]. What is going on if I try to return x[NA]?

Is this question better suited for /g/?
>>
>>7800716
Are you trying to find NaNs in a vector?
>>
It returns NA over and over again, once for each element in the vector. I know you can use is.na to find NAs. I'm just trying to understand the output of x[NA].
>>
>>7800732
>>7800754
>>
>>7800716
OP, I think you are asking:
> I used NA as a positional index for a numeric vector: x[NA].
> This returns a vector of the same length whose elements are all NA.
> What is R doing?

TL;DR
>(1) NA represents a missing value in a data set.
>(2) R honors this by returning NA for (almost) every operation involving missing values.
>(3) A bare NA is a logical constant.
>(4) A logical index is recycled across every element of the vector, and each NA position returns NA, consistent with (2).
>>
>>7800732
>>7801899 Also, check 'em

Let me walk you through it with examples:
> x <- c(5:9)
> x
[1] 5 6 7 8 9

As expected: the first element, then the third through fifth elements.
> x[1]
[1] 5
> x[3:5]
[1] 7 8 9

But for the 'NAth' position we get this:
> x[NA]
[1] NA NA NA NA NA

This is what's happening:
When you index a vector with numeric values, R treats them as *positions*.
R returns the 1st element, the 3rd to 5th, the last, etc.

However, NA is a *logical* constant meaning "Not Available", i.e. a missing value, *NOT* to be confused with NaN (Not a Number) or others.
With a logical index, R checks the index at every position of the vector and returns the elements where it applies.
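A small check of the numeric-versus-logical distinction, assuming the same x <- 5:9 as above. NA_integer_ (the integer-typed missing value documented on the NA help page) is used here as a numeric index for contrast:

```r
x <- 5:9

# Logical NA: recycled over all 5 positions, so 5 NAs come back
logical_na <- x[NA]

# Integer NA: one (unknown) position, so only 1 NA comes back
integer_na <- x[NA_integer_]

length(logical_na)   # 5
length(integer_na)   # 1
```

The only difference between the two calls is the type of the missing value used as the index.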

> is(x)
[1] "integer" "numeric" "vector" "data.frameRowLabels"
> is(3)
[1] "numeric" "vector"

> is(NA)
[1] "logical" "vector"

.
.
.
Now, let's see how R behaves when a logical value is used as an index:
> is(TRUE)
[1] "logical" "vector"
> x[TRUE]
[1] 5 6 7 8 9

> is(x > 6)
[1] "logical" "vector"
> x[x > 6]
[1] 7 8 9

In both cases, R is
1) checking the index value at every position (a single TRUE is recycled to the length of x)
2) returning the elements where the index is TRUE
3) dropping the elements where it is FALSE [in the first case the index is all TRUE, so no element is dropped]
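The recycling in step 1 is easier to see with an index shorter than x but longer than one element. A minimal sketch using the same x:

```r
x <- 5:9

# A length-2 logical index is recycled to TRUE FALSE TRUE FALSE TRUE
every_other <- x[c(TRUE, FALSE)]
every_other   # 5 7 9

# x > 6 yields one logical per element: FALSE FALSE TRUE TRUE TRUE
x > 6
x[x > 6]      # 7 8 9
```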
>>
>>7800716
>>7801899
>>7801901

Now: numerical and logical operations involving NA will (almost always) result in NA.
This is because R refuses to give you any answer other than "that data point is missing."
> 42 == NA
[1] NA
> 42 > NA
[1] NA
> 42 + NA
[1] NA
> NA == NA
[1] NA
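The "(almost always)" matters: when the result does not depend on what the missing value would have been, R returns a definite answer. A few cases that follow from base R's documented arithmetic and logic rules:

```r
# Anything to the power 0 is 1, whatever the missing value is
NA^0            # 1
# TRUE OR anything is TRUE
NA | TRUE       # TRUE
# FALSE AND anything is FALSE
NA & FALSE      # FALSE
# Here the answer depends on the missing value, so it stays NA
NA | FALSE      # NA
```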

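Therefore, with x[NA], the single logical NA index is recycled to every position of x, and every position returns NA.
Notice that an NA result is *neither* TRUE *nor* FALSE: it is a third logical value, and isTRUE() returns FALSE for it simply because it is not TRUE. That is also why a logical NA index does not drop an element the way FALSE would; it keeps the slot and fills it with NA.
Because that data point is missing, R won't sweep it under the rug; it keeps the NA to signal that you carried out an operation on missing values.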

> NA == NA
[1] NA
> isTRUE(42 == NA)
[1] FALSE
> isTRUE(NA == NA)
[1] FALSE

> x[1] == NA
[1] NA
> x == NA
[1] NA NA NA NA NA
> x[x == NA]
[1] NA NA NA NA NA
> x[NA]
[1] NA NA NA NA NA
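A sketch contrasting FALSE and NA inside a logical index, and the usual ways to locate missing values instead of comparing with == NA. The vector y is a made-up example; isFALSE() requires R 3.5.0 or later:

```r
x <- 5:9

# FALSE drops an element, but NA keeps the slot and fills it with NA
x[c(TRUE, NA, FALSE, TRUE, TRUE)]   # 5 NA 8 9

# y == NA can never find missing values; is.na() is the test to use
y <- c(5L, NA, 7L)
is.na(y)              # FALSE TRUE FALSE
y[!is.na(y)]          # 5 7

# which() returns the positions of TRUE values and skips NA
which(c(TRUE, NA, FALSE, TRUE))   # 1 4

# NA is neither TRUE nor FALSE
isTRUE(NA)            # FALSE
isFALSE(NA)           # FALSE
```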

Hope this helps!

References:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/NA.html
https://stat.ethz.ch/R-manual/R-devel/library/base/html/which.html
https://cran.r-project.org/doc/manuals/r-release/R-lang.html#NA-handling
>>
Probably a better, more insightful question is why the NA error code appears for each entry when it is set as the index whilst NaN or Inf only return one entry.

My guess is that NA is one kind of error code whilst the others are something else.
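One way to see the difference in the question above: NaN and NA_integer_ are *numeric* values, so as an index each one names a single (unknown) position and returns a single NA, whereas a bare NA is logical and gets recycled. A minimal sketch with x <- 5:9:

```r
x <- 5:9

x[NA]            # logical index, recycled: 5 NAs
x[NaN]           # numeric index: a single NA
x[NA_integer_]   # numeric index: a single NA
```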
>>
>>7801912
>NA error code
NA is *NOT* an error code, and neither is NaN
https://stat.ethz.ch/R-manual/R-devel/library/base/html/NA.html
>>
>>7801926
They can't be computed, so what are they then?
>>
>>7801936
Users can program around it at their discretion.

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/na.fail.html
http://www.inside-r.org/r-doc/stats/na.fail
http://www.ats.ucla.edu/stat/r/faq/missing.htm
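A minimal sketch of the tools those pages describe, on a hypothetical vector y: na.rm drops NAs for a single computation, na.omit() removes them, and na.fail() refuses to continue:

```r
y <- c(2, NA, 4)

sum(y)                  # NA: the missing value propagates
sum(y, na.rm = TRUE)    # 6: NAs ignored for this call only
mean(y, na.rm = TRUE)   # 3

# na.omit() returns y without its NAs; as.vector() strips the na.action attribute
as.vector(na.omit(y))   # 2 4

# na.fail() stops with an error if any NA is present
result <- tryCatch(na.fail(y), error = function(e) "error: missing values")
result
```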

It's not an error. This default behaviour is optimal because no data scientist would want a piece of software blindly deciding what to do with missing values.
> How would you have it otherwise?

Remember that the various NAs matter in large, real-world data sets.
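The "various NAs" include the typed missing-value constants listed on the NA help page linked earlier. A quick check of their types:

```r
typeof(NA)              # "logical"
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_character_)   # "character"

# c() coerces a bare logical NA to the type of its neighbours
typeof(c(1.5, NA))      # "double"
```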

OP's question comes more from curiosity: he expected a single value from vector[index], but R returns the vector above.
>>
>>7801956
>users can program around it

You can't do any operation on it without getting rid of it. The simplest solution is to turn it into a character, then gsub it away to zero. That puts NA in the same class as Inf and NaN, which are error terms.
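For comparison, a sketch of replacing NAs with zero on a hypothetical vector y. One caveat worth checking for the character route: as.character(NA) is NA_character_, not the string "NA", and gsub() returns NA for NA input, so the missing value survives the round trip. Indexing with is.na() replaces it directly:

```r
y <- c(1, NA, 3)

# Character round trip: the NA is still NA afterwards
via_gsub <- as.numeric(gsub("NA", "0", as.character(y)))
via_gsub                # 1 NA 3

# Direct replacement using a logical index from is.na()
y_zero <- y
y_zero[is.na(y_zero)] <- 0
y_zero                  # 1 0 3
```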

>data scientist

Yuck. That meme field full of overqualified idiots. Take that title, it's fitting for your knowledge of your workhorse.

>more out of curiosity
>scalar value

My question was out of curiosity. Something that is gone in both CS and Statistics with the latest crop of you mouth breathers getting into the field because it supposedly promises 'big money'.

Sadly, the field of computational statistics will be destroyed because Hadley and the others opened it up to the trash coming through the universities today with their easy-to-learn packages and the no-brainer blogs that allow lazy, stupid academics to structure courses.

It's the story of everything though, open it up to the masses, have the field depreciated due to stupidity.