I tried making the following histogram in R (randomly select 10% of all rows and color them red):

a = rnorm(100000,60000,1000)
b = a

c = data.frame(a,b)
color <- c("black", "red")     
color_1 <- sample(color, nrow(c), replace=TRUE, prob=c(0.9, 0.1))
c$color_1 = as.factor(color_1)


hist(c$a, col = c$color_1, 100000, main = "title")

legend("topleft", legend=c("group a", "group b"),
       col=c("red", "black"), lty = 1, cex=0.8)
title(
      sub = "some title")

Problem: for some reason, the colors are not showing up.

I tried to see if other commands might get the colors to show up:

hist(c$a, col = color_1, 100000, main = "title")

I also tried keeping the color variable as a character vector instead of converting it to a factor:

a = rnorm(100000,60000,1000)
b = a

c = data.frame(a,b)
color <- c("black", "red")     
color_1 <- sample(color, nrow(c), replace=TRUE, prob=c(0.9, 0.1))
c$color_1 = color_1


hist(c$a, col = c$color_1, 100000, main = "title")

legend("topleft", legend=c("group a", "group b"),
       col=c("red", "black"), lty = 1, cex=0.8)
title(
      sub = "some title")

I also tried to follow the advice from this question (Partially color histogram in R):

h = hist(c$a, col = c$color_1, breaks = 100000, main = "title")

legend("topleft", legend=c("group a", "group b"),
       col=c("red", "black"), lty = 1, cex=0.8)
title(
      sub = "some title")



cuts <- cut(h$breaks, c(-Inf,Inf))
plot(h, col=cuts)

But this also did not work. I think this might be because I am not using the "cut" function correctly?

Can someone please show me how to fix this?

Thanks

  • Three points: 1) if you are selecting histogram bars at random, you don't have a cut point; 2) what is b meant for? It's not used in the rest of the code; 3) do you really want 100K bars for 100K data points? Commented Oct 10, 2021 at 5:42

1 Answer

Here is what I understand of the question:

  1. Plot a vector's histogram;
  2. 10% of the bars are randomly selected;
  3. And have a different color.
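
As background on why the original calls showed no colors: `hist()` applies `col` to the bars, not to the observations, so a vector with one color per row has no row-level meaning and only its first elements are used, one per bar. The sketch below illustrates the mismatch; the reduced sample size and the names `x`, `row_col` and `bar_col` are chosen here for illustration only.

```r
set.seed(2021)
x <- rnorm(1000, 60000, 1000)   # smaller sample, for illustration only

# One color per observation, as in the question
row_col <- sample(c("black", "red"), length(x), replace = TRUE, prob = c(0.9, 0.1))

# Compute the bins without drawing anything
h <- hist(x, breaks = "FD", plot = FALSE)

length(row_col)    # one entry per row
length(h$counts)   # one entry per bar, far fewer

# Only the first length(h$counts) colors would reach the bars
bar_col <- row_col[seq_along(h$counts)]
plot(h, col = bar_col, main = "title")
```

With `breaks = 100000` the bars are also extremely narrow, so the default black borders likely hide whatever fill color each bar received.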

First, recreate the example data set. The second vector b is never used, so it is dropped. The RNG seed is set to make the results reproducible.

set.seed(2021)
a <- rnorm(100000, 60000, 1000)
c <- data.frame(a)
color <- c("black", "red")     
n_colors <- length(color)

Now compute the histogram data without plotting it. Then draw one color index (an integer between 1 and n_colors) for each bar, that is, one per element of h$counts, and plot the histogram with those colors.

# Compute the bins without plotting
h <- hist(c$a, breaks = "FD", plot = FALSE)
# One color index per bar; each bar is red with probability 0.1
i_col <- sample(n_colors, length(h$counts), replace = TRUE, prob = c(0.9, 0.1))
plot(h, main = "title", col = color[i_col])

legend("topleft", legend = c("group a", "group b"),
       col = c("red", "black"), lty = 1, cex = 0.8)
title(sub = "some title")
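
Sampling the colors with `prob` picks each bar's color independently, so the share of red bars varies from run to run. If a fixed share is wanted, a minimal sketch for coloring exactly 10% of the bars (rounded up) could look like this; `n_bars`, `n_red`, `red_bars` and `bar_col` are names introduced here for illustration, not part of the code above.

```r
set.seed(2021)
a <- rnorm(100000, 60000, 1000)

# Compute the bins without drawing anything
h <- hist(a, breaks = "FD", plot = FALSE)
n_bars <- length(h$counts)

# Pick exactly 10% of the bars (rounded up) to highlight
n_red <- ceiling(0.1 * n_bars)
red_bars <- sample(n_bars, n_red)

# One fill color per bar, black by default
bar_col <- rep("black", n_bars)
bar_col[red_bars] <- "red"

plot(h, main = "title", col = bar_col)
legend("topleft", legend = c("selected", "other"),
       fill = c("red", "black"), cex = 0.8)
```

Here `fill` is used in the legend instead of `lty` so that the key shows filled boxes matching the bars.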

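If the goal is instead the one stated in the question, highlighting a random 10% of the rows rather than of the bars, one possible approach is to overlay a second histogram of the sampled rows on the same breaks. This is only a sketch under that reading of the question; `picked`, `h_all` and `h_sub` are illustrative names.

```r
set.seed(2021)
a <- rnorm(100000, 60000, 1000)

# Randomly flag about 10% of the rows
picked <- sample(c(FALSE, TRUE), length(a), replace = TRUE, prob = c(0.9, 0.1))

# Bin all rows, then bin the flagged rows on the same breaks
h_all <- hist(a, breaks = "FD", plot = FALSE)
h_sub <- hist(a[picked], breaks = h_all$breaks, plot = FALSE)

# Draw all rows in black, then the flagged rows on top in red
plot(h_all, col = "black", main = "title")
plot(h_sub, col = "red", add = TRUE)
legend("topleft", legend = c("sampled 10%", "all rows"),
       fill = c("red", "black"), cex = 0.8)
```

Because both histograms share the same breaks, the red part of each bar shows how many of that bin's rows were sampled.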

  • Thank you for your answer! I was able to figure it out another way. Would you like to see my answer? (After a few minutes, I can accept your answer.)
    – stats_noob
    Commented Oct 10, 2021 at 5:52
  • (can you take a look at this question if you have time later: stackoverflow.com/questions/69512664/… thank you!)
    – stats_noob
    Commented Oct 10, 2021 at 5:56
