mli
1
Why do we use average pooling rather than max pooling in the transition layer?
Hi @smizerex, great question! Max pooling was shown to perform better than average pooling in the AlexNet paper. Still, this is an architecture design choice, and you are not limited to it.
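To make the difference concrete, here is a minimal NumPy sketch (not the book's code) of non-overlapping pooling that halves the spatial size, assuming a 2x2 window with stride 2 as in a typical transition layer. `pool2d` is a hypothetical helper. Average pooling keeps a blend of every activation in each window, while max pooling keeps only the strongest one.

```python
import numpy as np

def pool2d(x, size=2, mode="avg"):
    """Non-overlapping 2D pooling (stride == size) on an (H, W) array.

    Sketch only: H and W are assumed to be divisible by `size`.
    """
    h, w = x.shape
    # Split into (H/size, size, W/size, size) so each window is one block.
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "avg":
        return blocks.mean(axis=(1, 3))  # average of each window
    if mode == "max":
        return blocks.max(axis=(1, 3))   # strongest activation per window
    raise ValueError(f"unknown mode: {mode}")

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, mode="avg"))
print(pool2d(x, mode="max"))
```

Swapping `mode` is all it takes to try the alternative, which is why it is reasonable to treat the pooling type as a tunable design choice.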
wsuchy
4
The notation in (7.7.1) is misleading. It says:
For the point $x = 0$ it can be written as
$f(x) = f(0) + f'(0)x + \dots$
Therefore the $x$ on the RHS is confusing (as we've replaced it with 0).
The correct notation should use two variables, for example:
For the point $a$ it can be written as
$f(x) = f(a) + f'(a)(x - a) + \dots$
and setting $a = 0$ gives
$f(x) = f(0) + f'(0)x + \dots$
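A quick numerical check shows why the $(x - a)$ factor matters: the two forms agree only at $a = 0$. This is a sketch in Python with `exp` chosen arbitrarily as the test function; `taylor1` and `naive1` are hypothetical helpers.

```python
import math

def taylor1(f, df, a, x):
    """First-order Taylor approximation of f around a, evaluated at x."""
    return f(a) + df(a) * (x - a)

def naive1(f, df, a, x):
    """The form f(a) + f'(a) * x, which only matches the above when a == 0."""
    return f(a) + df(a) * x

f, df = math.exp, math.exp  # d/dx exp(x) = exp(x)
x = 1.05
for a in (0.0, 1.0):
    print(f"a={a}: exact={math.exp(x):.4f}, "
          f"with (x - a)={taylor1(f, df, a, x):.4f}, "
          f"with x={naive1(f, df, a, x):.4f}")
```

Around $a = 1$ the $(x - a)$ form stays close to $e^{1.05}$, while the other form is off by roughly $e$.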
JH.Lam
6
Similar to PageRank, the word 'contribution' is quite apt here; one could also say 'information'.
I think DenseNet is the network closest to the human brain among the CNNs covered so far.
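One way to see each earlier layer's 'contribution' is that DenseNet concatenates, rather than adds, every layer's output onto the input of all later layers. Below is a minimal NumPy sketch of that channel growth. `conv_block` is a hypothetical placeholder (a random 1x1 channel mixing followed by ReLU, not a real BN-ReLU-3x3 convolution), and `growth_rate` borrows the DenseNet paper's term for the number of channels each layer adds.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_block(x, out_channels):
    """Hypothetical stand-in for BN-ReLU-Conv: random 1x1 channel mixing + ReLU."""
    c = x.shape[0]
    w = rng.standard_normal((out_channels, c))
    # Mix channels at every spatial position: (out, c) @ (c, H*W) -> (out, H*W)
    y = w @ x.reshape(c, -1)
    return np.maximum(y, 0).reshape(out_channels, *x.shape[1:])

def dense_block(x, num_convs, growth_rate):
    """Each layer sees the input plus the outputs of all earlier layers."""
    for _ in range(num_convs):
        y = conv_block(x, growth_rate)
        x = np.concatenate([x, y], axis=0)  # layout is (C, H, W)
    return x

x = rng.standard_normal((3, 8, 8))
out = dense_block(x, num_convs=2, growth_rate=10)
print(out.shape)  # (23, 8, 8)
```

The first 3 channels of `out` are the untouched input, so every later layer can read it directly; the channel count grows as `3 + num_convs * growth_rate`.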
JH.Lam
7
@goldpiggy I found that this call takes most of the time, and it seems to copy data back to the CPU, right? But even so, the cost still seems unacceptable.