How does eilu veilu work out with an absolute truth? Moshe, English isn't my strong suit, but it sounds to me almost as if your question is based on the assumption that "absolute truth" is singular, that is, that absolute truth can represent only a single perspective. Personally, I've always understood "eilu v'eilu" to represent a prismatic array of absolute truth, analogous to how white light can be refracted into a colored spectrum through a prism.
Exponential Linear Units (ELU) vs $\log(1+e^x)$ as the activation . . . About ELU: ELU follows an exponential curve for all negative values, $y = \alpha (e^x - 1)$. It does not saturate over a moderate range of negative inputs, but it does saturate for larger negative values. See here for more information. Hence, $y = \log(1 + e^x)$ is not used because of early saturation for negative values and also nonlinearity for values > 0.
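To make the comparison concrete, here is a minimal sketch that evaluates both activations at a few points. The function names `elu` and `softplus` and the default `alpha=1.0` are assumptions chosen for illustration; they are not taken from the original answer.

```python
import math

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise (alpha=1.0 is an assumed default)."""
    return x if x > 0 else alpha * math.expm1(x)

def softplus(x):
    """Softplus: log(1 + exp(x)), rearranged so exp() never overflows for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Print both activations side by side for a few sample inputs.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  elu={elu(x):9.5f}  softplus={softplus(x):9.5f}")
```

For negative inputs, ELU approaches $-\alpha$ while $\log(1+e^x)$ approaches 0. For positive inputs, ELU is exactly linear, whereas $\log(1+e^x)$ only approaches $x$ asymptotically.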
Why do many boys begin learning Gemara with Elu Metzios? There is a popular custom for boys to start their Gemara studies with Elu Metzios (the 2nd Perek in Bava Metzia). The Gemara (Bava Basra 175b) does say that financial laws are conducive to becomin
Why is Yeush seemingly used in place of Hefker? (Bava Metzia) In Elu Metziot (Perek 2 of masechet Bava Metzia), it establishes that Maot mefuzarot is lo haveh yeush because of R' Yitzchak's baraisa of Mashmesh. My only problem is that the baraisa adds "sha'ah vesha'ah", which literally means "hour and hour", but really means "often".
Why does it speed up gradient descent if the function is smooth? The intuitive explanation goes as follows. In ELU, whenever x becomes small enough, the gradient becomes really small and saturates (in the same way it happens for Tanh and Sigmoid). The small gradient means that the learning algorithm can focus on the tuning of other weights without worrying about the interactivity with the saturated
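The saturation of the gradient described in this excerpt can be checked numerically. The following is a rough sketch, assuming `alpha=1.0` for ELU; the helper names `elu_grad`, `tanh_grad`, and `sigmoid_grad` are hypothetical and used only for illustration.

```python
import math

def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for x > 0, alpha * exp(x) otherwise (alpha=1.0 assumed)."""
    return 1.0 if x > 0 else alpha * math.exp(x)

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Gradients shrink toward zero as x becomes more negative.
for x in (-10.0, -5.0, -1.0, 0.0, 3.0):
    print(f"x={x:6.1f}  elu'={elu_grad(x):.2e}  "
          f"tanh'={tanh_grad(x):.2e}  sigmoid'={sigmoid_grad(x):.2e}")
```

All three derivatives shrink toward zero for strongly negative x. Unlike Tanh and Sigmoid, ELU keeps a gradient of 1 for positive x, so it saturates on one side only.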