How to handle non-determinism when training on a GPU? TL;DR: Non-determinism for a priori deterministic operations comes from concurrent (multi-threaded) implementations. Despite constant progress on that front, TensorFlow does not currently guarantee determinism for all of its operations. After a quick search on the internet, it seems that the situation is similar for the other major toolkits.
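A minimal NumPy sketch of why concurrency matters: floating-point addition is not associative, so a parallel reduction that accumulates partial sums in a different order can produce slightly different results. The array size and chunk count below are illustrative, not anything prescribed by TensorFlow:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

# Same numbers, different accumulation order: results can differ in the last bits,
# which is exactly what happens when GPU threads reduce in a nondeterministic order.
forward = float(np.sum(x))
chunked = float(sum(np.sum(c) for c in np.array_split(x, 7)))

print(forward, chunked)  # typically close, but not guaranteed bit-identical
```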
c# - The specified version string does not conform to the required . . . The maximum value for either of the parts is 65534, as you can read here. This is a limit imposed by the operating system, so it is not even specific to .NET. Windows puts the version numbers into two integers, which together form four unsigned shorts. Adding some metadata to it (for the * option, I guess) makes the maximum allowed value UInt16.MaxValue - 1 = 65534. (Thanks to Gary Walker for noticing.)
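The packing described above can be sketched in Python. This is a hypothetical illustration of the arithmetic, not the Windows API; the constant and function names are mine:

```python
US_MAX = 0xFFFF        # UInt16.MaxValue = 65535
MAX_PART = US_MAX - 1  # 65534: the top value is reserved for the '*' metadata

def pack_version(major: int, minor: int, build: int, revision: int) -> tuple[int, int]:
    """Pack four unsigned-short version parts into two 32-bit integers."""
    for p in (major, minor, build, revision):
        if not 0 <= p <= MAX_PART:
            raise ValueError(f"version part {p} exceeds {MAX_PART}")
    return (major << 16) | minor, (build << 16) | revision

print(pack_version(1, 2, 3, 4))  # (65538, 196612)
```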
Auto Versioning in Visual Studio 2017 (.NET Core) - Stack Overflow Since auto-increment would break determinism (same input > same output), it is disallowed in that mode. You can set <Deterministic>false</Deterministic> in the .csproj to use it (or use any other MSBuild logic to calculate <VersionPrefix>/<Version>).
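A minimal .csproj fragment sketching the setting described above; the 1.0.* wildcard value is illustrative:

```xml
<PropertyGroup>
  <!-- Disabling deterministic builds re-enables the wildcard auto-increment -->
  <Deterministic>false</Deterministic>
  <AssemblyVersion>1.0.*</AssemblyVersion>
</PropertyGroup>
```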
How to Get Reproducible Results (Keras, TensorFlow):
import tensorflow as tf
tf.keras.utils.set_random_seed(42)  # sets seeds for base Python, NumPy and TF
tf.config.experimental.enable_op_determinism()
Note, though, that this comes at a significant performance penalty.
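A framework-agnostic analogue of the seeding step (set_random_seed above covers base Python, NumPy and TF in one call; the helper name here is my own, and no TensorFlow install is needed to see the effect):

```python
import random
import numpy as np

def set_seeds(seed: int) -> None:
    """Seed the base-Python and NumPy generators so reruns draw the same numbers."""
    random.seed(seed)
    np.random.seed(seed)

set_seeds(42)
run1 = [random.random(), float(np.random.rand())]
set_seeds(42)
run2 = [random.random(), float(np.random.rand())]
assert run1 == run2  # identical draws after re-seeding
```

Note that seeding alone only fixes the stream of random numbers; it does not remove the op-level non-determinism that enable_op_determinism addresses.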
How can floating point calculations be made deterministic? The hardware isn't determined by the program, though. Therefore, "floating point operations" aren't deterministic. The combined system is possibly deterministic, but you could say the same of the universe in general. (Oh, scheduling of multiple threads is deterministic because we only use classical physics to implement the scheduler!) I'm still left wondering what the best way is to ensure . . .
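The effect is easy to reproduce even without threads or special hardware: IEEE-754 addition is not associative, so grouping alone changes the result. The classic 0.1/0.2/0.3 triple below is illustrative:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

print(left == right)  # False: same operands, different grouping, different bits
```

This is why any system that reorders additions (a parallel reduction, a different thread interleaving) can change the final bits while every individual operation remains perfectly deterministic.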
Replicating GPU environment across architectures - Stack Overflow Achieving bit-for-bit determinism across different GPU architectures is EXTREMELY hard, if not completely impossible. In my experience, training a model on an A100 vs. a V100, for example, with the same hyperparameters, seeds, etc., can and more often than not will yield different results.