Ok_Net_1674

A transformer is a deep learning architecture developed by Google, built on the multi-head attention mechanism proposed in the 2017 paper "Attention Is All You Need". Text is converted into numerical representations called tokens, and each token is mapped to a vector by looking it up in a word embedding table.
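The token-to-vector lookup described above can be sketched in a few lines of plain Python; the vocabulary, token ids, and vector values here are made up purely for illustration:

```python
# Toy vocabulary mapping words to token ids (made up for illustration).
vocab = {"attention": 0, "is": 1, "all": 2, "you": 3, "need": 4}

# Tiny embedding table: one 4-dimensional vector per token id.
embedding_table = [
    [0.1, 0.2, 0.3, 0.4],   # "attention"
    [0.5, 0.1, 0.0, 0.2],   # "is"
    [0.9, 0.8, 0.7, 0.6],   # "all"
    [0.3, 0.3, 0.3, 0.3],   # "you"
    [0.0, 1.0, 0.5, 0.2],   # "need"
]

def embed(text):
    """Convert text to token ids, then look up each id's vector."""
    ids = [vocab[word] for word in text.split()]
    return [embedding_table[i] for i in ids]

vectors = embed("attention is all you need")
print(len(vectors))   # 5 tokens -> 5 vectors
```

In a real transformer the table is a learned parameter matrix (e.g. PyTorch's `nn.Embedding`) rather than a hand-written list, but the lookup itself works the same way.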

it was just a little airborne

Of course a cache miss can stall. It takes hundreds of cycles to fetch from RAM, for example.

But I was thinking of branch mispredicts, since for 100 elements a cache miss is rather unlikely to occur. For large arrays, linear search also has an advantage in that regard, but it's outweighed by the much higher time complexity at that point.

A linear search on such a small array can be very fast in hardware: it fits the pipeline better (fewer pipeline stalls) and it's possible to parallelize with SIMD instructions. Binary search is much harder for the compiler to optimize.

I found a cool blog post that compared exactly this (link): the best binary search implementation (which is kinda involved) is just as fast as linear search for 100 elements.
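The two approaches being compared can be sketched as follows; this is pure Python, so the constant factors differ from compiled code and it only illustrates that both searches agree, not the hardware argument above:

```python
import bisect

data = list(range(0, 200, 2))   # 100 sorted even numbers: 0, 2, ..., 198

def linear_search(arr, target):
    """Scan left to right; early-exit once values exceed the target."""
    for i, v in enumerate(arr):
        if v >= target:
            return i if v == target else -1
    return -1

def binary_search(arr, target):
    """Halve the search range each step via the stdlib bisect module."""
    i = bisect.bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1

print(linear_search(data, 42), binary_search(data, 42))   # both find index 21
print(linear_search(data, 43), binary_search(data, 43))   # both report -1
```

The linear version runs the same simple compare-and-advance loop every iteration, which is what makes it friendly to branch predictors and SIMD in compiled languages; the binary version's data-dependent jumps are what make it harder to speed up on tiny inputs.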

Quinn you look like a lion MegaLUL

You should probably put it in sideways like the picture said, as it has that hard knob thing at the top which might mess with the machine...

No way this is only gonna be five bucks. They've spent billions on R&D for this that they will need to recoup.

They produce the same result, but the one at the top keeps "jumping around" in memory, resulting in significantly worse performance.

It's quite possible that a modern compiler would produce the same code for both, however.
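The code being discussed isn't shown here, but a common instance of the "jumping around in memory" pattern is traversing a 2D array column-by-column instead of row-by-row. Both loops below compute the same sum; in a compiled language the strided version tends to cause far more cache misses:

```python
N = 256
grid = [[r * N + c for c in range(N)] for r in range(N)]

def sum_row_major(g):
    total = 0
    for row in g:            # contiguous: walk each row in order
        for v in row:
            total += v
    return total

def sum_col_major(g):
    total = 0
    for c in range(N):       # strided: jump to a different row each read
        for r in range(N):
            total += g[r][c]
    return total

assert sum_row_major(grid) == sum_col_major(grid)   # same result either way
```

And indeed, for a loop this simple an optimizing compiler (in C/C++, not Python) may interchange the loops and emit identical code for both, which is the point made above.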

It's completely impossible to become an expert at Python from a single university course, at least with no prior experience in another language. Also, in my experience, there is a hard limit to how much one can learn in a given time frame; you can't just double the hours and expect twice the progress.

Why doesn't the EU just ban these idiotic more-plastic-than-content packages? Instead we get the nonsense with the bottle caps, which is well-intentioned but effectively accomplishes nothing.

Nobody said that, though?

Also, Electrocute and PTA give everyone a three-hit passive...

Proof: correlation implies causation


If I remember correctly, 0.5 can be an absolutely reasonable value, even for larger models. The PyTorch implementation of VGG, for example, also defaults to 0.5 dropout, and VGG is far from small: the largest variant has about 150M parameters.

As far as I can tell, they even used that value to create their pretrained ImageNet weights.
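For reference, here is a minimal pure-Python sketch of inverted dropout with the p = 0.5 value discussed above; the function and toy activations are made up for illustration (PyTorch's `nn.Dropout` does the equivalent on tensors):

```python
import random

def dropout(activations, p=0.5, training=True):
    """Zero each activation with probability p during training and
    scale survivors by 1 / (1 - p), so each unit's expected value
    is unchanged. At inference time, dropout is a no-op."""
    if not training:
        return list(activations)
    keep = 1.0 - p
    return [x / keep if random.random() < keep else 0.0
            for x in activations]

random.seed(0)
out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)
print(out)   # -> [0.0, 0.0, 6.0, 8.0]: two units dropped, survivors doubled
```

With p = 0.5 the scaling factor is exactly 2, so on average half the units are silenced and the rest pass through at double strength, which is why such a seemingly aggressive rate still trains well even in large models like VGG.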


The default dropout probability in PyTorch is 0.5, and it is also what the authors of the paper that invented it used.

What on earth does this comment mean??? I know what dropout is. But what the fuck are you saying? What half?

That guy's poor insurance company