This is a new function for me, so if I interpret it correctly, flatten is more like "no bottleneck".
Of course, a flatten always happens at the end of the encoder to make a long 1D vector. Normally this goes into a smaller dense bottleneck or a pooling bottleneck, but with flatten you just pass this enormous vector straight to the dense layers.
I see in most cases it explodes the parameter count.
Can somebody explain what the best use case is for using flatten as the bottleneck?
Edit 1: it seems like it is for the transformers.
Last edited by Ryzen1988 on Sat Aug 05, 2023 3:20 pm, edited 1 time in total.
Yes. This was added to effectively act as a "no bottleneck" option. It is only really relevant for the Visual Transformer encoder, but could conceivably be used for other encoders if they output a small enough tensor.
It essentially allows bypassing the bottleneck layer, which is particularly useful for the visual transformer but potentially applicable to other encoders if they produce compact enough tensors. This option passes the full flattened 1D vector directly to the dense layers, which noticeably increases the parameter count when the encoder output is large.
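To put rough numbers on the parameter count point, here is a small back-of-the-envelope sketch in plain Python. All shapes and the dense width are made-up illustration values (an 8x8x512 convolutional output, a 768-feature compact output, 1024 dense units), not the actual sizes used by any encoder, and the helper names are hypothetical.

```python
def dense_params(in_features, units):
    """Weights plus biases of one fully connected layer."""
    return in_features * units + units


def flattened_size(height, width, channels):
    """Length of the 1D vector produced by flattening a feature map."""
    return height * width * channels


# Hypothetical values, chosen only for illustration.
DENSE_UNITS = 1024
CONV_OUTPUT = (8, 8, 512)   # assumed convolutional encoder output (H, W, C)
COMPACT_FEATURES = 768      # assumed compact encoder output, already 1D

# Flatten as "no bottleneck": every activation feeds the dense layer.
flatten_params = dense_params(flattened_size(*CONV_OUTPUT), DENSE_UNITS)

# Pooling bottleneck: global average pooling keeps one value per channel.
pooled_params = dense_params(CONV_OUTPUT[2], DENSE_UNITS)

# Compact encoder output: flatten adds little because the tensor is small.
compact_params = dense_params(COMPACT_FEATURES, DENSE_UNITS)

print(f"flatten, 8x8x512 output: {flatten_params:,}")
print(f"pooling, 8x8x512 output: {pooled_params:,}")
print(f"flatten, 768 features:   {compact_params:,}")
```

With these assumed numbers, flattening the convolutional output costs roughly 64 times more parameters in the first dense layer than pooling it, while the compact output stays in the same range as the pooled case. That is the reason flatten only makes sense when the encoder already outputs a small tensor.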