The GPT-4 paper doesn't really give us any juicy details about the architecture or parameter count, right? But as someone who's into ML research, I can't help but wonder how they managed to bump the context window up to a whopping 32k tokens.
For the kind of work I'm into, a 4k or 8k token limit just doesn't cut it. I've noticed that open-source projects are all about matching the parameter counts and output quality of those closed models, but they're totally missing the big picture: the context window! As far as I can tell, not a single open-source model out there has a context window over 2k tokens.
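Part of why long contexts are rare: vanilla self-attention materializes a seq_len × seq_len score matrix per head, so memory grows quadratically with context length. A rough back-of-the-envelope sketch (the head count and fp16 scores here are my own assumptions, not anything from the GPT-4 paper):

```python
def attn_score_memory_gb(seq_len, n_heads=96, bytes_per_el=2):
    # Naive attention stores one (seq_len x seq_len) score matrix
    # per head for a single layer's forward pass.
    # n_heads=96 and fp16 (2 bytes) are illustrative assumptions.
    return seq_len * seq_len * n_heads * bytes_per_el / 1e9

for n in (2_048, 8_192, 32_768):
    print(f"{n:>6} tokens -> ~{attn_score_memory_gb(n):.2f} GB of attention scores")
```

Going from 2k to 32k is 16× the length but 256× the score-matrix memory, which is why naive scaling doesn't get you there and people reach for memory-efficient attention kernels instead.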