Alibaba came out with another powerful large vision model. It seems better than anything else out there across the board.
I tried its end-to-end audio module online and it is really good. We need audio models this good on the market. If they can ever get the size down a bit so you can just run it on a humanoid robot, it would be deadly. Right now, 30B is imo still a little large. 7B parameters is the ideal size for edge devices.
Somehow, its OCR is better than Google's OCR, which is just mind-blowing.
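
On the 30B vs 7B point, here is a rough back-of-the-envelope sketch of why size matters for edge devices. It only counts the memory needed to hold the weights (parameters times bytes per parameter) and ignores activations, caches, and runtime overhead. The helper name `weight_memory_gb` and the 16/8/4-bit precisions are my own assumptions for illustration, not anything published for this model.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


# Compare the two sizes mentioned above at a few assumed precisions.
for size in (30, 7):
    for bits in (16, 8, 4):
        print(f"{size}B @ {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

Even with aggressive quantization, a 30B model needs several times the memory of a 7B one just for the weights, which is the gap that matters on a robot's onboard hardware.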