Love, peace, and cat
Object 320 is about to enter beta. It will go beyond open-set and likely beyond OCR. Stay tuned.
Object 320 is finally starting to converge; it took me a while to hunt down the bugs...
No promises, but I will try to fit Object 320 training on a single T4.
The VRAM footprint of Object 320 is kinda wild at full scale in training mode; testing mode will be light as always.
LSCT has finally been 50% modernized, hurrah!
A lot has happened behind the scenes, but it should be worth the long journey.
Object 32x will come with a much more flexible framework (and LSCT has been modernized), stay tuned :)
The live demo GUI for our ICDAR 2025 paper (Object 310) will be updated soon.
Object 313 is still undergoing analysis. It will be released once we achieve better consistency.
Object 320 will reach its first prototype phase soon.
Watch-and-Act (Object 310) has been released and can do something VLMs can't. Much better than C4.
See you in Wuhan!
https://github.com/lancercat/wna
Synthetic co-training (C4) and multi-part representation (OAPR) will return in 32x, hopefully in the first release.
Object 35x will come in early/mid 2026 (maybe). Let's move beyond OCR.
32x will likely be a full-fledged MARL system.
Watch-and-Act+ (Object 313) is feature-complete. We see some mild performance improvements over Object 310.
Development effort now moves to Object 32x, where we will stage a more flexible routing framework with a more inclusive protocol.
Starting to document Watch-and-Act (Object 310), which is fully inductive and much more powerful than CFOR.
See you in Wuhan.
Branched for CFOR. Cleaning starts.
Seriously speaking, Object 310 is far better than CFOR, even in an inductive setup...
Plus, LSCT is already modernized... so I am not really motivated to clean up this legacy code... but a promise is a promise, after all.
Framework 320 is taking form; after that I will go on to clean up the CFOR training code... I didn't forget it, there is just too much going on and I am too tired to do the cleaning on weekends.
The CFOR training code should be available one week after the ICDAR deadline... (sigh)
OpenCCD (VSDF) is returning in the NG framework, in a slightly different form.
Object 310 is undergoing final reproducibility checks... After this, I should have time to clean up C4.
Object 310 is happening. I want to post some sneak peeks, but I cannot...
Writing a proposal && a paper.
We will proceed to release the CFOR training code/data/documentation once these shenanigans settle down...
That's why I only set a Q1 2025 deadline when it's just a few days of work... you just never know what other tasks might fly in...
The next release is internally frozen. There are several deep and winding rabbit holes to dig into in future work.
Expect CFOR-level generalization performance while being fully inductive, and a big leap from Moose.
Revealing more would break anonymity, so pardon me for being vague.
Have to say that time flies fast.
The next release still needs to wait.
We are refactoring the full framework for a more elegant implementation of [something]. ETA: one or two months. Please also expect a big performance leap :)
Framework NG -> NG+
The next release will support bf16 and multi-GPU inference.
Multi-GPU training will be delayed to a future release (in a less usual manner).
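As a rough, self-contained sketch (not the framework's actual API; the toy model and tensor shapes below are placeholders), bf16 inference in PyTorch usually amounts to running the forward pass under an autocast context:

```python
import torch
import torch.nn as nn

# Minimal bf16 inference sketch. The tiny conv net stands in for the real
# recognizer, which is not public; only the autocast/inference_mode pattern matters.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
).to(device).eval()

images = torch.randn(4, 3, 32, 128, device=device)  # fake batch of text-line crops

with torch.inference_mode(), torch.autocast(device_type=device, dtype=torch.bfloat16):
    logits = model(images)

print(logits.dtype)  # bfloat16 on backends that support it
```

Nothing here hints at the "less usual" multi-GPU scheme; the sketch only covers the bf16 half.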
The next release is scheduled in November, with more languages and better model flexibility.
Datasets are being added. Tuning for performance.
It's happening (maybe).
Proceeding to adopt new datasets like FudanVI, Union14M, and others.
Now we have some powerful devices. Time to scale up.
Due to a specific application need, the C4 family may receive a weird DLC (Objects 282/305) in the coming months,
which means we may or may not get the first of the "far more interesting stuff" finished before the AAAI deadline.
I would not say it is not interesting (it is),
but it would still only concern the OCR community, hence not that significant.
The C4/LSCT family will finally come to light if I can pull off the Major Revision.
The stage has been set, buckle up.
And from stage 2 of the NG framework, we are considering dropping support for GPUs with less than 16 GiB of VRAM.
Code cleaned up and verified for Moose.
Starting the documentation process and quality-check procedure.
Once done, we will start uploading things to Kaggle and GitHub (hopefully before May Day).
OAPR released. Note it is still built on the first-generation framework; the NG framework starts with Moose.
We have started tidying up and will release the first version of the NG framework (likely before May).
See you at ICDAR 2024.
BTW, the whole NG framework is currently moving towards version two, and a version three is planned by Halloween.
Hope we can show you all something far more interesting than this in the near future.
Another project is about to reach the training stage (coding is mostly done), and it brings some interesting new features.
Hope we can get some results in May.
Happy the ⑨th day of a month~
The QA DLC will be delayed, as we can find few benchmarks for an open-ended VQA model.
The next release will still be solely single-GPU OCR.
But changes are indeed happening; they are just slower than expected (partially because I am still adapting to the new lifestyle).
Allow me to expand this repo towards QA a bit before we resume the multi-GPU work.
The next revision will hopefully be ready by April Fools' Day.
The core of the NG framework is taking shape.
The NG framework will come with a lot of DLCs, so expect some weird [optional] dependencies :-)
The NG framework may require PyTorch >= 2.1.
The sparsity feature is not used in the end. The first method based on the NG framework is undergoing ablation studies; see you all soon.
The NG framework will be delayed for a while due to all kinds of paperwork, writing, and relocation preparations... The framework itself is 80% done (usable if you don't need multi-GPU, that is), but I have no time or resources to deploy and tune it.
Spoiler: the NG framework will natively support multi-GPU parallel training, but in a really weird way.
(You can make your guesses now; we will hopefully ship that part in late 2024 if things go smoothly.)
10 GB cards will be supported until 2025.
We will drop training support for 8 GiB GPUs in future iterations. We will try to make regular models fit into 10 GiB cards for training (we will know by Halloween)...
We recommend moving to 24 GB+ GPUs, like x090s, P40s, and M40s.
Considering that P40s and M40s are pretty affordable these days, we will gradually drop training support for 8 GiB GPUs.
The training bar for future regular models will be 10 GiB (P102s, 1080 Tis, M40s), and 20 GiB for large models.
Inference cost will also go up; however, we will make sure to support inference on 6 GB cards like P106-100s.
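Purely as an illustration of that floor (the threshold and the check below are an example, not shipped code), a pre-flight VRAM check could look like:

```python
import torch

# Hypothetical pre-flight check: warn when the visible GPU falls below the
# rough 6 GB floor we aim to keep inference under. Not part of the repo.
MIN_INFER_GIB = 6.0

if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    total_gib = total_bytes / 1024**3
    if total_gib < MIN_INFER_GIB:
        print(f"Warning: GPU reports {total_gib:.1f} GiB; inference may OOM.")
else:
    print("No CUDA device found; falling back to CPU inference.")
```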
Hi community,
We are going to build a new multilingual OSOCR benchmark set.
We want to collect ideas for the list of languages people would like to see recognized.
If you have a specific language in mind, please open an issue. We will pick some of the suggested languages to collect and annotate data for.