
@lfoppiano (Member) commented May 11, 2025

There is a piece of code:

https://github.com/kermitt2/grobid/blob/92ea31edc391c56f6fd1eab61a16aaab5fda960c/grobid-core/src/main/java/org/grobid/core/document/Document.java#L1276

that decides whether the blocks around a figure caption (a block starting with "fig" and tagged as <figure>) are "close" enough to it (a fixed distance of 15px) to be considered part of the figure. Before #1266, certain parts of the document were discarded; now they are added as paragraphs.
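A minimal sketch of a fixed-threshold check of this kind, assuming each block is an axis-aligned bounding box in page coordinates (pixels). The `Box` record and the `gap` and `isCloseToCaption` names are hypothetical and do not reproduce the code in Document.java. The distance used here is the smallest gap between the two rectangles, which may differ from how GROBID actually measures it. It uses Java records (Java 16+).

```java
/** Hypothetical sketch of a fixed-distance proximity test between layout blocks. */
public class FixedDistanceCheck {

    /** Axis-aligned bounding box; (x, y) is the top-left corner, in pixels. */
    public record Box(double x, double y, double width, double height) {
        double right()  { return x + width; }
        double bottom() { return y + height; }
    }

    /** Fixed threshold, matching the 15px mentioned in the discussion. */
    static final double MAX_DISTANCE = 15.0;

    /** Smallest Euclidean distance between two rectangles (0 if they touch or overlap). */
    static double gap(Box a, Box b) {
        double dx = Math.max(0, Math.max(a.x() - b.right(), b.x() - a.right()));
        double dy = Math.max(0, Math.max(a.y() - b.bottom(), b.y() - a.bottom()));
        return Math.hypot(dx, dy);
    }

    /** True if the block is close enough to the caption to be kept in the figure. */
    static boolean isCloseToCaption(Box caption, Box block) {
        return gap(caption, block) <= MAX_DISTANCE;
    }

    public static void main(String[] args) {
        Box caption = new Box(50, 400, 200, 20);
        // Block ending 10px above the caption: accepted.
        Box near = new Box(50, 370, 200, 20);
        // Block ending about 15.16px above the caption, as in the Figure 2 case: rejected.
        Box far = new Box(50, 364.84, 200, 20);
        System.out.println(isCloseToCaption(caption, near)); // true
        System.out.println(isCloseToCaption(caption, far));  // false
    }
}
```

With a hard cutoff, a gap of 15.16px is treated exactly like a gap of 100px, which is why small variations in block geometry flip the result.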

This works in many cases, but it relies on the distance between blocks being consistent.

In this article, for example, the figure caption blocks are correctly recognized:

Figure 1 is correctly assembled:

[screenshot: Figure 1]

Figure 2 is not, because the blocks are too "far" apart (15.16px > 15px):

[screenshot: Figure 2]

From looking at a few examples: if the text ends near the column edge, the block becomes larger and therefore closer to the other block.

I relaxed the 15px rule when the two blocks are at the same height, but this is not a solution that will cover all cases.
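One way to express that relaxation, as a sketch only: it assumes "at the same height" means the two boxes overlap vertically (side by side, e.g. in adjacent columns), and the relaxed threshold of 30px is a hypothetical value, not one taken from this PR. All names are illustrative.

```java
/** Hypothetical sketch: relax the fixed threshold for side-by-side blocks. */
public class RelaxedDistanceCheck {

    /** Axis-aligned bounding box; (x, y) is the top-left corner, in pixels. */
    public record Box(double x, double y, double width, double height) {
        double right()  { return x + width; }
        double bottom() { return y + height; }
    }

    static final double MAX_DISTANCE = 15.0;
    // Hypothetical threshold used when both blocks share the same vertical band.
    static final double RELAXED_MAX_DISTANCE = 30.0;

    /** True if the vertical extents of the two boxes overlap ("same height"). */
    static boolean sameHeight(Box a, Box b) {
        return a.y() < b.bottom() && b.y() < a.bottom();
    }

    /** Smallest Euclidean distance between two rectangles (0 if they touch or overlap). */
    static double gap(Box a, Box b) {
        double dx = Math.max(0, Math.max(a.x() - b.right(), b.x() - a.right()));
        double dy = Math.max(0, Math.max(a.y() - b.bottom(), b.y() - a.bottom()));
        return Math.hypot(dx, dy);
    }

    /** Uses the relaxed threshold only for vertically overlapping blocks. */
    static boolean isClose(Box a, Box b) {
        double threshold = sameHeight(a, b) ? RELAXED_MAX_DISTANCE : MAX_DISTANCE;
        return gap(a, b) <= threshold;
    }

    public static void main(String[] args) {
        Box caption = new Box(50, 400, 200, 20);
        // 20px to the right, same band: accepted only thanks to the relaxation.
        Box side = new Box(270, 405, 150, 20);
        // 20px above: still subject to the fixed 15px rule, so rejected.
        Box above = new Box(50, 360, 200, 20);
        System.out.println(isClose(caption, side));  // true
        System.out.println(isClose(caption, above)); // false
    }
}
```

As noted above, this only helps when the blocks overlap vertically; a block that is slightly offset in both directions still falls back to the fixed threshold.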

Any comment is welcome 😄

@lfoppiano changed the title from "Bugfix/adjust block distance when multi columns" to "Relax block distance when multi columns and blocks at the same height" on May 11, 2025
@lfoppiano (Member, Author)

Here is a better solution: #683 (comment)

@coveralls commented Sep 23, 2025


Coverage: 40.657% (+0.2%) from 40.422% when pulling af75090 on bugfix/adjust-block-distance-when-multi-columns into 49df55c on master.

@lfoppiano force-pushed the bugfix/adjust-block-distance-when-multi-columns branch from 43fc01f to 49c3b2b on November 11, 2025 10:40
@lfoppiano changed the title from "Relax block distance when multi columns and blocks at the same height" to "Make block distance dynamic in multi-columns articles" on Nov 11, 2025
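To illustrate what a "dynamic" block distance could mean, here is a sketch under stated assumptions: the threshold is derived from the gaps observed between consecutive blocks on the page (computing those gaps is out of scope here), scaled by a hypothetical factor of 1.5 and never allowed to drop below the original 15px. This is not the approach implemented in the PR; the class, constants, and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: derive the proximity threshold from observed block spacing. */
public class DynamicDistanceThreshold {

    /** Floor, equal to the original fixed distance. */
    static final double MIN_THRESHOLD = 15.0;
    /** Hypothetical tolerance factor applied to the typical gap. */
    static final double FACTOR = 1.5;

    /** Median of a non-empty array of gaps. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Threshold scaled to the typical spacing on the page, never below 15px. */
    static double threshold(double[] observedGaps) {
        if (observedGaps.length == 0) {
            return MIN_THRESHOLD; // no evidence: keep the fixed rule
        }
        return Math.max(MIN_THRESHOLD, FACTOR * median(observedGaps));
    }

    public static void main(String[] args) {
        // Hypothetical vertical gaps (px) between consecutive blocks on one page.
        double[] gaps = {11.0, 12.5, 12.0, 13.0};
        double t = threshold(gaps);
        System.out.println(t);          // 18.375
        System.out.println(15.16 <= t); // true: the Figure 2 gap would now be accepted
    }
}
```

Using the median rather than the mean keeps a single large gap (e.g. between a figure and the next section) from inflating the threshold.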