
#769 Add EBCDIC processor as a library routine #771


Merged: 7 commits, Jul 31, 2025

Conversation

yruslan (Collaborator) commented Jul 29, 2025

Summary by CodeRabbit

  • New Features

    • Added methods to get and set COBOL field values by name, supporting nested fields.
    • Introduced encoding support for COBOL fields, enabling conversion from JVM types to raw data.
    • Added handlers for processing COBOL records as arrays or maps.
    • Provided a builder for processing COBOL data streams with customizable options.
    • Introduced a trait for raw record processing and a stream processor for applying custom logic to records.
    • Added a fixed-length record extractor for reading COBOL data.
    • Enhanced stream classes with the ability to duplicate streams.
  • Bug Fixes

    • Cleaned up and corrected parameter import paths across multiple files.
  • Tests

    • Added and updated tests to cover new processing features, encoding functionality, and builder logic.
  • Documentation

    • Updated documentation to reflect new methods and clarify parameter usage.


coderabbitai bot commented Jul 29, 2025

Walkthrough

This update introduces new encoding and record-handling capabilities to the COBOL parser and processor modules. Key changes include the addition of an encoder selection mechanism, new public methods for field value manipulation in copybook records, and several new classes and traits for processing raw records and streams. Numerous test and package import adjustments were made to support these enhancements and maintain consistency across the codebase.

Changes

Cohort / File(s) Change Summary
Copybook Field Value Access & Mutation
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala
Added getFieldValueByName and setFieldValueByName for primitive fields; introduced private setPrimitiveField method; updated docs and removed redundant exception annotations.
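The in-place mutation idea behind setFieldValueByName can be sketched in plain Scala. This is illustrative only: the offset, field length, and encoder function are stand-ins, not the actual Cobrix API, which resolves them from the parsed copybook AST.

```scala
// Hypothetical sketch of setting a fixed-position primitive field inside a
// record byte array, in the spirit of Copybook.setFieldValueByName.
object FieldSetterSketch {
  type Encoder = Any => Array[Byte]

  def setPrimitiveField(record: Array[Byte],
                        offset: Int,
                        length: Int,
                        encoder: Encoder,
                        value: Any): Unit = {
    val encoded = encoder(value)
    require(encoded.length == length,
      s"Encoded size ${encoded.length} does not match field size $length")
    // Overwrite the field's byte range in place; the rest of the record is untouched
    System.arraycopy(encoded, 0, record, offset, length)
  }
}
```

A caller would resolve the field by name first, then delegate to a routine like this with the field's resolved offset, length, and encoder.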
Encoder Integration for Primitives
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala
Integrated encoder selection and assignment into primitive AST construction; extended Primitive case class with encoder and metadata fields; updated imports and documentation.
EncoderSelector & Code Page Mapping
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala
Introduced EncoderSelector object for COBOL data encoding; added ASCII-to-EBCDIC mapping in CodePageCommon.
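To make the encoding direction concrete, here is a minimal ASCII-to-EBCDIC encoder for uppercase letters, digits, and space, using the standard "common" EBCDIC code points (e.g. 'A' = 0xC1, '0' = 0xF0, space = 0x40). The full table in CodePageCommon covers all 256 positions; this partial mapping is for illustration only.

```scala
// Minimal sketch of fixed-length EBCDIC string encoding with space padding,
// analogous in spirit to what EncoderSelector produces for AlphaNumeric fields.
object EbcdicEncoderSketch {
  def asciiCharToEbcdic(c: Char): Byte = c match {
    case ' '                       => 0x40.toByte
    case d if d >= '0' && d <= '9' => (0xF0 + (d - '0')).toByte
    case u if u >= 'A' && u <= 'I' => (0xC1 + (u - 'A')).toByte
    case u if u >= 'J' && u <= 'R' => (0xD1 + (u - 'J')).toByte
    case u if u >= 'S' && u <= 'Z' => (0xE2 + (u - 'S')).toByte
    case _                         => 0x40.toByte // fall back to space for unmapped characters
  }

  /** Encodes a string into a fixed-length field, truncating or padding with EBCDIC spaces. */
  def encodeFixed(s: String, fieldLength: Int): Array[Byte] = {
    val bytes = s.take(fieldLength).map(asciiCharToEbcdic).toArray
    bytes ++ Array.fill[Byte](fieldLength - bytes.length)(0x40.toByte)
  }
}
```

Note the gaps in the EBCDIC letter ranges (A-I, J-R, S-Z) inherited from punched-card zones, which is why a simple arithmetic offset does not work across the whole alphabet.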
Record Processing & Handlers
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/ArrayOfAnyHandler.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/MapOfAnyHandler.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala
Added new handler classes for array/map record structures; introduced trait for raw record processing; implemented builder for record processing pipelines; added stream processor for record transformation.
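The processor/driver split can be sketched as follows. The trait and loop below mirror the concepts named in the PR (RawRecordProcessor, StreamProcessor) but use simplified, illustrative signatures; the real implementations work over SimpleStream and a RawRecordExtractor rather than a plain iterator.

```scala
// Hypothetical sketch: a processor transforms each raw record given its byte
// offset, and a driver loop feeds records through it while tracking offsets.
trait RawRecordProcessorSketch {
  def processRecord(record: Array[Byte], offset: Long): Array[Byte]
}

object StreamProcessorSketch {
  /** Applies the processor to every record, returning the outputs and total bytes consumed. */
  def processStream(records: Iterator[Array[Byte]],
                    processor: RawRecordProcessorSketch): (Seq[Array[Byte]], Long) = {
    var offset = 0L
    val out = Seq.newBuilder[Array[Byte]]
    records.foreach { record =>
      out += processor.processRecord(record, offset)
      offset += record.length // advance the offset past the record just processed
    }
    (out.result(), offset)
  }
}
```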
Raw Record Extraction
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala
Added class for extracting fixed-length raw records from streams, with offset tracking and record caching.
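The core of fixed-length extraction is slicing the input into constant-size records while tracking each record's starting offset. The real FixedRecordLengthRawRecordExtractor reads lazily from a SimpleStream with caching; this stand-alone sketch over a byte array shows only the slicing and offset arithmetic.

```scala
// Illustrative fixed-length record extraction with offset tracking.
object FixedLengthExtractorSketch {
  def extract(data: Array[Byte], recordLength: Int): Iterator[(Long, Array[Byte])] = {
    require(recordLength > 0, "record length must be positive")
    data.grouped(recordLength)
      .zipWithIndex
      .collect {
        // Drop a trailing partial record shorter than the fixed length
        case (rec, i) if rec.length == recordLength =>
          (i.toLong * recordLength, rec)
      }
  }
}
```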
Reader Parameter Package Refactoring
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala
Changed package declarations from spark.cobol.parameters to cobol.reader.parameters; removed redundant imports.
Stream Copying Support
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala
Added copyStream() method to SimpleStream trait and implemented it in FSStream.
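The copyStream() contract can be illustrated with a simplified trait and an in-memory implementation: a copy is a fresh, independent stream over the same source, so a consumer can probe ahead (e.g. to inspect headers) without disturbing the primary stream's position. The trait below is a stand-in, not the actual Cobrix SimpleStream interface.

```scala
// Sketch of a copyable stream: copies restart from the beginning of the
// same underlying data and advance independently of the original.
trait CopyableStreamSketch {
  def next(numberOfBytes: Int): Array[Byte]
  def copyStream(): CopyableStreamSketch
}

class ByteArrayStreamSketch(data: Array[Byte]) extends CopyableStreamSketch {
  private var pos = 0

  def next(numberOfBytes: Int): Array[Byte] = {
    val chunk = data.slice(pos, pos + numberOfBytes)
    pos += chunk.length
    chunk
  }

  // A copy is a brand-new stream over the same bytes, with its own position
  def copyStream(): CopyableStreamSketch = new ByteArrayStreamSketch(data)
}
```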
Test Stream Copying
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala,
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala,
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala
Implemented copyStream() in test stream classes for testability.
Copybook Field Value Tests
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala
Updated tests to use encoder in Primitive construction; added test for setFieldValueByName.
Record Processor Builder Tests
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala
Added comprehensive test suite for RecordProcessorBuilder, including processing, schema, and extractor tests.
Spark-Cobol: Parameter Import Refactoring
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/RddReaderParams.scala,
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala,
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala,
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala,
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala,
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSourceSpec.scala,
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala
Unified and corrected imports for parameter-related classes and objects, replacing spark.cobol.parameters with cobol.reader.parameters.
Spark-Cobol: File Stream Copying
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala
Added copyStream() method to FileStreamer for stream duplication.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Builder as RecordProcessorBuilder
    participant Stream as SimpleStream
    participant Extractor as RawRecordExtractor
    participant Processor as RawRecordProcessor
    participant Output as OutputStream

    User->>Builder: set copybookContents, options
    User->>Builder: process(Stream, Output)(Processor)
    Builder->>Extractor: getRecordExtractor(options, Stream)
    Builder->>StreamProcessor: processStream(copybook, options, Stream, Extractor, Processor, Output)
    StreamProcessor->>Extractor: hasNext/next()
    StreamProcessor->>Processor: processRecord(copybook, options, record, offset)
    Processor-->>StreamProcessor: processedRecord
    StreamProcessor->>Output: write(processedRecord)
    StreamProcessor->>Stream: advance to next record
    StreamProcessor->>Output: write footer
sequenceDiagram
    participant Copybook
    participant User

    User->>Copybook: setFieldValueByName(fieldName, recordBytes, value)
    Copybook->>Primitive: getFieldByName(fieldName)
    Copybook->>Primitive: setPrimitiveField(Primitive, recordBytes, value)
    Primitive->>EncoderSelector: encode(value)
    EncoderSelector-->>Primitive: encodedBytes
    Primitive->>recordBytes: update bytes
    Copybook-->>User: updated recordBytes

    User->>Copybook: getFieldValueByName(fieldName, recordBytes)
    Copybook->>Primitive: getFieldByName(fieldName)
    Copybook->>Primitive: extractPrimitiveField(Primitive, recordBytes)
    Primitive->>Decoder: decode(bytes)
    Decoder-->>Primitive: value
    Copybook-->>User: value

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit with code in its paws,
Hopped through the fields of COBOL's laws.
With encoders and streams,
And new handler schemes,
Now bytes dance in orderly rows!
Copybooks sing,
As new features spring—
Oh, what a joy when the carrot bell rings! 🥕


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d99038f and bfe7d70.

📒 Files selected for processing (3)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala
🚧 Files skipped from review as they are similar to previous changes (2)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)

github-actions bot commented Jul 29, 2025

JaCoCo code coverage report - 'cobol-parser'

Overall Project 91.64% -0.92% 🍏
Files changed 77.69% 🍏

File Coverage
NonTerminalsAdder.scala 100% 🍏
CodePageCommon.scala 100% 🍏
SimpleStream.scala 100% 🍏
EncoderSelector.scala 98.95% -1.05% 🍏
FixedRecordLengthRawRecordExtractor.scala 95.19% 🍏
RecordProcessorBuilder.scala 90.45% -9.55% 🍏
ParserVisitor.scala 89.49% 🍏
Copybook.scala 81.71% -7.39% 🍏
StreamProcessor.scala 78.21% -21.79% 🍏
Primitive.scala 74.95% 🍏
ArrayOfAnyHandler.scala 18.75% -81.25%
MapOfAnyHandler.scala 0%
FSStream.scala 0% -5.26%

github-actions bot commented Jul 29, 2025

JaCoCo code coverage report - 'spark-cobol'

File Coverage [82.48%] 🍏
CobolSchema.scala 94.54% 🍏
FileStreamer.scala 93.98% 🍏
DefaultSource.scala 90.81% 🍏
RddReaderParams.scala 90.13% 🍏
CobolParametersValidator.scala 74.48%
CobolStreamer.scala 0%
Total Project Coverage 79.89% 🍏


coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (8)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)

27-32: Consolidate duplicated imports from CobolParametersParser

You import both the whole object (CobolParametersParser) and a single member (getReaderProperties). One of these is redundant:

-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParametersParser, Parameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser.getReaderProperties
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParametersParser => CP, Parameters}
+import CP.getReaderProperties

A single aliased import keeps the namespace flat and avoids accidental shadowing.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1)

24-26: Minor import duplication

CobolParametersParser is imported twice (line 24 and via the wildcard in line 25). Remove one to silence IntelliJ/Scalafix “unused import” warnings on some setups.

-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser, Parameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser._
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser => CP, Parameters}
+import CP._
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/RddReaderParams.scala (1)

19-21: Tidy up duplicate CobolParametersParser imports

Same duplication pattern as in DefaultSource.scala. Consider folding into a single aliased import:

-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser, Parameters, ReaderParameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser._
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser => CP, Parameters, ReaderParameters}
+import CP._

This is only a style tidy-up; functionality is unaffected.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)

19-21: Remove unused imports.

The imports for FSStream and FileNotFoundException don't appear to be used anywhere in this class.

-import za.co.absa.cobrix.cobol.reader.stream.{FSStream, SimpleStream}
+import za.co.absa.cobrix.cobol.reader.stream.SimpleStream

-import java.io.FileNotFoundException
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (1)

93-110: Fix documentation inconsistency.

The method documentation states @return The value of the field but the method signature returns Unit. The documentation should be corrected to reflect that this is a setter method.

-    * @return The value of the field
+    * This method modifies the record in place and does not return a value.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2)

28-37: Document current encoding limitations.

The getEncoder method is well-structured but currently only supports AlphaNumeric fields without compact usage. Consider adding documentation to clarify these limitations and future expansion plans.

  def getEncoder(dataType: CobolType,
                 ebcdicCodePage: CodePage = new CodePageCommon,
-                asciiCharset: Charset = StandardCharsets.US_ASCII): Option[Encoder] = {
+                asciiCharset: Charset = StandardCharsets.US_ASCII): Option[Encoder] = {
+    // Currently only supports AlphaNumeric fields without compact usage
     dataType match {

39-56: Consider implementing ASCII encoding or documenting the limitation.

The getStringEncoder method only implements EBCDIC encoding while returning None for ASCII. Consider either implementing ASCII encoding or adding documentation explaining why it's not supported.

For ASCII encoding, you could add:

      case ASCII =>
-        None
+        val encoder = (a: Any) => {
+          val str = a.toString
+          val bytes = str.getBytes(asciiCharset)
+          if (bytes.length > fieldLength) {
+            java.util.Arrays.copyOf(bytes, fieldLength)
+          } else {
+            val padded = new Array[Byte](fieldLength)
+            System.arraycopy(bytes, 0, padded, 0, bytes.length)
+            padded
+          }
+        }
+        Option(encoder)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (1)

107-125: Remove unused type parameter and consider stream efficiency.

The method handles extractor creation well with appropriate fallback logic, but has minor issues:

  1. The ClassTag type parameter T is unused and should be removed
  2. Multiple stream copies are created which may impact performance for large streams
-  private[processor] def getRecordExtractor[T: ClassTag](readerParameters: ReaderParameters, inputStream: SimpleStream): RawRecordExtractor = {
+  private[processor] def getRecordExtractor(readerParameters: ReaderParameters, inputStream: SimpleStream): RawRecordExtractor = {

The multiple stream copies pattern appears intentional for the reader architecture, but consider documenting why multiple copies are necessary.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e993e75 and 2cad3de.

📒 Files selected for processing (29)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/ArrayOfAnyHandler.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/MapOfAnyHandler.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (2 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala (3 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (2 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/RddReaderParams.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (2 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSourceSpec.scala (1 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (12)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (3)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (5)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (3)
  • classOf (36-48)
  • classOf (50-56)
  • classOf (58-61)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (3)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (2)
  • ByteStreamMock (23-57)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (3)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)
  • cobrix (132-139)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/index/IndexBuilder.scala (10)
  • cobol (68-107)
  • cobol (113-133)
  • cobol (138-153)
  • cobol (155-175)
  • cobol (177-199)
  • cobol (201-265)
  • cobol (267-287)
  • cobol (292-297)
  • cobol (302-314)
  • cobol (316-318)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1)
  • CobolParametersParser (39-986)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (3)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (1)
  • FSStream (21-62)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (3)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala (1)
  • CobolParameters (67-109)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1)
  • CobolParametersParser (39-986)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1)
  • Parameters (27-98)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (4)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (7)
  • parser (114-168)
  • parser (170-179)
  • parser (181-190)
  • parser (193-254)
  • parser (256-277)
  • DecoderSelector (30-352)
  • getDecoder (57-75)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2)
  • EncoderSelector (25-80)
  • getEncoder (28-37)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (5)
  • processor (97-99)
  • processor (101-105)
  • processor (107-125)
  • processor (127-127)
  • options (65-68)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (1)
  • processRecord (40-42)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1)
  • isEndOfStream (30-30)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (3)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/AlphaNumeric.scala (1)
  • AlphaNumeric (28-36)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (3)
  • CodePageCommon (24-26)
  • CodePageCommon (28-82)
  • asciiToEbcdicMapping (62-81)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/Encoding.scala (2)
  • ASCII (25-25)
  • EBCDIC (23-23)
🔇 Additional comments (41)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1)

17-17: LGTM! Clean package refactoring.

The package move from za.co.absa.cobrix.spark.cobol.parameters to za.co.absa.cobrix.cobol.reader.parameters aligns with separating reader-specific parameters from Spark integration code, supporting the library routine functionality.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1)

17-17: LGTM! Consistent package restructuring.

The package move aligns with the refactoring to make parameter parsing available as a library routine, consistent with the change in Parameters.scala.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala (1)

20-20: LGTM! Import updated for package refactoring.

The import correctly reflects the new package location after moving parameter classes to za.co.absa.cobrix.cobol.reader.parameters.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala (1)

25-25: LGTM! Import updated for package refactoring.

The import correctly updates to use the new package location for CobolParametersParser constants and methods used in validation.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (1)

28-28: LGTM! Final import updated for package refactoring.

The import correctly reflects the new package location, completing the consistent refactoring across all files that use CobolParametersParser constants.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSourceSpec.scala (1)

29-29: Import path update looks correct

The new package path reflects the recent refactor and should compile fine. No other adjustments needed in this test file.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1)

19-24: Verify immediate header stream closure is intentional.

The constructor immediately closes the header stream on line 24. Ensure this is the intended behavior and that the header stream won't be needed later in the extraction process.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1)

32-34: LGTM: Clean addition of stream duplication contract.

The addition of the abstract copyStream() method establishes a clear contract for stream duplication across all implementations. The method signature is appropriate with proper exception handling annotation.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)

52-54: LGTM: Consistent implementation of stream duplication.

The copyStream() implementation correctly creates a new instance with the same underlying string data, following the established pattern across other stream implementations.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)

54-56: LGTM: Consistent stream duplication implementation.

The copyStream() method correctly creates a new instance with the same byte array, maintaining consistency with other stream implementations.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)

114-116: LGTM: Proper stream duplication with all parameters preserved.

The copyStream() implementation correctly creates a new FileStreamer instance with all the same parameters (filePath, fileSystem, startOffset, maximumBytes), ensuring the duplicated stream has equivalent configuration.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (2)

19-19: LGTM! Import addition is consistent with method signature.

The FileNotFoundException import aligns with the @throws annotation on the new copyStream() method.


58-61: LGTM! Stream copying implementation is correct.

The copyStream() method correctly creates a new FSStream instance with the same fileName, following the established pattern used in other stream implementations. The @throws(classOf[FileNotFoundException]) annotation is appropriate since file access could fail.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (2)

23-23: LGTM! Import addition supports encoding functionality.

The EncoderSelector import is necessary for the new encoder assignment in the transform method.


77-85: LGTM! Encoder assignment is consistent with decoder pattern.

The encoder is properly obtained using EncoderSelector.getEncoder() with the same data type and code page parameters as the decoder, and correctly passed to the Primitive constructor. This maintains consistency with the bidirectional encoding/decoding capability being added.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)

53-55: LGTM! Stream copying implementation is correct.

The copyStream() method correctly creates a new TestByteStream instance with the same bytes array, maintaining the established pattern for stream duplication across different stream implementations.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (1)

59-81: Confirm full bidirectional mapping correctness

The table matches expected EBCDIC codes for key ASCII bytes (0x20→0x40, 0x30→0xF0, 0x41→0xC1, 0x61→0x81), but an automated invertibility check could not extract the complete ebcdicToAsciiMapping and reported an out-of-bounds access. Please ensure there are no gaps in the 256-entry arrays:

  • Manually verify that for every byte x in 0..255:
    val e = asciiToEbcdicMapping(x) & 0xFF
    val back = ebcdicToAsciiMapping(e) & 0xFF
    assert(back == x)
  • Add a unit test covering the full round-trip for all 256 values to catch any missing or mis-mapped entries.
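The suggested round-trip test can be sketched in a self-contained form; a random permutation and its computed inverse stand in here for the actual `CodePageCommon` tables, to show only the shape of the check:

```scala
import scala.util.Random

// Stand-ins for asciiToEbcdicMapping / ebcdicToAsciiMapping: a random 256-entry
// permutation and its inverse, built so the round-trip property holds by construction.
val rnd = new Random(42)
val asciiToEbcdic: Array[Byte] = rnd.shuffle((0 to 255).toVector).map(_.toByte).toArray
val ebcdicToAscii: Array[Byte] = {
  val inv = new Array[Byte](256)
  for (x <- 0 until 256) inv(asciiToEbcdic(x) & 0xFF) = x.toByte
  inv
}

// The invariant from the comment above: every byte value must survive the round trip.
for (x <- 0 until 256) {
  val e = asciiToEbcdic(x) & 0xFF
  val back = ebcdicToAscii(e) & 0xFF
  assert(back == x, s"Round trip failed for byte $x")
}
```

Running the same loop against the real mapping arrays would catch any missing or mis-mapped entries in either table.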

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)

26-26: LGTM! Import addition supports encoding functionality.

The EncoderSelector import is necessary for the new encoder assignment in the visitPrimitive method.


858-860: LGTM! Encoder integration is consistent and well-implemented.

The Primitive constructor now properly receives both decoder and encoder instances. The encoder uses the same effective code pages (effectiveEbcdicCodePage and effectiveAsciiCharset) as the decoder, ensuring consistency in encoding/decoding operations. This maintains the bidirectional conversion capability being added to the system.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (1)

21-31: Well-designed functional interface.

The trait follows the strategy pattern effectively, providing a clean contract for record processing implementations. The method signature includes all necessary context (copybook, options, raw record data, and offset) for flexible processing logic.
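A minimal stand-in for such a strategy interface can make the contract concrete. The real `RawRecordProcessor` also receives the parsed `Copybook`; this sketch keeps only the shape described above:

```scala
// Hypothetical simplification of the strategy contract: options, raw bytes, and the
// record's offset in the stream go in; transformed bytes come out.
trait RawRecordProcessor {
  def processRecord(options: Map[String, String], record: Array[Byte], offset: Long): Array[Byte]
}

// Example strategy: increment every byte, similar in spirit to the transform used
// in the builder test suite reviewed below.
val incrementing = new RawRecordProcessor {
  override def processRecord(options: Map[String, String],
                             record: Array[Byte],
                             offset: Long): Array[Byte] =
    record.map(b => (b + 1).toByte)
}

assert(incrementing.processRecord(Map.empty, Array[Byte](1, 2), 0L).toList == List[Byte](2, 3))
```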

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala (3)

23-23: LGTM: Import updated for encoder support.

The addition of EncoderSelector to the import aligns with the new encoding functionality being added to the codebase.


162-163: LGTM: Primitive construction updated for encoder support.

The constructor call correctly includes the new encoder parameter obtained via EncoderSelector.getEncoder(dataType), maintaining consistency with the updated Primitive class definition.


208-219: LGTM: Comprehensive test for field value mutation.

The new test properly validates the setFieldValueByName functionality by:

  • Setting a new field value
  • Verifying the updated value is correctly retrieved
  • Checking that another field's encoder is empty as expected

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/ArrayOfAnyHandler.scala (1)

22-33: LGTM: Clean RecordHandler implementation for arrays.

The implementation correctly fulfills the RecordHandler[Array[Any]] contract with straightforward method implementations:

  • create returns the input array directly (appropriate for array representation)
  • toSeq and foreach provide expected collection operations
  • Clear documentation explains the array-based approach

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/MapOfAnyHandler.scala (1)

22-40: LGTM: Well-implemented map-based record handler.

The implementation correctly transforms COBOL group structures into maps with thoughtful design:

  • Maps group children names to their values using zip operation
  • Properly handles Array[Any] to Seq conversion for nested structures
  • toSeq and foreach operations work on map values as expected
  • Clear documentation explains the map-based field representation approach
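The zip-based transformation can be sketched in isolation. `GroupLike` is a hypothetical stand-in for the parser's Group AST node, and `create` mimics the handler's field-mapping step under that assumption:

```scala
// Hypothetical stand-in for a COBOL group node carrying its children's names.
case class GroupLike(childrenNames: Seq[String])

// Map children names to their decoded values; nested Array[Any] values become Seq,
// mirroring the conversion the handler performs for nested structures.
def create(values: Array[Any], group: GroupLike): Map[String, Any] =
  group.childrenNames.zip(values.map {
    case arr: Array[Any] => arr.toSeq
    case other           => other
  }).toMap

val record = create(Array[Any]("John", 42), GroupLike(Seq("NAME", "AGE")))
assert(record == Map("NAME" -> "John", "AGE" -> 42))
```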

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (3)

28-54: LGTM: Comprehensive end-to-end processing test.

The test effectively validates the complete processing pipeline:

  • Uses a realistic custom RawRecordProcessor that transforms each byte
  • Verifies the processed output matches expected transformed values
  • Tests the integration between builder, processor, and stream handling

56-74: LGTM: Good coverage of builder configuration methods.

The tests properly validate:

  • Schema extraction from copybook contents
  • Reader parameter generation with custom options
  • Option handling and retrieval

76-116: LGTM: Thorough testing of record extractor creation.

The test suite comprehensively covers:

  • Fixed-length record extractor creation and iteration
  • Variable-length record extractor creation
  • Error handling for unsupported configurations with descriptive error messages
  • Different record format scenarios

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (4)

20-20: LGTM: Import addition supports new encoding functionality.

The addition of EncoderSelector to the imports is necessary for the new encode field and is appropriately grouped with related decoder imports.


24-40: Excellent documentation enhancement.

The comprehensive parameter documentation significantly improves code readability and maintainability. The documentation clearly explains the purpose of each parameter, including the new encode field for bidirectional data conversion.


56-56: Well-designed encoder field addition.

The optional encode field appropriately mirrors the existing decode field pattern and enables bidirectional data conversion. Making it optional is a sound design decision as not all primitive fields may require encoding capabilities.


111-111: Note: @throws annotation removed.

The @throws annotation was removed from the decodeTypeValue method signature. Ensure this change aligns with the actual exception handling behavior of the method.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2)

74-91: Well-implemented field value getter method.

The getFieldValueByName method follows good design patterns by leveraging existing getFieldByName functionality and properly validating field types. The error handling is appropriate with descriptive messages.


217-246: Excellent defensive programming in setPrimitiveField.

The method demonstrates excellent defensive programming with comprehensive validation:

  • Encoder presence validation
  • Bounds checking to prevent buffer overflows
  • Size validation to ensure data integrity
  • Clear, descriptive error messages for debugging

The use of System.arraycopy for byte manipulation is efficient and appropriate.
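The validation pattern can be shown with a self-contained sketch; the name `patchField` and its signature are illustrative, not the actual `Copybook.setPrimitiveField` API:

```scala
// Defensive in-place patch: validate the encoded field's size and bounds before
// copying it into the record buffer with System.arraycopy.
def patchField(record: Array[Byte], offset: Int, encoded: Array[Byte], expectedSize: Int): Unit = {
  if (encoded.length != expectedSize)
    throw new IllegalArgumentException(
      s"Encoded size ${encoded.length} does not match the field size $expectedSize.")
  if (offset < 0 || offset + encoded.length > record.length)
    throw new IllegalArgumentException(
      s"Field at offset $offset does not fit a record of ${record.length} bytes.")
  System.arraycopy(encoded, 0, record, offset, encoded.length)
}

val rec = Array[Byte](0, 0, 0, 0)
patchField(rec, 1, Array[Byte](7, 8), 2)
assert(rec.toList == List[Byte](0, 7, 8, 0))
```

Failing fast on size and bounds mismatches keeps a bad encoder result from silently corrupting adjacent fields in the record.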

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2)

19-26: Well-structured foundation for encoding functionality.

The Encoder type alias and imports are well-organized and provide a clean foundation for the encoding functionality. The type signature Any => Array[Byte] appropriately captures the encoding contract.


58-78: Robust EBCDIC string encoding implementation.

The encodeEbcdicString method demonstrates excellent defensive programming:

  • Proper input validation with descriptive error messages
  • Safe byte conversion using modulo arithmetic to handle negative values
  • Correct handling of strings shorter than the target length
  • Efficient implementation using while loop

The method correctly pads with zeros when the string is shorter than the target length, which is appropriate for COBOL field formatting.
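The padding and conversion behavior can be sketched in a self-contained form; `identityTable` stands in for the real 256-entry ASCII-to-EBCDIC code-page table, and the function name is illustrative:

```scala
// Convert a string through a 256-entry byte table, truncating at the target length
// and zero-padding when the input is shorter.
def encodeString(table: Array[Byte], str: String, targetLength: Int): Array[Byte] = {
  val out = new Array[Byte](targetLength) // initialized to 0x00, i.e. the padding value
  var i = 0
  while (i < str.length && i < targetLength) {
    val ascii = str.charAt(i) % 256       // modulo guards against characters above 255
    out(i) = table(ascii)
    i += 1
  }
  out
}

val identityTable = (0 until 256).map(_.toByte).toArray
assert(encodeString(identityTable, "AB", 4).toList == List[Byte](0x41, 0x42, 0, 0))
```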

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala (1)

36-64: Add StreamProcessor test coverage for edge cases

There are no existing unit tests exercising StreamProcessor.processStream, so we need explicit test cases to validate its header/footer logic and guard against negative sizes or ordering issues.

Please add tests covering:

  • Streams with no headers or footers (single record only)
  • Multiple back-to-back records with zero‐byte gaps
  • Cases where headerSize would evaluate to ≤ 0 (the header should be skipped)
  • Proper consumption order: raw header → raw record → processed record
  • Footer handling when inputStream.size - inputStream.offset is zero or positive

Target the processStream method in
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala
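The headerSize guard among these cases can be sketched as a tiny property, assuming (as described above) that the header is the gap between the current stream offset and the next record's start:

```scala
// Hypothetical helper mirroring the guard being tested: a header is only consumed
// when the next record starts strictly after the current stream offset.
def headerBytes(recordStartOffset: Long, streamOffset: Long): Int = {
  val headerSize = (recordStartOffset - streamOffset).toInt
  if (headerSize > 0) headerSize else 0 // skip the header when the gap is zero or negative
}

assert(headerBytes(100L, 90L) == 10)  // 10-byte header precedes the record
assert(headerBytes(100L, 100L) == 0)  // back-to-back records: no header
assert(headerBytes(100L, 110L) == 0)  // defensive: never negative
```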

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (4)

30-45: Well-structured builder pattern implementation.

The companion object factory method and constructor provide a clean interface for creating builder instances. Using a case-insensitive options map is appropriate for configuration management.


47-68: Excellent fluent interface for option management.

The option and options methods provide a clean fluent interface with consistent case-insensitive key handling. The implementation correctly supports method chaining and bulk option assignment.
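The case-insensitive fluent pattern can be sketched in isolation; this `OptionsBuilder` is a simplified stand-in that keeps only the option-handling part of the real builder:

```scala
import scala.collection.mutable

// Fluent builder with case-insensitive keys: every key is normalized on insertion,
// so lookups ignore case regardless of how the option was supplied.
class OptionsBuilder {
  private val opts = mutable.Map[String, String]()

  def option(key: String, value: String): OptionsBuilder = {
    opts(key.toLowerCase) = value
    this
  }

  def options(map: Map[String, String]): OptionsBuilder = {
    map.foreach { case (k, v) => option(k, v) }
    this
  }

  def get(key: String): Option[String] = opts.get(key.toLowerCase)
}

val b = new OptionsBuilder()
  .option("Record_Format", "F")
  .options(Map("ENCODING" -> "ebcdic"))

assert(b.get("record_format").contains("F"))
assert(b.get("encoding").contains("ebcdic"))
```

Normalizing keys once at insertion time, rather than at every lookup site, is what keeps bulk assignment and single-option chaining consistent.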


70-95: Excellent orchestration and resource management.

The process method effectively orchestrates the processing pipeline with proper component creation and coordination. The use of copyStream() ensures stream independence, and the try-finally block provides appropriate resource cleanup.


97-105: Clean helper methods with appropriate visibility.

The helper methods getCobolSchema and getReaderParameters provide clean abstractions and are appropriately marked as package-private for testing access. The implementation correctly leverages existing parsing infrastructure.

@yruslan yruslan merged commit f899488 into master Jul 31, 2025
7 checks passed
@yruslan yruslan deleted the feature/769-ebcdic-processor branch July 31, 2025 09:31