
#769 Add EBCDIC processor as a library routine #771


Merged: 7 commits, Jul 31, 2025

Conversation

yruslan (Collaborator) commented Jul 29, 2025

Summary by CodeRabbit

  • New Features

    • Added methods to get and set COBOL field values by name, supporting nested fields.
    • Introduced encoding support for COBOL fields, enabling conversion from JVM types to raw data.
    • Added handlers for processing COBOL records as arrays or maps.
    • Provided a builder for processing COBOL data streams with customizable options.
    • Introduced a trait for raw record processing and a stream processor for applying custom logic to records.
    • Added a fixed-length record extractor for reading COBOL data.
    • Enhanced stream classes with the ability to duplicate streams.
  • Bug Fixes

    • Cleaned up and corrected parameter import paths across multiple files.
  • Tests

    • Added and updated tests to cover new processing features, encoding functionality, and builder logic.
  • Documentation

    • Updated documentation to reflect new methods and clarify parameter usage.


coderabbitai bot commented Jul 29, 2025

Walkthrough

This update introduces new encoding and record-handling capabilities to the COBOL parser and processor modules. Key changes include the addition of an encoder selection mechanism, new public methods for field value manipulation in copybook records, and several new classes and traits for processing raw records and streams. Numerous test and package import adjustments were made to support these enhancements and maintain consistency across the codebase.

Changes

Cohort / File(s) Change Summary
Copybook Field Value Access & Mutation
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala
Added getFieldValueByName and setFieldValueByName for primitive fields; introduced private setPrimitiveField method; updated docs and removed redundant exception annotations.
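The in-place mutation idea behind setFieldValueByName can be sketched in plain Scala. This is illustrative only: the offset, field length, and encoder function are stand-ins, not the actual Cobrix API, which resolves them from the parsed copybook AST.

```scala
// Hypothetical sketch of setting a fixed-position primitive field inside a
// record byte array, in the spirit of Copybook.setFieldValueByName.
object FieldSetterSketch {
  type Encoder = Any => Array[Byte]

  def setPrimitiveField(record: Array[Byte],
                        offset: Int,
                        length: Int,
                        encoder: Encoder,
                        value: Any): Unit = {
    val encoded = encoder(value)
    require(encoded.length == length,
      s"Encoded size ${encoded.length} does not match field size $length")
    // Overwrite the field's byte range in place; the rest of the record is untouched
    System.arraycopy(encoded, 0, record, offset, length)
  }
}
```

A caller would resolve the field by name first, then delegate to a routine like this with the field's resolved offset, length, and encoder.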
Encoder Integration for Primitives
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala
Integrated encoder selection and assignment into primitive AST construction; extended Primitive case class with encoder and metadata fields; updated imports and documentation.
EncoderSelector & Code Page Mapping
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala
Introduced EncoderSelector object for COBOL data encoding; added ASCII-to-EBCDIC mapping in CodePageCommon.
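To make the encoding direction concrete, here is a minimal ASCII-to-EBCDIC encoder for uppercase letters, digits, and space, using the standard "common" EBCDIC code points (e.g. 'A' = 0xC1, '0' = 0xF0, space = 0x40). The full table in CodePageCommon covers all 256 positions; this partial mapping is for illustration only.

```scala
// Minimal sketch of fixed-length EBCDIC string encoding with space padding,
// analogous in spirit to what EncoderSelector produces for AlphaNumeric fields.
object EbcdicEncoderSketch {
  def asciiCharToEbcdic(c: Char): Byte = c match {
    case ' '                       => 0x40.toByte
    case d if d >= '0' && d <= '9' => (0xF0 + (d - '0')).toByte
    case u if u >= 'A' && u <= 'I' => (0xC1 + (u - 'A')).toByte
    case u if u >= 'J' && u <= 'R' => (0xD1 + (u - 'J')).toByte
    case u if u >= 'S' && u <= 'Z' => (0xE2 + (u - 'S')).toByte
    case _                         => 0x40.toByte // fall back to space for unmapped characters
  }

  /** Encodes a string into a fixed-length field, truncating or padding with EBCDIC spaces. */
  def encodeFixed(s: String, fieldLength: Int): Array[Byte] = {
    val bytes = s.take(fieldLength).map(asciiCharToEbcdic).toArray
    bytes ++ Array.fill[Byte](fieldLength - bytes.length)(0x40.toByte)
  }
}
```

Note the gaps in the EBCDIC letter ranges (A-I, J-R, S-Z) inherited from punched-card zones, which is why a simple arithmetic offset does not work across the whole alphabet.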
Record Processing & Handlers
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/ArrayOfAnyHandler.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/MapOfAnyHandler.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala
Added new handler classes for array/map record structures; introduced trait for raw record processing; implemented builder for record processing pipelines; added stream processor for record transformation.
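The processor/driver split can be sketched as follows. The trait and loop below mirror the concepts named in the PR (RawRecordProcessor, StreamProcessor) but use simplified, illustrative signatures; the real implementations work over SimpleStream and a RawRecordExtractor rather than a plain iterator.

```scala
// Hypothetical sketch: a processor transforms each raw record given its byte
// offset, and a driver loop feeds records through it while tracking offsets.
trait RawRecordProcessorSketch {
  def processRecord(record: Array[Byte], offset: Long): Array[Byte]
}

object StreamProcessorSketch {
  /** Applies the processor to every record, returning the outputs and total bytes consumed. */
  def processStream(records: Iterator[Array[Byte]],
                    processor: RawRecordProcessorSketch): (Seq[Array[Byte]], Long) = {
    var offset = 0L
    val out = Seq.newBuilder[Array[Byte]]
    records.foreach { record =>
      out += processor.processRecord(record, offset)
      offset += record.length // advance the offset past the record just processed
    }
    (out.result(), offset)
  }
}
```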
Raw Record Extraction
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala
Added class for extracting fixed-length raw records from streams, with offset tracking and record caching.
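The core of fixed-length extraction is slicing the input into constant-size records while tracking each record's starting offset. The real FixedRecordLengthRawRecordExtractor reads lazily from a SimpleStream with caching; this stand-alone sketch over a byte array shows only the slicing and offset arithmetic.

```scala
// Illustrative fixed-length record extraction with offset tracking.
object FixedLengthExtractorSketch {
  def extract(data: Array[Byte], recordLength: Int): Iterator[(Long, Array[Byte])] = {
    require(recordLength > 0, "record length must be positive")
    data.grouped(recordLength)
      .zipWithIndex
      .collect {
        // Drop a trailing partial record shorter than the fixed length
        case (rec, i) if rec.length == recordLength =>
          (i.toLong * recordLength, rec)
      }
  }
}
```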
Reader Parameter Package Refactoring
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala
Changed package declarations from spark.cobol.parameters to cobol.reader.parameters; removed redundant imports.
Stream Copying Support
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala,
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala
Added copyStream() method to SimpleStream trait and implemented it in FSStream.
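The copyStream() contract can be illustrated with a simplified trait and an in-memory implementation: a copy is a fresh, independent stream over the same source, so a consumer can probe ahead (e.g. to inspect headers) without disturbing the primary stream's position. The trait below is a stand-in, not the actual Cobrix SimpleStream interface.

```scala
// Sketch of a copyable stream: copies restart from the beginning of the
// same underlying data and advance independently of the original.
trait CopyableStreamSketch {
  def next(numberOfBytes: Int): Array[Byte]
  def copyStream(): CopyableStreamSketch
}

class ByteArrayStreamSketch(data: Array[Byte]) extends CopyableStreamSketch {
  private var pos = 0

  def next(numberOfBytes: Int): Array[Byte] = {
    val chunk = data.slice(pos, pos + numberOfBytes)
    pos += chunk.length
    chunk
  }

  // A copy is a brand-new stream over the same bytes, with its own position
  def copyStream(): CopyableStreamSketch = new ByteArrayStreamSketch(data)
}
```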
Test Stream Copying
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala,
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala,
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala
Implemented copyStream() in test stream classes for testability.
Copybook Field Value Tests
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala
Updated tests to use encoder in Primitive construction; added test for setFieldValueByName.
Record Processor Builder Tests
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala
Added comprehensive test suite for RecordProcessorBuilder, including processing, schema, and extractor tests.
Spark-Cobol: Parameter Import Refactoring
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/RddReaderParams.scala,
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala,
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala,
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala,
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala,
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSourceSpec.scala,
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala
Unified and corrected imports for parameter-related classes and objects, replacing spark.cobol.parameters with cobol.reader.parameters.
Spark-Cobol: File Stream Copying
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala
Added copyStream() method to FileStreamer for stream duplication.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Builder as RecordProcessorBuilder
    participant Stream as SimpleStream
    participant Extractor as RawRecordExtractor
    participant Processor as RawRecordProcessor
    participant Output as OutputStream

    User->>Builder: set copybookContents, options
    User->>Builder: process(Stream, Output)(Processor)
    Builder->>Extractor: getRecordExtractor(options, Stream)
    Builder->>StreamProcessor: processStream(copybook, options, Stream, Extractor, Processor, Output)
    StreamProcessor->>Extractor: hasNext/next()
    StreamProcessor->>Processor: processRecord(copybook, options, record, offset)
    Processor-->>StreamProcessor: processedRecord
    StreamProcessor->>Output: write(processedRecord)
    StreamProcessor->>Stream: advance to next record
    StreamProcessor->>Output: write footer
sequenceDiagram
    participant Copybook
    participant User

    User->>Copybook: setFieldValueByName(fieldName, recordBytes, value)
    Copybook->>Primitive: getFieldByName(fieldName)
    Copybook->>Primitive: setPrimitiveField(Primitive, recordBytes, value)
    Primitive->>EncoderSelector: encode(value)
    EncoderSelector-->>Primitive: encodedBytes
    Primitive->>recordBytes: update bytes
    Copybook-->>User: updated recordBytes

    User->>Copybook: getFieldValueByName(fieldName, recordBytes)
    Copybook->>Primitive: getFieldByName(fieldName)
    Copybook->>Primitive: extractPrimitiveField(Primitive, recordBytes)
    Primitive->>Decoder: decode(bytes)
    Decoder-->>Primitive: value
    Copybook-->>User: value

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit with code in its paws,
Hopped through the fields of COBOL's laws.
With encoders and streams,
And new handler schemes,
Now bytes dance in orderly rows!
Copybooks sing,
As new features spring—
Oh, what a joy when the carrot bell rings! 🥕


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d99038f and bfe7d70.

📒 Files selected for processing (3)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala
🚧 Files skipped from review as they are similar to previous changes (2)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)

github-actions bot commented Jul 29, 2025

JaCoCo code coverage report - 'cobol-parser'

Overall Project 91.64% -0.92% 🍏
Files changed 77.69% 🍏

File Coverage
NonTerminalsAdder.scala 100% 🍏
CodePageCommon.scala 100% 🍏
SimpleStream.scala 100% 🍏
EncoderSelector.scala 98.95% -1.05% 🍏
FixedRecordLengthRawRecordExtractor.scala 95.19% 🍏
RecordProcessorBuilder.scala 90.45% -9.55% 🍏
ParserVisitor.scala 89.49% 🍏
Copybook.scala 81.71% -7.39% 🍏
StreamProcessor.scala 78.21% -21.79% 🍏
Primitive.scala 74.95% 🍏
ArrayOfAnyHandler.scala 18.75% -81.25%
MapOfAnyHandler.scala 0%
FSStream.scala 0% -5.26%

github-actions bot commented Jul 29, 2025

JaCoCo code coverage report - 'spark-cobol'

File Coverage [82.48%] 🍏
CobolSchema.scala 94.54% 🍏
FileStreamer.scala 93.98% 🍏
DefaultSource.scala 90.81% 🍏
RddReaderParams.scala 90.13% 🍏
CobolParametersValidator.scala 74.48%
CobolStreamer.scala 0%
Total Project Coverage 79.89% 🍏


coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (8)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)

27-32: Consolidate duplicated imports from CobolParametersParser

You import both the whole object (CobolParametersParser) and a single member (getReaderProperties). One of these is redundant:

-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParametersParser, Parameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser.getReaderProperties
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParametersParser => CP, Parameters}
+import CP.getReaderProperties

A single aliased import keeps the namespace flat and avoids accidental shadowing.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1)

24-26: Minor import duplication

CobolParametersParser is imported twice (line 24 and via the wildcard in line 25). Remove one to silence IntelliJ/Scalafix “unused import” warnings on some setups.

-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser, Parameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser._
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser => CP, Parameters}
+import CP._
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/RddReaderParams.scala (1)

19-21: Tidy up duplicate CobolParametersParser imports

Same duplication pattern as in DefaultSource.scala. Consider folding into a single aliased import:

-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser, Parameters, ReaderParameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser._
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser => CP, Parameters, ReaderParameters}
+import CP._

This is only a style tidy-up; functionality is unaffected.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)

19-21: Remove unused imports.

The imports for FSStream and FileNotFoundException don't appear to be used anywhere in this class.

-import za.co.absa.cobrix.cobol.reader.stream.{FSStream, SimpleStream}
+import za.co.absa.cobrix.cobol.reader.stream.SimpleStream

-import java.io.FileNotFoundException
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (1)

93-110: Fix documentation inconsistency.

The method documentation states @return The value of the field but the method signature returns Unit. The documentation should be corrected to reflect that this is a setter method.

-    * @return The value of the field
+    * This method modifies the record in place and does not return a value.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2)

28-37: Document current encoding limitations.

The getEncoder method is well-structured but currently only supports AlphaNumeric fields without compact usage. Consider adding documentation to clarify these limitations and future expansion plans.

  def getEncoder(dataType: CobolType,
                 ebcdicCodePage: CodePage = new CodePageCommon,
-                asciiCharset: Charset = StandardCharsets.US_ASCII): Option[Encoder] = {
+                asciiCharset: Charset = StandardCharsets.US_ASCII): Option[Encoder] = {
+    // Currently only supports AlphaNumeric fields without compact usage
     dataType match {

39-56: Consider implementing ASCII encoding or documenting the limitation.

The getStringEncoder method only implements EBCDIC encoding while returning None for ASCII. Consider either implementing ASCII encoding or adding documentation explaining why it's not supported.

For ASCII encoding, you could add:

      case ASCII =>
-        None
+        val encoder = (a: Any) => {
+          val str = a.toString
+          val bytes = str.getBytes(asciiCharset)
+          if (bytes.length > fieldLength) {
+            java.util.Arrays.copyOf(bytes, fieldLength)
+          } else {
+            val padded = new Array[Byte](fieldLength)
+            System.arraycopy(bytes, 0, padded, 0, bytes.length)
+            padded
+          }
+        }
+        Option(encoder)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (1)

107-125: Remove unused type parameter and consider stream efficiency.

The method handles extractor creation well with appropriate fallback logic, but has minor issues:

  1. The ClassTag type parameter T is unused and should be removed
  2. Multiple stream copies are created which may impact performance for large streams
-  private[processor] def getRecordExtractor[T: ClassTag](readerParameters: ReaderParameters, inputStream: SimpleStream): RawRecordExtractor = {
+  private[processor] def getRecordExtractor(readerParameters: ReaderParameters, inputStream: SimpleStream): RawRecordExtractor = {

The multiple stream copies pattern appears intentional for the reader architecture, but consider documenting why multiple copies are necessary.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e993e75 and 2cad3de.

📒 Files selected for processing (29)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/ArrayOfAnyHandler.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/MapOfAnyHandler.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (2 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (2 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala (3 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (2 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/RddReaderParams.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (2 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSourceSpec.scala (1 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (12)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (3)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (5)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (3)
  • classOf (36-48)
  • classOf (50-56)
  • classOf (58-61)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (3)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (2)
  • ByteStreamMock (23-57)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (3)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)
  • cobrix (132-139)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/index/IndexBuilder.scala (10)
  • cobol (68-107)
  • cobol (113-133)
  • cobol (138-153)
  • cobol (155-175)
  • cobol (177-199)
  • cobol (201-265)
  • cobol (267-287)
  • cobol (292-297)
  • cobol (302-314)
  • cobol (316-318)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1)
  • CobolParametersParser (39-986)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (3)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (4)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (1)
  • FSStream (21-62)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (3)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala (1)
  • CobolParameters (67-109)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1)
  • CobolParametersParser (39-986)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1)
  • Parameters (27-98)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (4)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)
  • copyStream (54-56)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)
  • copyStream (53-55)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)
  • copyStream (52-54)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)
  • copyStream (114-116)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (7)
  • parser (114-168)
  • parser (170-179)
  • parser (181-190)
  • parser (193-254)
  • parser (256-277)
  • DecoderSelector (30-352)
  • getDecoder (57-75)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2)
  • EncoderSelector (25-80)
  • getEncoder (28-37)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (5)
  • processor (97-99)
  • processor (101-105)
  • processor (107-125)
  • processor (127-127)
  • options (65-68)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (1)
  • processRecord (40-42)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1)
  • isEndOfStream (30-30)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (3)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/AlphaNumeric.scala (1)
  • AlphaNumeric (28-36)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (3)
  • CodePageCommon (24-26)
  • CodePageCommon (28-82)
  • asciiToEbcdicMapping (62-81)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/Encoding.scala (2)
  • ASCII (25-25)
  • EBCDIC (23-23)
🔇 Additional comments (41)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1)

17-17: LGTM! Clean package refactoring.

The package move from za.co.absa.cobrix.spark.cobol.parameters to za.co.absa.cobrix.cobol.reader.parameters aligns with separating reader-specific parameters from Spark integration code, supporting the library routine functionality.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1)

17-17: LGTM! Consistent package restructuring.

The package move aligns with the refactoring to make parameter parsing available as a library routine, consistent with the change in Parameters.scala.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala (1)

20-20: LGTM! Import updated for package refactoring.

The import correctly reflects the new package location after moving parameter classes to za.co.absa.cobrix.cobol.reader.parameters.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala (1)

25-25: LGTM! Import updated for package refactoring.

The import correctly updates to use the new package location for CobolParametersParser constants and methods used in validation.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (1)

28-28: LGTM! Final import updated for package refactoring.

The import correctly reflects the new package location, completing the consistent refactoring across all files that use CobolParametersParser constants.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSourceSpec.scala (1)

29-29: Import path update looks correct

The new package path reflects the recent refactor and should compile fine. No other adjustments needed in this test file.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1)

19-24: Verify immediate header stream closure is intentional.

The constructor immediately closes the header stream on line 24. Ensure this is the intended behavior and that the header stream won't be needed later in the extraction process.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1)

32-34: LGTM: Clean addition of stream duplication contract.

The addition of the abstract copyStream() method establishes a clear contract for stream duplication across all implementations. The method signature is appropriate with proper exception handling annotation.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)

52-54: LGTM: Consistent implementation of stream duplication.

The copyStream() implementation correctly creates a new instance with the same underlying string data, following the established pattern across other stream implementations.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)

54-56: LGTM: Consistent stream duplication implementation.

The copyStream() method correctly creates a new instance with the same byte array, maintaining consistency with other stream implementations.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)

114-116: LGTM: Proper stream duplication with all parameters preserved.

The copyStream() implementation correctly creates a new FileStreamer instance with all the same parameters (filePath, fileSystem, startOffset, maximumBytes), ensuring the duplicated stream has equivalent configuration.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (2)

19-19: LGTM! Import addition is consistent with method signature.

The FileNotFoundException import aligns with the @throws annotation on the new copyStream() method.


58-61: LGTM! Stream copying implementation is correct.

The copyStream() method correctly creates a new FSStream instance with the same fileName, following the established pattern used in other stream implementations. The @throws(classOf[FileNotFoundException]) annotation is appropriate since file access could fail.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (2)

23-23: LGTM! Import addition supports encoding functionality.

The EncoderSelector import is necessary for the new encoder assignment in the transform method.


77-85: LGTM! Encoder assignment is consistent with decoder pattern.

The encoder is properly obtained using EncoderSelector.getEncoder() with the same data type and code page parameters as the decoder, and correctly passed to the Primitive constructor. This maintains consistency with the bidirectional encoding/decoding capability being added.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)

53-55: LGTM! Stream copying implementation is correct.

The copyStream() method correctly creates a new TestByteStream instance with the same bytes array, maintaining the established pattern for stream duplication across different stream implementations.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (1)

59-81: Confirm full bidirectional mapping correctness

The table matches expected EBCDIC codes for key ASCII bytes (0x20→0x40, 0x30→0xF0, 0x41→0xC1, 0x61→0x81), but an automated invertibility check could not extract the complete ebcdicToAsciiMapping and reported an out-of-bounds access. Please ensure there are no gaps in the 256-entry arrays:

  • Manually verify that for every byte x in 0..255:
    val e = asciiToEbcdicMapping(x) & 0xFF
    val back = ebcdicToAsciiMapping(e) & 0xFF
    assert(back == x)
  • Add a unit test covering the full round-trip for all 256 values to catch any missing or mis-mapped entries.
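The suggested round-trip test can be sketched in a self-contained form; a random permutation and its computed inverse stand in here for the actual `CodePageCommon` tables, to show only the shape of the check:

```scala
import scala.util.Random

// Stand-ins for asciiToEbcdicMapping / ebcdicToAsciiMapping: a random 256-entry
// permutation and its inverse, built so the round-trip property holds by construction.
val rnd = new Random(42)
val asciiToEbcdic: Array[Byte] = rnd.shuffle((0 to 255).toVector).map(_.toByte).toArray
val ebcdicToAscii: Array[Byte] = {
  val inv = new Array[Byte](256)
  for (x <- 0 until 256) inv(asciiToEbcdic(x) & 0xFF) = x.toByte
  inv
}

// The invariant from the comment above: every byte value must survive the round trip.
for (x <- 0 until 256) {
  val e = asciiToEbcdic(x) & 0xFF
  val back = ebcdicToAscii(e) & 0xFF
  assert(back == x, s"Round trip failed for byte $x")
}
```

Running the same loop against the real mapping arrays would catch any missing or mis-mapped entries in either table.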

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)

26-26: LGTM! Import addition supports encoding functionality.

The EncoderSelector import is necessary for the new encoder assignment in the visitPrimitive method.


858-860: LGTM! Encoder integration is consistent and well-implemented.

The Primitive constructor now properly receives both decoder and encoder instances. The encoder uses the same effective code pages (effectiveEbcdicCodePage and effectiveAsciiCharset) as the decoder, ensuring consistency in encoding/decoding operations. This maintains the bidirectional conversion capability being added to the system.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (1)

21-31: Well-designed functional interface.

The trait follows the strategy pattern effectively, providing a clean contract for record processing implementations. The method signature includes all necessary context (copybook, options, raw record data, and offset) for flexible processing logic.
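A minimal stand-in for such a strategy interface can make the contract concrete. The real `RawRecordProcessor` also receives the parsed `Copybook`; this sketch keeps only the shape described above:

```scala
// Hypothetical simplification of the strategy contract: options, raw bytes, and the
// record's offset in the stream go in; transformed bytes come out.
trait RawRecordProcessor {
  def processRecord(options: Map[String, String], record: Array[Byte], offset: Long): Array[Byte]
}

// Example strategy: increment every byte, similar in spirit to the transform used
// in the builder test suite reviewed below.
val incrementing = new RawRecordProcessor {
  override def processRecord(options: Map[String, String],
                             record: Array[Byte],
                             offset: Long): Array[Byte] =
    record.map(b => (b + 1).toByte)
}

assert(incrementing.processRecord(Map.empty, Array[Byte](1, 2), 0L).toList == List[Byte](2, 3))
```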

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala (3)

23-23: LGTM: Import updated for encoder support.

The addition of EncoderSelector to the import aligns with the new encoding functionality being added to the codebase.


162-163: LGTM: Primitive construction updated for encoder support.

The constructor call correctly includes the new encoder parameter obtained via EncoderSelector.getEncoder(dataType), maintaining consistency with the updated Primitive class definition.


208-219: LGTM: Comprehensive test for field value mutation.

The new test properly validates the setFieldValueByName functionality by:

  • Setting a new field value
  • Verifying the updated value is correctly retrieved
  • Checking that another field's encoder is empty as expected

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/ArrayOfAnyHandler.scala (1)

22-33: LGTM: Clean RecordHandler implementation for arrays.

The implementation correctly fulfills the RecordHandler[Array[Any]] contract with straightforward method implementations:

  • create returns the input array directly (appropriate for array representation)
  • toSeq and foreach provide expected collection operations
  • Clear documentation explains the array-based approach

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/MapOfAnyHandler.scala (1)

22-40: LGTM: Well-implemented map-based record handler.

The implementation correctly transforms COBOL group structures into maps with thoughtful design:

  • Maps group children names to their values using zip operation
  • Properly handles Array[Any] to Seq conversion for nested structures
  • toSeq and foreach operations work on map values as expected
  • Clear documentation explains the map-based field representation approach
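The zip-based transformation can be sketched in isolation. `GroupLike` is a hypothetical stand-in for the parser's Group AST node, and `create` mimics the handler's field-mapping step under that assumption:

```scala
// Hypothetical stand-in for a COBOL group node carrying its children's names.
case class GroupLike(childrenNames: Seq[String])

// Map children names to their decoded values; nested Array[Any] values become Seq,
// mirroring the conversion the handler performs for nested structures.
def create(values: Array[Any], group: GroupLike): Map[String, Any] =
  group.childrenNames.zip(values.map {
    case arr: Array[Any] => arr.toSeq
    case other           => other
  }).toMap

val record = create(Array[Any]("John", 42), GroupLike(Seq("NAME", "AGE")))
assert(record == Map("NAME" -> "John", "AGE" -> 42))
```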

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (3)

28-54: LGTM: Comprehensive end-to-end processing test.

The test effectively validates the complete processing pipeline:

  • Uses a realistic custom RawRecordProcessor that transforms each byte
  • Verifies the processed output matches expected transformed values
  • Tests the integration between builder, processor, and stream handling

56-74: LGTM: Good coverage of builder configuration methods.

The tests properly validate:

  • Schema extraction from copybook contents
  • Reader parameter generation with custom options
  • Option handling and retrieval

76-116: LGTM: Thorough testing of record extractor creation.

The test suite comprehensively covers:

  • Fixed-length record extractor creation and iteration
  • Variable-length record extractor creation
  • Error handling for unsupported configurations with descriptive error messages
  • Different record format scenarios

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (4)

20-20: LGTM: Import addition supports new encoding functionality.

The addition of EncoderSelector to the imports is necessary for the new encode field and is appropriately grouped with related decoder imports.


24-40: Excellent documentation enhancement.

The comprehensive parameter documentation significantly improves code readability and maintainability. The documentation clearly explains the purpose of each parameter, including the new encode field for bidirectional data conversion.


56-56: Well-designed encoder field addition.

The optional encode field appropriately mirrors the existing decode field pattern and enables bidirectional data conversion. Making it optional is a sound design decision as not all primitive fields may require encoding capabilities.


111-111: Note: @throws annotation removed.

The @throws annotation was removed from the decodeTypeValue method signature. Ensure this change aligns with the actual exception handling behavior of the method.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2)

74-91: Well-implemented field value getter method.

The getFieldValueByName method follows good design patterns by leveraging existing getFieldByName functionality and properly validating field types. The error handling is appropriate with descriptive messages.


217-246: Excellent defensive programming in setPrimitiveField.

The method demonstrates excellent defensive programming with comprehensive validation:

  • Encoder presence validation
  • Bounds checking to prevent buffer overflows
  • Size validation to ensure data integrity
  • Clear, descriptive error messages for debugging

The use of System.arraycopy for byte manipulation is efficient and appropriate.
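The validation pattern can be shown with a self-contained sketch; the name `patchField` and its signature are illustrative, not the actual `Copybook.setPrimitiveField` API:

```scala
// Defensive in-place patch: validate the encoded field's size and bounds before
// copying it into the record buffer with System.arraycopy.
def patchField(record: Array[Byte], offset: Int, encoded: Array[Byte], expectedSize: Int): Unit = {
  if (encoded.length != expectedSize)
    throw new IllegalArgumentException(
      s"Encoded size ${encoded.length} does not match the field size $expectedSize.")
  if (offset < 0 || offset + encoded.length > record.length)
    throw new IllegalArgumentException(
      s"Field at offset $offset does not fit a record of ${record.length} bytes.")
  System.arraycopy(encoded, 0, record, offset, encoded.length)
}

val rec = Array[Byte](0, 0, 0, 0)
patchField(rec, 1, Array[Byte](7, 8), 2)
assert(rec.toList == List[Byte](0, 7, 8, 0))
```

Failing fast on size and bounds mismatches keeps a bad encoder result from silently corrupting adjacent fields in the record.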

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2)

19-26: Well-structured foundation for encoding functionality.

The Encoder type alias and imports are well-organized and provide a clean foundation for the encoding functionality. The type signature Any => Array[Byte] appropriately captures the encoding contract.


58-78: Robust EBCDIC string encoding implementation.

The encodeEbcdicString method demonstrates excellent defensive programming:

  • Proper input validation with descriptive error messages
  • Safe byte conversion using modulo arithmetic to handle negative values
  • Correct handling of strings shorter than the target length
  • Efficient implementation using while loop

The method correctly pads with zeros when the string is shorter than the target length, which is appropriate for COBOL field formatting.
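The padding and conversion behavior can be sketched in a self-contained form; `identityTable` stands in for the real 256-entry ASCII-to-EBCDIC code-page table, and the function name is illustrative:

```scala
// Convert a string through a 256-entry byte table, truncating at the target length
// and zero-padding when the input is shorter.
def encodeString(table: Array[Byte], str: String, targetLength: Int): Array[Byte] = {
  val out = new Array[Byte](targetLength) // initialized to 0x00, i.e. the padding value
  var i = 0
  while (i < str.length && i < targetLength) {
    val ascii = str.charAt(i) % 256       // modulo guards against characters above 255
    out(i) = table(ascii)
    i += 1
  }
  out
}

val identityTable = (0 until 256).map(_.toByte).toArray
assert(encodeString(identityTable, "AB", 4).toList == List[Byte](0x41, 0x42, 0, 0))
```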

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala (1)

36-64: Add StreamProcessor test coverage for edge cases

There are no existing unit tests exercising StreamProcessor.processStream, so we need explicit test cases to validate its header/footer logic and guard against negative sizes or ordering issues.

Please add tests covering:

  • Streams with no headers or footers (single record only)
  • Multiple back-to-back records with zero‐byte gaps
  • Cases where headerSize would evaluate to ≤ 0 (the header should be skipped)
  • Proper consumption order: raw header → raw record → processed record
  • Footer handling when inputStream.size - inputStream.offset is zero or positive

Target the processStream method in
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala
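The headerSize guard among these cases can be sketched as a tiny property, assuming (as described above) that the header is the gap between the current stream offset and the next record's start:

```scala
// Hypothetical helper mirroring the guard being tested: a header is only consumed
// when the next record starts strictly after the current stream offset.
def headerBytes(recordStartOffset: Long, streamOffset: Long): Int = {
  val headerSize = (recordStartOffset - streamOffset).toInt
  if (headerSize > 0) headerSize else 0 // skip the header when the gap is zero or negative
}

assert(headerBytes(100L, 90L) == 10)  // 10-byte header precedes the record
assert(headerBytes(100L, 100L) == 0)  // back-to-back records: no header
assert(headerBytes(100L, 110L) == 0)  // defensive: never negative
```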

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (4)

30-45: Well-structured builder pattern implementation.

The companion object factory method and constructor provide a clean interface for creating builder instances. Using a case-insensitive options map is appropriate for configuration management.


47-68: Excellent fluent interface for option management.

The option and options methods provide a clean fluent interface with consistent case-insensitive key handling. The implementation correctly supports method chaining and bulk option assignment.
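The case-insensitive fluent pattern can be sketched in isolation; this `OptionsBuilder` is a simplified stand-in that keeps only the option-handling part of the real builder:

```scala
import scala.collection.mutable

// Fluent builder with case-insensitive keys: every key is normalized on insertion,
// so lookups ignore case regardless of how the option was supplied.
class OptionsBuilder {
  private val opts = mutable.Map[String, String]()

  def option(key: String, value: String): OptionsBuilder = {
    opts(key.toLowerCase) = value
    this
  }

  def options(map: Map[String, String]): OptionsBuilder = {
    map.foreach { case (k, v) => option(k, v) }
    this
  }

  def get(key: String): Option[String] = opts.get(key.toLowerCase)
}

val b = new OptionsBuilder()
  .option("Record_Format", "F")
  .options(Map("ENCODING" -> "ebcdic"))

assert(b.get("record_format").contains("F"))
assert(b.get("encoding").contains("ebcdic"))
```

Normalizing keys once at insertion time, rather than at every lookup site, is what keeps bulk assignment and single-option chaining consistent.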


70-95: Excellent orchestration and resource management.

The process method effectively orchestrates the processing pipeline with proper component creation and coordination. The use of copyStream() ensures stream independence, and the try-finally block provides appropriate resource cleanup.


97-105: Clean helper methods with appropriate visibility.

The helper methods getCobolSchema and getReaderParameters provide clean abstractions and are appropriately marked as package-private for testing access. The implementation correctly leverages existing parsing infrastructure.

@yruslan yruslan merged commit f899488 into master Jul 31, 2025
7 checks passed
@yruslan yruslan deleted the feature/769-ebcdic-processor branch July 31, 2025 09:31