#769 Add EBCDIC processor as a library routine #771
Walkthrough

This update introduces new encoding and record-handling capabilities to the COBOL parser and processor modules. Key changes include the addition of an encoder selection mechanism, new public methods for field value manipulation in copybook records, and several new classes and traits for processing raw records and streams. Numerous test and package import adjustments were made to support these enhancements and maintain consistency across the codebase.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Builder as RecordProcessorBuilder
    participant Stream as SimpleStream
    participant Extractor as RawRecordExtractor
    participant Processor as RawRecordProcessor
    participant Output as OutputStream
    User->>Builder: set copybookContents, options
    User->>Builder: process(Stream, Output)(Processor)
    Builder->>Extractor: getRecordExtractor(options, Stream)
    Builder->>StreamProcessor: processStream(copybook, options, Stream, Extractor, Processor, Output)
    StreamProcessor->>Extractor: hasNext/next()
    StreamProcessor->>Processor: processRecord(copybook, options, record, offset)
    Processor-->>StreamProcessor: processedRecord
    StreamProcessor->>Output: write(processedRecord)
    StreamProcessor->>Stream: advance to next record
    StreamProcessor->>Output: write footer
```

```mermaid
sequenceDiagram
    participant Copybook
    participant User
    User->>Copybook: setFieldValueByName(fieldName, recordBytes, value)
    Copybook->>Primitive: getFieldByName(fieldName)
    Copybook->>Primitive: setPrimitiveField(Primitive, recordBytes, value)
    Primitive->>EncoderSelector: encode(value)
    EncoderSelector-->>Primitive: encodedBytes
    Primitive->>recordBytes: update bytes
    Copybook-->>User: updated recordBytes
    User->>Copybook: getFieldValueByName(fieldName, recordBytes)
    Copybook->>Primitive: getFieldByName(fieldName)
    Copybook->>Primitive: extractPrimitiveField(Primitive, recordBytes)
    Primitive->>Decoder: decode(bytes)
    Decoder-->>Primitive: value
    Copybook-->>User: value
```
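Taken together, the two diagrams suggest usage along these lines. This is a minimal sketch: the builder, processor, and copybook method names come from the diagrams above, while the copybook text, field name, and option key are hypothetical.

```scala
import java.io.ByteArrayOutputStream
import za.co.absa.cobrix.cobol.parser.Copybook
import za.co.absa.cobrix.cobol.processor.{RawRecordProcessor, RecordProcessorBuilder}
import za.co.absa.cobrix.cobol.reader.stream.SimpleStream

object UsageSketch {
  // Hypothetical copybook with a single 4-byte text field
  val copybookContents: String =
    """       01  RECORD.
      |          05  FIELD1    PIC X(4).
      |""".stripMargin

  // Flow 1: stream-level processing (first diagram)
  val passThrough: RawRecordProcessor = new RawRecordProcessor {
    override def processRecord(copybook: Copybook,
                               options: Map[String, String],
                               record: Array[Byte],
                               offset: Long): Array[Byte] =
      record // identity transform; a real processor would re-encode fields here
  }

  def processAll(inputStream: SimpleStream): Array[Byte] = {
    val output = new ByteArrayOutputStream()
    RecordProcessorBuilder
      .copybookContents(copybookContents) // factory per the diagram's "set copybookContents"
      .option("record_format", "F")       // hypothetical option key
      .process(inputStream, output)(passThrough)
    output.toByteArray
  }

  // Flow 2: field-level get/set on a raw record (second diagram)
  // copybook.setFieldValueByName("FIELD1", recordBytes, "ABCD")
  // val value = copybook.getFieldValueByName("FIELD1", recordBytes)
}
```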
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
JaCoCo code coverage report - 'cobol-parser'

JaCoCo code coverage report - 'spark-cobol'
Actionable comments posted: 3
🧹 Nitpick comments (8)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)

27-32: Consolidate duplicated imports from `CobolParametersParser`

You import both the whole object (`CobolParametersParser`) and a single member (`getReaderProperties`). One of these is redundant:

```diff
-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParametersParser, Parameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser.getReaderProperties
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParametersParser => CP, Parameters}
+import CP.getReaderProperties
```

A single aliased import keeps the namespace flat and avoids accidental shadowing.
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1)

24-26: Minor import duplication

`CobolParametersParser` is imported twice (line 24 and via the wildcard in line 25). Remove one to silence IntelliJ/Scalafix "unused import" warnings on some setups.

```diff
-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser, Parameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser._
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser => CP, Parameters}
+import CP._
```

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/RddReaderParams.scala (1)
19-21: Tidy up duplicate `CobolParametersParser` imports

Same duplication pattern as in `DefaultSource.scala`. Consider folding into a single aliased import:

```diff
-import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser, Parameters, ReaderParameters}
-import za.co.absa.cobrix.cobol.reader.parameters.CobolParametersParser._
+import za.co.absa.cobrix.cobol.reader.parameters.{CobolParameters, CobolParametersParser => CP, Parameters, ReaderParameters}
+import CP._
```

This is only a style tidy-up; functionality is unaffected.
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)

19-21: Remove unused imports.

The imports for `FSStream` and `FileNotFoundException` don't appear to be used anywhere in this class.

```diff
-import za.co.absa.cobrix.cobol.reader.stream.{FSStream, SimpleStream}
+import za.co.absa.cobrix.cobol.reader.stream.SimpleStream
-import java.io.FileNotFoundException
```

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (1)
93-110: Fix documentation inconsistency.

The method documentation states `@return The value of the field`, but the method signature returns `Unit`. The documentation should be corrected to reflect that this is a setter method.

```diff
- * @return The value of the field
+ * This method modifies the record in place and does not return a value.
```

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2)
28-37: Document current encoding limitations.

The `getEncoder` method is well-structured but currently only supports AlphaNumeric fields without compact usage. Consider adding documentation to clarify these limitations and future expansion plans.

```diff
 def getEncoder(dataType: CobolType, ebcdicCodePage: CodePage = new CodePageCommon,
                asciiCharset: Charset = StandardCharsets.US_ASCII): Option[Encoder] = {
+  // Currently only supports AlphaNumeric fields without compact usage
   dataType match {
```
39-56: Consider implementing ASCII encoding or documenting the limitation.

The `getStringEncoder` method only implements EBCDIC encoding while returning `None` for ASCII. Consider either implementing ASCII encoding or adding documentation explaining why it's not supported.

For ASCII encoding, you could add:

```diff
 case ASCII =>
-  None
+  val encoder = (a: Any) => {
+    val str = a.toString
+    val bytes = str.getBytes(asciiCharset)
+    if (bytes.length > fieldLength) {
+      java.util.Arrays.copyOf(bytes, fieldLength)
+    } else {
+      val padded = new Array[Byte](fieldLength)
+      System.arraycopy(bytes, 0, padded, 0, bytes.length)
+      padded
+    }
+  }
+  Option(encoder)
```

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (1)
107-125: Remove unused type parameter and consider stream efficiency.

The method handles extractor creation well with appropriate fallback logic, but has minor issues:

- The `ClassTag` type parameter `T` is unused and should be removed
- Multiple stream copies are created which may impact performance for large streams

```diff
-  private[processor] def getRecordExtractor[T: ClassTag](readerParameters: ReaderParameters, inputStream: SimpleStream): RawRecordExtractor = {
+  private[processor] def getRecordExtractor(readerParameters: ReaderParameters, inputStream: SimpleStream): RawRecordExtractor = {
```

The multiple stream copies pattern appears intentional for the reader architecture, but consider documenting why multiple copies are necessary.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (2 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (2 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/ArrayOfAnyHandler.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/MapOfAnyHandler.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (2 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (2 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala (3 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (1 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (2 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/RddReaderParams.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (2 hunks)
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSourceSpec.scala (1 hunks)
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (12)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (3)

- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1): copyStream (54-56)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1): copyStream (53-55)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1): copyStream (114-116)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (5)

- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (3): classOf (36-48), classOf (50-56), classOf (58-61)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1): copyStream (54-56)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1): copyStream (53-55)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1): copyStream (52-54)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1): copyStream (114-116)

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (3)

- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (2): ByteStreamMock (23-57), copyStream (54-56)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1): copyStream (52-54)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1): copyStream (114-116)

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (3)

- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1): cobrix (132-139)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/index/IndexBuilder.scala (10): cobol (68-107), cobol (113-133), cobol (138-153), cobol (155-175), cobol (177-199), cobol (201-265), cobol (267-287), cobol (292-297), cobol (302-314), cobol (316-318)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1): CobolParametersParser (39-986)

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (3)

- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1): copyStream (54-56)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1): copyStream (53-55)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1): copyStream (52-54)

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (4)

- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (1): FSStream (21-62)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1): copyStream (53-55)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1): copyStream (52-54)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1): copyStream (114-116)

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (3)

- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala (1): CobolParameters (67-109)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1): CobolParametersParser (39-986)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1): Parameters (27-98)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (4)

- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1): copyStream (54-56)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1): copyStream (53-55)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1): copyStream (52-54)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1): copyStream (114-116)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)

- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala (7): parser (114-168), parser (170-179), parser (181-190), parser (193-254), parser (256-277), DecoderSelector (30-352), getDecoder (57-75)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2): EncoderSelector (25-80), getEncoder (28-37)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (2)

- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (5): processor (97-99), processor (101-105), processor (107-125), processor (127-127), options (65-68)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (1): processRecord (40-42)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1)

- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1): isEndOfStream (30-30)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (3)

- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/AlphaNumeric.scala (1): AlphaNumeric (28-36)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (3): CodePageCommon (24-26), CodePageCommon (28-82), asciiToEbcdicMapping (62-81)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/Encoding.scala (2): ASCII (25-25), EBCDIC (23-23)
🔇 Additional comments (41)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/Parameters.scala (1)

17-17: LGTM! Clean package refactoring.

The package move from `za.co.absa.cobrix.spark.cobol.parameters` to `za.co.absa.cobrix.cobol.reader.parameters` aligns with separating reader-specific parameters from Spark integration code, supporting the library routine functionality.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala (1)

17-17: LGTM! Consistent package restructuring.

The package move aligns with the refactoring to make parameter parsing available as a library routine, consistent with the change in `Parameters.scala`.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala (1)

20-20: LGTM! Import updated for package refactoring.

The import correctly reflects the new package location after moving parameter classes to `za.co.absa.cobrix.cobol.reader.parameters`.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala (1)

25-25: LGTM! Import updated for package refactoring.

The import correctly updates to use the new package location for `CobolParametersParser` constants and methods used in validation.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/CobolStreamer.scala (1)

28-28: LGTM! Final import updated for package refactoring.

The import correctly reflects the new package location, completing the consistent refactoring across all files that use `CobolParametersParser` constants.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSourceSpec.scala (1)

29-29: Import path update looks correct

The new package path reflects the recent refactor and should compile fine. No other adjustments needed in this test file.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedRecordLengthRawRecordExtractor.scala (1)
19-24: Verify immediate header stream closure is intentional.

The constructor immediately closes the header stream on line 24. Ensure this is the intended behavior and that the header stream won't be needed later in the extraction process.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/SimpleStream.scala (1)
32-34: LGTM: Clean addition of stream duplication contract.

The addition of the abstract `copyStream()` method establishes a clear contract for stream duplication across all implementations. The method signature is appropriate with proper exception handling annotation.
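The implementations reviewed below all follow the same pattern: a copy is a fresh, independent stream constructed over the same source. A minimal self-contained illustration of that idea (using a simplified stand-in trait, since the full `SimpleStream` member list is not shown in this review):

```scala
// Simplified stand-in for SimpleStream, for illustration only.
trait ByteStreamLike {
  def next(numberOfBytes: Int): Array[Byte]
  def copyStream(): ByteStreamLike // the new duplication contract
}

// A copy is an independent stream over the same underlying data, positioned
// back at the start -- mirroring how TestByteStream.copyStream() re-wraps
// its byte array.
class InMemoryStream(bytes: Array[Byte]) extends ByteStreamLike {
  private var pos = 0

  override def next(numberOfBytes: Int): Array[Byte] = {
    val n = math.min(numberOfBytes, bytes.length - pos)
    val chunk = java.util.Arrays.copyOfRange(bytes, pos, pos + n)
    pos += n
    chunk
  }

  override def copyStream(): ByteStreamLike = new InMemoryStream(bytes)
}
```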
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestStringStream.scala (1)

52-54: LGTM: Consistent implementation of stream duplication.

The `copyStream()` implementation correctly creates a new instance with the same underlying string data, following the established pattern across other stream implementations.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/mock/ByteStreamMock.scala (1)

54-56: LGTM: Consistent stream duplication implementation.

The `copyStream()` method correctly creates a new instance with the same byte array, maintaining consistency with other stream implementations.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala (1)

114-116: LGTM: Proper stream duplication with all parameters preserved.

The `copyStream()` implementation correctly creates a new `FileStreamer` instance with all the same parameters (`filePath`, `fileSystem`, `startOffset`, `maximumBytes`), ensuring the duplicated stream has equivalent configuration.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (2)

19-19: LGTM! Import addition is consistent with method signature.

The `FileNotFoundException` import aligns with the `@throws` annotation on the new `copyStream()` method.

58-61: LGTM! Stream copying implementation is correct.

The `copyStream()` method correctly creates a new `FSStream` instance with the same `fileName`, following the established pattern used in other stream implementations. The `@throws(classOf[FileNotFoundException])` annotation is appropriate since file access could fail.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/NonTerminalsAdder.scala (2)

23-23: LGTM! Import addition supports encoding functionality.

The `EncoderSelector` import is necessary for the new encoder assignment in the `transform` method.

77-85: LGTM! Encoder assignment is consistent with decoder pattern.

The encoder is properly obtained using `EncoderSelector.getEncoder()` with the same data type and code page parameters as the decoder, and correctly passed to the `Primitive` constructor. This maintains consistency with the bidirectional encoding/decoding capability being added.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/memorystream/TestByteStream.scala (1)

53-55: LGTM! Stream copying implementation is correct.

The `copyStream()` method correctly creates a new `TestByteStream` instance with the same `bytes` array, maintaining the established pattern for stream duplication across different stream implementations.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (1)
59-81: Confirm full bidirectional mapping correctness

The table matches expected EBCDIC codes for key ASCII bytes (0x20→0x40, 0x30→0xF0, 0x41→0xC1, 0x61→0x81), but my invertibility script couldn't extract the complete `ebcdicToAsciiMapping` and reported out-of-bounds. Please ensure there are no gaps in the 256-entry arrays:

- Manually verify that for every byte `x` in 0..255:

```scala
val e = asciiToEbcdicMapping(x) & 0xFF
val back = ebcdicToAsciiMapping(e) & 0xFF
assert(back == x)
```

- Add a unit test covering the full round-trip for all 256 values to catch any missing or mis-mapped entries (see the sketch after this list).
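A sketch of the suggested round-trip test. The two mapping accessors are referenced by this review; how they are exposed (companion vs. instance, `Array[Byte]` vs. `Array[Char]`) is an assumption here.

```scala
import org.scalatest.funsuite.AnyFunSuite
import za.co.absa.cobrix.cobol.parser.encoding.codepage.CodePageCommon

class CodePageRoundTripSuite extends AnyFunSuite {
  test("ASCII -> EBCDIC -> ASCII is the identity for all 256 byte values") {
    val codePage = new CodePageCommon
    for (x <- 0 until 256) {
      // Both accessor names below are assumptions based on the review text
      val e = CodePageCommon.asciiToEbcdicMapping(x).toInt & 0xFF
      val back = codePage.ebcdicToAsciiMapping(e).toInt & 0xFF
      assert(back == x, s"byte 0x${x.toHexString} failed the round trip")
    }
  }
}
```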
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)

26-26: LGTM! Import addition supports encoding functionality.

The `EncoderSelector` import is necessary for the new encoder assignment in the `visitPrimitive` method.

858-860: LGTM! Encoder integration is consistent and well-implemented.

The `Primitive` constructor now properly receives both decoder and encoder instances. The encoder uses the same effective code pages (`effectiveEbcdicCodePage` and `effectiveAsciiCharset`) as the decoder, ensuring consistency in encoding/decoding operations. This maintains the bidirectional conversion capability being added to the system.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RawRecordProcessor.scala (2)

21-31: Well-designed functional interface.

The trait follows the strategy pattern effectively, providing a clean contract for record processing implementations. The method signature includes all necessary context (copybook, options, raw record data, and offset) for flexible processing logic. A sketch of an implementation follows below.
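For illustration, a minimal implementation of the trait might look like this. The `processRecord` name and argument order follow the sequence diagram earlier in this review; the exact parameter types are assumptions.

```scala
import za.co.absa.cobrix.cobol.parser.Copybook
import za.co.absa.cobrix.cobol.processor.RawRecordProcessor

// Sketch: a processor that uppercases ASCII letters in each raw record.
object UppercasingProcessor extends RawRecordProcessor {
  override def processRecord(copybook: Copybook,
                             options: Map[String, String],
                             record: Array[Byte],
                             offset: Long): Array[Byte] =
    record.map { b =>
      // Shift lowercase ASCII letters to uppercase, pass other bytes through
      if (b >= 'a'.toByte && b <= 'z'.toByte) (b - 32).toByte else b
    }
}
```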
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala (3)
23-23: LGTM: Import updated for encoder support.

The addition of `EncoderSelector` to the import aligns with the new encoding functionality being added to the codebase.

162-163: LGTM: Primitive construction updated for encoder support.

The constructor call correctly includes the new encoder parameter obtained via `EncoderSelector.getEncoder(dataType)`, maintaining consistency with the updated `Primitive` class definition.

208-219: LGTM: Comprehensive test for field value mutation.

The new test properly validates the `setFieldValueByName` functionality by:

- Setting a new field value
- Verifying the updated value is correctly retrieved
- Checking that another field's encoder is empty as expected
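In usage, the new mutation API looks roughly like this. The method names and argument order follow this PR's diagrams; the copybook text and field name are illustrative.

```scala
import za.co.absa.cobrix.cobol.parser.CopybookParser

object FieldMutationSketch {
  // Illustrative copybook; parseTree is the standard cobrix parser entry point
  val copybook = CopybookParser.parseTree(
    """       01  RECORD.
      |          05  NAME   PIC X(10).
      |""".stripMargin)

  val record = new Array[Byte](copybook.getRecordSize)

  // Encode "JOHN" into NAME's byte range in place (new in this PR)
  copybook.setFieldValueByName("NAME", record, "JOHN")

  // Decode the value back out of the raw record (new in this PR)
  val name: Any = copybook.getFieldValueByName("NAME", record)
}
```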
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/ArrayOfAnyHandler.scala (1)
22-33: LGTM: Clean RecordHandler implementation for arrays.

The implementation correctly fulfills the `RecordHandler[Array[Any]]` contract with straightforward method implementations:

- `create` returns the input array directly (appropriate for array representation)
- `toSeq` and `foreach` provide expected collection operations
- Clear documentation explains the array-based approach
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/MapOfAnyHandler.scala (1)
22-40: LGTM: Well-implemented map-based record handler.

The implementation correctly transforms COBOL group structures into maps with thoughtful design:

- Maps group children names to their values using zip operation
- Properly handles Array[Any] to Seq conversion for nested structures
- `toSeq` and `foreach` operations work on map values as expected
- Clear documentation explains the map-based field representation approach
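The zip-based construction described above can be sketched as follows. This is a standalone illustration; the real `RecordHandler` trait and its type parameters are not shown in this review.

```scala
object MapHandlerSketch {
  // Zip a group's child field names with the decoded values to build the map.
  def createMap(fieldNames: Seq[String], values: Array[Any]): Map[String, Any] =
    fieldNames.zip(values.map {
      case arr: Array[Any] => arr.toSeq // nested groups become Seq for traversal
      case other           => other
    }).toMap

  // Usage: createMap(Seq("ID", "NAME"), Array(1, "JOHN"))
  //        == Map("ID" -> 1, "NAME" -> "JOHN")
}
```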
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilderSuite.scala (3)
28-54: LGTM: Comprehensive end-to-end processing test.

The test effectively validates the complete processing pipeline:

- Uses a realistic custom `RawRecordProcessor` that transforms each byte
- Verifies the processed output matches expected transformed values
- Tests the integration between builder, processor, and stream handling

56-74: LGTM: Good coverage of builder configuration methods.

The tests properly validate:

- Schema extraction from copybook contents
- Reader parameter generation with custom options
- Option handling and retrieval

76-116: LGTM: Thorough testing of record extractor creation.

The test suite comprehensively covers:

- Fixed-length record extractor creation and iteration
- Variable-length record extractor creation
- Error handling for unsupported configurations with descriptive error messages
- Different record format scenarios
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (4)
20-20: LGTM: Import addition supports new encoding functionality.

The addition of `EncoderSelector` to the imports is necessary for the new `encode` field and is appropriately grouped with related decoder imports.

24-40: Excellent documentation enhancement.

The comprehensive parameter documentation significantly improves code readability and maintainability. The documentation clearly explains the purpose of each parameter, including the new `encode` field for bidirectional data conversion.

56-56: Well-designed encoder field addition.

The optional `encode` field appropriately mirrors the existing `decode` field pattern and enables bidirectional data conversion. Making it optional is a sound design decision as not all primitive fields may require encoding capabilities.

111-111: Note: @throws annotation removed.

The `@throws` annotation was removed from the `decodeTypeValue` method signature. Ensure this change aligns with the actual exception handling behavior of the method.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala (2)
74-91: Well-implemented field value getter method.

The `getFieldValueByName` method follows good design patterns by leveraging existing `getFieldByName` functionality and properly validating field types. The error handling is appropriate with descriptive messages.

217-246: Excellent defensive programming in setPrimitiveField.

The method demonstrates excellent defensive programming with comprehensive validation:

- Encoder presence validation
- Bounds checking to prevent buffer overflows
- Size validation to ensure data integrity
- Clear, descriptive error messages for debugging

The use of `System.arraycopy` for byte manipulation is efficient and appropriate.
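As a sketch of that defensive pattern: the checks below are paraphrased from the list above, and the signature is hypothetical rather than the actual private method.

```scala
// Sketch: validate the encoder, the target byte range, and the encoded size
// before copying into the record.
def setPrimitiveField(encode: Option[Any => Array[Byte]],
                      fieldOffset: Int,
                      fieldLength: Int,
                      record: Array[Byte],
                      value: Any): Unit = {
  val encoder = encode.getOrElse(
    throw new IllegalStateException("The field does not have an encoder"))
  val encoded = encoder(value)
  require(fieldOffset >= 0 && fieldOffset + fieldLength <= record.length,
    s"Field range [$fieldOffset, ${fieldOffset + fieldLength}) is out of record bounds")
  require(encoded.length == fieldLength,
    s"Encoded size ${encoded.length} does not match the field size $fieldLength")
  System.arraycopy(encoded, 0, record, fieldOffset, encoded.length)
}
```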
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/EncoderSelector.scala (2)

19-26: Well-structured foundation for encoding functionality.

The `Encoder` type alias and imports are well-organized and provide a clean foundation for the encoding functionality. The type signature `Any => Array[Byte]` appropriately captures the encoding contract.

58-78: Robust EBCDIC string encoding implementation.

The `encodeEbcdicString` method demonstrates excellent defensive programming:

- Proper input validation with descriptive error messages
- Safe byte conversion using modulo arithmetic to handle negative values
- Correct handling of strings shorter than the target length
- Efficient implementation using while loop

The method correctly pads with zeros when the string is shorter than the target length, which is appropriate for COBOL field formatting.
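A minimal sketch of an encoder with those properties (the conversion-table parameter and the exact signature are assumptions; the real method lives in `EncoderSelector`):

```scala
// Sketch: fixed output length, zero padding, truncation, and a 256-entry
// conversion table lookup, matching the behavior described above.
def encodeEbcdicString(string: String,
                       asciiToEbcdic: Array[Byte],
                       length: Int): Array[Byte] = {
  require(length >= 0, s"Field length cannot be negative, got $length")
  val buf = new Array[Byte](length) // unset positions stay 0x00 (zero padding)
  var i = 0
  while (i < string.length && i < length) {
    // Modulo arithmetic keeps the lookup index within the 256-entry table
    val index = (string(i).toInt + 256) % 256
    buf(i) = asciiToEbcdic(index)
    i += 1
  }
  buf
}
```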
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala (1)
36-64
: Add StreamProcessor test coverage for edge casesThere are no existing unit tests exercising
StreamProcessor.processStream
, so we need explicit test cases to validate its header/footer logic and guard against negative sizes or ordering issues.Please add tests covering:
- Streams with no headers or footers (single record only)
- Multiple back-to-back records with zero‐byte gaps
- Cases where
headerSize
would calculate as ≤ 0 (should skip header)- Proper consumption order: raw header → raw record → processed record
- Footer handling when
inputStream.size - inputStream.offset
is zero or positiveTarget the
processStream
method in
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/StreamProcessor.scala
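A starting point for the first case could look like this. It is only a sketch: `TestByteStream` and `CopybookParser` are existing project classes, while the extractor construction and the exact `processStream` call are left as commented assumptions.

```scala
import java.io.ByteArrayOutputStream
import org.scalatest.funsuite.AnyFunSuite
import za.co.absa.cobrix.cobol.parser.CopybookParser
import za.co.absa.cobrix.cobol.reader.memorystream.TestByteStream

class StreamProcessorEdgeCaseSuite extends AnyFunSuite {
  test("a single record with no header or footer is passed through once") {
    val copybook = CopybookParser.parseTree(
      """       01  R.
        |          05  F   PIC X(4).
        |""".stripMargin)
    val input  = new TestByteStream("ABCD".getBytes)
    val output = new ByteArrayOutputStream()

    // The extractor and the processStream call are sketched as comments,
    // since their exact constructors are not shown in this review:
    // val extractor = new FixedRecordLengthRawRecordExtractor(...)
    // StreamProcessor.processStream(copybook, options, input, extractor,
    //                               passThroughProcessor, output)
    // assert(output.toByteArray.sameElements("ABCD".getBytes))
  }
}
```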
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/RecordProcessorBuilder.scala (4)
30-45: Well-structured builder pattern implementation.

The companion object factory method and constructor provide a clean interface for creating builder instances. Using a case-insensitive options map is appropriate for configuration management.

47-68: Excellent fluent interface for option management.

The `option` and `options` methods provide a clean fluent interface with consistent case-insensitive key handling. The implementation correctly supports method chaining and bulk option assignment.

70-95: Excellent orchestration and resource management.

The `process` method effectively orchestrates the processing pipeline with proper component creation and coordination. The use of `copyStream()` ensures stream independence, and the try-finally block provides appropriate resource cleanup.

97-105: Clean helper methods with appropriate visibility.

The helper methods `getCobolSchema` and `getReaderParameters` provide clean abstractions and are appropriately marked as package-private for testing access. The implementation correctly leverages existing parsing infrastructure.
Summary by CodeRabbit

- New Features
- Bug Fixes
- Tests
- Documentation