Conversation
Move ldi r16 to inside of set_watchdog_timer Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Instead of using a separate register for the loop counter, compare the value in the XL register with the expected end address. This saves one instruction (2 bytes). Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Reorder the descriptor handling code so that the part that falls through to `process_descriptor` is handling a single USB descriptor prefixed with its length. Does not save any bytes by itself, just prepares the code for subsequent changes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
In two cases (`send_hid_descriptor` and `send_device_descriptor`) the returned data contains just a single USB descriptor, which contains its length in bytes in the first data byte. Replace two instructions to load the descriptor lengths with one instruction to read the length from the first data byte, saving 2 bytes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
No size or behavior changes, just a preparation to save some bytes with fallthrough. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the `bRequest == 9` case; this removes one jump instruction, saving 2 bytes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
No size or behavior changes, just a preparation to save some bytes with fallthrough. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the `HANDLE_USB_CLAS_INTERFACE` case; this removes one jump instruction, saving 2 bytes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the `GET_DESCRIPTOR` case; this removes one jump instruction, saving 2 bytes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
`SET_CONFIGURATION` is basically a noop - the only thing that needs to be done is `process_Host2Device`. Instead of handling it in two instructions, reuse the `SET_ADDRESS` code for it, passing the current UDADDR value to it (so it effectively does nothing too), which can be done in a single instruction, thus saving 2 bytes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Although the subset of USB HID protocol that is actually used by the bootloader uses only the default control endpoint (0), the HID spec requires the device to have an interrupt in endpoint, and the host can poll that endpoint even when the HID report descriptor does not actually declare any input reports. Polling an unconfigured endpoint causes USB errors, which may prevent the bootloader from functioning properly. Apparently this was happening in Mac OS, making the bootloader unusable there (however, Windows and Linux did not expose the problem; on Linux it was possible to provoke these errors by opening the `/dev/hidrawN` device corresponding to the bootloader, but existing flashing tools do not use that method to access the bootloader device). Add the code to configure endpoint 1 as Interrupt IN, matching the USB descriptors; this is enough to make the USB controller generate NAK replies for that endpoint correctly, and the rest of bootloader code may continue using just endpoint 0 as before. The binary size increases by 12 bytes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Replace Windows-specific `NUL` with `/dev/null` to fix building the code in Linux (the Windows build environment requires MSYS anyway, therefore `/dev/null` should work there too). Signed-off-by: Sergey Vlasov <sigprof@gmail.com> I have cherry-picked Sergey's commits up to 79f1be5 ("Makefile: Use /dev/null instead of NUL", 2021-08-20) with linear history excluding 64ae729 ("Remove nonstandard avr-size options", 2021-08-20) onto my led branch. With the excluded commit, I observe the normal latest GCC 'size' command can be used for nanoBoot case to produce desired result as I experimented. Since we are generating $(TARGET).elf using forked avr-gcc, I chose to keep using 'avr-size'. Thus options are kept by excluding the commit. Signed-off-by: Osamu Aoki <osamu@debian.org>
Instead of looping until one of two bits in UEINTX is set, and then retesting the value after the loop, just do `reti` inside the loop if `RXSTPI` is set. This saves 4 bytes by removing both a duplicate bit test instruction and an extra jump instruction.
The `wait_TXINI` and `wait_RXOUTI` subroutines were always called after a call to some other `clear_XXX` subroutine to clear a bit in `UEINTX`. Replace those two subroutines with `clear_bit_and_wait_TXINI` and `clear_bit_and_wait_RXOUTI`, which get the bit to be cleared as a parameter in `r17`, and then inline the remaining `clear_XXX` subroutines (a subroutine with just 2 instructions in its body actually takes more space than inline code if it is called less than 3 times).
Rewrite the part of the GET_DESCRIPTOR handling code that loads the address and length of the requested descriptor to use less jumps; the resulting code consumes 2 bytes less, even though it is actually more correct (no longer replies with some descriptor to requests for an unknown descriptor type). The new code also avoids hardcoding the high address byte, and no longer depends on the fact that all descriptors have the same high address byte (but it depends on the fact that all offsets between adjacent descriptors can fit into the `adiw` constant argument (0...63)).
No size or behavior changes, just a preparation to save some bytes with fallthrough.
Restructure conditional jumps to use fallthrough for the `SET_HID_REPORT` case; this removes one jump instruction, saving 2 bytes.
The original LUFA code performed `UDCON |= (1 << DETACH);` to detach the USB device; however, the other bits of `UDCON` are known to be 0 at this time, therefore a simple write of a constant value could be performed here. In addition, the value of `(1 << DETACH)` is 1 on all AVR chips that could be potentially supported by this code, therefore even the instruction to load the constant value into a register could be omitted. Doing these changes removes 2 instructions, saving 4 bytes.
The initialization of USBCON does not need to read the current register value - just writing the reset value into that register is enough. This removes one instruction, saving 2 bytes. In theory this write could even be omitted completely, saving 4 more bytes, but this would make the code less robust if the application code attempting to enter the bootloader does not initialize some USB controller registers properly, therefore leaving that USBCON write there is safer.
The instruction which applied the mask of `(CONTROL_REQTYPE_TYPE | CONTROL_REQTYPE_RECIPIENT)` to bmRequestType was not actually needed, because the value of this mask is 0x7F, and the only bit which is not covered by the mask (0x80) is already known to be 0, therefore the masking did not actually change anything. Removing this instruction saves 2 bytes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com> Since following commits are skipped previously when cherry-picking. 97c8d96 ("TEMPORARY: EEPROM code which does not fit", 2021-08-25) e48801a ("WIP: Optimize some more code to make the EEPROM support fit", 2021-08-25) 78b804d ("WIP: Redo the EEPROM write implementation", 2021-08-29) e7e94f3 ("Save 2 bytes by reusing the TXINI clearing mask", 2021-08-29) Adjusted to keep branch to thunk code. Signed-off-by: Osamu Aoki <osamu@debian.org>
The DEVICE_TO_HOST handling code needs to handle only the GET_DESCRIPTOR
requests, which may come with two possible bmRequestType values:
- 0x80 (IN direction, USB Standard Request, Recipient is the device) if
a descriptor which applies to the device as a whole is requested (this
code is used for the device and configuration descriptors);
- 0x81 (IN direction, USB Standard Request, Recipient is the interface)
if a descriptor specific to a particular interface is requested (this
code is used for the HID class and HID report descriptors).
Because these codes are numerically sequential, and the direction bit has
already been tested (therefore it is known that bmRequestType >= 0x80), it
is enough to make a single comparison with 0x82 to detemine whether
bmRequestType has one of the above values. Removing the bit masking
operation which is not needed after that change saves 2 bytes.
…ytes Remove the code which tried to check whether the SETUP packet has been handled - now UNHANDLED_SETUP_REQUEST can be used only if the SETUP packed needs to be acknowledged and replied with STALL, and in other cases the code should just do `reti` directly. In addition, UNHANDLED_SETUP_REQUEST is placed in the middle of the code, so that it would be reachable by conditional branches directly. Signed-off-by: Sergey Vlasov <sigprof@gmail.com> This is based on 6bc21dd ("Save 6 bytes by refactoring UNHANDLED_SETUP_REQUEST usage", 2021-09-04) Adjusted for thunk entry choice differences, both of them go to UNHANDLED_SETUP_REQUEST: * UNHANDLED_SETUP_REQUEST_1 * UNHANDLED_DEVICE_TO_HOST Signed-off-by: Osamu Aoki <osamu@debian.org>
Instead of saving the initial address value in a separate register pair, and then reloading it after the flash write loop, keep the address in the Z register and subtract 128 from it after the data write loop - this should give the same result, because the loop counter is explicitly initialized by the bootloader code. However, because the `sbiw` instruction supports only the 0...63 range for its constant argument, another optimization trick is also used - instead of incrementing the whole 16-bit address, only the low byte is incremented; this should give the same result, because the block start address must be aligned to the flash page size (128 bytes), therefore any carry to the high byte should not happen within the page (it may happen just past the end of the page, but if both instruction will not perform that carry, the final result will be the same). Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
To protect the user from accidents (or even deliberate attempts to brick the device), compare the specified page address with the bootloader start address, and skip the flash write if the address could overlap with the bootloader. Handling the error case by jumping to `finish_hid_request` did not work (it returned an error, but then the bootloader stopped responding to any further USB requests); apparently it is important to consume the proper number of bytes from the USB FIFO. Because of that, the bad address case is handled by running the same loop that is used for the normal flash write case, but with all `spm` instructions disabled by setting a flag bit. Bit 7 of `reg_bRequest` is chosen for that role (that bit is guaranteed to be 0 when starting to handle a normal `SET_HID_REPORT` request without needing to add any instructions to clear it). The `START_APPLICATION` command handling is also changed to use the same code path to save space. Because of additional optimization of the address comparison and the `START_APPLICATION` handling code, the binary size is increased by just 2 bytes. Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Equivalent of: b6cb76e ("Fix a typo in the comment for set_watchdog_timer", 2021-09-05) Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
276259f to
0585969
Compare
Signed-off-by: Osamu Aoki <osamu@debian.org>
|
Hey @osamuaoki! I am really sorry I ignored this PR for so long. I have been very busy with other things (mostly work), but I finally have the time to work on nanoBoot once again. I have already merged @sigprof's PRs (#12 and #13), mainly because they were easier to review and merge. I want to incorporate your latest changes as well, but this PR is now very hard to follow, and the title ("Led support") no longer reflects all the actual changes being made. I hope you don't mind, but could you redo the PR so that it is more limited in scope to just adding LED support? I would guess it makes sense to include a lot of the size optimizations from both you and @sigprof first, so we should probably take care of that prior to merging the LED support itself. I think I need @sigprof's help to understand which branches should be merged first (looks like tmp2 and string-descriptors) are further along, but I see some commits are shared with tmp branch, so it would be good to know what has been tested that can be integrated back. I look forward to hearing back from you.... |
|
Let's keep this PR as is for now until my new PR is tested and reviewed. |
|
Since I added clean set of size reduction patches in my lednew branch (PR Lednew #16), I am asking this branch to be cloased. |
(Originally part of PR volium#10) (The original commit said "506 bytes", because there was another commit before it that added 6 more bytes to the binary size.) Signed-off-by: Osamu Aoki <osamu@debian.org> Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Here is a patch to add LED support (+6 bytes)
I also squeezed code (-6 bytes)
So it is still 506 bytes
I checked and works nicely with LED ;-)