Skip to content

Led support#10

Closed
osamuaoki wants to merge 35 commits intovolium:mainfrom
osamuaoki:led
Closed

Led support#10
osamuaoki wants to merge 35 commits intovolium:mainfrom
osamuaoki:led

Conversation

@osamuaoki
Copy link
Collaborator

@osamuaoki osamuaoki commented Apr 22, 2020

Here is a patch to add LED support (+6 bytes)
I also squeezed code (-6 bytes)
So it is still 506 bytes

I checked and works nicely with LED ;-)

Move ldi r16 to inside of set_watchdog_timer

Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
@osamuaoki osamuaoki mentioned this pull request Dec 6, 2021
sigprof and others added 26 commits December 8, 2021 09:05
Instead of using a separate register for the loop counter, compare the
value in the XL register with the expected end address.  This saves one
instruction (2 bytes).

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Reorder the descriptor handling code so that the part that falls
through to `process_descriptor` is handling a single USB descriptor
prefixed with its length.  Does not save any bytes by itself, just
prepares the code for subsequent changes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
In two cases (`send_hid_descriptor` and `send_device_descriptor`) the
returned data contains just a single USB descriptor, which contains its
length in bytes in the first data byte.  Replace two instructions to
load the descriptor lengths with one instruction to read the length from
the first data byte, saving 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
No size or behavior changes, just a preparation to save some bytes with
fallthrough.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the `bRequest == 9`
case; this removes one jump instruction, saving 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
No size or behavior changes, just a preparation to save some bytes with
fallthrough.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the
`HANDLE_USB_CLAS_INTERFACE` case; this removes one jump instruction,
saving 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the
`GET_DESCRIPTOR` case; this removes one jump instruction, saving 2
bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
`SET_CONFIGURATION` is basically a noop - the only thing that needs to
be done is `process_Host2Device`.  Instead of handling it in two
instructions, reuse the `SET_ADDRESS` code for it, passing the current
UDADDR value to it (so it effectively does nothing too), which can be
done in a single instruction, thus saving 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Although the subset of USB HID protocol that is actually used by the
bootloader uses only the default control endpoint (0), the HID spec
requires the device to have an interrupt in endpoint, and the host can
poll that endpoint even when the HID report descriptor does not actually
declare any input reports.  Polling an unconfigured endpoint causes USB
errors, which may prevent the bootloader from functioning properly.
Apparently this was happening in Mac OS, making the bootloader unusable
there (however, Windows and Linux did not expose the problem; on Linux
it was possible to provoke these errors by opening the `/dev/hidrawN`
device corresponding to the bootloader, but existing flashing tools do
not use that method to access the bootloader device).

Add the code to configure endpoint 1 as Interrupt IN, matching the USB
descriptors; this is enough to make the USB controller generate NAK
replies for that endpoint correctly, and the rest of bootloader code may
continue using just endpoint 0 as before.

The binary size increases by 12 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Replace Windows-specific `NUL` with `/dev/null` to fix building the code
in Linux (the Windows build environment requires MSYS anyway, therefore
`/dev/null` should work there too).

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>

I have cherry-picked Sergey's commits up to
  79f1be5 ("Makefile: Use /dev/null instead of NUL", 2021-08-20)
with linear history excluding
  64ae729 ("Remove nonstandard avr-size options", 2021-08-20)
onto my led branch.

With the excluded commit, I observe the normal latest GCC 'size' command
can be used for nanoBoot case to produce desired result as I
experimented.

Since we are generating $(TARGET).elf using forked avr-gcc, I chose to
keep using 'avr-size'.  Thus options are kept by excluding the commit.

Signed-off-by: Osamu Aoki <osamu@debian.org>
Instead of looping until one of two bits in UEINTX is set, and then
retesting the value after the loop, just do `reti` inside the loop if
`RXSTPI` is set.  This saves 4 bytes by removing both a duplicate bit
test instruction and an extra jump instruction.
The `wait_TXINI` and `wait_RXOUTI` subroutines were always called after
a call to some other `clear_XXX` subroutine to clear a bit in `UEINTX`.
Replace those two subroutines with `clear_bit_and_wait_TXINI` and
`clear_bit_and_wait_RXOUTI`, which get the bit to be cleared as a
parameter in `r17`, and then inline the remaining `clear_XXX`
subroutines (a subroutine with just 2 instructions in its body actually
takes more space than inline code if it is called less than 3 times).
This reverts commit d7dc792.
which interferes with cherry picking

  ed07e05 ("Save 2 bytes by refactoring the GET_DESCRIPTOR code", 2021-08-24)

The reverted commit will be applied later.

Signed-off-by: Osamu Aoki <osamu@debian.org>
Rewrite the part of the GET_DESCRIPTOR handling code that loads the
address and length of the requested descriptor to use less jumps; the
resulting code consumes 2 bytes less, even though it is actually more
correct (no longer replies with some descriptor to requests for an
unknown descriptor type).  The new code also avoids hardcoding the high
address byte, and no longer depends on the fact that all descriptors
have the same high address byte (but it depends on the fact that all
offsets between adjacent descriptors can fit into the `adiw` constant
argument (0...63)).
No size or behavior changes, just a preparation to save some bytes with
fallthrough.
Restructure conditional jumps to use fallthrough for the
`SET_HID_REPORT` case; this removes one jump instruction, saving 2
bytes.
The original LUFA code performed `UDCON |= (1 << DETACH);` to detach the
USB device; however, the other bits of `UDCON` are known to be 0 at this
time, therefore a simple write of a constant value could be performed
here.  In addition, the value of `(1 << DETACH)` is 1 on all AVR chips
that could be potentially supported by this code, therefore even the
instruction to load the constant value into a register could be omitted.
Doing these changes removes 2 instructions, saving 4 bytes.
The initialization of USBCON does not need to read the current register
value - just writing the reset value into that register is enough.  This
removes one instruction, saving 2 bytes.

In theory this write could even be omitted completely, saving 4 more
bytes, but this would make the code less robust if the application code
attempting to enter the bootloader does not initialize some USB
controller registers properly, therefore leaving that USBCON write there
is safer.
The instruction which applied the mask of `(CONTROL_REQTYPE_TYPE |
CONTROL_REQTYPE_RECIPIENT)` to bmRequestType was not actually needed,
because the value of this mask is 0x7F, and the only bit which is not
covered by the mask (0x80) is already known to be 0, therefore the
masking did not actually change anything.  Removing this instruction
saves 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>

Since following commits are skipped previously when cherry-picking.
  97c8d96 ("TEMPORARY: EEPROM code which does not fit", 2021-08-25)
  e48801a ("WIP: Optimize some more code to make the EEPROM support fit", 2021-08-25)
  78b804d ("WIP: Redo the EEPROM write implementation", 2021-08-29)
  e7e94f3 ("Save 2 bytes by reusing the TXINI clearing mask", 2021-08-29)

Adjusted to keep branch to thunk code.

Signed-off-by: Osamu Aoki <osamu@debian.org>
The DEVICE_TO_HOST handling code needs to handle only the GET_DESCRIPTOR
requests, which may come with two possible bmRequestType values:

  - 0x80 (IN direction, USB Standard Request, Recipient is the device) if
    a descriptor which applies to the device as a whole is requested (this
    code is used for the device and configuration descriptors);

  - 0x81 (IN direction, USB Standard Request, Recipient is the interface)
    if a descriptor specific to a particular interface is requested (this
    code is used for the HID class and HID report descriptors).

Because these codes are numerically sequential, and the direction bit has
already been tested (therefore it is known that bmRequestType >= 0x80), it
is enough to make a single comparison with 0x82 to detemine whether
bmRequestType has one of the above values.  Removing the bit masking
operation which is not needed after that change saves 2 bytes.
…ytes

Remove the code which tried to check whether the SETUP packet has been
handled - now UNHANDLED_SETUP_REQUEST can be used only if the SETUP
packed needs to be acknowledged and replied with STALL, and in other
cases the code should just do `reti` directly.  In addition,
UNHANDLED_SETUP_REQUEST is placed in the middle of the code, so that it
would be reachable by conditional branches directly.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>

This is based on
6bc21dd ("Save 6 bytes by refactoring UNHANDLED_SETUP_REQUEST usage", 2021-09-04)

Adjusted for thunk entry choice differences, both of them go to
UNHANDLED_SETUP_REQUEST:

 * UNHANDLED_SETUP_REQUEST_1
 * UNHANDLED_DEVICE_TO_HOST

Signed-off-by: Osamu Aoki <osamu@debian.org>
Instead of saving the initial address value in a separate register pair,
and then reloading it after the flash write loop, keep the address in
the Z register and subtract 128 from it after the data write loop - this
should give the same result, because the loop counter is explicitly
initialized by the bootloader code.  However, because the `sbiw`
instruction supports only the 0...63 range for its constant argument,
another optimization trick is also used - instead of incrementing the
whole 16-bit address, only the low byte is incremented; this should give
the same result, because the block start address must be aligned to the
flash page size (128 bytes), therefore any carry to the high byte should
not happen within the page (it may happen just past the end of the page,
but if both instruction will not perform that carry, the final result
will be the same).

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
To protect the user from accidents (or even deliberate attempts to brick
the device), compare the specified page address with the bootloader
start address, and skip the flash write if the address could overlap
with the bootloader.

Handling the error case by jumping to `finish_hid_request` did not work
(it returned an error, but then the bootloader stopped responding to any
further USB requests); apparently it is important to consume the proper
number of bytes from the USB FIFO.  Because of that, the bad address
case is handled by running the same loop that is used for the normal
flash write case, but with all `spm` instructions disabled by setting a
flag bit.  Bit 7 of `reg_bRequest` is chosen for that role (that bit is
guaranteed to be 0 when starting to handle a normal `SET_HID_REPORT`
request without needing to add any instructions to clear it).

The `START_APPLICATION` command handling is also changed to use the same
code path to save space.

Because of additional optimization of the address comparison and the
`START_APPLICATION` handling code, the binary size is increased by just
2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Equivalent of:
  b6cb76e ("Fix a typo in the comment for set_watchdog_timer", 2021-09-05)

Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
@osamuaoki osamuaoki force-pushed the led branch 2 times, most recently from 276259f to 0585969 Compare December 8, 2021 13:34
Signed-off-by: Osamu Aoki <osamu@debian.org>
@volium
Copy link
Owner

volium commented Jan 19, 2022

Hey @osamuaoki!

I am really sorry I ignored this PR for so long. I have been very busy with other things (mostly work), but I finally have the time to work on nanoBoot once again.

I have already merged @sigprof's PRs (#12 and #13), mainly because they were easier to review and merge.

I want to incorporate your latest changes as well, but this PR is now very hard to follow, and the title ("Led support") no longer reflects all the actual changes being made.

I hope you don't mind, but could you redo the PR so that it is more limited in scope to just adding LED support?

I would guess it makes sense to include a lot of the size optimizations from both you and @sigprof first, so we should probably take care of that prior to merging the LED support itself.

I think I need @sigprof's help to understand which branches should be merged first (looks like tmp2 and string-descriptors) are further along, but I see some commits are shared with tmp branch, so it would be good to know what has been tested that can be integrated back.

I look forward to hearing back from you....

@osamuaoki
Copy link
Collaborator Author

Let's keep this PR as is for now until my new PR is tested and reviewed.

@osamuaoki
Copy link
Collaborator Author

Since I added clean set of size reduction patches in my lednew branch (PR Lednew #16), I am asking this branch to be cloased.

@osamuaoki osamuaoki closed this Feb 5, 2022
sigprof pushed a commit to sigprof/nanoBoot that referenced this pull request May 11, 2024
(Originally part of PR volium#10)

(The original commit said "506 bytes", because there was another commit
before it that added 6 more bytes to the binary size.)

Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants