Skip to content

Led#1

Closed
osamuaoki wants to merge 44 commits intosigprof:masterfrom
osamuaoki:led
Closed

Led#1
osamuaoki wants to merge 44 commits intosigprof:masterfrom
osamuaoki:led

Conversation

@osamuaoki
Copy link

You have been making interesting updates.

(I haven't tested yet.)

I have LED mod code which upstream didn't merge volium#10

Please check. This is handy :-)

@osamuaoki osamuaoki force-pushed the led branch 3 times, most recently from 0585969 to f069340 Compare December 8, 2021 13:44
Rodrigo Torres and others added 27 commits January 14, 2022 14:33
Updates build status badge to reflect status of new build.yml workflow.
Deletes .travis.yml now that CI has been moved to github workflow (.github/workflows/build.yml)
Instead of using a separate register for the loop counter, compare the
value in the XL register with the expected end address.  This saves one
instruction (2 bytes).

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Reorder the descriptor handling code so that the part that falls
through to `process_descriptor` is handling a single USB descriptor
prefixed with its length.  Does not save any bytes by itself, just
prepares the code for subsequent changes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
In two cases (`send_hid_descriptor` and `send_device_descriptor`) the
returned data contains just a single USB descriptor, which contains its
length in bytes in the first data byte.  Replace two instructions to
load the descriptor lengths with one instruction to read the length from
the first data byte, saving 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
No size or behavior changes, just a preparation to save some bytes with
fallthrough.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the `bRequest == 9`
case; this removes one jump instruction, saving 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
No size or behavior changes, just a preparation to save some bytes with
fallthrough.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the
`HANDLE_USB_CLAS_INTERFACE` case; this removes one jump instruction,
saving 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Restructure conditional jumps to use fallthrough for the
`GET_DESCRIPTOR` case; this removes one jump instruction, saving 2
bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
`SET_CONFIGURATION` is basically a noop - the only thing that needs to
be done is `process_Host2Device`.  Instead of handling it in two
instructions, reuse the `SET_ADDRESS` code for it, passing the current
UDADDR value to it (so it effectively does nothing too), which can be
done in a single instruction, thus saving 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Although the subset of USB HID protocol that is actually used by the
bootloader uses only the default control endpoint (0), the HID spec
requires the device to have an interrupt in endpoint, and the host can
poll that endpoint even when the HID report descriptor does not actually
declare any input reports.  Polling an unconfigured endpoint causes USB
errors, which may prevent the bootloader from functioning properly.
Apparently this was happening in Mac OS, making the bootloader unusable
there (however, Windows and Linux did not expose the problem; on Linux
it was possible to provoke these errors by opening the `/dev/hidrawN`
device corresponding to the bootloader, but existing flashing tools do
not use that method to access the bootloader device).

Add the code to configure endpoint 1 as Interrupt IN, matching the USB
descriptors; this is enough to make the USB controller generate NAK
replies for that endpoint correctly, and the rest of bootloader code may
continue using just endpoint 0 as before.

The binary size increases by 12 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Instead of saving the initial address value in a separate register pair,
and then reloading it after the flash write loop, keep the address in
the Z register and subtract 128 from it after the data write loop - this
should give the same result, because the loop counter is explicitly
initialized by the bootloader code.  However, because the `sbiw`
instruction supports only the 0...63 range for its constant argument,
another optimization trick is also used - instead of incrementing the
whole 16-bit address, only the low byte is incremented; this should give
the same result, because the block start address must be aligned to the
flash page size (128 bytes), therefore any carry to the high byte should
not happen within the page (it may happen just past the end of the page,
but if both instruction will not perform that carry, the final result
will be the same).

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
To protect the user from accidents (or even deliberate attempts to brick
the device), compare the specified page address with the bootloader
start address, and skip the flash write if the address could overlap
with the bootloader.

Handling the error case by jumping to `finish_hid_request` did not work
(it returned an error, but then the bootloader stopped responding to any
further USB requests); apparently it is important to consume the proper
number of bytes from the USB FIFO.  Because of that, the bad address
case is handled by running the same loop that is used for the normal
flash write case, but with all `spm` instructions disabled by setting a
flag bit.  Bit 7 of `reg_bRequest` is chosen for that role (that bit is
guaranteed to be 0 when starting to handle a normal `SET_HID_REPORT`
request without needing to add any instructions to clear it).

The `START_APPLICATION` command handling is also changed to use the same
code path to save space.

Because of additional optimization of the address comparison and the
`START_APPLICATION` handling code, the binary size is increased by just
2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
bootloader size 512 bytes

Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Updates some outdated comments and fixes typos after latest round of merges.
Makes LED support "opt-in" by defining "LED_ENABLED". Feature takes 6 bytes when enabled.
Adds macros to handle active high and active low differences.
Compared to Osamu's original implementation, moves turning on the LED to "run_bootloader", instead of "send_descriptor"; the behavior is pretty much the same, but it saves two bytes for the active-low case, since we only call `cbi` once when the bootloader becomes active (after setting T flag in SREG).
Removes old and unused LED_PIN declaration.
Moves "TURN_LED_ON" within "SET_CONFIGURATION" so LED is turned on ONLY when device has gone through enumeration.
Adds check for "LED_ACTIVE_LEVEL" to turn LED off at the beginning of "run_bootloader" if LED is active low. This has a 2-byte penalty which makes nanoBoot go beyond 512 bytes on ProMicro-compatible boards.
`set_watchdog_timer` was called in two places, but the parameter value
passed in r16 (the value used to unlock the watchdog configuration) was
the same in both cases.  Move the duplicated initialization of r16 into
the `set_watchdog_timer` function itself, saving 2 bytes.

Cherry-picked from testing branch of Sergey Vlasov <sigprof@gmail.com>
  c2b4ba5 ("Save 2 bytes in the watchdog handling code", 2021-08-20)

Trivial merge conflict resolution of a comment string applied.
This enables LED for promicro while keeping size within 512 bytes.

Signed-off-by: Osamu Aoki <osamu@debian.org>
Instead of looping until one of two bits in UEINTX is set, and then
retesting the value after the loop, just do `reti` inside the loop if
`RXSTPI` is set.  This saves 4 bytes by removing both a duplicate bit
test instruction and an extra jump instruction.
The `wait_TXINI` and `wait_RXOUTI` subroutines were always called after
a call to some other `clear_XXX` subroutine to clear a bit in `UEINTX`.
Replace those two subroutines with `clear_bit_and_wait_TXINI` and
`clear_bit_and_wait_RXOUTI`, which get the bit to be cleared as a
parameter in `r17`, and then inline the remaining `clear_XXX`
subroutines (a subroutine with just 2 instructions in its body actually
takes more space than inline code if it is called less than 3 times).
Rewrite the part of the GET_DESCRIPTOR handling code that loads the
address and length of the requested descriptor to use less jumps; the
resulting code consumes 2 bytes less, even though it is actually more
correct (no longer replies with some descriptor to requests for an
unknown descriptor type).  The new code also avoids hardcoding the high
address byte, and no longer depends on the fact that all descriptors
have the same high address byte (but it depends on the fact that all
offsets between adjacent descriptors can fit into the `adiw` constant
argument (0...63)).
osamuaoki and others added 12 commits February 5, 2022 12:56
Signed-off-by: Osamu Aoki <osamu@debian.org>
The original LUFA code performed `UDCON |= (1 << DETACH);` to detach the
USB device; however, the other bits of `UDCON` are known to be 0 at this
time, therefore a simple write of a constant value could be performed
here.  In addition, the value of `(1 << DETACH)` is 1 on all AVR chips
that could be potentially supported by this code, therefore even the
instruction to load the constant value into a register could be omitted.
Doing these changes removes 2 instructions, saving 4 bytes.
The initialization of USBCON does not need to read the current register
value - just writing the reset value into that register is enough.  This
removes one instruction, saving 2 bytes.

In theory this write could even be omitted completely, saving 4 more
bytes, but this would make the code less robust if the application code
attempting to enter the bootloader does not initialize some USB
controller registers properly, therefore leaving that USBCON write there
is safer.
The instruction which applied the mask of `(CONTROL_REQTYPE_TYPE |
CONTROL_REQTYPE_RECIPIENT)` to bmRequestType was not actually needed,
because the value of this mask is 0x7F, and the only bit which is not
covered by the mask (0x80) is already known to be 0, therefore the
masking did not actually change anything.  Removing this instruction
saves 2 bytes.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>

Since following commits are skipped previously when cherry-picking.
  97c8d96 ("TEMPORARY: EEPROM code which does not fit", 2021-08-25)
  e48801a ("WIP: Optimize some more code to make the EEPROM support fit", 2021-08-25)
  78b804d ("WIP: Redo the EEPROM write implementation", 2021-08-29)
  e7e94f3 ("Save 2 bytes by reusing the TXINI clearing mask", 2021-08-29)

Adjusted to keep branch to thunk code.

Signed-off-by: Osamu Aoki <osamu@debian.org>
The DEVICE_TO_HOST handling code needs to handle only the GET_DESCRIPTOR
requests, which may come with two possible bmRequestType values:

  - 0x80 (IN direction, USB Standard Request, Recipient is the device) if
    a descriptor which applies to the device as a whole is requested (this
    code is used for the device and configuration descriptors);

  - 0x81 (IN direction, USB Standard Request, Recipient is the interface)
    if a descriptor specific to a particular interface is requested (this
    code is used for the HID class and HID report descriptors).

Because these codes are numerically sequential, and the direction bit has
already been tested (therefore it is known that bmRequestType >= 0x80), it
is enough to make a single comparison with 0x82 to detemine whether
bmRequestType has one of the above values.  Removing the bit masking
operation which is not needed after that change saves 2 bytes.
Signed-off-by: Osamu Aoki <osamu@debian.org>
Access code to data in the extended IO address range can be made compact
by using Y+.  Both WDT and USB registers are in extended IO address.
Fortunately, calls to these 2 types of addresses do not overlap.

 * initial code calls WDT
 * main code calls USB
 * exit code calls WDT

This patch enables to use Y+ for WDT in addtion to USB.

 * YH is common to USB and WDT and 0 (very first).
 * YL is different.  It's initialization is moved to each subroutine to
   avoid code duplication.
   * inside of the set_watchdog_timer subroutine
   * USB Initialization section

Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
No size or behavior changes, just a preparation to save some bytes with
fallthrough.

Cherry picked from Sergey's repo

LED code properly moved.

Signed-off-by: Osamu Aoki <osamu@debian.org>
Restructure conditional jumps to use fallthrough for the
`SET_HID_REPORT` case; this removes one jump instruction, saving 2
bytes.

Cherry picked from Sergey Vlasov <sigprof@gmail.com>:
a358e89 ("Save 2 bytes by using fallthrough for SET_HID_REPORT", 2021-08-24)

Comment by Osamu (conflict resolution)

UNHANDLED_SETUP_REQUEST_1 was used since br command PC(word) offset is 6
bit

    offset      address
 00             00007ede first call ending up at UNHANDLED_SETUP_REQUEST
                This requires thunk to reach with br** command

 4A  00         00007f28 old UNHANDLED_DEVICE_TO_HOST entry ending up at UNHANDLED_SETUP_REQUEST
 5E  14  00     00007f3c UNHANDLED_SETUP_REQUEST_1    entry ending up at UNHANDLED_SETUP_REQUEST
!B8  6E  5A     00007f96 UNHANDLED_SETUP_REQUEST      entry

    br** commands
       6 bit offset PC offset (64)
       +/- 2^7 byte offset +7E, -80 (126)

    rjmp command
      11 bit offset PC offset (2K)
      +/- 2^12 byte offset +FFE, -1000 (4094)

Signed-off-by: Osamu Aoki <osamu@debian.org>
Due to reordering of code, UNHANDLED_SETUP_REQUEST is within scope of
direct jump from br** command now.

00007f3e <DEVICE_TO_HOST>:
      ;  - 0x80 - IN Type Request, USB Standard Request, Recipient is the device
      ;  - 0x81 - IN Type Request, USB Standard Request, Recipient is the interface
      ; At this step it is known that bmRequestType >= 0x80, therefore checking for bmRequestType < 0x82
      ; is enough to detect whether bmRequestType has one of the above values.

      cpi         reg_bmRequestType, 0x82       ; Check whether bmRequestType is less than 0x82 (then it must be either 0x80 or 0x81)
    7f3e:       22 38           cpi     r18, 0x82       ; 130
      brcc        UNHANDLED_SETUP_REQUEST       ; If bmRequestType >= 0x82, this request type is not handled here (it's not a GET_DESCRIPTOR request)
    7f40:       50 f5           brcc    .+84            ; 0x7f96 <UNHANDLED_SETUP_REQUEST>

      cpi         reg_bRequest, 0x06            ; Compare bRequest with value 0x06 (REQ_GetDescriptor)
    7f42:       36 30           cpi     r19, 0x06       ; 6
      brne        UNHANDLED_SETUP_REQUEST       ; jump to UNHANDLED_SETUP_REQUEST through f not equal
    7f44:       41 f5           brne    .+80            ; 0x7f96 <UNHANDLED_SETUP_REQUEST>

Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Remove the code which tried to check whether the SETUP packet has been
handled - now UNHANDLED_SETUP_REQUEST can be used only if the SETUP
packed needs to be acknowledged and replied with STALL, and in other
cases the code should just do `reti` directly.  In addition,
UNHANDLED_SETUP_REQUEST is placed in the middle of the code, so that it
would be reachable by conditional branches directly.

Signed-off-by: Sergey Vlasov <sigprof@gmail.com>

This is based on
6bc21dd ("Save 6 bytes by refactoring UNHANDLED_SETUP_REQUEST usage", 2021-09-04)

UNHANDLED_SETUP_REQUEST is moved up so we don't need to branch through
thunk like UNHANDLED_SETUP_REQUEST_1.

So drop UNHANDLED_SETUP_REQUEST_1 usage.

Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
Signed-off-by: Osamu Aoki <osamu@debian.org>
@osamuaoki
Copy link
Author

Upstream is in process of merging updated set of patches on to of code you submitted.
I also forwarded many of your code size reduction to upstream.
So let me close this ...

@osamuaoki osamuaoki closed this Feb 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants