Skip to content

Discussion on interrupts during startup #205

@Mellvik

Description

@Mellvik

@ghaerr, this issue is opened to continue the discussion from ghaerr/elks#2441 (comment)

If interrupts are supposed to be off during kernel initialization, why keep the irq off/on sequences in the driver drv_init code?

Yes, exactly. The problem was that before PR ghaerr/elks#2355 to ensure correctness in kernel startup, one could not be absolutely sure of the CPU interrupt state, both when they were required to be off, and then later, when they are required to be on (for instance, in the IDE probe routine where precision timing requires an operating jiffies timer).

PR ghaerr/elks#2355 ensures startup correctness, with changes so that interrupts remain disabled until after all character device initialization, and then enabled before any block device initialization:

void INITPROC device_init(void)
{
    chr_dev_init();
    
    set_irq();          /* interrupts enabled for possible disk I/O or timers */

    blk_dev_init();
...

Given that we now know that interrupts are off during char device initialization, why not remove all the save_flags/clr_irq/restore_flags sequences from the chr driver initialization code?

Also, why not keep interrupts off during block device initialization too. As far as I can tell there are no block device drivers that need interrupts enabled during initialization. An ISA probe doesn't need a precision timer.

After kernel initialization, there are a few rules that can be followed to ensure the system will operate correctly when writing device drivers or interrupt handlers:

Interrupt handlers are always called with interrupts enabled (although the PIC may have less-priority hardware interrupts disabled). Critical sections that require the use of disabled interrupts should be very short. All interrupt handlers should save the original interrupt state using save_flags() / restore_flags, since the handler may itself be interrupted by another interrupt handler.

OK, I'm trying to envision this.

  • Interrupt handler A is running, and has not touched the IRQ enable bit so interrupts are on.
  • Interrupt handler B is interrupting A because it has higher priority.
  • B has a critical section and turns off interrupts. All interrupts are blocked except NMI which does not affect the IRQ enable bit.
  • When B's critical section ends, it can safely call set_irq since it knows interrupts were enabled when it got control and it knows no one could have interrupted I for the duration of the critical section.
  • If A was in a critical section (had called clr_irq) at the time B's interrupt came, B's interrupt would be blocked until A called set_irq.

So what I'm missing here is at what point or in what situation the state of the interrupt enable bit is unknown in an interrupt handler, qualifying the insurance provided by save_flags/restore_flags.

Non-interrupt handler kernel routines can disable interrupts for short periods, but the interrupt state should be handled the same way, since although the kernel is not reentrant, hardware interrupts can interrupt kernel code execution, and certain kernel routines can be called from others with interrupts required to be disabled the entire time.

Yes, this is a different setting since the IRQ state is unknown at the time of entry.

Upon any return by the kernel to user mode code, or interrupt handler entry, it is guaranteed that interrupts will be enabled. This means that a mistake saving a previous interrupt state above by leaving interrupt disabled will only cost a max (possibly recurring) 10ms until the next clock interrupt. Enabling interrupts too early will just defeat the purpose of the critical section, since interrupt handlers all run with interrupts enabled.

Agreed.

Therefore, the quick rule now is that all character and network device driver init routines can assume interrupts are disabled, whereas block device drivers and all other critical sections must save/restore interrupt state.

Like I said above, I'm not seeing the need for interrupts in block device init either... It would be simpler (and save code) to just leave them off.

NOTE however that shared routines, (such as dma_read in NE2K driver) should save/restore interrupt state since they're used at both init and run time. Always saving and restoring state ensures that the routine can be reimagined elsewhere without having to be concerned about interrupt status issues, and costs very little (see below).

A very pertinent example, @ghaerr. I want to take a close look at whether/which of these 'critical sections' are actually 'critical'. My suspicion is that some parts are overly protected (experiments that never ended) and could safely run with interrupts enabled. I'll be back on that.

Possibly a chance to do some cleanup, which I intend to do.
Further, there are experimental IRQ disable/enable pairs in most drivers. It's time to find out what's required and what's not.

Great, no problem - happy to wait while you do the cleanup in TLVC. If in doubt, it can never hurt to save/restore interrupts rather than cli/sti, the extra cost is only a single PUSHF for ASM routines and a PUSHF/POPF n(%bp) for C routines using save_flags().

Right! Good.

In order to check that all NIC drivers leave interrupts disabled through the init sequence, a quick check just in front of set_irq() can be made after chr_dev_init():

    flag_t flags;
    save_flags(flags);
    if (flags & 0x0200) printk("ERROR INTS ENABLED\n");
    set_irq();          /* interrupts enabled for possible disk I/O or timers */

Thanks, yes, this is good!

I compared the entire drivers/net and elkscmd/ktcp directories yesterday between TLVC and ELKS, and there's quite a few mods. In addition, there was one ELKS ktcp corruption fix ghaerr/elks#2267 and duplicate assignment ghaerr/elks@bc325ce that I haven't yet checked whether TLVC has fixed. There are some cases of lots of commented out debug statements in some files - not sure whether you plan on leaving those in longer term or not.

Appreciated.

Metadata

Metadata

Assignees

No one assigned

    Labels

    DiscussionTechnical discussion related to other issues

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions