Skip to content

Added HAL_Delay_us() for BK7231 #1579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

rpv-tomsk
Copy link
Contributor

@rpv-tomsk rpv-tomsk commented Mar 24, 2025

Added HAL_Delay_us() based on timer, not nop.
Timings are much better now.

20250324_205949

@rpv-tomsk
Copy link
Contributor Author

@NonPIayerCharacter
Copy link
Contributor

@rpv-tomsk
Can you check #1538?
It uses backported calendar driver from beken_freertos_sdk, but i ported/tested it only on N, and not T.
I don't have an oscilloscope, so can't test it properly myself.

@rpv-tomsk
Copy link
Contributor Author

Quick look on source shows:

#define CAL_3125_TU_VAL (3125)

uint64_t cal_get_time_us(void) {
  ...
  val = (uint64_t)cnt_s * 1000000 + (uint64_t)cnt_us * CAL_3125_TU_VAL / 100;
	  return val;
}

uint64_t rtos_get_time_us(void)
{
     return (uint64_t)cal_get_time_us();
}

So quantum/step will be 31us.

@NonPIayerCharacter
Copy link
Contributor

Looking in calendar.h it seems 3125 is just a misspelling, it was meant to be 31_25, meaning a step will be 0.3125us. Or am i thinking it wrong?

@rpv-tomsk
Copy link
Contributor Author

uint64_t cal_get_time_us(void) {
  ...
  val = (uint64_t)cnt_s * 1000000 + (uint64_t)cnt_us * CAL_3125_TU_VAL / 100;
	  return val;
}

My thoughts ares following:
Assume cnt_s = 0. Then, if cnt_us = 1 ; , returned value will be 1 * 3125 / 100 = 31. If cnt_us = 2, then returned value will be 62. Function returns time in us, so step is 31us.

@NonPIayerCharacter
Copy link
Contributor

I understand what you meant now, and i didn't bother to check it in code - since this is default implementation for delayMicroseconds for bk7238 arduino core.
But it worked 90% of the time with ds18b20 on 7231n, and that is why i requested an oscilloscope test.

@rpv-tomsk
Copy link
Contributor Author

For me that does not work. Sensor does not reply.
As predicted, delay is multiple of 31us
26 03 2025 11 23 54

Compare with this implementation:
26 03 2025 11 58 26

Slot time is much better (should be 60+10us while writing zero to bus, got 66+17)
26 03 2025 12 02 39

You don't need expensive scope, cheap logic analyzer is good too.
saleae

@rpv-tomsk
Copy link
Contributor Author

Also want to notice, current implementation of OWReset() is incorrect.
HAL_PIN_SetOutputValue(Pin, 1); // Releases the bus -- does not releases bus. It turns output to strong "1" level.
But sensor is strong enough too, it starts to fight and win. Level is nearby zero.

image

Things are better with patch applied:

--- src/driver/drv_ds1820_simple.c
+++ src/driver/drv_ds1820_simple.c
@@ -69,9 +69,8 @@ static int OWReset(int Pin)
        HAL_PIN_Setup_Output(Pin);
        HAL_PIN_SetOutputValue(Pin, 0); // Drives DQ low
        HAL_Delay_us(OWtimeH);
-       HAL_PIN_SetOutputValue(Pin, 1); // Releases the bus
+       HAL_PIN_Setup_Input_Pullup(Pin);  // Releases the bus
        HAL_Delay_us(OWtimeI);
-       HAL_PIN_Setup_Input_Pullup(Pin);
        result = HAL_PIN_ReadDigitalInput(Pin) ^ 0x01; // Sample for presence pulse from slave


@@ -120,9 +119,8 @@ static int OWReadBit(int Pin)
        HAL_PIN_Setup_Output(Pin);
        HAL_PIN_SetOutputValue(Pin, 0); // Drives DQ low
        HAL_Delay_us(OWtimeA);
-       HAL_PIN_SetOutputValue(Pin, 1); // Releases the bus
+       HAL_PIN_Setup_Input_Pullup(Pin); // Releases the bus
        HAL_Delay_us(OWtimeE);
-       HAL_PIN_Setup_Input_Pullup(Pin);
        result = HAL_PIN_ReadDigitalInput(Pin); // Sample for presence pulse from slave
        interrupts();   // hope for the best for the following timer and keep CRITICAL as short as possible
        HAL_Delay_us(OWtimeF); // Complete the time slot and 10us recovery

image

@LynxChaus
Copy link

I don't know, which arduino soldered on my WB3S module, but simple flip-flop code

    HAL_PIN_Setup_Output(Pin);
    for (int i = 0; i <= 10; i++) {
        HAL_PIN_SetOutputValue(Pin, 1);
        HAL_PIN_SetOutputValue(Pin, 0);
    }

pic_44_6
shows MINIMUM 50us delay in every state (and some time it floating - rtos switch context?)
Running this flip-flop under disabled interrupts:
pic_44_7
show 6us overhead of HAL pin manipulation code.
With this random floating delays not surprise why it not work.

Also, on bk7231/bl602/espidf HAL_PIN_Setup_Output() sets pin value to 0, all other platforms - sets only PULL_UP. Totally mess.

@rpv-tomsk
Copy link
Contributor Author

Checked timings on 7231N. Code:

         noInterrupts();
         HAL_PIN_SetOutputValue(Pin, 1);
         HAL_Delay_us(2);
         HAL_PIN_SetOutputValue(Pin, 0);
         HAL_Delay_us(5);
         HAL_PIN_SetOutputValue(Pin, 1);
         HAL_Delay_us(10);
         HAL_PIN_SetOutputValue(Pin, 0);
         HAL_Delay_us(50);
         HAL_PIN_SetOutputValue(Pin, 1);
         HAL_Delay_us(100);
         HAL_PIN_SetOutputValue(Pin, 0);
         interrupts();

produces results:
delay(2) - 5.5us
delay(5) - 8.8us
delay(10) - 13.5us
delay(50) - 56.8us
delay(100) - 103.4us

So there is ~3.5us overhead only.

@rpv-tomsk
Copy link
Contributor Author

--- src/hal/bk7231/hal_pins_bk7231.c
+++ src/hal/bk7231/hal_pins_bk7231.c
@@ -65,11 +65,11 @@ int HAL_PIN_CanThisPinBePWM(int index) {
        return 1;
 }
 void HAL_PIN_SetOutputValue(int index, int iVal) {
-       bk_gpio_output(index, iVal);
+       gpio_output(index, iVal);
 }

 int HAL_PIN_ReadDigitalInput(int index) {
-       return bk_gpio_input(index);
+       return gpio_input(index);
 }
 void HAL_PIN_Setup_Input_Pullup(int index) {
        bk_gpio_config_input_pup(index);
@@ -82,7 +82,7 @@ void HAL_PIN_Setup_Input(int index) {
 }
 void HAL_PIN_Setup_Output(int index) {
        bk_gpio_config_output(index);
-       bk_gpio_output(index, 0);
+       gpio_output(index, 0);
 }
 void HAL_PIN_PWM_Stop(int index) {
        int pwmIndex;

Simple patch reduces that time to another 1us... Thanks for pointing.

@rpv-tomsk
Copy link
Contributor Author

Would these changes be acceptable to merge?

@LynxChaus
Copy link

LynxChaus commented Mar 27, 2025

Simple patch reduces that time to another 1us... Thanks for pointing.

Wow. For my BK7231T gpio overhead reduced to 1.6us.

@rpv-tomsk
Copy link
Contributor Author

Does DS1820 now works for you?

@LynxChaus
Copy link

device is Tuya WiFi IR Blaster S08 pro
image
Yes, reading one DS18B20 is stable now, I''m end with this variant of patch

diff --git a/src/driver/drv_ds1820_simple.c b/src/driver/drv_ds1820_simple.c
index f24312c3..10ab0f1d 100644
--- a/src/driver/drv_ds1820_simple.c
+++ b/src/driver/drv_ds1820_simple.c
@@ -22,67 +22,6 @@ static int DS1820_DiscoverFamily();
 
 #define DS1820_LOG(x, fmt, ...) addLogAdv(LOG_##x, LOG_FEATURE_SENSOR, "DS1820[%i] - " fmt, Pin, ##__VA_ARGS__)
 
-// usleep adopted from DHT driver
-static void usleepds(int r)
-{
-	HAL_Delay_us(r);
-}
-
-// add some "special timing" for Beken - works w/o and with powerSave 1 for me
-static void usleepshort(int r) //delay function do 10*r nops, because rtos_delay_milliseconds is too much
-{
-#if PLATFORM_BEKEN
-	int newr = r / (3 * g_powersave + 1);		// devide by 4 if powerSave set to 1
-	for(volatile int i = 0; i < newr; i++)
-	{
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		//__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		__asm__("nop\nnop\nnop\nnop");
-	}
-
-#else
-	usleepds(r);
-#endif
-}
-
-static void usleepmed(int r) //delay function do 10*r nops, because rtos_delay_milliseconds is too much
-{
-#if PLATFORM_BEKEN
-	int newr = 10 * r / (10 + 5 * g_powersave);		// devide by 1.5 powerSave set to 1
-	for(volatile int i = 0; i < newr; i++)
-	{
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");	// 5
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-	}
-
-#else
-	usleepds(r);
-#endif
-}
-
-static void usleeplong(int r) //delay function do 10*r nops, because rtos_delay_milliseconds is too much
-{
-#if PLATFORM_BEKEN
-	int newr = 10 * r / (10 + 5 * g_powersave);		// devide by 1.5 powerSave set to 1
-	for(volatile int i = 0; i < newr; i++)
-	{
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
-		//		__asm__("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");	// 5
-		__asm__("nop\nnop\nnop\nnop\nnop");	// 5
-	}
-
-#else
-	usleepds(r);
-#endif
-}
-
 /*
 
 timing numbers and general code idea from
@@ -103,16 +42,17 @@ J 		Standard 	410
 
 */
 
-#define OWtimeA	6
+#define GPIO_OVERHEAD	2	/* 6us overhead with bk_gpio_output(), 2us with gpio_output() while switch GPIO pins */
+#define OWtimeA	(6 - GPIO_OVERHEAD)
 #define OWtimeB	64
 #define OWtimeC	60
 #define OWtimeD	10
-#define OWtimeE	9
+#define OWtimeE	(9 - GPIO_OVERHEAD)
 #define OWtimeF	55
 #define OWtimeG	0
-#define OWtimeH	480
-#define OWtimeI	70
-#define OWtimeJ	410
+#define OWtimeH	(480 - GPIO_OVERHEAD)
+#define OWtimeI	(70 - GPIO_OVERHEAD)
+#define OWtimeJ	(410 - GPIO_OVERHEAD)
 
 #define READ_ROM 0x33
 #define SKIP_ROM 0xCC
@@ -124,15 +64,17 @@ static int OWReset(int Pin)
 {
 	int result;
 
-	//usleep(OWtimeG);
-	HAL_PIN_Setup_Output(Pin);
-	HAL_PIN_SetOutputValue(Pin, 0); // Drives DQ low
-	usleeplong(OWtimeH);
+	noInterrupts();
+	//HAL_Delay_us(OWtimeG);
+	HAL_PIN_Setup_OutputValue(Pin, 0); // Drives DQ low
+	HAL_Delay_us(OWtimeH);
 	HAL_PIN_SetOutputValue(Pin, 1); // Releases the bus
-	usleepmed(OWtimeI);
-	HAL_PIN_Setup_Input(Pin);
+	HAL_PIN_Setup_Input_Pullup(Pin); // Releases the bus
+	HAL_Delay_us(OWtimeI);
 	result = HAL_PIN_ReadDigitalInput(Pin) ^ 0x01; // Sample for presence pulse from slave
-	usleeplong(OWtimeJ); // Complete the reset sequence recovery
+	interrupts();	// hope for the best for the following timer and keep CRITICAL as short as possible
+
+	HAL_Delay_us(OWtimeJ); // Complete the reset sequence recovery
 	return result; // Return sample presence pulse result
 }
 
@@ -141,28 +83,25 @@ static int OWReset(int Pin)
 //-----------------------------------------------------------------------------
 static void OWWriteBit(int Pin, int bit)
 {
+
+	noInterrupts();
+	HAL_PIN_Setup_OutputValue(Pin, 0); // Drives DQ low
+
 	if(bit)
 	{
 		// Write '1' bit
-		HAL_PIN_Setup_Output(Pin);
-		noInterrupts();
-		HAL_PIN_SetOutputValue(Pin, 0); // Drives DQ low
-		usleepshort(OWtimeA);
+		HAL_Delay_us(OWtimeA);
 		HAL_PIN_SetOutputValue(Pin, 1); // Releases the bus
-		interrupts();	// hope for the best for the following timer and keep CRITICAL as short as possible
-		usleepmed(OWtimeB); // Complete the time slot and 10us recovery
+		HAL_Delay_us(OWtimeB); // Complete the time slot and 10us recovery
 	}
 	else
 	{
 		// Write '0' bit
-		HAL_PIN_Setup_Output(Pin);
-		noInterrupts();
-		HAL_PIN_SetOutputValue(Pin, 0); // Drives DQ low
-		usleepmed(OWtimeC);
+		HAL_Delay_us(OWtimeC);
 		HAL_PIN_SetOutputValue(Pin, 1); // Releases the bus
-		interrupts();	// hope for the best for the following timer and keep CRITICAL as short as possible
-		usleepshort(OWtimeD);
+		HAL_Delay_us(OWtimeD);
 	}
+	interrupts();
 }
 
 //-----------------------------------------------------------------------------
@@ -173,15 +112,13 @@ static int OWReadBit(int Pin)
 	int result;
 
 	noInterrupts();
-	HAL_PIN_Setup_Output(Pin);
-	HAL_PIN_SetOutputValue(Pin, 0); // Drives DQ low
-	usleepshort(OWtimeA);
-	HAL_PIN_SetOutputValue(Pin, 1); // Releases the bus
-	usleepshort(OWtimeE);
-	HAL_PIN_Setup_Input(Pin);
+	HAL_PIN_Setup_OutputValue(Pin, 0); // Drives DQ low
+	HAL_Delay_us(OWtimeA);
+	HAL_PIN_Setup_Input_Pullup(Pin); // Releases the bus
+	HAL_Delay_us(OWtimeE);
 	result = HAL_PIN_ReadDigitalInput(Pin); // Sample for presence pulse from slave
 	interrupts();	// hope for the best for the following timer and keep CRITICAL as short as possible
-	usleepmed(OWtimeF); // Complete the time slot and 10us recovery
+	HAL_Delay_us(OWtimeF); // Complete the time slot and 10us recovery
 	return result;
 }
 
@@ -204,6 +141,7 @@ static void OWWriteByte(int Pin, int data)
 {
 	int loop;
 
+	vTaskSuspendAll();
 	// Loop to write each bit in the byte, LS-bit first
 	for(loop = 0; loop < 8; loop++)
 	{
@@ -212,6 +150,7 @@ static void OWWriteByte(int Pin, int data)
 		// shift the data byte for the next bit
 		data >>= 1;
 	}
+	xTaskResumeAll();
 }
 
 //-----------------------------------------------------------------------------
@@ -221,6 +160,8 @@ static int OWReadByte(int Pin)
 {
 	int loop, result = 0;
 
+	vTaskSuspendAll();
+
 	for(loop = 0; loop < 8; loop++)
 	{
 		// shift the result to get it ready for the next bit
@@ -230,6 +171,8 @@ static int OWReadByte(int Pin)
 		if(OWReadBit(Pin))
 			result |= 0x80;
 	}
+
+	xTaskResumeAll();
 	return result;
 }
 
@@ -343,7 +286,7 @@ void DS1820_driver_Init()
 	Pin = PIN_FindPinIndexForRole(IOR_DS1820_IO, -1);
 	if (Pin >= 0)
 		DS1820_DiscoverFamily();
-};
+}
 
 void DS1820_AppendInformationToHTTPIndexPage(http_request_t* request)
 {
@@ -473,7 +416,6 @@ void DS1820_OnEverySecond()
 		lastconv = g_secondsElapsed;
 		CHANNEL_Set(g_cfg.pins.channels[Pin], t, CHANNEL_SET_FLAG_SILENT);
 		DS1820_LOG(INFO, "Temp=%i.%02i", (int)t / 100, (int)t % 100);
-		
 		return;
 	}
 	
diff --git a/src/hal/bk7231/hal_generic_bk7231.c b/src/hal/bk7231/hal_generic_bk7231.c
index 1cfdbd71..92862e35 100644
--- a/src/hal/bk7231/hal_generic_bk7231.c
+++ b/src/hal/bk7231/hal_generic_bk7231.c
@@ -2,6 +2,11 @@
 #include "../hal_generic.h"
 #include <BkDriverWdg.h>
 
+#include "../../beken378/driver/include/arm_arch.h"
+#include "../../beken378/driver/pwm/bk_timer.h"
+#include "../../beken378/driver/include/bk_timer_pub.h"
+#include "../../beken378/func/include/fake_clock_pub.h"
+
 // from wlan_ui.c
 void bk_reboot(void);
 extern bool g_powersave;
@@ -11,16 +16,50 @@ void HAL_RebootModule()
 	bk_reboot();
 }
 
-void HAL_Delay_us(int delay)
-{
-	float adj = 1;
-	if(g_powersave) adj = 1.5;
-#if PLATFORM_BK7238
-	//  current n/t are for 120mhz, BK7238 freq is 160mhz
-	usleep((23 * delay * adj) / 10); // "1" is to fast and "2" to slow, 1.7 seems better than 1.5 (only from observing readings, no scope involved)
-#else
-	usleep((17 * delay * adj) / 10); // "1" is to fast and "2" to slow, 1.7 seems better than 1.5 (only from observing readings, no scope involved)
+#if !defined TIMER0_2_READ_VALUE && defined TIMERR0_2_READ_VALUE
+//typo in sdk
+#define TIMER0_2_READ_VALUE TIMERR0_2_READ_VALUE
+#endif
+
+static uint32_t getTicksCount() {
+	uint32_t timeout = 0;
+	REG_WRITE(TIMER0_2_READ_CTL, (CAL_TIMER_ID << 2) | 1);
+	while (REG_READ(TIMER0_2_READ_CTL) & 1) {
+		timeout++;
+		if (timeout > (120 * 1000))
+		return 0;
+	}
+	return REG_READ(TIMER0_2_READ_VALUE);
+}
+
+#if !defined CFG_XTAL_FREQUENCE
+//26MHz, as per bk7231t_os/beken378/driver/sys_ctrl/sys_ctrl.c
+#define CFG_XTAL_FREQUENCE 26000000
 #endif
+
+#define TICKS_PER_US       (CFG_XTAL_FREQUENCE / 1000 / 1000)
+#define US_PER_OVERFLOW    (portTICK_PERIOD_MS * 1000)
+#define TICKS_PER_OVERFLOW (TICKS_PER_US * US_PER_OVERFLOW)
+
+//https://github.com/libretiny-eu/libretiny
+void HAL_Delay_us(int delay) {
+	if (delay == 0)
+		return;
+	delay--;
+	uint32_t startTick = getTicksCount();
+	/* startTick2 accounts for the case where the timer counter overflows */
+	uint32_t startTick2 = startTick - TICKS_PER_OVERFLOW;
+	uint32_t delayTicks = TICKS_PER_US * delay;
+	while (delayTicks > TICKS_PER_OVERFLOW) {
+		// The delay is longer than what the timer can count.
+		// Let it overflow until only a fraction of TICKS_PER_OVERFLOW remain.
+		while (getTicksCount() > startTick) {__asm__("nop");}
+		while (getTicksCount() < startTick) {__asm__("nop");}
+			delayTicks -= TICKS_PER_OVERFLOW;
+	}
+	while ((getTicksCount() - startTick < delayTicks) || // normal case
+		  (getTicksCount() - startTick2 < delayTicks) // counter overflow case
+	) {__asm__("nop");}
 }
 
 void HAL_Configure_WDT()
diff --git a/src/hal/bk7231/hal_pins_bk7231.c b/src/hal/bk7231/hal_pins_bk7231.c
index f139a8df..ee97afce 100644
--- a/src/hal/bk7231/hal_pins_bk7231.c
+++ b/src/hal/bk7231/hal_pins_bk7231.c
@@ -65,11 +65,11 @@ int HAL_PIN_CanThisPinBePWM(int index) {
 	return 1;
 }
 void HAL_PIN_SetOutputValue(int index, int iVal) {
-	bk_gpio_output(index, iVal);
+	gpio_output(index, iVal);
 }
 
 int HAL_PIN_ReadDigitalInput(int index) {
-	return bk_gpio_input(index);
+	return gpio_input(index);
 }
 void HAL_PIN_Setup_Input_Pullup(int index) {
 	bk_gpio_config_input_pup(index);
@@ -82,8 +82,14 @@ void HAL_PIN_Setup_Input(int index) {
 }
 void HAL_PIN_Setup_Output(int index) {
 	bk_gpio_config_output(index);
-	bk_gpio_output(index, 0);
+	gpio_output(index, 0);
 }
+
+void HAL_PIN_Setup_OutputValue(int index, int val) {
+	bk_gpio_config_output(index);
+	gpio_output(index, val);
+}
+
 void HAL_PIN_PWM_Stop(int index) {
 	int pwmIndex;
 
diff --git a/src/hal/generic/hal_pins_generic.c b/src/hal/generic/hal_pins_generic.c
index 2be202a3..35034a95 100644
--- a/src/hal/generic/hal_pins_generic.c
+++ b/src/hal/generic/hal_pins_generic.c
@@ -49,6 +49,11 @@ void __attribute__((weak)) HAL_PIN_Setup_Output(int index)
 	return;
 }
 
+void __attribute__((weak)) HAL_PIN_Setup_OutputValue(int index, int val)
+{
+	return;
+}
+
 void __attribute__((weak)) HAL_PIN_PWM_Stop(int index)
 {
 	return;
diff --git a/src/hal/hal_pins.h b/src/hal/hal_pins.h
index d279ea34..393575d1 100644
--- a/src/hal/hal_pins.h
+++ b/src/hal/hal_pins.h
@@ -5,6 +5,7 @@ void HAL_PIN_Setup_Input_Pulldown(int index);
 void HAL_PIN_Setup_Input_Pullup(int index);
 void HAL_PIN_Setup_Input(int index);
 void HAL_PIN_Setup_Output(int index);
+void HAL_PIN_Setup_OutputValue(int index, int val);
 void HAL_PIN_PWM_Stop(int index);
 void HAL_PIN_PWM_Start(int index, int freq);
 // Value range is 0 to 100, value is clamped

@rpv-tomsk
Copy link
Contributor Author

I reviewed HAL_Delay_us() a bit. One major change is timer overflow value. It seems to have another value.
Timer BKTIMER2 aka CAL_TIMER_ID set in bk7231t_os/beken378/func/misc/fake_clock.c and it has period set as ONE_CAL_TIME ms. So define of TICKS_PER_OVERFLOW needs to be updated.

Also, reviewed delayTicks > TICKS_PER_OVERFLOW condition. Overflow period is 15 seconds, so this condition is excessive, I think.

Also, I had thoughts about vTaskDelay use, like done in Arduino code.
But it might be excessive here too... Noone should use usleep for milliseconds? )

Also, I thought that getTicksCount() adds his overhead, so removed second call from while loop...

So, added changes and the code now looks like shown below. Please review.

#if !defined CFG_XTAL_FREQUENCE
//26MHz, as per bk7231t_os/beken378/driver/sys_ctrl/sys_ctrl.c
#define CFG_XTAL_FREQUENCE 26000000
#endif

//ONE_CAL_TIME - from fake_clock.c
#define ONE_CAL_TIME       15000
#define TICKS_PER_US       (CFG_XTAL_FREQUENCE / 1000 / 1000)
#define US_PER_OVERFLOW    (ONE_CAL_TIME * 1000)
// 26 ticks per us * 15 000 000 us per overflow
#define TICKS_PER_OVERFLOW (TICKS_PER_US * US_PER_OVERFLOW)

//https://github.com/libretiny-eu/libretiny
void HAL_Delay_us(int delay) {
        if (delay > 1)
          delay -= 2;
        else
          delay = 0;

        const int us_per_tick = portTICK_PERIOD_MS * 1000;
        if (delay > us_per_tick) {
                vTaskDelay((delay + us_per_tick - 1) / us_per_tick);
                return;
        }

        /* startTick2 accounts for the case where the timer counter overflows */
        uint32_t startTick = getTicksCount();
        uint32_t startTick2 = startTick - TICKS_PER_OVERFLOW;

        uint32_t delayTicks = TICKS_PER_US * delay;
        while (1) {
                uint32_t t = getTicksCount();
                if ((t - startTick >= delayTicks) && // normal case
                    (t - startTick2 >= delayTicks))  // counter overflow case
                        break;
        }
}

@rpv-tomsk
Copy link
Contributor Author

And some changes with GPIO implementation:

--- src/hal/bk7231/hal_pins_bk7231.c
+++ src/hal/bk7231/hal_pins_bk7231.c
@@ -65,24 +65,28 @@ int HAL_PIN_CanThisPinBePWM(int index) {
        return 1;
 }
 void HAL_PIN_SetOutputValue(int index, int iVal) {
-       bk_gpio_output(index, iVal);
+       gpio_output(index, iVal);
 }

 int HAL_PIN_ReadDigitalInput(int index) {
-       return bk_gpio_input(index);
+       return gpio_input(index);
 }
 void HAL_PIN_Setup_Input_Pullup(int index) {
-       bk_gpio_config_input_pup(index);
+       //bk_gpio_config_input_pup(index);
+       gpio_config(index, GMODE_INPUT_PULLUP);
 }
 void HAL_PIN_Setup_Input_Pulldown(int index) {
-       bk_gpio_config_input_pdwn(index);
+       //bk_gpio_config_input_pdwn(index);
+       gpio_config(index, GMODE_INPUT_PULLDOWN);
 }
 void HAL_PIN_Setup_Input(int index) {
-       bk_gpio_config_input(index);
+       //bk_gpio_config_input(index);
+       gpio_config(index, GMODE_INPUT);
 }
 void HAL_PIN_Setup_Output(int index) {
-       bk_gpio_config_output(index);
-       bk_gpio_output(index, 0);
+       //bk_gpio_config_output(index);
+       gpio_config(index, GMODE_OUTPUT);
+       gpio_output(index, 0);
 }
 void HAL_PIN_PWM_Stop(int index) {
        int pwmIndex;

@LynxChaus
Copy link

So, added changes and the code now looks like shown below. Please review.

Overcomplicated.

  1. why "delay -= 2" ? this is caller problem.
  2. vTaskDelay() work only with ms delays, what about 10500 us delay? 500us will lost. And vTaskDelay() not guarantee accuracy.

@rpv-tomsk
Copy link
Contributor Author

Yes, for now I don't think anybody should use usleep() for microseconds, so these several lines of vTaskDelay() should be removed when doing PR code update. As for delay =-2 - it was added then I don't knew about real overhead reasons, then I thought that getTicksCount() is slow (because it has loop inside). And I liked your solution with GPIO_OVERHEAD define.

@rpv-tomsk rpv-tomsk force-pushed the usleep branch 2 times, most recently from 240a95a to ded3289 Compare April 4, 2025 16:10
@rpv-tomsk rpv-tomsk mentioned this pull request Apr 4, 2025
@rpv-tomsk rpv-tomsk force-pushed the usleep branch 3 times, most recently from 2550485 to 703cf31 Compare April 4, 2025 18:54
 * Do not use HAL_PIN_SetOutputValue() when reading sensor response - output must be open-drain.
 * Use Input_Pullup when reading sensor response instead. That allows to connect sensor w/o adding pullup resistor.
 * Fixed OWReadBit timing, read must be done within 15us, instead of after. Closes: openshwprojects#1582
@rpv-tomsk
Copy link
Contributor Author

Code prepared for merge.
It works on my BK7231N device.

@MaxineMuster
Copy link
Contributor

Sorry for my late feedback, thank you so much for your work on this!
There is only one point I'm not sure about, the change of the timing definitions with "GPIO_OVERHEAD"

I didn't test, but I'm quite sure the overhead is not necessarily equal for all platforms, and there are quite a lot around.
So I would propose to "fix" the overhead per platform.

@rpv-tomsk
Copy link
Contributor Author

So I would propose to "fix" the overhead per platform.

Yes, I have same thoughts so left overhead in hal/ files, left comment about this in code too.

@LynxChaus
Copy link

So I would propose to "fix" the overhead per platform.

#if defined(PLATFORM_BK7231T)
#define GPIO_OVERHEAD   2       /* 6us overhead with bk_gpio_output(), 2us with gpio_output() for GPIO pins */
#elif defined(PLATFORM_BK7231N)
#define GPIO_OVERHEAD   1
#else
#define GPIO_OVERHEAD   0
#endif

something like this maybe ?

@rpv-tomsk
Copy link
Contributor Author

For N I also have 2us overhead, not 1us.

@rpv-tomsk
Copy link
Contributor Author

What is needed for this PR gets merged?

@MaxineMuster
Copy link
Contributor

MaxineMuster commented Apr 12, 2025

Please note that this version will not work on BK7238 - it will crash.
Reason is that a timeout in getTicksCount() is not handled but ignored - leading to an endless loop and crash on BK7238.

I didn't get it to work on this platform with this approach (every call to read the ticks will timeout, so even if you do some checks that it wont crash, usleep won't work). I also didn't manage to get DS18B20 working with the idea with calendar from @NonPIayerCharacter (this version for me on BK7238 only works o.k. with delays > 20us, all values smaller ~ 20us will [don't ask me, why] result in some longer(!) delays, e.g. for me 5us/6us/10s will all take approx 35us. Starting with a delay of 20 the values are "near" the expected ones: 20 -> 24 / 50 -> 58 / 100->115 / 200->214 / 500->500).

So for now I simply made a special case for BK7238 which is at least working with DS18B20 for me, though timing is not too good.

There is also some sort of error handling for getTicksCount() - this can be streamlined for sure...

diff -ur src/hal/bk7231/hal_generic_bk7231.c src/hal/bk7231/hal_generic_bk7231.c
--- src/hal/bk7231/hal_generic_bk7231.c	2025-04-04 20:57:48.000000000 +0200
+++ src/hal/bk7231/hal_generic_bk7231.c	2025-04-12 19:26:12.126106097 +0200
@@ -23,11 +23,14 @@
 
 static uint32_t getTicksCount() {
 	uint32_t timeout = 0;
+	if (CAL_TIMER_ID >2) bk_printf("ERROR - in getTicksCount() CAL_TIMER_ID = %i\r\n",CAL_TIMER_ID);
 	REG_WRITE(TIMER0_2_READ_CTL, (CAL_TIMER_ID << 2) | 1);
 	while (REG_READ(TIMER0_2_READ_CTL) & 1) {
 		timeout++;
-		if (timeout > (120 * 1000))
-		return 0;
+		if (timeout > (120 * 1000)){
+			bk_printf("ERROR - timeout in getTicksCount()\r\n");
+			return BK_TIMER_FAILURE;
+		}
 	}
 	return REG_READ(TIMER0_2_READ_VALUE);
 }
@@ -46,25 +49,34 @@
 
 //https://github.com/libretiny-eu/libretiny
 void HAL_Delay_us(int delay) {
+#if PLATFORM_BK7238
+	usleep((17*delay)/10);
+#else
 	// 2us with gpio_output() while switch GPIO pins.
 	// This is platform-specific, so put it here.
 	// us-range delays are for bitbang in most cases.
-	if (delay > 1)
-	  delay -= 2;
-	else
-	  delay = 0;
-
-	/* startTick2 accounts for the case where the timer counter overflows */
 	uint32_t startTick = getTicksCount();
-	uint32_t startTick2 = startTick - TICKS_PER_OVERFLOW;
+	if (delay > 1 && startTick != BK_TIMER_FAILURE ){
+		/* startTick2 accounts for the case where the timer counter overflows */
+
+		uint32_t failed_getTicks = 0;
+		uint32_t startTick2 = startTick - TICKS_PER_OVERFLOW;
+
+		uint32_t delayTicks = TICKS_PER_US * delay;
+		while (1) {
+			uint32_t t = getTicksCount();
+			if (t == BK_TIMER_FAILURE) failed_getTicks ++;	
+			if (failed_getTicks > 1) {
+				bk_printf("ERROR in HAL_Delay_us() - too many timeouts for getTicksCount()\r\n");
+				break;
+			}	
+			if ((t - startTick >= delayTicks) && // normal case
+			    (t - startTick2 >= delayTicks))  // counter overflow case
+				break;
+		}
 
-	uint32_t delayTicks = TICKS_PER_US * delay;
-	while (1) {
-		uint32_t t = getTicksCount();
-		if ((t - startTick >= delayTicks) && // normal case
-		    (t - startTick2 >= delayTicks))  // counter overflow case
-			break;
 	}
+#endif
 }
 
 void HAL_Configure_WDT()

EDIT: This will not work for bigger values on BK7238, e.g. used for DHT.
Trying to combine the old usleep-idea wiht calender for bigger values ...

@MaxineMuster
Copy link
Contributor

MaxineMuster commented Apr 12, 2025

This works for me on BK7238 with DS1820 and DHT11 (didn't test the other Beken platforms):


--- src/hal/bk7231/hal_generic_bk7231.c
+++ src/hal/bk7231/hal_generic_bk7231.c
@@ -23,11 +23,14 @@
 
 static uint32_t getTicksCount() {
 	uint32_t timeout = 0;
+	if (CAL_TIMER_ID >2) bk_printf("ERROR - in getTicksCount() CAL_TIMER_ID = %i\r\n",CAL_TIMER_ID);
 	REG_WRITE(TIMER0_2_READ_CTL, (CAL_TIMER_ID << 2) | 1);
 	while (REG_READ(TIMER0_2_READ_CTL) & 1) {
 		timeout++;
-		if (timeout > (120 * 1000))
-		return 0;
+		if (timeout > (120 * 1000)){
+			bk_printf("ERROR - timeout in getTicksCount()\r\n");
+			return BK_TIMER_FAILURE;
+		}
 	}
 	return REG_READ(TIMER0_2_READ_VALUE);
 }
@@ -46,25 +49,50 @@
 
 //https://github.com/libretiny-eu/libretiny
 void HAL_Delay_us(int delay) {
+#if PLATFORM_BK7238
+	if (delay < 100){
+		usleep((17*delay)/10);
+	}else{
+		uint64_t m = (uint64_t)rtos_get_time_us();
+		uint64_t e = (m + delay);
+		if(m > e)
+		{ //overflow
+			while((uint64_t)rtos_get_time_us() > e)
+			{
+				__asm ("NOP");
+			}
+		}
+		while((uint64_t)rtos_get_time_us() < e)
+		{
+			__asm ("NOP");
+		}
+	}
+#else
 	// 2us with gpio_output() while switch GPIO pins.
 	// This is platform-specific, so put it here.
 	// us-range delays are for bitbang in most cases.
-	if (delay > 1)
-	  delay -= 2;
-	else
-	  delay = 0;
-
-	/* startTick2 accounts for the case where the timer counter overflows */
 	uint32_t startTick = getTicksCount();
-	uint32_t startTick2 = startTick - TICKS_PER_OVERFLOW;
+	if (delay > 1 && startTick != BK_TIMER_FAILURE ){
+		/* startTick2 accounts for the case where the timer counter overflows */
+
+		uint32_t failed_getTicks = 0;
+		uint32_t startTick2 = startTick - TICKS_PER_OVERFLOW;
+
+		uint32_t delayTicks = TICKS_PER_US * delay;
+		while (1) {
+			uint32_t t = getTicksCount();
+			if (t == BK_TIMER_FAILURE) failed_getTicks ++;	
+			if (failed_getTicks > 1) {
+				bk_printf("ERROR in HAL_Delay_us() - too many timeouts for getTicksCount()\r\n");
+				break;
+			}	
+			if ((t - startTick >= delayTicks) && // normal case
+			    (t - startTick2 >= delayTicks))  // counter overflow case
+				break;
+		}
 
-	uint32_t delayTicks = TICKS_PER_US * delay;
-	while (1) {
-		uint32_t t = getTicksCount();
-		if ((t - startTick >= delayTicks) && // normal case
-		    (t - startTick2 >= delayTicks))  // counter overflow case
-			break;
 	}
+#endif
 }
 
 void HAL_Configure_WDT()

BK2738_DHT11_DS18B20

@rpv-tomsk
Copy link
Contributor Author

BK7238 - how is it possible that timer is not running there? What SDK does it use?

@NonPIayerCharacter
Copy link
Contributor

NonPIayerCharacter commented Apr 12, 2025

@rpv-tomsk
This one: beken_freertos_sdk
You can also compile OpenBK7231N_ALT to check if it works with this sdk on N.
Or for T here: https://github.com/NonPIayerCharacter/OpenBK7231T_App/tree/_7231u_t
And i can already see in sdk code that it won't work for BK7252, it would need the old nop way.

@MaxineMuster
Copy link
Contributor

MaxineMuster commented Apr 13, 2025

I reviewed HAL_Delay_us() a bit. One major change is timer overflow value. It seems to have another value. Timer BKTIMER2 aka CAL_TIMER_ID set in bk7231t_os/beken378/func/misc/fake_clock.c and it has period set as ONE_CAL_TIME ms. So define of TICKS_PER_OVERFLOW needs to be updated.

Also, reviewed delayTicks > TICKS_PER_OVERFLOW condition. Overflow period is 15 seconds, so this condition is excessive, I think.

Also, I had thoughts about vTaskDelay use, like done in Arduino code. But it might be excessive here too... Noone should use usleep for milliseconds? )

Also, I thought that getTicksCount() adds his overhead, so removed second call from while loop...

So, added changes and the code now looks like shown below. Please review.

#if !defined CFG_XTAL_FREQUENCE
//26MHz, as per bk7231t_os/beken378/driver/sys_ctrl/sys_ctrl.c
#define CFG_XTAL_FREQUENCE 26000000
#endif

//ONE_CAL_TIME - from fake_clock.c
#define ONE_CAL_TIME       15000
#define TICKS_PER_US       (CFG_XTAL_FREQUENCE / 1000 / 1000)
#define US_PER_OVERFLOW    (ONE_CAL_TIME * 1000)
// 26 ticks per us * 15 000 000 us per overflow
#define TICKS_PER_OVERFLOW (TICKS_PER_US * US_PER_OVERFLOW)

//https://github.com/libretiny-eu/libretiny
void HAL_Delay_us(int delay) {
        if (delay > 1)
          delay -= 2;
        else
          delay = 0;

        const int us_per_tick = portTICK_PERIOD_MS * 1000;
        if (delay > us_per_tick) {
                vTaskDelay((delay + us_per_tick - 1) / us_per_tick);
                return;
        }

        /* startTick2 accounts for the case where the timer counter overflows */
        uint32_t startTick = getTicksCount();
        uint32_t startTick2 = startTick - TICKS_PER_OVERFLOW;

        uint32_t delayTicks = TICKS_PER_US * delay;
        while (1) {
                uint32_t t = getTicksCount();
                if ((t - startTick >= delayTicks) && // normal case
                    (t - startTick2 >= delayTicks))  // counter overflow case
                        break;
        }
}

Back to the original topic: I don't know if it's worth trying, since it seems to work well, but the code for the actual test if delay-counter is reached is surely correct, but seems (too) "expensive" in many cases:

in each run of the while-loop there is at least one subtraction and one comparison (if the compiler ignores second parameter in an AND statement where the first test is "false"). Worst case there are two subtractions and two comparisons in every loop.

If we add a little more code, we can make the loops much simpler.

There are three cases, since we need to take care about a possible overflow.

The most common case:
The tick count at the end of the delay is (far) below the overflow value.
To check if delay is done: Simply test, if this count is reached or passed

The other "easy" case:
The delay causes an overflow. The counter value we need to check is "end value - TICKS_PER_OVERFLOW"
To check: First wait, until overflow happened, then test, if (end value - TICKS_PER_OVERFLOW) is reached or passed

Then there is the seldom but "tricky" case:
There is no overflow, but the end counter value is "near" the overflow value, so while testing the tick counter, we might experience an overflow.
Only in this case we must do some "expensive" testing for two cases to find is delay is done:
Is the counter >= end counter OR was there an overflow

So in most cases, we will only need one comparison inside the loop (though we need two loops for the overflow case) and only the seldom case with a possible overflow needs multiple comparisons.

My proposal is (only for the non-BK7238 platforms):

void HAL_Delay_us(int delay) {
	uint32_t startTick = getTicksCount();
	if (delay > 1 && startTick != BK_TIMER_FAILURE ){		// be sure, timer works -- GPIO-delay can/should be handeled if GPIO is involved, not here
		uint32_t endTicks = startTick + TICKS_PER_US * delay;	// end tick value after delay
		// 
		// there are three possible cases: 
		//	1. a delay with overflow 
		//	2. a delay surely without overflow
		//	3. a delay with a possible overflow
		// 
		// Case 1. with overflow:
		//	quite "easy":
		//		wait for overflow, then wait until counter > calculated end value (after overflow)
		if (endTicks >= TICKS_PER_OVERFLOW){
			while (getTicksCount() > startTick) {};	// until overflow happens, the actual count is > than start
			endTicks -= TICKS_PER_OVERFLOW;		// calculate endTicks (subtracting maximum value
			while (getTicksCount() < endTicks) {};	// wait, until end counter is reached
			
		} 
		else {
		// 
		//	For the other two cases we need to define a "safe distance" away from overflow
		//		if the end value is lower, we define it as "safe" - there can be no overflow when testing 
		//	 
		
			uint32_t safeTick = TICKS_PER_OVERFLOW - (TICKS_PER_US * 10);	// an end value "10 us away" from overflow should be safe 

		// Case 2. surely no overflow:
		//	very easy:
		//		since end value is a "safe distance" away from overflow we can simply wait, until end value is reached 
		
			if (endTicks < safeTick){			// so end value is "safe" distance away from overflow
				while (getTicksCount() < endTicks) {};	// just wait, until end counter is reached
			}
			else {
			
		// Case 3. the tricky one: end value is "near" to overflow - so we can't simply wait, until end value is reached, 
		//	because we might miss the check, were counter is > endTicks and the next test is after an overflow and (since the value returend is way below our end point) we missed the end condition.
		//	So we need two checks: is the value still below endTicks AND was there no overflow (we test, if the value is still > startTick)
		// 
		// 	to make the "expensive" tests as low as possible, devide the test:
		//		first test "simple" inside the safe boundaries
		//		then do the test for end value and check, if overflow happened   

				while (getTicksCount() < safeTick) {};	// "simple" wait, until safeTick is reached
				uint32_t t = getTicksCount();
				while ( t < endTicks && t > startTick){	// when "near" overflow, check if endTicks is reached or overflow happened (then t < startTick)
					t = getTicksCount();
				};

			}
		}

	}
}

As I said, it's a bit more code for the different cases, but the loops should be faster in most cases...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants