d33psky commented Aug 18, 2025

Summary

This PR implements a comprehensive sensor reliability system that prevents single sensor failures from making the observatory inoperable while maintaining safety for critical systems.

Changes Made

🎯 Sensor Criticality System

  • Critical sensors (cloud detection): 5-minute tolerance → assume dangerous conditions if failed
    • BAA1/BCC1 infrared sky temperature sensors
    • Essential for detecting clouds that could damage equipment
  • Non-critical sensors: 2-3 minute tolerance → assume safe conditions if failed
    • SQM (Sky Quality Meter): 3-minute tolerance
    • Rain sensor: 2-minute tolerance
    • UPS monitoring: 2-minute tolerance
    • Temperature/humidity: unlimited tolerance (monitoring only, never blocks operation)
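
As an illustration of the criticality rules listed above, the following is a minimal sketch rather than the code in this PR. The names `SensorPolicy`, `SENSOR_POLICIES`, `effective_safety`, and the dictionary keys are hypothetical; only the tolerances and the fail-safe direction come from the list.

```python
from collections import namedtuple

# timeout_s: seconds of tolerated silence, None means no timeout (monitoring only)
# critical:  True means stale data is treated as dangerous
SensorPolicy = namedtuple("SensorPolicy", ["timeout_s", "critical"])

# Hypothetical keys; the tolerances mirror the list above.
SENSOR_POLICIES = {
    "sky_temperature": SensorPolicy(300, True),        # BAA1/BCC1 cloud detection
    "sqm": SensorPolicy(180, False),
    "rain": SensorPolicy(120, False),
    "ups": SensorPolicy(120, False),
    "temperature_humidity": SensorPolicy(None, False),
}


def effective_safety(name, age_s, reading_is_safe):
    """Combine a sensor's last reading with its data age.

    age_s is the age of the newest sample in seconds, or None if no
    sample has ever been received.
    """
    policy = SENSOR_POLICIES[name]
    if policy.timeout_s is None:
        return True  # monitoring only: never blocks operation
    stale = age_s is None or age_s > policy.timeout_s
    if not stale:
        return reading_is_safe  # fresh data: trust the reading
    # Stale data: critical sensors fail closed, non-critical ones fail open.
    return not policy.critical
```

With this shape, a stale cloud sensor yields `False` (close the observatory) while a stale SQM yields `True` (keep operating), which matches the behaviour described in this PR.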

🛠️ Error Handling Improvements

  • Fixed TypeError: unorderable types: NoneType() >= float() crashes
  • Fixed SQL syntax errors from empty sensor values (e.g., N:::-95.08)
  • Added check_sensor_data_age() function for data freshness validation
  • Graceful degradation when sensors fail
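
A freshness check and a None-safe comparison could look roughly like the sketch below. The signature of `check_sensor_data_age()` is an assumption (the PR description names the function but does not show it), and `at_least` is a hypothetical helper added for illustration.

```python
from datetime import datetime, timedelta


def check_sensor_data_age(last_update, max_age_s, now=None):
    """Return True if last_update is at most max_age_s seconds old.

    Assumed signature. A missing timestamp (None) counts as stale.
    """
    if last_update is None:
        return False
    if now is None:
        now = datetime.now()
    return (now - last_update) <= timedelta(seconds=max_age_s)


def at_least(value, threshold):
    """None-safe stand-in for `value >= threshold`.

    Comparing None with a float raises TypeError in Python 3, so a
    missing reading is reported as "not at least" instead.
    """
    return value is not None and value >= threshold
```

Callers still have to decide what a stale or missing value means for safety; these helpers only avoid the crash.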

📝 Enhanced Documentation

  • Added comprehensive collector/CLAUDE.md with system architecture
  • Detailed troubleshooting guide with specific error messages
  • Sensor criticality matrix and timeout configurations

Problem Solved

Previously, any sensor failure would cause:

TypeError: unorderable types: NoneType() >= float()

or SQL errors:

ProgrammingError: (1064, "You have an error in your SQL syntax... near 'observatory_humidity1=,observatory_dewpoint1=-95.08'

This made the entire observatory inoperable due to single sensor issues.
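
The SQL error above comes from an empty value being spliced directly into the statement text. One way to avoid it, sketched here under assumptions, is to map empty fields to `None` and pass them as query parameters so the driver writes `NULL`. The helper names and the `sensors` table are hypothetical, the statement shape (`UPDATE ... SET`) is a guess, and `%s` is the placeholder style used by common MySQL drivers such as MySQLdb and PyMySQL.

```python
def to_sql_value(raw):
    """Map an empty or missing field to None (SQL NULL), otherwise to float."""
    if raw is None or raw.strip() == "":
        return None
    return float(raw)


def build_update(table, fields):
    """Build a parameterized statement and its parameter list.

    fields maps column names to raw string values from the collector.
    Table and column names must come from trusted code, not sensor input.
    """
    columns = sorted(fields)
    assignments = ", ".join("{}=%s".format(col) for col in columns)
    params = [to_sql_value(fields[col]) for col in columns]
    return "UPDATE {} SET {}".format(table, assignments), params


sql, params = build_update(
    "sensors",  # hypothetical table name
    {"observatory_humidity1": "", "observatory_dewpoint1": "-95.08"},
)
# cursor.execute(sql, params) would then store NULL for the empty humidity field.
```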

Test Plan

  • Test with missing SQM data → continues operation
  • Test with missing rain sensor data → continues operation
  • Test with missing UPS data → continues operation
  • Test with missing infrared sensor data → safely closes (critical sensor)
  • Test with empty sensor values → properly handles NULL conversion
  • Verify all functions handle NoneType data gracefully

🤖 Generated with Claude Code

d33psky and others added 4 commits August 18, 2025 21:26
- Add sensor criticality system with configurable timeouts:
  * Critical sensors (cloud detection): 5min tolerance, assume dangerous if failed
  * Non-critical sensors (SQM, rain, UPS): 2-3min tolerance, assume safe if failed
- Fix NoneType comparison crashes in weather safety evaluation
- Fix SQL syntax errors from empty sensor values (convert to NULL)
- Add comprehensive error handling and graceful degradation
- Update documentation with troubleshooting guide

This prevents single sensor failures from making the observatory inoperable
while maintaining safety for critical cloud detection systems.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Replace empty values (::) with 'U' (unknown) before passing to rrdtool
- MySQL pipeline still receives original data for failure tracking
- Prevents 'Cannot convert empty string to float' RRD errors

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Use while loop to handle multiple consecutive empty values
- Converts N:::-95.08 to N:U:U:-95.08 properly
- Ensures RRDtool gets the expected number of data points
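
A minimal sketch of the loop described above (the function name `fill_unknowns` is hypothetical). A single `str.replace` pass is not enough because matches do not overlap: `N:::-95.08` becomes `N:U::-95.08` after one pass, which still contains an empty field.

```python
def fill_unknowns(update):
    """Replace empty fields in an rrdtool update string with 'U' (unknown)."""
    # Repeat until no adjacent colons remain; each pass fills at least one gap.
    while "::" in update:
        update = update.replace("::", ":U:")
    return update
```

This sketch only handles empty fields between two colons; a trailing empty field such as `N:1:` would need separate handling.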

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Add debug logging to see what values are being returned from database
- Will help identify why sensor data appears missing when it exists in DB

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>