diff --git a/docs/RedisQueue.md b/docs/RedisQueue.md
new file mode 100644
index 000000000..51782e401
--- /dev/null
+++ b/docs/RedisQueue.md
@@ -0,0 +1,113 @@
+# Redis Pending Request Queue Beta
+
+Halibut provides a Redis-backed pending request queue for multi-node setups. This solves the problem where
+a cluster of multiple clients needs to send commands to polling services, each of which connects to only one of the
+clients.
+
+For example, suppose we have two clients, ClientA and ClientB, and the Service connects to ClientB, yet ClientA wants
+to execute an RPC. With the existing in-memory queue that won't work, as the request ends up in the in-memory queue for ClientA
+but needs to be accessible to ClientB.
+
+The Redis queue solves this: the request is placed into Redis, allowing ClientB to access the request and
+so send it to the Service.
+
+## How to run Redis for this queue.
+
+Redis can be started by running the following command in the root of the repository:
+
+```
+docker run -v `pwd`/redis-conf:/usr/local/etc/redis -p 6379:6379 --name redis -d redis redis-server /usr/local/etc/redis/redis.conf
+```
+
+Note that Redis is configured with no persistence, so everything is held in memory only. The queue relies on this assumption to function.
+
+# Design
+
+## Background
+### What is a Pending Request Queue.
+
+Halibut turns an RPC call into a RequestMessage, which is placed into the Pending Request Queue. This is done by calling `ResponseMessage QueueAndWait(RequestMessage)`, a blocking call that queues the RequestMessage and waits for the ResponseMessage before returning.
+
+A polling service, e.g. Tentacle, calls the `Dequeue` method of the queue to get the next `RequestMessage` to process. It then responds by calling `ApplyResponse(ResponseMessage)`, which results in `QueueAndWait()` returning the ResponseMessage. This in turn results in the RPC call completing.
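As a rough illustration of the contract described above, the following is a minimal single-process sketch, not Halibut's implementation. The `RequestMessage` and `ResponseMessage` records, the `SketchPendingRequestQueue` class and the `Dequeue` timeout parameter are hypothetical stand-ins introduced only for this example; the real types and signatures may differ.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-ins for Halibut's message types; the real ones carry far more.
public record RequestMessage(Guid Id, string Method);
public record ResponseMessage(Guid RequestId, string Result);

// Illustrative in-memory queue (an assumption for this sketch, not Halibut's code).
public class SketchPendingRequestQueue
{
    readonly BlockingCollection<RequestMessage> pending = new();
    readonly ConcurrentDictionary<Guid, TaskCompletionSource<ResponseMessage>> waiters = new();

    // Queues the request, then blocks until ApplyResponse supplies the matching response.
    public ResponseMessage QueueAndWait(RequestMessage request)
    {
        var waiter = new TaskCompletionSource<ResponseMessage>(TaskCreationOptions.RunContinuationsAsynchronously);
        waiters[request.Id] = waiter; // register before queuing so a fast response is not lost
        pending.Add(request);
        return waiter.Task.GetAwaiter().GetResult();
    }

    // Called on behalf of the polling service; returns null if no work arrives within the timeout.
    public RequestMessage? Dequeue(TimeSpan timeout)
        => pending.TryTake(out var request, timeout) ? request : null;

    // Completes the QueueAndWait call that is waiting on this response's request.
    public void ApplyResponse(ResponseMessage response)
    {
        if (waiters.TryRemove(response.RequestId, out var waiter))
            waiter.SetResult(response);
    }
}

public static class QueueDemo
{
    public static void Main()
    {
        var queue = new SketchPendingRequestQueue();

        // Plays the role of the polling service: take one request and answer it.
        var service = Task.Run(() =>
        {
            var request = queue.Dequeue(TimeSpan.FromSeconds(5));
            if (request is not null)
                queue.ApplyResponse(new ResponseMessage(request.Id, "done: " + request.Method));
        });

        // Plays the role of the RPC caller: blocks until the service has responded.
        var response = queue.QueueAndWait(new RequestMessage(Guid.NewGuid(), "SayHello"));
        service.Wait();
        Console.WriteLine(response.Result);
    }
}
```

With everything in one process, the caller and the service share the same dictionary of waiters. In a cluster they do not, which is the gap the Redis queue is meant to close.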
+
+The Redis Pending Request Queue solves the problem where multiple clients wish to execute RPC calls to a single polling service that is connected to exactly one client. For example, Client A makes an RPC call, but the service is connected to Client B. The Redis Pending Request Queue is what moves the `RequestMessage` from Client A to Client B to be sent to the service.
+
+### Redis specific details relevant to the queue.
+
+First we need to understand a little about Redis and how we are using it:
+ - Redis may lose data.
+ - Pub/Sub does not have guaranteed delivery, so we can miss publications.
+ - Pub/Sub channels are not pets in Redis: they are created simply by "subscribing" and are "deleted" when there are no subscribers to that channel.
+ - Redis is reached over the network, which can be flaky, so we retry Redis operations when we can.
+
+## High Level design.
+
+Setup:
+ - Client A is executing the RPC call.
+ - Client B has the polling service connected to it.
+
+At a high level, the steps the Redis queue goes through to execute an RPC are:
+
+ 1. Client B subscribes to the unique "RequestMessage Pulse Channel", as the polling service is connected to it. The channel is keyed by the polling client id, e.g. "poll://123".
+ 2. Client A executes an RPC and so calls `QueueAndWait` with a RequestMessage. Each RequestMessage has a unique `GUID`.
+ 3. Client A subscribes to the `ResponseMessage channel` keyed by `GUID` to be notified when a response is available.
+ 4. Client A serialises the message and places it into a hash in Redis keyed by the RequestMessage `GUID`.
+ 5. Client A adds the `GUID` to the polling client's unique Redis list (aka queue). The key is the polling client id, e.g. "poll://123".
+ 6. Client A pulses the polling client's unique "RequestMessage Pulse Channel" to alert it that there is work to do.
+ 7. 
Client B receives the pulse message and tries to dequeue a `GUID` from the polling client's unique Redis list (aka queue).
+ 8. Client B now has the `GUID` of the request and so atomically gets and deletes the RequestMessage from the Redis hash using that `GUID`.
+ 9. Client B sends the request to the Service (e.g. Tentacle), waits for the response, and calls `ApplyResponse()` with the ResponseMessage.
+ 10. Client B writes the `ResponseMessage` to Redis in a hash using the `GUID` as the key.
+ 11. Client B pulses the `ResponseMessage channel` keyed by the RequestMessage `GUID` to signal that a response is available.
+ 12. Client A receives a pulse on the `ResponseMessage channel` and so knows a response is available. It reads the response from Redis and returns from the `QueueAndWait()` method.
+
+## Cancellation support.
+
+The Redis PRQ (Pending Request Queue) supports cancellation, even for collected requests. This is done by the RequestReceiverNode (i.e. the node connected to the Service) subscribing to the request cancellation channel and also polling for request cancellation.
+
+## Dealing with minor network interruptions to Redis.
+
+All operations against Redis are retried for up to 30s. This allows connections to Redis to go down briefly without impacting RPCs, even non-idempotent ones.
+
+### Pub/Sub and Poll.
+
+Since Pub/Sub does not have guaranteed delivery in Redis, any place that we use Pub/Sub must also have a form of polling. For example:
+ - When dequeuing work, we are not only subscribed to the pulse channel; whenever `Dequeue()` is called we also check for work on the queue anyway. (Note that `Dequeue()` returns every 30s if there is no work, and thus we have polling.)
+ - When waiting for a response, we are not only subscribed to the response channel; we also poll to see if the response has been sent back.
+
+## Dealing with nodes that disappear mid request.
+
+Either node could go offline at any time, including during execution of an RPC. 
For example:
+ - The node executing the RPC could go offline while the node with the Service connected is sending the request to the Service.
+ - The node sending the request to the Service could go offline.
+
+To handle this in a way that still allows for large file transfers, i.e. requests that take a long time, we have a concept of "heartbeats".
+
+When executing an RPC, both nodes involved send heartbeats to a unique channel keyed by the request ID and the node's role in the RPC. For example:
+- The node executing the RPC pulses heartbeats to a channel with a key such as `NodeSendingRequest:GUID`.
+- The node sending the request to the Service pulses heartbeats to a channel with a key such as `NodeReceivingRequest:GUID`.
+
+Each node can now watch for heartbeats from the other node. When the heartbeats stop, it can assume the other node is offline and cancel/abandon the request.
+
+## Dealing with Redis losing its data.
+
+Since Redis can lose data at any time, the queue is able to detect data loss and cancels any in-flight requests when data loss occurs.
+
+## Message serialisation
+
+Message serialisation is provided by re-using the serialiser Halibut uses for transferring requests/responses over the wire.
+
+## Cleanup of old data in Redis.
+
+All values in Redis have a TTL applied, so Redis will automatically clean up old keys if Halibut does not.
+
+ - Request message TTL: request pickup timeout + 2 minutes.
+ - Response TTL: default 20 minutes.
+ - Pending GUID list TTL: 1 day.
+ - Heartbeat rates: 15s; timeouts: sender 90s, processor 60s.
+
+### DataStream
+
+DataStreams are not stored in the queue. Instead, an implementation of `IStoreDataStreamsForDistributedQueues` must be provided. It will be called with the DataStreams that are to be stored, and will be called again with the "husks" of the DataStreams that need to be re-hydrated. DataStreams have unique GUIDs, which make it easier to find the data for re-hydration.
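To make the store and re-hydrate cycle concrete, here is a small sketch under stated assumptions. It does not implement `IStoreDataStreamsForDistributedQueues`, whose actual members are not shown in this document; `SketchDataStream`, `InMemoryDataStreamStore`, `Store` and `Rehydrate` are hypothetical names used only to show the idea of keying stored payloads by GUID.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for Halibut's DataStream: an id plus an optional payload.
// A "husk" here is an instance whose payload has been stripped.
public class SketchDataStream
{
    public Guid Id { get; }
    public byte[]? Payload { get; set; }

    public SketchDataStream(Guid id, byte[]? payload)
    {
        Id = id;
        Payload = payload;
    }
}

// Illustrative in-memory store (an assumption for this sketch, not Halibut's code).
public class InMemoryDataStreamStore
{
    readonly ConcurrentDictionary<Guid, byte[]> payloads = new();

    // Saves each payload under the stream's GUID and leaves only the husk behind.
    public void Store(IEnumerable<SketchDataStream> streams)
    {
        foreach (var stream in streams)
        {
            if (stream.Payload is null) continue; // already a husk, nothing to store
            payloads[stream.Id] = stream.Payload;
            stream.Payload = null;
        }
    }

    // Looks each husk up by GUID, puts its payload back and forgets the stored copy.
    public void Rehydrate(IEnumerable<SketchDataStream> husks)
    {
        foreach (var husk in husks)
        {
            if (!payloads.TryRemove(husk.Id, out var payload))
                throw new InvalidOperationException($"No stored data for DataStream {husk.Id}");
            husk.Payload = payload;
        }
    }
}

public static class StoreDemo
{
    public static void Main()
    {
        var store = new InMemoryDataStreamStore();
        var stream = new SketchDataStream(Guid.NewGuid(), new byte[] { 1, 2, 3 });

        store.Store(new[] { stream });      // the payload moves into the store
        Console.WriteLine(stream.Payload is null);

        store.Rehydrate(new[] { stream });  // the payload is restored by GUID
        Console.WriteLine(stream.Payload!.Length);
    }
}
```

An in-memory dictionary only works within one process. Since the request is stored on one node and re-hydrated on another, a real implementation would presumably need storage that every node can reach.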
+ +Sub classing DataStream is a useful technique for avoiding the storage of DataStream data when it is trivial to read the data from some known places. For example a DataStream might be subclassed to hold the file location on disk that should be read when sending the data for a data stream. The halibut serialiser has been updated to work with sub classes of DataStream, in that it will ignore the sub class and send just the DataStream across the wire. This makes it safe to sub class DataStream for efficient storage and have that work with both listening and polling clients. diff --git a/redis-conf/redis.conf b/redis-conf/redis.conf new file mode 100644 index 000000000..90a3e5edf --- /dev/null +++ b/redis-conf/redis.conf @@ -0,0 +1,598 @@ +# Redis configuration file example + +# Note on units: when memory size is needed, it is possible to specify +# it in the usual form of 1k 5GB 4M and so forth: +# +# 1k => 1000 bytes +# 1kb => 1024 bytes +# 1m => 1000000 bytes +# 1mb => 1024*1024 bytes +# 1g => 1000000000 bytes +# 1gb => 1024*1024*1024 bytes +# +# units are case insensitive so 1GB 1Gb 1gB are all the same. + +# By default Redis does not run as a daemon. Use 'yes' if you need it. +# Note that Redis will write a pid file in /var/run/redis.pid when daemonized. +daemonize no + +# When running daemonized, Redis writes a pid file in /var/run/redis.pid by +# default. You can specify a custom pid file location here. +pidfile /var/run/redis.pid + +# Accept connections on the specified port, default is 6379. +# If port 0 is specified Redis will not listen on a TCP socket. +port 6379 + +# If you want you can bind a single interface, if the bind option is not +# specified all the interfaces will listen for incoming connections. +# +# bind 127.0.0.1 + +# Specify the path for the unix socket that will be used to listen for +# incoming connections. There is no default, so Redis will not listen +# on a unix socket when not specified. 
+# +# unixsocket /tmp/redis.sock +# unixsocketperm 755 + +# Close the connection after a client is idle for N seconds (0 to disable) +timeout 0 + +# TCP keepalive. +# +# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence +# of communication. This is useful for two reasons: +# +# 1) Detect dead peers. +# 2) Take the connection alive from the point of view of network +# equipment in the middle. +# +# On Linux, the specified value (in seconds) is the period used to send ACKs. +# Note that to close the connection the double of the time is needed. +# On other kernels the period depends on the kernel configuration. +# +# A reasonable value for this option is 60 seconds. +tcp-keepalive 0 + +# Specify the server verbosity level. +# This can be one of: +# debug (a lot of information, useful for development/testing) +# verbose (many rarely useful info, but not a mess like the debug level) +# notice (moderately verbose, what you want in production probably) +# warning (only very important / critical messages are logged) +loglevel notice + +# Specify the log file name. Also 'stdout' can be used to force +# Redis to log on the standard output. Note that if you use standard +# output for logging but daemonize, logs will be sent to /dev/null +logfile stdout + +# To enable logging to the system logger, just set 'syslog-enabled' to yes, +# and optionally update the other syslog parameters to suit your needs. +# syslog-enabled no + +# Specify the syslog identity. +# syslog-ident redis + +# Specify the syslog facility. Must be USER or between LOCAL0-LOCAL7. +# syslog-facility local0 + +# Set the number of databases. 
The default database is DB 0, you can select +# a different one on a per-connection basis using SELECT where +# dbid is a number between 0 and 'databases'-1 +databases 16 + +################################ SNAPSHOTTING ################################# +# +# Save the DB on disk: +# +# save +# +# Will save the DB if both the given number of seconds and the given +# number of write operations against the DB occurred. +# +# In the example below the behaviour will be to save: +# after 900 sec (15 min) if at least 1 key changed +# after 300 sec (5 min) if at least 10 keys changed +# after 60 sec if at least 10000 keys changed +# +# Note: you can disable saving at all commenting all the "save" lines. +# +# It is also possible to remove all the previously configured save +# points by adding a save directive with a single empty string argument +# like in the following example: +# +# save "" + +# save 900 1 +# save 300 10 +# save 60 10000 +save "" + +# By default Redis will stop accepting writes if RDB snapshots are enabled +# (at least one save point) and the latest background save failed. +# This will make the user aware (in an hard way) that data is not persisting +# on disk properly, otherwise chances are that no one will notice and some +# distater will happen. +# +# If the background saving process will start working again Redis will +# automatically allow writes again. +# +# However if you have setup your proper monitoring of the Redis server +# and persistence, you may want to disable this feature so that Redis will +# continue to work as usually even if there are problems with disk, +# permissions, and so forth. +stop-writes-on-bgsave-error yes + +# Compress string objects using LZF when dump .rdb databases? +# For default that's set to 'yes' as it's almost always a win. +# If you want to save some CPU in the saving child set it to 'no' but +# the dataset will likely be bigger if you have compressible values or keys. 
+rdbcompression yes + +# Since version 5 of RDB a CRC64 checksum is placed at the end of the file. +# This makes the format more resistant to corruption but there is a performance +# hit to pay (around 10%) when saving and loading RDB files, so you can disable it +# for maximum performances. +# +# RDB files created with checksum disabled have a checksum of zero that will +# tell the loading code to skip the check. +rdbchecksum yes + +# The filename where to dump the DB +dbfilename dump.rdb + +# The working directory. +# +# The DB will be written inside this directory, with the filename specified +# above using the 'dbfilename' configuration directive. +# +# The Append Only File will also be created inside this directory. +# +# Note that you must specify a directory here, not a file name. +dir ./ + +################################# REPLICATION ################################# + +# Master-Slave replication. Use slaveof to make a Redis instance a copy of +# another Redis server. Note that the configuration is local to the slave +# so for example it is possible to configure the slave to save the DB with a +# different interval, or to listen to another port, and so on. +# +# slaveof + +# If the master is password protected (using the "requirepass" configuration +# directive below) it is possible to tell the slave to authenticate before +# starting the replication synchronization process, otherwise the master will +# refuse the slave request. +# +# masterauth + +# When a slave loses its connection with the master, or when the replication +# is still in progress, the slave can act in two different ways: +# +# 1) if slave-serve-stale-data is set to 'yes' (the default) the slave will +# still reply to client requests, possibly with out of date data, or the +# data set may just be empty if this is the first synchronization. 
+# +# 2) if slave-serve-stale-data is set to 'no' the slave will reply with +# an error "SYNC with master in progress" to all the kind of commands +# but to INFO and SLAVEOF. +# +slave-serve-stale-data yes + +# You can configure a slave instance to accept writes or not. Writing against +# a slave instance may be useful to store some ephemeral data (because data +# written on a slave will be easily deleted after resync with the master) but +# may also cause problems if clients are writing to it because of a +# misconfiguration. +# +# Since Redis 2.6 by default slaves are read-only. +# +# Note: read only slaves are not designed to be exposed to untrusted clients +# on the internet. It's just a protection layer against misuse of the instance. +# Still a read only slave exports by default all the administrative commands +# such as CONFIG, DEBUG, and so forth. To a limited extend you can improve +# security of read only slaves using 'rename-command' to shadow all the +# administrative / dangerous commands. +slave-read-only yes + +# Slaves send PINGs to server in a predefined interval. It's possible to change +# this interval with the repl_ping_slave_period option. The default value is 10 +# seconds. +# +# repl-ping-slave-period 10 + +# The following option sets a timeout for both Bulk transfer I/O timeout and +# master data or ping response timeout. The default value is 60 seconds. +# +# It is important to make sure that this value is greater than the value +# specified for repl-ping-slave-period otherwise a timeout will be detected +# every time there is low traffic between the master and the slave. +# +# repl-timeout 60 + +# Disable TCP_NODELAY on the slave socket after SYNC? +# +# If you select "yes" Redis will use a smaller number of TCP packets and +# less bandwidth to send data to slaves. But this can add a delay for +# the data to appear on the slave side, up to 40 milliseconds with +# Linux kernels using a default configuration. 
+# +# If you select "no" the delay for data to appear on the slave side will +# be reduced but more bandwidth will be used for replication. +# +# By default we optimize for low latency, but in very high traffic conditions +# or when the master and slaves are many hops away, turning this to "yes" may +# be a good idea. +repl-disable-tcp-nodelay no + +# The slave priority is an integer number published by Redis in the INFO output. +# It is used by Redis Sentinel in order to select a slave to promote into a +# master if the master is no longer working correctly. +# +# A slave with a low priority number is considered better for promotion, so +# for instance if there are three slaves with priority 10, 100, 25 Sentinel will +# pick the one wtih priority 10, that is the lowest. +# +# However a special priority of 0 marks the slave as not able to perform the +# role of master, so a slave with priority of 0 will never be selected by +# Redis Sentinel for promotion. +# +# By default the priority is 100. +slave-priority 100 + +################################## SECURITY ################################### + +# Require clients to issue AUTH before processing any other +# commands. This might be useful in environments in which you do not trust +# others with access to the host running redis-server. +# +# This should stay commented out for backward compatibility and because most +# people do not need auth (e.g. they run their own servers). +# +# Warning: since Redis is pretty fast an outside user can try up to +# 150k passwords per second against a good box. This means that you should +# use a very strong password otherwise it will be very easy to break. +# +# requirepass foobared + +# Command renaming. +# +# It is possible to change the name of dangerous commands in a shared +# environment. For instance the CONFIG command may be renamed into something +# hard to guess so that it will still be available for internal-use tools +# but not available for general clients. 
+# +# Example: +# +# rename-command CONFIG b840fc02d524045429941cc15f59e41cb7be6c52 +# +# It is also possible to completely kill a command by renaming it into +# an empty string: +# +# rename-command CONFIG "" +# +# Please note that changing the name of commands that are logged into the +# AOF file or transmitted to slaves may cause problems. + +################################### LIMITS #################################### + +# Set the max number of connected clients at the same time. By default +# this limit is set to 10000 clients, however if the Redis server is not +# able to configure the process file limit to allow for the specified limit +# the max number of allowed clients is set to the current file limit +# minus 32 (as Redis reserves a few file descriptors for internal uses). +# +# Once the limit is reached Redis will close all the new connections sending +# an error 'max number of clients reached'. +# +maxclients 10000 + +# Don't use more memory than the specified amount of bytes. +# When the memory limit is reached Redis will try to remove keys +# accordingly to the eviction policy selected (see maxmemmory-policy). +# +# If Redis can't remove keys according to the policy, or if the policy is +# set to 'noeviction', Redis will start to reply with errors to commands +# that would use more memory, like SET, LPUSH, and so on, and will continue +# to reply to read-only commands like GET. +# +# This option is usually useful when using Redis as an LRU cache, or to set +# an hard memory limit for an instance (using the 'noeviction' policy). 
+# +# WARNING: If you have slaves attached to an instance with maxmemory on, +# the size of the output buffers needed to feed the slaves are subtracted +# from the used memory count, so that network problems / resyncs will +# not trigger a loop where keys are evicted, and in turn the output +# buffer of slaves is full with DELs of keys evicted triggering the deletion +# of more keys, and so forth until the database is completely emptied. +# +# In short... if you have slaves attached it is suggested that you set a lower +# limit for maxmemory so that there is some free RAM on the system for slave +# output buffers (but this is not needed if the policy is 'noeviction'). +# +# maxmemory + +# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory +# is reached. You can select among five behaviors: +# +# volatile-lru -> remove the key with an expire set using an LRU algorithm +# allkeys-lru -> remove any key accordingly to the LRU algorithm +# volatile-random -> remove a random key with an expire set +# allkeys-random -> remove a random key, any key +# volatile-ttl -> remove the key with the nearest expire time (minor TTL) +# noeviction -> don't expire at all, just return an error on write operations +# +# Note: with any of the above policies, Redis will return an error on write +# operations, when there are not suitable keys for eviction. +# +# At the date of writing this commands are: set setnx setex append +# incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd +# sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby +# zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby +# getset mset msetnx exec sort +# +# The default is: +# +# maxmemory-policy volatile-lru + +# LRU and minimal TTL algorithms are not precise algorithms but approximated +# algorithms (in order to save memory), so you can select as well the sample +# size to check. 
For instance for default Redis will check three keys and +# pick the one that was used less recently, you can change the sample size +# using the following configuration directive. +# +# maxmemory-samples 3 + +############################## APPEND ONLY MODE ############################### + +# By default Redis asynchronously dumps the dataset on disk. This mode is +# good enough in many applications, but an issue with the Redis process or +# a power outage may result into a few minutes of writes lost (depending on +# the configured save points). +# +# The Append Only File is an alternative persistence mode that provides +# much better durability. For instance using the default data fsync policy +# (see later in the config file) Redis can lose just one second of writes in a +# dramatic event like a server power outage, or a single write if something +# wrong with the Redis process itself happens, but the operating system is +# still running correctly. +# +# AOF and RDB persistence can be enabled at the same time without problems. +# If the AOF is enabled on startup Redis will load the AOF, that is the file +# with the better durability guarantees. +# +# Please check http://redis.io/topics/persistence for more information. + +appendonly no + +# The name of the append only file (default: "appendonly.aof") +# appendfilename appendonly.aof + +# The fsync() call tells the Operating System to actually write data on disk +# instead to wait for more data in the output buffer. Some OS will really flush +# data on disk, some other OS will just try to do it ASAP. +# +# Redis supports three different modes: +# +# no: don't fsync, just let the OS flush the data when it wants. Faster. +# always: fsync after every write to the append only log . Slow, Safest. +# everysec: fsync only one time every second. Compromise. +# +# The default is "everysec", as that's usually the right compromise between +# speed and data safety. 
It's up to you to understand if you can relax this to +# "no" that will let the operating system flush the output buffer when +# it wants, for better performances (but if you can live with the idea of +# some data loss consider the default persistence mode that's snapshotting), +# or on the contrary, use "always" that's very slow but a bit safer than +# everysec. +# +# More details please check the following article: +# http://antirez.com/post/redis-persistence-demystified.html +# +# If unsure, use "everysec". + +# appendfsync always +appendfsync everysec +# appendfsync no + +# When the AOF fsync policy is set to always or everysec, and a background +# saving process (a background save or AOF log background rewriting) is +# performing a lot of I/O against the disk, in some Linux configurations +# Redis may block too long on the fsync() call. Note that there is no fix for +# this currently, as even performing fsync in a different thread will block +# our synchronous write(2) call. +# +# In order to mitigate this problem it's possible to use the following option +# that will prevent fsync() from being called in the main process while a +# BGSAVE or BGREWRITEAOF is in progress. +# +# This means that while another child is saving, the durability of Redis is +# the same as "appendfsync none". In practical terms, this means that it is +# possible to lose up to 30 seconds of log in the worst scenario (with the +# default Linux settings). +# +# If you have latency problems turn this to "yes". Otherwise leave it as +# "no" that is the safest pick from the point of view of durability. +no-appendfsync-on-rewrite no + +# Automatic rewrite of the append only file. +# Redis is able to automatically rewrite the log file implicitly calling +# BGREWRITEAOF when the AOF log size grows by the specified percentage. 
+# +# This is how it works: Redis remembers the size of the AOF file after the +# latest rewrite (if no rewrite has happened since the restart, the size of +# the AOF at startup is used). +# +# This base size is compared to the current size. If the current size is +# bigger than the specified percentage, the rewrite is triggered. Also +# you need to specify a minimal size for the AOF file to be rewritten, this +# is useful to avoid rewriting the AOF file even if the percentage increase +# is reached but it is still pretty small. +# +# Specify a percentage of zero in order to disable the automatic AOF +# rewrite feature. + +auto-aof-rewrite-percentage 100 +auto-aof-rewrite-min-size 64mb + +################################ LUA SCRIPTING ############################### + +# Max execution time of a Lua script in milliseconds. +# +# If the maximum execution time is reached Redis will log that a script is +# still in execution after the maximum allowed time and will start to +# reply to queries with an error. +# +# When a long running script exceed the maximum execution time only the +# SCRIPT KILL and SHUTDOWN NOSAVE commands are available. The first can be +# used to stop a script that did not yet called write commands. The second +# is the only way to shut down the server in the case a write commands was +# already issue by the script but the user don't want to wait for the natural +# termination of the script. +# +# Set it to 0 or a negative value for unlimited execution without warnings. +lua-time-limit 5000 + +################################## SLOW LOG ################################### + +# The Redis Slow Log is a system to log queries that exceeded a specified +# execution time. 
The execution time does not include the I/O operations +# like talking with the client, sending the reply and so forth, +# but just the time needed to actually execute the command (this is the only +# stage of command execution where the thread is blocked and can not serve +# other requests in the meantime). +# +# You can configure the slow log with two parameters: one tells Redis +# what is the execution time, in microseconds, to exceed in order for the +# command to get logged, and the other parameter is the length of the +# slow log. When a new command is logged the oldest one is removed from the +# queue of logged commands. + +# The following time is expressed in microseconds, so 1000000 is equivalent +# to one second. Note that a negative number disables the slow log, while +# a value of zero forces the logging of every command. +slowlog-log-slower-than 10000 + +# There is no limit to this length. Just be aware that it will consume memory. +# You can reclaim memory used by the slow log with SLOWLOG RESET. +slowlog-max-len 128 + +############################### ADVANCED CONFIG ############################### + +# Hashes are encoded using a memory efficient data structure when they have a +# small number of entries, and the biggest entry does not exceed a given +# threshold. These thresholds can be configured using the following directives. +hash-max-ziplist-entries 512 +hash-max-ziplist-value 64 + +# Similarly to hashes, small lists are also encoded in a special way in order +# to save a lot of space. The special representation is only used when +# you are under the following limits: +list-max-ziplist-entries 512 +list-max-ziplist-value 64 + +# Sets have a special encoding in just one case: when a set is composed +# of just strings that happens to be integers in radix 10 in the range +# of 64 bit signed integers. +# The following configuration setting sets the limit in the size of the +# set in order to use this special memory saving encoding. 
+set-max-intset-entries 512 + +# Similarly to hashes and lists, sorted sets are also specially encoded in +# order to save a lot of space. This encoding is only used when the length and +# elements of a sorted set are below the following limits: +zset-max-ziplist-entries 128 +zset-max-ziplist-value 64 + +# Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in +# order to help rehashing the main Redis hash table (the one mapping top-level +# keys to values). The hash table implementation Redis uses (see dict.c) +# performs a lazy rehashing: the more operation you run into an hash table +# that is rehashing, the more rehashing "steps" are performed, so if the +# server is idle the rehashing is never complete and some more memory is used +# by the hash table. +# +# The default is to use this millisecond 10 times every second in order to +# active rehashing the main dictionaries, freeing memory when possible. +# +# If unsure: +# use "activerehashing no" if you have hard latency requirements and it is +# not a good thing in your environment that Redis can reply form time to time +# to queries with 2 milliseconds delay. +# +# use "activerehashing yes" if you don't have such hard requirements but +# want to free memory asap when possible. +activerehashing yes + +# The client output buffer limits can be used to force disconnection of clients +# that are not reading data from the server fast enough for some reason (a +# common reason is that a Pub/Sub client can't consume messages as fast as the +# publisher can produce them). 
+# +# The limit can be set differently for the three different classes of clients: +# +# normal -> normal clients +# slave -> slave clients and MONITOR clients +# pubsub -> clients subcribed to at least one pubsub channel or pattern +# +# The syntax of every client-output-buffer-limit directive is the following: +# +# client-output-buffer-limit +# +# A client is immediately disconnected once the hard limit is reached, or if +# the soft limit is reached and remains reached for the specified number of +# seconds (continuously). +# So for instance if the hard limit is 32 megabytes and the soft limit is +# 16 megabytes / 10 seconds, the client will get disconnected immediately +# if the size of the output buffers reach 32 megabytes, but will also get +# disconnected if the client reaches 16 megabytes and continuously overcomes +# the limit for 10 seconds. +# +# By default normal clients are not limited because they don't receive data +# without asking (in a push way), but just after a request, so only +# asynchronous clients may create a scenario where data is requested faster +# than it can read. +# +# Instead there is a default limit for pubsub and slave clients, since +# subscribers and slaves receive data in a push fashion. +# +# Both the hard or the soft limit can be disabled by setting them to zero. +client-output-buffer-limit normal 0 0 0 +client-output-buffer-limit slave 256mb 64mb 60 +client-output-buffer-limit pubsub 32mb 8mb 60 + +# Redis calls an internal function to perform many background tasks, like +# closing connections of clients in timeot, purging expired keys that are +# never requested, and so forth. +# +# Not all tasks are perforemd with the same frequency, but Redis checks for +# tasks to perform accordingly to the specified "hz" value. +# +# By default "hz" is set to 10. 
Raising the value will use more CPU when +# Redis is idle, but at the same time will make Redis more responsive when +# there are many keys expiring at the same time, and timeouts may be +# handled with more precision. +# +# The range is between 1 and 500, however a value over 100 is usually not +# a good idea. Most users should use the default of 10 and raise this up to +# 100 only in environments where very low latency is required. +hz 10 + +# When a child rewrites the AOF file, if the following option is enabled +# the file will be fsync-ed every 32 MB of data generated. This is useful +# in order to commit the file to the disk more incrementally and avoid +# big latency spikes. +aof-rewrite-incremental-fsync yes + +################################## INCLUDES ################################### + +# Include one or more other config files here. This is useful if you +# have a standard template that goes to all Redis server but also need +# to customize a few per-server settings. Include files can include +# other files, so use this wisely. +# +# include /path/to/local.conf +# include /path/to/other.conf \ No newline at end of file diff --git a/source/Halibut.TestUtils.Contracts/IComplexObjectService.cs b/source/Halibut.TestUtils.Contracts/IComplexObjectService.cs index 9a788d3cd..e6706b4ea 100644 --- a/source/Halibut.TestUtils.Contracts/IComplexObjectService.cs +++ b/source/Halibut.TestUtils.Contracts/IComplexObjectService.cs @@ -13,6 +13,16 @@ public interface IComplexObjectService public class ComplexObjectMultipleDataStreams { + public ComplexObjectMultipleDataStreams() + { + } + + public ComplexObjectMultipleDataStreams(DataStream? payload1, DataStream? payload2) + { + Payload1 = payload1; + Payload2 = payload2; + } + public DataStream? Payload1; public DataStream? 
Payload2; } diff --git a/source/Halibut.Tests/BaseTest.cs b/source/Halibut.Tests/BaseTest.cs index 2c2b0302d..1ae8d199d 100644 --- a/source/Halibut.Tests/BaseTest.cs +++ b/source/Halibut.Tests/BaseTest.cs @@ -1,9 +1,12 @@ using System; using System.Threading; using System.Threading.Tasks; +using Halibut.Logging; using Halibut.Tests.Support; +using Halibut.Tests.Support.Logging; using NUnit.Framework; using NUnit.Framework.Interfaces; +using ILog = Halibut.Diagnostics.ILog; using ILogger = Serilog.ILogger; namespace Halibut.Tests @@ -15,6 +18,8 @@ public class BaseTest public CancellationToken CancellationToken { get; private set; } public ILogger Logger { get; private set; } = null!; + + public ILog HalibutLog { get; private set; } = null!; [SetUp] public void SetUp() @@ -25,6 +30,8 @@ public void SetUp() .Build() .ForContext(GetType()); + HalibutLog = new TestContextLogCreator("", LogLevel.Trace).CreateNewForPrefix(""); + Logger.Information("Trace log file {LogFile}", traceLogFileLogger.logFilePath); Logger.Information("Test started"); diff --git a/source/Halibut.Tests/Builders/RedisPendingRequestQueueBuilder.cs b/source/Halibut.Tests/Builders/RedisPendingRequestQueueBuilder.cs new file mode 100644 index 000000000..f2aa00e0b --- /dev/null +++ b/source/Halibut.Tests/Builders/RedisPendingRequestQueueBuilder.cs @@ -0,0 +1,70 @@ + +#if NET8_0_OR_GREATER +using System; +using Halibut.Logging; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.MessageStorage; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Tests.Queue; +using Halibut.Tests.Queue.Redis.Utils; +using Halibut.Tests.Support.Logging; +using Halibut.Tests.TestSetup.Redis; +using Halibut.Util; +using ILog = Halibut.Diagnostics.ILog; + +namespace Halibut.Tests.Builders +{ + public class RedisPendingRequestQueueBuilder : IPendingRequestQueueBuilder + { + ILog? log; + string? endpoint; + TimeSpan? 
pollingQueueWaitTimeout; + + public IPendingRequestQueueBuilder WithEndpoint(string endpoint) + { + this.endpoint = endpoint; + return this; + } + + public IPendingRequestQueueBuilder WithLog(ILog log) + { + this.log = log; + return this; + } + + public IPendingRequestQueueBuilder WithPollingQueueWaitTimeout(TimeSpan? pollingQueueWaitTimeout) + { + this.pollingQueueWaitTimeout = pollingQueueWaitTimeout; + return this; + } + + public QueueHolder Build() + { + var endpoint = new Uri(this.endpoint ?? "poll://endpoint001"); + var halibutTimeoutsAndLimits = new HalibutTimeoutsAndLimitsForTestsBuilder().Build(); + var log = this.log ?? new TestContextLogCreator("Queue", LogLevel.Trace).CreateNewForPrefix(""); + + if (this.pollingQueueWaitTimeout != null) + { + halibutTimeoutsAndLimits.PollingQueueWaitTimeout = pollingQueueWaitTimeout.Value; + } + + var disposableCollection = new DisposableCollection(); + + var redisFacade = new RedisFacade("localhost:" + RedisTestHost.Port(), (Guid.NewGuid()).ToString(), log); + disposableCollection.AddAsyncDisposable(redisFacade); + + var redisTransport = new HalibutRedisTransport(redisFacade); + var dataStreamStore = new InMemoryStoreDataStreamsForDistributedQueues(); + var messageSerializer = new QueueMessageSerializerBuilder().Build(); + var messageReaderWriter = new MessageSerialiserAndDataStreamStorage(messageSerializer, dataStreamStore); + + var queue = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), log, redisTransport, messageReaderWriter, halibutTimeoutsAndLimits); + + queue.WaitUntilQueueIsSubscribedToReceiveMessages().GetAwaiter().GetResult(); + + return new QueueHolder(queue, disposableCollection); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Builders/RequestMessageBuilder.cs b/source/Halibut.Tests/Builders/RequestMessageBuilder.cs index d9f685696..8b5e08b8e 100644 --- a/source/Halibut.Tests/Builders/RequestMessageBuilder.cs +++ 
b/source/Halibut.Tests/Builders/RequestMessageBuilder.cs @@ -6,6 +6,7 @@ namespace Halibut.Tests.Builders public class RequestMessageBuilder { readonly ServiceEndPointBuilder serviceEndPointBuilder = new(); + private Guid? activityId; public RequestMessageBuilder(string endpoint) { @@ -18,6 +19,12 @@ public RequestMessageBuilder WithServiceEndpoint(Action return this; } + public RequestMessageBuilder WithActivityId(Guid activityId) + { + this.activityId = activityId; + return this; + } + public RequestMessage Build() { var serviceEndPoint = serviceEndPointBuilder.Build(); @@ -25,11 +32,10 @@ public RequestMessage Build() var request = new RequestMessage { Id = Guid.NewGuid().ToString(), - Destination = serviceEndPoint + Destination = serviceEndPoint, + ActivityId = activityId ?? Guid.NewGuid(), }; - - return request; } } diff --git a/source/Halibut.Tests/Builders/ResponseMessageBuilder.cs b/source/Halibut.Tests/Builders/ResponseMessageBuilder.cs index 8ecd61b8e..acb9ea9e9 100644 --- a/source/Halibut.Tests/Builders/ResponseMessageBuilder.cs +++ b/source/Halibut.Tests/Builders/ResponseMessageBuilder.cs @@ -22,7 +22,7 @@ public ResponseMessage Build() var response = new ResponseMessage { Id = id, - Result = new object() + Result = "Hello World" }; return response; } diff --git a/source/Halibut.Tests/Halibut.Tests.csproj b/source/Halibut.Tests/Halibut.Tests.csproj index 8b2037fc9..602280ad8 100644 --- a/source/Halibut.Tests/Halibut.Tests.csproj +++ b/source/Halibut.Tests/Halibut.Tests.csproj @@ -7,6 +7,7 @@ true false 9.0 + VSTHRD002,VSTHRD003;VSTHRD103 enable true @@ -49,6 +50,7 @@ + diff --git a/source/Halibut.Tests/ManyPollingTentacleTests.cs b/source/Halibut.Tests/ManyPollingTentacleTests.cs new file mode 100644 index 000000000..8c1193def --- /dev/null +++ b/source/Halibut.Tests/ManyPollingTentacleTests.cs @@ -0,0 +1,191 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Linq; +using 
System.Threading; +using System.Threading.Tasks; +using Docker.DotNet.Models; +using FluentAssertions; +using Halibut.Diagnostics; +using Halibut.Logging; +using Halibut.Queue.QueuedDataStreams; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.RedisDataLossDetection; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.ServiceModel; +using Halibut.Tests.Queue.Redis.Utils; +using Halibut.Tests.Support; +using Halibut.Tests.Support.Logging; +using Halibut.Tests.Support.TestAttributes; +using Halibut.Tests.TestServices; +using Halibut.Tests.TestServices.Async; +using Halibut.TestUtils.Contracts; +using Halibut.Util; +using NUnit.Framework; +using DisposableCollection = Halibut.Util.DisposableCollection; + +namespace Halibut.Tests +{ + [NonParallelizable] + [RedisTest] + public class ManyPollingTentacleTests : BaseTest + { + /// + /// Fuzz test, to check under load the queue still works. + /// + /// + /// + [Test] + [AllQueuesTestCases] + [NonParallelizable] + public async Task WhenMakingManyConcurrentRequestsToManyServices_AllRequestsCompleteSuccessfully_And(PendingRequestQueueTestCase queueTestCase) + { + var numberOfPollingServices = 100; + int concurrency = 20; + int numberOfCallsToMake = Math.Min(numberOfPollingServices, 20); + + var logFactory = new CachingLogFactory(new TestContextLogCreator("", LogLevel.Trace)); + var services = GetDelegateServiceFactory(); + await using var disposables = new DisposableCollection(); + var isRedis = queueTestCase.Name == PendingRequestQueueTestCase.RedisTestCaseName; + var log = new TestContextLogCreator("Redis", LogLevel.Fatal); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + await using (var octopus = new HalibutRuntimeBuilder() + .WithServerCertificate(Certificates.Octopus) + .WithPendingRequestQueueFactory(msgSer => + { + if (isRedis) + { + var watchForRedisLosingAllItsData = new WatchForRedisLosingAllItsData(redisFacade, log.CreateNewForPrefix("watcher")); + 
disposables.AddAsyncDisposable(watchForRedisLosingAllItsData); + + return new RedisPendingRequestQueueFactory(msgSer, + new InMemoryStoreDataStreamsForDistributedQueues(), + watchForRedisLosingAllItsData, + new HalibutRedisTransport(redisFacade), + new HalibutTimeoutsAndLimitsForTestsBuilder().Build(), + logFactory); + } + + return new PendingRequestQueueFactoryAsync(new HalibutTimeoutsAndLimitsForTestsBuilder().Build(), + logFactory); + }) + .WithHalibutTimeoutsAndLimits(new HalibutTimeoutsAndLimitsForTestsBuilder().Build()) + .Build()) + { + var listenPort = octopus.Listen(); + octopus.Trust(Certificates.TentacleListening.Thumbprint); + + var watchSubscriberCountCts = new CancelOnDisposeCancellationToken(CancellationToken); + watchSubscriberCountCts.AwaitTasksBeforeCTSDispose(Task.Run(async () => + { + while (!watchSubscriberCountCts.Token.IsCancellationRequested) + { + Logger.Information("Total subscribers: {TotalSubs}", redisFacade.TotalSubscribers); + await Task.Delay(1000); + } + })); + + var serviceEndpoint = new ServiceEndPoint(new Uri("https://localhost:" + listenPort), Certificates.Octopus.Thumbprint, new HalibutTimeoutsAndLimitsForTestsBuilder().Build()); + + var pollEndpoints = Enumerable.Range(0, numberOfPollingServices).Select(i => new Uri("poll://" + i + "Bob")).ToArray(); + + foreach (var pollEndpoint in pollEndpoints) + { + var tentacleListening = new HalibutRuntimeBuilder() + .WithServerCertificate(Certificates.TentacleListening) + .WithServiceFactory(services) + .WithHalibutTimeoutsAndLimits(new HalibutTimeoutsAndLimitsForTestsBuilder().Build()) + .Build(); + tentacleListening.Poll(pollEndpoint, serviceEndpoint, CancellationToken); + } + + var clients = pollEndpoints.Select(pollEndpoint => + octopus.CreateAsyncClient(new ServiceEndPoint(pollEndpoint, Certificates.Octopus.Thumbprint, new HalibutTimeoutsAndLimitsForTestsBuilder().Build()))) + .ToList(); + + var tasks = new List(); + + int expectedTotalNumberOfCallsToBeMade = concurrency * 
numberOfCallsToMake; + int actualCountOfCallsMade = 0; + + var totalSw = Stopwatch.StartNew(); + for (int i = 0; i < concurrency; i++) + { + tasks.Add(Task.Run(async () => + { + var shuffle = clients.ToArray(); + Random.Shared.Shuffle(shuffle); + shuffle = shuffle.Take(numberOfCallsToMake).ToArray(); + foreach (var client in shuffle) + { + await client.SayHelloAsync("World"); + var v = Interlocked.Increment(ref actualCountOfCallsMade); + if (v % 5000 == 0) + { + var timePerCall = totalSw.ElapsedMilliseconds / v; + Logger.Information("Done: {CallsMade} / {Total} avg: {A}", v, expectedTotalNumberOfCallsToBeMade, timePerCall); + } + + } + })); + } + + await Task.WhenAll(tasks); + + totalSw.Stop(); + + Logger.Information("Time was {T}", totalSw.ElapsedMilliseconds); + + actualCountOfCallsMade.Should().Be(expectedTotalNumberOfCallsToBeMade); + + if(isRedis) + { + redisFacade.TotalSubscribers.Should().Be(pollEndpoints.Length); + } + + // Check for exceptions. + foreach (var task in tasks) + { + await task; + } + } + } + + static DelegateServiceFactory GetDelegateServiceFactory() + { + var services = new DelegateServiceFactory(); + services.Register(() => new AsyncEchoServiceWithDelay()); + return services; + } + } + + public class AsyncEchoServiceWithDelay : IAsyncEchoService + { + + public Task LongRunningOperationAsync(CancellationToken cancellationToken) + { + throw new NotImplementedException(); + } + + public async Task SayHelloAsync(string name, CancellationToken cancellationToken) + { + + await Task.Delay(10, cancellationToken); + return name + "..."; + } + + public Task CrashAsync(CancellationToken cancellationToken) + { + throw new NotImplementedException(); + } + + public Task CountBytesAsync(DataStream dataStream, CancellationToken cancellationToken) + { + throw new NotImplementedException(); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/QueueMessageSerializerBuilder.cs 
b/source/Halibut.Tests/Queue/QueueMessageSerializerBuilder.cs new file mode 100644 index 000000000..35dece9d0 --- /dev/null +++ b/source/Halibut.Tests/Queue/QueueMessageSerializerBuilder.cs @@ -0,0 +1,42 @@ +using System; +using Halibut.Diagnostics; +using Halibut.Queue; +using Halibut.Transport.Protocol; +using Newtonsoft.Json; + +namespace Halibut.Tests.Queue +{ + public class QueueMessageSerializerBuilder + { + ITypeRegistry? typeRegistry; + Action? configureSerializer; + + public QueueMessageSerializerBuilder WithTypeRegistry(ITypeRegistry typeRegistry) + { + this.typeRegistry = typeRegistry; + return this; + } + + public QueueMessageSerializerBuilder WithSerializerSettings(Action configure) + { + configureSerializer = configure; + return this; + } + + public QueueMessageSerializer Build() + { + var typeRegistry = this.typeRegistry ?? new TypeRegistry(); + + StreamCapturingJsonSerializer StreamCapturingSerializer() + { + var settings = MessageSerializerBuilder.CreateSerializer(); + var binder = new RegisteredSerializationBinder(typeRegistry); + settings.SerializationBinder = binder; + configureSerializer?.Invoke(settings); + return new StreamCapturingJsonSerializer(settings); + } + + return new QueueMessageSerializer(StreamCapturingSerializer); + } + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/QueueMessageSerializerFixture.cs b/source/Halibut.Tests/Queue/QueueMessageSerializerFixture.cs new file mode 100644 index 000000000..3d6eccdca --- /dev/null +++ b/source/Halibut.Tests/Queue/QueueMessageSerializerFixture.cs @@ -0,0 +1,150 @@ +#if NET8_0_OR_GREATER +using System; +using System.IO; +using System.Threading; +using System.Threading.Tasks; +using FluentAssertions; +using Halibut.Diagnostics; +using Halibut.Tests.Support; +using Halibut.Transport.Protocol; +using NUnit.Framework; + +namespace Halibut.Tests.Queue +{ + public class QueueMessageSerializerFixture : BaseTest + { + [Test] + public void 
SerializeAndDeserializeSimpleStringMessage_ShouldRoundTrip() + { + // Arrange + var sut = new QueueMessageSerializerBuilder().Build(); + + const string testMessage = "Hello, Queue!"; + + // Act + var (json, dataStreams) = sut.WriteMessage(testMessage); + var (deserializedMessage, deserializedDataStreams) = sut.ReadMessage(json); + + // Assert + deserializedMessage.Should().Be(testMessage); + dataStreams.Should().BeEmpty(); + deserializedDataStreams.Should().BeEmpty(); + } + + [Test] + public void SerializeAndDeserializeRequestMessage_ShouldRoundTrip_RequestMessage() + { + // Arrange + var sut = new QueueMessageSerializerBuilder().Build(); + + var request = new RequestMessage() + { + Id = "hello", + ActivityId = Guid.NewGuid(), + Destination = new ServiceEndPoint(new Uri("poll://bob"), "n", new HalibutTimeoutsAndLimits()), + ServiceName = "service", + MethodName = "Echo", + Params = new object[] {"hello"} + }; + + // Act + var (json, dataStreams) = sut.WriteMessage(request); + var (deserializedMessage, deserializedDataStreams) = sut.ReadMessage(json); + + // Assert + deserializedMessage.Should().BeEquivalentTo(request); + dataStreams.Should().BeEmpty(); + deserializedDataStreams.Should().BeEmpty(); + } + + [Test] + public void SerializeAndDeserializeRequestMessageWithDataStream_ShouldRoundTrip_RequestMessage() + { + var typeRegistry = new TypeRegistry(); + typeRegistry.Register(typeof(IHaveTypeWithDataStreamsService)); + // Arrange + var sut = new QueueMessageSerializerBuilder() + .WithTypeRegistry(typeRegistry) + .Build(); + + var request = new RequestMessage() + { + Id = "hello", + ActivityId = Guid.NewGuid(), + Destination = new ServiceEndPoint(new Uri("poll://bob"), "n", new HalibutTimeoutsAndLimits()), + ServiceName = "service", + MethodName = "Echo", + Params = new object[] {"hello", + DataStream.FromString("yo") + ,new TypeWithDataStreams(new RepeatingStringDataStream("bob", 10)) + + } + }; + + // Act + var (json, dataStreams) = sut.WriteMessage(request); + + 
dataStreams[1].Should().BeOfType(); + + json.Should().Contain("TypeWithDataStreams"); + json.Should().NotContain("RepeatingStringDataStream"); + + var (deserializedMessage, deserializedDataStreams) = sut.ReadMessage(json); + + // Assert + // Manually check each field of the deserializedMessage matches the request + deserializedMessage.Id.Should().Be(request.Id); + deserializedMessage.ActivityId.Should().Be(request.ActivityId); + deserializedMessage.Destination.BaseUri.Should().Be(request.Destination.BaseUri); + deserializedMessage.ServiceName.Should().Be(request.ServiceName); + deserializedMessage.MethodName.Should().Be(request.MethodName); + + // Check Params array structure (DataStreams are replaced with placeholders during serialization) + deserializedMessage.Params.Should().HaveCount(request.Params.Length); + deserializedMessage.Params[0].Should().Be(request.Params[0]); // First param is a simple string + // Note: Params[1] and Params[2] contain DataStreams which get replaced during serialization + + deserializedDataStreams.Count.Should().Be(2); + } + + public interface IHaveTypeWithDataStreamsService + { + public void Do(TypeWithDataStreams typeWithDataStreams); + } + public class TypeWithDataStreams + { + public TypeWithDataStreams(DataStream dataStream) + { + DataStream = dataStream; + } + + public DataStream DataStream { get; set; } + } + + + public class RepeatingStringDataStream : DataStream + { + string toRepeat; + int HowManyTimes; + + public RepeatingStringDataStream(string toRepeat, int howManyTimes) + : base(toRepeat.GetUTF8Bytes().Length * howManyTimes, WriteRepeatedStringsAsync(toRepeat, howManyTimes)) + { + this.toRepeat = toRepeat; + HowManyTimes = howManyTimes; + } + + static Func WriteRepeatedStringsAsync(string toRepeat, int howManyTimes) + { + return (async (stream, token) => + { + for (int i = 0; i < howManyTimes; i++) + { + await stream.WriteAsync(toRepeat.GetUTF8Bytes(), token); + } + }); + } + } + } +} +#endif \ No newline at end of file 
diff --git a/source/Halibut.Tests/Queue/Redis/NodeHeartBeat/NodeHeartBeatSenderFixture.cs b/source/Halibut.Tests/Queue/Redis/NodeHeartBeat/NodeHeartBeatSenderFixture.cs new file mode 100644 index 000000000..3c5ec5057 --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/NodeHeartBeat/NodeHeartBeatSenderFixture.cs @@ -0,0 +1,376 @@ +#if NET8_0_OR_GREATER +using System; +using System.Collections.Concurrent; +using System.Threading; +using System.Threading.Tasks; +using FluentAssertions; +using Halibut.Logging; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.NodeHeartBeat; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Tests.Builders; +using Halibut.Tests.Queue.Redis.Utils; +using Halibut.Tests.Support; +using Halibut.Tests.Support.Logging; +using Halibut.Tests.TestSetup.Redis; +using Nito.AsyncEx; +using NUnit.Framework; + +namespace Halibut.Tests.Queue.Redis.NodeHeartBeat +{ + [RedisTest] + public class NodeHeartBeatSenderFixture : BaseTest + { + [Test] + public async Task WhenCreated_ShouldStartSendingHeartbeats() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var requestActivityId = Guid.NewGuid(); + var log = new TestContextLogCreator("NodeHeartBeat", LogLevel.Trace).CreateNewForPrefix(""); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var anyHeartBeatReceived = new AsyncManualResetEvent(false); + + // Subscribe to heartbeats before creating the sender + await using var subscription = await redisTransport.SubscribeToNodeHeartBeatChannel( + endpoint, requestActivityId, HalibutQueueNodeSendingPulses.RequestProcessorNode, async () => + { + await Task.CompletedTask; + anyHeartBeatReceived.Set(); + }, CancellationToken); + + // Act + await using var heartBeatSender = new NodeHeartBeatSender(endpoint, requestActivityId, redisTransport, log, HalibutQueueNodeSendingPulses.RequestProcessorNode, defaultDelayBetweenPulses: 
TimeSpan.FromSeconds(1)); + + // Wait for a heart beat. + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(20), CancellationToken), anyHeartBeatReceived.WaitAsync()); + + // Assert + anyHeartBeatReceived.IsSet.Should().BeTrue("Should have received at least one heartbeat"); + } + + + [Test] + public async Task WhenHeartBeatsAreBeingSent_AndTheConnectionToRedisIsBrieflyDown_HeatBeatsShouldBeSentAgain() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var requestActivityId = Guid.NewGuid(); + var log = new TestContextLogCreator("NodeHeartBeat", LogLevel.Trace).CreateNewForPrefix(""); + var guid = Guid.NewGuid(); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var unstableRedisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + await using var stableRedisFacade = RedisFacadeBuilder.CreateRedisFacade(port: RedisTestHost.Port(), prefix: guid); + + var redisTransport = new HalibutRedisTransport(unstableRedisFacade); + + var heartbeatsReceived = new ConcurrentBag<DateTimeOffset>(); + var heartBeatReceivedEvent = new AsyncManualResetEvent(false); + + // Subscribe with stable connection to monitor heartbeats + await using var subscription = await new HalibutRedisTransport(stableRedisFacade) + .SubscribeToNodeHeartBeatChannel( + endpoint, requestActivityId, HalibutQueueNodeSendingPulses.RequestProcessorNode, async () => + { + await Task.CompletedTask; + heartBeatReceivedEvent.Set(); + heartbeatsReceived.Add(DateTimeOffset.Now); + }, CancellationToken); + + // Act + await using var heartBeatSender = new NodeHeartBeatSender(endpoint, requestActivityId, redisTransport, log, HalibutQueueNodeSendingPulses.RequestProcessorNode, TimeSpan.FromSeconds(1)); + + // Wait for initial heartbeat + await heartBeatReceivedEvent.WaitAsync(CancellationToken); + + // Interrupt connection + portForwarder.EnterKillNewAndExistingConnectionsMode(); + + // Outage is 4s + await Task.Delay(TimeSpan.FromSeconds(4), 
CancellationToken); + heartBeatReceivedEvent.Reset(); + + // Restore connection + portForwarder.ReturnToNormalMode(); + + // Assert + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(10)), heartBeatReceivedEvent.WaitAsync(CancellationToken)); + heartBeatReceivedEvent.IsSet.Should().BeTrue("Heart beats should be sent again after the interruption."); + } + + [Test] + public async Task WhenDisposed_ShouldStopSendingHeartbeats() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var requestActivityId = Guid.NewGuid(); + var log = new TestContextLogCreator("NodeHeartBeat", LogLevel.Trace).CreateNewForPrefix(""); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var heartbeatsReceived = new ConcurrentBag<DateTimeOffset>(); + var anyHeartBeatReceived = new AsyncManualResetEvent(false); + + await using var subscription = await redisTransport.SubscribeToNodeHeartBeatChannel( + endpoint, requestActivityId, HalibutQueueNodeSendingPulses.RequestProcessorNode, async () => + { + await Task.CompletedTask; + anyHeartBeatReceived.Set(); + heartbeatsReceived.Add(DateTimeOffset.Now); + }, CancellationToken); + + // Act + var heartBeatSender = new NodeHeartBeatSender(endpoint, requestActivityId, redisTransport, log, HalibutQueueNodeSendingPulses.RequestProcessorNode, defaultDelayBetweenPulses: TimeSpan.FromSeconds(1)); + + // Wait for some heartbeats + await anyHeartBeatReceived.WaitAsync(CancellationToken); + + // Dispose the sender + await heartBeatSender.DisposeAsync(); + + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(4)), heartBeatSender.TaskSendingPulses); + anyHeartBeatReceived.Reset(); + + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(5), CancellationToken), anyHeartBeatReceived.WaitAsync()); + + // Assert + anyHeartBeatReceived.IsSet.Should().BeFalse(); + heartBeatSender.TaskSendingPulses.IsCompleted.Should().BeTrue(); + } + + [Test] + public async Task 
WhenWatchingTheNodeProcessingTheRequestIsStillAlive_AndHeartbeatsStopBeingSent_ShouldReturnProcessingNodeIsLikelyDisconnected() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var requestActivityId = Guid.NewGuid(); + var log = new TestContextLogCreator("NodeHeartBeat", LogLevel.Trace).CreateNewForPrefix(""); + var guid = Guid.NewGuid(); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var unstableRedisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + await using var stableRedisFacade = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + var unstableRedisTransport = new HalibutRedisTransport(unstableRedisFacade); + var stableRedisTransport = new HalibutRedisTransport(stableRedisFacade); + + var request = new RequestMessageBuilder(endpoint.ToString()) + .WithActivityId(requestActivityId) + .Build(); + var pendingRequest = new RedisPendingRequest(request, log); + + // Start heartbeat sender + await using var heartBeatSender = new NodeHeartBeatSender( + endpoint, + requestActivityId, + unstableRedisTransport, + log, + HalibutQueueNodeSendingPulses.RequestProcessorNode, + defaultDelayBetweenPulses: TimeSpan.FromMilliseconds(200)); + + // Mark request as collected so watcher proceeds to monitoring phase + await pendingRequest.RequestHasBeenCollectedAndWillBeTransferred(); + + // Start the watcher + var watcherTask = NodeHeartBeatWatcher.WatchThatNodeProcessingTheRequestIsStillAlive( + endpoint, + request, + pendingRequest, + stableRedisTransport, + TimeSpan.FromSeconds(1), + log, + TimeSpan.FromSeconds(5), // Short timeout for test + CancellationToken); + + // Wait for initial heartbeats to establish baseline + await Task.Delay(TimeSpan.FromSeconds(3), CancellationToken); + watcherTask.IsCompleted.Should().BeFalse(); + + // Act - Kill the connection to stop heartbeats + portForwarder.EnterKillNewAndExistingConnectionsMode(); + + // Assert + await 
Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(10)), watcherTask); + watcherTask.IsCompleted.Should().BeTrue("Since it should have detected no heart beats have been sent for some time."); + var result = await watcherTask; + result.Should().Be(NodeWatcherResult.NodeMayHaveDisconnected); + } + + [Test] + public async Task WhenWatchingTheNodeProcessingTheRequestIsStillAlive_AndTheWatchersConnectionToRedisGoesDown_ShouldReturnProcessingNodeIsLikelyDisconnected() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var requestActivityId = Guid.NewGuid(); + var log = new TestContextLogCreator("NodeHeartBeat", LogLevel.Trace).CreateNewForPrefix(""); + var guid = Guid.NewGuid(); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var unstableRedisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + await using var stableRedisFacade = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + var unstableRedisTransport = new HalibutRedisTransport(unstableRedisFacade); + var stableRedisTransport = new HalibutRedisTransport(stableRedisFacade); + + var request = new RequestMessageBuilder(endpoint.ToString()) + .WithActivityId(requestActivityId) + .Build(); + var pendingRequest = new RedisPendingRequest(request, log); + + // Start heartbeat sender + await using var heartBeatSender = new NodeHeartBeatSender( + endpoint, + requestActivityId, + stableRedisTransport, + log, + HalibutQueueNodeSendingPulses.RequestProcessorNode, + defaultDelayBetweenPulses: TimeSpan.FromMilliseconds(200)); + + // Mark request as collected so watcher proceeds to monitoring phase + await pendingRequest.RequestHasBeenCollectedAndWillBeTransferred(); + + // Start the watcher + var watcherTask = NodeHeartBeatWatcher.WatchThatNodeProcessingTheRequestIsStillAlive( + endpoint, + request, + pendingRequest, + unstableRedisTransport, + timeBetweenCheckingIfRequestWasCollected: TimeSpan.FromSeconds(1), + log, + 
maxTimeBetweenHeartBeetsBeforeProcessingNodeIsAssumedToBeOffline: TimeSpan.FromSeconds(5), // Short timeout for test + CancellationToken); + + // Wait for initial heartbeats to establish baseline + await Task.Delay(TimeSpan.FromSeconds(3), CancellationToken); + watcherTask.IsCompleted.Should().BeFalse(); + + // Act - Kill the connection to stop heartbeats + portForwarder.EnterKillNewAndExistingConnectionsMode(); + + // Assert + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(20)), watcherTask); + watcherTask.IsCompleted.Should().BeTrue("Since it should have detected no heart beats have been sent for some time."); + var result = await watcherTask; + result.Should().Be(NodeWatcherResult.NodeMayHaveDisconnected); + } + + [Test] + public async Task WhenWatchingTheNodeProcessingTheRequestIsStillAlive_AndTheConnectionIsSuperStableAndWeStopWatching_WatcherShouldReturnNodeStayedConnected() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var requestActivityId = Guid.NewGuid(); + var log = new TestContextLogCreator("NodeHeartBeat", LogLevel.Trace).CreateNewForPrefix(""); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var request = new RequestMessageBuilder(endpoint.ToString()) + .WithActivityId(requestActivityId) + .Build(); + var pendingRequest = new RedisPendingRequest(request, log); + await pendingRequest.RequestHasBeenCollectedAndWillBeTransferred(); + + await using var heartBeatSender = new NodeHeartBeatSender(endpoint, requestActivityId, redisTransport, log, HalibutQueueNodeSendingPulses.RequestProcessorNode, TimeSpan.FromSeconds(1)); + + using var watcherCts = new CancellationTokenSource(); + var watcherTask = NodeHeartBeatWatcher.WatchThatNodeProcessingTheRequestIsStillAlive( + endpoint, + request, + pendingRequest, + redisTransport, + timeBetweenCheckingIfRequestWasCollected: TimeSpan.FromSeconds(1), + log, + 
maxTimeBetweenHeartBeetsBeforeProcessingNodeIsAssumedToBeOffline: TimeSpan.FromMinutes(1), + watcherCts.Token); + + await Task.Delay(100); + + // Act + await watcherCts.CancelAsync(); + + // Assert + var result = await watcherTask; + result.Should().Be(NodeWatcherResult.NoDisconnectSeen); + } + + [Test] + public async Task SenderAndReceiverNodeTypes_ShouldUseDistinctChannels() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var requestActivityId = Guid.NewGuid(); + var log = new TestContextLogCreator("NodeHeartBeat", LogLevel.Trace).CreateNewForPrefix(""); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var senderHeartbeatsReceived = new AsyncManualResetEvent(false); + var receiverHeartbeatsReceived = new AsyncManualResetEvent(false); + + // Subscribe to sender heartbeats + await using var senderSubscription = await redisTransport.SubscribeToNodeHeartBeatChannel( + endpoint, requestActivityId, HalibutQueueNodeSendingPulses.RequestSenderNode, async () => + { + await Task.CompletedTask; + senderHeartbeatsReceived.Set(); + }, CancellationToken); + + // Subscribe to receiver heartbeats + await using var receiverSubscription = await redisTransport.SubscribeToNodeHeartBeatChannel( + endpoint, requestActivityId, HalibutQueueNodeSendingPulses.RequestProcessorNode, async () => + { + await Task.CompletedTask; + receiverHeartbeatsReceived.Set(); + }, CancellationToken); + + // Act - Create sender node heartbeat sender + await using var senderHeartBeatSender = new NodeHeartBeatSender(endpoint, requestActivityId, redisTransport, log, HalibutQueueNodeSendingPulses.RequestSenderNode, defaultDelayBetweenPulses: TimeSpan.FromSeconds(1)); + + // Wait for sender heartbeat + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(5), CancellationToken), senderHeartbeatsReceived.WaitAsync()); + + // Create receiver node heartbeat sender + await using var 
receiverHeartBeatSender = new NodeHeartBeatSender(endpoint, requestActivityId, redisTransport, log, HalibutQueueNodeSendingPulses.RequestProcessorNode, defaultDelayBetweenPulses: TimeSpan.FromSeconds(1)); + + // Wait for receiver heartbeat + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(5), CancellationToken), receiverHeartbeatsReceived.WaitAsync()); + + // Assert + senderHeartbeatsReceived.IsSet.Should().BeTrue("Should have received sender heartbeat"); + receiverHeartbeatsReceived.IsSet.Should().BeTrue("Should have received receiver heartbeat"); + } + + [Test] + public async Task SenderNodeHeartbeats_ShouldNotBeReceivedByReceiverSubscription() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var requestActivityId = Guid.NewGuid(); + var log = new TestContextLogCreator("NodeHeartBeat", LogLevel.Trace).CreateNewForPrefix(""); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var receiverHeartbeatsReceived = new AsyncManualResetEvent(false); + + // Subscribe only to receiver heartbeats + await using var receiverSubscription = await redisTransport.SubscribeToNodeHeartBeatChannel( + endpoint, requestActivityId, HalibutQueueNodeSendingPulses.RequestProcessorNode, async () => + { + await Task.CompletedTask; + receiverHeartbeatsReceived.Set(); + }, CancellationToken); + + // Act - Create sender node heartbeat sender (should not trigger receiver subscription) + await using var senderHeartBeatSender = new NodeHeartBeatSender(endpoint, requestActivityId, redisTransport, log, HalibutQueueNodeSendingPulses.RequestSenderNode, defaultDelayBetweenPulses: TimeSpan.FromSeconds(1)); + + // Wait to see if receiver subscription gets triggered (it shouldn't) + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(3), CancellationToken), receiverHeartbeatsReceived.WaitAsync()); + + // Assert + receiverHeartbeatsReceived.IsSet.Should().BeFalse("Should not have received 
sender heartbeat on receiver subscription"); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/RedisDataLoseDetection/WatchForRedisLosingAllItsDataFixture.cs b/source/Halibut.Tests/Queue/Redis/RedisDataLoseDetection/WatchForRedisLosingAllItsDataFixture.cs new file mode 100644 index 000000000..011cfb64e --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/RedisDataLoseDetection/WatchForRedisLosingAllItsDataFixture.cs @@ -0,0 +1,100 @@ +#if NET8_0_OR_GREATER +using System; +using System.Threading.Tasks; +using FluentAssertions; +using Halibut.Queue.Redis.RedisDataLossDetection; +using Halibut.Tests.Queue.Redis.Utils; +using Halibut.Tests.Support; +using NUnit.Framework; + +namespace Halibut.Tests.Queue.Redis.RedisDataLoseDetection +{ + [RedisTest] + public class WatchForRedisLosingAllItsDataFixture : BaseTest + { + [Test] + public async Task WhenTheConnectionToRedisCanNotBeCreated_WhenAskingForALostDataCancellationToken_ATimeoutOccurs() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + portForwarder.EnterKillNewAndExistingConnectionsMode(); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder, null); + await using var watcher = new WatchForRedisLosingAllItsData(redisFacade, HalibutLog, watchInterval:TimeSpan.FromSeconds(1)); + + + await AssertException.Throws(watcher.GetTokenForDataLossDetection(TimeSpan.FromSeconds(1), CancellationToken)); + } + + [Test] + public async Task WhenTheConnectionToRedisIsInitiallyDown_WhenAskingForALostDataCancellationToken_AndTheConnectionToRedisReturns_TheCancellationTokenIsReturned() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + portForwarder.EnterKillNewAndExistingConnectionsMode(); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder, null); + await using var watcher = new WatchForRedisLosingAllItsData(redisFacade, HalibutLog, 
watchInterval:TimeSpan.FromSeconds(1)); + + var _ = Task.Run(async () => + { + await Task.Delay(2000); + portForwarder.ReturnToNormalMode(); + + }); + + await watcher.GetTokenForDataLossDetection(TimeSpan.FromSeconds(20), CancellationToken); + } + + [Test] + public async Task WhenTheWatcherWatchesRedisForMoreThanTheKeyTTL_NoDataLoseShouldBeDetected() + { + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + await using var watcher = new WatchForRedisLosingAllItsData(redisFacade, HalibutLog, watchInterval:TimeSpan.FromMilliseconds(100), keyTTL: TimeSpan.FromSeconds(2)); + var watcherCt = await watcher.GetTokenForDataLossDetection(TimeSpan.FromSeconds(20), CancellationToken); + + await Task.Delay(TimeSpan.FromSeconds(4)); + watcherCt.IsCancellationRequested.Should().BeFalse(); + } + + [Test] + public async Task WhenWatchingRedisForDataLose_AndRedisLosesAllDaya_DataLoseIsDetected() + { + // Arrange - Create Redis container using the builder + Logger.Information("Creating Redis container"); + await using var container = new RedisContainerBuilder().Build(); + + Logger.Information("Starting Redis container"); + await container.StartAsync(); + Logger.Information("Redis container started successfully with connection string: {ConnectionString}", container.ConnectionString); + + // Create RedisFacade connected to the containerized Redis + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(host: "localhost", container.RedisPort); + + await using var watcher = new WatchForRedisLosingAllItsData(redisFacade, HalibutLog, watchInterval: TimeSpan.FromSeconds(1)); + + Logger.Information("Getting initial cancellation token for data loss detection (20 second timeout)"); + var watcherCT = await watcher.GetTokenForDataLossDetection(TimeSpan.FromSeconds(20), CancellationToken); + Logger.Information("Initial cancellation token obtained, IsCancellationRequested: {IsCancellationRequested}", watcherCT.IsCancellationRequested); + + 
watcherCT.IsCancellationRequested.Should().BeFalse(); + + // Act + Logger.Information("Stopping Redis container to simulate data loss"); + await container.StopAsync(); + Logger.Information("Redis container stopped"); + + Logger.Information("Starting Redis container again (fresh instance, data lost)"); + await container.StartAsync(); + Logger.Information("Redis container restarted"); + + // Assert + Logger.Information("Waiting up to 10 seconds for data loss detection to trigger cancellation token"); + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(10), watcherCT)); + + watcherCT.IsCancellationRequested.Should().BeTrue("Should have detected the data loss"); + + Logger.Information("Getting new cancellation token to verify recovery"); + var nextToken = await watcher.GetTokenForDataLossDetection(TimeSpan.FromSeconds(20), CancellationToken); + + nextToken.IsCancellationRequested.Should().BeFalse("The new token should have no data loss"); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/RedisHelpers/RedisFacadeFixture.cs b/source/Halibut.Tests/Queue/Redis/RedisHelpers/RedisFacadeFixture.cs new file mode 100644 index 000000000..70d03cd17 --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/RedisHelpers/RedisFacadeFixture.cs @@ -0,0 +1,631 @@ +#if NET8_0_OR_GREATER +using System; +using System.Collections.Generic; +using System.Linq; +using System.Threading.Tasks; +using FluentAssertions; +using Halibut.Logging; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Tests.Queue.Redis.Utils; +using Halibut.Tests.Support; +using Halibut.Tests.Support.Logging; +using Halibut.Util.AsyncEx; +using Nito.AsyncEx; +using NUnit.Framework; + +namespace Halibut.Tests.Queue.Redis.RedisHelpers +{ + [RedisTest] + public class RedisFacadeFixture : BaseTest + { + [Test] + public async Task SetString_AndGetString_ShouldStoreAndRetrieveValue() + { + // Arrange + await using var redisFacade = 
RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var value = "test-value"; + + // Act + await redisFacade.SetString(key, value, TimeSpan.FromMinutes(1), CancellationToken); + var retrievedValue = await redisFacade.GetString(key, CancellationToken); + + // Assert + retrievedValue.Should().Be(value); + } + + [Test] + public async Task GetString_WithNonExistentKey_ShouldReturnNull() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var nonExistentKey = Guid.NewGuid().ToString(); + + // Act + var retrievedValue = await redisFacade.GetString(nonExistentKey, CancellationToken); + + // Assert + retrievedValue.Should().BeNull(); + } + + [Test] + public async Task SetInHash_ShouldStoreValueInHash() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var field = "test-field"; + var payload = "test-payload"; + var values = new Dictionary<string, string> { { field, payload } }; + + // Act + await redisFacade.SetInHash(key, values, TimeSpan.FromMinutes(1), CancellationToken); + + // Assert - We'll verify by trying to get and delete it + var retrievedValues = await redisFacade.TryGetAndDeleteFromHash(key, new[] { field }, CancellationToken); + retrievedValues.Should().NotBeNull(); + retrievedValues![field].Should().Be(payload); + } + + [Test] + public async Task TryGetAndDeleteFromHash_WithExistingValue_ShouldReturnValueAndDelete() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var field = "test-field"; + var payload = "test-payload"; + var values = new Dictionary<string, string> { { field, payload } }; + + await redisFacade.SetInHash(key, values, TimeSpan.FromMinutes(1), CancellationToken); + + // Act + var retrievedValues = await redisFacade.TryGetAndDeleteFromHash(key, new[] { field }, CancellationToken); + + // Assert + retrievedValues.Should().NotBeNull(); + 
retrievedValues![field].Should().Be(payload); + } + + [Test] + public async Task HashContainsKey_WithExistingField_ShouldReturnTrue() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var field = "test-field"; + var payload = "test-payload"; + var values = new Dictionary<string, string> { { field, payload } }; + + await redisFacade.SetInHash(key, values, TimeSpan.FromMinutes(1), CancellationToken); + + // Act + var exists = await redisFacade.HashContainsKey(key, field, CancellationToken); + + // Assert + exists.Should().BeTrue(); + } + + [Test] + public async Task HashContainsKey_WithNonExistentField_ShouldReturnFalse() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var nonExistentField = "non-existent-field"; + + // Act + var exists = await redisFacade.HashContainsKey(key, nonExistentField, CancellationToken); + + // Assert + exists.Should().BeFalse(); + } + + [Test] + public async Task HashContainsKey_WithNonExistentKey_ShouldReturnFalse() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var nonExistentKey = Guid.NewGuid().ToString(); + var field = "test-field"; + + // Act + var exists = await redisFacade.HashContainsKey(nonExistentKey, field, CancellationToken); + + // Assert + exists.Should().BeFalse(); + } + + [Test] + public async Task TryGetAndDeleteFromHash_ShouldDeleteTheEntireKey() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var field = "test-field"; + var payload = "test-payload"; + var values = new Dictionary<string, string> { { field, payload } }; + + await redisFacade.SetInHash(key, values, TimeSpan.FromMinutes(1), CancellationToken); + + // Verify the hash field exists + var existsBefore = await redisFacade.HashContainsKey(key, field, CancellationToken); + existsBefore.Should().BeTrue(); + + // 
Act + var retrievedValues = await redisFacade.TryGetAndDeleteFromHash(key, new[] { field }, CancellationToken); + + // Assert + retrievedValues.Should().NotBeNull(); + retrievedValues![field].Should().Be(payload); + + // Verify the entire key was deleted (not just the field) + var existsAfter = await redisFacade.HashContainsKey(key, field, CancellationToken); + existsAfter.Should().BeFalse(); + + // Verify trying to get it again returns null + var secondRetrieval = await redisFacade.TryGetAndDeleteFromHash(key, new[] { field }, CancellationToken); + secondRetrieval.Should().BeNull(); + } + + [Test] + public async Task ListRightPushAsync_AndListLeftPopAsync_ShouldWorkAsQueue() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var payload1 = "first-item"; + var payload2 = "second-item"; + + // Act - Push items to the right + await redisFacade.ListRightPushAsync(key, payload1, TimeSpan.FromMinutes(1), CancellationToken); + await redisFacade.ListRightPushAsync(key, payload2, TimeSpan.FromMinutes(1), CancellationToken); + + // Pop items from the left (FIFO) + var firstItem = await redisFacade.ListLeftPopAsync(key, CancellationToken); + var secondItem = await redisFacade.ListLeftPopAsync(key, CancellationToken); + + // Assert + firstItem.Should().Be(payload1); + secondItem.Should().Be(payload2); + } + + [Test] + public async Task ListLeftPopAsync_WithEmptyList_ShouldReturnNull() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var emptyListKey = Guid.NewGuid().ToString(); + + // Act + var result = await redisFacade.ListLeftPopAsync(emptyListKey, CancellationToken); + + // Assert + result.Should().BeNull(); + } + + [Test] + public async Task PublishToChannel_AndSubscribeToChannel_ShouldDeliverMessage() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var channelName = Guid.NewGuid().ToString(); + var 
testMessage = "test-message"; + var receivedMessages = new List<string>(); + var messageReceived = new TaskCompletionSource<bool>(); + + // Subscribe to the channel + await using var subscription = await redisFacade.SubscribeToChannel(channelName, async message => + { + await Task.Yield(); // Make it properly async + if (!message.Message.IsNull) + { + receivedMessages.Add(message.Message!); + messageReceived.SetResult(true); + } + }, CancellationToken); + + // Act - Publish a message + await redisFacade.PublishToChannel(channelName, testMessage, CancellationToken); + + // Wait for the message to be received + await messageReceived.Task.TimeoutAfter(TimeSpan.FromSeconds(5), CancellationToken); + + // Assert + receivedMessages.Should().HaveCount(1); + receivedMessages[0].Should().Be(testMessage); + } + + [Test] + public async Task PublishToChannel_WithMultipleMessages_ShouldDeliverAllMessages() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var channelName = Guid.NewGuid().ToString(); + var messages = new[] { "message1", "message2", "message3" }; + var receivedMessages = new List<string>(); + var allMessagesReceived = new TaskCompletionSource<bool>(); + + // Subscribe to the channel + await using var subscription = await redisFacade.SubscribeToChannel(channelName, async message => + { + await Task.Yield(); + if (!message.Message.IsNull) + { + receivedMessages.Add(message.Message!); + if (receivedMessages.Count == messages.Length) + { + allMessagesReceived.SetResult(true); + } + } + }, CancellationToken); + + // Act - Publish multiple messages + foreach (var msg in messages) + { + await redisFacade.PublishToChannel(channelName, msg, CancellationToken); + } + + // Wait for all messages to be received + await allMessagesReceived.Task.TimeoutAfter(TimeSpan.FromSeconds(5), CancellationToken); + + // Assert + receivedMessages.Should().HaveCount(3); + receivedMessages.Should().Contain(messages); + } + + [Test] + public async Task 
SubscribeToChannel_WhenDisposed_ShouldUnsubscribe() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var channelName = Guid.NewGuid().ToString(); + var receivedMessages = new List<string>(); + + var subscription = await redisFacade.SubscribeToChannel(channelName, async message => + { + await Task.Yield(); + if (!message.Message.IsNull) + { + receivedMessages.Add(message.Message!); + } + }, CancellationToken); + + // Act - Dispose the subscription + await subscription.DisposeAsync(); + + // Publish a message after unsubscribing + await redisFacade.PublishToChannel(channelName, "should-not-receive", CancellationToken); + + // Wait a bit to ensure no message is received + await Task.Delay(100); + + // Assert + receivedMessages.Should().BeEmpty(); + } + + [Test] + public async Task KeyPrefixing_ShouldIsolateDataBetweenDifferentPrefixes() + { + // Arrange + var prefix1 = Guid.NewGuid(); + var prefix2 = Guid.NewGuid(); + + await using var redisFacade1 = RedisFacadeBuilder.CreateRedisFacade(prefix: prefix1); + await using var redisFacade2 = RedisFacadeBuilder.CreateRedisFacade(prefix: prefix2); + + var key = "shared-key"; + var value1 = "value-from-facade1"; + var value2 = "value-from-facade2"; + + // Act - Set values with the same key but different prefixes + await redisFacade1.SetString(key, value1, TimeSpan.FromMinutes(1), CancellationToken); + await redisFacade2.SetString(key, value2, TimeSpan.FromMinutes(1), CancellationToken); + + // Get values using both facades + var retrievedValue1 = await redisFacade1.GetString(key, CancellationToken); + var retrievedValue2 = await redisFacade2.GetString(key, CancellationToken); + + // Assert - Each facade should retrieve its own value + retrievedValue1.Should().Be(value1); + retrievedValue2.Should().Be(value2); + } + + [Test] + public async Task SetInHash_WithTTL_ShouldExpireAfterSpecifiedTime() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = 
Guid.NewGuid().ToString(); + var field = "test-field"; + var payload = "test-payload"; + var values = new Dictionary<string, string> { { field, payload } }; + + // Act - Set a value in hash with a TTL long enough to verify that it exists + await redisFacade.SetInHash(key, values, TimeSpan.FromMinutes(3), CancellationToken); + + // Immediately verify it exists + var immediateExists = await redisFacade.HashContainsKey(key, field, CancellationToken); + immediateExists.Should().BeTrue(); + + // Also verify we can retrieve the value immediately + var immediateValues = await redisFacade.TryGetAndDeleteFromHash(key, new[] { field }, CancellationToken); + immediateValues.Should().NotBeNull(); + immediateValues![field].Should().Be(payload); + + // Set the value again with a very short TTL to test expiration (since TryGetAndDeleteFromHash removes it) + await redisFacade.SetInHash(key, values, TimeSpan.FromMilliseconds(3), CancellationToken); + + // Assert - Should eventually expire + await ShouldEventually.Eventually(async () => + { + var exists = await redisFacade.HashContainsKey(key, field, CancellationToken); + exists.Should().BeFalse("the hash key should expire after TTL"); + }, TimeSpan.FromSeconds(5), CancellationToken); + + // Verify TryGetAndDeleteFromHash also returns null for expired key + var expiredValues = await redisFacade.TryGetAndDeleteFromHash(key, new[] { field }, CancellationToken); + expiredValues.Should().BeNull(); + } + + [Test] + public async Task DeleteString_WithExistingKey_ShouldReturnTrueAndDeleteValue() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var value = "test-value"; + + // Set a value first + await redisFacade.SetString(key, value, TimeSpan.FromMinutes(1), CancellationToken); + + // Verify it exists + var existingValue = await redisFacade.GetString(key, CancellationToken); + existingValue.Should().Be(value); + + // Act + var deleteResult = await redisFacade.DeleteString(key, CancellationToken); + + // 
Assert + deleteResult.Should().BeTrue(); + + // Verify the value is gone + var deletedValue = await redisFacade.GetString(key, CancellationToken); + deletedValue.Should().BeNull(); + } + + [Test] + public async Task DeleteString_WithNonExistentKey_ShouldReturnFalse() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var nonExistentKey = Guid.NewGuid().ToString(); + + // Act + var deleteResult = await redisFacade.DeleteString(nonExistentKey, CancellationToken); + + // Assert + deleteResult.Should().BeFalse(); + } + + [Test] + public async Task SetTtlForString_WithExistingKey_ShouldUpdateTTL() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var value = "test-value"; + + // Set a value first with a long TTL + await redisFacade.SetString(key, value, TimeSpan.FromHours(1), CancellationToken); + + // Verify it exists + var existingValue = await redisFacade.GetString(key, CancellationToken); + existingValue.Should().Be(value); + + // Act - Update TTL to a shorter time + await redisFacade.SetTtlForString(key, TimeSpan.FromMinutes(1), CancellationToken); + + // Assert - Value should still exist immediately after TTL update + var valueAfterTtlUpdate = await redisFacade.GetString(key, CancellationToken); + valueAfterTtlUpdate.Should().Be(value); + + // Note: We can't easily test the actual TTL expiration in a unit test + // without waiting, but we've verified the operation completes successfully + } + + [Test] + public async Task SetString_WithShortTTL_ShouldExpire() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var value = "test-value"; + + // Act - Set with very short TTL + await redisFacade.SetString(key, value, TimeSpan.FromMilliseconds(3), CancellationToken); + + // Assert - Should eventually expire + await ShouldEventually.Eventually(async () => + { + var 
expiredValue = await redisFacade.GetString(key, CancellationToken); + expiredValue.Should().BeNull("the string should expire after TTL"); + }, TimeSpan.FromSeconds(5), CancellationToken); + } + + [Test] + public async Task ListRightPushAsync_WithShortTTL_ShouldExpire() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var payload = "test-payload"; + + // Act - Push with a TTL long enough to verify that the item exists + await redisFacade.ListRightPushAsync(key, payload, TimeSpan.FromSeconds(30), CancellationToken); + + // Immediately verify it exists + var immediateValue = await redisFacade.ListLeftPopAsync(key, CancellationToken); + immediateValue.Should().Be(payload); + + // Push another item with a very short TTL and test expiration + await redisFacade.ListRightPushAsync(key, payload, TimeSpan.FromMilliseconds(3), CancellationToken); + + // Assert - Should eventually expire + await ShouldEventually.Eventually(async () => + { + var listValue = await redisFacade.ListLeftPopAsync(key, CancellationToken); + listValue.Should().BeNull("the list should expire after TTL"); + }, TimeSpan.FromSeconds(5), CancellationToken); + } + + [Test] + public void IsConnected_WhenNotInitialized_ShouldReturnFalse() + { + // Arrange + var redisFacade = new RedisFacade("localhost", "test-prefix", new TestContextLogCreator("Redis", LogLevel.Trace).CreateNewForPrefix("")); + + // Act & Assert + redisFacade.IsConnected.Should().BeFalse(); + } + + [Test] + public async Task IsConnected_AfterSuccessfulOperation_ShouldReturnTrue() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + + // Act - Perform an operation to initialize connection + await redisFacade.SetString(Guid.NewGuid().ToString(), "test", TimeSpan.FromMinutes(1), CancellationToken); + + // Assert + redisFacade.IsConnected.Should().BeTrue(); + } + + [Test] + public async Task TotalSubscribers_ShouldTrackActiveSubscriptions() + { + // Arrange + await using var 
redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var channelName = Guid.NewGuid().ToString(); + + // Act & Assert - Initially no subscribers + redisFacade.TotalSubscribers.Should().Be(0); + + // Subscribe to channels + await using var subscription1 = await redisFacade.SubscribeToChannel(channelName + "1", _ => Task.CompletedTask, CancellationToken); + redisFacade.TotalSubscribers.Should().Be(1); + + await using var subscription2 = await redisFacade.SubscribeToChannel(channelName + "2", _ => Task.CompletedTask, CancellationToken); + redisFacade.TotalSubscribers.Should().Be(2); + + // Dispose one subscription + await subscription1.DisposeAsync(); + redisFacade.TotalSubscribers.Should().Be(1); + + // Dispose second subscription + await subscription2.DisposeAsync(); + redisFacade.TotalSubscribers.Should().Be(0); + } + + [Test] + public async Task MultipleSetString_WithDifferentTTLs_ShouldRespectIndividualTTLs() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key1 = Guid.NewGuid().ToString(); + var key2 = Guid.NewGuid().ToString(); + var value1 = "value1"; + var value2 = "value2"; + + // Act - Set with different TTLs + await redisFacade.SetString(key1, value1, TimeSpan.FromMilliseconds(3), CancellationToken); // Short TTL + await redisFacade.SetString(key2, value2, TimeSpan.FromMinutes(1), CancellationToken); // Long TTL + + // Assert - First should eventually expire, second should still exist + await ShouldEventually.Eventually(async () => + { + var expiredValue1 = await redisFacade.GetString(key1, CancellationToken); + expiredValue1.Should().BeNull("the first string should expire after short TTL"); + }, TimeSpan.FromSeconds(5), CancellationToken); + + // Verify the second key still exists after the first expires + var stillExists2 = await redisFacade.GetString(key2, CancellationToken); + stillExists2.Should().Be(value2); + } + + [Test] + public async Task 
TryGetAndDeleteFromHash_WithConcurrentCalls_ShouldReturnValueToExactlyOneCall() + { + // Arrange + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var key = Guid.NewGuid().ToString(); + var field = "test-field"; + var payload = "test-payload"; + var values = new Dictionary<string, string> { { field, payload } }; + const int concurrentCallCount = 20; + + // Set a value in the hash + await redisFacade.SetInHash(key, values, TimeSpan.FromMinutes(1), CancellationToken); + + var countDownLatch = new AsyncCountdownEvent(concurrentCallCount); + + // Act - Make multiple concurrent calls to TryGetAndDeleteFromHash + var concurrentTasks = new Task<Dictionary<string, string?>?>[concurrentCallCount]; + for (int i = 0; i < concurrentCallCount; i++) + { + concurrentTasks[i] = Task.Run(async () => + { + countDownLatch.Signal(); + await countDownLatch.WaitAsync(); + return await redisFacade.TryGetAndDeleteFromHash(key, new[] { field }, CancellationToken); + }); + } + + var results = await Task.WhenAll(concurrentTasks); + + // Assert - Exactly one call should get the payload, all others should get null + var nonNullResults = results.Where(result => result != null).ToArray(); + var nullResults = results.Where(result => result == null).ToArray(); + + nonNullResults.Should().HaveCount(1, "exactly one concurrent call should retrieve the value"); + nonNullResults[0]![field].Should().Be(payload, "the successful call should return the correct payload"); + nullResults.Should().HaveCount(concurrentCallCount - 1, "all other concurrent calls should return null"); + + // Verify the hash key no longer exists + var existsAfterConcurrentCalls = await redisFacade.HashContainsKey(key, field, CancellationToken); + existsAfterConcurrentCalls.Should().BeFalse("the hash key should be deleted after the successful TryGetAndDeleteFromHash call"); + } + + [Test] + public async Task DisposeAsync_ShouldCleanupResourcesAndNotThrow() + { + // Arrange + var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + + // Perform some 
operations to initialize resources + await redisFacade.SetString(Guid.NewGuid().ToString(), "test", TimeSpan.FromMinutes(1), CancellationToken); + await using var subscription = await redisFacade.SubscribeToChannel(Guid.NewGuid().ToString(), _ => Task.CompletedTask, CancellationToken); + + // Act & Assert - Dispose should not throw + Func<Task> disposeAction = async () => await redisFacade.DisposeAsync(); + await disposeAction.Should().NotThrowAsync(); + } + + [Test] + public async Task DisposeAsync_CalledMultipleTimes_ShouldNotThrow() + { + // Arrange + var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + await redisFacade.SetString(Guid.NewGuid().ToString(), "test", TimeSpan.FromMinutes(1), CancellationToken); + + // Act & Assert - Multiple dispose calls should not throw + await redisFacade.DisposeAsync(); + + Func<Task> secondDisposeAction = async () => await redisFacade.DisposeAsync(); + await secondDisposeAction.Should().NotThrowAsync(); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/RedisHelpers/RedisFacadeWhenRedisGoesDownAwayTests.cs b/source/Halibut.Tests/Queue/Redis/RedisHelpers/RedisFacadeWhenRedisGoesDownAwayTests.cs new file mode 100644 index 000000000..9a4c1fb0f --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/RedisHelpers/RedisFacadeWhenRedisGoesDownAwayTests.cs @@ -0,0 +1,320 @@ +#if NET8_0_OR_GREATER +using System; +using System.Collections.Concurrent; +using System.Collections.Generic; +using System.Threading.Tasks; +using FluentAssertions; +using Halibut.Tests.Queue.Redis.Utils; +using Halibut.Tests.Support; +using NUnit.Framework; + +namespace Halibut.Tests.Queue.Redis.RedisHelpers +{ + [RedisTest] + public class RedisFacadeWhenRedisGoesDownAwayTests : BaseTest + { + [Test] + public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanEventuallyInteractWithRedisAgain() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + await using var redisFacade 
= RedisFacadeBuilder.CreateRedisFacade(portForwarder); + + await redisFacade.SetString("foo", "bar", TimeSpan.FromMinutes(1), CancellationToken); + + (await redisFacade.GetString("foo", CancellationToken)).Should().Be("bar"); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); + + await redisFacade.GetString("foo", CancellationToken); + } + + [Test] + public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanPublishToAChannel() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + var guid = Guid.NewGuid(); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + + await using var redisFacadeReliable = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + var receivedMessages = new ConcurrentBag<string>(); + await using var subscription = await redisFacadeReliable.SubscribeToChannel("test-channel", async message => + { + await Task.CompletedTask; + receivedMessages.Add(message.Message!); + }, CancellationToken); + + // Establish connection first + await redisFacade.SetString("connection", "established", TimeSpan.FromMinutes(1), CancellationToken); + + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); + + // Assert + await redisFacade.PublishToChannel("test-channel", "test-message", CancellationToken); + + // Check that publish actually happened. 
+ await ShouldEventually.Eventually(() => receivedMessages.Should().Contain("test-message"), TimeSpan.FromSeconds(10), CancellationToken); + } + + [Test] + public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanImmediatelySetInHash() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder); + + // Establish connection first + await redisFacade.SetString("connection", "established", TimeSpan.FromMinutes(1), CancellationToken); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); + + // Assert + await redisFacade.SetInHash("test-hash", new Dictionary<string, string>(){{"test-field", "test-value"}}, TimeSpan.FromMinutes(1), CancellationToken); + + // Check that the value was set. + var retrievedValue = await redisFacade.TryGetAndDeleteFromHash("test-hash", new []{"test-field"}, CancellationToken); + retrievedValue.Should().NotBeNull(); + retrievedValue.Should().ContainKey("test-field"); + retrievedValue!["test-field"].Should().Be("test-value"); + } + + [Test] + public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanImmediatelyTryGetAndDeleteFromHash() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder); + + // Establish connection and set up test data + await redisFacade.SetInHash("test-hash", new Dictionary<string, string>(){{"test-field", "test-value"}}, TimeSpan.FromMinutes(1), CancellationToken); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); + + var retrievedValue = await redisFacade.TryGetAndDeleteFromHash("test-hash", new []{"test-field"}, CancellationToken); + retrievedValue.Should().NotBeNull(); + retrievedValue.Should().ContainKey("test-field"); + retrievedValue!["test-field"].Should().Be("test-value"); + } + + [Test] + 
public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanImmediatelyListRightPush() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder); + + // Establish connection first + await redisFacade.SetString("connection", "established", TimeSpan.FromMinutes(1), CancellationToken); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); + + await redisFacade.ListRightPushAsync("test-list", "test-item", TimeSpan.FromMinutes(1), CancellationToken); + + // Check we actually added something to the queue. + var poppedValue = await redisFacade.ListLeftPopAsync("test-list", CancellationToken); + poppedValue.Should().Be("test-item"); + } + + [Test] + public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanImmediatelyListLeftPop() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder); + + // Establish connection and set up test data + await redisFacade.ListRightPushAsync("test-list", "test-item", TimeSpan.FromMinutes(1), CancellationToken); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); + + var result = await redisFacade.ListLeftPopAsync("test-list", CancellationToken); + result.Should().Be("test-item"); + } + + [Test] + public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanImmediatelySetString() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder); + + // Establish connection first + await redisFacade.SetString("connection", "established", TimeSpan.FromMinutes(1), CancellationToken); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); 
+ + await redisFacade.SetString("test-key", "test-value", TimeSpan.FromMinutes(1), CancellationToken); + + // Verify we can read back the string + var retrievedValue = await redisFacade.GetString("test-key", CancellationToken); + retrievedValue.Should().Be("test-value"); + } + + [Test] + public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanImmediatelyGetString() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder); + + // Establish connection and set up test data + await redisFacade.SetString("test-key", "test-value", TimeSpan.FromMinutes(1), CancellationToken); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); + + var result = await redisFacade.GetString("test-key", CancellationToken); + result.Should().Be("test-value"); + } + + [Test] + public async Task WhenTheEstablishedConnectionToRedisBrieflyGoesDown_WeCanImmediatelyHashContainsKey() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder); + + // Establish connection and set up test data + await redisFacade.SetInHash("test-hash", new Dictionary<string, string>(){{"test-field", "test-value"}}, TimeSpan.FromMinutes(1), CancellationToken); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + portForwarder.ReturnToNormalMode(); + + var exists = await redisFacade.HashContainsKey("test-hash", "test-field", CancellationToken); + exists.Should().BeTrue(); + } + + [Test] + public async Task WhenTheConnectionToRedisHasBeenEstablished_AndIsLaterTerminated_AndThenWeTryToSubscribe_WhenTheConnectionIsRestored_WeCanReceiveMessages() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + var guid = Guid.NewGuid(); + await using var redisViaPortForwarder = 
RedisFacadeBuilder.CreateRedisFacade(portForwarder: portForwarder, prefix: guid); + await using var redisStableConnection = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + await redisStableConnection.PublishToChannel("bob", "establishing connection to redis", CancellationToken); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + + var msgs = new ConcurrentBag<string>(); + using var subscribeToChannelTask = redisViaPortForwarder.SubscribeToChannel("bob", async message => + { + await Task.CompletedTask; + msgs.Add(message.Message!); + }, CancellationToken); + + // Give everything enough time to have a crack at trying to subscribe to messages. + await Task.Delay(2000); + await redisStableConnection.PublishToChannel("bob", "MISSED", CancellationToken); + + // Just in case the subscriber reconnects faster than redis publishes the MISSED message. + await Task.Delay(2000); + + portForwarder.ReturnToNormalMode(); + + // Keep going around the loop until we receive something + while (msgs.Count == 0) + { + Logger.Information("Trying again"); + await redisStableConnection.PublishToChannel("bob", "RECONNECT", CancellationToken); + await Task.Delay(1000); + } + + msgs.Should().Contain("RECONNECT", "Since the subscriber should eventually connect back up"); + msgs.Should().NotContain("MISSED", "Since this was sent when the subscriber could not have been connected. 
" + + "If this is seen maybe the test itself has a bug."); + } + + [Test] + public async Task WhenTheConnectionIsNeverEstablished_AndWeTryToSubscribe_WhenTheConnectionIsRestored_WeCanReceiveMessages() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + var guid = Guid.NewGuid(); + await using var redisViaPortForwarder = RedisFacadeBuilder.CreateRedisFacade(portForwarder: portForwarder, prefix: guid); + await using var redisStableConnection = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + + var msgs = new ConcurrentBag<string>(); + using var subscribeToChannelTask = redisViaPortForwarder.SubscribeToChannel("bob", async message => + { + await Task.CompletedTask; + msgs.Add(message.Message!); + }, CancellationToken); + + // Give everything enough time to have a crack at trying to subscribe to messages. + await Task.Delay(2000); + await redisStableConnection.PublishToChannel("bob", "MISSED", CancellationToken); + + // Just in case the subscriber reconnects faster than the publish call. + await Task.Delay(2000); + + portForwarder.ReturnToNormalMode(); + + // Keep going around the loop until we receive something + while (msgs.Count == 0) + { + Logger.Information("Trying again"); + await redisStableConnection.PublishToChannel("bob", "RECONNECT", CancellationToken); + await Task.Delay(1000); + } + + msgs.Should().Contain("RECONNECT", "Since the subscriber should eventually connect up"); + msgs.Should().NotContain("MISSED", "Since this was sent when the subscriber could not have been connected. 
" + + "If this is seen maybe the test itself has a bug."); + } + + [Test] + public async Task WhenSubscribedAndTheConnectionGoesDown_WhenTheConnectionIsRestored_MessagesCanEventuallyBeSentToTheSubscriberAgain() + { + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + + var guid = Guid.NewGuid(); + await using var redisViaPortForwarder = RedisFacadeBuilder.CreateRedisFacade(portForwarder: portForwarder, prefix: guid); + await using var redisStableConnection = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + var msgs = new ConcurrentBag<string>(); + await using var channel = await redisViaPortForwarder.SubscribeToChannel("bob", async message => + { + await Task.CompletedTask; + msgs.Add(message.Message!); + }, CancellationToken); + + // Check both sides can publish. + await redisViaPortForwarder.PublishToChannel("bob", "hello unstable", CancellationToken); + await redisStableConnection.PublishToChannel("bob", "hello stable", CancellationToken); + + await ShouldEventually.Eventually(() => msgs.Should().BeEquivalentTo("hello unstable", "hello stable"), TimeSpan.FromSeconds(10), CancellationToken); + + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + // The stable connection should still be able to publish to redis. + // But the subscriber on the unstable connection will not get the message. + await redisStableConnection.PublishToChannel("bob", "MISSED", CancellationToken); + await Task.Delay(1111); // Give redis some time to publish MISSED; it won't be received since the connection is down. 
+ portForwarder.ReturnToNormalMode(); + + while (msgs.Count <= 2) + { + CancellationToken.ThrowIfCancellationRequested(); + Logger.Information("Trying again"); + await redisStableConnection.PublishToChannel("bob", "RECONNECT", CancellationToken); + await Task.Delay(1000); + } + + msgs.Should().Contain("RECONNECT", "Since the subscriber should eventually be re-connected"); + msgs.Should().NotContain("MISSED", "Since this was sent when the subscriber could not have been connected. " + + "If this is seen maybe the test itself has a bug."); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/RedisPendingRequestQueueFixture.cs b/source/Halibut.Tests/Queue/Redis/RedisPendingRequestQueueFixture.cs new file mode 100644 index 000000000..0c528845f --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/RedisPendingRequestQueueFixture.cs @@ -0,0 +1,816 @@ +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using FluentAssertions; +using Halibut.Diagnostics; +using Halibut.Exceptions; +using Halibut.Logging; +using Halibut.Queue; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.Exceptions; +using Halibut.Queue.Redis.MessageStorage; +using Halibut.Queue.Redis.NodeHeartBeat; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.ServiceModel; +using Halibut.Tests.Builders; +using Halibut.Tests.Queue.Redis.Utils; +using Halibut.Tests.Support; +using Halibut.Tests.Support.Logging; +using Halibut.Tests.Support.TestAttributes; +using Halibut.Tests.Support.TestCases; +using Halibut.Tests.TestServices.Async; +using Halibut.Tests.Util; +using Halibut.TestUtils.Contracts; +using Halibut.Transport.Protocol; +using Halibut.Util; +using NSubstitute; +using NSubstitute.Extensions; +using NUnit.Framework; +using ILog = Halibut.Diagnostics.ILog; + +namespace Halibut.Tests.Queue.Redis +{ + [RedisTest] + public class RedisPendingRequestQueueFixture : BaseTest + { + [Test] + public async Task 
DequeueAsync_ShouldReturnRequestFromRedis() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var sut = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, CreateMessageSerialiserAndDataStreamStorage(), new HalibutTimeoutsAndLimits()); + await sut.WaitUntilQueueIsSubscribedToReceiveMessages(); + + var task = sut.QueueAndWaitAsync(request, CancellationToken.None); + + // Act + var result = await sut.DequeueAsync(CancellationToken); + + // Assert + result.Should().NotBeNull(); + result!.RequestMessage.Id.Should().Be(request.Id); + result.RequestMessage.MethodName.Should().Be(request.MethodName); + result.RequestMessage.ServiceName.Should().Be(request.ServiceName); + } + + [Test] + public async Task WhenThePickupTimeoutExpires_AnErrorsIsReturnedAndTheRequestCanNotBeCollected() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var request = new RequestMessageBuilder("poll://test-endpoint") + .WithServiceEndpoint(b => b.WithPollingRequestQueueTimeout(TimeSpan.FromMilliseconds(100))) + .Build(); + + var sut = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, CreateMessageSerialiserAndDataStreamStorage(), new HalibutTimeoutsAndLimits()); + await sut.WaitUntilQueueIsSubscribedToReceiveMessages(); + + // Act + var response = await sut.QueueAndWaitAsync(request, CancellationToken.None); + var result = await sut.DequeueAsync(CancellationToken); + + // Assert + response.Error!.Message.Should().Contain("A request was sent to a polling endpoint, but the polling endpoint did not collect the request within 
the allowed time"); + result.Should().BeNull(); + } + + [Test] + public async Task FullSendAndReceiveShouldWork() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, CreateMessageSerialiserAndDataStreamStorage(), new HalibutTimeoutsAndLimits()); + var node2Receiver = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, CreateMessageSerialiserAndDataStreamStorage(), new HalibutTimeoutsAndLimits()); + await node2Receiver.WaitUntilQueueIsSubscribedToReceiveMessages(); + + // Act + var queueAndWaitAsync = node1Sender.QueueAndWaitAsync(request, CancellationToken.None); + + var requestMessageWithCancellationToken = await node2Receiver.DequeueAsync(CancellationToken); + + requestMessageWithCancellationToken.Should().NotBeNull(); + requestMessageWithCancellationToken!.RequestMessage.Id.Should().Be(request.Id); + requestMessageWithCancellationToken.RequestMessage.MethodName.Should().Be(request.MethodName); + requestMessageWithCancellationToken.RequestMessage.ServiceName.Should().Be(request.ServiceName); + + var response = ResponseMessage.FromResult(requestMessageWithCancellationToken.RequestMessage, "Yay"); + await node2Receiver.ApplyResponse(response, requestMessageWithCancellationToken.RequestMessage.ActivityId); + + var responseMessage = await queueAndWaitAsync; + + // Assert + responseMessage.Result.Should().Be("Yay"); + } + + [Test] + public async Task FullSendAndReceiveWithDataStreamShouldWork() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new 
HalibutRedisTransport(redisFacade); + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + request.Params = new[] { new ComplexObjectMultipleDataStreams(DataStream.FromString("hello"), DataStream.FromString("world")) }; + + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, messageReaderWriter, new HalibutTimeoutsAndLimits()); + var node2Receiver = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, messageReaderWriter, new HalibutTimeoutsAndLimits()); + await node2Receiver.WaitUntilQueueIsSubscribedToReceiveMessages(); + + var queueAndWaitAsync = node1Sender.QueueAndWaitAsync(request, CancellationToken.None); + + var requestMessageWithCancellationToken = await node2Receiver.DequeueAsync(CancellationToken); + + var objWithDataStreams = (ComplexObjectMultipleDataStreams)requestMessageWithCancellationToken!.RequestMessage.Params[0]; + (await objWithDataStreams.Payload1!.ReadAsString(CancellationToken)).Should().Be("hello"); + (await objWithDataStreams.Payload2!.ReadAsString(CancellationToken)).Should().Be("world"); + + var response = ResponseMessage.FromResult(requestMessageWithCancellationToken.RequestMessage, + new ComplexObjectMultipleDataStreams(DataStream.FromString("good"), DataStream.FromString("bye"))); + + await node2Receiver.ApplyResponse(response, requestMessageWithCancellationToken.RequestMessage.ActivityId); + + var responseMessage = await queueAndWaitAsync; + + var returnObject = (ComplexObjectMultipleDataStreams)responseMessage.Result!; + (await returnObject.Payload1!.ReadAsString(CancellationToken)).Should().Be("good"); + (await returnObject.Payload2!.ReadAsString(CancellationToken)).Should().Be("bye"); + } + + [Test] + public async Task WhenReadingTheResponseFromTheQueueFails_TheQueueAndWaitTaskReturnsAnUnknownError() + { + // Arrange + var endpoint = 
new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage() + .ThrowsOnReadResponse(() => new OperationCanceledException()); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var queue = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, messageReaderWriter, new HalibutTimeoutsAndLimits()); + await queue.WaitUntilQueueIsSubscribedToReceiveMessages(); + + // Act + var queueAndWaitAsync = queue.QueueAndWaitAsync(request, CancellationToken.None); + + var requestMessageWithCancellationToken = await queue.DequeueAsync(CancellationToken); + await queue.ApplyResponse(ResponseMessage.FromResult(requestMessageWithCancellationToken!.RequestMessage, "Yay"), + requestMessageWithCancellationToken.RequestMessage.ActivityId); + + var responseMessage = await queueAndWaitAsync; + + // Assert + responseMessage.Error.Should().NotBeNull(); + CreateExceptionFromResponse(responseMessage, HalibutLog).IsRetryableError().Should().Be(HalibutRetryableErrorType.IsRetryable); + } + + [Test] + public async Task WhenEnteringTheQueue_AndRedisIsUnavailable_ARetryableExceptionIsThrown() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder); + redisFacade.MaxDurationToRetryFor = TimeSpan.FromSeconds(1); + + var redisTransport = new HalibutRedisTransport(redisFacade); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + var queue = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, CreateMessageSerialiserAndDataStreamStorage(), new HalibutTimeoutsAndLimits()); + 
portForwarder.EnterKillNewAndExistingConnectionsMode(); + + // Act + var exception = await AssertThrowsAny.Exception(async () => await queue.QueueAndWaitAsync(request, CancellationToken.None)); + + // Assert + exception.IsRetryableError().Should().Be(HalibutRetryableErrorType.IsRetryable); + exception.Message.Should().Contain("ailed since an error occured inserting the data into the queue"); + } + + [Test] + public async Task WhenEnteringTheQueue_AndRedisIsUnavailableAndDataLoseOccurs_ARetryableExceptionIsThrown() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder, null); + redisFacade.MaxDurationToRetryFor = TimeSpan.FromSeconds(1); + + var redisDataLoseDetector = new CancellableDataLossWatchForRedisLosingAllItsData(); + + var redisTransport = Substitute.ForPartsOf(new HalibutRedisTransport(redisFacade)); + redisTransport.Configure().PutRequest(Arg.Any(), Arg.Any(), Arg.Any(), Arg.Any(), Arg.Any()) + .Returns(async callInfo => + { + await redisDataLoseDetector.DataLossHasOccured(); + throw new OperationCanceledException(); + }); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var queue = new RedisPendingRequestQueue(endpoint, redisDataLoseDetector, HalibutLog, redisTransport, CreateMessageSerialiserAndDataStreamStorage(), new HalibutTimeoutsAndLimits()); + + // Act + var exception = await AssertThrowsAny.Exception(async () => await queue.QueueAndWaitAsync(request, CancellationToken.None)); + + // Assert + exception.IsRetryableError().Should().Be(HalibutRetryableErrorType.IsRetryable); + exception.Message.Should().Contain("was cancelled because we detected that redis lost all of its data."); + } + + [Test] + public async Task 
WhenTheRequestReceiverNodeDetectsRedisDataLose_AndTheRequestSenderDoesNotYetDetectDataLose_TheRequestSenderNodeReturnsARetryableResponse() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var guid = Guid.NewGuid(); + + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + await using var stableConnection = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + var redisDataLoseDetectorOnReceiver = new CancellableDataLossWatchForRedisLosingAllItsData(); + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(stableConnection), messageReaderWriter, new HalibutTimeoutsAndLimits()); + var node2Receiver = new RedisPendingRequestQueue(endpoint, redisDataLoseDetectorOnReceiver, HalibutLog, new HalibutRedisTransport(stableConnection), messageReaderWriter, new HalibutTimeoutsAndLimits()); + await node2Receiver.WaitUntilQueueIsSubscribedToReceiveMessages(); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + var queueAndWaitTask = node1Sender.QueueAndWaitAsync(request, CancellationToken.None); + + var dequeuedRequest = await node2Receiver.DequeueAsync(CancellationToken); + + // Act + await redisDataLoseDetectorOnReceiver.DataLossHasOccured(); + + var responseToSendBack = CreateNonRetryableErrorResponse(dequeuedRequest); + + await node2Receiver.ApplyResponse(responseToSendBack, dequeuedRequest!.RequestMessage.ActivityId); + + var response = await queueAndWaitTask; + response.Error.Should().NotBeNull(); + + // Assert + CreateExceptionFromResponse(response, HalibutLog) + .IsRetryableError().Should().Be(HalibutRetryableErrorType.IsRetryable); + } + + ResponseMessage CreateNonRetryableErrorResponse(RequestMessageWithCancellationToken? 
dequeuedRequest) + { + var responseThatWouldNotBeRetried = ResponseMessage.FromException(dequeuedRequest!.RequestMessage, new NoMatchingServiceOrMethodHalibutClientException("")); + CreateExceptionFromResponse(responseThatWouldNotBeRetried, HalibutLog) + .IsRetryableError().Should().Be(HalibutRetryableErrorType.NotRetryable); + return responseThatWouldNotBeRetried; + } + + [Test] + public async Task WhenPreparingRequestFails_ARetryableExceptionIsThrown() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + + var redisTransport = new HalibutRedisTransport(redisFacade); + + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage() + .ThrowsOnPrepareRequest(() => new OperationCanceledException()); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + var queue = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, messageReaderWriter, new HalibutTimeoutsAndLimits()); + + // Act Assert + var exception = await AssertThrowsAny.Exception(async () => await queue.QueueAndWaitAsync(request, CancellationToken.None)); + exception.IsRetryableError().Should().Be(HalibutRetryableErrorType.IsRetryable); + exception.Message.Should().Contain("error occured when preparing request for queue"); + } + + [Test] + public async Task WhenDataLostIsDetected_InFlightRequestShouldBeAbandoned_AndARetryableExceptionIsThrown() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + await using var dataLossWatcher = new CancellableDataLossWatchForRedisLosingAllItsData(); + + var node1Sender = new 
RedisPendingRequestQueue(endpoint, dataLossWatcher, HalibutLog, redisTransport, messageReaderWriter, new HalibutTimeoutsAndLimits()); + var node2Receiver = new RedisPendingRequestQueue(endpoint, dataLossWatcher, HalibutLog, redisTransport, messageReaderWriter, new HalibutTimeoutsAndLimits()); + await node2Receiver.WaitUntilQueueIsSubscribedToReceiveMessages(); + + var queueAndWaitAsync = node1Sender.QueueAndWaitAsync(request, CancellationToken.None); + + var requestMessageWithCancellationToken = await node2Receiver.DequeueAsync(CancellationToken); + + requestMessageWithCancellationToken.Should().NotBeNull(); + + // Act + await dataLossWatcher.DataLossHasOccured(); + + // Assert + requestMessageWithCancellationToken!.CancellationToken.IsCancellationRequested.Should().BeTrue("The receiver of the data should just give up processing"); + + // Verify that queueAndWaitAsync quickly returns with an error when data loss has occurred. + await Task.WhenAny(Task.Delay(5000), queueAndWaitAsync); + + queueAndWaitAsync.IsCompleted.Should().BeTrue("As soon as data loss is detected the queueAndWait should return."); + + // Sigh it can go down either of these paths! 
+ var e = await AssertException.Throws(queueAndWaitAsync); + e.And.IsRetryableError().Should().Be(HalibutRetryableErrorType.IsRetryable); + e.And.Should().BeOfType(); + } + + [Test] + public async Task OnceARequestIsComplete_NoInflightDisposableShouldExist() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var queue = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, messageReaderWriter, new HalibutTimeoutsAndLimits()); + await queue.WaitUntilQueueIsSubscribedToReceiveMessages(); + + // Act + var queueAndWaitAsync = queue.QueueAndWaitAsync(request, CancellationToken.None); + + var requestMessageWithCancellationToken = await queue.DequeueAsync(CancellationToken); + requestMessageWithCancellationToken.Should().NotBeNull(); + + var response = ResponseMessage.FromResult(requestMessageWithCancellationToken!.RequestMessage, "Yay"); + await queue.ApplyResponse(response, requestMessageWithCancellationToken.RequestMessage.ActivityId); + + var responseMessage = await queueAndWaitAsync; + responseMessage.Result.Should().Be("Yay"); + + // Assert + queue.DisposablesForInFlightRequests.Should().BeEmpty(); + } + + [Test] + public async Task OnceARequestIsComplete_NoRequestSenderNodeHeartBeatsShouldBeSent() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var queue = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, 
CreateMessageSerialiserAndDataStreamStorage(), new HalibutTimeoutsAndLimits()); + await queue.WaitUntilQueueIsSubscribedToReceiveMessages(); + queue.RequestSenderNodeHeartBeatRate = TimeSpan.FromSeconds(1); + + // Act + var queueAndWaitAsync = queue.QueueAndWaitAsync(request, CancellationToken.None); + + var requestMessageWithCancellationToken = await queue.DequeueAsync(CancellationToken); + requestMessageWithCancellationToken.Should().NotBeNull(); + + var response = ResponseMessage.FromResult(requestMessageWithCancellationToken!.RequestMessage, "Yay"); + await queue.ApplyResponse(response, requestMessageWithCancellationToken.RequestMessage.ActivityId); + + var responseMessage = await queueAndWaitAsync; + responseMessage.Result.Should().Be("Yay"); + + // Assert + var heartBeatSent = false; + var cts = new CancelOnDisposeCancellationToken(); + using var _ = redisTransport.SubscribeToNodeHeartBeatChannel(endpoint, request.ActivityId, HalibutQueueNodeSendingPulses.RequestSenderNode, async () => + { + await Task.CompletedTask; + heartBeatSent = true; + }, + cts.Token); + + await Task.Delay(5000); + heartBeatSent.Should().BeFalse(); + } + + [Test] + public async Task OnceARequestIsComplete_NoRequestProcessorNodeHeartBeatsShouldBeSent() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var queue = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, redisTransport, CreateMessageSerialiserAndDataStreamStorage(), new HalibutTimeoutsAndLimits()); + await queue.WaitUntilQueueIsSubscribedToReceiveMessages(); + queue.RequestReceiverNodeHeartBeatRate = TimeSpan.FromSeconds(1); + + // Act + var queueAndWaitAsync = queue.QueueAndWaitAsync(request, CancellationToken.None); + + var requestMessageWithCancellationToken = await 
queue.DequeueAsync(CancellationToken); + requestMessageWithCancellationToken.Should().NotBeNull(); + + var response = ResponseMessage.FromResult(requestMessageWithCancellationToken!.RequestMessage, "Yay"); + await queue.ApplyResponse(response, requestMessageWithCancellationToken.RequestMessage.ActivityId); + + var responseMessage = await queueAndWaitAsync; + responseMessage.Result.Should().Be("Yay"); + + // Assert + var heartBeatSent = false; + var cts = new CancelOnDisposeCancellationToken(); + using var _ = redisTransport.SubscribeToNodeHeartBeatChannel(endpoint, request.ActivityId, HalibutQueueNodeSendingPulses.RequestProcessorNode, async () => + { + await Task.CompletedTask; + heartBeatSent = true; + }, + cts.Token); + + await Task.Delay(5000); + heartBeatSent.Should().BeFalse(); + } + + [Test] + public async Task WhenTheRequestProcessorNodeConnectionToRedisIsInterrupted_AndRestoredBeforeWorkIsPublished_TheReceiverShouldBeAbleToCollectThatWorkQuickly() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var guid = Guid.NewGuid(); + await using var redisFacadeSender = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var unstableRedisFacade = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var highDequeueTimoueHalibutLimits = new HalibutTimeoutsAndLimits(); + highDequeueTimoueHalibutLimits.PollingQueueWaitTimeout = TimeSpan.FromDays(1); // We should not need to rely on the timeout working for very short disconnects. 
+ + var requestSenderQueue = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(redisFacadeSender), messageReaderWriter, highDequeueTimoueHalibutLimits); + var requestProcessQueueWithUnstableConnection = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(unstableRedisFacade), messageReaderWriter, highDequeueTimoueHalibutLimits); + await requestProcessQueueWithUnstableConnection.WaitUntilQueueIsSubscribedToReceiveMessages(); + var dequeueTask = requestProcessQueueWithUnstableConnection.DequeueAsync(CancellationToken); + + await Task.Delay(5000, CancellationToken); // Allow some time for the receiver to subscribe to work. + dequeueTask.IsCompleted.Should().BeFalse("Dequeue should not have "); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + await Task.Delay(1000, CancellationToken); // The network outage continues! + + portForwarder.ReturnToNormalMode(); // The network outage gets all fixed up :D + Logger.Information("Network restored!"); + + // The receiver should be able to get itself back into a state where + // new RequestMessages that are published are quickly collected. + // However, first we allow some time for the subscriptions to re-connect to redis; + // we don't know how long that will take so give it what feels like too much time. + await Task.Delay(TimeSpan.FromSeconds(30), CancellationToken); + + var queueAndWaitAsync = requestSenderQueue.QueueAndWaitAsync(request, CancellationToken.None); + + // Surely it will be done in 20s, it should take less than 1s. 
+ await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(20), CancellationToken), dequeueTask); + + dequeueTask.IsCompleted.Should().BeTrue("The queue did not app"); + + var requestReceived = await dequeueTask; + requestReceived.Should().NotBeNull(); + requestReceived!.RequestMessage.ActivityId.Should().Be(request.ActivityId); + } + + /// + /// We want to check that the queue doesn't do something like: + /// - place work on the queue + /// - not receive a heart beat from the RequestProcessorNode, because the request is not yet collected. + /// - timeout because we did not receive that heart beat. + /// + [Test] + public async Task WhenTheReceiverDoesntCollectWorkImmediately_TheRequestCanSitOnTheQueueForSometime_AndBeOnTheQueueLongerThanTheHeartBeatTimeout() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(redisFacade), messageReaderWriter, new HalibutTimeoutsAndLimits()); + // We are testing that we don't expect heart beats before the request is collected. 
+ node1Sender.RequestReceiverNodeHeartBeatTimeout = TimeSpan.FromSeconds(1); + await node1Sender.WaitUntilQueueIsSubscribedToReceiveMessages(); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + request.Destination.PollingRequestQueueTimeout = TimeSpan.FromHours(1); + await using var cts = new CancelOnDisposeCancellationToken(CancellationToken); + + var queueAndWaitAsync = node1Sender.QueueAndWaitAsync(request, cts.Token); + + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(5), CancellationToken), queueAndWaitAsync); + + queueAndWaitAsync.IsCompleted.Should().BeFalse(); + } + + [Test] + public async Task WhenTheSendersConnectionToRedisIsBrieflyInterruptedWhileSendingTheRequestMessageToRedis_TheWorkIsStillSent() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var guid = Guid.NewGuid(); + await using var redisFacadeReceiver = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var redisFacadeSender = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(redisFacadeSender), messageReaderWriter, new HalibutTimeoutsAndLimits()); + // The receiver uses the stable connection; only the sender goes through the port forwarder. + var node2Receiver = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(redisFacadeReceiver), messageReaderWriter, new HalibutTimeoutsAndLimits()); + await node2Receiver.WaitUntilQueueIsSubscribedToReceiveMessages(); + + portForwarder.EnterKillNewAndExistingConnectionsMode(); + + var networkRestoreTask = Task.Run(async () => + { + await Task.Delay(TimeSpan.FromSeconds(3), CancellationToken); + portForwarder.ReturnToNormalMode(); + }); + + var queueAndWaitAsync = 
node1Sender.QueueAndWaitAsync(request, CancellationToken.None); + + var dequeuedRequest = await node2Receiver.DequeueAsync(CancellationToken); + + dequeuedRequest.Should().NotBeNull(); + dequeuedRequest!.RequestMessage.ActivityId.Should().Be(request.ActivityId); + } + + [Test] + public async Task WhenTheRequestProcessorNodeDequeuesWork_AndThenDisconnectsFromRedisForEver_TheRequestSenderNodeEventuallyTimesOut() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var guid = Guid.NewGuid(); + + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + var halibutTimeoutAndLimits = new HalibutTimeoutsAndLimits(); + halibutTimeoutAndLimits.PollingRequestQueueTimeout = TimeSpan.FromDays(1); + halibutTimeoutAndLimits.PollingQueueWaitTimeout = TimeSpan.FromDays(1); // We should not need to rely on the timeout working for very short disconnects. + + await using var stableRedisConnection = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var unstableRedisConnection = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(stableRedisConnection), messageReaderWriter, halibutTimeoutAndLimits); + var node2Receiver = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(unstableRedisConnection), messageReaderWriter, halibutTimeoutAndLimits); + await node2Receiver.WaitUntilQueueIsSubscribedToReceiveMessages(); + + // Lower this to complete the test sooner. 
+ node1Sender.RequestReceiverNodeHeartBeatRate = TimeSpan.FromSeconds(1); + node2Receiver.RequestReceiverNodeHeartBeatRate = TimeSpan.FromSeconds(1); + node1Sender.RequestReceiverNodeHeartBeatTimeout = TimeSpan.FromSeconds(10); + node2Receiver.RequestReceiverNodeHeartBeatTimeout = TimeSpan.FromSeconds(10); + + // Act + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + // Setting this low shows we don't timeout because the request was not picked up in time. + request.Destination.PollingRequestQueueTimeout = TimeSpan.FromSeconds(5); + var queueAndWaitTask = node1Sender.QueueAndWaitAsync(request, CancellationToken.None); + + var dequeuedRequest = await node2Receiver.DequeueAsync(CancellationToken); + + // Now disconnect the receiver from redis. + portForwarder.EnterKillNewAndExistingConnectionsMode(); + + // Assert + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(20), CancellationToken), queueAndWaitTask); + + queueAndWaitTask.IsCompleted.Should().BeTrue(); + + var response = await queueAndWaitTask; + response.Error.Should().NotBeNull(); + response.Error!.Message.Should().Contain("The node processing the request did not send a heartbeat for long enough, and so the node is now assumed to be offline."); + + CreateExceptionFromResponse(response, HalibutLog).IsRetryableError().Should().Be(HalibutRetryableErrorType.IsRetryable); + } + + [Test] + public async Task WhenTheRequestProcessorNodeDequeuesWork_AndTheRequestSenderNodeDisconnects_AndNeverReconnects_TheDequeuedWorkIsEventuallyCancelled() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var guid = Guid.NewGuid(); + + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + var halibutTimeoutAndLimits = new HalibutTimeoutsAndLimits(); + halibutTimeoutAndLimits.PollingRequestQueueTimeout = TimeSpan.FromDays(1); + halibutTimeoutAndLimits.PollingQueueWaitTimeout = TimeSpan.FromDays(1); // We should not need to rely on the timeout working for very 
short disconnects. + + await using var stableRedisConnection = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var unstableRedisConnection = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(unstableRedisConnection), messageReaderWriter, halibutTimeoutAndLimits); + var node2Receiver = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(stableRedisConnection), messageReaderWriter, halibutTimeoutAndLimits); + await node2Receiver.WaitUntilQueueIsSubscribedToReceiveMessages(); + + node1Sender.RequestSenderNodeHeartBeatRate = TimeSpan.FromSeconds(1); + node2Receiver.RequestSenderNodeHeartBeatRate = TimeSpan.FromSeconds(1); + node1Sender.RequestSenderNodeHeartBeatTimeout = TimeSpan.FromSeconds(10); + node2Receiver.RequestSenderNodeHeartBeatTimeout = TimeSpan.FromSeconds(10); + node1Sender.TimeBetweenCheckingIfRequestWasCollected = TimeSpan.FromSeconds(1); + node2Receiver.TimeBetweenCheckingIfRequestWasCollected = TimeSpan.FromSeconds(1); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + + var queueAndWaitTask = node1Sender.QueueAndWaitAsync(request, CancellationToken); + + var dequeuedRequest = await node2Receiver.DequeueAsync(CancellationToken); + dequeuedRequest!.CancellationToken.IsCancellationRequested.Should().BeFalse(); + + // Now disconnect the sender from redis. 
+ portForwarder.EnterKillNewAndExistingConnectionsMode(); + + await Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(35), dequeuedRequest.CancellationToken)); + + dequeuedRequest.CancellationToken.IsCancellationRequested.Should().BeTrue(); + } + + [Test] + public async Task WhenTheRequestSenderNodeBrieflyDisconnectsFromRedis_AtExactlyTheTimeWhenTheRequestReceiverNodeSendsTheResponseBack_TheRequestSenderNodeStillGetsTheResponse() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var guid = Guid.NewGuid(); + + var messageReaderWriter = CreateMessageSerialiserAndDataStreamStorage(); + + await using var stableConnection = RedisFacadeBuilder.CreateRedisFacade(prefix: guid); + using var portForwarder = PortForwardingToRedisBuilder.ForwardingToRedis(Logger); + await using var unreliableConnection = RedisFacadeBuilder.CreateRedisFacade(portForwarder, guid); + + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(unreliableConnection), messageReaderWriter, new HalibutTimeoutsAndLimits()); + var node2Receiver = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), HalibutLog, new HalibutRedisTransport(stableConnection), messageReaderWriter, new HalibutTimeoutsAndLimits()); + await node2Receiver.WaitUntilQueueIsSubscribedToReceiveMessages(); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + var queueAndWaitTask = node1Sender.QueueAndWaitAsync(request, CancellationToken); + + var dequeuedRequest = await node2Receiver.DequeueAsync(CancellationToken); + + // Just before we send the response, disconnect the sender. 
+ portForwarder.EnterKillNewAndExistingConnectionsMode(); + await node2Receiver.ApplyResponse(ResponseMessage.FromResult(dequeuedRequest!.RequestMessage, "Yay"), dequeuedRequest!.RequestMessage.ActivityId); + + await Task.Delay(TimeSpan.FromSeconds(2), CancellationToken); + portForwarder.ReturnToNormalMode(); + + await Task.WhenAny(Task.Delay(TimeSpan.FromMinutes(2), CancellationToken), queueAndWaitTask); + + queueAndWaitTask.IsCompleted.Should().BeTrue(); + + var response = await queueAndWaitTask; + response.Error.Should().BeNull(); + response.Result.Should().Be("Yay"); + } + + static Exception CreateExceptionFromResponse(ResponseMessage responseThatWouldNotBeRetried, ILog log) + { + try + { + HalibutProxyWithAsync.ThrowExceptionFromReceivedError(responseThatWouldNotBeRetried.Error!, log); + } + catch (Exception e) + { + return e; + } + + Assert.Fail("Expected an exception in the response message"); + throw new Exception("it failed"); + } + + [Test] + [LatestClientAndLatestServiceTestCases(testNetworkConditions: false, testListening: false, testWebSocket: false)] + public async Task WhenUsingTheRedisQueue_ASimpleEchoServiceCanBeCalled(ClientAndServiceTestCase clientAndServiceTestCase) + { + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + var dataStreamStore = new InMemoryStoreDataStreamsForDistributedQueues(); + + await using (var clientAndService = await clientAndServiceTestCase.CreateTestCaseBuilder() + .WithStandardServices() + .AsLatestClientAndLatestServiceBuilder() + .WithPendingRequestQueueFactory((queueMessageSerializer, logFactory) => + new RedisPendingRequestQueueFactory( + queueMessageSerializer, + dataStreamStore, + new RedisNeverLosesData(), + redisTransport, + new HalibutTimeoutsAndLimits(), + logFactory) + .WithWaitForReceiverToBeReady()) + .Build(CancellationToken)) + { + var echo = clientAndService.CreateAsyncClient(); + (await echo.SayHelloAsync("Deploy 
package A")).Should().Be("Deploy package A..."); + + for (var i = 0; i < clientAndServiceTestCase.RecommendedIterations; i++) (await echo.SayHelloAsync($"Deploy package A {i}")).Should().Be($"Deploy package A {i}..."); + } + } + + [Test] + public async Task CancellingARequestShouldResultInTheDequeuedResponseTokenBeingCancelled() + { + // Arrange + var endpoint = new Uri("poll://" + Guid.NewGuid()); + var log = new TestContextLogCreator("Redis", LogLevel.Trace).CreateNewForPrefix(""); + await using var redisFacade = RedisFacadeBuilder.CreateRedisFacade(); + var redisTransport = new HalibutRedisTransport(redisFacade); + var dataStreamStore = new InMemoryStoreDataStreamsForDistributedQueues(); + var messageSerializer = new QueueMessageSerializerBuilder().Build(); + var messageReaderWriter = new MessageSerialiserAndDataStreamStorage(messageSerializer, dataStreamStore); + + var request = new RequestMessageBuilder("poll://test-endpoint").Build(); + request.Params = new[] { new ComplexObjectMultipleDataStreams(DataStream.FromString("hello"), DataStream.FromString("world")) }; + + var node1Sender = new RedisPendingRequestQueue(endpoint, new RedisNeverLosesData(), log, redisTransport, messageReaderWriter, new HalibutTimeoutsAndLimits()); + await node1Sender.WaitUntilQueueIsSubscribedToReceiveMessages(); + + using var cts = new CancellationTokenSource(); + + var queueAndWaitAsync = node1Sender.QueueAndWaitAsync(request, cts.Token); + + var requestMessageWithCancellationToken = await node1Sender.DequeueAsync(CancellationToken); + + requestMessageWithCancellationToken!.CancellationToken.IsCancellationRequested.Should().BeFalse(); + + await cts.CancelAsync(); + + try + { + await Task.Delay(TimeSpan.FromSeconds(10), requestMessageWithCancellationToken.CancellationToken); + } + catch (TaskCanceledException) + { + } + + requestMessageWithCancellationToken!.CancellationToken.IsCancellationRequested.Should().BeTrue(); + } + + public class QueueMessageSerializerBuilder + { + public 
QueueMessageSerializer Build() + { + var typeRegistry = new TypeRegistry(); + typeRegistry.Register(typeof(IComplexObjectService)); + + StreamCapturingJsonSerializer StreamCapturingSerializer() + { + var settings = MessageSerializerBuilder.CreateSerializer(); + var binder = new RegisteredSerializationBinder(typeRegistry); + settings.SerializationBinder = binder; + return new StreamCapturingJsonSerializer(settings); + } + + return new QueueMessageSerializer(StreamCapturingSerializer); + } + } + + static MessageSerialiserAndDataStreamStorage CreateMessageSerialiserAndDataStreamStorage() + { + var dataStreamStore = new InMemoryStoreDataStreamsForDistributedQueues(); + var messageSerializer = new QueueMessageSerializerBuilder().Build(); + var messageReaderWriter = new MessageSerialiserAndDataStreamStorage(messageSerializer, dataStreamStore); + return messageReaderWriter; + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/Utils/CancellableDataLossWatchForRedisLosingAllItsData.cs b/source/Halibut.Tests/Queue/Redis/Utils/CancellableDataLossWatchForRedisLosingAllItsData.cs new file mode 100644 index 000000000..1aa9f8f02 --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/Utils/CancellableDataLossWatchForRedisLosingAllItsData.cs @@ -0,0 +1,41 @@ +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.RedisDataLossDetection; +using Halibut.Util; +using Try = Halibut.Tests.Support.Try; + +namespace Halibut.Tests.Queue.Redis.Utils +{ + public class CancellableDataLossWatchForRedisLosingAllItsData : IWatchForRedisLosingAllItsData + { + CancelOnDisposeCancellationToken cancellationToken = new(); + + public TaskCompletionSource TaskCompletionSource = new(); + public CancellableDataLossWatchForRedisLosingAllItsData() + { + TaskCompletionSource.SetResult(cancellationToken.Token); + } + + public async Task DataLossHasOccured() + { + await cancellationToken.DisposeAsync(); 
+ cancellationToken = new CancelOnDisposeCancellationToken(); + TaskCompletionSource = new TaskCompletionSource(); + TaskCompletionSource.SetResult(cancellationToken.Token); + } + + public async ValueTask DisposeAsync() + { + await Try.CatchingError(async () => await cancellationToken.DisposeAsync()); + } + + public async Task GetTokenForDataLossDetection(TimeSpan timeToWait, CancellationToken cancellationToken) + { +#pragma warning disable VSTHRD003 + return await TaskCompletionSource.Task; +#pragma warning restore VSTHRD003 + } + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/Utils/HalibutRedisTransportWithVirtuals.cs b/source/Halibut.Tests/Queue/Redis/Utils/HalibutRedisTransportWithVirtuals.cs new file mode 100644 index 000000000..b5577513c --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/Utils/HalibutRedisTransportWithVirtuals.cs @@ -0,0 +1,113 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.NodeHeartBeat; +using Halibut.Queue.Redis.RedisHelpers; +using StackExchange.Redis; + +namespace Halibut.Tests.Queue.Redis.Utils +{ + public class HalibutRedisTransportWithVirtuals : IHalibutRedisTransport + { + readonly IHalibutRedisTransport halibutRedisTransport; + + public HalibutRedisTransportWithVirtuals(IHalibutRedisTransport halibutRedisTransport) + { + this.halibutRedisTransport = halibutRedisTransport; + } + + public Task SubscribeToRequestMessagePulseChannel(Uri endpoint, Action onRequestMessagePulse, CancellationToken cancellationToken) + { + return halibutRedisTransport.SubscribeToRequestMessagePulseChannel(endpoint, onRequestMessagePulse, cancellationToken); + } + + public Task PulseRequestPushedToEndpoint(Uri endpoint, CancellationToken cancellationToken) + { + return halibutRedisTransport.PulseRequestPushedToEndpoint(endpoint, cancellationToken); + } + + public Task PushRequestGuidOnToQueue(Uri endpoint, Guid guid, 
CancellationToken cancellationToken) + { + return halibutRedisTransport.PushRequestGuidOnToQueue(endpoint, guid, cancellationToken); + } + + public Task TryPopNextRequestGuid(Uri endpoint, CancellationToken cancellationToken) + { + return halibutRedisTransport.TryPopNextRequestGuid(endpoint, cancellationToken); + } + + public virtual Task PutRequest(Uri endpoint, Guid requestId, RedisStoredMessage requestMessage, TimeSpan requestPickupTimeout, CancellationToken cancellationToken) + { + return halibutRedisTransport.PutRequest(endpoint, requestId, requestMessage, requestPickupTimeout, cancellationToken); + } + + public Task TryGetAndRemoveRequest(Uri endpoint, Guid requestId, CancellationToken cancellationToken) + { + return halibutRedisTransport.TryGetAndRemoveRequest(endpoint, requestId, cancellationToken); + } + + public Task IsRequestStillOnQueue(Uri endpoint, Guid requestId, CancellationToken cancellationToken) + { + return halibutRedisTransport.IsRequestStillOnQueue(endpoint, requestId, cancellationToken); + } + + public Task SubscribeToRequestCancellation(Uri endpoint, Guid requestId, Func onRpcCancellation, CancellationToken cancellationToken) + { + return halibutRedisTransport.SubscribeToRequestCancellation(endpoint, requestId, onRpcCancellation, cancellationToken); + } + + public Task PublishCancellation(Uri endpoint, Guid requestId, CancellationToken cancellationToken) + { + return halibutRedisTransport.PublishCancellation(endpoint, requestId, cancellationToken); + } + + public Task MarkRequestAsCancelled(Uri endpoint, Guid requestId, TimeSpan ttl, CancellationToken cancellationToken) + { + return halibutRedisTransport.MarkRequestAsCancelled(endpoint, requestId, ttl, cancellationToken); + } + + public Task IsRequestMarkedAsCancelled(Uri endpoint, Guid requestId, CancellationToken cancellationToken) + { + return halibutRedisTransport.IsRequestMarkedAsCancelled(endpoint, requestId, cancellationToken); + } + + public Task SubscribeToNodeHeartBeatChannel(Uri 
endpoint, Guid requestId, HalibutQueueNodeSendingPulses nodeSendingPulsesType, Func onHeartBeat, CancellationToken cancellationToken) + { + return halibutRedisTransport.SubscribeToNodeHeartBeatChannel(endpoint, requestId, nodeSendingPulsesType, onHeartBeat, cancellationToken); + } + + public Task SendNodeHeartBeat(Uri endpoint, Guid requestId, HalibutQueueNodeSendingPulses nodeSendingPulsesType, CancellationToken cancellationToken) + { + return halibutRedisTransport.SendNodeHeartBeat(endpoint, requestId, nodeSendingPulsesType, cancellationToken); + } + + public Task SubscribeToResponseChannel(Uri endpoint, Guid identifier, Func onValueReceived, CancellationToken cancellationToken) + { + return halibutRedisTransport.SubscribeToResponseChannel(endpoint, identifier, onValueReceived, cancellationToken); + } + + public Task PublishThatResponseIsAvailable(Uri endpoint, Guid identifier, CancellationToken cancellationToken) + { + return halibutRedisTransport.PublishThatResponseIsAvailable(endpoint, identifier, cancellationToken); + } + + public Task SetResponseMessage(Uri endpoint, Guid identifier, RedisStoredMessage responseMessage, TimeSpan ttl, CancellationToken cancellationToken) + { + return halibutRedisTransport.SetResponseMessage(endpoint, identifier, responseMessage, ttl, cancellationToken); + } + + public Task GetResponseMessage(Uri endpoint, Guid identifier, CancellationToken cancellationToken) + { + return halibutRedisTransport.GetResponseMessage(endpoint, identifier, cancellationToken); + } + + public Task DeleteResponseMessage(Uri endpoint, Guid identifier, CancellationToken cancellationToken) + { + return halibutRedisTransport.DeleteResponseMessage(endpoint, identifier, cancellationToken); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/Utils/InMemoryStoreDataStreamsForDistributedQueues.cs b/source/Halibut.Tests/Queue/Redis/Utils/InMemoryStoreDataStreamsForDistributedQueues.cs new file mode 100644 index 
000000000..f9181547c --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/Utils/InMemoryStoreDataStreamsForDistributedQueues.cs @@ -0,0 +1,39 @@ +using System; +using System.Collections.Generic; +using System.IO; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.QueuedDataStreams; + +namespace Halibut.Tests.Queue.Redis.Utils +{ + public class InMemoryStoreDataStreamsForDistributedQueues : IStoreDataStreamsForDistributedQueues + { + readonly IDictionary dataStreamsStored = new Dictionary(); + public async Task StoreDataStreams(IReadOnlyList dataStreams, CancellationToken cancellationToken) + { + foreach (var dataStream in dataStreams) + { + using var memoryStream = new MemoryStream(); + await dataStream.WriteData(memoryStream, cancellationToken); + dataStreamsStored[dataStream.Id] = memoryStream.ToArray(); + } + + return ""; + } + + public async Task ReHydrateDataStreams(string _, IReadOnlyList dataStreams, CancellationToken cancellationToken) + { + await Task.CompletedTask; + foreach (var dataStream in dataStreams) + { + var bytes = dataStreamsStored[dataStream.Id]; + dataStreamsStored.Remove(dataStream.Id); + dataStream.SetWriterAsync(async (stream, ct) => + { + await stream.WriteAsync(bytes, 0, bytes.Length, ct); + }); + } + } + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/Utils/MessageReaderWriterExtensionsMethods.cs b/source/Halibut.Tests/Queue/Redis/Utils/MessageReaderWriterExtensionsMethods.cs new file mode 100644 index 000000000..6139c53bf --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/Utils/MessageReaderWriterExtensionsMethods.cs @@ -0,0 +1,82 @@ +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.Redis.MessageStorage; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Transport.Protocol; + +namespace Halibut.Tests.Queue.Redis.Utils +{ + public static class MessageReaderWriterExtensionsMethods + { + public static 
IMessageSerialiserAndDataStreamStorage ThrowsOnReadResponse(this IMessageSerialiserAndDataStreamStorage messageSerialiserAndDataStreamStorage, Func exceptionFactory) + { + return new MessageSerialiserAndDataStreamStorageThatThrowsWhenReadingResponse(messageSerialiserAndDataStreamStorage, exceptionFactory); + } + + public static IMessageSerialiserAndDataStreamStorage ThrowsOnPrepareRequest(this IMessageSerialiserAndDataStreamStorage messageSerialiserAndDataStreamStorage, Func exception) + { + return new MessageSerialiserAndDataStreamStorageThatThrowsOnPrepareRequest(messageSerialiserAndDataStreamStorage, exception); + } + } + + class MessageSerialiserAndDataStreamStorageWithVirtualMethods : IMessageSerialiserAndDataStreamStorage + { + readonly IMessageSerialiserAndDataStreamStorage messageSerialiserAndDataStreamStorage; + + public MessageSerialiserAndDataStreamStorageWithVirtualMethods(IMessageSerialiserAndDataStreamStorage messageSerialiserAndDataStreamStorage) + { + this.messageSerialiserAndDataStreamStorage = messageSerialiserAndDataStreamStorage; + } + + public virtual Task PrepareRequest(RequestMessage request, CancellationToken cancellationToken) + { + return messageSerialiserAndDataStreamStorage.PrepareRequest(request, cancellationToken); + } + + public virtual Task ReadRequest(RedisStoredMessage jsonRequest, CancellationToken cancellationToken) + { + return messageSerialiserAndDataStreamStorage.ReadRequest(jsonRequest, cancellationToken); + } + + public virtual Task PrepareResponse(ResponseMessage response, CancellationToken cancellationToken) + { + return messageSerialiserAndDataStreamStorage.PrepareResponse(response, cancellationToken); + } + + public virtual Task ReadResponse(RedisStoredMessage jsonResponse, CancellationToken cancellationToken) + { + return messageSerialiserAndDataStreamStorage.ReadResponse(jsonResponse, cancellationToken); + } + } + + class MessageSerialiserAndDataStreamStorageThatThrowsWhenReadingResponse : 
MessageSerialiserAndDataStreamStorageWithVirtualMethods + { + readonly Func exception; + + public MessageSerialiserAndDataStreamStorageThatThrowsWhenReadingResponse(IMessageSerialiserAndDataStreamStorage messageSerialiserAndDataStreamStorage, Func exception) : base(messageSerialiserAndDataStreamStorage) + { + this.exception = exception; + } + + public override Task ReadResponse(RedisStoredMessage jsonResponse, CancellationToken cancellationToken) + { + throw exception(); + } + } + + class MessageSerialiserAndDataStreamStorageThatThrowsOnPrepareRequest : MessageSerialiserAndDataStreamStorageWithVirtualMethods + { + readonly Func exception; + + public MessageSerialiserAndDataStreamStorageThatThrowsOnPrepareRequest(IMessageSerialiserAndDataStreamStorage messageSerialiserAndDataStreamStorage, Func exception) : base(messageSerialiserAndDataStreamStorage) + { + this.exception = exception; + } + + public override Task PrepareRequest(RequestMessage request, CancellationToken cancellationToken) + { + throw exception(); + } + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/Utils/RedisContainerBuilder.cs b/source/Halibut.Tests/Queue/Redis/Utils/RedisContainerBuilder.cs new file mode 100644 index 000000000..65392f431 --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/Utils/RedisContainerBuilder.cs @@ -0,0 +1,122 @@ +using System; +using System.IO; +using System.Threading.Tasks; +using DotNet.Testcontainers.Builders; +using DotNet.Testcontainers.Containers; +using Halibut.Tests.Support; +using NUnit.Framework; +using Try = Halibut.Util.Try; + +namespace Halibut.Tests.Queue.Redis.Utils +{ + public class RedisContainerBuilder + { + private string _image = "redis:7-alpine"; + private string? _customConfigPath; + private int? _hostPort; + + /// + /// Sets the Redis Docker image to use. Defaults to "redis:7-alpine". 
+ /// + /// The Redis Docker image tag + /// The builder instance for method chaining + public RedisContainerBuilder WithImage(string image) + { + _image = image; + return this; + } + + /// + /// Sets a custom Redis configuration path to mount into the container. + /// If not specified, uses the default redis-conf directory from the project root. + /// + /// The path to the Redis configuration directory + /// The builder instance for method chaining + public RedisContainerBuilder WithCustomConfigPath(string configPath) + { + _customConfigPath = configPath; + return this; + } + + /// + /// Sets a specific host port to bind to. If not specified, finds a free port automatically. + /// + /// The host port to bind to + /// The builder instance for method chaining + public RedisContainerBuilder WithHostPort(int hostPort) + { + _hostPort = hostPort; + return this; + } + + /// + /// Builds and returns a configured Redis container with the specified settings. + /// The container is not started - call StartAsync() on the returned container to start it. + /// + /// A configured Redis container ready to be started + public RedisContainer Build() + { + var hostPort = _hostPort ?? TcpPortHelper.FindFreeTcpPort(); + var redisConfigPath = _customConfigPath ?? 
+ Path.GetFullPath(Path.Combine(TestContext.CurrentContext.TestDirectory, "../../../../../redis-conf")); + + var container = new ContainerBuilder() + .WithImage(_image) + .WithPortBinding(hostPort, 6379) + .WithBindMount(redisConfigPath, "/usr/local/etc/redis") + .WithCommand("redis-server", "/usr/local/etc/redis/redis.conf") + .WithWaitStrategy(DotNet.Testcontainers.Builders.Wait.ForUnixContainer() + .UntilPortIsAvailable(6379)) + .Build(); + + return new RedisContainer(container, hostPort); + } + } + + /// + /// Wrapper around the testcontainers IContainer that provides Redis-specific functionality + /// + public class RedisContainer : IAsyncDisposable + { + private readonly IContainer _container; + + public RedisContainer(IContainer container, int redisPort) + { + _container = container; + RedisPort = redisPort; + } + + /// + /// The host port that Redis is bound to + /// + public int RedisPort { get; } + + /// + /// The connection string to connect to this Redis instance + /// + public string ConnectionString => $"localhost:{RedisPort}"; + + /// + /// Starts the Redis container + /// + public async Task StartAsync() + { + // Since I have seen errors here. 
+ for (int i = 0; i < 5; i++) + { + await Try.IgnoringError(async () => await _container.StartAsync()); + } + await _container.StartAsync(); + } + + /// + /// Stops the Redis container + /// + public Task StopAsync() => _container.StopAsync(); + + /// + /// Disposes the Redis container + /// + public ValueTask DisposeAsync() => _container.DisposeAsync(); + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/Utils/RedisFacadeBuilder.cs b/source/Halibut.Tests/Queue/Redis/Utils/RedisFacadeBuilder.cs new file mode 100644 index 000000000..68ace900c --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/Utils/RedisFacadeBuilder.cs @@ -0,0 +1,26 @@ +using System; +using Halibut.Logging; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Tests.Support.Logging; +using Halibut.Tests.TestSetup.Redis; +using Octopus.TestPortForwarder; + +namespace Halibut.Tests.Queue.Redis.Utils +{ + public class RedisFacadeBuilder + { + public static RedisFacade CreateRedisFacade(string? host = null, int? port = 0, Guid? prefix = null) + { + port = port == 0 ? RedisTestHost.Port() : port; + return new RedisFacade((host??RedisTestHost.RedisHost) + ":" + port, (prefix ?? Guid.NewGuid()).ToString(), new TestContextLogCreator("Redis", LogLevel.Trace).CreateNewForPrefix("")); + } + + public static RedisFacade CreateRedisFacade(PortForwarder portForwarder, Guid? 
prefix = null) + { + return CreateRedisFacade(host: portForwarder.PublicEndpoint.Host, + port: portForwarder.ListeningPort, + prefix: prefix); + } + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/Utils/RedisNeverLosesData.cs b/source/Halibut.Tests/Queue/Redis/Utils/RedisNeverLosesData.cs new file mode 100644 index 000000000..d8e16eb68 --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/Utils/RedisNeverLosesData.cs @@ -0,0 +1,28 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.RedisDataLossDetection; + +namespace Halibut.Tests.Queue.Redis.Utils +{ + /// + /// Test implementation of IWatchForRedisLosingAllItsData that returns CancellationToken.None + /// to indicate no data loss detection is active during testing. + /// + public class RedisNeverLosesData : IWatchForRedisLosingAllItsData + { + public Task GetTokenForDataLossDetection(TimeSpan timeToWait, CancellationToken cancellationToken) + { + return Task.FromResult(CancellationToken.None); + } + + public ValueTask DisposeAsync() + { + return ValueTask.CompletedTask; + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Queue/Redis/Utils/TestRedisPendingRequestQueueFactory.cs b/source/Halibut.Tests/Queue/Redis/Utils/TestRedisPendingRequestQueueFactory.cs new file mode 100644 index 000000000..ee7f0e56e --- /dev/null +++ b/source/Halibut.Tests/Queue/Redis/Utils/TestRedisPendingRequestQueueFactory.cs @@ -0,0 +1,34 @@ + +#if NET8_0_OR_GREATER +using System; +using Halibut.Queue.Redis; +using Halibut.ServiceModel; + +namespace Halibut.Tests.Queue.Redis.Utils +{ + public class TestRedisPendingRequestQueueFactory : IPendingRequestQueueFactory + { + RedisPendingRequestQueueFactory redisPendingRequestQueueFactory; + + public TestRedisPendingRequestQueueFactory(RedisPendingRequestQueueFactory redisPendingRequestQueueFactory) + { + 
this.redisPendingRequestQueueFactory = redisPendingRequestQueueFactory; + } + + public IPendingRequestQueue CreateQueue(Uri endpoint) + { + var queue = (RedisPendingRequestQueue) redisPendingRequestQueueFactory.CreateQueue(endpoint); + queue.WaitUntilQueueIsSubscribedToReceiveMessages().GetAwaiter().GetResult(); + return queue; + } + } + + public static class RedisPendingRequestQueueFactoryExtensionMethods + { + public static IPendingRequestQueueFactory WithWaitForReceiverToBeReady(this RedisPendingRequestQueueFactory redisPendingRequestQueueFactory) + { + return new TestRedisPendingRequestQueueFactory(redisPendingRequestQueueFactory); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut.Tests/Support/EnvironmentVariableReaderHelper.cs b/source/Halibut.Tests/Support/EnvironmentVariableReaderHelper.cs index c48051ada..7d286a4c6 100644 --- a/source/Halibut.Tests/Support/EnvironmentVariableReaderHelper.cs +++ b/source/Halibut.Tests/Support/EnvironmentVariableReaderHelper.cs @@ -15,5 +15,21 @@ public static bool EnvironmentVariableAsBool(string envVar, bool defaultValue) return value!.Equals("true"); } + + public static int? 
TryReadIntFromEnvironmentVariable(string envVar) + { + var value = Environment.GetEnvironmentVariable(envVar); + if (string.IsNullOrWhiteSpace(value)) + { + return null; + } + + if (int.TryParse(value, out var result)) + { + return result; + } + + return null; + } } } \ No newline at end of file diff --git a/source/Halibut.Tests/Support/LatestClientAndLatestServiceBuilder.cs b/source/Halibut.Tests/Support/LatestClientAndLatestServiceBuilder.cs index 2d702079a..6fb4af75e 100644 --- a/source/Halibut.Tests/Support/LatestClientAndLatestServiceBuilder.cs +++ b/source/Halibut.Tests/Support/LatestClientAndLatestServiceBuilder.cs @@ -4,6 +4,7 @@ using System.Threading.Tasks; using Halibut.Diagnostics; using Halibut.Logging; +using Halibut.Queue; using Halibut.ServiceModel; using Halibut.TestProxy; using Halibut.Tests.TestServices; @@ -221,6 +222,11 @@ public LatestClientAndLatestServiceBuilder WithProxy(out Reference pendingRequestQueueFactory) + { + return WithPendingRequestQueueFactory((_, logFactory) => pendingRequestQueueFactory(logFactory)); + } + + public LatestClientAndLatestServiceBuilder WithPendingRequestQueueFactory(Func pendingRequestQueueFactory) { clientBuilder.WithPendingRequestQueueFactory(pendingRequestQueueFactory); return this; diff --git a/source/Halibut.Tests/Support/LatestClientBuilder.cs b/source/Halibut.Tests/Support/LatestClientBuilder.cs index 3352eaeab..38af89104 100644 --- a/source/Halibut.Tests/Support/LatestClientBuilder.cs +++ b/source/Halibut.Tests/Support/LatestClientBuilder.cs @@ -5,6 +5,7 @@ using Halibut.Diagnostics; using Halibut.Diagnostics.LogCreators; using Halibut.Logging; +using Halibut.Queue; using Halibut.ServiceModel; using Halibut.Tests.Support.Logging; using Halibut.Transport.Observability; @@ -24,7 +25,7 @@ public class LatestClientBuilder : IClientBuilder IRpcObserver? clientRpcObserver; Func? portForwarderFactory; Reference? portForwarderReference; - Func? pendingRequestQueueFactory; + Func? 
pendingRequestQueueFactory; Action? pendingRequestQueueFactoryBuilder; ProxyDetails? proxyDetails; LogLevel halibutLogLevel = LogLevel.Trace; @@ -115,7 +116,7 @@ public LatestClientBuilder WithPortForwarding(out Reference portF return this; } - public LatestClientBuilder WithPendingRequestQueueFactory(Func pendingRequestQueueFactory) + public LatestClientBuilder WithPendingRequestQueueFactory(Func pendingRequestQueueFactory) { this.pendingRequestQueueFactory = pendingRequestQueueFactory; return this; @@ -184,12 +185,11 @@ public async Task Build(CancellationToken cancellationToken) { var octopusLogFactory = BuildClientLogger(); - var factory = CreatePendingRequestQueueFactory(octopusLogFactory); var clientBuilder = new HalibutRuntimeBuilder() .WithServerCertificate(clientCertAndThumbprint.Certificate2) .WithLogFactory(octopusLogFactory) - .WithPendingRequestQueueFactory(factory) + .WithPendingRequestQueueFactory(serializer => CreatePendingRequestQueueFactory(serializer, octopusLogFactory)) .WithTrustProvider(clientTrustProvider!) 
.WithStreamFactoryIfNotNull(clientStreamFactory) .WithControlMessageObserverIfNotNull(controlMessageObserver) @@ -248,11 +248,11 @@ public async Task Build(CancellationToken cancellationToken) return new LatestClient(client, clientListeningUri, clientTrustsThumbprint, portForwarder, proxyDetails, serviceConnectionType, disposableCollection); } - IPendingRequestQueueFactory CreatePendingRequestQueueFactory(ILogFactory octopusLogFactory) + IPendingRequestQueueFactory CreatePendingRequestQueueFactory(QueueMessageSerializer queueMessageSerializer, ILogFactory octopusLogFactory) { if (pendingRequestQueueFactory != null) { - return pendingRequestQueueFactory(octopusLogFactory); + return pendingRequestQueueFactory(queueMessageSerializer, octopusLogFactory); } var pendingRequestQueueFactoryBuilder = new PendingRequestQueueFactoryBuilder(octopusLogFactory, halibutTimeoutsAndLimits); diff --git a/source/Halibut.Tests/Support/PortForwardingToRedisBuilder.cs b/source/Halibut.Tests/Support/PortForwardingToRedisBuilder.cs new file mode 100644 index 000000000..426835554 --- /dev/null +++ b/source/Halibut.Tests/Support/PortForwardingToRedisBuilder.cs @@ -0,0 +1,15 @@ +using System; +using Halibut.Tests.TestSetup.Redis; +using Octopus.TestPortForwarder; +using Serilog; + +namespace Halibut.Tests.Support +{ + public static class PortForwardingToRedisBuilder + { + public static PortForwarder ForwardingToRedis(ILogger logger) + { + return new PortForwarderBuilder(new Uri("http://" + RedisTestHost.RedisHost + ":" + RedisTestHost.Port()), logger).Build(); + } + } +} diff --git a/source/Halibut.Tests/Support/ShouldEventually.cs b/source/Halibut.Tests/Support/ShouldEventually.cs new file mode 100644 index 000000000..94e0f1bc5 --- /dev/null +++ b/source/Halibut.Tests/Support/ShouldEventually.cs @@ -0,0 +1,113 @@ +using System; +using System.Diagnostics; +using System.Threading; +using System.Threading.Tasks; + +namespace Halibut.Tests.Support +{ + public static class ShouldEventually + { + 
/// <summary> + /// Keeps executing the given task until it completes without throwing an exception or the timeout is reached. + /// </summary> + /// <param name="task">The task to execute repeatedly until it succeeds</param> + /// <param name="timeout">The maximum time to keep retrying</param> + /// <param name="cancellationToken">Optional cancellation token</param> + /// <returns>A task that completes when the given task succeeds or throws when timeout is reached</returns> + public static async Task Eventually(Func<Task> task, TimeSpan timeout, CancellationToken cancellationToken = default) + { + using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken); + cts.CancelAfter(timeout); + + var stopwatch = Stopwatch.StartNew(); + Exception? lastException = null; + + while (!cts.Token.IsCancellationRequested) + { + try + { + await task(); + return; // Success! + } + catch (Exception ex) + { + lastException = ex; + + // Short delay between retries + try + { + await Task.Delay(TimeSpan.FromMilliseconds(20), cts.Token); + } + catch (OperationCanceledException) + { + // Timeout reached + break; + } + } + } + + // If we get here, we've timed out + var timeoutMessage = $"Task did not complete successfully within {timeout.TotalSeconds:F1} seconds (elapsed: {stopwatch.Elapsed.TotalSeconds:F1}s)"; + if (lastException != null) + { + throw new TimeoutException($"{timeoutMessage}. Last exception: {lastException.Message}", lastException); + } + throw new TimeoutException(timeoutMessage); + } + + /// <summary> + /// Keeps executing the given action until it completes without throwing an exception or the timeout is reached. 
+ /// </summary> + /// <param name="action">The action to execute repeatedly until it succeeds</param> + /// <param name="timeout">The maximum time to keep retrying</param> + /// <param name="cancellationToken">Optional cancellation token</param> + public static async Task Eventually(Action action, TimeSpan timeout, CancellationToken cancellationToken = default) + { + await Eventually(() => + { + action(); + return Task.CompletedTask; + }, timeout, cancellationToken); + } + + /// <summary> + /// Keeps executing the given function until it returns a result without throwing an exception or the timeout is reached. + /// </summary> + /// <typeparam name="T">The return type of the function</typeparam> + /// <param name="function">The function to execute repeatedly until it succeeds</param> + /// <param name="timeout">The maximum time to keep retrying</param> + /// <param name="cancellationToken">Optional cancellation token</param> + /// <returns>The result of the function when it succeeds</returns> + public static async Task<T> Eventually<T>(Func<Task<T>> function, TimeSpan timeout, CancellationToken cancellationToken = default) + { + T result = default(T)!; + + await Eventually(async () => + { + result = await function(); + }, timeout, cancellationToken); + + return result; + } + + /// <summary> + /// Keeps executing the given function until it returns a result without throwing an exception or the timeout is reached. 
+ /// </summary> + /// <typeparam name="T">The return type of the function</typeparam> + /// <param name="function">The function to execute repeatedly until it succeeds</param> + /// <param name="timeout">The maximum time to keep retrying</param> + /// <param name="cancellationToken">Optional cancellation token</param> + /// <returns>The result of the function when it succeeds</returns> + public static async Task<T> Eventually<T>(Func<T> function, TimeSpan timeout, CancellationToken cancellationToken = default) + { + T result = default(T)!; + + await Eventually(() => + { + result = function(); + }, timeout, cancellationToken); + + return result; + } + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Support/TestAttributes/AllQueuesTestCasesAttribute.cs b/source/Halibut.Tests/Support/TestAttributes/AllQueuesTestCasesAttribute.cs index a826d33d0..924e2c2ef 100644 --- a/source/Halibut.Tests/Support/TestAttributes/AllQueuesTestCasesAttribute.cs +++ b/source/Halibut.Tests/Support/TestAttributes/AllQueuesTestCasesAttribute.cs @@ -6,6 +6,7 @@ using Halibut.Tests.Builders; using Halibut.Tests.Support.BackwardsCompatibility; using Halibut.Tests.Support.TestCases; +using Halibut.Tests.TestSetup.Redis; using NUnit.Framework; namespace Halibut.Tests.Support.TestAttributes @@ -25,7 +26,14 @@ static class PendingRequestQueueFactories public static IEnumerable<PendingRequestQueueTestCase> GetEnumerator() { var factories = new List<PendingRequestQueueTestCase>(); - factories.Add(new PendingRequestQueueTestCase("InMemory", () => new PendingRequestQueueBuilder())); +#if NET8_0_OR_GREATER + if (EnsureRedisIsAvailableSetupFixture.WillRunRedisTests) + { + + factories.Add(new PendingRequestQueueTestCase(PendingRequestQueueTestCase.RedisTestCaseName, () => new RedisPendingRequestQueueBuilder())); + } +#endif + factories.Add(new PendingRequestQueueTestCase(PendingRequestQueueTestCase.InMemoryTestCaseName, () => new PendingRequestQueueBuilder())); return factories; } @@ 
-34,6 +42,11 @@ public static IEnumerable<PendingRequestQueueTestCase> GetEnumerator() public class PendingRequestQueueTestCase { + + public static string RedisTestCaseName = "Redis"; + + public static string InMemoryTestCaseName = "InMemory"; + public 
readonly string Name; private Func BuilderBuilder { get; } diff --git a/source/Halibut.Tests/Support/TestAttributes/RedisTestAttribute.cs b/source/Halibut.Tests/Support/TestAttributes/RedisTestAttribute.cs new file mode 100644 index 000000000..66bc39355 --- /dev/null +++ b/source/Halibut.Tests/Support/TestAttributes/RedisTestAttribute.cs @@ -0,0 +1,23 @@ +using System; +using Halibut.Tests.TestSetup.Redis; +using NUnit.Framework; +using NUnit.Framework.Interfaces; +using NUnit.Framework.Internal; + +[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)] +public class RedisTestAttribute : NUnitAttribute, IApplyToTest +{ + public void ApplyToTest(Test test) + { + if (test.RunState == RunState.NotRunnable || test.RunState == RunState.Ignored) + { + return; + } + + if (!EnsureRedisIsAvailableSetupFixture.WillRunRedisTests) + { + test.RunState = RunState.Skipped; + test.Properties.Add("_SKIPREASON", "Redis tests are not yet supported on this OS or dotnet version."); + } + } +} diff --git a/source/Halibut.Tests/Support/Try.cs b/source/Halibut.Tests/Support/Try.cs index 09c35ceb0..dfc9295d2 100644 --- a/source/Halibut.Tests/Support/Try.cs +++ b/source/Halibut.Tests/Support/Try.cs @@ -6,6 +6,19 @@ namespace Halibut.Tests.Support { public static class Try { + public static Exception? 
CatchingError(Action tryThisAction) + { + try + { + tryThisAction(); + } + catch (Exception e) + { + return e; + } + + return null; + } public static void CatchingError(Action tryThisAction, Action onFailure) { try diff --git a/source/Halibut.Tests/TestSetup/Redis/CreateRedisDockerContainerForTests.cs b/source/Halibut.Tests/TestSetup/Redis/CreateRedisDockerContainerForTests.cs new file mode 100644 index 000000000..cdd819553 --- /dev/null +++ b/source/Halibut.Tests/TestSetup/Redis/CreateRedisDockerContainerForTests.cs @@ -0,0 +1,43 @@ +using System; +using System.Threading.Tasks; +using Halibut.Tests.Queue.Redis.Utils; +using Serilog; + +namespace Halibut.Tests.TestSetup +{ + public class CreateRedisDockerContainerForTests : IAsyncDisposable + { + readonly ILogger logger; + public RedisContainer? container = null; + + public CreateRedisDockerContainerForTests(ILogger logger) + { + this.logger = logger; + } + + public async Task InitializeAsync() + { + logger.Information("Creating Redis container"); + container = new RedisContainerBuilder().Build(); + + logger.Information("Starting Redis container"); + await container.StartAsync(); + logger.Information("Redis container started successfully with connection string: {ConnectionString}", container.ConnectionString); + + } + public async ValueTask DisposeAsync() + { + if (container != null) + { + try + { + await container.DisposeAsync(); + } + catch (Exception e) + { + logger.Error(e, "Error while disposing Redis container"); + } + } + } + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/TestSetup/Redis/EnsureRedisIsAvailableSetupFixture.cs b/source/Halibut.Tests/TestSetup/Redis/EnsureRedisIsAvailableSetupFixture.cs new file mode 100644 index 000000000..0b82d2b21 --- /dev/null +++ b/source/Halibut.Tests/TestSetup/Redis/EnsureRedisIsAvailableSetupFixture.cs @@ -0,0 +1,62 @@ +using System; +using System.Runtime.InteropServices; +using Halibut.Tests.Support; +using Serilog; +using StackExchange.Redis; + 
+namespace Halibut.Tests.TestSetup.Redis +{ + public class EnsureRedisIsAvailableSetupFixture : ISetupFixture + { + public static bool WillRunRedisTests => +#if NETFRAMEWORK + false; +#else + !RuntimeInformation.IsOSPlatform(OSPlatform.Windows) + || !TeamCityDetection.IsRunningInTeamCity(); +#endif + + static readonly int RedisPortToTry = EnvironmentVariableReaderHelper.TryReadIntFromEnvironmentVariable("HALIBUT_REDIS_PORT") ?? 6379; + static readonly string RedisHost = Environment.GetEnvironmentVariable("HALIBUT_REDIS_HOST") ?? "localhost"; + CreateRedisDockerContainerForTests? redisContainer = null; + public void OneTimeSetUp(ILogger logger) + { + if (!WillRunRedisTests) return; + + var isWindows = RuntimeInformation.IsOSPlatform(OSPlatform.Windows); + bool shouldCreateRedis = WillRunRedisTests; + + if (!TeamCityDetection.IsRunningInTeamCity()) + { + // Does the user already have redis running on the normal port? + try + { + using var multiplexer = ConnectionMultiplexer.Connect(RedisHost + ":" + RedisPortToTry); + var ts = multiplexer.GetDatabase().Ping(); + RedisTestHost.SetPort(RedisPortToTry); + RedisTestHost.RedisHost = RedisHost; + logger.Information("Able to connect to redis using {Host}:{Port}", RedisHost, RedisPortToTry); + return; + } + catch + { + shouldCreateRedis = true; + } + } + + if (shouldCreateRedis) + { + redisContainer = new CreateRedisDockerContainerForTests(logger); + redisContainer.InitializeAsync().GetAwaiter().GetResult(); + RedisTestHost.SetPort(redisContainer.container!.RedisPort); + logger.Information("RedisPort is: {RedisPort}", RedisTestHost.Port()); + } + } + + public void OneTimeTearDown(ILogger logger) + { + + if(redisContainer != null) redisContainer.DisposeAsync().GetAwaiter().GetResult(); + } + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/TestSetup/Redis/RedisTestHost.cs b/source/Halibut.Tests/TestSetup/Redis/RedisTestHost.cs new file mode 100644 index 000000000..4e117dd9f --- /dev/null +++ 
b/source/Halibut.Tests/TestSetup/Redis/RedisTestHost.cs @@ -0,0 +1,25 @@ +using System; + +namespace Halibut.Tests.TestSetup.Redis +{ + public static class RedisTestHost + { + static int redisPort = 0; + public static void SetPort(int value) + { + redisPort = value; + } + + public static int Port() + { + if (redisPort == 0) + { + throw new Exception("Redis is unavailable"); + } + + return redisPort; + } + + public static string RedisHost { get; set; } = "localhost"; + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/TestsSetupClass.cs b/source/Halibut.Tests/TestsSetupClass.cs index bf834c5f4..1e1496816 100644 --- a/source/Halibut.Tests/TestsSetupClass.cs +++ b/source/Halibut.Tests/TestsSetupClass.cs @@ -2,6 +2,7 @@ using System.Text; using Halibut.Tests.Support; using Halibut.Tests.TestSetup; +using Halibut.Tests.TestSetup.Redis; using NUnit.Framework; namespace Halibut.Tests @@ -13,15 +14,15 @@ namespace Halibut.Tests [SetUpFixture] public class TestsSetupClass { - ISetupFixture[] setupFixtures = new ISetupFixture[] + private ISetupFixture[] setupFixtures = new ISetupFixture[] { + new EnsureRedisIsAvailableSetupFixture(), new BumpThreadPoolForAllTests() }; [OneTimeSetUp] public void OneTimeSetup() { - var sb = new StringBuilder(); var traceLogFileLogger = new TraceLogFileLogger("TestsSetupClass"); var logger = new SerilogLoggerBuilder() diff --git a/source/Halibut.Tests/Util/AssertThrowsAny.cs b/source/Halibut.Tests/Util/AssertThrowsAny.cs new file mode 100644 index 000000000..52a915e40 --- /dev/null +++ b/source/Halibut.Tests/Util/AssertThrowsAny.cs @@ -0,0 +1,24 @@ +using System; +using System.Threading.Tasks; +using NUnit.Framework; + +namespace Halibut.Tests.Util +{ + public static class AssertThrowsAny + { + public static async Task<Exception> Exception(Func<Task> action) + { + try + { + await action(); + } + catch (Exception exception) + { + return exception; + } + + // Throw outside the try block so the catch above cannot swallow the 
failure. + throw new AssertionException("Should have thrown an exception."); + } 
+ } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Util/CancelOnDisposeCancellationTokenFixture.cs b/source/Halibut.Tests/Util/CancelOnDisposeCancellationTokenFixture.cs new file mode 100644 index 000000000..5b719c197 --- /dev/null +++ b/source/Halibut.Tests/Util/CancelOnDisposeCancellationTokenFixture.cs @@ -0,0 +1,237 @@ +using System; +using System.Threading; +using System.Threading.Tasks; +using FluentAssertions; +using Halibut.Util; +using Nito.AsyncEx; +using NUnit.Framework; + +namespace Halibut.Tests.Util +{ + public class CancelOnDisposeCancellationTokenFixture : BaseTest + { + [Test] + public async Task Constructor_NoParameters_ShouldCreateValidToken() + { + // Arrange & Act + await using var cancellationToken = new CancelOnDisposeCancellationToken(); + + // Assert + cancellationToken.Token.Should().NotBeNull(); + cancellationToken.Token.IsCancellationRequested.Should().BeFalse(); + } + + [Test] + public async Task Constructor_WithSingleToken_ShouldCreateLinkedToken() + { + // Arrange + using var parentCts = new CancellationTokenSource(); + var parentToken = parentCts.Token; + + // Act + await using var cancellationToken = new CancelOnDisposeCancellationToken(parentToken); + + // Assert + cancellationToken.Token.Should().NotBeNull(); + cancellationToken.Token.IsCancellationRequested.Should().BeFalse(); + + // When parent is cancelled, child should also be cancelled + await parentCts.CancelAsync(); + cancellationToken.Token.IsCancellationRequested.Should().BeTrue(); + } + + [Test] + public async Task Constructor_WithMultipleTokens_ShouldCreateLinkedToken() + { + // Arrange + using var parentCts1 = new CancellationTokenSource(); + using var parentCts2 = new CancellationTokenSource(); + var parentToken1 = parentCts1.Token; + var parentToken2 = parentCts2.Token; + + // Act + await using var cancellationToken = new CancelOnDisposeCancellationToken(parentToken1, parentToken2); + + // Assert + cancellationToken.Token.Should().NotBeNull(); + 
cancellationToken.Token.IsCancellationRequested.Should().BeFalse(); + + // When any parent is cancelled, child should also be cancelled + await parentCts1.CancelAsync(); + cancellationToken.Token.IsCancellationRequested.Should().BeTrue(); + } + + [Test] + public async Task Token_PropertyAccess_ShouldNotThrowAfterDisposal() + { + // Arrange + var cancellationToken = new CancelOnDisposeCancellationToken(); + var token = cancellationToken.Token; + + // Act + await cancellationToken.DisposeAsync(); + + // Assert - accessing Token property should not throw + var tokenAfterDispose = cancellationToken.Token; + tokenAfterDispose.Should().Be(token); // Should be the same token instance + } + + [Test] + public async Task CancelAsync_ShouldCancelToken() + { + // Arrange + await using var cancellationToken = new CancelOnDisposeCancellationToken(); + + // Act + await cancellationToken.CancelAsync(); + + // Assert + cancellationToken.Token.IsCancellationRequested.Should().BeTrue(); + } + + [Test] + public async Task CancelAfter_ShouldCancelTokenAfterTimeout() + { + // Arrange + await using var cancellationToken = new CancelOnDisposeCancellationToken(); + + // Act + cancellationToken.CancelAfter(TimeSpan.FromMilliseconds(200)); + + // Assert - token should not be cancelled immediately + cancellationToken.Token.IsCancellationRequested.Should().BeFalse(); + + // Wait for timeout + Thread.Sleep(500); + cancellationToken.Token.IsCancellationRequested.Should().BeTrue(); + } + + [Test] + public async Task AwaitTasksBeforeCTSDispose_ShouldWaitForTasksOnDispose() + { + // Arrange + var cancellationToken = new CancelOnDisposeCancellationToken(); + + var manualResetEvent = new AsyncManualResetEvent(); + // Act + cancellationToken.AwaitTasksBeforeCTSDispose(manualResetEvent.WaitAsync(CancellationToken)); + + // Start disposal (don't await yet) + var disposeTask = cancellationToken.DisposeAsync(); + + disposeTask.IsCompleted.Should().BeFalse(); + + manualResetEvent.Set(); + await 
Task.WhenAny(Task.Delay(TimeSpan.FromSeconds(1)), Task.Run(async () => await disposeTask)); + + // Assert + disposeTask.IsCompleted.Should().BeTrue(); + } + + [Test] + public async Task AwaitTasksBeforeCTSDispose_ShouldHandleTaskExceptions() + { + // Arrange + var cancellationToken = new CancelOnDisposeCancellationToken(); + + var faultyTask = Task.Run(async () => + { + await Task.CompletedTask; + throw new InvalidOperationException("Test exception"); + }); + + // Act + cancellationToken.AwaitTasksBeforeCTSDispose(faultyTask); + + // Assert - dispose should not throw even though the task throws + await cancellationToken.DisposeAsync(); + + // Task should be faulted + faultyTask.IsFaulted.Should().BeTrue(); + } + + [Test] + public async Task DisposeAsync_ShouldCancelTokenBeforeDispose() + { + // Arrange + var cancellationToken = new CancelOnDisposeCancellationToken(); + var token = cancellationToken.Token; + + // Act + await cancellationToken.DisposeAsync(); + + // Assert + token.IsCancellationRequested.Should().BeTrue(); + } + + [Test] + public async Task DisposeAsync_CalledMultipleTimes_ShouldNotThrow() + { + // Arrange + var cancellationToken = new CancelOnDisposeCancellationToken(); + + // Act & Assert - multiple dispose calls should not throw + await cancellationToken.DisposeAsync(); + await cancellationToken.DisposeAsync(); + await cancellationToken.DisposeAsync(); + } + + [Test] + public async Task DisposeAsync_WithTasksUsingToken_ShouldWaitForCancellation() + { + // Arrange + var taskCancelled = false; + var cancellationToken = new CancelOnDisposeCancellationToken(); + + var taskUsingToken = Task.Run(async () => + { + try + { + await Task.Delay(1000, cancellationToken.Token); + } + catch (OperationCanceledException) + { + taskCancelled = true; + } + }); + + cancellationToken.AwaitTasksBeforeCTSDispose(taskUsingToken); + + // Act + await cancellationToken.DisposeAsync(); + + // Assert + taskCancelled.Should().BeTrue(); + } + + [Test] + public async Task 
DisposeAsync_WithMultipleTasks_ShouldWaitForAllTasks() + { + // Arrange + var task1Completed = false; + var task2Completed = false; + var cancellationToken = new CancelOnDisposeCancellationToken(); + + var task1 = Task.Run(async () => + { + await Task.CompletedTask; + task1Completed = true; + }); + + var task2 = Task.Run(async () => + { + await Task.CompletedTask; + task2Completed = true; + }); + + cancellationToken.AwaitTasksBeforeCTSDispose(task1, task2); + + // Act + await cancellationToken.DisposeAsync(); + + // Assert + task1Completed.Should().BeTrue(); + task2Completed.Should().BeTrue(); + } + } +} diff --git a/source/Halibut.Tests/Util/CancellationTokenSourceExtensionMethods.cs b/source/Halibut.Tests/Util/CancellationTokenSourceExtensionMethods.cs new file mode 100644 index 000000000..27140b547 --- /dev/null +++ b/source/Halibut.Tests/Util/CancellationTokenSourceExtensionMethods.cs @@ -0,0 +1,30 @@ +// Copyright 2012-2013 Octopus Deploy Pty. Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +using System.Threading; +using System.Threading.Tasks; + +namespace Halibut.Tests.Util +{ + public static class CancellationTokenSourceExtensionMethods + { +#if NETFRAMEWORK + public static async Task CancelAsync(this CancellationTokenSource cts) + { + await Task.CompletedTask; + cts.Cancel(); + } +#endif + } +} \ No newline at end of file diff --git a/source/Halibut.Tests/Util/LinearBackoffStrategyFixture.cs b/source/Halibut.Tests/Util/LinearBackoffStrategyFixture.cs new file mode 100644 index 000000000..1a268259c --- /dev/null +++ b/source/Halibut.Tests/Util/LinearBackoffStrategyFixture.cs @@ -0,0 +1,191 @@ +using System; +using Halibut.Util; +using NUnit.Framework; + +namespace Halibut.Tests.Util +{ + public class LinearBackoffStrategyFixture + { + [Test] + [TestCase(1, 1, 30, 1, ExpectedResult = 1)] + [TestCase(1, 1, 30, 2, ExpectedResult = 2)] + [TestCase(1, 1, 30, 3, ExpectedResult = 3)] + [TestCase(1, 1, 30, 10, ExpectedResult = 10)] + [TestCase(1, 1, 30, 30, ExpectedResult = 30)] + [TestCase(1, 1, 30, 31, ExpectedResult = 30)] // Should cap at maximum + [TestCase(1, 1, 30, 50, ExpectedResult = 30)] // Should cap at maximum + [TestCase(2, 3, 20, 1, ExpectedResult = 2)] // initialDelay=2, increment=3 + [TestCase(2, 3, 20, 2, ExpectedResult = 5)] // 2 + (2-1)*3 = 5 + [TestCase(2, 3, 20, 3, ExpectedResult = 8)] // 2 + (3-1)*3 = 8 + [TestCase(2, 3, 20, 7, ExpectedResult = 20)] // 2 + (7-1)*3 = 20 (at max) + [TestCase(2, 3, 20, 8, ExpectedResult = 20)] // Should cap at maximum + [TestCase(0, 2, 10, 1, ExpectedResult = 0)] // Zero initial delay + [TestCase(0, 2, 10, 2, ExpectedResult = 2)] // 0 + (2-1)*2 = 2 + [TestCase(0, 2, 10, 6, ExpectedResult = 10)] // 0 + (6-1)*2 = 10 (at max) + public int CalculateDelayForAttemptShouldBeCorrect(int initialDelaySeconds, int incrementSeconds, int maxDelaySeconds, int attemptNumber) + { + var strategy = new LinearBackoffStrategy( + TimeSpan.FromSeconds(initialDelaySeconds), + TimeSpan.FromSeconds(incrementSeconds), + 
TimeSpan.FromSeconds(maxDelaySeconds) + ); + + var delay = strategy.CalculateDelayForAttempt(attemptNumber); + + return (int)delay.TotalSeconds; + } + + [Test] + public void GetSleepPeriodShouldReturnZeroWhenNoAttemptsMade() + { + var strategy = LinearBackoffStrategy.Create(); + + var delay = strategy.GetSleepPeriod(); + + Assert.That(delay, Is.EqualTo(TimeSpan.Zero)); + } + + [Test] + public void GetSleepPeriodShouldIncreaseLinearlyWithAttempts() + { + var strategy = new LinearBackoffStrategy( + TimeSpan.FromSeconds(2), + TimeSpan.FromSeconds(3), + TimeSpan.FromSeconds(20) + ); + + // First attempt + strategy.Try(); + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(2))); + Assert.That(strategy.AttemptCount, Is.EqualTo(1)); + + // Second attempt + strategy.Try(); + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(5))); // 2 + (2-1)*3 = 5 + Assert.That(strategy.AttemptCount, Is.EqualTo(2)); + + // Third attempt + strategy.Try(); + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(8))); // 2 + (3-1)*3 = 8 + Assert.That(strategy.AttemptCount, Is.EqualTo(3)); + } + + [Test] + public void GetSleepPeriodShouldCapAtMaximumDelay() + { + var strategy = new LinearBackoffStrategy( + TimeSpan.FromSeconds(5), + TimeSpan.FromSeconds(10), + TimeSpan.FromSeconds(20) + ); + + strategy.Try(); // Attempt 1: 5 seconds + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(5))); + + strategy.Try(); // Attempt 2: 15 seconds + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(15))); + + strategy.Try(); // Attempt 3: would be 25, but capped at 20 + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(20))); + + strategy.Try(); // Attempt 4: still capped at 20 + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(20))); + } + + [Test] + public void SuccessShouldResetAttemptCount() + { + var strategy = LinearBackoffStrategy.Create(); + + 
strategy.Try(); + strategy.Try(); + strategy.Try(); + Assert.That(strategy.AttemptCount, Is.EqualTo(3)); + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(3))); + + strategy.Success(); + Assert.That(strategy.AttemptCount, Is.EqualTo(0)); + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.Zero)); + + // After reset, should start from the beginning again + strategy.Try(); + Assert.That(strategy.AttemptCount, Is.EqualTo(1)); + Assert.That(strategy.GetSleepPeriod(), Is.EqualTo(TimeSpan.FromSeconds(1))); + } + + [Test] + public void CreateShouldReturnStrategyWithDefaultValues() + { + var strategy = LinearBackoffStrategy.Create(); + + Assert.That(strategy.InitialDelay, Is.EqualTo(TimeSpan.FromSeconds(1))); + Assert.That(strategy.Increment, Is.EqualTo(TimeSpan.FromSeconds(1))); + Assert.That(strategy.MaximumDelay, Is.EqualTo(TimeSpan.FromSeconds(30))); + Assert.That(strategy.AttemptCount, Is.EqualTo(0)); + } + + [Test] + [TestCase(-1)] // Negative initial delay + public void InvalidInitialDelayShouldThrow(int initialDelaySeconds) + { + Assert.Throws(() => + new LinearBackoffStrategy( + TimeSpan.FromSeconds(initialDelaySeconds), + TimeSpan.FromSeconds(1), + TimeSpan.FromSeconds(10) + ) + ); + } + + [Test] + [TestCase(0)] // Zero increment + [TestCase(-1)] // Negative increment + public void InvalidIncrementShouldThrow(int incrementSeconds) + { + Assert.Throws(() => + new LinearBackoffStrategy( + TimeSpan.FromSeconds(1), + TimeSpan.FromSeconds(incrementSeconds), + TimeSpan.FromSeconds(10) + ) + ); + } + + [Test] + public void MaximumDelayLessThanInitialDelayShouldThrow() + { + Assert.Throws(() => + new LinearBackoffStrategy( + TimeSpan.FromSeconds(10), // Initial delay + TimeSpan.FromSeconds(1), // Increment + TimeSpan.FromSeconds(5) // Maximum delay (less than initial) + ) + ); + } + + [Test] + [TestCase(0, ExpectedResult = 0)] + [TestCase(-1, ExpectedResult = 0)] + [TestCase(-10, ExpectedResult = 0)] + public int 
CalculateDelayForAttemptShouldReturnZeroForInvalidAttemptNumbers(int attemptNumber) + { + var strategy = LinearBackoffStrategy.Create(); + var delay = strategy.CalculateDelayForAttempt(attemptNumber); + return (int)delay.TotalSeconds; + } + + [Test] + public void PropertiesShouldReturnConstructorValues() + { + var initialDelay = TimeSpan.FromSeconds(3); + var increment = TimeSpan.FromSeconds(2); + var maximumDelay = TimeSpan.FromSeconds(15); + + var strategy = new LinearBackoffStrategy(initialDelay, increment, maximumDelay); + + Assert.That(strategy.InitialDelay, Is.EqualTo(initialDelay)); + Assert.That(strategy.Increment, Is.EqualTo(increment)); + Assert.That(strategy.MaximumDelay, Is.EqualTo(maximumDelay)); + } + } +} \ No newline at end of file diff --git a/source/Halibut/DataStream.cs b/source/Halibut/DataStream.cs index 414457fc0..51906c597 100644 --- a/source/Halibut/DataStream.cs +++ b/source/Halibut/DataStream.cs @@ -10,7 +10,7 @@ namespace Halibut { public class DataStream : IEquatable<DataStream>, IDataStreamInternal { - readonly Func<Stream, CancellationToken, Task> writerAsync; + protected Func<Stream, CancellationToken, Task> writerAsync; IDataStreamReceiver? receiver; [JsonConstructor] @@ -179,9 +179,24 @@ async Task IDataStreamInternal.TransmitAsync(Stream stream, CancellationToken ca await writerAsync(stream, cancellationToken); } + public async Task WriteData(Stream stream, CancellationToken cancellationToken) + { + await writerAsync(stream, cancellationToken); + } + void IDataStreamInternal.Received(IDataStreamReceiver attachedReceiver) { receiver = attachedReceiver; } + + /// <summary> + /// Used to re-hydrate deserialised data streams, which won't have a writer set. 
+ /// </summary> + /// <param name="writerAsync"></param> + public void SetWriterAsync(Func<Stream, CancellationToken, Task> writerAsync) + { + if(this.writerAsync != null) throw new InvalidOperationException("Cannot set writer more than once."); + this.writerAsync = writerAsync; + } } } \ No newline at end of file diff --git a/source/Halibut/Diagnostics/ExceptionReturnedByHalibutProxyExtensionMethod.cs b/source/Halibut/Diagnostics/ExceptionReturnedByHalibutProxyExtensionMethod.cs index 1406a5bac..011eefa37 100644 --- a/source/Halibut/Diagnostics/ExceptionReturnedByHalibutProxyExtensionMethod.cs +++ b/source/Halibut/Diagnostics/ExceptionReturnedByHalibutProxyExtensionMethod.cs @@ -2,6 +2,8 @@ using System.IO; using System.Net.Sockets; using Halibut.Exceptions; +using Halibut.Queue.Redis; +using Halibut.Queue.Redis.Exceptions; using Halibut.Transport; using Halibut.Transport.Protocol; using Halibut.Transport.Proxy.Exceptions; @@ -13,19 +15,48 @@ public static class ExceptionReturnedByHalibutProxyExtensionMethod public static HalibutRetryableErrorType IsRetryableError(this Exception exception) { var halibutNetworkExceptionType = IsNetworkError(exception); - switch (halibutNetworkExceptionType) + + // All network errors can be retried. 
+ if (halibutNetworkExceptionType == HalibutNetworkExceptionType.IsNetworkError) return HalibutRetryableErrorType.IsRetryable; + + if (IsRedisRetryableError(exception)) return HalibutRetryableErrorType.IsRetryable; + + if (halibutNetworkExceptionType == HalibutNetworkExceptionType.NotANetworkError) return HalibutRetryableErrorType.NotRetryable; + + return HalibutRetryableErrorType.UnknownError; + } + + static bool IsRedisRetryableError(Exception exception) + { + if (exception is RedisDataLossHalibutClientException + || exception is RedisQueueShutdownClientException + || exception is CouldNotGetDataLossTokenInTimeHalibutClientException + || exception is ErrorWhilePreparingRequestForQueueHalibutClientException + || exception is ErrorOccuredWhenInsertingDataIntoRedisHalibutPendingRequestQueueHalibutClientException) + { + return true; + } + + if (exception is HalibutClientException) + { + // Sometimes the error occurs NOT on the node executing the RPC, e.g. the Node talking to tentacle. + // In that case we need to look at error messages, since we won't have the original exception type. + // We will also need to check error messages any time Error Responses are raised rather than a raw exception + // bubbling out of the QueueAndWait method. 
+ if (exception.Message.Contains("The request was abandoned, possibly because the node processing the request shutdown or redis lost all of its data.")) return true; + if (exception.Message.Contains("The node processing the request did not send a heartbeat for long enough, and so the node is now assumed to be offline.")) return true; + if (exception.Message.Contains("Error occured when reading data from the queue")) return true; + if (exception.Message.Contains("error occured when preparing request for queue")) return true; + } + + if (exception is HalibutClientException && exception.InnerException != null) { - case HalibutNetworkExceptionType.IsNetworkError: - return HalibutRetryableErrorType.IsRetryable; - case HalibutNetworkExceptionType.UnknownError: - return HalibutRetryableErrorType.UnknownError; - case HalibutNetworkExceptionType.NotANetworkError: - return HalibutRetryableErrorType.NotRetryable; - default: - throw new ArgumentOutOfRangeException(); + return IsRedisRetryableError(exception.InnerException); } + + return false; } - + /// /// Classifies the exception thrown from a halibut proxy as a network error or not. /// In some cases it is not possible to tell if the exception is a network error. diff --git a/source/Halibut/Halibut.csproj b/source/Halibut/Halibut.csproj index 3f77b994c..383684887 100644 --- a/source/Halibut/Halibut.csproj +++ b/source/Halibut/Halibut.csproj @@ -30,6 +30,7 @@ + diff --git a/source/Halibut/HalibutRuntime.cs b/source/Halibut/HalibutRuntime.cs index effb8abc9..93d005c90 100644 --- a/source/Halibut/HalibutRuntime.cs +++ b/source/Halibut/HalibutRuntime.cs @@ -87,7 +87,15 @@ internal HalibutRuntime( IPendingRequestQueue GetQueue(Uri target) { - return queues.GetOrAdd(target, u => queueFactory.CreateQueue(target)); + IPendingRequestQueue? 
createdQueue = null; + var queue = queues.GetOrAdd(target, u => createdQueue = queueFactory.CreateQueue(target)); + if (createdQueue != null && !ReferenceEquals(createdQueue, queue)) + { + // We created a queue that won't be used, dispose of it in the background. + Task.Run(() => Try.IgnoringError(() => createdQueue.DisposeAsync())); + } + + return queue; } public int Listen() diff --git a/source/Halibut/HalibutRuntimeBuilder.cs b/source/Halibut/HalibutRuntimeBuilder.cs index 4f0ad832b..ce50749d2 100644 --- a/source/Halibut/HalibutRuntimeBuilder.cs +++ b/source/Halibut/HalibutRuntimeBuilder.cs @@ -2,6 +2,7 @@ using System.Linq; using System.Security.Cryptography.X509Certificates; using Halibut.Diagnostics; +using Halibut.Queue; using Halibut.ServiceModel; using Halibut.Transport.Observability; using Halibut.Transport.Protocol; @@ -13,7 +14,7 @@ namespace Halibut public class HalibutRuntimeBuilder { ILogFactory? logFactory; - IPendingRequestQueueFactory? queueFactory; + Func? queueFactoryFactory; X509Certificate2? serverCertificate; IServiceFactory? serviceFactory; ITrustProvider? trustProvider; @@ -66,7 +67,13 @@ public HalibutRuntimeBuilder WithTrustProvider(ITrustProvider trustProvider) public HalibutRuntimeBuilder WithPendingRequestQueueFactory(IPendingRequestQueueFactory queueFactory) { - this.queueFactory = queueFactory; + this.queueFactoryFactory = _ => queueFactory; + return this; + } + + public HalibutRuntimeBuilder WithPendingRequestQueueFactory(Func queueFactory) + { + this.queueFactoryFactory = queueFactory; return this; } @@ -133,7 +140,7 @@ public HalibutRuntime Build() var serviceFactory = this.serviceFactory ?? new NullServiceFactory(); if (serverCertificate == null) throw new ArgumentException($"Set a server certificate with {nameof(WithServerCertificate)} before calling {nameof(Build)}", nameof(serverCertificate)); var logFactory = this.logFactory ?? new LogFactory(); - var queueFactory = this.queueFactory ?? 
new PendingRequestQueueFactoryAsync(halibutTimeoutsAndLimits, logFactory); + var trustProvider = this.trustProvider ?? new DefaultTrustProvider(); //use either the supplied type registry, or configure the default one @@ -153,6 +160,11 @@ public HalibutRuntime Build() var builder = new MessageSerializerBuilder(logFactory); configureMessageSerializerBuilder?.Invoke(builder); var messageSerializer = builder.WithTypeRegistry(typeRegistry).Build(); + + var queueMessageSerializer = new QueueMessageSerializer(messageSerializer.CreateStreamCapturingSerializer); + var queueFactory = this.queueFactoryFactory?.Invoke(queueMessageSerializer) + ?? new PendingRequestQueueFactoryAsync(halibutTimeoutsAndLimits, logFactory); + var streamFactory = this.streamFactory ?? new StreamFactory(); var connectionsObserver = this.connectionsObserver ?? NoOpConnectionsObserver.Instance; var rpcObserver = this.rpcObserver ?? new NoRpcObserver(); diff --git a/source/Halibut/Queue/QueueMessageSerializer.cs b/source/Halibut/Queue/QueueMessageSerializer.cs new file mode 100644 index 000000000..7e663dfc9 --- /dev/null +++ b/source/Halibut/Queue/QueueMessageSerializer.cs @@ -0,0 +1,69 @@ +using System; +using System.Collections.Generic; +using System.Globalization; +using System.IO; +using System.Text; +using Halibut.Transport.Protocol; +using Newtonsoft.Json; + +namespace Halibut.Queue +{ + /// + /// Uses the same JSON serializer used by Halibut to send messages over the wire to + /// serialise messages for the queue. Note that the queue serialises to JSON rather + /// than BSON which is what is sent over the wire. + /// + /// Based on battle-tested MessageSerializer, any quirks may be inherited from there. 
+ /// + public class QueueMessageSerializer + { + readonly Func createStreamCapturingSerializer; + + public QueueMessageSerializer(Func createStreamCapturingSerializer) + { + this.createStreamCapturingSerializer = createStreamCapturingSerializer; + } + + public (string, IReadOnlyList) WriteMessage(T message) + { + IReadOnlyList dataStreams; + + var sb = new StringBuilder(); + using var sw = new StringWriter(sb, CultureInfo.InvariantCulture); + using (var jsonTextWriter = new JsonTextWriter(sw) { CloseOutput = false }) + { + var streamCapturingSerializer = createStreamCapturingSerializer(); + streamCapturingSerializer.Serializer.Serialize(jsonTextWriter, new MessageEnvelope(message)); + dataStreams = streamCapturingSerializer.DataStreams; + } + + return (sb.ToString(), dataStreams); + } + + public (T Message, IReadOnlyList DataStreams) ReadMessage(string json) + { + using var reader = new JsonTextReader(new StringReader(json)); + var streamCapturingSerializer = createStreamCapturingSerializer(); + var result = streamCapturingSerializer.Serializer.Deserialize>(reader); + + if (result == null) + { + throw new Exception("messageEnvelope is null"); + } + + return (result.Message, streamCapturingSerializer.DataStreams); + } + + // By making this a generic type, each message specifies the exact type it sends/expects + // And it is impossible to deserialize the wrong type - any mismatched type will refuse to deserialize + class MessageEnvelope + { + public MessageEnvelope(T message) + { + Message = message; + } + + public T Message { get; private set; } + } + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/QueuedDataStreams/IStoreDataStreamsForDistributedQueues.cs b/source/Halibut/Queue/QueuedDataStreams/IStoreDataStreamsForDistributedQueues.cs new file mode 100644 index 000000000..de390a705 --- /dev/null +++ b/source/Halibut/Queue/QueuedDataStreams/IStoreDataStreamsForDistributedQueues.cs @@ -0,0 +1,38 @@ +using System; +using 
System.Collections.Generic; +using System.Threading; +using System.Threading.Tasks; + +namespace Halibut.Queue.QueuedDataStreams +{ + /// + /// The Redis Queue requires that something else can store data streams. The + /// Redis Queue will call this interface for storage and retrieval of data streams. + /// + /// The ReHydrateDataStreams method will be called at most once, and each data stream passed to + /// ReHydrateDataStreams will be read at most once. Thus, it is safe to delete the DataStream from + /// storage once the DataStream `writerAsync` Func is called and will no longer return any more + /// data. This includes the case where the writerAsync method throws. + /// + public interface IStoreDataStreamsForDistributedQueues + { + /// + /// Must store the data for the given dataStreams. + /// + /// + /// + /// A string, DataStreamMetadata, containing a small amount of data that will be stored in Redis; this will be + /// given to ReHydrateDataStreams. + public Task StoreDataStreams(IReadOnlyList dataStreams, CancellationToken cancellationToken); + + /// + /// Updates the dataStreams `writerAsync` to write the previously stored data, using + /// the SetWriterAsync method. 
+ /// + /// + /// + /// + /// + public Task ReHydrateDataStreams(string dataStreamMetadata, IReadOnlyList dataStreams, CancellationToken cancellationToken); + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/Cancellation/RequestCancelledSender.cs b/source/Halibut/Queue/Redis/Cancellation/RequestCancelledSender.cs new file mode 100644 index 000000000..74aa0a2d1 --- /dev/null +++ b/source/Halibut/Queue/Redis/Cancellation/RequestCancelledSender.cs @@ -0,0 +1,49 @@ +using System; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Transport.Protocol; +using Halibut.Util; + +#if NET8_0_OR_GREATER +namespace Halibut.Queue.Redis.Cancellation +{ + public class RequestCancelledSender + { + // How long the CancelRequestMarker will sit in redis before it times out. + // If it does timeout it won't matter since the request-sender will stop sending heart beats + // causing the request-processor to cancel the request anyway. + static TimeSpan CancelRequestMarkerTTL = TimeSpan.FromMinutes(15); + + public static async Task TrySendCancellation( + IHalibutRedisTransport halibutRedisTransport, + Uri endpoint, + RequestMessage request, + ILog log) + { + log.Write(EventType.Diagnostic, "Attempting to send cancellation for request - Endpoint: {0}, ActivityId: {1}", endpoint, request.ActivityId); + + await using var cts = new CancelOnDisposeCancellationToken(); + cts.CancelAfter(TimeSpan.FromMinutes(2)); // Best efforts. 
+ + try + { + log.Write(EventType.Diagnostic, "Publishing cancellation notification - Endpoint: {0}, ActivityId: {1}", endpoint, request.ActivityId); + await halibutRedisTransport.PublishCancellation(endpoint, request.ActivityId, cts.Token); + + log.Write(EventType.Diagnostic, "Marking request as cancelled - Endpoint: {0}, ActivityId: {1}", endpoint, request.ActivityId); + await halibutRedisTransport.MarkRequestAsCancelled(endpoint, request.ActivityId, CancelRequestMarkerTTL, cts.Token); + + log.Write(EventType.Diagnostic, "Successfully sent cancellation for request - Endpoint: {0}, ActivityId: {1}", endpoint, request.ActivityId); + } + catch (OperationCanceledException ex) + { + log.Write(EventType.Error, "Cancellation send operation timed out after 2 minutes - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, request.ActivityId, ex.Message); + } + catch (Exception ex) + { + log.Write(EventType.Error, "Failed to send cancellation for request - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, request.ActivityId, ex.Message); + } + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/Cancellation/WatchForRequestCancellation.cs b/source/Halibut/Queue/Redis/Cancellation/WatchForRequestCancellation.cs new file mode 100644 index 000000000..b7915c347 --- /dev/null +++ b/source/Halibut/Queue/Redis/Cancellation/WatchForRequestCancellation.cs @@ -0,0 +1,99 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Transport.Protocol; +using Halibut.Util; + +namespace Halibut.Queue.Redis.Cancellation +{ + public class WatchForRequestCancellation : IAsyncDisposable + { + readonly CancelOnDisposeCancellationToken requestCancelledCts = new(); + public CancellationToken RequestCancelledCancellationToken => requestCancelledCts.Token; + public bool SenderCancelledTheRequest { get; private set; } + + readonly CancelOnDisposeCancellationToken 
watchForCancellationTokenSource = new(); + + readonly ILog log; + + public WatchForRequestCancellation(Uri endpoint, Guid requestActivityId, IHalibutRedisTransport halibutRedisTransport, ILog log) + { + this.log = log; + log.Write(EventType.Diagnostic, "Starting to watch for request cancellation - Endpoint: {0}, ActivityId: {1}", endpoint, requestActivityId); + + var token = watchForCancellationTokenSource.Token; + var _ = Task.Run(async () => await WatchForCancellation(endpoint, requestActivityId, halibutRedisTransport, token)); + } + + async Task WatchForCancellation(Uri endpoint, Guid requestActivityId, IHalibutRedisTransport halibutRedisTransport, CancellationToken token) + { + try + { + log.Write(EventType.Diagnostic, "Subscribing to request cancellation notifications - Endpoint: {0}, ActivityId: {1}", endpoint, requestActivityId); + + await using var _ = await halibutRedisTransport.SubscribeToRequestCancellation(endpoint, requestActivityId, + async () => + { + await Task.CompletedTask; + log.Write(EventType.Diagnostic, "Received cancellation notification via subscription - Endpoint: {0}, ActivityId: {1}", endpoint, requestActivityId); + await RequestHasBeenCancelled(); + }, + token); + + log.Write(EventType.Diagnostic, "Starting polling loop for request cancellation - Endpoint: {0}, ActivityId: {1}", endpoint, requestActivityId); + + // Also poll to see if the request is cancelled since we can miss the publication. 
+ while (!token.IsCancellationRequested) + { + await Try.IgnoringError(async () => await Task.Delay(TimeSpan.FromSeconds(60), token)); + + if(token.IsCancellationRequested) return; + + try + { + if (await halibutRedisTransport.IsRequestMarkedAsCancelled(endpoint, requestActivityId, token)) + { + log.Write(EventType.Diagnostic, "Request cancellation detected via polling - Endpoint: {0}, ActivityId: {1}", endpoint, requestActivityId); + await RequestHasBeenCancelled(); + break; + } + } + catch (Exception ex) + { + log.Write(EventType.Diagnostic, "Error while polling for request cancellation - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, requestActivityId, ex.Message); + } + } + + log.Write(EventType.Diagnostic, "Exiting watch loop for request cancellation - Endpoint: {0}, ActivityId: {1}", endpoint, requestActivityId); + } + catch (Exception ex) + { + if (!token.IsCancellationRequested) + { + log.Write(EventType.Error, "Unexpected error in request cancellation watcher - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, requestActivityId, ex.Message); + } + } + } + + async Task RequestHasBeenCancelled() + { + SenderCancelledTheRequest = true; + await requestCancelledCts.CancelAsync(); + await watchForCancellationTokenSource.CancelAsync(); + } + + public async ValueTask DisposeAsync() + { + log.Write(EventType.Diagnostic, "Disposing WatchForRequestCancellation"); + + await Try.IgnoringError(async () => await watchForCancellationTokenSource.DisposeAsync()); + await Try.IgnoringError(async () => await requestCancelledCts.DisposeAsync()); + + log.Write(EventType.Diagnostic, "WatchForRequestCancellation disposed"); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/Exceptions/CouldNotGetDataLossTokenInTimeHalibutClientException.cs b/source/Halibut/Queue/Redis/Exceptions/CouldNotGetDataLossTokenInTimeHalibutClientException.cs new file mode 100644 index 000000000..540f9c205 --- /dev/null +++ 
b/source/Halibut/Queue/Redis/Exceptions/CouldNotGetDataLossTokenInTimeHalibutClientException.cs @@ -0,0 +1,11 @@ +using System; + +namespace Halibut.Queue.Redis.Exceptions +{ + public class CouldNotGetDataLossTokenInTimeHalibutClientException : HalibutClientException + { + public CouldNotGetDataLossTokenInTimeHalibutClientException(string message, Exception inner) : base(message, inner) + { + } + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/Exceptions/ErrorOccuredWhenInsertingDataIntoRedisHalibutPendingRequestQueueHalibutClientException.cs b/source/Halibut/Queue/Redis/Exceptions/ErrorOccuredWhenInsertingDataIntoRedisHalibutPendingRequestQueueHalibutClientException.cs new file mode 100644 index 000000000..e06aff864 --- /dev/null +++ b/source/Halibut/Queue/Redis/Exceptions/ErrorOccuredWhenInsertingDataIntoRedisHalibutPendingRequestQueueHalibutClientException.cs @@ -0,0 +1,11 @@ +using System; + +namespace Halibut.Queue.Redis.Exceptions +{ + public class ErrorOccuredWhenInsertingDataIntoRedisHalibutPendingRequestQueueHalibutClientException : HalibutClientException + { + public ErrorOccuredWhenInsertingDataIntoRedisHalibutPendingRequestQueueHalibutClientException(string message, Exception inner) : base(message, inner) + { + } + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/Exceptions/ErrorWhilePreparingRequestForQueueHalibutClientException.cs b/source/Halibut/Queue/Redis/Exceptions/ErrorWhilePreparingRequestForQueueHalibutClientException.cs new file mode 100644 index 000000000..a4e50d769 --- /dev/null +++ b/source/Halibut/Queue/Redis/Exceptions/ErrorWhilePreparingRequestForQueueHalibutClientException.cs @@ -0,0 +1,11 @@ +using System; + +namespace Halibut.Queue.Redis.Exceptions +{ + public class ErrorWhilePreparingRequestForQueueHalibutClientException : HalibutClientException + { + public ErrorWhilePreparingRequestForQueueHalibutClientException(string message, Exception inner) : base(message, inner) + { + } + } +} 
\ No newline at end of file diff --git a/source/Halibut/Queue/Redis/Exceptions/RedisDataLossHalibutClientException.cs b/source/Halibut/Queue/Redis/Exceptions/RedisDataLossHalibutClientException.cs new file mode 100644 index 000000000..e5f70e229 --- /dev/null +++ b/source/Halibut/Queue/Redis/Exceptions/RedisDataLossHalibutClientException.cs @@ -0,0 +1,11 @@ +using System; + +namespace Halibut.Queue.Redis.Exceptions +{ + public class RedisDataLossHalibutClientException : HalibutClientException + { + public RedisDataLossHalibutClientException(string message) : base(message) + { + } + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/Exceptions/RedisQueueShutdownClientException.cs b/source/Halibut/Queue/Redis/Exceptions/RedisQueueShutdownClientException.cs new file mode 100644 index 000000000..c5227d689 --- /dev/null +++ b/source/Halibut/Queue/Redis/Exceptions/RedisQueueShutdownClientException.cs @@ -0,0 +1,11 @@ +using System; + +namespace Halibut.Queue.Redis.Exceptions +{ + public class RedisQueueShutdownClientException : HalibutClientException + { + public RedisQueueShutdownClientException(string message) : base(message) + { + } + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/MessageStorage/IMessageSerialiserAndDataStreamStorage.cs b/source/Halibut/Queue/Redis/MessageStorage/IMessageSerialiserAndDataStreamStorage.cs new file mode 100644 index 000000000..519e04e1e --- /dev/null +++ b/source/Halibut/Queue/Redis/MessageStorage/IMessageSerialiserAndDataStreamStorage.cs @@ -0,0 +1,24 @@ +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Transport.Protocol; + +namespace Halibut.Queue.Redis.MessageStorage +{ + /// + /// Deals with preparing the request/response messages for storage in the + /// Redis Queue and helps with reading from the queue. 
+ /// + /// This takes care of serialising the message into something that can be stored in + /// Redis, and calls IStoreDataStreamsForDistributedQueues for storage/retrieval + /// of DataStreams. + /// + public interface IMessageSerialiserAndDataStreamStorage + { + Task PrepareRequest(RequestMessage request, CancellationToken cancellationToken); + Task ReadRequest(RedisStoredMessage jsonRequest, CancellationToken cancellationToken); + Task PrepareResponse(ResponseMessage response, CancellationToken cancellationToken); + Task ReadResponse(RedisStoredMessage jsonResponse, CancellationToken cancellationToken); + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/MessageStorage/MessageSerialiserAndDataStreamStorage.cs b/source/Halibut/Queue/Redis/MessageStorage/MessageSerialiserAndDataStreamStorage.cs new file mode 100644 index 000000000..78fc44302 --- /dev/null +++ b/source/Halibut/Queue/Redis/MessageStorage/MessageSerialiserAndDataStreamStorage.cs @@ -0,0 +1,49 @@ +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.QueuedDataStreams; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Transport.Protocol; + +namespace Halibut.Queue.Redis.MessageStorage +{ + public class MessageSerialiserAndDataStreamStorage : IMessageSerialiserAndDataStreamStorage + { + readonly QueueMessageSerializer queueMessageSerializer; + readonly IStoreDataStreamsForDistributedQueues storeDataStreamsForDistributedQueues; + + public MessageSerialiserAndDataStreamStorage(QueueMessageSerializer queueMessageSerializer, IStoreDataStreamsForDistributedQueues storeDataStreamsForDistributedQueues) + { + this.queueMessageSerializer = queueMessageSerializer; + this.storeDataStreamsForDistributedQueues = storeDataStreamsForDistributedQueues; + } + + public async Task PrepareRequest(RequestMessage request, CancellationToken cancellationToken) + { + var (jsonRequestMessage, dataStreams) = queueMessageSerializer.WriteMessage(request); + var 
dataStreamMetaData = await storeDataStreamsForDistributedQueues.StoreDataStreams(dataStreams, cancellationToken); + return new RedisStoredMessage(jsonRequestMessage, dataStreamMetaData); + } + + public async Task ReadRequest(RedisStoredMessage storedMessage, CancellationToken cancellationToken) + { + var (request, dataStreams) = queueMessageSerializer.ReadMessage(storedMessage.Message); + await storeDataStreamsForDistributedQueues.ReHydrateDataStreams(storedMessage.DataStreamMetadata, dataStreams, cancellationToken); + return request; + } + + public async Task PrepareResponse(ResponseMessage response, CancellationToken cancellationToken) + { + var (jsonResponseMessage, dataStreams) = queueMessageSerializer.WriteMessage(response); + var dataStreamMetaData = await storeDataStreamsForDistributedQueues.StoreDataStreams(dataStreams, cancellationToken); + return new RedisStoredMessage(jsonResponseMessage, dataStreamMetaData); + } + + public async Task ReadResponse(RedisStoredMessage storedMessage, CancellationToken cancellationToken) + { + var (response, dataStreams) = queueMessageSerializer.ReadMessage(storedMessage.Message); + await storeDataStreamsForDistributedQueues.ReHydrateDataStreams(storedMessage.DataStreamMetadata, dataStreams, cancellationToken); + return response; + } + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/NodeHeartBeat/HalibutQueueNodeSendingPulses.cs b/source/Halibut/Queue/Redis/NodeHeartBeat/HalibutQueueNodeSendingPulses.cs new file mode 100644 index 000000000..9baffb7cd --- /dev/null +++ b/source/Halibut/Queue/Redis/NodeHeartBeat/HalibutQueueNodeSendingPulses.cs @@ -0,0 +1,17 @@ +#if NET8_0_OR_GREATER +using System; + +namespace Halibut.Queue.Redis.NodeHeartBeat +{ + public enum HalibutQueueNodeSendingPulses + { + // The node the RPC is executing on. + // The node that calls QueueAndWait + RequestSenderNode, + + // The node sending/receiving the Request to/from the service. 
+ // The node that calls Dequeue and ApplyResponse. + RequestProcessorNode + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/NodeHeartBeat/NodeHeartBeatSender.cs b/source/Halibut/Queue/Redis/NodeHeartBeat/NodeHeartBeatSender.cs new file mode 100644 index 000000000..73c606880 --- /dev/null +++ b/source/Halibut/Queue/Redis/NodeHeartBeat/NodeHeartBeatSender.cs @@ -0,0 +1,80 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Util; + +namespace Halibut.Queue.Redis.NodeHeartBeat +{ + public class NodeHeartBeatSender : IAsyncDisposable + { + readonly Uri endpoint; + readonly Guid requestActivityId; + readonly IHalibutRedisTransport halibutRedisTransport; + readonly CancelOnDisposeCancellationToken cts; + readonly ILog log; + readonly HalibutQueueNodeSendingPulses nodeSendingPulsesType; + + internal Task TaskSendingPulses; + public NodeHeartBeatSender( + Uri endpoint, + Guid requestActivityId, + IHalibutRedisTransport halibutRedisTransport, + ILog log, + HalibutQueueNodeSendingPulses nodeSendingPulsesType, + TimeSpan defaultDelayBetweenPulses) + { + this.endpoint = endpoint; + this.requestActivityId = requestActivityId; + this.halibutRedisTransport = halibutRedisTransport; + this.nodeSendingPulsesType = nodeSendingPulsesType; + cts = new CancelOnDisposeCancellationToken(); + this.log = log.ForContext(); + this.log.Write(EventType.Diagnostic, "Starting NodeHeartBeatSender for {0} node, request {1}, endpoint {2}", nodeSendingPulsesType, requestActivityId, endpoint); + TaskSendingPulses = Task.Run(() => SendPulsesWhileProcessingRequest(defaultDelayBetweenPulses, cts.Token)); + } + + async Task SendPulsesWhileProcessingRequest(TimeSpan defaultDelayBetweenPulses, CancellationToken cancellationToken) + { + log.Write(EventType.Diagnostic, "Starting heartbeat pulse loop for {0} node, request {1}", nodeSendingPulsesType, requestActivityId); + + TimeSpan 
delayBetweenPulse; + while (!cancellationToken.IsCancellationRequested) + { + try + { + await halibutRedisTransport.SendNodeHeartBeat(endpoint, requestActivityId, nodeSendingPulsesType, cancellationToken); + delayBetweenPulse = defaultDelayBetweenPulses; + log.Write(EventType.Diagnostic, "Successfully sent heartbeat for {0} node, request {1}, next pulse in {2} seconds", nodeSendingPulsesType, requestActivityId, delayBetweenPulse.TotalSeconds); + } + catch (Exception ex) + { + if(cancellationToken.IsCancellationRequested) + { + log.Write(EventType.Diagnostic, "Heartbeat pulse loop cancelled for {0} node, request {1}", nodeSendingPulsesType, requestActivityId); + return; + } + // Send pulses more frequently when we were unable to send a pulse. + delayBetweenPulse = defaultDelayBetweenPulses / 2; + log.WriteException(EventType.Diagnostic, "Failed to send heartbeat for {0} node, request {1}, switching to panic mode with {2} second intervals", ex, nodeSendingPulsesType, requestActivityId, delayBetweenPulse.TotalSeconds); + } + + await Try.IgnoringError(async () => await Task.Delay(delayBetweenPulse, cancellationToken)); + } + + log.Write(EventType.Diagnostic, "Heartbeat pulse loop ended for {0} node, request {1}", nodeSendingPulsesType, requestActivityId); + } + + public async ValueTask DisposeAsync() + { + log.Write(EventType.Diagnostic, "Disposing NodeHeartBeatSender for {0} node, request {1}", nodeSendingPulsesType, requestActivityId); + + await Try.IgnoringError(async () => await cts.DisposeAsync()); + + log.Write(EventType.Diagnostic, "NodeHeartBeatSender disposed for {0} node, request {1}", nodeSendingPulsesType, requestActivityId); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/NodeHeartBeat/NodeHeartBeatWatcher.cs b/source/Halibut/Queue/Redis/NodeHeartBeat/NodeHeartBeatWatcher.cs new file mode 100644 index 000000000..5d552e42b --- /dev/null +++ b/source/Halibut/Queue/Redis/NodeHeartBeat/NodeHeartBeatWatcher.cs @@ -0,0 
+1,174 @@ +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Transport.Protocol; +using Halibut.Util; + +namespace Halibut.Queue.Redis.NodeHeartBeat +{ + public class NodeHeartBeatWatcher + { + public static async Task WatchThatNodeProcessingTheRequestIsStillAlive( + Uri endpoint, + RequestMessage request, + RedisPendingRequest redisPending, + IHalibutRedisTransport halibutRedisTransport, + TimeSpan timeBetweenCheckingIfRequestWasCollected, + ILog log, + TimeSpan maxTimeBetweenHeartBeetsBeforeProcessingNodeIsAssumedToBeOffline, + CancellationToken watchCancellationToken) + { + log = log.ForContext(); + // Once the pending's CT has been cancelled we no longer care to keep observing + await using var cts = new CancelOnDisposeCancellationToken(watchCancellationToken, redisPending.PendingRequestCancellationToken); + try + { + await WaitForRequestToBeCollected(endpoint, request, redisPending, halibutRedisTransport, timeBetweenCheckingIfRequestWasCollected, log, cts.Token); + + return await WatchForPulsesFromNode(endpoint, request.ActivityId, halibutRedisTransport, log, maxTimeBetweenHeartBeetsBeforeProcessingNodeIsAssumedToBeOffline, HalibutQueueNodeSendingPulses.RequestProcessorNode, cts.Token); + } + catch (Exception) when (cts.Token.IsCancellationRequested) + { + return NodeWatcherResult.NoDisconnectSeen; + } + catch (Exception) + { + return NodeWatcherResult.NodeMayHaveDisconnected; + } + } + + public static async Task WatchThatNodeWhichSentTheRequestIsStillAlive( + Uri endpoint, + Guid requestActivityId, + IHalibutRedisTransport halibutRedisTransport, + ILog log, + TimeSpan maxTimeBetweenSenderHeartBeetsBeforeSenderIsAssumedToBeOffline, + CancellationToken watchCancellationToken) + { + try + { + return await WatchForPulsesFromNode(endpoint, requestActivityId, halibutRedisTransport, log, maxTimeBetweenSenderHeartBeetsBeforeSenderIsAssumedToBeOffline, 
HalibutQueueNodeSendingPulses.RequestSenderNode, watchCancellationToken); + } + catch (Exception) when (watchCancellationToken.IsCancellationRequested) + { + return NodeWatcherResult.NoDisconnectSeen; + } + catch (Exception) + { + return NodeWatcherResult.NodeMayHaveDisconnected; + } + } + + static async Task WatchForPulsesFromNode( + Uri endpoint, + Guid requestActivityId, + IHalibutRedisTransport halibutRedisTransport, + ILog log, + TimeSpan maxTimeBetweenHeartBeetsBeforeNodeIsAssumedToBeOffline, + HalibutQueueNodeSendingPulses watchingForPulsesFrom, + CancellationToken watchCancellationToken) + { + log.ForContext(); + log.Write(EventType.Diagnostic, "Starting to watch for pulses from {0} node, request {1}, endpoint {2}", watchingForPulsesFrom, requestActivityId, endpoint); + + DateTimeOffset? lastHeartBeat = DateTimeOffset.Now; + + try + { + // Currently we will wait until the CT is cancelled to get a subscription, + // instead it would be better if we either + // - waited for maxTimeBetweenHeartBeetsBeforeNodeIsAssumedToBeOffline to get a subscription. + // - SubscribeToNodeHeartBeatChannel returned immediately even if it doesn't have a subscription, and instead it works + // in the background to get one unless the CT is triggered, or it is disposed. 
+ // https://whimsical.com/subscribetonodeheartbeatchannel-should-timeout-while-waiting-to--NFWwmPkE7pTBdm2PRUC8Tf + await using var subscription = await halibutRedisTransport.SubscribeToNodeHeartBeatChannel( + endpoint, + requestActivityId, + watchingForPulsesFrom, + async () => + { + await Task.CompletedTask; + lastHeartBeat = DateTimeOffset.Now; + log.Write(EventType.Diagnostic, "Received heartbeat from {0} node, request {1}", watchingForPulsesFrom, requestActivityId); + }, watchCancellationToken); + + while (!watchCancellationToken.IsCancellationRequested) + { + var timeSinceLastHeartBeat = DateTimeOffset.Now - lastHeartBeat.Value; + if (timeSinceLastHeartBeat > maxTimeBetweenHeartBeetsBeforeNodeIsAssumedToBeOffline) + { + log.Write(EventType.Diagnostic, "{0} node appears disconnected, request {1}, last heartbeat was {2} seconds ago", watchingForPulsesFrom, requestActivityId, timeSinceLastHeartBeat.TotalSeconds); + return NodeWatcherResult.NodeMayHaveDisconnected; + } + + var timeToWait = TimeSpanHelper.Min( + TimeSpan.FromSeconds(30), + maxTimeBetweenHeartBeetsBeforeNodeIsAssumedToBeOffline - timeSinceLastHeartBeat + TimeSpan.FromSeconds(1)); + + await Try.IgnoringError(async () => await Task.Delay(timeToWait, watchCancellationToken)); + } + + log.Write(EventType.Diagnostic, "{0} node watcher cancelled, request {1}", watchingForPulsesFrom, requestActivityId); + return NodeWatcherResult.NoDisconnectSeen; + } + catch (Exception ex) when (!watchCancellationToken.IsCancellationRequested) + { + log.WriteException(EventType.Diagnostic, "Error while watching {0} node, request {1}", ex, watchingForPulsesFrom, requestActivityId); + throw; + } + } + + static async Task WaitForRequestToBeCollected( + Uri endpoint, + RequestMessage request, + RedisPendingRequest redisPending, + IHalibutRedisTransport halibutRedisTransport, + TimeSpan timeBetweenCheckingIfRequestWasCollected, + ILog log, + CancellationToken cancellationToken) + { + log = log.ForContext(); + 
log.Write(EventType.Diagnostic, "Waiting for request {0} to be collected from queue", request.ActivityId); + + while (!cancellationToken.IsCancellationRequested) + { + await Try.IgnoringError(async () => + { + await Task.WhenAny( + Task.Delay(timeBetweenCheckingIfRequestWasCollected, cancellationToken), + redisPending.WaitForRequestToBeMarkedAsCollected(cancellationToken)); + }); + + if(cancellationToken.IsCancellationRequested) break; + + try + { + // Has something else determined the request was collected? + if (redisPending.HasRequestBeenMarkedAsCollected) + { + log.Write(EventType.Diagnostic, "Request {0} has been marked as collected", request.ActivityId); + return; + } + + // Check ourselves if the request has been collected. + var requestIsStillOnQueue = await halibutRedisTransport.IsRequestStillOnQueue(endpoint, request.ActivityId, cancellationToken); + if (!requestIsStillOnQueue) + { + log.Write(EventType.Diagnostic, "Request {0} is no longer on queue", request.ActivityId); + await redisPending.RequestHasBeenCollectedAndWillBeTransferred(); + return; + } + } + catch (Exception ex) + { + log.WriteException(EventType.Diagnostic, "Error checking if request {0} is still on queue", ex, request.ActivityId); + } + } + + log.Write(EventType.Diagnostic, "Stopped waiting for request {0} to be collected (cancelled)", request.ActivityId); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/NodeHeartBeat/NodeWatcherResult.cs b/source/Halibut/Queue/Redis/NodeHeartBeat/NodeWatcherResult.cs new file mode 100644 index 000000000..8915633ea --- /dev/null +++ b/source/Halibut/Queue/Redis/NodeHeartBeat/NodeWatcherResult.cs @@ -0,0 +1,12 @@ +#if NET8_0_OR_GREATER +using System; + +namespace Halibut.Queue.Redis.NodeHeartBeat +{ + public enum NodeWatcherResult + { + NodeMayHaveDisconnected, + NoDisconnectSeen + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisDataLossDetection/IWatchForRedisLosingAllItsData.cs 
b/source/Halibut/Queue/Redis/RedisDataLossDetection/IWatchForRedisLosingAllItsData.cs new file mode 100644 index 000000000..d45d49bf6 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisDataLossDetection/IWatchForRedisLosingAllItsData.cs @@ -0,0 +1,18 @@ +using System; +using System.Threading; +using System.Threading.Tasks; + +namespace Halibut.Queue.Redis.RedisDataLossDetection +{ + public interface IWatchForRedisLosingAllItsData : IAsyncDisposable + { + /// + /// Returns a Cancellation token which is triggered when data loss occurs. + /// Will cause the caller to wait until we are connected to redis and so can detect data loss. + /// + /// Time to wait for this to reach a state where it can detect data loss + /// + /// A cancellation token which is triggered when data loss occurs. + Task GetTokenForDataLossDetection(TimeSpan timeToWait, CancellationToken cancellationToken); + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisDataLossDetection/WatchForRedisLosingAllItsData.cs b/source/Halibut/Queue/Redis/RedisDataLossDetection/WatchForRedisLosingAllItsData.cs new file mode 100644 index 000000000..85755c3f5 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisDataLossDetection/WatchForRedisLosingAllItsData.cs @@ -0,0 +1,134 @@ +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Util; + +namespace Halibut.Queue.Redis.RedisDataLossDetection +{ + public class WatchForRedisLosingAllItsData : IWatchForRedisLosingAllItsData + { + readonly RedisFacade redisFacade; + readonly ILog log; + + /// + /// If we are yet to contact redis to watch it for data loss, this is the delay + /// between errors used when retrying to connect to redis. + /// + internal TimeSpan SetupErrorBackoffDelay { get;} + + /// + /// The amount of time between checks for whether redis has lost data. 
+ /// + internal TimeSpan DataLossCheckInterval { get; } + + /// + /// The TTL of the key used for data loss detection. The TTL is reset + /// each time we check for data loss. This exists so that the data is + /// eventually removed from redis. + /// + internal TimeSpan DataLostKeyTtl { get; } + + CancelOnDisposeCancellationToken cts = new(); + + public WatchForRedisLosingAllItsData(RedisFacade redisFacade, ILog log, TimeSpan? setupDelay = null, TimeSpan? watchInterval = null, TimeSpan? keyTTL = null) + { + this.redisFacade = redisFacade; + this.log = log; + this.SetupErrorBackoffDelay = setupDelay ?? TimeSpan.FromSeconds(1); + this.DataLossCheckInterval = watchInterval ?? TimeSpan.FromSeconds(60); + this.DataLostKeyTtl = keyTTL ?? TimeSpan.FromHours(8); + var _ = Task.Run(async () => await KeepWatchingForDataLoss(cts.Token)); + } + + private TaskCompletionSource taskCompletionSource = new TaskCompletionSource(); + + /// + /// Will cause the caller to wait until we are connected to redis and so can detect data loss. + /// + /// Time to wait for this to reach a state where it can detect data loss + /// + /// A cancellation token which is triggered when data loss occurs. 
+ public async Task GetTokenForDataLossDetection(TimeSpan timeToWait, CancellationToken cancellationToken) + { + if (taskCompletionSource.Task.IsCompleted) + { + return await taskCompletionSource.Task; + } + + await using var cts = new CancelOnDisposeCancellationToken(cancellationToken); + cts.CancelAfter(timeToWait); + return await taskCompletionSource.Task.WaitAsync(cts.Token); + } + + private async Task KeepWatchingForDataLoss(CancellationToken cancellationToken) + { + while (!cancellationToken.IsCancellationRequested) + { + await Try.IgnoringError(async () => await WatchForDataLoss(cancellationToken)); + } + } + + async Task WatchForDataLoss(CancellationToken cancellationToken) + { + string guid = Guid.NewGuid().ToString(); + var key = "WatchForDataLoss::" + guid; + var hasSetKey = false; + + log.Write(EventType.Diagnostic, "Starting Redis data loss monitoring with key {0}", key); + + await using var cts = new CancelOnDisposeCancellationToken(); + while (!cancellationToken.IsCancellationRequested) + { + try + { + if (!hasSetKey) + { + log.Write(EventType.Diagnostic, "Setting initial data loss monitoring key {0} with TTL {1} minutes", key, DataLostKeyTtl.TotalMinutes); + await redisFacade.SetString(key, guid, DataLostKeyTtl, cancellationToken); + taskCompletionSource.TrySetResult(cts.Token); + hasSetKey = true; + log.Write(EventType.Diagnostic, "Successfully set initial data loss monitoring key {0}, monitoring is now active", key); + } + else + { + var data = await redisFacade.GetString(key, cancellationToken); + if (data != guid) + { + log.Write(EventType.Error, "Redis data loss detected! Expected value {0} for key {1}, but got {2}. This indicates Redis has lost data.", guid, key, data ?? "null"); + // Anyone new will be given a new thing to wait on. 
+ taskCompletionSource = new TaskCompletionSource(); + await Try.IgnoringError(async () => await cts.CancelAsync()); + return; + } + + await redisFacade.SetTtlForString(key, DataLostKeyTtl, cancellationToken); + } + } + catch (Exception ex) + { + log.Write(EventType.Diagnostic, "Error occurred during Redis data loss monitoring for key {0}: {1}. Will retry after delay.", key, ex.Message); + } + + await Try.IgnoringError(async () => + { + if (!hasSetKey) await Task.Delay(SetupErrorBackoffDelay, cancellationToken); + else await Task.Delay(DataLossCheckInterval, cancellationToken); + }); + + } + + log.Write(EventType.Diagnostic, "Redis data loss monitoring stopped for key {0}, cleaning up", key); + await Try.IgnoringError(async () => await redisFacade.DeleteString(key, cancellationToken)); + } + + public async ValueTask DisposeAsync() + { + log.Write(EventType.Diagnostic, "Disposing WatchForRedisLosingAllItsData"); + await cts.DisposeAsync(); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisHelpers/HalibutRedisTransport.cs b/source/Halibut/Queue/Redis/RedisHelpers/HalibutRedisTransport.cs new file mode 100644 index 000000000..b324d6ce4 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisHelpers/HalibutRedisTransport.cs @@ -0,0 +1,307 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Collections.Generic; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.Redis.NodeHeartBeat; +using Halibut.Util; +using Microsoft.VisualBasic.CompilerServices; +using StackExchange.Redis; + +namespace Halibut.Queue.Redis.RedisHelpers +{ + public class HalibutRedisTransport : IHalibutRedisTransport + { + const string Namespace = "octopus::server::halibut"; + + readonly RedisFacade facade; + + public HalibutRedisTransport(RedisFacade facade) + { + this.facade = facade; + } + + // Request pulse channel. + // Polling services will be notified of new request via this channel. 
+ // The Service will subscribe to the channel, while the client will publish (pulse) + // the channel when a request is available. + static string RequestMessagesPulseChannelName(Uri endpoint) + { + return $"{Namespace}::RequestMessagesPulseChannel::{endpoint}"; + } + + public async Task SubscribeToRequestMessagePulseChannel(Uri endpoint, Action onRequestMessagePulse, CancellationToken cancellationToken) + { + var channelName = RequestMessagesPulseChannelName(endpoint); + return await facade.SubscribeToChannel(channelName, async message => + { + await Task.CompletedTask; + onRequestMessagePulse(message); + }, + cancellationToken); + } + + public async Task PulseRequestPushedToEndpoint(Uri endpoint, CancellationToken cancellationToken) + { + var channelName = RequestMessagesPulseChannelName(endpoint); + string emptyJson = "{}"; // Maybe we will actually want to share data in the future, empty json means we can add stuff later. + await facade.PublishToChannel(channelName, emptyJson, cancellationToken); + } + + // Pending Request IDs list + // A list in redis holding the set of available Pending Requests a Service can collect. + // The Service will Pop the Ids while the Client will Push new Pending Request Ids to the list. + + static string PendingRequestGuidsQueueKey(Uri endpoint) + { + return $"{Namespace}::PendingRequestGuidsQueue::{endpoint}"; + } + + public async Task PushRequestGuidOnToQueue(Uri endpoint, Guid guid, CancellationToken cancellationToken) + { + // TTL is high since it applies to all GUIDs in the queue. 
+ var ttlForAllRequestsGuidsInList = TimeSpan.FromDays(1); + await facade.ListRightPushAsync(PendingRequestGuidsQueueKey(endpoint), guid.ToString(), ttlForAllRequestsGuidsInList, cancellationToken); + } + + public async Task TryPopNextRequestGuid(Uri endpoint, CancellationToken cancellationToken) + { + var result = await facade.ListLeftPopAsync(PendingRequestGuidsQueueKey(endpoint), cancellationToken); + return result.ToGuid(); + } + + // Pending Request Message + // Stores the Pending Request Message for collection by the service. + // Note that the service will first need to TryPopNextRequestGuid to be able to + // find the RequestMessage. + + static string RequestMessageKey(Uri endpoint, Guid requestId) + { + return $"{Namespace}::RequestMessage::{endpoint}::{requestId}"; + } + + /// + /// The amount of time, on top of the requestPickupTimeout, that the request will stay on the queue + /// before being automatically picked up. + /// The theory being we might need some grace period where it takes some time to collect + /// the request. It is not clear if we need this. This will be addressed in: + /// https://whimsical.com/under-some-circumstances-old-requests-can-still-be-sent-to-tenta-79CoT5PpvE1n5wApB6e2Zx + /// + static readonly TimeSpan AdditionalRequestMessageTtl = TimeSpan.FromMinutes(2); + + public async Task PutRequest(Uri endpoint, Guid requestId, RedisStoredMessage requestMessage, TimeSpan requestPickupTimeout, CancellationToken cancellationToken) + { + var requestKey = RequestMessageKey(endpoint, requestId); + + var ttl = requestPickupTimeout + AdditionalRequestMessageTtl; + + var dict = RedisStoredMessageToDictionary(requestMessage); + + await facade.SetInHash(requestKey, dict, ttl, cancellationToken); + } + + /// + /// Atomically Gets and removes the request from the queue. + /// At most one caller of this method will be given the RequestMessage, all + /// other callers will get null. 
+ /// Note: currently a minor issue exists where redis disconnecting mid "Delete" call + /// can result in the Delete succeeding but no caller knowing whether it succeeded. Thus, + /// it might be possible that no one Gets the request. In this case normal heart beat + /// timeouts will cause the request to fail. + /// + /// + /// + /// + /// + public async Task TryGetAndRemoveRequest(Uri endpoint, Guid requestId, CancellationToken cancellationToken) + { + var requestKey = RequestMessageKey(endpoint, requestId); + var dit = await facade.TryGetAndDeleteFromHash(requestKey, RedisStoredMessageHashFields, cancellationToken); + return DictionaryToRedisStoredMessage(dit); + } + + public async Task IsRequestStillOnQueue(Uri endpoint, Guid requestId, CancellationToken cancellationToken) + { + var requestKey = RequestMessageKey(endpoint, requestId); + return await facade.HashContainsKey(requestKey, RequestMessageField, cancellationToken); + } + + // Cancellation channel + // The node processing the request will subscribe to this channel, and the node + // sending the request will publish to this channel when the RPC has been cancelled. + static string RequestCancelledChannelName(Uri endpoint, Guid requestId) + { + return $"{Namespace}::RequestCancelledChannel::{endpoint}::{requestId}"; + } + + /// + /// + /// + /// + /// + /// Called when the RPC has been cancelled. + /// + /// + public async Task SubscribeToRequestCancellation(Uri endpoint, Guid requestId, Func onRpcCancellation, CancellationToken cancellationToken) + { + var channelName = RequestCancelledChannelName(endpoint, requestId); + return await facade.SubscribeToChannel(channelName, async foo => + { + string? 
response = foo.Message; + if (response is not null) await onRpcCancellation(); + }, cancellationToken); + } + + public async Task PublishCancellation(Uri endpoint, Guid requestId, CancellationToken cancellationToken) + { + var channelName = RequestCancelledChannelName(endpoint, requestId); + await facade.PublishToChannel(channelName, "{}", cancellationToken); + } + + // Request cancellation + // Since pub/sub does not have guaranteed delivery, cancellation can also + // be detected by the RequestCancelledMarker. The node processing the request + // will poll for the existence of the RequestCancelledMarker, and if found + // it knows the RPC has been cancelled. + public string RequestCancelledMarkerKey(Uri endpoint, Guid requestId) + { + return $"{Namespace}::RequestCancelledMarker::{endpoint}::{requestId}"; + } + + public async Task MarkRequestAsCancelled(Uri endpoint, Guid requestId, TimeSpan ttl, CancellationToken cancellationToken) + { + var key = RequestCancelledMarkerKey(endpoint, requestId); + await facade.SetString(key, "{}", ttl, cancellationToken); + } + + public async Task IsRequestMarkedAsCancelled(Uri endpoint, Guid requestId, CancellationToken cancellationToken) + { + var key = RequestCancelledMarkerKey(endpoint, requestId); + return (await facade.GetString(key, cancellationToken)) != null; + } + + // Node heartbeat channels (per request). + // Each unique request has two node heart beat channels. + // One channel for the `RequestSenderNode` where the node that executes the RPC, + // publishes heart beats, for the duration of the time it is waiting for the RPC + // to be executed. + // Another channel for the `RequestProcessorNode` where the node that is sending the + // request to the service (e.g. Tentacle) is publishing heart beats, for the duration + // of processing the request. + // Both nodes are able to monitor the heart beat channel of the other node to detect + // if the other node has gone offline. 
+ + static string NodeHeartBeatChannel(Uri endpoint, Guid requestId, HalibutQueueNodeSendingPulses nodeSendingPulsesType) + { + return $"{Namespace}::NodeHeartBeatChannel::{endpoint}::{requestId}::{nodeSendingPulsesType}"; + } + + public async Task SubscribeToNodeHeartBeatChannel( + Uri endpoint, + Guid requestId, + HalibutQueueNodeSendingPulses nodeSendingPulsesType, + Func onHeartBeat, + CancellationToken cancellationToken) + { + var channelName = NodeHeartBeatChannel(endpoint, requestId, nodeSendingPulsesType); + return await facade.SubscribeToChannel(channelName, async foo => + { + string? response = foo.Message; + if (response is not null) await onHeartBeat(); + }, cancellationToken); + } + + public async Task SendNodeHeartBeat(Uri endpoint, Guid requestId, HalibutQueueNodeSendingPulses nodeSendingPulsesType, CancellationToken cancellationToken) + { + var channelName = NodeHeartBeatChannel(endpoint, requestId, nodeSendingPulsesType); + await facade.PublishToChannel(channelName, "{}", cancellationToken); + } + + // Response channel. + // The node processing the request `RequestProcessorNode` will publish to this channel + // once the Response is available. + + static string ResponseChannelName(Uri endpoint, Guid identifier) + { + return $"{Namespace}::ResponseAvailableChannel::{endpoint}::{identifier}"; + } + + public async Task SubscribeToResponseChannel( + Uri endpoint, + Guid identifier, + Func onValueReceived, + CancellationToken cancellationToken) + { + var channelName = ResponseChannelName(endpoint, identifier); + return await facade.SubscribeToChannel(channelName, async foo => + { + string? 
value = foo.Message; + if (value is not null) await onValueReceived(value); + }, cancellationToken); + } + + public async Task PublishThatResponseIsAvailable(Uri endpoint, Guid identifier, CancellationToken cancellationToken) + { + var channelName = ResponseChannelName(endpoint, identifier); + await facade.PublishToChannel(channelName, "{}", cancellationToken); + } + + // Response + // This is where the Response is placed in Redis. + + static string ResponseMessageKey(Uri endpoint, Guid identifier) + { + return $"{Namespace}::Response::{endpoint}::{identifier}"; + } + + public async Task SetResponseMessage(Uri endpoint, Guid identifier, RedisStoredMessage responseMessage, TimeSpan ttl, CancellationToken cancellationToken) + { + var key = ResponseMessageKey(endpoint, identifier); + var dict = RedisStoredMessageToDictionary(responseMessage); + await facade.SetInHash(key, dict, ttl, cancellationToken); + } + + public async Task GetResponseMessage(Uri endpoint, Guid identifier, CancellationToken cancellationToken) + { + var key = ResponseMessageKey(endpoint, identifier); + var dict = await facade.TryGetFromHash(key, RedisStoredMessageHashFields, cancellationToken); + return DictionaryToRedisStoredMessage(dict); + } + + public async Task DeleteResponseMessage(Uri endpoint, Guid identifier, CancellationToken cancellationToken) + { + var key = ResponseMessageKey(endpoint, identifier); + await facade.DeleteHash(key, cancellationToken); + } + + static readonly string RequestMessageField = "RequestMessageField"; + static readonly string DataStreamMetaDataField = "DataStreamMetaDataField"; + static string[] RedisStoredMessageHashFields => new[] { RequestMessageField, DataStreamMetaDataField }; + + static RedisStoredMessage? DictionaryToRedisStoredMessage(Dictionary? 
dit) + { + if(dit == null) return null; + var requestMessage = dit[RequestMessageField]!; + + // As it turns out Redis or our client seems to treat "" as null, which is insane + // and results in us needing to deal with that here. + var dataStreamMetaData = ""; + if(dit.TryGetValue(DataStreamMetaDataField, out var dataStreamMetaDataFromRedis)) + { + dataStreamMetaData = dataStreamMetaDataFromRedis ?? ""; + } + + return new RedisStoredMessage(requestMessage, dataStreamMetaData); + } + + static Dictionary RedisStoredMessageToDictionary(RedisStoredMessage requestMessage) + { + var dict = new Dictionary(); + dict[RequestMessageField] = requestMessage.Message; + dict[DataStreamMetaDataField] = requestMessage.DataStreamMetadata; + return dict; + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisHelpers/IHalibutRedisTransport.cs b/source/Halibut/Queue/Redis/RedisHelpers/IHalibutRedisTransport.cs new file mode 100644 index 000000000..30fb4b771 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisHelpers/IHalibutRedisTransport.cs @@ -0,0 +1,60 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Queue.Redis.NodeHeartBeat; +using Halibut.Queue.Redis.RedisHelpers; +using StackExchange.Redis; + +namespace Halibut.Queue.Redis +{ + public interface IHalibutRedisTransport + { + Task SubscribeToRequestMessagePulseChannel(Uri endpoint, Action onRequestMessagePulse, CancellationToken cancellationToken); + Task PulseRequestPushedToEndpoint(Uri endpoint, CancellationToken cancellationToken); + + + Task PushRequestGuidOnToQueue(Uri endpoint, Guid guid, CancellationToken cancellationToken); + Task TryPopNextRequestGuid(Uri endpoint, CancellationToken cancellationToken); + + + Task PutRequest(Uri endpoint, Guid requestId, RedisStoredMessage requestMessage, TimeSpan requestPickupTimeout, CancellationToken cancellationToken); + Task TryGetAndRemoveRequest(Uri endpoint, Guid requestId, 
CancellationToken cancellationToken); + Task IsRequestStillOnQueue(Uri endpoint, Guid requestId, CancellationToken cancellationToken); + + Task SubscribeToRequestCancellation( + Uri endpoint, + Guid requestId, + Func onRpcCancellation, + CancellationToken cancellationToken); + Task PublishCancellation(Uri endpoint, Guid requestId, CancellationToken cancellationToken); + + Task MarkRequestAsCancelled(Uri endpoint, Guid requestId, TimeSpan ttl, CancellationToken cancellationToken); + Task IsRequestMarkedAsCancelled(Uri endpoint, Guid requestId, CancellationToken cancellationToken); + + + Task SubscribeToNodeHeartBeatChannel( + Uri endpoint, + Guid requestId, + HalibutQueueNodeSendingPulses nodeSendingPulsesType, + Func onHeartBeat, + CancellationToken cancellationToken); + + Task SendNodeHeartBeat(Uri endpoint, Guid requestId, HalibutQueueNodeSendingPulses nodeSendingPulsesType, CancellationToken cancellationToken); + + + Task SubscribeToResponseChannel( + Uri endpoint, + Guid identifier, + Func onValueReceived, + CancellationToken cancellationToken); + Task PublishThatResponseIsAvailable(Uri endpoint, Guid identifier, CancellationToken cancellationToken); + + + Task SetResponseMessage(Uri endpoint, Guid identifier, RedisStoredMessage responseMessage, TimeSpan ttl, CancellationToken cancellationToken); + Task GetResponseMessage(Uri endpoint, Guid identifier, CancellationToken cancellationToken); + Task DeleteResponseMessage(Uri endpoint, Guid identifier, CancellationToken cancellationToken); + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisHelpers/RedisFacade.cs b/source/Halibut/Queue/Redis/RedisHelpers/RedisFacade.cs new file mode 100644 index 000000000..89617d018 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisHelpers/RedisFacade.cs @@ -0,0 +1,381 @@ +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Linq; +using System.Threading; +using System.Threading.Tasks; +using 
Halibut.Diagnostics; +using Halibut.Util; +using StackExchange.Redis; + +namespace Halibut.Queue.Redis.RedisHelpers +{ + public class RedisFacade : IAsyncDisposable + { + readonly Lazy connection; + readonly ILog log; + // We can survive redis being unavailable for this amount of time. + // Generally redis will try for 5s, we add our own retries to try for longer. + internal TimeSpan MaxDurationToRetryFor = TimeSpan.FromSeconds(30); + + ConnectionMultiplexer Connection => connection.Value; + + /// + /// All Keys will be prefixed with this, this allows for multiple halibuts to use + /// the same redis without interfering with each other. + /// + readonly string keyPrefix; + + readonly CancelOnDisposeCancellationToken objectLifetimeCts; + readonly CancellationToken objectLifeTimeCancellationToken; + + public RedisFacade(string configuration, string keyPrefix, ILog log) : this(ConfigurationOptions.Parse(configuration), keyPrefix, log) + { + + } + public RedisFacade(ConfigurationOptions redisOptions, string keyPrefix, ILog log) + { + this.keyPrefix = keyPrefix; + this.log = log.ForContext(); + objectLifetimeCts = new CancelOnDisposeCancellationToken(); + objectLifeTimeCancellationToken = objectLifetimeCts.Token; + + // Tells the client to make multiple attempts to create the TCP connection to redis. + redisOptions.AbortOnConnectFail = false; + + connection = new Lazy(() => + { + var multiplexer = ConnectionMultiplexer.Connect(redisOptions); + + // Subscribe to connection events + multiplexer.ConnectionFailed += OnConnectionFailed; + multiplexer.ConnectionRestored += OnConnectionRestored; + multiplexer.ErrorMessage += OnErrorMessage; + + return multiplexer; + }); + } + + void OnConnectionFailed(object? sender, ConnectionFailedEventArgs e) + { + log.Write(EventType.Error, "Redis connection failed - EndPoint: {0}, Failure: {1}, Exception: {2}", e.EndPoint, e.FailureType, e.Exception?.Message); + } + + void OnErrorMessage(object? 
sender, RedisErrorEventArgs e) + { + log.Write(EventType.Error, "Redis error - EndPoint: {0}, Message: {1}", e.EndPoint, e.Message); + } + + void OnConnectionRestored(object? sender, ConnectionFailedEventArgs e) + { + log.Write(EventType.Diagnostic, "Redis connection restored - EndPoint: {0}", e.EndPoint); + } + + async Task ExecuteWithRetry(Func> operation, CancellationToken cancellationToken) + { + await using var linkedTokenSource = new CancelOnDisposeCancellationToken(cancellationToken, objectLifeTimeCancellationToken); + var combinedToken = linkedTokenSource.Token; + + var retryDelay = TimeSpan.FromSeconds(1); + var stopwatch = Stopwatch.StartNew(); + + while (true) + { + combinedToken.ThrowIfCancellationRequested(); + + try + { + return await operation(); + } + catch (Exception ex) when (stopwatch.Elapsed < MaxDurationToRetryFor && !combinedToken.IsCancellationRequested) + { + log?.Write(EventType.Diagnostic, $"Redis operation failed, retrying in {retryDelay.TotalSeconds}s: {ex.Message}"); + await Task.Delay(retryDelay, combinedToken); + } + } + } + + async Task ExecuteWithRetry(Func operation, CancellationToken cancellationToken) + { + await using var linkedTokenSource = new CancelOnDisposeCancellationToken(cancellationToken, objectLifeTimeCancellationToken); + var combinedToken = linkedTokenSource.Token; + + var retryDelay = TimeSpan.FromSeconds(1); + var stopwatch = Stopwatch.StartNew(); + + while (true) + { + combinedToken.ThrowIfCancellationRequested(); + + try + { + await operation(); + return; + } + catch (Exception ex) when (stopwatch.Elapsed < MaxDurationToRetryFor && !combinedToken.IsCancellationRequested) + { + log?.Write(EventType.Diagnostic, $"Redis operation failed, retrying in {retryDelay.TotalSeconds}s: {ex.Message}"); + await Task.Delay(retryDelay, combinedToken); + } + } + } + + public bool IsConnected => connection.IsValueCreated && Connection.IsConnected; + + public async ValueTask DisposeAsync() + { + await Try.IgnoringError(async () => 
await objectLifetimeCts.DisposeAsync()); + + if (connection.IsValueCreated) + { + var conn = connection.Value; + + Try.IgnoringError(() => + { + // Unsubscribe from events before disposing + conn.ConnectionFailed -= OnConnectionFailed; + conn.ConnectionRestored -= OnConnectionRestored; + conn.ErrorMessage -= OnErrorMessage; + }); + + await Try.IgnoringError(async () => await conn.DisposeAsync()); + } + } + + + internal int TotalSubscribers = 0; + + public async Task SubscribeToChannel(string channelName, Func onMessage, CancellationToken cancellationToken) + { + channelName = ToPrefixedChannelName(channelName); + while (true) + { + cancellationToken.ThrowIfCancellationRequested(); + try + { + // This can throw if we are unable to connect to redis. + var channel = await Connection.GetSubscriber() + .SubscribeAsync(new RedisChannel(channelName, RedisChannel.PatternMode.Literal)); + + var disposable = new FuncAsyncDisposable(async () => + { + Interlocked.Decrement(ref TotalSubscribers); + await Try.IgnoringError(async () => await channel.UnsubscribeAsync()); + }); + + Interlocked.Increment(ref TotalSubscribers); + try + { + // Once we are connected to redis, it seems even if the connection to redis dies. + // The client will take care of re-connecting to redis. 
+ channel.OnMessage(onMessage); + } + catch (Exception) + { + await disposable.DisposeAsync(); + throw; + } + + return disposable; + } + catch (Exception ex) + { + log?.WriteException(EventType.Diagnostic, "Failed to subscribe to Redis channel {0}, retrying in 2 seconds", ex, channelName); + await Try.IgnoringError(async () => await Task.Delay(2000, cancellationToken)); + } + } + } + + string ToPrefixedChannelName(string channelName) + { + return "channel:" + keyPrefix + ":" + channelName; + } + + public async Task PublishToChannel(string channelName, string payload, CancellationToken cancellationToken) + { + channelName = ToPrefixedChannelName(channelName); + await ExecuteWithRetry(async () => + { + var subscriber = Connection.GetSubscriber(); + await subscriber.PublishAsync(new RedisChannel(channelName, RedisChannel.PatternMode.Literal), payload); + }, cancellationToken); + } + + public async Task SetInHash(string key, Dictionary values, TimeSpan ttl, CancellationToken cancellationToken) + { + var hashKey = ToHashKey(key); + + var hashEntries = values.Select(v => new HashEntry(v.Key, v.Value)).ToArray(); + + await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + await database.HashSetAsync(hashKey, hashEntries); + }, cancellationToken); + + await SetTtlForKeyRaw(hashKey, ttl, cancellationToken); + } + + RedisKey ToHashKey(string key) + { + return "hash:" + keyPrefix + ":" + key; + } + + public async Task HashContainsKey(string key, string field, CancellationToken cancellationToken) + { + var hashKey = ToHashKey(key); + return await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + return await database.HashExistsAsync(hashKey, new RedisValue(field)); + }, cancellationToken); + } + + public async Task?> TryGetAndDeleteFromHash(string key, string[] fields, CancellationToken cancellationToken) + { + var hashKey = ToHashKey(key); + + Dictionary? 
dict = await RawKeyReadHashFieldsToDictionary(hashKey, fields, cancellationToken); + + // Retrying makes this non-idempotent: the key can be deleted on redis + // without us receiving the response saying it was deleted. We then try again and get told + // it is already deleted. + // In the Redis Queue this can result in nobody picking up the Request, and the + // request eventually timing out. + var res = await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + return await database.KeyDeleteAsync(hashKey); + }, cancellationToken); + + if (!res) + { + // Someone else deleted this, so return nothing to make the get and delete appear to be atomic. + return null; + } + return dict; + } + + public async Task?> TryGetFromHash(string key, string[] fields, CancellationToken cancellationToken) + { + var hashKey = ToHashKey(key); + + return await RawKeyReadHashFieldsToDictionary(hashKey, fields, cancellationToken); + } + + + public async Task DeleteHash(string key, CancellationToken cancellationToken) + { + var hashKey = ToHashKey(key); + + await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + return await database.KeyDeleteAsync(hashKey); + }, cancellationToken); + } + + async Task?> RawKeyReadHashFieldsToDictionary(RedisKey hashKey, string[] fields, CancellationToken cancellationToken) + { + var dict = new Dictionary(); + foreach (var field in fields) + { + // Retry each operation independently + var value = await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + return await database.HashGetAsync(hashKey, new RedisValue(field)); + }, cancellationToken); + if(value.HasValue) dict[field] = value; + } + + if (dict.Count == 0) return null; + + return dict; + } + + RedisKey ToListKey(string key) + { + return "list:" + keyPrefix + ":" + key; + } + + public async Task ListRightPushAsync(string key, string payload, TimeSpan ttlForAllInList, CancellationToken cancellationToken) + { + var
listKey = ToListKey(key); + await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + await database.ListRightPushAsync(listKey, payload); + }, cancellationToken); + + await SetTtlForKeyRaw(listKey, ttlForAllInList, cancellationToken); + } + + public async Task ListLeftPopAsync(string key, CancellationToken cancellationToken) + { + var listKey = ToListKey(key); + return await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + var value = await database.ListLeftPopAsync(listKey); + if (value.IsNull) + { + return null; + } + + return value; + }, cancellationToken); + } + + RedisKey ToStringKey(string key) + { + return "string:" + keyPrefix + ":" + key; + } + + public async Task SetString(string key, string value, TimeSpan ttl, CancellationToken cancellationToken) + { + var stringKey = ToStringKey(key); + await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + await database.StringSetAsync(stringKey, value); + }, cancellationToken); + + await SetTtlForKeyRaw(stringKey, ttl, cancellationToken); + } + + public async Task SetTtlForString(string key, TimeSpan ttl, CancellationToken cancellationToken) + { + await SetTtlForKeyRaw(ToStringKey(key), ttl, cancellationToken); + } + + public async Task GetString(string key, CancellationToken cancellationToken) + { + var stringKey = ToStringKey(key); + return await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + return await database.StringGetAsync(stringKey); + }, cancellationToken); + } + + public async Task DeleteString(string key, CancellationToken cancellationToken) + { + var stringKey = ToStringKey(key); + return await ExecuteWithRetry(async () => + { + var database = Connection.GetDatabase(); + return await database.KeyDeleteAsync(stringKey); + }, cancellationToken); + } + + async Task SetTtlForKeyRaw(RedisKey key, TimeSpan ttl, CancellationToken cancellationToken) + { + await ExecuteWithRetry(async () => + { + 
var database = Connection.GetDatabase(); + await database.KeyExpireAsync(key, ttl); + }, cancellationToken); + } + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisHelpers/RedisStoredMessage.cs b/source/Halibut/Queue/Redis/RedisHelpers/RedisStoredMessage.cs new file mode 100644 index 000000000..059aa6a12 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisHelpers/RedisStoredMessage.cs @@ -0,0 +1,15 @@ +namespace Halibut.Queue.Redis.RedisHelpers +{ + public class RedisStoredMessage + { + public RedisStoredMessage(string message, string dataStreamMetadata) + { + Message = message; + DataStreamMetadata = dataStreamMetadata; + } + + public string Message { get; } + + public string DataStreamMetadata { get; } + } +} \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisPendingRequest.cs b/source/Halibut/Queue/Redis/RedisPendingRequest.cs new file mode 100644 index 000000000..ff3cf2b27 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisPendingRequest.cs @@ -0,0 +1,240 @@ +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Exceptions; +using Halibut.ServiceModel; +using Halibut.Transport; +using Halibut.Transport.Protocol; +using Halibut.Util; +using Nito.AsyncEx; + +namespace Halibut.Queue.Redis +{ + public class RedisPendingRequest : IDisposable + { + readonly RequestMessage request; + readonly ILog log; + readonly AsyncManualResetEvent responseWaiter = new(false); + readonly SemaphoreSlim transferLock = new(1, 1); + readonly AsyncManualResetEvent requestCollected = new(false); + readonly CancellationTokenSource pendingRequestCancellationTokenSource; + ResponseMessage? 
response; + + public RedisPendingRequest(RequestMessage request, ILog log) + { + this.request = request; + this.log = log.ForContext(); + + pendingRequestCancellationTokenSource = new CancellationTokenSource(); + PendingRequestCancellationToken = pendingRequestCancellationTokenSource.Token; + } + + public Task WaitForRequestToBeMarkedAsCollected(CancellationToken cancellationToken) => requestCollected.WaitAsync(cancellationToken); + + public bool HasRequestBeenMarkedAsCollected => requestCollected.IsSet; + + public RequestMessage Request => request; + + /// + /// + /// + /// + /// This will be called either when the pick-up timeout has elapsed OR if the Cancellation Token has been triggered. + /// This gives the user an opportunity to remove the pending request from shared places and optionally + /// call BeginTransfer + /// + /// Should the cancellationToken be triggered, this allows for overriding + /// the reason the cancellation token was triggered. The returned error will be thrown. + /// + public async Task WaitUntilComplete(Func checkIfPendingRequestWasCollectedOrRemoveIt, + Func overrideCancellationReason, + CancellationToken cancellationToken) + { + log.Write(EventType.MessageExchange, "Request {0} was queued", request); + + var pendingRequestPickupTimeout = Try.IgnoringError(async () => await Task.Delay(request.Destination.PollingRequestQueueTimeout, cancellationToken)); + var responseWaiterTask = responseWaiter.WaitAsync(cancellationToken); + + await Task.WhenAny(pendingRequestPickupTimeout, responseWaiterTask); + + // Response has been returned so just say we are done. 
+ if (responseWaiter.IsSet) + { + log.Write(EventType.MessageExchange, "Request {0} was collected by the polling endpoint", request); + return; + } + + if (!requestCollected.IsSet) + { + await checkIfPendingRequestWasCollectedOrRemoveIt(); + } + + using (await transferLock.LockAsync(CancellationToken.None)) + { + if (responseWaiter.IsSet) + { + log.Write(EventType.MessageExchange, "Request {0} was collected by the polling endpoint", request); + return; + } + + if (cancellationToken.IsCancellationRequested) + { + await Try.IgnoringError(async () => await pendingRequestCancellationTokenSource.CancelAsync()); + + var cancellationException = overrideCancellationReason(); + if (cancellationException != null) + { + log.Write(EventType.MessageExchange, "Request {0} did not complete because: " + cancellationException.Message, request); + throw cancellationException; + } + + OperationCanceledException operationCanceledException; + if (!requestCollected.IsSet) + { + operationCanceledException = CreateExceptionForRequestWasCancelledBeforeCollected(request, log); + } + else + { + log.Write(EventType.MessageExchange, "Request {0} was collected by the polling endpoint, will try to cancel the request", request); + operationCanceledException = new OperationCanceledException($"Request {request} was collected by the polling endpoint, will try to cancel the request"); + } + + throw requestCollected.IsSet + ? new TransferringRequestCancelledException(operationCanceledException) + : new ConnectingRequestCancelledException(operationCanceledException); + } + + if (!requestCollected.IsSet) + { + // Request was not collected within the pickup time. + // Prevent anyone from processing the request further. 
+ await Try.IgnoringError(async () => await pendingRequestCancellationTokenSource.CancelAsync()); + + log.Write(EventType.MessageExchange, "Request {0} timed out before it could be collected by the polling endpoint", request); + SetResponseNoLock(ResponseMessage.FromException( + request, + new TimeoutException($"A request was sent to a polling endpoint, but the polling endpoint did not collect the request within the allowed time ({request.Destination.PollingRequestQueueTimeout}), so the request timed out."), + ConnectionState.Connecting), + requestWasCollected: false); + return; + } + } + + // The request has been collected so now wait patiently for a response + log.Write(EventType.MessageExchange, "Request {0} was eventually collected by the polling endpoint", request); + try + { + await responseWaiterTask; + } + catch (Exception) when (cancellationToken.IsCancellationRequested) + { + using (await transferLock.LockAsync(CancellationToken.None)) + { + if (!responseWaiter.IsSet) + { + var cancellationException = overrideCancellationReason(); + if (cancellationException != null) + { + await Try.IgnoringError(async () => await pendingRequestCancellationTokenSource.CancelAsync()); + log.Write(EventType.MessageExchange, "Request {0} did not complete because: " + cancellationException.Message, request); + throw cancellationException; + } + + log.Write(EventType.MessageExchange, "Request {0} was cancelled before a response was received", request); + SetResponseNoLock(ResponseMessage.FromException( + request, + new TimeoutException("A request was sent to a polling endpoint, the polling endpoint collected it but the request was cancelled before the polling endpoint responded."), + ConnectionState.Connecting), + requestWasCollected: false); + await Try.IgnoringError(async () => await pendingRequestCancellationTokenSource.CancelAsync()); + } + } + } + catch (Exception) + { + // This should never happen. 
+ log.Write(EventType.MessageExchange, "Request {0} had an internal error, unexpectedly stopped waiting for the response.", request); + await SetResponseAsync(ResponseMessage.FromException( + request, + new PendingRequestQueueInternalException($"Request {request.Id} had an internal error, unexpectedly stopped waiting for the response.")), + requestWasCollected: false); + } + } + + public static OperationCanceledException CreateExceptionForRequestWasCancelledBeforeCollected(RequestMessage request, ILog log) + { + log.Write(EventType.MessageExchange, "Request {0} was cancelled before it could be collected by the polling endpoint", request); + return new OperationCanceledException($"Request {request} was cancelled before it could be collected by the polling endpoint"); + } + + public async Task RequestHasBeenCollectedAndWillBeTransferred() + { + // The PendingRequest is Disposed at the end of QueueAndWaitAsync but a race condition + // exists in the current approach that means DequeueAsync could pick this request up after + // it has been disposed. At that point we are no longer interested in the PendingRequest so + // this is "ok" and wrapping BeginTransfer in a try..catch.. ensures we don't error if the + // race condition occurs and also stops the polling tentacle dequeuing the request successfully. + try + { + using (await transferLock.LockAsync(CancellationToken.None)) + { + // Check if the request has already been completed or if the request has been cancelled + // to ensure we don't dequeue an already completed or already cancelled request + + var requestHasBeenCollected = this.requestCollected.IsSet; + requestCollected.Set(); + return !requestHasBeenCollected + && !responseWaiter.IsSet + && !pendingRequestCancellationTokenSource.IsCancellationRequested; + } + } + catch (ObjectDisposedException) + { + return false; + } + } + + public ResponseMessage Response => response ?? 
throw new InvalidOperationException("Response has not been set."); + public CancellationToken PendingRequestCancellationToken { get; } + + public async Task SetResponse(ResponseMessage response) + { + // If someone is calling this then we know for sure they collected the request + return await SetResponseAsync(response, requestWasCollected: true); + } + + async Task SetResponseAsync(ResponseMessage response, bool requestWasCollected) + { + using (await transferLock.LockAsync(CancellationToken.None)) + { + return SetResponseNoLock(response, requestWasCollected); + } + } + + ResponseMessage SetResponseNoLock(ResponseMessage response, bool requestWasCollected) + { + if (this.response != null) + { + return this.response; + } + + this.response = response; + responseWaiter.Set(); + if (requestWasCollected) + { + requestCollected.Set(); // Also the request has been collected, if we have a response. + } + + return this.response; + } + + public void Dispose() + { + transferLock?.Dispose(); + } + } + +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisPendingRequestQueue.cs b/source/Halibut/Queue/Redis/RedisPendingRequestQueue.cs new file mode 100644 index 000000000..e55f399f5 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisPendingRequestQueue.cs @@ -0,0 +1,527 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Collections.Concurrent; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Queue.Redis.Cancellation; +using Halibut.Queue.Redis.Exceptions; +using Halibut.Queue.Redis.MessageStorage; +using Halibut.Queue.Redis.NodeHeartBeat; +using Halibut.Queue.Redis.RedisDataLossDetection; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Queue.Redis.ResponseMessageTransfer; +using Halibut.ServiceModel; +using Halibut.Transport.Protocol; +using Halibut.Util; +using Nito.AsyncEx; + +namespace Halibut.Queue.Redis +{ + class RedisPendingRequestQueue : IPendingRequestQueue, IDisposable + { + 
readonly Uri endpoint; + readonly IWatchForRedisLosingAllItsData watchForRedisLosingAllItsData; + readonly ILog log; + readonly IHalibutRedisTransport halibutRedisTransport; + readonly HalibutTimeoutsAndLimits halibutTimeoutsAndLimits; + readonly IMessageSerialiserAndDataStreamStorage messageSerialiserAndDataStreamStorage; + readonly AsyncManualResetEvent hasItemsForEndpoint = new(); + + readonly CancelOnDisposeCancellationToken queueCts = new (); + internal ConcurrentDictionary DisposablesForInFlightRequests = new(); + + readonly CancellationToken queueToken; + + // Used for testing. + int numberOfInFlightRequestsThatHaveReachedTheStageOfBeingReadyForCollection = 0; + + Task RequestMessageAvailablePulseChannelSubscriberDisposer { get; } + + public bool IsEmpty => Count == 0; + public int Count => numberOfInFlightRequestsThatHaveReachedTheStageOfBeingReadyForCollection; + + // The timespan is more generous for the sender going offline, since if it does go offline, + // under some cases the request completing is advantageous. That node needs to + // re-do the entire RPC for idempotent RPCs this might mean that the task required is already done. + internal TimeSpan RequestSenderNodeHeartBeatTimeout { get; set; } = TimeSpan.FromSeconds(90); + + // How often the Request Sender sends a heart beat. + internal TimeSpan RequestSenderNodeHeartBeatRate { get; set; } = TimeSpan.FromSeconds(15); + + /// + /// The amount of time since the last heart beat from the node sending the request to Tentacle + /// before the node is assumed to be offline. + /// + /// Setting this too high means things above the RPC might not have time to retry. + /// + public TimeSpan RequestReceiverNodeHeartBeatTimeout { get; set; } = TimeSpan.FromSeconds(60); + + // How often the Request Receiver node sends a heart beat. + internal TimeSpan RequestReceiverNodeHeartBeatRate { get; set; } = TimeSpan.FromSeconds(15); + + // How long the response message can live in redis. 
+ internal TimeSpan TTLOfResponseMessage { get; set; } = TimeSpan.FromMinutes(20); + + internal TimeSpan TimeBetweenCheckingIfRequestWasCollected { get; set; } = TimeSpan.FromSeconds(30); + + public RedisPendingRequestQueue( + Uri endpoint, + IWatchForRedisLosingAllItsData watchForRedisLosingAllItsData, + ILog log, + IHalibutRedisTransport halibutRedisTransport, + IMessageSerialiserAndDataStreamStorage messageSerialiserAndDataStreamStorage, + HalibutTimeoutsAndLimits halibutTimeoutsAndLimits) + { + this.endpoint = endpoint; + this.watchForRedisLosingAllItsData = watchForRedisLosingAllItsData; + this.log = log.ForContext(); + this.messageSerialiserAndDataStreamStorage = messageSerialiserAndDataStreamStorage; + this.halibutRedisTransport = halibutRedisTransport; + this.halibutTimeoutsAndLimits = halibutTimeoutsAndLimits; + this.queueToken = queueCts.Token; + + // Ideally we would only subscribe subscribers which are using this queue. + RequestMessageAvailablePulseChannelSubscriberDisposer = Task.Run(async () => await this.halibutRedisTransport.SubscribeToRequestMessagePulseChannel(endpoint, _ => hasItemsForEndpoint.Set(), queueToken)); + } + + internal async Task WaitUntilQueueIsSubscribedToReceiveMessages() => await RequestMessageAvailablePulseChannelSubscriberDisposer; + + async Task DataLossCancellationToken(CancellationToken? cancellationToken) + { + await using var cts = new CancelOnDisposeCancellationToken(queueCts.Token, cancellationToken ?? 
CancellationToken.None); + return await watchForRedisLosingAllItsData.GetTokenForDataLossDetection(TimeSpan.FromSeconds(30), cts.Token); + } + + public async Task QueueAndWaitAsync(RequestMessage request, CancellationToken requestCancellationToken) + { + CancellationToken dataLossCt; + try + { + dataLossCt = await DataLossCancellationToken(requestCancellationToken); + } + catch (Exception ex) + { + if (requestCancellationToken.IsCancellationRequested) throw RedisPendingRequest.CreateExceptionForRequestWasCancelledBeforeCollected(request, log); + throw new CouldNotGetDataLossTokenInTimeHalibutClientException("Unable to reconnect to redis to get data loss detection CT", ex); + } + + Exception? CancellationReason() + { + if (dataLossCt.IsCancellationRequested) return new RedisDataLossHalibutClientException($"Request {request.ActivityId} was cancelled because we detected that redis lost all of its data."); + if (queueToken.IsCancellationRequested) return new RedisQueueShutdownClientException($"Request {request.ActivityId} was cancelled because the queue is shutting down."); + return null; + } + + Exception? CreateCancellationExceptionIfCancelled() + { + if (requestCancellationToken.IsCancellationRequested) return RedisPendingRequest.CreateExceptionForRequestWasCancelledBeforeCollected(request, log); + return CancellationReason(); + } + + + await using var cts = new CancelOnDisposeCancellationToken(queueCts.Token, requestCancellationToken, dataLossCt); + var cancellationToken = cts.Token; + + using var pending = new RedisPendingRequest(request, log); + + RedisStoredMessage messageToStore; + try + { + messageToStore = await messageSerialiserAndDataStreamStorage.PrepareRequest(request, cancellationToken); + } + catch (Exception ex) + { + throw CreateCancellationExceptionIfCancelled() + ?? 
new ErrorWhilePreparingRequestForQueueHalibutClientException($"Request {request.ActivityId} failed since an error occured when preparing request for queue", ex); + } + + + // Start listening for a response to the request, we don't want to miss the response. + await using var pollAndSubscribeToResponse = new PollAndSubscribeToResponse(endpoint, request.ActivityId, halibutRedisTransport, log); + + var tryClearRequestFromQueueAtMostOnce = new AsyncLazy(async () => await TryClearRequestFromQueue(pending)); + try + { + await using var senderPulse = new NodeHeartBeatSender(endpoint, request.ActivityId, halibutRedisTransport, log, HalibutQueueNodeSendingPulses.RequestSenderNode, RequestSenderNodeHeartBeatRate); + // Make the request available before we tell people it is available. + try + { + await halibutRedisTransport.PutRequest(endpoint, request.ActivityId, messageToStore, request.Destination.PollingRequestQueueTimeout, cancellationToken); + await halibutRedisTransport.PushRequestGuidOnToQueue(endpoint, request.ActivityId, cancellationToken); + await halibutRedisTransport.PulseRequestPushedToEndpoint(endpoint, cancellationToken); + } + catch (Exception ex) + { + throw CreateCancellationExceptionIfCancelled() + ?? new ErrorOccuredWhenInsertingDataIntoRedisHalibutPendingRequestQueueHalibutClientException($"Request {request.ActivityId} failed since an error occured inserting the data into the queue", ex); + } + + Interlocked.Increment(ref numberOfInFlightRequestsThatHaveReachedTheStageOfBeingReadyForCollection); + try + { + // We must be careful here to ensure we will always return. 
+ + var watchProcessingNodeStillHasHeartBeat = WatchProcessingNodeIsStillConnectedInBackground(request, pending, cancellationToken); + var waitingForResponse = WaitForResponse(pollAndSubscribeToResponse, request, cancellationToken); + var pendingRequestWaitUntilComplete = pending.WaitUntilComplete( + async () => await tryClearRequestFromQueueAtMostOnce.Task, + CancellationReason, + cancellationToken); + + cts.AwaitTasksBeforeCTSDispose(watchProcessingNodeStillHasHeartBeat, waitingForResponse, pendingRequestWaitUntilComplete); + + await Task.WhenAny(waitingForResponse, pendingRequestWaitUntilComplete, watchProcessingNodeStillHasHeartBeat); + + if (pendingRequestWaitUntilComplete.IsCompleted || cancellationToken.IsCancellationRequested) + { + await pendingRequestWaitUntilComplete; + return pending.Response!; + } + + if (waitingForResponse.IsCompleted) + { + var response = await waitingForResponse; + if (response != null) + { + return await pending.SetResponse(response); + } + else if(!cancellationToken.IsCancellationRequested) + { + // We are no longer waiting for a response and have no response. + // The cancellation token has not been set so the request is not going to be cancelled. + // It is unclear how we got into this state, but let's at least error out. + return await pending.SetResponse(ResponseMessage.FromError(request, "Queue unexpectedly stopped waiting for a response")); + } + } + + if (watchProcessingNodeStillHasHeartBeat.IsCompleted) + { + var watcherResult = await watchProcessingNodeStillHasHeartBeat; + if (watcherResult == NodeWatcherResult.NodeMayHaveDisconnected) + { + // Make a last-ditch effort to check if a response exists now.
+ if (await pollAndSubscribeToResponse.TryGetResponseFromRedis("Watcher", cancellationToken)) + { + var response = await waitingForResponse; + if (response != null) + { + return await pending.SetResponse(response); + } + } + + return await pending.SetResponse(ResponseMessage.FromError(request, "The node processing the request did not send a heartbeat for long enough, and so the node is now assumed to be offline.")); + } + } + + return await pending.SetResponse(ResponseMessage.FromError(request, "Impossible queue state reached")); + } + finally + { + Interlocked.Decrement(ref numberOfInFlightRequestsThatHaveReachedTheStageOfBeingReadyForCollection); + } + } + finally + { + InBackgroundSendCancellationIfRequestWasCancelled(request, pending); + // Make an attempt to ensure the request is removed from redis, if we are unsure it was removed. + var background = Task.Run(async () => await Try.IgnoringError(async () => + { + if (pending.HasRequestBeenMarkedAsCollected + || !pollAndSubscribeToResponse.ResponseJson.IsCompletedSuccessfully) + { + await tryClearRequestFromQueueAtMostOnce.Task; + } + })); + } + } + + + void InBackgroundSendCancellationIfRequestWasCancelled(RequestMessage request, RedisPendingRequest redisPending) + { + if (redisPending.PendingRequestCancellationToken.IsCancellationRequested) + { + log.Write(EventType.Diagnostic, "Request {0} was cancelled, sending cancellation to endpoint {1}", request.ActivityId, endpoint); + Task.Run(async () => await RequestCancelledSender.TrySendCancellation(halibutRedisTransport, endpoint, request, log)); + } + else + { + log.Write(EventType.Diagnostic, "Request {0} was not cancelled, no cancellation needed for endpoint {1}", request.ActivityId, endpoint); + } + } + + async Task WatchProcessingNodeIsStillConnectedInBackground(RequestMessage request, RedisPendingRequest redisPending, CancellationToken cancellationToken) + { + await Task.Yield(); + + return await 
NodeHeartBeatWatcher.WatchThatNodeProcessingTheRequestIsStillAlive( + endpoint, + request, + redisPending, + halibutRedisTransport, + TimeBetweenCheckingIfRequestWasCollected, + log, + RequestReceiverNodeHeartBeatTimeout, + cancellationToken); + } + + async Task TryClearRequestFromQueue(RedisPendingRequest redisPending) + { + var request = redisPending.Request; + log.Write(EventType.Diagnostic, "Attempting to clear request {0} from queue for endpoint {1}", request.ActivityId, endpoint); + + // The time the message is allowed to sit on the queue for has elapsed. + // Let's try to pop it from the queue, either: + // - We pop it, which means it was never collected, so let the pending request deal with the timeout. + // - We could not pop it, which means it was collected. + try + { + if (redisPending.HasRequestBeenMarkedAsCollected) + { + log.Write(EventType.Diagnostic, "Request {0} has already been marked as collected, skipping queue removal for endpoint {1}", request.ActivityId, endpoint); + return false; + } + await using var cts = new CancelOnDisposeCancellationToken(); + cts.CancelAfter(TimeSpan.FromMinutes(2)); // Best efforts.
+ var requestMessage = await halibutRedisTransport.TryGetAndRemoveRequest(endpoint, request.ActivityId, cts.Token); + if (requestMessage != null) + { + log.Write(EventType.Diagnostic, "Successfully removed request {0} from queue - request was never collected by a processing node", request.ActivityId); + return true; + } + else + { + await redisPending.RequestHasBeenCollectedAndWillBeTransferred(); + log.Write(EventType.Diagnostic, "Request {0} was not found in queue - it was already collected by a processing node", request.ActivityId); + } + } + catch (Exception ex) + { + log.WriteException(EventType.Error, "Failed to clear request {0} from queue for endpoint {1}", ex, request.ActivityId, endpoint); + } + return false; + } + + async Task WaitForResponse( + PollAndSubscribeToResponse pollAndSubscribeToResponse, + RequestMessage requestMessage, + CancellationToken cancellationToken) + { + await Task.Yield(); + var activityId = requestMessage.ActivityId; + RedisStoredMessage responseJson; + try + { + log.Write(EventType.Diagnostic, "Waiting for response for request {0}", activityId); + responseJson = await pollAndSubscribeToResponse.ResponseJson.WaitAsync(cancellationToken); + log.Write(EventType.Diagnostic, "Received response JSON for request {0}, deserializing", activityId); + } + catch (Exception ex) + { + log.WriteException(EventType.Error, "Error while processing response for request {0}", ex, activityId); + return null; + } + + try + { + var response = await messageSerialiserAndDataStreamStorage.ReadResponse(responseJson, cancellationToken); + log.Write(EventType.Diagnostic, "Successfully deserialized response for request {0}", activityId); + return response; + } + catch (Exception ex) + { + log.Write(EventType.Error, "Error deserializing response for request {0}", activityId); + return ResponseMessage.FromException(requestMessage, new Exception("Error occured when reading data from the queue", ex)); + } + } + + public async Task DequeueAsync(CancellationToken 
cancellationToken) + { + // Is it good or bad that redis exceptions will bubble out of here? + // It will kill the TCP connection, which will force a re-connect (perhaps with a backoff). + // This could result in connecting to a node that is actually connected to redis. It could also + // cause a cascade of failures from high load. + var pending = await DequeueNextAsync(); + if (pending == null) return null; + + var disposables = new DisposableCollection(); + try + { + // There is a chance the data loss occurred after we got the data but before here. + // In that case we will just time out because of the lack of heart beats. + var dataLossCT = await watchForRedisLosingAllItsData.GetTokenForDataLossDetection(TimeSpan.FromSeconds(30), queueToken); + + disposables.AddAsyncDisposable(new NodeHeartBeatSender(endpoint, pending.ActivityId, halibutRedisTransport, log, HalibutQueueNodeSendingPulses.RequestProcessorNode, RequestReceiverNodeHeartBeatRate)); + var watcher = new WatchForRequestCancellationOrSenderDisconnect(endpoint, pending.ActivityId, halibutRedisTransport, RequestSenderNodeHeartBeatTimeout, log); + disposables.AddAsyncDisposable(watcher); + + var cts = new CancelOnDisposeCancellationToken(watcher.RequestProcessingCancellationToken, dataLossCT); + disposables.AddAsyncDisposable(cts); + + var response = new RequestMessageWithCancellationToken(pending, cts.Token); + DisposablesForInFlightRequests[pending.ActivityId] = new WatcherAndDisposables(disposables, cts.Token, watcher); + return response; + } + catch (Exception) + { + await Try.IgnoringError(async () => await disposables.DisposeAsync()); + throw; + } + } + + public class WatcherAndDisposables : IAsyncDisposable + { + readonly DisposableCollection disposableCollection; + public CancellationToken RequestCancelledForAnyReasonCancellationToken { get; } + public WatchForRequestCancellationOrSenderDisconnect Watcher { get; } + + public WatcherAndDisposables(DisposableCollection disposableCollection,
CancellationToken requestCancelledForAnyReasonCancellationToken, WatchForRequestCancellationOrSenderDisconnect watcher) + { + this.disposableCollection = disposableCollection; + this.RequestCancelledForAnyReasonCancellationToken = requestCancelledForAnyReasonCancellationToken; + this.Watcher = watcher; + } + + public async ValueTask DisposeAsync() + { + await Try.IgnoringError(async () => await disposableCollection.DisposeAsync()); + } + } + + public const string RequestAbandonedMessage = "The request was abandoned, possibly because the node processing the request shutdown or redis lost all of its data."; + + public async Task ApplyResponse(ResponseMessage response, Guid requestActivityId) + { + log.Write(EventType.MessageExchange, "Applying response for request {0}", requestActivityId); + WatcherAndDisposables? watcherAndDisposables = null; + if (!DisposablesForInFlightRequests.TryRemove(requestActivityId, out watcherAndDisposables)) + { + log.Write(EventType.Diagnostic, "No in-flight request resources found to dispose for request {0}", requestActivityId); + } + + try + { + if (response == null) + { + log.Write(EventType.Diagnostic, "Response is null for request {0}, skipping apply", requestActivityId); + return; + } + + log.Write(EventType.MessageExchange, "Preparing response payload for request {0}", requestActivityId); + var cancellationToken = CancellationToken.None; + + // This node has now completed the RPC, and so the response must be sent + // back to the node which sent the request. + + if (watcherAndDisposables != null && watcherAndDisposables.RequestCancelledForAnyReasonCancellationToken.IsCancellationRequested) + { + if (!watcherAndDisposables.Watcher.SenderCancelledTheRequest) + { + log.Write(EventType.Diagnostic, "Response for request {0}, has been overridden with an abandon message as the request was abandoned", requestActivityId); + response = ResponseMessage.FromException(response, new HalibutClientException(RequestAbandonedMessage)); + } + } +
var responseStoredMessage = await messageSerialiserAndDataStreamStorage.PrepareResponse(response, cancellationToken); + log.Write(EventType.MessageExchange, "Sending response message for request {0}", requestActivityId); + await ResponseMessageSender.SendResponse(halibutRedisTransport, endpoint, requestActivityId, responseStoredMessage, TTLOfResponseMessage, log); + log.Write(EventType.MessageExchange, "Successfully applied response for request {0}", requestActivityId); + } + catch (Exception ex) + { + log.WriteException(EventType.Error, "Error applying response for request {0}", ex, requestActivityId); + throw; + } + finally + { + log.Write(EventType.Diagnostic, "Disposing in-flight request resources for request {0}", requestActivityId); + if (watcherAndDisposables != null) + { + await watcherAndDisposables.DisposeAsync(); + } + } + } + + async Task DequeueNextAsync() + { + await using var cts = new CancelOnDisposeCancellationToken(queueToken); + try + { + hasItemsForEndpoint.Reset(); + + var first = await TryRemoveNextItemFromQueue(cts.Token); + if (first != null) + { + return first; + } + + await Task.WhenAny( + hasItemsForEndpoint.WaitAsync(cts.Token), + Task.Delay(halibutTimeoutsAndLimits.PollingQueueWaitTimeout, cts.Token)); + + if (!hasItemsForEndpoint.IsSet) + { + // Timed out waiting for something to go on the queue, send back a null to tentacle + // to keep the connection healthy. + return null; + } + + return await TryRemoveNextItemFromQueue(cts.Token); + } + catch (Exception ex) + { + if (!queueToken.IsCancellationRequested) + { + log.WriteException(EventType.Error, "Error occurred dequeuing from the queue", ex); + // It is very likely a queue error means every tentacle will return an error. + // Add a random delay to help avoid every client coming back at exactly the same time. 
+ await Task.Delay(TimeSpan.FromSeconds(new Random().Next(15)), cts.Token); + } + throw; + } + finally + { + await cts.CancelAsync(); + } + } + + async Task TryRemoveNextItemFromQueue(CancellationToken cancellationToken) + { + while (true) + { + var activityId = await halibutRedisTransport.TryPopNextRequestGuid(endpoint, cancellationToken); + + if (activityId is null) + { + // Nothing is on the queue. + return null; + } + + var jsonRequest = await halibutRedisTransport.TryGetAndRemoveRequest(endpoint, activityId.Value, cancellationToken); + + if (jsonRequest == null) + { + // This request has been picked up by someone else, go around the loop and look for something else to do. + continue; + } + + var request = await messageSerialiserAndDataStreamStorage.ReadRequest(jsonRequest, cancellationToken); + log.Write(EventType.Diagnostic, "Successfully collected request {0} from queue for endpoint {1}", request.ActivityId, endpoint); + + return request; + } + } + + public async ValueTask DisposeAsync() + { + await Try.IgnoringError(async () => await queueCts.DisposeAsync()); + await Try.IgnoringError(async () => await (await RequestMessageAvailablePulseChannelSubscriberDisposer).DisposeAsync()); + } + + public void Dispose() + { + DisposeAsync().GetAwaiter().GetResult(); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/RedisPendingRequestQueueFactory.cs b/source/Halibut/Queue/Redis/RedisPendingRequestQueueFactory.cs new file mode 100644 index 000000000..4d50bc983 --- /dev/null +++ b/source/Halibut/Queue/Redis/RedisPendingRequestQueueFactory.cs @@ -0,0 +1,52 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Queue.QueuedDataStreams; +using Halibut.Queue.Redis.MessageStorage; +using Halibut.Queue.Redis.RedisDataLossDetection; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.ServiceModel; + +namespace Halibut.Queue.Redis +{ + public class 
RedisPendingRequestQueueFactory : IPendingRequestQueueFactory + { + readonly QueueMessageSerializer queueMessageSerializer; + readonly IStoreDataStreamsForDistributedQueues dataStreamStorage; + readonly HalibutRedisTransport halibutRedisTransport; + readonly ILogFactory logFactory; + readonly HalibutTimeoutsAndLimits halibutTimeoutsAndLimits; + readonly IWatchForRedisLosingAllItsData watchForRedisLosingAllItsData; + + public RedisPendingRequestQueueFactory( + QueueMessageSerializer queueMessageSerializer, + IStoreDataStreamsForDistributedQueues dataStreamStorage, + IWatchForRedisLosingAllItsData watchForRedisLosingAllItsData, + HalibutRedisTransport halibutRedisTransport, + HalibutTimeoutsAndLimits halibutTimeoutsAndLimits, + ILogFactory logFactory) + { + this.queueMessageSerializer = queueMessageSerializer; + this.dataStreamStorage = dataStreamStorage; + this.halibutRedisTransport = halibutRedisTransport; + this.logFactory = logFactory; + this.halibutTimeoutsAndLimits = halibutTimeoutsAndLimits; + this.watchForRedisLosingAllItsData = watchForRedisLosingAllItsData; + } + + public IPendingRequestQueue CreateQueue(Uri endpoint) + { + return new RedisPendingRequestQueue(endpoint, + watchForRedisLosingAllItsData, + logFactory.ForEndpoint(endpoint), + halibutRedisTransport, + new MessageSerialiserAndDataStreamStorage(queueMessageSerializer, dataStreamStorage), + halibutTimeoutsAndLimits); + } + + } +} + +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/ResponseMessageTransfer/PollAndSubscribeToResponse.cs b/source/Halibut/Queue/Redis/ResponseMessageTransfer/PollAndSubscribeToResponse.cs new file mode 100644 index 000000000..cd561d31b --- /dev/null +++ b/source/Halibut/Queue/Redis/ResponseMessageTransfer/PollAndSubscribeToResponse.cs @@ -0,0 +1,174 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Util; +using 
Nito.AsyncEx; + +namespace Halibut.Queue.Redis.ResponseMessageTransfer +{ + public class PollAndSubscribeToResponse : IAsyncDisposable + { + readonly CancelOnDisposeCancellationToken objectLifeTimeCts; + readonly ILog log; + readonly IHalibutRedisTransport halibutRedisTransport; + readonly Uri endpoint; + readonly Guid activityId; + readonly LinearBackoffStrategy pollBackoffStrategy; + + readonly TaskCompletionSource responseJsonCompletionSource = new(); + + /// + /// An awaitable task that returns when the response is available. + /// + public Task ResponseJson => responseJsonCompletionSource.Task; + + public PollAndSubscribeToResponse(Uri endpoint, Guid activityId, IHalibutRedisTransport halibutRedisTransport, ILog log) + { + this.log = log.ForContext(); + + this.endpoint = endpoint; + this.activityId = activityId; + this.halibutRedisTransport = halibutRedisTransport; + this.pollBackoffStrategy = new LinearBackoffStrategy( + TimeSpan.FromSeconds(15), // Initial delay: 15s + TimeSpan.FromSeconds(15), // Increment: 15s + TimeSpan.FromMinutes(2) // Maximum delay: 2 minutes + ); + this.log.Write(EventType.Diagnostic, "Starting to watch for response - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + + objectLifeTimeCts = new CancelOnDisposeCancellationToken(); + var token = objectLifeTimeCts.Token; + objectLifeTimeCts.AwaitTasksBeforeCTSDispose(Task.Run(async () => await WaitForResponse(token))); + } + + async Task WaitForResponse(CancellationToken token) + { + try + { + log.Write(EventType.Diagnostic, "Subscribing to response notifications - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + + // This could wait forever to subscribe to redis if redis is offline. We need some way of limiting how long we take + // to subscribe. 
+ // https://whimsical.com/subscribetonodeheartbeatchannel-should-timeout-while-waiting-to--NFWwmPkE7pTBdm2PRUC8Tf + await using var _ = await halibutRedisTransport.SubscribeToResponseChannel(endpoint, activityId, + async _ => + { + + log.Write(EventType.Diagnostic, "Received response notification via subscription - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + await TryGetResponseFromRedis("subscription", token); + }, + token); + + log.Write(EventType.Diagnostic, "Starting polling loop for response - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + + // We want a delay before the first poll for the response, since it makes + // no sense to send a request and expect an immediate reply. + pollBackoffStrategy.Try(); + + // Also poll to see if the value is set since we can miss the publication. + while (!token.IsCancellationRequested) + { + var delay = pollBackoffStrategy.GetSleepPeriod(); + log.Write(EventType.Diagnostic, "Waiting {0} seconds before next poll for response - Endpoint: {1}, ActivityId: {2}", delay.TotalSeconds, endpoint, activityId); + await Try.IgnoringError(async () => await Task.Delay(delay, token)); + if (token.IsCancellationRequested) break; + log.Write(EventType.Diagnostic, "Done waiting, going to poll for response - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + + try + { + pollBackoffStrategy.Try(); + if (await TryGetResponseFromRedis("polling", token)) + { + break; + } + } + catch (Exception ex) + { + log.Write(EventType.Diagnostic, "Error while polling for response - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, activityId, ex.Message); + } + } + + log.Write(EventType.Diagnostic, "Exiting watch loop for response - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + } + catch (Exception ex) + { + if (!token.IsCancellationRequested) + { + log.Write(EventType.Error, "Unexpected error in response watcher - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, activityId, 
ex.Message); + } + } + } + + readonly SemaphoreSlim trySetResultSemaphore = new(1, 1); + + /// + /// Makes an attempt to get the response from redis. + /// + /// + /// + /// true if a response message is available. + public async Task TryGetResponseFromRedis(string detectedBy, CancellationToken token) + { + using var l = await trySetResultSemaphore.LockAsync(token); + + if (responseJsonCompletionSource.Task.IsCompleted) return true; + + var responseJson = await halibutRedisTransport.GetResponseMessage(endpoint, activityId, token); + + if (responseJson != null) + { + log.Write(EventType.Diagnostic, "Response detected via {0} - Endpoint: {1}, ActivityId: {2}", detectedBy, endpoint, activityId); + + await DeleteResponseFromRedis(detectedBy, token); + + TrySetResponse(responseJson); + await Try.IgnoringError(async () => await objectLifeTimeCts.CancelAsync()); + log.Write(EventType.Diagnostic, "Cancelling polling loop for response - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + return true; + } + + return false; + } + + async Task DeleteResponseFromRedis(string detectedBy, CancellationToken token) + { + try + { + await halibutRedisTransport.DeleteResponseMessage(endpoint, activityId, token); + } + catch (Exception ex) + { + log.Write(EventType.Error, "Failed to delete response from Redis via {0} - Endpoint: {1}, ActivityId: {2}, Error: {3}", detectedBy, endpoint, activityId, ex.Message); + } + } + + void TrySetResponse(RedisStoredMessage value) + { + try + { + responseJsonCompletionSource.TrySetResult(value); + } + catch (Exception ex) + { + log.Write(EventType.Error, "Failed to set response - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, activityId, ex.Message); + } + } + + public async ValueTask DisposeAsync() + { + log.Write(EventType.Diagnostic, "Disposing GenericWatcher for response - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + + await Try.IgnoringError(async () => await objectLifeTimeCts.CancelAsync()); + + // If the message task 
is not yet complete, then mark it as cancelled + Try.IgnoringError(() => responseJsonCompletionSource.TrySetCanceled()); + + log.Write(EventType.Diagnostic, "Disposed GenericWatcher for response - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/ResponseMessageTransfer/ResponseMessageSender.cs b/source/Halibut/Queue/Redis/ResponseMessageTransfer/ResponseMessageSender.cs new file mode 100644 index 000000000..5625f1365 --- /dev/null +++ b/source/Halibut/Queue/Redis/ResponseMessageTransfer/ResponseMessageSender.cs @@ -0,0 +1,47 @@ +#if NET8_0_OR_GREATER +using System; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Queue.Redis.RedisHelpers; +using Halibut.Util; + +namespace Halibut.Queue.Redis.ResponseMessageTransfer +{ + public class ResponseMessageSender + { + public static async Task SendResponse( + IHalibutRedisTransport halibutRedisTransport, + Uri endpoint, + Guid activityId, + RedisStoredMessage responseMessage, + TimeSpan ttl, + ILog log) + { + log.Write(EventType.Diagnostic, "Attempting to set response for - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + + await using var cts = new CancelOnDisposeCancellationToken(); + // More than ten minutes to send the response to redis, seems sus. 
+ cts.CancelAfter(TimeSpan.FromMinutes(10)); + + try + { + log.Write(EventType.Diagnostic, "Marking response as set - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + await halibutRedisTransport.SetResponseMessage(endpoint, activityId, responseMessage, ttl, cts.Token); + + log.Write(EventType.Diagnostic, "Publishing response notification - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + await halibutRedisTransport.PublishThatResponseIsAvailable(endpoint, activityId, cts.Token); + + log.Write(EventType.Diagnostic, "Successfully set response - Endpoint: {0}, ActivityId: {1}", endpoint, activityId); + } + catch (OperationCanceledException ex) + { + log.Write(EventType.Error, "Set response operation timed out after 10 minutes - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, activityId, ex.Message); + } + catch (Exception ex) + { + log.Write(EventType.Error, "Failed to set response - Endpoint: {0}, ActivityId: {1}, Error: {2}", endpoint, activityId, ex.Message); + } + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Queue/Redis/WatchForRequestCancellationOrSenderDisconnect.cs b/source/Halibut/Queue/Redis/WatchForRequestCancellationOrSenderDisconnect.cs new file mode 100644 index 000000000..1a249437a --- /dev/null +++ b/source/Halibut/Queue/Redis/WatchForRequestCancellationOrSenderDisconnect.cs @@ -0,0 +1,82 @@ + +#if NET8_0_OR_GREATER +using System; +using System.Threading; +using System.Threading.Tasks; +using Halibut.Diagnostics; +using Halibut.Queue.Redis.Cancellation; +using Halibut.Queue.Redis.NodeHeartBeat; +using Halibut.Util; + +namespace Halibut.Queue.Redis +{ + + public class WatchForRequestCancellationOrSenderDisconnect : IAsyncDisposable + { + readonly CancelOnDisposeCancellationToken requestCancellationTokenSource; + public CancellationToken RequestProcessingCancellationToken { get; } + + readonly CancelOnDisposeCancellationToken keepWatchingCancellationToken; + + readonly DisposableCollection 
disposableCollection = new(); + + readonly WatchForRequestCancellation watchForRequestCancellation; + public bool SenderCancelledTheRequest => watchForRequestCancellation.SenderCancelledTheRequest; + + public WatchForRequestCancellationOrSenderDisconnect( + Uri endpoint, + Guid requestActivityId, + IHalibutRedisTransport halibutRedisTransport, + TimeSpan nodeOfflineTimeoutBetweenHeartBeatsFromSender, + ILog log) + { + try + { + watchForRequestCancellation = new WatchForRequestCancellation(endpoint, requestActivityId, halibutRedisTransport, log); + disposableCollection.AddAsyncDisposable(watchForRequestCancellation); + + requestCancellationTokenSource = new CancelOnDisposeCancellationToken(watchForRequestCancellation.RequestCancelledCancellationToken); + disposableCollection.AddAsyncDisposable(requestCancellationTokenSource); + RequestProcessingCancellationToken = requestCancellationTokenSource.Token; + + keepWatchingCancellationToken = new CancelOnDisposeCancellationToken(); + disposableCollection.AddAsyncDisposable(keepWatchingCancellationToken); + + Task.Run(() => WatchThatNodeWhichSentTheRequestIsStillAlive(endpoint, requestActivityId, halibutRedisTransport, nodeOfflineTimeoutBetweenHeartBeatsFromSender, log)); + } + catch (Exception) + { + Try.IgnoringError(async () => await disposableCollection.DisposeAsync()).GetAwaiter().GetResult(); + throw; + } + } + + async Task WatchThatNodeWhichSentTheRequestIsStillAlive(Uri endpoint, Guid requestActivityId, IHalibutRedisTransport halibutRedisTransport, TimeSpan nodeOfflineTimeoutBetweenHeartBeatsFromSender, ILog log) + { + var watchCancellationToken = keepWatchingCancellationToken.Token; + try + { + var res = await NodeHeartBeatWatcher + .WatchThatNodeWhichSentTheRequestIsStillAlive(endpoint, requestActivityId, halibutRedisTransport, log, nodeOfflineTimeoutBetweenHeartBeatsFromSender, watchCancellationToken); + if (res == NodeWatcherResult.NodeMayHaveDisconnected) + { + await 
requestCancellationTokenSource.CancelAsync(); + } + } + catch (Exception) when (watchCancellationToken.IsCancellationRequested) + { + log.Write(EventType.Diagnostic, "Sender node watcher cancelled for request {0}, endpoint {1}", requestActivityId, endpoint); + } + catch (Exception ex) + { + log.WriteException(EventType.Error, "Error watching sender node for request {0}, endpoint {1}", ex, requestActivityId, endpoint); + } + } + + public async ValueTask DisposeAsync() + { + await disposableCollection.DisposeAsync(); + } + } +} +#endif \ No newline at end of file diff --git a/source/Halibut/Transport/Protocol/HalibutContractResolver.cs b/source/Halibut/Transport/Protocol/HalibutContractResolver.cs index 8b975bd56..71cafcd25 100644 --- a/source/Halibut/Transport/Protocol/HalibutContractResolver.cs +++ b/source/Halibut/Transport/Protocol/HalibutContractResolver.cs @@ -12,9 +12,11 @@ public class HalibutContractResolver : DefaultContractResolver public override JsonContract ResolveContract(Type type) { - if (type == typeof(DataStream)) + // Halibut supports sub classing of DataStream, over the wire we will send only the + // DataStream itself. + if (typeof(DataStream).IsAssignableFrom(type)) { - var contract = base.ResolveContract(type); + var contract = base.ResolveContract(typeof(DataStream)); // The contract is shared, so we need to make sure multiple threads don't try to edit it at the same time. 
if (!HaveAddedCaptureOnSerializeCallback) { diff --git a/source/Halibut/Transport/Protocol/MessageSerializer.cs b/source/Halibut/Transport/Protocol/MessageSerializer.cs index a677dedd3..201b182d6 100644 --- a/source/Halibut/Transport/Protocol/MessageSerializer.cs +++ b/source/Halibut/Transport/Protocol/MessageSerializer.cs @@ -14,8 +14,7 @@ namespace Halibut.Transport.Protocol { public class MessageSerializer : IMessageSerializer { - readonly ITypeRegistry typeRegistry; - readonly Func createStreamCapturingSerializer; + internal readonly Func CreateStreamCapturingSerializer; readonly IMessageSerializerObserver observer; readonly long readIntoMemoryLimitBytes; readonly long writeIntoMemoryLimitBytes; @@ -29,19 +28,13 @@ internal MessageSerializer( long writeIntoMemoryLimitBytes, ILogFactory logFactory) { - this.typeRegistry = typeRegistry; - this.createStreamCapturingSerializer = createStreamCapturingSerializer; + this.CreateStreamCapturingSerializer = createStreamCapturingSerializer; this.observer = observer; this.readIntoMemoryLimitBytes = readIntoMemoryLimitBytes; this.writeIntoMemoryLimitBytes = writeIntoMemoryLimitBytes; deflateReflector = new DeflateStreamInputBufferReflector(logFactory.ForPrefix(nameof(MessageSerializer))); } - public void AddToMessageContract(params Type[] types) // kept for backwards compatibility - { - typeRegistry.AddToMessageContract(types); - } - public async Task> WriteMessageAsync(Stream stream, T message, CancellationToken cancellationToken) { IReadOnlyList serializedStreams; @@ -58,7 +51,7 @@ public async Task> WriteMessageAsync(Stream stream, // for the moment this MUST be object so that the $type property is included // If it is not, then an old receiver (eg, old tentacle) will not be able to understand messages from a new sender (server) // Once ALL sources and targets are deserializing to MessageEnvelope, (ReadBsonMessage) then this can be changed to T - var streamCapturingSerializer = createStreamCapturingSerializer(); + var 
streamCapturingSerializer = CreateStreamCapturingSerializer(); streamCapturingSerializer.Serializer.Serialize(bson, new MessageEnvelope { Message = message! }); serializedStreams = streamCapturingSerializer.DataStreams; @@ -159,7 +152,7 @@ public async Task> WriteMessageAsync(Stream stream, (MessageEnvelope MessageEnvelope, IReadOnlyList DataStreams) DeserializeMessageAndDataStreams(JsonReader reader) { - var streamCapturingSerializer = createStreamCapturingSerializer(); + var streamCapturingSerializer = CreateStreamCapturingSerializer(); var result = streamCapturingSerializer.Serializer.Deserialize>(reader); if (result == null) diff --git a/source/Halibut/Transport/Protocol/ResponseMessage.cs b/source/Halibut/Transport/Protocol/ResponseMessage.cs index 6e6a1c3a1..f73dc6ca8 100644 --- a/source/Halibut/Transport/Protocol/ResponseMessage.cs +++ b/source/Halibut/Transport/Protocol/ResponseMessage.cs @@ -32,6 +32,11 @@ public static ResponseMessage FromException(RequestMessage request, Exception ex { return new ResponseMessage { Id = request.Id, Error = ServerErrorFromException(ex, connectionState) }; } + + public static ResponseMessage FromException(ResponseMessage response, Exception ex, ConnectionState connectionState = ConnectionState.Unknown) + { + return new ResponseMessage { Id = response.Id, Error = ServerErrorFromException(ex, connectionState) }; + } internal static ServerError ServerErrorFromException(Exception ex, ConnectionState connectionState = ConnectionState.Unknown) { diff --git a/source/Halibut/Util/CancelOnDisposeCancellationToken.cs b/source/Halibut/Util/CancelOnDisposeCancellationToken.cs new file mode 100644 index 000000000..1b4d4c637 --- /dev/null +++ b/source/Halibut/Util/CancelOnDisposeCancellationToken.cs @@ -0,0 +1,100 @@ +#nullable enable +using System; +using System.Collections.Concurrent; +using System.Collections.Generic; +using System.Linq; +using System.Threading; +using System.Threading.Tasks; + +namespace Halibut.Util +{ + + /// + /// 
Helps with safely working with CancellationTokenSources. + /// + /// CancellationTokens and CancellationTokenSources can be tricky to work with since: + /// - Asking for a token from a disposed CTS throws, which is often surprising. + /// - Disposal of a CTS does not cancel the token. + /// - Even if the CTS is cancelled and then disposed, race conditions exist where some + /// tasks using the cancelled token DO NOT GET CANCELLED e.g. Task.Delay(); + /// + /// To help with some of those this class: + /// - Gets a copy of the Token from the CTS, before it is disposed. So asking + /// for the token never throws. + /// - Always cancels the CTS before disposing of it, so anything with the token + /// generally (except in dotnet race condition cases) gets cancelled. + /// - Supports awaiting tasks that are using the CTS's Token in dispose. Specifically + /// when disposed this class will cancel the CTS, then await those tasks given to it + /// (ignoring errors) and only then dispose the CTS. This avoids the bugs/race + /// conditions in Dotnet.
+ /// + /// + public sealed class CancelOnDisposeCancellationToken : IAsyncDisposable + { + readonly CancellationTokenSource cancellationTokenSource; + bool disposed; + + readonly ConcurrentBag tasks = new(); + + public CancelOnDisposeCancellationToken(params CancellationToken[] token) + : this(CancellationTokenSource.CreateLinkedTokenSource(token)) + { + } + public CancelOnDisposeCancellationToken() : this(new CancellationTokenSource()) + { + } + + CancelOnDisposeCancellationToken(CancellationTokenSource cancellationTokenSource) + { + this.cancellationTokenSource = cancellationTokenSource; + Token = cancellationTokenSource.Token; + } + + public CancellationToken Token { get; } + + public async ValueTask DisposeAsync() + { + if (disposed) + { + return; + } + + disposed = true; + + await Try.IgnoringError(async () => await CancelAsync()); + + // Wait for any tasks that are using the token, before disposal + await Task.WhenAll(tasks.Select(t => Try.IgnoringError(() => t))); + + Try.IgnoringError(() => cancellationTokenSource.Dispose()); + } + + public async Task CancelAsync() + { +#if NET8_0_OR_GREATER + await cancellationTokenSource.CancelAsync(); +#else + await Task.CompletedTask; + cancellationTokenSource.Cancel(); +#endif + } + + public void CancelAfter(TimeSpan timeSpan) + { + cancellationTokenSource.CancelAfter(timeSpan); + } + + /// + /// Tasks supplied here will be awaited on in the dispose method after + /// the Token is cancelled and before the token is disposed. 
+ /// + /// + public void AwaitTasksBeforeCTSDispose(params Task[] tasksUsingToken) + { + foreach (var task in tasksUsingToken) + { + tasks.Add(task); + } + } + } +} \ No newline at end of file diff --git a/source/Halibut/Util/FuncAsyncDisposable.cs b/source/Halibut/Util/FuncAsyncDisposable.cs new file mode 100644 index 000000000..23ff504b6 --- /dev/null +++ b/source/Halibut/Util/FuncAsyncDisposable.cs @@ -0,0 +1,20 @@ +using System; +using System.Threading.Tasks; + +namespace Halibut.Util +{ + public class FuncAsyncDisposable : IAsyncDisposable + { + readonly Func disposer; + + public FuncAsyncDisposable(Func disposer) + { + this.disposer = disposer; + } + + public async ValueTask DisposeAsync() + { + await this.disposer(); + } + } +} \ No newline at end of file diff --git a/source/Halibut/Util/LinearBackoffStrategy.cs b/source/Halibut/Util/LinearBackoffStrategy.cs new file mode 100644 index 000000000..9f6cfc774 --- /dev/null +++ b/source/Halibut/Util/LinearBackoffStrategy.cs @@ -0,0 +1,95 @@ +using System; + +namespace Halibut.Util +{ + /// + /// A simple linear delay backoff strategy that increases the delay by a fixed increment + /// on each retry attempt (e.g., 1s, 2s, 3s, 4s, etc.). 
+ /// + public class LinearBackoffStrategy + { + int attemptCount; + + public LinearBackoffStrategy(TimeSpan initialDelay, TimeSpan increment, TimeSpan maximumDelay) + { + if (initialDelay < TimeSpan.Zero) + throw new ArgumentOutOfRangeException(nameof(initialDelay), "Initial delay must be non-negative"); + if (increment <= TimeSpan.Zero) + throw new ArgumentOutOfRangeException(nameof(increment), "Increment must be greater than zero"); + if (maximumDelay < initialDelay) + throw new ArgumentOutOfRangeException(nameof(maximumDelay), "Maximum delay must be greater than or equal to initial delay"); + + InitialDelay = initialDelay; + Increment = increment; + MaximumDelay = maximumDelay; + attemptCount = 0; + } + + public TimeSpan InitialDelay { get; } + public TimeSpan Increment { get; } + public TimeSpan MaximumDelay { get; } + public int AttemptCount => attemptCount; + + /// + /// Creates a LinearBackoffStrategy with sensible defaults: + /// Initial delay of 1 second, increment of 1 second, maximum delay of 30 seconds. + /// + public static LinearBackoffStrategy Create() + { + return new LinearBackoffStrategy( + initialDelay: TimeSpan.FromSeconds(1), + increment: TimeSpan.FromSeconds(1), + maximumDelay: TimeSpan.FromSeconds(30) + ); + } + + /// + /// Records a retry attempt and increments the internal attempt counter. + /// + public virtual void Try() + { + attemptCount++; + } + + /// + /// Resets the backoff strategy after a successful operation. + /// + public virtual void Success() + { + attemptCount = 0; + } + + /// + /// Gets the delay period for the current attempt number. + /// The delay increases linearly: initialDelay + (attemptCount - 1) * increment. + /// + public virtual TimeSpan GetSleepPeriod() + { + if (attemptCount <= 0) + { + return TimeSpan.Zero; + } + + var delay = InitialDelay + TimeSpan.FromTicks((attemptCount - 1) * Increment.Ticks); + + // Cap at maximum delay + return delay > MaximumDelay ? 
MaximumDelay : delay; + } + + /// + /// Calculates the delay for a specific attempt number without modifying internal state. + /// + public TimeSpan CalculateDelayForAttempt(int attemptNumber) + { + if (attemptNumber <= 0) + { + return TimeSpan.Zero; + } + + var delay = InitialDelay + TimeSpan.FromTicks((attemptNumber - 1) * Increment.Ticks); + + // Cap at maximum delay + return delay > MaximumDelay ? MaximumDelay : delay; + } + } +} \ No newline at end of file diff --git a/source/Halibut/Util/StringExtensionMethods.cs b/source/Halibut/Util/StringExtensionMethods.cs new file mode 100644 index 000000000..71f3fc01a --- /dev/null +++ b/source/Halibut/Util/StringExtensionMethods.cs @@ -0,0 +1,19 @@ +using System; +using System.Diagnostics.CodeAnalysis; + +namespace Halibut.Util +{ + public static class StringExtensionMethods + { + [return: NotNullIfNotNull("str")] + public static Guid? ToGuid(this string? str) + { + if (str == null) + { + return null; + } + + return Guid.Parse(str); + } + } +} \ No newline at end of file diff --git a/source/Halibut/Util/TimeSpanHelper.cs b/source/Halibut/Util/TimeSpanHelper.cs new file mode 100644 index 000000000..c64922c10 --- /dev/null +++ b/source/Halibut/Util/TimeSpanHelper.cs @@ -0,0 +1,12 @@ +using System; + +namespace Halibut.Util +{ + public static class TimeSpanHelper + { + public static TimeSpan Min(TimeSpan t1, TimeSpan t2) + { + return t1 < t2 ? 
t1 : t2; + } + } +} \ No newline at end of file diff --git a/source/Halibut/Util/Try.cs b/source/Halibut/Util/Try.cs index 4c01d353c..0aba28cb5 100644 --- a/source/Halibut/Util/Try.cs +++ b/source/Halibut/Util/Try.cs @@ -34,6 +34,30 @@ public static SilentStreamDisposer CatchingErrorOnDisposal(Stream streamToDispos { return new SilentStreamDisposer(streamToDispose, onFailure); } + + public static void IgnoringError(Action tryThisAction) + { + try + { + tryThisAction(); + } + catch + { + // ignored + } + } + + public static async Task IgnoringError(Func tryThisAction) + { + try + { + await tryThisAction(); + } + catch + { + // ignored + } + } } } \ No newline at end of file diff --git a/source/Halibut/Util/net48Helpers/NotNullWhenAttribute.cs b/source/Halibut/Util/net48Helpers/NotNullWhenAttribute.cs new file mode 100644 index 000000000..60798dc05 --- /dev/null +++ b/source/Halibut/Util/net48Helpers/NotNullWhenAttribute.cs @@ -0,0 +1,27 @@ + + + +// This is not available in net48 + +#if NETFRAMEWORK +using System; + +#nullable enable +namespace System.Diagnostics.CodeAnalysis +{ + /// Specifies that the output will be non-null if the named parameter is non-null. + [AttributeUsage(AttributeTargets.Property | AttributeTargets.Parameter | AttributeTargets.ReturnValue, AllowMultiple = true, Inherited = false)] + public sealed class NotNullIfNotNullAttribute : Attribute + { + /// Initializes the attribute with the associated parameter name. + /// The associated parameter name. The output will be non-null if the argument to the parameter specified is non-null. + public NotNullIfNotNullAttribute(string parameterName) => this.ParameterName = parameterName; + + /// Gets the associated parameter name. + /// The associated parameter name. The output will be non-null if the argument to the parameter specified is non-null. + public string ParameterName { get; } + } +} + + +#endif \ No newline at end of file