497 changes: 482 additions & 15 deletions Cargo.lock

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions sdac-lib/Cargo.toml
@@ -6,10 +6,14 @@ license = "MIT"
description = "Software Defined Accelerated Compute"
homepage = "https://github.com/xertai/sdac"
edition = "2021"
build = "build-cuda-types.rs"

Hi, hope you don't mind some random comments since I've landed here with curiosity from Steve's latest post...

This to me is a sign that you probably want a sub-crate that can form a hermetic boundary around your code-generation needs. Build scripts can be a bit tricky: one common failure mode I see is people rebuilding everything on every build because they haven't annotated their inputs correctly, and it helps a lot to have that isolated in a small crate.
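A sketch of what that split could look like; the crate and path names here are hypothetical, not taken from this repo:

```toml
# Hypothetical workspace layout: codegen isolated in its own crate.
#
# sdac/
#   Cargo.toml        (workspace)
#   cuda-types/       (owns the build script + bindgen)
#   sdac-lib/         (no build script at all)

# cuda-types/Cargo.toml
[package]
name = "cuda-types"
version = "0.0.0"
edition = "2021"
build = "build.rs"

[build-dependencies]
bindgen = "0.63.0"

# sdac-lib/Cargo.toml then just declares:
# [dependencies]
# cuda-types = { path = "../cuda-types" }
```

With this split, only `cuda-types` is rebuilt when the headers or the build script change; `sdac-lib` consumes the generated types as an ordinary dependency.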


[lib]
crate-type = ["cdylib"]

[build-dependencies]
bindgen = "0.63.0"

[dependencies]
tarpc = { version = "0.31.0", features = ["full"] }
tokio = { version = "1.24.2", features = ["macros", "net", "rt-multi-thread"] }
@@ -20,3 +24,4 @@ libc = "0.2.139"
futures = "0.3.25"
anyhow = "1.0.68"
service = { version = "0.0.0", path = "../service" }
clib = "0.2.1"
31 changes: 31 additions & 0 deletions sdac-lib/build-cuda-types.rs
@@ -0,0 +1,31 @@
extern crate bindgen;

use std::env;
use std::path::{Path, PathBuf};

// Install NVIDIA CUDA prior to building the bindings with `cargo build`.
// https://docs.rs/bindgen/latest/bindgen/struct.Builder.html
fn main() {
let cdir = std::env::var("CUDA_DIR").unwrap_or("/usr/local/cuda-11.8".to_string());

/usr/local/cuda-11.8 is going to change over time and over OS; see for instance https://askubuntu.com/questions/1375718/no-usr-local-cuda-directory-after-cuda-installation

I think you're going to want a helper function that probes some N places to find the actual path. Some folks working on build systems are also trying to make sure all state contributing to a build can be materialized to disk, so perhaps having it in a config file or some such would help too.
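A minimal sketch of such a probing helper; the candidate list and the `CUDA_DIR` override are illustrative, not a definitive set:

```rust
use std::env;
use std::path::PathBuf;

/// Probe a few conventional locations for a CUDA install, preferring an
/// explicit override. The candidate list here is illustrative only.
fn find_cuda_dir(override_dir: Option<&str>) -> Option<PathBuf> {
    if let Some(dir) = override_dir {
        return Some(PathBuf::from(dir));
    }
    ["/usr/local/cuda", "/usr/local/cuda-11.8", "/opt/cuda"]
        .into_iter()
        .map(PathBuf::from)
        .find(|p| p.join("include/cuda.h").exists())
}

fn main() {
    // In a build script the override would come from an env var that the
    // script declares with cargo:rerun-if-env-changed=CUDA_DIR.
    let cuda = find_cuda_dir(env::var("CUDA_DIR").ok().as_deref());
    println!("{:?}", cuda);
}
```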

let cuda_dir = Path::new(&cdir);

let bindings = bindgen::Builder::default()
.header(cuda_dir.join("include/cuda.h").display().to_string())
.header(cuda_dir.join("include/cuda_runtime_api.h").display().to_string())
.allowlist_type("CU.*")
.allowlist_type("cuda.*")
.derive_eq(true)
.array_pointers_in_arguments(true)
.generate()
.unwrap();

You're going to want to add CargoCallbacks here, otherwise cargo won't know when to rebuild your bindings. https://docs.rs/bindgen/latest/bindgen/struct.CargoCallbacks.html
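Concretely, for the bindgen 0.63 pinned above, that is one extra line on the builder. This is a fragment of the build script, not standalone code:

```rust
let bindings = bindgen::Builder::default()
    // Emit cargo:rerun-if-changed for each header bindgen actually reads,
    // so cargo rebuilds the bindings only when an input changes.
    .parse_callbacks(Box::new(bindgen::CargoCallbacks))
    .header(cuda_dir.join("include/cuda.h").display().to_string())
    // ...rest of the builder chain unchanged...
    .generate()
    .unwrap();
```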


let target_path = PathBuf::from(env::var("OUT_DIR").unwrap());
bindings
.write_to_file(target_path.join("cuda_types.rs"))
.expect("Couldn't write bindings!");

println!(
"Wrote bindings to {}",
target_path.join("cuda_types.rs").display()
);
}
213 changes: 213 additions & 0 deletions sdac-lib/src/device.rs
@@ -0,0 +1,213 @@
#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]

use service::*;
use futures::executor::block_on;

This is a red flag outside of test code. It is super easy to deadlock when using block_on; see tokio-rs/tokio#3717 for instance. I don't know what motivates the use of it yet, but I would really encourage you to either write blocking code that is completely async-ignorant, or await-using code. This permits you to push a lot more concerns onto the compiler.


Ok, so I think I see what you're doing: you're writing a library that exposes the CUDA API to some clients, but remotes it.

Thus you have this stack:

1 CUDA client
|
2 CUDA thunk (sync as CUDA is)
|
3 client - async - that talks to a remote CUDA instance

What I would do here is use crossbeam - just in the same process - so that rather than blocking in 2 to get 3 to do something you make 1 sync call to crossbeam to send and then 1 sync call to read.

3. then becomes

3 client - sync - that talks to a local async client over crossbeam

This will add a layer, but remove all the risk of deadlocking etc. 2) becomes a real thread, and is now strictly separate from the async core.
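A runnable sketch of that shape, using `std::sync::mpsc` in place of crossbeam so it needs only the standard library; the worker thread stands in for the async core, and all names and the `Request` type are illustrative:

```rust
use std::sync::mpsc;
use std::thread;

// One message per CUDA call; the reply channel carries the result back.
enum Request {
    Init { flags: u32, reply: mpsc::Sender<u32> },
}

// Spawn the thread that owns the remote client. In the real design it
// would hold the tarpc client and drive a tokio runtime; here it just
// answers synchronously.
fn spawn_cuda_worker() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        for req in rx {
            match req {
                Request::Init { flags: _, reply } => {
                    let _ = reply.send(0); // CUDA_SUCCESS
                }
            }
        }
    });
    tx
}

// The sync CUDA thunk: one send plus one blocking receive - no block_on.
fn cu_init(worker: &mpsc::Sender<Request>, flags: u32) -> u32 {
    let (reply_tx, reply_rx) = mpsc::channel();
    worker.send(Request::Init { flags, reply: reply_tx }).unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let worker = spawn_cuda_worker();
    println!("cuInit -> {}", cu_init(&worker, 0));
}
```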

use std::mem::size_of;
use std::ffi::CString;
use std::sync::{Mutex};

I highly encourage running cargo clippy before committing - it's quite fast, and will identify little things like this (the {} can be removed) that make reading code just that little bit nicer.

use tarpc::{context};

There's a very nice import ordering that isn't quite popular enough to have made it into rustfmt, but which rust-analyzer supports: grouping by

  1. standard
  2. external crates
  3. in-repo crates
  4. this crate

Here this would look like

use std::ffi::CString;
use std::sync::Mutex;
use std::mem::size_of;

use futures::executor::block_on;
use tarpc::context;

use service::*;


pub fn cuGetErrorString(

Are these non-[idiomatic Rust names](https://rust-lang.github.io/api-guidelines/naming.html) required by some external standard? If not, I really encourage not using them.

Can I read mixedCase? Yes; but everywhere you interoperate with idiomatic code it will add friction and visual dissonance.

If they are from an external standard, I suggest a trivial trait + impl that thunks all the mixedCase names to snake_case, isolating it in one place in your codebase; then use snake_case everywhere else.
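A sketch of that thunking trait; `CUresult` and the driver type are stubbed for illustration, not taken from the real bindings:

```rust
// Stub for the generated bindgen type.
type CUresult = u32;

// The C-style surface, isolated in one place.
#[allow(non_snake_case)]
trait RawCuda {
    fn cuInit(&self, flags: u32) -> CUresult;
}

// The idiomatic facade the rest of the codebase uses.
trait Cuda {
    fn init(&self, flags: u32) -> CUresult;
}

// Blanket impl: every raw implementation gets snake_case names for free.
impl<T: RawCuda> Cuda for T {
    fn init(&self, flags: u32) -> CUresult {
        self.cuInit(flags)
    }
}

struct FakeDriver;

#[allow(non_snake_case)]
impl RawCuda for FakeDriver {
    fn cuInit(&self, _flags: u32) -> CUresult {
        0 // CUDA_SUCCESS
    }
}

fn main() {
    println!("{}", FakeDriver.init(0));
}
```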

client: &Mutex<service::CudaClient>,
error: CUresult,
pStr: *mut ::std::os::raw::c_char,
) -> CUresult {
let (strName, res) = block_on(

for this case, there's no need to block_on:

remove the block_on, use .await at the end of .cuGetErrorString. tarpc should be giving you threadsafe behaviour - not pointing to a transient memory space. Though the use of a pointer across an RPC is really hard to reason about.

Concrete suggestion for improvement: don't do this across the RPC barrier: Construct the error object within the RPC server, not on the calling side.


Can I commend to you https://github.com/Rust-GPU/Rust-CUDA/blob/8a6cb734d21d5582052fa5b38089d1aa0f4d582f/crates/cust/src/error.rs#L98 as an alternative? That is, use the existing crate. I'm sure there's some reason you're not, but it's not obvious to me.

client
.lock()
.unwrap()
.cuGetErrorString(context::current(), error),
)
.unwrap();

if res != cudaError_enum_CUDA_SUCCESS {
return res;
}

let cs = CString::new(strName).unwrap();
unsafe {
libc::strcpy(pStr, cs.as_ptr());
}

res
}

pub fn cuGetErrorName(
client: &Mutex<service::CudaClient>,
error: CUresult,
pStr: *mut ::std::os::raw::c_char,
) -> CUresult {
let (strName, res) = block_on(
client
.lock()
.unwrap()
.cuGetErrorName(context::current(), error),

same comments here; but also - combining both these RPCs into one slightly fatter call is probably an optimisation: in the success case you can have a Result<...>::Ok(), and in the error case grab the string at the same time.

)
.unwrap();

if res != cudaError_enum_CUDA_SUCCESS {
return res;
}

let cs = CString::new(strName).unwrap();
unsafe {
libc::strcpy(pStr, cs.as_ptr());
}

res
}
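The "slightly fatter call" suggested in the comment above could return either the value or a fully populated error in one round trip. A sketch - the types and the stubbed server are illustrative, though 100 does match the real CUDA_ERROR_NO_DEVICE code:

```rust
// Stub for the generated bindgen type.
type CUresult = u32;

// Error constructed on the RPC server, strings included, so the client
// never has to make follow-up cuGetErrorName/cuGetErrorString calls.
#[derive(Debug, PartialEq)]
struct CudaError {
    code: CUresult,
    name: String,
    message: String,
}

// Stand-in for the RPC: success carries the value, failure carries
// everything needed to report the error.
fn cu_device_get_count(simulate_failure: bool) -> Result<i32, CudaError> {
    if simulate_failure {
        Err(CudaError {
            code: 100,
            name: "CUDA_ERROR_NO_DEVICE".to_string(),
            message: "no CUDA-capable device is detected".to_string(),
        })
    } else {
        Ok(1)
    }
}

fn main() {
    println!("{:?}", cu_device_get_count(false));
    println!("{:?}", cu_device_get_count(true));
}
```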

pub fn cuInit(client: &Mutex<service::CudaClient>, flags: ::std::os::raw::c_uint) -> CUresult {
block_on(
client
.lock()
.unwrap()
.cuInit(context::current(), flags),
)
.unwrap()
}

pub fn cuDeviceGetName(client: &Mutex<service::CudaClient>,
name: *mut ::std::os::raw::c_char,
len: ::std::os::raw::c_int,
dev: CUdevice,
) -> CUresult {
let (strName, res) = block_on(client.lock().unwrap().cuDeviceGetName(
context::current(),
len,
dev,
))
.unwrap();

let cs = CString::new(strName).unwrap();
unsafe {
libc::strcpy(name, cs.as_ptr());
}

res
}

pub fn cuDeviceGetCount(client: &Mutex<service::CudaClient>, count: *mut ::std::os::raw::c_int) -> CUresult {
let (cnt, res) = block_on(
client
.lock()
.unwrap()
.cuDeviceGetCount(context::current()),
)
.unwrap();

unsafe {
*count = cnt;
}

res
}

pub fn cuDeviceGet(client: &Mutex<service::CudaClient>,
device: *mut CUdevice,
ordinal: ::std::os::raw::c_int,
) -> CUresult {
let (dev, res) = block_on(
client
.lock()
.unwrap()
.cuDeviceGet(context::current(), ordinal),
)
.unwrap();

unsafe {
*device = dev;
}

res
}

pub fn cuMemAlloc_v2(client: &Mutex<service::CudaClient>,
dptr: *mut CUdeviceptr,
bytesize: ::std::os::raw::c_ulonglong,
) -> CUresult {
let (ptr, res) = block_on(
client
.lock()
.unwrap()
.cuMemAlloc_v2(context::current(), bytesize as usize),
)
.unwrap();

unsafe {
*dptr = ptr;
}

res
}

pub fn cuMemcpyDtoH_v2(client: &Mutex<service::CudaClient>,
dstHost: *mut ::std::os::raw::c_void,
srcDevice: CUdeviceptr,
ByteCount: ::std::os::raw::c_ulonglong,
) -> CUresult {
let (data, res) = block_on(
client
.lock()
.unwrap()
.cuMemcpyDtoH_v2(context::current(), srcDevice, ByteCount as usize),
)
.unwrap();

if res != cudaError_enum_CUDA_SUCCESS {
return res;
}

unsafe {
libc::memcpy(dstHost, data.as_ptr() as *const libc::c_void, ByteCount as usize);
}

I worry about the amount of unsafe - another plug for both doing the reification to safe constructs close to the actual CUDA calls, and for reusing the existing Rust CUDA bindings; I think they'll be very helpful.


0
}
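One way to shrink the unsafe surface on the calling side is to confine it to a single bounds-checked copy helper that the thunks share. A sketch; the helper name is hypothetical:

```rust
// Copy RPC-returned bytes into a caller-supplied buffer. The single
// unsafe conversion sits at the FFI edge; everything after it is a
// bounds-checked safe slice copy.
fn fill_caller_buffer(dst: *mut u8, dst_len: usize, data: &[u8]) -> Result<(), &'static str> {
    if data.len() > dst_len {
        return Err("destination buffer too small");
    }
    let out = unsafe { std::slice::from_raw_parts_mut(dst, data.len()) };
    out.copy_from_slice(data);
    Ok(())
}

fn main() {
    let mut buf = vec![0u8; 4];
    fill_caller_buffer(buf.as_mut_ptr(), buf.len(), &[1, 2, 3]).unwrap();
    println!("{:?}", buf); // [1, 2, 3, 0]
}
```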

pub fn cuMemcpyHtoD_v2(client: &Mutex<service::CudaClient>,
dstDevice: CUdeviceptr,
srcHost: *const ::std::os::raw::c_void,
ByteCount: ::std::os::raw::c_ulonglong,
) -> CUresult {
let data = unsafe { std::slice::from_raw_parts(srcHost as *const u8, ByteCount as usize) }.to_vec();
block_on(
client
.lock()
.unwrap()
.cuMemcpyHtoD_v2(context::current(), dstDevice, data, ByteCount as usize),
)
.unwrap()
}

pub fn cuMemFree_v2(client: &Mutex<service::CudaClient>, dptr: CUdeviceptr) -> CUresult {
block_on(
client
.lock()
.unwrap()
.cuMemFree_v2(context::current(), dptr),
)
.unwrap()
}

pub fn cuDeviceTotalMem_v2(
client: &Mutex<service::CudaClient>,
bytes: *mut usize,
dev: CUdevice,
) -> CUresult {
let (cnt, res) = block_on(
client
.lock()
.unwrap()
.cuDeviceTotalMem_v2(context::current(), dev),
)
.unwrap();

unsafe {
*bytes = cnt;
}

res
}
33 changes: 33 additions & 0 deletions sdac-lib/src/global.rs
@@ -0,0 +1,33 @@
use futures::executor::block_on;

use std::borrow::BorrowMut;

use std::sync::{Mutex, Once};
use tarpc::{client, tokio_serde::formats::Json};

static mut CUDA_CLIENT: Option<Mutex<service::CudaClient>> = None;
static INIT: Once = Once::new();

pub fn client<'a>() -> &'a Mutex<service::CudaClient> {
INIT.call_once(|| {
// Since this access is inside a call_once, before any other accesses, it is safe
unsafe {
let transport = block_on(tarpc::serde_transport::tcp::connect(
"[::1]:50055",
Json::default,
))
.unwrap();

// WorldClient is generated by the service attribute. It has a constructor `new` that takes a
// config and any Transport as input.
let client = service::CudaClient::new(client::Config::default(), transport).spawn();

*CUDA_CLIENT.borrow_mut() = Some(Mutex::new(client));
}
});

// As long as this function is the only place with access to the static variable,
// giving out a read-only borrow here is safe because it is guaranteed no more mutable
// references will exist at this point or in the future.
unsafe { CUDA_CLIENT.as_ref().unwrap() }
}

I commend to you the once_cell crate. It will make this safer - literally, allow removal of unsafe{} - and a bit smaller too.
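For what it's worth, `std::sync::OnceLock` (stable since Rust 1.70) gives the same shape with no extra dependency at all. A sketch with `CudaClient` stubbed; real code would connect the tarpc transport inside `get_or_init`:

```rust
use std::sync::{Mutex, OnceLock};

// Stub for the generated tarpc client type.
struct CudaClient;

// No static mut, no Once, no unsafe: OnceLock guarantees one-time init.
static CUDA_CLIENT: OnceLock<Mutex<CudaClient>> = OnceLock::new();

fn client() -> &'static Mutex<CudaClient> {
    CUDA_CLIENT.get_or_init(|| {
        // Real code would block_on the transport connect exactly once here.
        Mutex::new(CudaClient)
    })
}

fn main() {
    // Both calls hand back the same initialized instance.
    println!("{}", std::ptr::eq(client(), client()));
}
```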
