Steps for project migration

mihir jha edited this page Mar 15, 2024 · 19 revisions

Install the utility on a third machine or bastion host:

Development mode

  1. Clone the repo and run python3 -m pip install --editable . from the repository root.
  2. Verify that the cmlutil command runs.
  3. Because the CLI is installed in editable mode, changes to the source code take effect immediately, without reinstalling.
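
One way to script the check in step 2 is to look the command up on PATH. This is a minimal sketch; the helper name command_available is hypothetical and not part of cmlutils.

```python
import shutil


def command_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None
```

After a successful install in the active environment, command_available("cmlutil") should return True.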

For production

  1. To install from the main branch:
python3 -m pip install git+https://github.com/cloudera/cmlutils@main
  2. Or from a feature or release branch:
python3 -m pip install git+https://github.com/cloudera/cmlutils@<branch-name>

Pre-Migration Steps

Pre-Export Validations

  • Check that the user exists and is authorised to migrate the project.
  • Check that the Rsync custom runtime is available in the source Runtime Catalog.
  • Check that the intermediate/bastion machine has sufficient disk space available to download the project.
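
The disk space check can also be done by hand before starting an export. The sketch below is only an illustration: the function name and the size threshold are assumptions, and it is not how cmlutil itself performs the validation.

```python
import shutil
from pathlib import Path


def has_enough_space(output_dir: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding output_dir has required_bytes free."""
    path = Path(output_dir).expanduser()
    # Walk up to the nearest existing ancestor so the check also works
    # before the output directory has been created.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free >= required_bytes
```

For example, has_enough_space("~/Documents/temp_dir", 5 * 1024**3) reports whether roughly 5 GiB is free; the figure is arbitrary and should be replaced with the expected project size.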

Export steps:

  • Create an export-config.ini file inside the <home-dir>/.cmlutils directory. In it, create a section for each project to hold project-specific configuration. Place configuration shared across projects in the DEFAULT section.
  • Example export-config.ini file:
[DEFAULT]
url=<Source-Workspace-url>
output_dir=~/Documents/temp_dir
ca_path=~/Documents/custom-ca-source.pem
username=user-default
apiv1_key=default-dummy-key

[Project-A]
username=user-1
apiv1_key=user1-api-key

[Project-B]
username=user-2
apiv1_key=user-2-api-key

# Project-C uses the [DEFAULT] configuration because it has no project-specific settings
[Project-C]

Configuration used:

  1. username: username of the user who is migrating the project. (Mandatory)
  2. url: source workspace URL. (Mandatory)
  3. apiv1_key: source API v1/Legacy API key. (Mandatory)
  4. output_dir: temporary directory on the local machine where the project data and metadata are stored. (Mandatory)
  5. ca_path: path to a CA (Certificate Authority) bundle to use when Python cannot pick up the CA from the system and SSL certificate verification fails. This issue is generally seen on macOS. (Optional)
  • To skip certain files or directories during export, create a .exportignore file at the root of the source project (i.e. /home/cdsw). The .exportignore file follows the same semantics as .gitignore.

  • To export the project run the following command:

cmlutil project export -p "Project-A" 
or
cmlutil project export -p "Project-C"

Note: The project name passed to -p must match one of the section names in the export-config.ini file.
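
To see how a project section combines with DEFAULT, the config can be loaded with Python's standard configparser module. This is a sketch under assumptions: the page does not say which parser cmlutil uses, the resolve helper is hypothetical, and the URL is a placeholder.

```python
import configparser

SAMPLE = """
[DEFAULT]
url=https://source-workspace.example.com
output_dir=~/Documents/temp_dir
username=user-default
apiv1_key=default-dummy-key

[Project-A]
username=user-1
apiv1_key=user1-api-key

# Project-C has no keys of its own, so every lookup falls back to [DEFAULT].
[Project-C]
"""


def resolve(config_text: str, project: str) -> dict:
    """Return the effective settings for one project section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # The name given to -p has to exist as a section; DEFAULT alone is not enough.
    if not parser.has_section(project):
        raise KeyError(f"no section named {project!r}")
    # A section view already includes values inherited from [DEFAULT].
    return dict(parser[project])
```

Here resolve(SAMPLE, "Project-A") takes username and apiv1_key from its own section and url and output_dir from DEFAULT, while resolve(SAMPLE, "Project-C") returns the DEFAULT values only.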

  • A folder with the project name is created inside the output directory (~/Documents/temp_dir). If the project folder already exists, its data is overwritten.
  • All project files, artifacts and logs are downloaded into the project folder.
  • An export metrics JSON file is created with information about the exported project.

Pre-Import Validations

  • Check that the user exists and is authorised to migrate the project.
  • Check that the Rsync custom runtime is available in the target Runtime Catalog.
  • Check that the local output directory and the project metadata file exist on the intermediate/bastion machine.

Import Steps:

  • Create an import-config.ini file inside the <home-dir>/.cmlutils directory. In it, create a section for each project to hold project-specific configuration. Place configuration shared across projects in the DEFAULT section.

Example file:

[DEFAULT]
url=<Destination-Workspace-url>
output_dir=~/Documents/temp_dir
ca_path=~/Documents/custom-ca-target.pem
username=user-default
apiv1_key=user-default-dummy-key

[Project-A]
username=user-1
apiv1_key=user-1-api-key

[Project-B]
username=user-2
apiv1_key=user-2-api-key

# Project-C uses the [DEFAULT] configuration because it has no project-specific settings
[Project-C]

Configuration used:

  1. username: username of the user who is migrating the project. (Mandatory)
  2. url: target workspace URL. (Mandatory)
  3. apiv1_key: target API v1/Legacy API key. (Mandatory)
  4. output_dir: temporary directory on the local machine from which the project is uploaded. (Mandatory)
  5. ca_path: path to a CA (Certificate Authority) bundle to use when Python cannot pick up the CA from the system and SSL certificate verification fails. This issue is generally seen on macOS. (Optional)
  • To import the project, run the following command:
cmlutil project import -p "Project-A" 
or
cmlutil project import -p "Project-B"

Note: The project name passed to -p must match one of the section names in the import-config.ini file.

  • The project is created in the destination workspace if it does not already exist.
  • An import metrics JSON file is created with information about the imported project.

Import with Validation

  • To import a project and initiate validation, run either of the following commands:
cmlutil project import -p "Project-A" -v 
or
cmlutil project import -p "Project-B" --verify

This command initiates a session in the source and validates the following aspects:

  1. Consistency of project files between the source and local directories.
  2. Consistency of project files between the local directory and the destination.
  3. Consistency in the count of Jobs, Models, and Applications between the source and destination.
  4. Consistency in the metadata of Jobs, Models, and Applications between the source and destination.

These validations ensure the integrity and accuracy of the project import process.
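
The idea behind the file consistency checks can be illustrated locally by hashing two directory trees and comparing the results. This is a sketch only: the page does not describe how cmlutil compares files, and file_digests and compare_trees are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def file_digests(root: str) -> dict:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    base = Path(root).expanduser()
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(left: str, right: str) -> dict:
    """Report files that are missing on either side or differ in content."""
    a, b = file_digests(left), file_digests(right)
    return {
        "only_in_left": sorted(a.keys() - b.keys()),
        "only_in_right": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Two trees are consistent in this sense when all three lists in the result are empty. The sketch reads each file fully into memory, so it suits spot checks rather than very large projects.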
