Steps for project migration
mihir jha edited this page Mar 15, 2024
·
19 revisions
Development mode
- Clone the repo and run:
python3 -m pip install --editable .
- Check whether the cmlutil command runs.
- Because the CLI is installed in editable mode, changes to the source code take effect immediately without reinstalling.
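The check that the cmlutil command is available can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name find_cli is illustrative and is not part of cmlutils.

```python
import shutil
from typing import Optional


def find_cli(name: str = "cmlutil") -> Optional[str]:
    """Return the full path of the executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_cli()
    if path is None:
        print("cmlutil not found on PATH; re-check the pip install step")
    else:
        print(f"cmlutil found at {path}")
```

If the command is not found right after installation, the usual cause is that the directory pip installs scripts into is missing from PATH.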
For production
- To install from the main branch:
python3 -m pip install git+https://github.com/cloudera/cmlutils@main
- Or from a feature or release branch:
python3 -m pip install git+https://github.com/cloudera/cmlutils@<branch-name>
- Please carefully go through Legacy Engine Migration
- Check if user exists and is authorised to migrate the project
- Check if the Rsync Custom Runtime is available in the Source Runtime Catalog
- Check if the Intermediate/Bastion machine has sufficient disk space available to download the project.
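The disk space check in the last item can be sketched as follows. This is only an illustration using the standard library: the helper name has_enough_space and the 20 GiB figure are assumptions, and the required size depends on the project being exported.

```python
import os
import shutil


def has_enough_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # Assumed values: check the home directory's filesystem against 20 GiB.
    target = os.path.expanduser("~")
    needed = 20 * 1024**3
    print("enough space" if has_enough_space(target, needed) else "not enough space")
```

The path passed in must already exist, since shutil.disk_usage reports on the filesystem that contains it.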
- Create an export-config.ini file inside the <home-dir>/.cmlutils directory. Inside the export-config.ini file, create a section for each project, where you can include project-specific configurations. For common configurations shared across projects, place them in the DEFAULT section.
- Example export-config.ini file:
[DEFAULT]
url=<Source-Workspace-url>
output_dir=~/Documents/temp_dir
ca_path=~/Documents/custom-ca-source.pem
username=user-default
apiv1_key=default-dummy-key
[Project-A]
username=user-1
apiv1_key=user1-api-key
[Project-B]
username=user-2
apiv1_key=user-2-api-key
[Project-C] # Uses [DEFAULT] configuration as it doesn't have specific configuration
Configuration used:
- username: username of the user who is migrating the project. (Mandatory)
- url: Source workspace URL (Mandatory)
- apiv1_key: Source API v1/Legacy API key (Mandatory)
- output_dir: temporary directory on the local machine where the project data/metadata would be stored. (Mandatory)
- ca_path: path to a CA (Certificate Authority) bundle to use when Python cannot pick up the CA from the system and SSL certificate verification fails. This issue is generally seen on macOS. (Optional)
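To illustrate how a project section falls back to DEFAULT, here is a small sketch built on Python's standard configparser module, whose DEFAULT section supplies values to every other section. Whether cmlutils reads the file in exactly this way is an assumption; the resolve_project helper and the https://source.example.com URL are placeholders for illustration.

```python
import configparser

# Keys marked as mandatory in the list above.
MANDATORY_KEYS = ("username", "url", "apiv1_key", "output_dir")

# Trimmed copy of the example file; the URL is a placeholder.
SAMPLE = """
[DEFAULT]
url=https://source.example.com
output_dir=~/Documents/temp_dir
username=user-default
apiv1_key=default-dummy-key

[Project-A]
username=user-1
apiv1_key=user1-api-key

[Project-C]
"""


def resolve_project(parser: configparser.ConfigParser, project: str) -> dict:
    """Return the effective settings for one project section."""
    if not parser.has_section(project):
        raise KeyError(f"no section named {project!r} in the config")
    # A section proxy also exposes the keys inherited from DEFAULT.
    settings = dict(parser[project])
    missing = [key for key in MANDATORY_KEYS if not settings.get(key)]
    if missing:
        raise ValueError(f"{project}: missing mandatory keys {missing}")
    return settings


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

if __name__ == "__main__":
    for name in parser.sections():
        print(name, resolve_project(parser, name)["username"])
```

Project-A resolves to its own username and API key while inheriting url and output_dir, and Project-C takes every value from DEFAULT. To run the same check against a real file, parser.read(os.path.expanduser("~/.cmlutils/export-config.ini")) can replace read_string.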
- If you wish to skip certain files or directories during export, create a .exportignore file at the root of the source project (i.e. /home/cdsw). The .exportignore file follows the same semantics as .gitignore.
- To export the project, run the following command:
cmlutil project export -p "Project-A"
or
cmlutil project export -p "Project-C"
Note: The project name passed with -p must match one of the section names in the export-config.ini file.
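Before exporting, it can help to preview which paths a .exportignore file would skip. The sketch below is a deliberately simplified approximation of .gitignore matching, not the matcher cmlutils uses: it handles only slash-free patterns such as *.log and directory patterns with a trailing slash such as data/, and it ignores negation (!), anchored patterns, and **. The function name is_ignored is hypothetical.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import List


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Approximate check of a project-relative path against ignore patterns."""
    parts = PurePosixPath(rel_path).parts
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        if pattern.endswith("/"):
            # Directory pattern: match any parent directory of the path.
            name = pattern.rstrip("/")
            if any(fnmatch.fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatch.fnmatchcase(part, pattern) for part in parts):
            # Slash-free pattern: match the file or any directory above it.
            return True
    return False


if __name__ == "__main__":
    rules = ["# large intermediate outputs", "*.log", "data/"]
    for candidate in ["logs/run.log", "data/raw/big.csv", "notebooks/analysis.ipynb"]:
        print(candidate, "skipped" if is_ignored(candidate, rules) else "exported")
```

For anything beyond these two pattern shapes, the .gitignore documentation remains the reference for what a pattern matches.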
- A folder with the project name is created inside the output directory (~/Documents/temp_dir). If the project folder already exists, its data is overwritten.
- All project files, artifacts, and logs corresponding to the project are downloaded into the project folder.
- An export metrics JSON file is created with information about the exported project.
- Check if user exists and is authorised to migrate the project
- Check if the Rsync Custom Runtime is available in the Target Runtime Catalog
- Check if the local output directory and project metadata file exist on the Intermediate/Bastion machine.
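The last check can be sketched in a few lines. It assumes the exported project sits in a folder named after the project inside output_dir, as described for the export step. The name of the project metadata file is not given on this page, so the sketch takes it as an optional argument; export_is_present and metadata_name are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def export_is_present(output_dir: str, project: str,
                      metadata_name: Optional[str] = None) -> bool:
    """Return True if the exported project folder exists and looks populated."""
    project_dir = Path(output_dir).expanduser() / project
    if not project_dir.is_dir():
        return False
    if metadata_name is None:
        # Without a known file name, only confirm the folder is not empty.
        return any(project_dir.iterdir())
    # Otherwise look for the named file anywhere under the project folder.
    return any(project_dir.rglob(metadata_name))


if __name__ == "__main__":
    print(export_is_present("~/Documents/temp_dir", "Project-A"))
```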
- Create an import-config.ini file inside the <home-dir>/.cmlutils directory. Inside the import-config.ini file, create a section for each project, where you can include project-specific configurations. For common configurations shared across projects, place them in the DEFAULT section.
Example file:
[DEFAULT]
url=<Destination-Workspace-url>
output_dir=~/Documents/temp_dir
ca_path=~/Documents/custom-ca-target.pem
username=user-default
apiv1_key=user-default-dummy-key
[Project-A]
username=user-1
apiv1_key=user-1-api-key
[Project-B]
username=user-2
apiv1_key=user-2-api-key
[Project-C] # Uses [DEFAULT] configuration as it doesn't have specific configuration
Configuration used:
- username: username of a user who is migrating the project. (Mandatory)
- url: Target workspace URL (Mandatory)
- apiv1_key: Target API v1/Legacy API key (Mandatory)
- output_dir: temporary directory on the local machine from where the project will be uploaded. (Mandatory)
- ca_path: path to a CA (Certificate Authority) bundle to use when Python cannot pick up the CA from the system and SSL certificate verification fails. This issue is generally seen on macOS. (Optional)
- To import the project, run the following command:
cmlutil project import -p "Project-A"
or
cmlutil project import -p "Project-B"
Note: The project name passed with -p must match one of the section names in the import-config.ini file.
- The project will be created in the destination workspace if it does not exist already.
- An import metrics JSON file is created with information about the imported project.
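When several projects need to be migrated, the export and import commands shown on this page can be driven from a short script. The following is a minimal sketch: the helper names are illustrative, only the -p option documented above is used, and it assumes (without confirmation from this page) that cmlutil exits with a non-zero status on failure.

```python
import subprocess
from typing import Dict, List


def build_command(action: str, project: str) -> List[str]:
    """Build the argument list for `cmlutil project <action> -p <project>`."""
    if action not in ("export", "import"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["cmlutil", "project", action, "-p", project]


def run_for_projects(action: str, projects: List[str]) -> Dict[str, int]:
    """Run the command once per project and collect the exit codes."""
    results = {}
    for project in projects:
        completed = subprocess.run(build_command(action, project), check=False)
        results[project] = completed.returncode
    return results


if __name__ == "__main__":
    # Project names must match section names in the relevant config file.
    print(run_for_projects("import", ["Project-A", "Project-B"]))
```

subprocess.run raises FileNotFoundError if cmlutil is not installed, so the script fails fast on a machine where the CLI is missing.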
- To import a project and initiate validation, execute the following commands:
cmlutil project import -p "Project-A" -v
or
cmlutil project import -p "Project-B" --verify
This command initiates a session in the source and validates the following aspects:
- Consistency of project files between the source and local directories.
- Consistency of project files between the local directory and the destination.
- Consistency in the count of Jobs, Models, and Applications between the source and destination.
- Consistency in the metadata of Jobs, Models, and Applications between the source and destination.
These validations ensure the integrity and accuracy of the project import process.
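This page does not describe how the file consistency checks are implemented. As a generic illustration of the idea, the sketch below compares two local directory trees by SHA-256 digest; it is not the cmlutils implementation, and hash_tree and diff_trees are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def hash_tree(root: str) -> Dict[str, str]:
    """Map each file's path relative to `root` to its SHA-256 digest."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(left: str, right: str) -> Dict[str, List[str]]:
    """Report files that are missing on one side or whose contents differ."""
    a, b = hash_tree(left), hash_tree(right)
    return {
        "only_in_left": sorted(a.keys() - b.keys()),
        "only_in_right": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

An empty result in all three lists means the two trees hold the same files with the same contents. The sketch reads each file fully into memory, so very large files would call for chunked hashing instead.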