diff --git a/.env b/.env
new file mode 100644
index 00000000000..6d99d85b3a7
--- /dev/null
+++ b/.env
@@ -0,0 +1,5 @@
+APP_IMAGE=gdcc/dataverse:unstable
+POSTGRES_VERSION=17
+DATAVERSE_DB_USER=dataverse
+SOLR_VERSION=9.8.0
+SKIP_DEPLOY=0
\ No newline at end of file
diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 00000000000..9860024f70a
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,4 @@
+# https://www.git-scm.com/docs/gitattributes
+
+# This sets mandatory LF line endings for .sh files, preventing Windows users from having to change their git config --global core.autocrlf to 'false' or 'input'
+*.sh text eol=lf
\ No newline at end of file
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
new file mode 100644
index 00000000000..5c9ad7581f8
--- /dev/null
+++ b/.github/CODEOWNERS
@@ -0,0 +1,8 @@
+
+# Any container related stuff should be assigned to / reviewed by Oliver and/or Phil
+modules/container-configbaker/** @poikilotherm @pdurbin
+modules/container-base/** @poikilotherm @pdurbin
+src/main/docker/** @poikilotherm @pdurbin
+docker-compose-dev.yml @poikilotherm @pdurbin
+.github/workflows/scripts/containers** @poikilotherm @pdurbin
+.github/workflows/container_* @poikilotherm @pdurbin
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 00000000000..3dba7d52109
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,58 @@
+---
+name: Bug report
+about: Did you encounter something unexpected or incorrect in the Dataverse software?
+  We'd like to hear about it!
+title: ''
+labels: 'Type: Bug'
+assignees: ''
+
+---
+
+**What steps does it take to reproduce the issue?**
+
+* When does this issue occur?
+
+* Which page(s) does it occur on?
+
+* What happens?
+
+* To whom does it occur (all users, curators, superusers)?
+
+* What did you expect to happen?
+
+
+**Which version of Dataverse are you using?**
+
+**Any related open or closed issues to this bug report?**
+
+**Screenshots:**
+
+No matter the issue, screenshots are always welcome.
+
+To add a screenshot, please use one of the following formats and/or methods described here:
+
+* https://help.github.com/en/articles/file-attachments-on-issues-and-pull-requests
+*
+
+**Are you thinking about creating a pull request for this issue?**
+Help is always welcome. Is this bug something you or your organization plan to fix?
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 00000000000..7365cb4317c
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,39 @@
+---
+name: Feature request
+about: Suggest an idea or new feature for the Dataverse software!
+title: 'Feature Request:'
+labels: 'Type: Feature'
+assignees: ''
+
+---
+
+**Overview of the Feature Request**
+
+**What kind of user is the feature intended for?**
+(Example user roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)
+
+**What inspired the request?**
+
+**What existing behavior do you want changed?**
+
+**Is there any brand new behavior you want to add to Dataverse?**
+
+**Any open or closed issues related to this feature request?**
+
+**Are you thinking about creating a pull request for this feature?**
+Help is always welcome. Is this feature something you or your organization plan to implement?
diff --git a/.github/ISSUE_TEMPLATE/idea_proposal.md b/.github/ISSUE_TEMPLATE/idea_proposal.md
new file mode 100644
index 00000000000..8cb6c7bfafe
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/idea_proposal.md
@@ -0,0 +1,40 @@
+---
+name: Idea proposal
+about: Propose a new idea for discussion to improve the Dataverse software!
+title: 'Suggestion:'
+labels: 'Type: Suggestion'
+assignees: ''
+
+---
+
+**Overview of the Suggestion**
+
+**What kind of user is the suggestion intended for?**
+(Example user roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)
+
+**What inspired this idea?**
+
+**What existing behavior do you want changed?**
+
+**Is there any brand new behavior you want to add to Dataverse?**
+
+**Any open or closed issues related to this suggestion?**
+
+**Are you thinking about creating a pull request for this issue?**
+Help is always welcome. Is this idea something you or your organization plan to implement?
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 00000000000..f2a779bbf21
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,15 @@
+**What this PR does / why we need it**:
+
+**Which issue(s) this PR closes**:
+
+- Closes #
+
+**Special notes for your reviewer**:
+
+**Suggestions on how to test this**:
+
+**Does this PR introduce a user interface change? If mockups are available, please link/include them here**:
+
+**Is there a release notes update needed for this change?**:
+
+**Additional documentation**:
diff --git a/.github/SECURITY.md b/.github/SECURITY.md
new file mode 100644
index 00000000000..c36e26c8330
--- /dev/null
+++ b/.github/SECURITY.md
@@ -0,0 +1,7 @@
+# Security
+
+To report a security vulnerability, please email security@dataverse.org as explained at https://guides.dataverse.org/en/latest/installation/config.html#reporting-security-issues
+
+Advice on securing your installation can be found at https://guides.dataverse.org/en/latest/installation/config.html#securing-your-installation
+
+Security practices and procedures used by the Dataverse team are described at https://guides.dataverse.org/en/latest/developers/security.html
diff --git a/.github/actions/setup-maven/action.yml b/.github/actions/setup-maven/action.yml
new file mode 100644
index 00000000000..4cf09f34231
--- /dev/null
+++ b/.github/actions/setup-maven/action.yml
@@ -0,0 +1,37 @@
+---
+name: "Setup Maven and Caches"
+description: "Determine Java version and setup Maven, including necessary caches."
+inputs:
+  git-reference:
+    description: 'The git reference (branch/tag) to check out'
+    required: false
+    default: '${{ github.ref }}'
+  pom-paths:
+    description: "List of paths to Maven POM(s) for cache dependency setup"
+    required: false
+    default: 'pom.xml'
+runs:
+  using: composite
+  steps:
+    - name: Checkout repository
+      uses: actions/checkout@v4
+      with:
+        ref: ${{ inputs.git-reference }}
+    - name: Determine Java version by reading the Maven property
+      shell: bash
+      run: |
+        echo "JAVA_VERSION=$(grep '<target.java.version>' ${GITHUB_WORKSPACE}/modules/dataverse-parent/pom.xml | cut -f2 -d'>' | cut -f1 -d'<')" | tee -a ${GITHUB_ENV}
+    - name: Set up JDK ${{ env.JAVA_VERSION }}
+      id: setup-java
+      uses: actions/setup-java@v4
+      with:
+        java-version: ${{ env.JAVA_VERSION }}
+        distribution: 'temurin'
+        cache: 'maven'
+        cache-dependency-path: ${{ inputs.pom-paths }}
+    - name: Download common cache on branch cache miss
+      if: ${{ steps.setup-java.outputs.cache-hit != 'true' }}
+      uses: actions/cache/restore@v4
+      with:
+        key: dataverse-maven-cache
+        path: ~/.m2/repository
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
new file mode 100644
index 00000000000..6325029dac1
--- /dev/null
+++ b/.github/dependabot.yml
@@ -0,0 +1,11 @@
+# Set update schedule for GitHub Actions
+# https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/keeping-your-actions-up-to-date-with-dependabot
+
+version: 2
+updates:
+
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      # Check for updates to GitHub Actions daily
+      interval: "daily"
diff --git a/.github/workflows/check_property_files.yml b/.github/workflows/check_property_files.yml
new file mode 100644
index 00000000000..505310aab35
--- /dev/null
+++ b/.github/workflows/check_property_files.yml
@@ -0,0 +1,32 @@
+name: "Properties Check"
+on:
+  pull_request:
+    paths:
+      - "src/**/*.properties"
+      - "scripts/api/data/metadatablocks/*"
+jobs:
+  duplicate_keys:
+    name: Duplicate Keys
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run duplicates detection script
+        shell: bash
+        run: tests/check_duplicate_properties.sh
+
+  metadata_blocks_properties:
+    name: Metadata Blocks Properties
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Setup GraalVM + Native Image
+        uses: graalvm/setup-graalvm@v1
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          java-version: '21'
+          distribution: 'graalvm-community'
+      - name: Setup JBang
+        uses: jbangdev/setup-jbang@main
+      - name: Run metadata block properties verification script
+        shell: bash
+        run: tests/verify_mdb_properties.sh
diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml
new file mode 100644
index 00000000000..9f4e94b9f5b
--- /dev/null
+++ b/.github/workflows/codeql.yml
@@ -0,0 +1,104 @@
+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+#
+# ******** NOTE ********
+# We have attempted to detect the languages in your repository. Please check
+# the `language` matrix defined below to confirm you have the correct set of
+# supported CodeQL languages.
+#
+name: "CodeQL Advanced"
+
+on:
+  push:
+    branches: [ "develop", "master" ]
+  pull_request:
+    branches: [ "develop", "master" ]
+  schedule:
+    - cron: '30 6 * * 4'
+
+jobs:
+  analyze:
+    name: Analyze (${{ matrix.language }})
+    # Runner size impacts CodeQL analysis time. To learn more, please see:
+    #   - https://gh.io/recommended-hardware-resources-for-running-codeql
+    #   - https://gh.io/supported-runners-and-hardware-resources
+    #   - https://gh.io/using-larger-runners (GitHub.com only)
+    # Consider using larger runners or machines with greater resources for possible analysis time improvements.
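The setup-maven action earlier in this diff derives `JAVA_VERSION` by grepping a property line out of the parent POM and cutting out the text between `>` and `<`. A minimal standalone sketch of that pipeline; the POM snippet and the property name `target.java.version` are illustrative assumptions, not the real parent POM:

```shell
# Sketch: extract a Maven property value with the same grep/cut pipeline
# the composite action uses. The sample POM below is a stand-in.
cat > /tmp/pom-sample.xml << 'EOF'
<project>
  <properties>
    <target.java.version>17</target.java.version>
  </properties>
</project>
EOF

# grep the property line, then cut the value between '>' and '<'
JAVA_VERSION=$(grep 'target.java.version' /tmp/pom-sample.xml | cut -f2 -d'>' | cut -f1 -d'<')
echo "$JAVA_VERSION"   # prints: 17
```

This only works because the property sits on a single line; a full XML parser would be more robust, but the one-liner matches what the action does.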
+    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
+    permissions:
+      # required for all workflows
+      security-events: write
+
+      # required to fetch internal or private CodeQL packs
+      packages: read
+
+      # only required for workflows in private repositories
+      actions: read
+      contents: read
+
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - language: actions
+            build-mode: none
+          - language: java-kotlin
+            build-mode: none # This mode only analyzes Java. Set this to 'autobuild' or 'manual' to analyze Kotlin too.
+          - language: javascript-typescript
+            build-mode: none
+          - language: python
+            build-mode: none
+        # CodeQL supports the following values for 'language': 'actions', 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift'
+        # Use `c-cpp` to analyze code written in C, C++ or both
+        # Use 'java-kotlin' to analyze code written in Java, Kotlin or both
+        # Use 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
+        # To learn more about changing the languages that are analyzed or customizing the build mode for your analysis,
+        # see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning.
+        # If you are analyzing a compiled language, you can modify the 'build-mode' for that language to customize how
+        # your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      # Add any setup steps before running the `github/codeql-action/init` action.
+      # This includes steps like installing compilers or runtimes (`actions/setup-node`
+      # or others). This is typically only required for manual builds.
+      # - name: Setup runtime (example)
+      #   uses: actions/setup-example@v1
+
+      # Initializes the CodeQL tools for scanning.
+      - name: Initialize CodeQL
+        uses: github/codeql-action/init@v3
+        with:
+          languages: ${{ matrix.language }}
+          build-mode: ${{ matrix.build-mode }}
+          # If you wish to specify custom queries, you can do so here or in a config file.
+          # By default, queries listed here will override any specified in a config file.
+          # Prefix the list here with "+" to use these queries and those in the config file.
+
+          # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+          # queries: security-extended,security-and-quality
+
+      # If the analyze step fails for one of the languages you are analyzing with
+      # "We were unable to automatically build your code", modify the matrix above
+      # to set the build mode to "manual" for that language. Then modify this step
+      # to build your code.
+      # ℹ️ Command-line programs to run using the OS shell.
+      # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
+      - if: matrix.build-mode == 'manual'
+        shell: bash
+        run: |
+          echo 'If you are using a "manual" build mode for one or more of the' \
+            'languages you are analyzing, replace this with the commands to build' \
+            'your code, for example:'
+          echo '  make bootstrap'
+          echo '  make release'
+          exit 1
+
+      - name: Perform CodeQL Analysis
+        uses: github/codeql-action/analyze@v3
+        with:
+          category: "/language:${{matrix.language}}"
diff --git a/.github/workflows/container_app_pr.yml b/.github/workflows/container_app_pr.yml
new file mode 100644
index 00000000000..a4c52805156
--- /dev/null
+++ b/.github/workflows/container_app_pr.yml
@@ -0,0 +1,95 @@
+---
+name: Preview Application Container Image
+
+# TODO: merge this workflow into the existing container_app_push.yaml flow - there's not much difference!
+
+on:
+  # We only run the push commands if we are asked to by an issue comment with the correct command.
+  # This workflow is always taken from the default branch and runs in repo context with access to secrets.
+  repository_dispatch:
+    types: [ push-image-command ]
+
+env:
+  PLATFORMS: "linux/amd64,linux/arm64"
+
+jobs:
+  deploy:
+    name: "Package & Push"
+    runs-on: ubuntu-latest
+    # Only run in upstream repo - avoid unnecessary runs in forks
+    if: ${{ github.repository_owner == 'IQSS' }}
+    steps:
+      # Checkout the pull request code as when merged
+      - uses: actions/checkout@v4
+        with:
+          ref: 'refs/pull/${{ github.event.client_payload.pull_request.number }}/merge'
+      - uses: actions/setup-java@v4
+        with:
+          java-version: "17"
+          distribution: 'adopt'
+      - uses: actions/cache@v4
+        with:
+          path: ~/.m2
+          key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
+          restore-keys: ${{ runner.os }}-m2
+
+      # Note: Accessing, pushing tags etc. to GHCR will only succeed in upstream because of secrets.
+      - name: Login to Github Container Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ secrets.GHCR_USERNAME }}
+          password: ${{ secrets.GHCR_TOKEN }}
+
+      - name: Set up QEMU for multi-arch builds
+        uses: docker/setup-qemu-action@v3
+
+      # Get the image tag from either the command or default to branch name (Not used for now)
+      #- name: Get the target tag name
+      #  id: vars
+      #  run: |
+      #    tag=${{ github.event.client_payload.slash_command.args.named.tag }}
+      #    if [[ -z "$tag" ]]; then tag=$(echo "${{ github.event.client_payload.pull_request.head.ref }}" | tr '\\/_:&+,;#*' '-'); fi
+      #    echo "IMAGE_TAG=$tag" >> $GITHUB_ENV
+
+      # Set image tag to branch name of the PR
+      - name: Set image tag to branch name
+        run: |
+          echo "IMAGE_TAG=$(echo "${{ github.event.client_payload.pull_request.head.ref }}" | tr '\\/_:&+,;#*' '-')" >> $GITHUB_ENV
+
+      # Necessary to split as otherwise the submodules are not available (deploy skips install)
+      - name: Build app and configbaker container image with local architecture and submodules (profile will skip tests)
+        run: >
+          mvn -B -f modules/dataverse-parent
+          -P ct -pl edu.harvard.iq:dataverse -am
+          install
+      - name: Deploy multi-arch application and configbaker container image
+        run: >
+          mvn -Pct deploy
+          -Dapp.image.tag=${{ env.IMAGE_TAG }}
+          -Ddocker.registry=ghcr.io -Ddocker.platforms=${{ env.PLATFORMS }}
+
+      - uses: marocchino/sticky-pull-request-comment@v2
+        with:
+          header: registry-push
+          hide_and_recreate: true
+          hide_classify: "OUTDATED"
+          number: ${{ github.event.client_payload.pull_request.number }}
+          message: |
+            :package: Pushed preview images as
+            ```
+            ghcr.io/gdcc/dataverse:${{ env.IMAGE_TAG }}
+            ```
+            ```
+            ghcr.io/gdcc/configbaker:${{ env.IMAGE_TAG }}
+            ```
+            :ship: [See on GHCR](https://github.com/orgs/gdcc/packages/container). Use by referencing the full name as printed above; mind the registry name.
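Several steps in these workflows (e.g. "Set image tag to branch name" above) rewrite a git branch ref into a Docker-tag-safe string with `tr`, replacing characters that are invalid in image tags with dashes. A standalone sketch of that sanitization; the branch name is a made-up example:

```shell
# Sketch: sanitize a branch name into a Docker-tag-safe string, using the
# same tr character set as the workflow ('\', '/', '_', ':', '&', '+', ',', ';', '#', '*').
BRANCH="10618/fix_container-builds"
IMAGE_TAG=$(echo "$BRANCH" | tr '\\/_:&+,;#*' '-')
echo "$IMAGE_TAG"   # prints: 10618-fix-container-builds
```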
+      # Leave a note when things have gone sideways
+      - uses: peter-evans/create-or-update-comment@v4
+        if: ${{ failure() }}
+        with:
+          issue-number: ${{ github.event.client_payload.pull_request.number }}
+          body: >
+            :package: Could not push preview images :disappointed:.
+            See [log](https://github.com/IQSS/dataverse/actions/runs/${{ github.run_id }}) for details.
diff --git a/.github/workflows/container_app_push.yml b/.github/workflows/container_app_push.yml
new file mode 100644
index 00000000000..0472ab97dee
--- /dev/null
+++ b/.github/workflows/container_app_push.yml
@@ -0,0 +1,158 @@
+---
+name: Application Container Image
+
+on:
+  # We are deliberately *not* running on push events here to avoid double runs.
+  # Instead, push events will trigger from the base image and maven unit tests via workflow_call.
+  workflow_call:
+    inputs:
+      base-image-ref:
+        type: string
+        description: "Reference of the base image to build on in full qualified form [<registry>/]<repo>/<name>:<tag>"
+        required: false
+        default: "gdcc/base:unstable"
+  pull_request:
+    branches:
+      - develop
+      - master
+    paths:
+      - 'src/main/docker/**'
+      - 'modules/container-configbaker/**'
+      - '.github/workflows/container_app_push.yml'
+
+env:
+  IMAGE_TAG: unstable
+  REGISTRY: "" # Empty means default to Docker Hub
+  PLATFORMS: "linux/amd64,linux/arm64"
+
+jobs:
+  build:
+    name: "Build & Test"
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+      pull-requests: write
+    # Only run in upstream repo - avoid unnecessary runs in forks
+    if: ${{ github.repository_owner == 'IQSS' }}
+
+    steps:
+      - name: Checkout and Setup Maven
+        uses: IQSS/dataverse/.github/actions/setup-maven@develop
+        with:
+          pom-paths: |
+            pom.xml
+            modules/container-configbaker/pom.xml
+            modules/dataverse-parent/pom.xml
+
+      # TODO: Add a filter step here that avoids building the image if this is a PR and files other than those declared above are touched.
+      #       Use https://github.com/dorny/paths-filter to solve this. This will ensure we do not run this twice if this workflow
+      #       will be triggered by the other workflows already (base image or java changes).
+      #       To become a part of #10618.
+
+      - name: Build app and configbaker container image with local architecture and submodules (profile will skip tests)
+        run: >
+          mvn -B -f modules/dataverse-parent
+          -P ct -pl edu.harvard.iq:dataverse -am
+          $( [[ -n "${{ inputs.base-image-ref }}" ]] && echo "-Dbase.image=${{ inputs.base-image-ref }}" )
+          install
+
+      # TODO: add smoke / integration testing here (add "-Pct -DskipIntegrationTests=false")
+
+  # Note: Accessing, pushing tags etc. to DockerHub or GHCR will only succeed in upstream because of secrets.
+  # We check for them here and subsequent jobs can rely on this to decide if they shall run.
+  check-secrets:
+    needs: build
+    name: Check for Secrets Availability
+    runs-on: ubuntu-latest
+    outputs:
+      available: ${{ steps.secret-check.outputs.available }}
+    steps:
+      - id: secret-check
+        # perform secret check & put boolean result as an output
+        shell: bash
+        run: |
+          if [ "${{ secrets.DOCKERHUB_TOKEN }}" != '' ]; then
+            echo "available=true" >> $GITHUB_OUTPUT;
+          else
+            echo "available=false" >> $GITHUB_OUTPUT;
+          fi
+
+  deploy:
+    needs: check-secrets
+    name: "Package & Publish"
+    runs-on: ubuntu-latest
+    # Only run this job if we have access to secrets. This is true for events like push/schedule which run in the
+    # context of the main repo, but for PRs only true if coming from the main repo! Forks have no secret access.
+    #
+    # Note: The team's decision was to not auto-deploy an image on any git push where no PR exists (yet).
+    # Accordingly, only run for push events on the 'develop' branch.
+    if: needs.check-secrets.outputs.available == 'true' &&
+        ( github.event_name != 'push' || ( github.event_name == 'push' && github.ref_name == 'develop' ))
+    steps:
+      - name: Checkout and Setup Maven
+        uses: IQSS/dataverse/.github/actions/setup-maven@develop
+        with:
+          pom-paths: |
+            pom.xml
+            modules/container-configbaker/pom.xml
+            modules/dataverse-parent/pom.xml
+
+      # Depending on context, we push to different targets. Login accordingly.
+      - if: github.event_name != 'pull_request'
+        name: Log in to Docker Hub registry
+        uses: docker/login-action@v3
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+      - if: ${{ github.event_name == 'pull_request' }}
+        name: Login to Github Container Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ secrets.GHCR_USERNAME }}
+          password: ${{ secrets.GHCR_TOKEN }}
+
+      - name: Set up QEMU for multi-arch builds
+        uses: docker/setup-qemu-action@v3
+
+      - name: Add rolling image tag when pushing to develop
+        if: ${{ github.event_name == 'push' && github.ref_name == 'develop' }}
+        run: |
+          echo "ADDITIONAL_TAGS=-Ddocker.tags.upcoming=$( mvn initialize help:evaluate -Pct -Dexpression=app.image.tag -Dapp.image.tag='${app.image.version}-${base.image.flavor}' -q -DforceStdout )" | tee -a "$GITHUB_ENV"
+      - name: Re-set image tag and container registry when on PR
+        if: ${{ github.event_name == 'pull_request' }}
+        run: |
+          echo "IMAGE_TAG=$(echo "$GITHUB_HEAD_REF" | tr '\\/_:&+,;#*' '-')" | tee -a "$GITHUB_ENV"
+          echo "REGISTRY='-Ddocker.registry=ghcr.io'" | tee -a "$GITHUB_ENV"
+
+      # Necessary to split as otherwise the submodules are not available (deploy skips install)
+      - name: Build app and configbaker container image with local architecture and submodules (profile will skip tests)
+        run: >
+          mvn -B -f modules/dataverse-parent
          -P ct -pl edu.harvard.iq:dataverse -am
          $( [[ -n "${{ inputs.base-image-ref }}" ]] && echo "-Dbase.image=${{ inputs.base-image-ref }}" )
+ install + - name: Deploy multi-arch application and configbaker container image + run: > + mvn + -Dapp.image.tag=${{ env.IMAGE_TAG }} ${{ env.ADDITIONAL_TAGS }} + $( [[ -n "${{ inputs.base-image-ref }}" ]] && echo "-Dbase.image=${{ inputs.base-image-ref }}" ) + ${{ env.REGISTRY }} -Ddocker.platforms=${{ env.PLATFORMS }} + -P ct deploy + + - uses: marocchino/sticky-pull-request-comment@v2 + if: ${{ github.event_name == 'pull_request' }} + with: + header: registry-push + hide_and_recreate: true + hide_classify: "OUTDATED" + message: | + :package: Pushed preview images as + ``` + ghcr.io/gdcc/dataverse:${{ env.IMAGE_TAG }} + ``` + ``` + ghcr.io/gdcc/configbaker:${{ env.IMAGE_TAG }} + ``` + :ship: [See on GHCR](https://github.com/orgs/gdcc/packages/container). Use by referencing with full name as printed above, mind the registry name. diff --git a/.github/workflows/container_base_push.yml b/.github/workflows/container_base_push.yml new file mode 100644 index 00000000000..3b375e13864 --- /dev/null +++ b/.github/workflows/container_base_push.yml @@ -0,0 +1,111 @@ +--- +name: Base Container Image + +on: + push: + branches: + - 'develop' + # "Path filters are not evaluated for pushes of tags" https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore + paths: + - 'modules/container-base/**' + - '!modules/container-base/src/backports/**' + - '!modules/container-base/README.md' + - 'modules/dataverse-parent/pom.xml' + - '.github/workflows/container_base_push.yml' + + # These TODOs are left for #10618 + # TODO: we are missing a workflow_call option here, so we can trigger this flow from pr comments and maven tests (keep the secrets availability in mind!) + # TODO: we are missing a pull_request option here (filter for stuff that would trigger the maven runs!) so we can trigger preview builds for them when coming from the main repo (keep the secrets availability in mind!) 
+ +env: + PLATFORMS: linux/amd64,linux/arm64 + DEVELOPMENT_BRANCH: develop + +jobs: + build: + name: Base Image + runs-on: ubuntu-latest + permissions: + contents: read + packages: read + # Only run in upstream repo - avoid unnecessary runs in forks + if: ${{ github.repository_owner == 'IQSS' }} + outputs: + base-image-ref: ${{ steps.determine-name.outputs.full-ref }} + + steps: + - name: Checkout and Setup Maven + uses: IQSS/dataverse/.github/actions/setup-maven@develop + with: + pom-paths: modules/container-base/pom.xml + + # Note: Accessing, pushing tags etc. to DockerHub will only succeed in upstream and + # on events in context of upstream because secrets. PRs run in context of forks by default! + - name: Log in to the Container registry + uses: docker/login-action@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_TOKEN }} + + # In case this is a push to develop, we care about buildtime. + # Configure a remote ARM64 build host in addition to the local AMD64 in two steps. 
+      - name: Setup SSH agent
+        uses: webfactory/ssh-agent@v0.9.1
+        with:
+          ssh-private-key: ${{ secrets.BUILDER_ARM64_SSH_PRIVATE_KEY }}
+      - name: Provide the known hosts key and the builder config
+        run: |
+          echo "${{ secrets.BUILDER_ARM64_SSH_HOST_KEY }}" > ~/.ssh/known_hosts
+          mkdir -p modules/container-base/target/buildx-state/buildx/instances
+          cat > modules/container-base/target/buildx-state/buildx/instances/maven << EOF
+          { "Name": "maven",
+            "Driver": "docker-container",
+            "Dynamic": false,
+            "Nodes": [{"Name": "maven0",
+                       "Endpoint": "unix:///var/run/docker.sock",
+                       "Platforms": [{"os": "linux", "architecture": "amd64"}],
+                       "DriverOpts": null,
+                       "Flags": ["--allow-insecure-entitlement=network.host"],
+                       "Files": null},
+                      {"Name": "maven1",
+                       "Endpoint": "ssh://${{ secrets.BUILDER_ARM64_SSH_CONNECTION }}",
+                       "Platforms": [{"os": "linux", "architecture": "arm64"}],
+                       "DriverOpts": null,
+                       "Flags": ["--allow-insecure-entitlement=network.host"],
+                       "Files": null}]}
+          EOF
+
+      # Determine the base image name we are going to use from here on
+      - name: Determine base image name
+        id: determine-name
+        run: |
+          BASE_IMAGE=$( mvn initialize help:evaluate -Pct -f modules/container-base -Dexpression=base.image -q -DforceStdout )
+          BASE_IMAGE_UPCOMING=$( mvn initialize help:evaluate -Pct -f modules/container-base -Dexpression=base.image -Dbase.image.tag.suffix="" -q -DforceStdout )
+
+          echo "BASE_IMAGE=${BASE_IMAGE}" | tee -a "${GITHUB_ENV}"
+          echo "BASE_IMAGE_UPCOMING=${BASE_IMAGE_UPCOMING}" | tee -a "${GITHUB_ENV}"
+          echo "full-ref=${BASE_IMAGE_UPCOMING}" | tee -a "$GITHUB_OUTPUT"
+
+      - name: Configure update of "latest" tag for development branch
+        id: develop-tag
+        run: |
+          echo "tag-options=-Ddocker.tags.develop=unstable -Ddocker.tags.upcoming=${BASE_IMAGE_UPCOMING#*:}" | tee -a "${GITHUB_OUTPUT}"
+
+      - name: Deploy multi-arch base container image to Docker Hub
+        id: build
+        run: |
+          mvn -f modules/container-base -Pct deploy -Ddocker.noCache -Ddocker.platforms=${{ env.PLATFORMS }} \
+            -Ddocker.imagePropertyConfiguration=override ${{ steps.develop-tag.outputs.tag-options }}
+
+  push-app-img:
+    name: "Rebase & Publish App Image"
+    permissions:
+      contents: read
+      packages: write
+      pull-requests: write
+    secrets: inherit
+    needs:
+      - build
+    uses: ./.github/workflows/container_app_push.yml
+    with:
+      base-image-ref: ${{ needs.build.outputs.base-image-ref }}
diff --git a/.github/workflows/container_maintenance.yml b/.github/workflows/container_maintenance.yml
new file mode 100644
index 00000000000..ec149c7b377
--- /dev/null
+++ b/.github/workflows/container_maintenance.yml
@@ -0,0 +1,277 @@
+---
+name: Container Images Scheduled Maintenance
+
+on:
+  # TODO: think about adding a (filtered) push event trigger here in case we change the patches
+  # ---
+  # Allow manual workflow triggers in case we need to repair images on Docker Hub (build and replace)
+  workflow_dispatch:
+    inputs:
+      force_build:
+        type: boolean
+        required: false
+        default: false
+        description: "Build and deploy even if no newer Java images or package updates are found."
+      dry_run:
+        type: boolean
+        required: false
+        default: false
+        description: "Run in dry-run mode (no builds, verify logic)"
+      damp_run:
+        type: boolean
+        required: false
+        default: false
+        description: "Run in damp-run mode (build but don't push)"
+  schedule:
+    - cron: '23 3 * * 0' # Run for 'develop' every Sunday at 03:23 UTC
+  release:
+    types: [published]
+
+env:
+  PLATFORMS: linux/amd64,linux/arm64
+  NUM_PAST_RELEASES: 3
+
+jobs:
+  discover:
+    name: Discover supported releases
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: read
+    outputs:
+      branches: ${{ steps.discover.outputs.branches }}
+      develop-branch: ${{ steps.discover.outputs.develop-branch }}
+    steps:
+      - name: Discover maintained releases
+        id: discover
+        run: |
+          DEVELOPMENT_BRANCH=$( curl -f -sS https://api.github.com/repos/${{ github.repository }} | jq -r '.default_branch' )
+          echo "develop-branch=$DEVELOPMENT_BRANCH" | tee -a "${GITHUB_OUTPUT}"
+
+          SUPPORTED_BRANCHES=$( curl -f -sS https://api.github.com/repos/IQSS/dataverse/releases | jq -r " .[0:${{ env.NUM_PAST_RELEASES }}] | .[].tag_name, \"${DEVELOPMENT_BRANCH}\" " | tr "\n" " " )
+          echo "branches=$SUPPORTED_BRANCHES" | tee -a "${GITHUB_OUTPUT}"
+
+  base-image:
+    name: Base Image Matrix Build
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: read
+    # Only run in upstream repo - avoid unnecessary runs in forks
+    if: ${{ github.repository_owner == 'IQSS' }}
+    needs:
+      - discover
+    outputs:
+      # This is a JSON map with keys of branch names (supported releases & develop) and values containing an array of known image tags for the branch
+      # Example: {"v6.6": ["latest", "6.6-noble", "6.6-noble-r1"], "v6.5": ["6.5-noble", "6.5-noble-r5"], "v6.4": ["6.4-noble", "6.4-noble-r12"], "develop": ["unstable", "6.7-noble", "6.7-noble-p6.2025.3-j17"]}
+      supported_tag_matrix: ${{ steps.execute.outputs.supported_tag_matrix }}
+
+      # This is a JSON list containing a flattened map of branch names and the latest non-rolling tag
+      # Example: [ "v6.6=gdcc/base:6.6-noble-r1", "v6.5=gdcc/base:6.5-noble-r5", "v6.4=gdcc/base:6.4-noble-r12", "develop=gdcc/base:6.7-noble-p6.2025.3-j17" ]
+      rebuilt_images: ${{ steps.execute.outputs.rebuilt_images }}
+
+    steps:
+      - name: Checkout and Setup Maven
+        uses: IQSS/dataverse/.github/actions/setup-maven@develop
+        with:
+          pom-paths: modules/container-base/pom.xml
+
+      # Note: Accessing, pushing tags etc. to DockerHub will only succeed in upstream and
+      # on events in context of upstream because of secrets. PRs run in context of forks by default!
+      - name: Log in to the Container registry
+        uses: docker/login-action@v3
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+      - name: Set up QEMU for multi-arch builds
+        uses: docker/setup-qemu-action@v3
+        with:
+          platforms: ${{ env.PLATFORMS }}
+
+      # Execute matrix build for the discovered branches
+      - name: Execute build matrix script
+        id: execute
+        run: >
+          FORCE_BUILD=$( [[ "${{ inputs.force_build }}" = "true" ]] && echo 1 || echo 0 )
+          DRY_RUN=$( [[ "${{ inputs.dry_run }}" = "true" ]] && echo 1 || echo 0 )
+          DAMP_RUN=$( [[ "${{ inputs.damp_run }}" = "true" ]] && echo 1 || echo 0 )
+          DEVELOPMENT_BRANCH=${{ needs.discover.outputs.develop-branch }}
+          .github/workflows/scripts/containers/maintain-base.sh ${{ needs.discover.outputs.branches }}
+
+  application-image:
+    name: "Application Image Matrix Build"
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: read
+    needs:
+      - discover
+      - base-image
+    # Only run in upstream repo - avoid unnecessary runs in forks.
+    # TODO: If we add a push trigger later, we might want to prepend "always() &&" to ignore the status of the base job. Needs further investigation.
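The discover job above slices the GitHub releases JSON with jq to build its space-separated branch list. A standalone sketch of that derivation against a canned payload; the release tags below are fabricated stand-ins for the live API response:

```shell
# Sketch: derive the maintained-branches list the way the discover job does,
# using a canned releases payload instead of calling the GitHub API.
RELEASES='[{"tag_name":"v6.6"},{"tag_name":"v6.5"},{"tag_name":"v6.4"},{"tag_name":"v6.3"}]'
DEVELOPMENT_BRANCH=develop
NUM_PAST_RELEASES=3

# Take the newest N release tags, append the development branch, flatten to one line
SUPPORTED_BRANCHES=$( echo "$RELEASES" | jq -r " .[0:${NUM_PAST_RELEASES}] | .[].tag_name, \"${DEVELOPMENT_BRANCH}\" " | tr "\n" " " )
echo "$SUPPORTED_BRANCHES"   # prints: v6.6 v6.5 v6.4 develop (with a trailing space from tr)
```

The slice `.[0:N]` relies on the API returning releases newest-first, which is why the first three entries are the supported ones.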
+ if: ${{ github.repository_owner == 'IQSS' }} + outputs: + supported_tag_matrix: ${{ steps.execute.outputs.supported_tag_matrix }} + rebuilt_images: ${{ steps.execute.outputs.rebuilt_images }} + steps: + - name: Checkout and Setup Maven + uses: IQSS/dataverse/.github/actions/setup-maven@develop + with: + pom-paths: ./pom.xml + + # Note: Accessing, pushing tags etc. to DockerHub will only succeed in upstream and + # on events in context of upstream because secrets. PRs run in context of forks by default! + - name: Log in to the Container registry + uses: docker/login-action@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_TOKEN }} + - name: Set up QEMU for multi-arch builds + uses: docker/setup-qemu-action@v3 + with: + platforms: ${{ env.PLATFORMS }} + + # Execute matrix build for the discovered branches + - name: Execute build matrix script + id: execute + run: > + FORCE_BUILD=$( [[ "${{ inputs.force_build }}" = "true" ]] && echo 1 || echo 0 ) + DRY_RUN=$( [[ "${{ inputs.dry_run }}" = "true" ]] && echo 1 || echo 0 ) + DAMP_RUN=$( [[ "${{ inputs.damp_run }}" = "true" ]] && echo 1 || echo 0 ) + DEVELOPMENT_BRANCH=${{ needs.discover.outputs.develop-branch }} + .github/workflows/scripts/containers/maintain-application.sh ${{ needs.discover.outputs.branches }} + + configbaker-image: + name: "ConfigBaker Image Matrix Build" + runs-on: ubuntu-latest + permissions: + contents: read + packages: read + needs: + - discover + # Only run in upstream repo - avoid unnecessary runs in forks. + # TODO: If we add a push trigger later, we might want to prepend "always() &&" to ignore the status of the base job. Needs further investigation. 
+ if: ${{ github.repository_owner == 'IQSS' }} + outputs: + supported_tag_matrix: ${{ steps.execute.outputs.supported_tag_matrix }} + rebuilt_images: ${{ steps.execute.outputs.rebuilt_images }} + steps: + - name: Checkout and Setup Maven + uses: IQSS/dataverse/.github/actions/setup-maven@develop + with: + pom-paths: ./pom.xml + + # Note: Accessing, pushing tags etc. to DockerHub will only succeed in upstream and + # on events in context of upstream because secrets. PRs run in context of forks by default! + - name: Log in to the Container registry + uses: docker/login-action@v3 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_TOKEN }} + - name: Set up QEMU for multi-arch builds + uses: docker/setup-qemu-action@v3 + with: + platforms: ${{ env.PLATFORMS }} + - name: Setup Trivy binary for vulnerability scanning + uses: aquasecurity/setup-trivy@v0.2.4 + with: + version: v0.63.0 + + # Execute matrix build for the discovered branches + - name: Execute build matrix script + id: execute + run: > + FORCE_BUILD=$( [[ "${{ inputs.force_build }}" = "true" ]] && echo 1 || echo 0 ) + DRY_RUN=$( [[ "${{ inputs.dry_run }}" = "true" ]] && echo 1 || echo 0 ) + DAMP_RUN=$( [[ "${{ inputs.damp_run }}" = "true" ]] && echo 1 || echo 0 ) + DEVELOPMENT_BRANCH=${{ needs.discover.outputs.develop-branch }} + .github/workflows/scripts/containers/maintain-configbaker.sh ${{ needs.discover.outputs.branches }} + + hub-description: + name: Push description to DockerHub + runs-on: ubuntu-latest + permissions: + contents: read + packages: read + needs: + - base-image + - application-image + - configbaker-image + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + ### BASE IMAGE + - name: Render README for base image + if: toJSON(needs.base-image.outputs.rebuilt_images) != '[]' + run: | + TAGS_JSON='${{ needs.base-image.outputs.supported_tag_matrix }}' + echo "$TAGS_JSON" | jq -r 'keys | sort | reverse | .[]' | + while IFS= read -r branch; do + 
echo \ + "- \`$( echo "$TAGS_JSON" | jq --arg v "$branch" -r '.[$v] | join("`, `")' )\`" \ + "([Dockerfile](https://github.com/IQSS/dataverse/blob/${branch}/modules/container-base/src/main/docker/Dockerfile)," \ + "[Patches](https://github.com/IQSS/dataverse/blob/develop/modules/container-base/src/backports/${branch}))" \ + | tee -a "${GITHUB_WORKSPACE}/tags-base.md" + done + sed -i -e "/<\!-- TAG BLOCK HERE -->/r ${GITHUB_WORKSPACE}/tags-base.md" "./modules/container-base/README.md" + cat "./modules/container-base/README.md" + - name: Push description to DockerHub for base image + if: ${{ ! inputs.dry_run && ! inputs.damp_run && toJSON(needs.base-image.outputs.rebuilt_images) != '[]' }} + uses: peter-evans/dockerhub-description@v4 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_TOKEN }} + repository: gdcc/base + short-description: "Dataverse Base Container image providing Payara application server and optimized configuration" + readme-filepath: ./modules/container-base/README.md + + ### APPLICATION IMAGE + - name: Render README for application image + if: toJSON(needs.application-image.outputs.rebuilt_images) != '[]' + run: | + TAGS_JSON='${{ needs.application-image.outputs.supported_tag_matrix }}' + echo "$TAGS_JSON" | jq -r 'keys | sort | reverse | .[]' | + while IFS= read -r branch; do + echo \ + "- \`$( echo "$TAGS_JSON" | jq --arg v "$branch" -r '.[$v] | join("`, `")' )\`" \ + "([Dockerfile](https://github.com/IQSS/dataverse/blob/${branch}/src/main/docker/Dockerfile)," \ + "[Patches](https://github.com/IQSS/dataverse/blob/develop/src/backports/${branch}))" \ + | tee -a "${GITHUB_WORKSPACE}/tags-app.md" + done + sed -i -e "/<\!-- TAG BLOCK HERE -->/r ${GITHUB_WORKSPACE}/tags-app.md" "./src/main/docker/README.md" + cat "./src/main/docker/README.md" + - name: Push description to DockerHub for application image + if: ${{ ! inputs.dry_run && ! 
inputs.damp_run && toJSON(needs.application-image.outputs.rebuilt_images) != '[]' }} + uses: peter-evans/dockerhub-description@v4 + with: + username: ${{ secrets.DOCKERHUB_USERNAME }} + password: ${{ secrets.DOCKERHUB_TOKEN }} + repository: gdcc/dataverse + short-description: "Dataverse Application Container Image providing the executable" + readme-filepath: ./src/main/docker/README.md + + ### CONFIGBAKER IMAGE + - name: Render README for config baker image + if: toJSON(needs.configbaker-image.outputs.rebuilt_images) != '[]' + run: | + TAGS_JSON='${{ needs.configbaker-image.outputs.supported_tag_matrix }}' + echo "$TAGS_JSON" | jq -r 'keys | sort | reverse | .[]' | + while IFS= read -r branch; do + echo \ + "- \`$( echo "$TAGS_JSON" | jq --arg v "$branch" -r '.[$v] | join("`, `")' )\`" \ + "([Dockerfile](https://github.com/IQSS/dataverse/blob/${branch}/modules/container-configbaker/Dockerfile)," \ + "[Patches](https://github.com/IQSS/dataverse/blob/develop/modules/container-configbaker/backports/${branch}))" \ + | tee -a "${GITHUB_WORKSPACE}/tags-config.md" + done + sed -i -e "/<\!-- TAG BLOCK HERE -->/r ${GITHUB_WORKSPACE}/tags-config.md" "./modules/container-configbaker/README.md" + cat "./modules/container-configbaker/README.md" + - name: Push description to DockerHub for config baker image + if: ${{ ! inputs.dry_run && ! 
inputs.damp_run && toJSON(needs.configbaker-image.outputs.rebuilt_images) != '[]' }}
+        uses: peter-evans/dockerhub-description@v4
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+          repository: gdcc/configbaker
+          short-description: "Dataverse Config Baker Container Image providing setup tooling and more"
+          readme-filepath: ./modules/container-configbaker/README.md
diff --git a/.github/workflows/copy_labels.yml b/.github/workflows/copy_labels.yml
new file mode 100644
index 00000000000..83824b0125a
--- /dev/null
+++ b/.github/workflows/copy_labels.yml
@@ -0,0 +1,19 @@
+name: Copy labels from issue to pull request
+
+on:
+  pull_request:
+    types: [opened]
+
+jobs:
+  copy-labels:
+    # Avoid being triggered by forks
+    if: "! github.event.pull_request.head.repo.fork && github.actor != 'dependabot[bot]'"
+    permissions:
+      pull-requests: write
+    runs-on: ubuntu-latest
+    name: Copy labels from linked issues
+    steps:
+      - name: copy-labels
+        uses: michalvankodev/copy-issue-labels@v1.3.0
+        with:
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
diff --git a/.github/workflows/cypress_ui.yml.future b/.github/workflows/cypress_ui.yml.future
new file mode 100644
index 00000000000..0823233fdeb
--- /dev/null
+++ b/.github/workflows/cypress_ui.yml.future
@@ -0,0 +1,42 @@
+###############################################################################
+#
+# THIS IS AN OLD TRAVIS-CI.ORG JOB FILE
+# To be used with GitHub Actions, it would be necessary to refactor it.
+# In addition, it needs to be rewritten to use our modern containers.
+# Keeping it as a future example, as it has been before.

+# See also #5846 +# +############################################################################### + +services: + - docker + +jobs: + include: + # Execute Cypress for UI testing + # see https://docs.cypress.io/guides/guides/continuous-integration.html + - stage: test + language: node_js + node_js: + - "10" + addons: + apt: + packages: + # Ubuntu 16+ does not install this dependency by default, so we need to install it ourselves + - libgconf-2-4 + cache: + # Caches $HOME/.npm when npm ci is default script command + # Caches node_modules in all other cases + npm: true + directories: + # we also need to cache folder with Cypress binary + - ~/.cache + before_install: + - cd tests + install: + - npm ci + before_script: + - ./run_docker_dataverse.sh + script: + # --key needs to be injected using CYPRESS_RECORD_KEY to keep it secret + - $(npm bin)/cypress run --record diff --git a/.github/workflows/deploy_beta_testing.yml b/.github/workflows/deploy_beta_testing.yml new file mode 100644 index 00000000000..7a236a316fb --- /dev/null +++ b/.github/workflows/deploy_beta_testing.yml @@ -0,0 +1,90 @@ +name: 'Deploy to Beta Testing' + +on: + push: + branches: + - develop + +concurrency: + group: deploy-beta-testing + cancel-in-progress: false + +jobs: + build: + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v4 + + - uses: actions/setup-java@v4 + with: + distribution: 'zulu' + java-version: '17' + + - name: Enable API Session Auth feature flag + working-directory: src/main/resources/META-INF + run: echo -e "dataverse.feature.api-session-auth=true" >> microprofile-config.properties + + - name: Set build number + run: scripts/installer/custom-build-number + + - name: Build application war + run: mvn package + + - name: Get war file name + working-directory: target + run: echo "war_file=$(ls *.war | head -1)">> $GITHUB_ENV + + - name: Upload war artifact + uses: actions/upload-artifact@v4 + with: + name: built-app + path: ./target/${{ env.war_file }} + + 
deploy-to-payara: + needs: build + if: ${{ github.repository_owner == 'IQSS' }} + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v4 + + - name: Download war artifact + uses: actions/download-artifact@v5 + with: + name: built-app + path: ./ + + - name: Get war file name + run: echo "war_file=$(ls *.war | head -1)">> $GITHUB_ENV + + - name: Copy war file to remote instance + uses: appleboy/scp-action@master + with: + host: ${{ secrets.BETA_PAYARA_INSTANCE_HOST }} + username: ${{ secrets.BETA_PAYARA_INSTANCE_USERNAME }} + key: ${{ secrets.BETA_PAYARA_INSTANCE_SSH_PRIVATE_KEY }} + source: './${{ env.war_file }}' + target: '/home/${{ secrets.BETA_PAYARA_INSTANCE_USERNAME }}' + overwrite: true + + - name: Execute payara war deployment remotely + uses: appleboy/ssh-action@v1.2.2 + env: + INPUT_WAR_FILE: ${{ env.war_file }} + with: + host: ${{ secrets.BETA_PAYARA_INSTANCE_HOST }} + username: ${{ secrets.BETA_PAYARA_INSTANCE_USERNAME }} + key: ${{ secrets.BETA_PAYARA_INSTANCE_SSH_PRIVATE_KEY }} + envs: INPUT_WAR_FILE + script: | + APPLICATION_NAME=dataverse-backend + ASADMIN='/usr/local/payara6/bin/asadmin --user admin' + $ASADMIN undeploy $APPLICATION_NAME + #$ASADMIN stop-domain + #rm -rf /usr/local/payara6/glassfish/domains/domain1/generated + #rm -rf /usr/local/payara6/glassfish/domains/domain1/osgi-cache + #$ASADMIN start-domain + $ASADMIN deploy --name $APPLICATION_NAME $INPUT_WAR_FILE + #$ASADMIN stop-domain + #$ASADMIN start-domain diff --git a/.github/workflows/guides_build_sphinx.yml b/.github/workflows/guides_build_sphinx.yml new file mode 100644 index 00000000000..a3b5882626c --- /dev/null +++ b/.github/workflows/guides_build_sphinx.yml @@ -0,0 +1,28 @@ +name: "Guides Build Status" +on: + pull_request: + paths: + - "doc/sphinx-guides/**/*.rst" + - "doc/sphinx-guides/**/requirements.txt" + - "doc/sphinx-guides/**/conf.py" + +jobs: + docs: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - id: lookup + run: | + echo 
"sphinx_version=$(grep Sphinx== ./doc/sphinx-guides/requirements.txt | tr -s "=" | cut -f 2 -d=)" | tee -a "${GITHUB_OUTPUT}" + - run: | + sudo apt-get update -q + sudo apt-get install -qqy --no-install-recommends graphviz + - uses: sphinx-notes/pages@v3 + with: + documentation_path: ./doc/sphinx-guides/source + requirements_path: ./doc/sphinx-guides/requirements.txt + sphinx_version: ${{ steps.lookup.outputs.sphinx_version }} + sphinx_build_options: "-W" + cache: false + publish: false + diff --git a/.github/workflows/maven_cache_management.yml b/.github/workflows/maven_cache_management.yml new file mode 100644 index 00000000000..fedf63b7c54 --- /dev/null +++ b/.github/workflows/maven_cache_management.yml @@ -0,0 +1,101 @@ +name: Maven Cache Management + +on: + # Every push to develop should trigger cache rejuvenation (dependencies might have changed) + push: + branches: + - develop + # According to https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy + # all caches are deleted after 7 days of no access. Make sure we rejuvenate every 7 days to keep it available. 
+ schedule: + - cron: '23 2 * * 0' # Run for 'develop' every Sunday at 02:23 UTC (3:23 CET, 21:23 ET) + # Enable manual cache management + workflow_dispatch: + # Delete branch caches once a PR is merged + pull_request: + types: + - closed + +env: + COMMON_CACHE_KEY: "dataverse-maven-cache" + COMMON_CACHE_PATH: "~/.m2/repository" + +jobs: + seed: + name: Drop and Re-Seed Local Repository + runs-on: ubuntu-latest + if: ${{ github.event_name != 'pull_request' }} + permissions: + # Write permission needed to delete caches + # See also: https://docs.github.com/en/rest/actions/cache?apiVersion=2022-11-28#delete-a-github-actions-cache-for-a-repository-using-a-cache-id + actions: write + contents: read + steps: + - name: Checkout repository + uses: actions/checkout@v4 + - name: Determine Java version from Parent POM + run: echo "JAVA_VERSION=$(grep '' modules/dataverse-parent/pom.xml | cut -f2 -d'>' | cut -f1 -d'<')" >> ${GITHUB_ENV} + - name: Set up JDK ${{ env.JAVA_VERSION }} + uses: actions/setup-java@v4 + with: + java-version: ${{ env.JAVA_VERSION }} + distribution: temurin + - name: Seed common cache + run: | + mvn -B -f modules/dataverse-parent dependency:go-offline dependency:resolve-plugins + # This non-obvious order is due to the fact that the download via Maven above will take a very long time (7-8 min). + # Jobs should not be left without a cache. Deleting and saving in one go leaves only a small chance for a cache miss. + - name: Drop common cache + run: | + gh extension install actions/gh-actions-cache + echo "🛒 Fetching list of cache keys" + cacheKeys=$(gh actions-cache list -R ${{ github.repository }} -B develop | cut -f 1 ) + + ## Setting this to not fail the workflow while deleting cache keys. + set +e + echo "đŸ—‘ī¸ Deleting caches..." 
+ for cacheKey in $cacheKeys + do + gh actions-cache delete $cacheKey -R ${{ github.repository }} -B develop --confirm + done + echo "✅ Done" + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - name: Save the common cache + uses: actions/cache@v4 + with: + path: ${{ env.COMMON_CACHE_PATH }} + key: ${{ env.COMMON_CACHE_KEY }} + enableCrossOsArchive: true + + # Let's delete feature branch caches once their PR is merged - we only have 10 GB of space before eviction kicks in + deplete: + name: Deplete feature branch caches + runs-on: ubuntu-latest + if: ${{ github.event_name == 'pull_request' }} + permissions: + # `actions:write` permission is required to delete caches + # See also: https://docs.github.com/en/rest/actions/cache?apiVersion=2022-11-28#delete-a-github-actions-cache-for-a-repository-using-a-cache-id + actions: write + contents: read + steps: + - name: Checkout repository + uses: actions/checkout@v4 + - name: Cleanup caches + run: | + gh extension install actions/gh-actions-cache + + BRANCH=refs/pull/${{ github.event.pull_request.number }}/merge + echo "🛒 Fetching list of cache keys" + cacheKeysForPR=$(gh actions-cache list -R ${{ github.repository }} -B $BRANCH | cut -f 1 ) + + ## Setting this to not fail the workflow while deleting cache keys. + set +e + echo "đŸ—‘ī¸ Deleting caches..." + for cacheKey in $cacheKeysForPR + do + gh actions-cache delete $cacheKey -R ${{ github.repository }} -B $BRANCH --confirm + done + echo "✅ Done" + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/maven_unit_test.yml b/.github/workflows/maven_unit_test.yml new file mode 100644 index 00000000000..f0cf461d8e6 --- /dev/null +++ b/.github/workflows/maven_unit_test.yml @@ -0,0 +1,180 @@ +name: Maven Tests + +on: + push: + # Only run for development and feature branches. Don't waste CPU cycles testing + # master when the PR to update it from develop already ran these tests. 
+ branches: + - '*' + - '!master' + paths: + - "**.java" + - "**.sql" + - "pom.xml" + - "modules/**/pom.xml" + - "!modules/container-base/**" + - "!modules/dataverse-spi/**" + pull_request: + paths: + - "**.java" + - "**.sql" + - "pom.xml" + - "modules/**/pom.xml" + - "!modules/container-base/**" + - "!modules/dataverse-spi/**" + +jobs: + unittest: + name: (${{ matrix.status}} / JDK ${{ matrix.jdk }}) Unit Tests + strategy: + fail-fast: false + matrix: + jdk: [ '17' ] + experimental: [false] + status: ["Stable"] + continue-on-error: ${{ matrix.experimental }} + runs-on: ubuntu-latest + steps: + # TODO: As part of #10618 change to setup-maven custom action + # Basic setup chores + - uses: actions/checkout@v4 + - name: Set up JDK ${{ matrix.jdk }} + uses: actions/setup-java@v4 + with: + java-version: ${{ matrix.jdk }} + distribution: temurin + cache: maven + + # The reason why we use "install" here is that we want the submodules to be available in the next step. + # Also, we can cache them this way for jobs triggered by this one. We need to skip ITs here, as we run + # them in the next job - but install usually runs through verify phase. + - name: Build with Maven and run unit tests + run: > + mvn -B -f modules/dataverse-parent + -Dtarget.java.version=${{ matrix.jdk }} + -DcompilerArgument=-Xlint:unchecked -P all-unit-tests + -DskipIntegrationTests + -pl edu.harvard.iq:dataverse -am + install + + # We don't want to cache the WAR file, so delete it + - run: rm -rf ~/.m2/repository/edu/harvard/iq/dataverse + + # Upload the built war file. For download, it will be wrapped in a ZIP by GitHub. 
+ # See also https://github.com/actions/upload-artifact#zipped-artifact-downloads + - uses: actions/upload-artifact@v4 + with: + name: dataverse-java${{ matrix.jdk }}.war + path: target/dataverse*.war + retention-days: 7 + + # Store the build for the next step (integration test) to avoid recompilation and to transfer coverage reports + - run: | + tar -cvf java-builddir.tar target + tar -cvf java-m2-selection.tar ~/.m2/repository/io/gdcc/dataverse-* + - uses: actions/upload-artifact@v4 + with: + name: java-artifacts + path: | + java-builddir.tar + java-m2-selection.tar + retention-days: 3 + + integration-test: + runs-on: ubuntu-latest + needs: unittest + name: (${{ matrix.status}} / JDK ${{ matrix.jdk }}) Integration Tests + strategy: + fail-fast: false + matrix: + jdk: [ '17' ] + experimental: [ false ] + status: [ "Stable" ] + # + # JDK 17 builds disabled due to non-essential fails marking CI jobs as completely failed within + # Github Projects, PR lists etc. This was consensus on Slack #dv-tech. See issue #8094 + # (This is a limitation of how Github is currently handling these things.) 
+ # + #include: + # - jdk: '17' + # experimental: true + # status: "Experimental" + continue-on-error: ${{ matrix.experimental }} + steps: + # TODO: As part of #10618 change to setup-maven custom action + # Basic setup chores + - uses: actions/checkout@v4 + - name: Set up JDK ${{ matrix.jdk }} + uses: actions/setup-java@v4 + with: + java-version: ${{ matrix.jdk }} + distribution: temurin + cache: maven + + # Get the build output from the unit test job + - uses: actions/download-artifact@v5 + with: + name: java-artifacts + - run: | + tar -xvf java-builddir.tar + tar -xvf java-m2-selection.tar -C / + + # Run integration tests (but not unit tests again) + - run: mvn -DskipUnitTests -Dtarget.java.version=${{ matrix.jdk }} verify + + # Wrap up and send to coverage job + - run: tar -cvf java-reportdir.tar target/site + - uses: actions/upload-artifact@v4 + with: + name: java-reportdir + path: java-reportdir.tar + retention-days: 3 + + coverage-report: + runs-on: ubuntu-latest + needs: integration-test + name: Coverage Report Submission + steps: + # TODO: As part of #10618 change to setup-maven custom action + # Basic setup chores + - uses: actions/checkout@v4 + - uses: actions/setup-java@v4 + with: + java-version: '17' + distribution: temurin + cache: maven + + # Get the build output from the integration test job + - uses: actions/download-artifact@v5 + with: + name: java-reportdir + - run: tar -xvf java-reportdir.tar + + # Deposit Code Coverage + - name: Deposit Code Coverage + env: + CI_NAME: github + COVERALLS_SECRET: ${{ secrets.GITHUB_TOKEN }} + # The coverage commit is sometimes flaky. Don't bail out just because this optional step failed. + continue-on-error: true + run: > + mvn -B + -DrepoToken=${COVERALLS_SECRET} -DpullRequest=${{ github.event.number }} + jacoco:report coveralls:report + + # NOTE: this may be extended with adding a report to the build output, leave a comment, send to Sonarcloud, ... 
+
+  # TODO: Add a filter step here that avoids calling the app image release workflow if there are changes to the base image.
+  # Use https://github.com/dorny/paths-filter to solve this. Will require an additional job or adding to the integration-test job.
+  # This way we ensure that we're not running the app image flow with a non-matching base image.
+  # To become a part of #10618.
+
+  push-app-img:
+    name: Publish App Image
+    permissions:
+      contents: read
+      packages: write
+      pull-requests: write
+    needs: integration-test
+    uses: ./.github/workflows/container_app_push.yml
+    secrets: inherit
diff --git a/.github/workflows/pr_comment_commands.yml b/.github/workflows/pr_comment_commands.yml
new file mode 100644
index 00000000000..06b11b1ac5b
--- /dev/null
+++ b/.github/workflows/pr_comment_commands.yml
@@ -0,0 +1,20 @@
+name: PR Comment Commands
+on:
+  issue_comment:
+    types: [created]
+jobs:
+  dispatch:
+    # Avoid being triggered by forks in upstream
+    if: ${{ github.repository_owner == 'IQSS' }}
+    runs-on: ubuntu-latest
+    steps:
+      - name: Dispatch
+        uses: peter-evans/slash-command-dispatch@v4
+        with:
+          # This token belongs to @dataversebot and has sufficient scope.
+ token: ${{ secrets.GHCR_TOKEN }} + commands: | + push-image + repository: IQSS/dataverse + # Commenter must have at least write permission to repo to trigger dispatch + permission: write diff --git a/.github/workflows/reviewdog_checkstyle.yml b/.github/workflows/reviewdog_checkstyle.yml new file mode 100644 index 00000000000..804b04f696a --- /dev/null +++ b/.github/workflows/reviewdog_checkstyle.yml @@ -0,0 +1,21 @@ +name: Maven CheckStyle Task +on: + pull_request: + paths: + - "**.java" + +jobs: + checkstyle_job: + runs-on: ubuntu-latest + name: Checkstyle job + steps: + - name: Checkout + uses: actions/checkout@v4 + - name: Run check style + uses: nikitasavinov/checkstyle-action@master + with: + fail_on_error: true + reporter: github-pr-review + checkstyle_config: checkstyle.xml + github_token: ${{ secrets.GITHUB_TOKEN }} + diff --git a/.github/workflows/scripts/containers/maintain-application.sh b/.github/workflows/scripts/containers/maintain-application.sh new file mode 100755 index 00000000000..b68c2a53d96 --- /dev/null +++ b/.github/workflows/scripts/containers/maintain-application.sh @@ -0,0 +1,202 @@ +#!/bin/bash + +# A matrix-like job to maintain a number of releases as well as the latest snap of Dataverse. 
+
+# PREREQUISITES:
+# - You have Java, Maven, QEMU and Docker all set up and ready to go
+# - You obviously checked out the develop branch, otherwise you'd not be executing this script
+# - You added all the branch names you want to run maintenance for as arguments
+# Optional, but recommended:
+# - You added a DEVELOPMENT_BRANCH env var to your runner/job env with the name of the development branch
+# - You added a FORCE_BUILD=0|1 env var to indicate if the base image build should be forced
+# - You added a PLATFORMS env var with all the target platforms you want to build for
+# Optional:
+# - Use DRY_RUN=1 env var to skip actually building, but see how the tag lookups play out
+# - Use DAMP_RUN=1 env var to skip pushing images, but build them
+
+# NOTE:
+# This script is a consolidation of multiple GitHub Actions steps into a single script.
+# The reason to put all of this in here is the complexity of the GitHub Action and the limitation of
+# matrix support in GitHub Actions, where outputs cannot be aggregated or otherwise used further.
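One detail worth noting: these maintenance scripts default `GITHUB_OUTPUT` to `/proc/self/fd/1` when it is unset, so `tee -a "$GITHUB_OUTPUT"` degrades to writing to the script's own stdout when run outside a GitHub runner. A minimal demonstration of that fallback:

```shell
#!/bin/bash
set -euo pipefail

# Simulate running outside a runner, where GITHUB_OUTPUT is not set
unset GITHUB_OUTPUT
# Fall back to the process's own stdout so tee -a still has a target
GITHUB_OUTPUT=${GITHUB_OUTPUT:-"/proc/self/fd/1"}

# NOTE: tee writes its input both to its stdout and to /proc/self/fd/1, which
# are the same stream here, so the line appears twice. That duplication is
# exactly what the script's own comment describes as "ok for testing".
echo "branches=v6.6 v6.5 develop" | tee -a "${GITHUB_OUTPUT}"
```

Inside a runner, `GITHUB_OUTPUT` points at a real file, and the same `tee -a` both records the step output and echoes it into the job log.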
+
+set -euo pipefail
+
+# Get all the inputs
+# If not within a runner, just print to stdout (duplicating the output in case of tee usage, but that's ok for testing)
+GITHUB_OUTPUT=${GITHUB_OUTPUT:-"/proc/self/fd/1"}
+GITHUB_ENV=${GITHUB_ENV:-"/proc/self/fd/1"}
+GITHUB_WORKSPACE=${GITHUB_WORKSPACE:-"$(pwd)"}
+GITHUB_SERVER_URL=${GITHUB_SERVER_URL:-"https://github.com"}
+GITHUB_REPOSITORY=${GITHUB_REPOSITORY:-"IQSS/dataverse"}
+
+MAINTENANCE_WORKSPACE="${GITHUB_WORKSPACE}/maintenance-job"
+
+DEVELOPMENT_BRANCH="${DEVELOPMENT_BRANCH:-"develop"}"
+FORCE_BUILD="${FORCE_BUILD:-"0"}"
+DRY_RUN="${DRY_RUN:-"0"}"
+DAMP_RUN="${DAMP_RUN:-"0"}"
+PLATFORMS="${PLATFORMS:-"linux/amd64,linux/arm64"}"
+
+# Setup and validation
+if [[ -z "$*" ]]; then
+    >&2 echo "You must give a list of branch names as arguments"
+    exit 1;
+fi
+
+if (( DRY_RUN + DAMP_RUN > 1 )); then
+    >&2 echo "You must either use DRY_RUN=1 or DAMP_RUN=1, but not both"
+    exit 1;
+fi
+
+source "$( dirname "$0" )/utils.sh"
+
+# Delete old stuff if present
+rm -rf "$MAINTENANCE_WORKSPACE"
+mkdir -p "$MAINTENANCE_WORKSPACE"
+
+# Store the image tags we maintain in this array (same order as branches array!)
+# This list will be used to build the support matrix within the Docker Hub image description
+SUPPORTED_ROLLING_TAGS=()
+# Store the tags of application images we are actually rebuilding
+# Takes the form "branch-name=app-image-ref"
+REBUILT_APP_IMAGES=()
+
+for BRANCH in "$@"; do
+    echo "::group::Running maintenance for $BRANCH"
+
+    # 0. Determine if this is a development branch and the most current release
+    IS_DEV=0
+    if [[ "$BRANCH" = "$DEVELOPMENT_BRANCH" ]]; then
+        IS_DEV=1
+    fi
+    IS_CURRENT_RELEASE=0
+    if [[ "$BRANCH" = $( curl -f -sS "https://api.github.com/repos/$GITHUB_REPOSITORY/releases" | jq -r '.[0].tag_name' ) ]]; then
+        IS_CURRENT_RELEASE=1
+    fi
+
+    # 1. 
Let's get the maintained sources
+    git clone -c advice.detachedHead=false --depth 1 --branch "$BRANCH" "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}" "$MAINTENANCE_WORKSPACE/$BRANCH"
+    # Switch context
+    cd "$MAINTENANCE_WORKSPACE/$BRANCH"
+
+    # 2. Now let's apply the patches (we have them checked out in $GITHUB_WORKSPACE, not necessarily in this local checkout)
+    echo "Checking for patches..."
+    if [[ -d ${GITHUB_WORKSPACE}/src/backports/$BRANCH ]]; then
+        echo "Applying patches now."
+        find "${GITHUB_WORKSPACE}/src/backports/$BRANCH" -type f -name '*.patch' -print0 | xargs -0 -n1 patch -p1 -l -s -i
+    fi
+
+    # 3a. Determine the base image ref (/:)
+    BASE_IMAGE_REF=""
+    # For the dev branch we want the full flexi stack tag, to detect stack upgrades requiring a new build
+    if (( IS_DEV )); then
+        BASE_IMAGE_REF=$( mvn initialize help:evaluate -Pct -f . -Dexpression=base.image -q -DforceStdout )
+    else
+        # First, get the rolling tag of the base image
+        ROLLING_BASE_REF=$( mvn initialize help:evaluate -Pct -f . -Dexpression=base.image -Dbase.image.tag.suffix="" -q -DforceStdout )
+        # Now, we want to build any release branch application images on top of a _fixed_ tag, so let's fetch the newest fixed tag
+        CURRENT_REV=$( current_revision "$ROLLING_BASE_REF" )
+        BASE_IMAGE_REF="$ROLLING_BASE_REF-r$CURRENT_REV"
+    fi
+    echo "Determined BASE_IMAGE_REF=$BASE_IMAGE_REF from Maven"
+
+    # 3b. Determine the app image ref (/:)
+    APP_IMAGE_REF=""
+    if (( IS_DEV )); then
+        # Results in the rolling tag for the dev branch
+        APP_IMAGE_REF=$( mvn initialize help:evaluate -Pct -f . -Dexpression=app.image -q -DforceStdout )
+    else
+        # Results in the rolling tag for the release branch (the fixed tag will be determined from this rolling tag)
+        # shellcheck disable=SC2016
+        APP_IMAGE_REF=$( mvn initialize help:evaluate -Pct -f . 
-Dexpression=app.image -Dapp.image.tag='${app.image.version}-${base.image.flavor}' -q -DforceStdout ) + fi + echo "Determined APP_IMAGE_REF=$APP_IMAGE_REF from Maven" + + # 4. Check for Base image updates + NEWER_BASE_IMAGE=0 + if check_newer_parent "$BASE_IMAGE_REF" "$APP_IMAGE_REF"; then + NEWER_BASE_IMAGE=1 + fi + + # 5. Get current immutable revision tag if not on the dev branch + REV=$( current_revision "$APP_IMAGE_REF" ) + CURRENT_REV_TAG="${APP_IMAGE_REF#*:}-r$REV" + NEXT_REV_TAG="${APP_IMAGE_REF#*:}-r$(( REV + 1 ))" + + # 6. Let's put together what tags we want added to this build run + TAG_OPTIONS="" + if ! (( IS_DEV )); then + TAG_OPTIONS="-Dapp.image=$APP_IMAGE_REF -Ddocker.tags.revision=$NEXT_REV_TAG" + # In case of the current release, add the "latest" tag as well. + if (( IS_CURRENT_RELEASE )); then + TAG_OPTIONS="$TAG_OPTIONS -Ddocker.tags.latest=latest" + fi + else + # shellcheck disable=SC2016 + UPCOMING_TAG=$( mvn initialize help:evaluate -Pct -f . -Dexpression=app.image.tag -Dapp.image.tag='${app.image.version}-${base.image.flavor}' -q -DforceStdout ) + TAG_OPTIONS="-Ddocker.tags.upcoming=$UPCOMING_TAG" + + # For the dev branch we only have rolling tags and can add them now already + SUPPORTED_ROLLING_TAGS+=("[\"unstable\", \"$UPCOMING_TAG\"]") + fi + echo "Determined these additional Maven tag options: $TAG_OPTIONS" + + # 8. Let's build the base image if necessary + NEWER_IMAGE=0 + if (( NEWER_BASE_IMAGE + FORCE_BUILD > 0 )); then + if ! (( DRY_RUN )); then + # Build the application image, but skip the configbaker image (that's a different job)! + # shellcheck disable=SC2046 + mvn -Pct -f . 
deploy -Ddocker.noCache -Ddocker.platforms="${PLATFORMS}" \
+                -Dconf.skipBuild -Dbase.image="${BASE_IMAGE_REF}" \
+                -Ddocker.imagePropertyConfiguration=override $TAG_OPTIONS \
+                $( if (( DAMP_RUN )); then echo "-Ddocker.skip.push -Ddocker.skip.tag"; fi )
+        else
+            echo "Skipping Maven build as requested by DRY_RUN=1"
+        fi
+        NEWER_IMAGE=1
+        # Save the information about the immutable or rolling tag we just built
+        if ! (( IS_DEV )); then
+            REBUILT_APP_IMAGES+=("$BRANCH=${APP_IMAGE_REF%:*}:$NEXT_REV_TAG")
+        else
+            REBUILT_APP_IMAGES+=("$BRANCH=$APP_IMAGE_REF")
+        fi
+    else
+        echo "No rebuild necessary, we're done here."
+    fi
+
+    # 9. Add list of rolling and immutable tags for release builds
+    if ! (( IS_DEV )); then
+        RELEASE_TAGS_LIST="["
+        if (( IS_CURRENT_RELEASE )); then
+            RELEASE_TAGS_LIST+="\"latest\", "
+        fi
+        RELEASE_TAGS_LIST+="\"${APP_IMAGE_REF#*:}\", "
+        if (( NEWER_IMAGE )); then
+            RELEASE_TAGS_LIST+="\"$NEXT_REV_TAG\"]"
+        else
+            RELEASE_TAGS_LIST+="\"$CURRENT_REV_TAG\"]"
+        fi
+        SUPPORTED_ROLLING_TAGS+=("${RELEASE_TAGS_LIST}")
+    fi
+
+    echo "::endgroup::"
+done
+
+# Build the output listing which images have actually been rebuilt, as JSON
+REBUILT_IMAGES="["
+for IMAGE in "${REBUILT_APP_IMAGES[@]}"; do
+    REBUILT_IMAGES+=" \"$IMAGE\" "
+done
+REBUILT_IMAGES+="]"
+echo "rebuilt_images=${REBUILT_IMAGES// /, }" | tee -a "${GITHUB_OUTPUT}"
+
+# Build the supported rolling tags matrix as JSON
+SUPPORTED_TAGS="{"
+for (( i=0; i < ${#SUPPORTED_ROLLING_TAGS[@]} ; i++ )); do
+    j=$((i+1))
+    SUPPORTED_TAGS+="\"${!j}\": ${SUPPORTED_ROLLING_TAGS[$i]}"
+    (( i < ${#SUPPORTED_ROLLING_TAGS[@]}-1 )) && SUPPORTED_TAGS+=", "
+done
+SUPPORTED_TAGS+="}"
+echo "supported_tag_matrix=$SUPPORTED_TAGS" | tee -a "$GITHUB_OUTPUT"
diff --git a/.github/workflows/scripts/containers/maintain-base.sh b/.github/workflows/scripts/containers/maintain-base.sh
new file mode 100755
index 00000000000..5b9ae738b98
--- /dev/null
+++ b/.github/workflows/scripts/containers/maintain-base.sh
@@ -0,0 +1,196 @@
+#!/bin/bash + +# A matrix-like job to maintain a number of releases as well as the latest snapshot of Dataverse. + +# PREREQUISITES: +# - You have Java, Maven, QEMU and Docker all set up and ready to go +# - You obviously checked out the develop branch, otherwise you'd not be executing this script +# - You added all the branch names you want to run maintenance for as arguments +# Optional, but recommended: +# - You added a DEVELOPMENT_BRANCH env var to your runner/job env with the name of the development branch +# - You added a FORCE_BUILD=0|1 env var to indicate if the base image build should be forced +# - You added a PLATFORMS env var with all the target platforms you want to build for +# Optional: +# - Use DRY_RUN=1 env var to skip actually building, but see how the tag lookups play out +# - Use DAMP_RUN=1 env var to skip pushing images, but build them + +# NOTE: +# This script consolidates a series of GitHub Actions steps into a single script. +# The reason to put all of this in here is the complexity of the GitHub Actions workflow and the limitations of +# matrix support in GitHub Actions, where outputs cannot be aggregated or otherwise used further.
+ +set -euo pipefail + +# Get all the inputs +# If not within a runner, just print to stdout (duplicating the output in case of tee usage, but that's ok for testing) +GITHUB_OUTPUT=${GITHUB_OUTPUT:-"/proc/self/fd/1"} +GITHUB_ENV=${GITHUB_ENV:-"/proc/self/fd/1"} +GITHUB_WORKSPACE=${GITHUB_WORKSPACE:-"$(pwd)"} +GITHUB_SERVER_URL=${GITHUB_SERVER_URL:-"https://github.com"} +GITHUB_REPOSITORY=${GITHUB_REPOSITORY:-"IQSS/dataverse"} + +MAINTENANCE_WORKSPACE="${GITHUB_WORKSPACE}/maintenance-job" + +DEVELOPMENT_BRANCH="${DEVELOPMENT_BRANCH:-"develop"}" +FORCE_BUILD="${FORCE_BUILD:-"0"}" +DRY_RUN="${DRY_RUN:-"0"}" +DAMP_RUN="${DAMP_RUN:-"0"}" +PLATFORMS="${PLATFORMS:-"linux/amd64,linux/arm64"}" + +# Setup and validation +if [[ -z "$*" ]]; then + >&2 echo "You must give a list of branch names as arguments" + exit 1; +fi + +if (( DRY_RUN + DAMP_RUN > 1 )); then + >&2 echo "You must either use DRY_RUN=1 or DAMP_RUN=1, but not both" + exit 1; +fi + +source "$( dirname "$0" )/utils.sh" + +# Delete old stuff if present +rm -rf "$MAINTENANCE_WORKSPACE" +mkdir -p "$MAINTENANCE_WORKSPACE" + +# Store the image tags we maintain in this array (same order as branches array!) +# This list will be used to build the support matrix within the Docker Hub image description +SUPPORTED_ROLLING_TAGS=() +# Store the tags of base images we are actually rebuilding to base new app images upon +# Takes the form "branch-name=base-image-ref" +REBUILT_BASE_IMAGES=() + +for BRANCH in "$@"; do + echo "::group::Running maintenance for $BRANCH" + + # 0. Determine if this is a development branch and the most current release + IS_DEV=0 + if [[ "$BRANCH" = "$DEVELOPMENT_BRANCH" ]]; then + IS_DEV=1 + fi + IS_CURRENT_RELEASE=0 + if [[ "$BRANCH" = $( curl -f -sS "https://api.github.com/repos/$GITHUB_REPOSITORY/releases" | jq -r '.[0].tag_name' ) ]]; then + IS_CURRENT_RELEASE=1 + fi + + # 1.
Let's get the maintained sources + git clone -c advice.detachedHead=false --depth 1 --branch "$BRANCH" "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}" "$MAINTENANCE_WORKSPACE/$BRANCH" + # Switch context + cd "$MAINTENANCE_WORKSPACE/$BRANCH" + + # 2. Now let's apply the patches (we have them checked out in $GITHUB_WORKSPACE, not necessarily in this local checkout) + echo "Checking for patches..." + if [[ -d ${GITHUB_WORKSPACE}/modules/container-base/src/backports/$BRANCH ]]; then + echo "Applying patches now." + find "${GITHUB_WORKSPACE}/modules/container-base/src/backports/$BRANCH" -type f -name '*.patch' -print0 | xargs -0 -n1 patch -p1 -l -s -i + fi + + # 3. Determine the base image ref (registry/repository:tag) + BASE_IMAGE_REF="" + # For the dev branch we want the full flexi stack tag, to detect stack upgrades requiring a new build + if (( IS_DEV )); then + BASE_IMAGE_REF=$( mvn initialize help:evaluate -Pct -f modules/container-base -Dexpression=base.image -q -DforceStdout ) + else + BASE_IMAGE_REF=$( mvn initialize help:evaluate -Pct -f modules/container-base -Dexpression=base.image -Dbase.image.tag.suffix="" -q -DforceStdout ) + fi + echo "Determined BASE_IMAGE_REF=$BASE_IMAGE_REF from Maven" + + # 4. Check for Temurin image updates + JAVA_IMAGE_REF=$( mvn help:evaluate -Pct -f modules/container-base -Dexpression=java.image -q -DforceStdout ) + echo "Determined JAVA_IMAGE_REF=$JAVA_IMAGE_REF from Maven" + NEWER_JAVA_IMAGE=0 + if check_newer_parent "$JAVA_IMAGE_REF" "$BASE_IMAGE_REF"; then + NEWER_JAVA_IMAGE=1 + fi + + # 5. Check for package updates in base image + PKGS="$( grep "ARG PKGS" modules/container-base/src/main/docker/Dockerfile | cut -f2 -d= | tr -d '"' )" + echo "Determined installed packages=\"$PKGS\" from the Dockerfile" + NEWER_PKGS=0 + # Don't bother with package checks if the java image is newer already + if ! (( NEWER_JAVA_IMAGE )); then + if check_newer_pkgs "$BASE_IMAGE_REF" "$PKGS"; then + NEWER_PKGS=1 + fi + fi + + # 6.
Get current immutable revision tag if not on the dev branch + REV=$( current_revision "$BASE_IMAGE_REF" ) + CURRENT_REV_TAG="${BASE_IMAGE_REF#*:}-r$REV" + NEXT_REV_TAG="${BASE_IMAGE_REF#*:}-r$(( REV + 1 ))" + + # 7. Let's put together what tags we want added to this build run + TAG_OPTIONS="" + if ! (( IS_DEV )); then + TAG_OPTIONS="-Dbase.image=$BASE_IMAGE_REF -Ddocker.tags.revision=$NEXT_REV_TAG" + # In case of the current release, add the "latest" tag as well. + if (( IS_CURRENT_RELEASE )); then + TAG_OPTIONS="$TAG_OPTIONS -Ddocker.tags.latest=latest" + fi + else + UPCOMING_TAG=$( mvn initialize help:evaluate -Pct -f modules/container-base -Dexpression=base.image.tag -Dbase.image.tag.suffix="" -q -DforceStdout ) + TAG_OPTIONS="-Ddocker.tags.develop=unstable -Ddocker.tags.upcoming=$UPCOMING_TAG" + + # For the dev branch we only have rolling tags and can add them now already + SUPPORTED_ROLLING_TAGS+=("[\"unstable\", \"$UPCOMING_TAG\", \"${BASE_IMAGE_REF#*:}\"]") + fi + echo "Determined these additional Maven tag options: $TAG_OPTIONS" + + # 8. Let's build the base image if necessary + NEWER_IMAGE=0 + if (( NEWER_JAVA_IMAGE + NEWER_PKGS + FORCE_BUILD > 0 )); then + if ! (( DRY_RUN )); then + # shellcheck disable=SC2046 + mvn -Pct -f modules/container-base deploy -Ddocker.noCache -Ddocker.platforms="${PLATFORMS}" \ + -Ddocker.imagePropertyConfiguration=override $TAG_OPTIONS \ + $( if (( DAMP_RUN )); then echo "-Ddocker.skip.push -Ddocker.skip.tag"; fi ) + else + echo "Skipping Maven build as requested by DRY_RUN=1" + fi + NEWER_IMAGE=1 + # Save the information about the immutable or rolling tag we just built + if ! (( IS_DEV )); then + REBUILT_BASE_IMAGES+=("$BRANCH=${BASE_IMAGE_REF%:*}:$NEXT_REV_TAG") + else + REBUILT_BASE_IMAGES+=("$BRANCH=$BASE_IMAGE_REF") + fi + else + echo "No rebuild necessary, we're done here." + fi + + # 9. Add list of rolling and immutable tags for release builds + if ! 
(( IS_DEV )); then + RELEASE_TAGS_LIST="[" + if (( IS_CURRENT_RELEASE )); then + RELEASE_TAGS_LIST+="\"latest\", " + fi + RELEASE_TAGS_LIST+="\"${BASE_IMAGE_REF#*:}\", " + if (( NEWER_IMAGE )); then + RELEASE_TAGS_LIST+="\"$NEXT_REV_TAG\"]" + else + RELEASE_TAGS_LIST+="\"$CURRENT_REV_TAG\"]" + fi + SUPPORTED_ROLLING_TAGS+=("${RELEASE_TAGS_LIST}") + fi + + echo "::endgroup::" +done + +# Built the output which base images have actually been rebuilt as JSON +REBUILT_IMAGES="[" +for IMAGE in "${REBUILT_BASE_IMAGES[@]}"; do + REBUILT_IMAGES+=" \"$IMAGE\" " +done +REBUILT_IMAGES+="]" +echo "rebuilt_images=${REBUILT_IMAGES// /, }" | tee -a "${GITHUB_OUTPUT}" + +# Built the supported rolling tags matrix as JSON +SUPPORTED_TAGS="{" +for (( i=0; i < ${#SUPPORTED_ROLLING_TAGS[@]} ; i++ )); do + j=$((i+1)) + SUPPORTED_TAGS+="\"${!j}\": ${SUPPORTED_ROLLING_TAGS[$i]}" + (( i < ${#SUPPORTED_ROLLING_TAGS[@]}-1 )) && SUPPORTED_TAGS+=", " +done +SUPPORTED_TAGS+="}" +echo "supported_tag_matrix=$SUPPORTED_TAGS" | tee -a "$GITHUB_OUTPUT" diff --git a/.github/workflows/scripts/containers/maintain-configbaker.sh b/.github/workflows/scripts/containers/maintain-configbaker.sh new file mode 100755 index 00000000000..0b5b60b459c --- /dev/null +++ b/.github/workflows/scripts/containers/maintain-configbaker.sh @@ -0,0 +1,199 @@ +#!/bin/bash + +# A matrix-like job to maintain a number of releases as well as the latest snap of Dataverse. 
+ +# PREREQUISITES: +# - You have Java, Maven, QEMU and Docker all set up and ready to go +# - You obviously checked out the develop branch, otherwise you'd not be executing this script +# - You added all the branch names you want to run maintenance for as arguments +# Optional, but recommended: +# - You added a DEVELOPMENT_BRANCH env var to your runner/job env with the name of the development branch +# - You added a FORCE_BUILD=0|1 env var to indicate if the base image build should be forced +# - You added a PLATFORMS env var with all the target platforms you want to build for +# Optional: +# - Use DRY_RUN=1 env var to skip actually building, but see how the tag lookups play out +# - Use DAMP_RUN=1 env var to skip pushing images, but build them + +# NOTE: +# This script consolidates a series of GitHub Actions steps into a single script. +# The reason to put all of this in here is the complexity of the GitHub Actions workflow and the limitations of +# matrix support in GitHub Actions, where outputs cannot be aggregated or otherwise used further.
+ +set -euo pipefail + +# Get all the inputs +# If not within a runner, just print to stdout (duplicating the output in case of tee usage, but that's ok for testing) +GITHUB_OUTPUT=${GITHUB_OUTPUT:-"/proc/self/fd/1"} +GITHUB_ENV=${GITHUB_ENV:-"/proc/self/fd/1"} +GITHUB_WORKSPACE=${GITHUB_WORKSPACE:-"$(pwd)"} +GITHUB_SERVER_URL=${GITHUB_SERVER_URL:-"https://github.com"} +GITHUB_REPOSITORY=${GITHUB_REPOSITORY:-"IQSS/dataverse"} + +MAINTENANCE_WORKSPACE="${GITHUB_WORKSPACE}/maintenance-job" + +DEVELOPMENT_BRANCH="${DEVELOPMENT_BRANCH:-"develop"}" +FORCE_BUILD="${FORCE_BUILD:-"0"}" +DRY_RUN="${DRY_RUN:-"0"}" +DAMP_RUN="${DAMP_RUN:-"0"}" +PLATFORMS="${PLATFORMS:-"linux/amd64,linux/arm64"}" + +# Setup and validation +if [[ -z "$*" ]]; then + >&2 echo "You must give a list of branch names as arguments" + exit 1; +fi + +if (( DRY_RUN + DAMP_RUN > 1 )); then + >&2 echo "You must either use DRY_RUN=1 or DAMP_RUN=1, but not both" + exit 1; +fi + +source "$( dirname "$0" )/utils.sh" + +# Delete old stuff if present +rm -rf "$MAINTENANCE_WORKSPACE" +mkdir -p "$MAINTENANCE_WORKSPACE" + +# Store the image tags we maintain in this array (same order as branches array!) +# This list will be used to build the support matrix within the Docker Hub image description +SUPPORTED_ROLLING_TAGS=() +# Store the tags of config baker images we are actually rebuilding +# Takes the form "branch-name=config-image-ref" +REBUILT_CONFIG_IMAGES=() + +for BRANCH in "$@"; do + echo "::group::Running maintenance for $BRANCH" + + # 0. Determine if this is a development branch and the most current release + IS_DEV=0 + if [[ "$BRANCH" = "$DEVELOPMENT_BRANCH" ]]; then + IS_DEV=1 + fi + IS_CURRENT_RELEASE=0 + if [[ "$BRANCH" = $( curl -f -sS "https://api.github.com/repos/$GITHUB_REPOSITORY/releases" | jq -r '.[0].tag_name' ) ]]; then + IS_CURRENT_RELEASE=1 + fi + + # 1.
Let's get the maintained sources + git clone -c advice.detachedHead=false --depth=1 --branch "$BRANCH" "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}" "$MAINTENANCE_WORKSPACE/$BRANCH" + # Switch context + cd "$MAINTENANCE_WORKSPACE/$BRANCH" + + # 2. Now let's apply the patches (we have them checked out in $GITHUB_WORKSPACE, not necessarily in this local checkout) + echo "Checking for patches..." + if [[ -d ${GITHUB_WORKSPACE}/modules/container-configbaker/backports/$BRANCH ]]; then + echo "Applying patches now." + find "${GITHUB_WORKSPACE}/modules/container-configbaker/backports/$BRANCH" -type f -name '*.patch' -print0 | xargs -0 -n1 patch -p1 -l -s -i + fi + + # 3a. Determine the base image ref (registry/repository:tag) + BASE_IMAGE_REF=$( mvn initialize help:evaluate -Pct -f . -Dexpression=conf.image.base -q -DforceStdout ) + echo "Determined BASE_IMAGE_REF=$BASE_IMAGE_REF from Maven" + + # 3b. Determine the configbaker image ref (registry/repository:tag) + CONFIG_IMAGE_REF="" + if (( IS_DEV )); then + # Results in the rolling tag for the dev branch + CONFIG_IMAGE_REF=$( mvn initialize help:evaluate -Pct -f . -Dexpression=conf.image -q -DforceStdout ) + else + # Results in the rolling tag for the release branch (the fixed tag will be determined from this rolling tag) + # shellcheck disable=SC2016 + CONFIG_IMAGE_REF=$( mvn initialize help:evaluate -Pct -f . -Dexpression=conf.image -Dconf.image.tag='${app.image.version}-${conf.image.flavor}' -q -DforceStdout ) + fi + echo "Determined CONFIG_IMAGE_REF=$CONFIG_IMAGE_REF from Maven" + + # 4a. Check for Base image updates + NEWER_BASE_IMAGE=0 + if check_newer_parent "$BASE_IMAGE_REF" "$CONFIG_IMAGE_REF"; then + NEWER_BASE_IMAGE=1 + fi + + # 4b. Check for vulnerabilities in packages fixable by updating + FIXES_AVAILABLE=0 + if ! (( NEWER_BASE_IMAGE )) && check_trivy_fixes_for_os "$CONFIG_IMAGE_REF"; then + FIXES_AVAILABLE=1 + fi + + # 5.
Get current immutable revision tag if not on the dev branch + REV=$( current_revision "$CONFIG_IMAGE_REF" ) + CURRENT_REV_TAG="${CONFIG_IMAGE_REF#*:}-r$REV" + NEXT_REV_TAG="${CONFIG_IMAGE_REF#*:}-r$(( REV + 1 ))" + + # 6. Let's put together what tags we want added to this build run + TAG_OPTIONS="" + if ! (( IS_DEV )); then + TAG_OPTIONS="-Dconf.image=$CONFIG_IMAGE_REF -Ddocker.tags.revision=$NEXT_REV_TAG" + # In case of the current release, add the "latest" tag as well. + if (( IS_CURRENT_RELEASE )); then + TAG_OPTIONS="$TAG_OPTIONS -Ddocker.tags.latest=latest" + fi + else + # shellcheck disable=SC2016 + UPCOMING_TAG=$( mvn initialize help:evaluate -Pct -f . -Dexpression=conf.image.tag -Dconf.image.tag='${app.image.version}-${conf.image.flavor}' -q -DforceStdout ) + TAG_OPTIONS="-Ddocker.tags.upcoming=$UPCOMING_TAG" + + # For the dev branch we only have rolling tags and can add them now already + SUPPORTED_ROLLING_TAGS+=("[\"unstable\", \"$UPCOMING_TAG\"]") + fi + echo "Determined these additional Maven tag options: $TAG_OPTIONS" + + # 8. Let's build the configbaker image if necessary + NEWER_IMAGE=0 + if (( NEWER_BASE_IMAGE + FIXES_AVAILABLE + FORCE_BUILD > 0 )); then + if ! (( DRY_RUN )); then + # Build the configbaker image, but skip the application image (that's a different job)! + # shellcheck disable=SC2046 + mvn -Pct -f . deploy -Ddocker.noCache -Ddocker.platforms="${PLATFORMS}" \ + -Dapp.skipBuild -Dconf.image.base="${BASE_IMAGE_REF}" \ + -Dmaven.main.skip -Dmaven.test.skip -Dmaven.war.skip \ + -Ddocker.imagePropertyConfiguration=override $TAG_OPTIONS \ + $( if (( DAMP_RUN )); then echo "-Ddocker.skip.push -Ddocker.skip.tag"; fi ) + else + echo "Skipping Maven build as requested by DRY_RUN=1" + fi + NEWER_IMAGE=1 + # Save the information about the immutable or rolling tag we just built + if !
(( IS_DEV )); then + REBUILT_CONFIG_IMAGES+=("$BRANCH=${CONFIG_IMAGE_REF%:*}:$NEXT_REV_TAG") + else + REBUILT_CONFIG_IMAGES+=("$BRANCH=$CONFIG_IMAGE_REF") + fi + else + echo "No rebuild necessary, we're done here." + fi + + # 9. Add list of rolling and immutable tags for release builds + if ! (( IS_DEV )); then + RELEASE_TAGS_LIST="[" + if (( IS_CURRENT_RELEASE )); then + RELEASE_TAGS_LIST+="\"latest\", " + fi + RELEASE_TAGS_LIST+="\"${CONFIG_IMAGE_REF#*:}\", " + if (( NEWER_IMAGE )); then + RELEASE_TAGS_LIST+="\"$NEXT_REV_TAG\"]" + else + RELEASE_TAGS_LIST+="\"$CURRENT_REV_TAG\"]" + fi + SUPPORTED_ROLLING_TAGS+=("${RELEASE_TAGS_LIST}") + fi + + echo "::endgroup::" +done + +# Built the output which images have actually been rebuilt as JSON +REBUILT_IMAGES="[" +for IMAGE in "${REBUILT_CONFIG_IMAGES[@]}"; do + REBUILT_IMAGES+=" \"$IMAGE\" " +done +REBUILT_IMAGES+="]" +echo "rebuilt_images=${REBUILT_IMAGES// /, }" | tee -a "${GITHUB_OUTPUT}" + +# Built the supported rolling tags matrix as JSON +SUPPORTED_TAGS="{" +for (( i=0; i < ${#SUPPORTED_ROLLING_TAGS[@]} ; i++ )); do + j=$((i+1)) + SUPPORTED_TAGS+="\"${!j}\": ${SUPPORTED_ROLLING_TAGS[$i]}" + (( i < ${#SUPPORTED_ROLLING_TAGS[@]}-1 )) && SUPPORTED_TAGS+=", " +done +SUPPORTED_TAGS+="}" +echo "supported_tag_matrix=$SUPPORTED_TAGS" | tee -a "$GITHUB_OUTPUT" diff --git a/.github/workflows/scripts/containers/utils.sh b/.github/workflows/scripts/containers/utils.sh new file mode 100644 index 00000000000..a61df5467fe --- /dev/null +++ b/.github/workflows/scripts/containers/utils.sh @@ -0,0 +1,134 @@ +#!/bin/bash + +set -euo pipefail + +function is_bin_in_path { + builtin type -P "$1" &> /dev/null +} + +function check_newer_parent() { + PARENT_IMAGE="$1" + # Get namespace, default to "library" if not found + PARENT_IMAGE_NS="${PARENT_IMAGE%/*}" + if [[ "$PARENT_IMAGE_NS" = "${PARENT_IMAGE}" ]]; then + PARENT_IMAGE_NS="library" + fi + PARENT_IMAGE_REPO_CUT_NS="${PARENT_IMAGE#*/}" + 
PARENT_IMAGE_REPO="${PARENT_IMAGE_REPO_CUT_NS%:*}" + PARENT_IMAGE_TAG="${PARENT_IMAGE#*:}" + + PARENT_IMAGE_LAST_UPDATE="$( curl -sS "https://hub.docker.com/v2/namespaces/${PARENT_IMAGE_NS}/repositories/${PARENT_IMAGE_REPO}/tags/${PARENT_IMAGE_TAG}" | jq -r .last_updated )" + if [[ "$PARENT_IMAGE_LAST_UPDATE" = "null" ]]; then + echo "::error title='Invalid PARENT Image'::Could not find ${PARENT_IMAGE} in the registry" + exit 1 + fi + + DERIVED_IMAGE="$2" + # Get namespace, default to "library" if not found + DERIVED_IMAGE_NS="${DERIVED_IMAGE%/*}" + if [[ "${DERIVED_IMAGE_NS}" = "${DERIVED_IMAGE}" ]]; then + DERIVED_IMAGE_NS="library" + fi + DERIVED_IMAGE_REPO="$( echo "${DERIVED_IMAGE%:*}" | cut -f2 -d/ )" + DERIVED_IMAGE_TAG="${DERIVED_IMAGE#*:}" + + DERIVED_IMAGE_LAST_UPDATE="$( curl -sS "https://hub.docker.com/v2/namespaces/${DERIVED_IMAGE_NS}/repositories/${DERIVED_IMAGE_REPO}/tags/${DERIVED_IMAGE_TAG}" | jq -r .last_updated )" + if [[ "$DERIVED_IMAGE_LAST_UPDATE" = "null" || "$DERIVED_IMAGE_LAST_UPDATE" < "$PARENT_IMAGE_LAST_UPDATE" ]]; then + echo "Parent image $PARENT_IMAGE has a newer release ($PARENT_IMAGE_LAST_UPDATE), which is more recent than $DERIVED_IMAGE ($DERIVED_IMAGE_LAST_UPDATE)" + return 0 + else + echo "Parent image $PARENT_IMAGE ($PARENT_IMAGE_LAST_UPDATE) is older than $DERIVED_IMAGE ($DERIVED_IMAGE_LAST_UPDATE)" + return 1 + fi +} + +function check_newer_pkgs() { + IMAGE="$1" + PKGS="$2" + + docker run --rm -u 0 "${IMAGE}" sh -c "apt update >/dev/null 2>&1 && apt install -s ${PKGS}" | tee /proc/self/fd/2 | grep -q "0 upgraded" + STATUS=$? + + if [[ $STATUS -eq 0 ]]; then + echo "Base image $IMAGE has no updates for our custom installed packages" + return 1 + else + echo "Base image $IMAGE needs updates for our custom installed packages" + return 0 + fi + + # TODO: In a future version of this script, we might want to include checking for other security updates, + # not just updates to the packages we installed. 
+ # grep security /etc/apt/sources.list > /tmp/security.list + # apt-get update -oDir::Etc::Sourcelist=/tmp/security.list + # apt-get dist-upgrade -y -oDir::Etc::Sourcelist=/tmp/security.list -oDir::Etc::SourceParts=/bin/false -s + +} + +function check_trivy_fixes_for_os() { + IMAGE_REF="$1" + if [[ -z "$IMAGE_REF" ]]; then + echo "You must give an image reference as argument to check_trivy_fixes_for_os" + exit 1 + fi + is_bin_in_path trivy || { echo "Trivy Scanner not installed" 1>&2; exit 1; } + JSON_REPORT=$(mktemp) + + trivy image --ignore-unfixed --scanners vuln --disable-telemetry --pkg-types os -f json "$IMAGE_REF" -o "$JSON_REPORT" + + HAS_FIXES=$( jq -r '.Results[] | select(has("Vulnerabilities") and .Vulnerabilities != null) | .Vulnerabilities | length > 0' "$JSON_REPORT") + if [[ "true" = "$HAS_FIXES" ]]; then + echo "Trivy Scan showed fixes to known vulnerabilities by updating packages exist for image $IMAGE_REF" + return 0 + else + echo "Trivy Scan showed no fixes to known vulnerabilities by updating packages exist for image $IMAGE_REF" + return 1 + fi +} + +function current_revision() { + IMAGE="$1" + IMAGE_NS_REPO="${IMAGE%:*}" + IMAGE_TAG="${IMAGE#*:}" + + if [[ "$IMAGE_TAG" = "$IMAGE_NS_REPO" ]]; then + >&2 echo "You must provide an image reference in the format [namespace/]repository:tag" + exit 1 + fi + + case "$IMAGE_NS_REPO" in + */*) :;; # namespace/repository syntax, leave as is + *) IMAGE_NS_REPO="library/$IMAGE_NS_REPO";; # bare repository name (docker official image); must convert to namespace/repository syntax + esac + + # Without such a token we may run into rate limits + # OB 2024-09-16: for some reason using this token stopped working. Let's go without and see if we really fall into rate limits. + # token=$( curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:$IMAGE_NS_REPO:pull" ) + + ALL_TAGS="$( + i=0 + while [ $?
== 0 ]; do + i=$((i+1)) + # OB 2024-09-16: for some reason using this token stopped working. Let's go without and see if we really fall into rate limits. + # RESULT=$( curl -s -H "Authorization: Bearer $token" "https://registry.hub.docker.com/v2/repositories/$IMAGE_NS_REPO/tags/?page=$i&page_size=100" ) + RESULT=$( curl -s "https://registry.hub.docker.com/v2/repositories/$IMAGE_NS_REPO/tags/?page=$i&page_size=100" ) + if [[ $( echo "$RESULT" | jq '.message' ) != "null" ]]; then + # If we run into an error on the first attempt, that means we have a problem. + if [[ "$i" == "1" ]]; then + >&2 echo "Error when retrieving tag data: $( echo "$RESULT" | jq '.message' )" + exit 2 + # Otherwise it will just mean we reached the last page already + else + break + fi + else + echo "$RESULT" | jq -r '."results"[]["name"]' + # DEBUG: + #echo "$RESULT" | >&2 jq -r '."results"[]["name"]' + fi + done + )" + + # Note: if a former tag could not be found, it just might not exist yet. Start a new series with rev 0 + echo "$ALL_TAGS" | grep "${IMAGE_TAG}-r" | sed -e "s#${IMAGE_TAG}-r##" | sort -h | tail -n1 || echo "-1" +} diff --git a/.github/workflows/shellcheck.yml b/.github/workflows/shellcheck.yml new file mode 100644 index 00000000000..fb9cf5a0a1f --- /dev/null +++ b/.github/workflows/shellcheck.yml @@ -0,0 +1,45 @@ +name: "Shellcheck" +on: + push: + branches: + - develop + paths: + - conf/solr/**/*.sh + - modules/container-base/**/*.sh + - modules/container-configbaker/**/*.sh + pull_request: + branches: + - develop + paths: + - conf/solr/**/*.sh + - modules/container-base/**/*.sh + - modules/container-configbaker/**/*.sh +jobs: + shellcheck: + name: Shellcheck + runs-on: ubuntu-latest + permissions: + pull-requests: write + steps: + - uses: actions/checkout@v4 + - name: shellcheck + uses: reviewdog/action-shellcheck@v1 + with: + github_token: ${{ secrets.github_token }} + reporter: github-pr-review # Change reporter.
+ fail_on_error: true + # Container base image uses dumb-init shebang, so nail to using bash + shellcheck_flags: "--shell=bash --external-sources" + # Exclude old scripts + exclude: | + */.git/* + doc/* + downloads/* + scripts/database/* + scripts/globalid/* + scripts/icons/* + scripts/installer/* + scripts/issues/* + scripts/r/* + scripts/tests/* + tests/* diff --git a/.github/workflows/shellspec.yml b/.github/workflows/shellspec.yml new file mode 100644 index 00000000000..cc09992edac --- /dev/null +++ b/.github/workflows/shellspec.yml @@ -0,0 +1,54 @@ +name: "Shellspec" +on: + push: + paths: + - tests/shell/** + - conf/solr/** + # add more when more specs are written relying on data + pull_request: + paths: + - tests/shell/** + - conf/solr/** + # add more when more specs are written relying on data +env: + SHELLSPEC_VERSION: 0.28.1 +jobs: + shellspec-ubuntu: + name: "Ubuntu" + runs-on: ubuntu-latest + steps: + - name: Install shellspec + run: curl -fsSL https://git.io/shellspec | sh -s ${{ env.SHELLSPEC_VERSION }} --yes + - uses: actions/checkout@v4 + - name: Run Shellspec + run: | + cd tests/shell + shellspec + shellspec-rocky9: + name: "RockyLinux 9" + runs-on: ubuntu-latest + container: + image: rockylinux/rockylinux:9 + steps: + - uses: actions/checkout@v4 + - name: Install shellspec + run: | + curl -fsSL https://github.com/shellspec/shellspec/releases/download/${{ env.SHELLSPEC_VERSION }}/shellspec-dist.tar.gz | tar -xz -C /usr/share + ln -s /usr/share/shellspec/shellspec /usr/bin/shellspec + - name: Install dependencies + run: dnf install -y ed bc diffutils + - name: Run shellspec + run: | + cd tests/shell + shellspec + shellspec-macos: + name: "MacOS" + runs-on: macos-latest + steps: + - name: Install shellspec + run: curl -fsSL https://git.io/shellspec | sh -s 0.28.1 --yes + - uses: actions/checkout@v4 + - name: Run Shellspec + run: | + cd tests/shell + /Users/runner/.local/bin/shellspec diff --git a/.github/workflows/spi_release.yml 
b/.github/workflows/spi_release.yml new file mode 100644 index 00000000000..6398edca412 --- /dev/null +++ b/.github/workflows/spi_release.yml @@ -0,0 +1,94 @@ +name: Dataverse SPI + +on: + push: + branches: + - "develop" + paths: + - "modules/dataverse-spi/**" + pull_request: + branches: + - "develop" + paths: + - "modules/dataverse-spi/**" + +jobs: + # Note: Pushing packages to Maven Central requires access to secrets, which pull requests from remote forks + # don't have. Skip in these cases. + check-secrets: + name: Check for Secrets Availability + runs-on: ubuntu-latest + outputs: + available: ${{ steps.secret-check.outputs.available }} + steps: + - id: secret-check + # perform secret check & put boolean result as an output + shell: bash + run: | + if [ "${{ secrets.DATAVERSEBOT_SONATYPE_USERNAME }}" != '' ]; then + echo "available=true" >> $GITHUB_OUTPUT; + else + echo "available=false" >> $GITHUB_OUTPUT; + fi + + snapshot: + name: Release Snapshot + needs: check-secrets + runs-on: ubuntu-latest + if: github.event_name == 'pull_request' && needs.check-secrets.outputs.available == 'true' + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-java@v4 + with: + java-version: '17' + distribution: 'adopt' + server-id: ossrh + server-username: MAVEN_USERNAME + server-password: MAVEN_PASSWORD + - uses: actions/cache@v4 + with: + path: ~/.m2 + key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }} + restore-keys: ${{ runner.os }}-m2 + + - name: Deploy Snapshot + run: mvn -f modules/dataverse-spi -Dproject.version.suffix="-PR${{ github.event.number }}-SNAPSHOT" deploy + env: + MAVEN_USERNAME: ${{ secrets.DATAVERSEBOT_SONATYPE_USERNAME }} + MAVEN_PASSWORD: ${{ secrets.DATAVERSEBOT_SONATYPE_TOKEN }} + + release: + name: Release + needs: check-secrets + runs-on: ubuntu-latest + if: github.event_name == 'push' && needs.check-secrets.outputs.available == 'true' + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-java@v4 + with: + java-version: '17' + 
distribution: 'adopt' + - uses: actions/cache@v4 + with: + path: ~/.m2 + key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }} + restore-keys: ${{ runner.os }}-m2 + + # Running setup-java again overwrites the settings.xml - IT'S MANDATORY TO DO THIS SECOND SETUP!!! + - name: Set up Maven Central Repository + uses: actions/setup-java@v4 + with: + java-version: '17' + distribution: 'adopt' + server-id: ossrh + server-username: MAVEN_USERNAME + server-password: MAVEN_PASSWORD + gpg-private-key: ${{ secrets.DATAVERSEBOT_GPG_KEY }} + gpg-passphrase: MAVEN_GPG_PASSPHRASE + + - name: Sign + Publish Release + run: mvn -f modules/dataverse-spi -P release deploy + env: + MAVEN_USERNAME: ${{ secrets.DATAVERSEBOT_SONATYPE_USERNAME }} + MAVEN_PASSWORD: ${{ secrets.DATAVERSEBOT_SONATYPE_TOKEN }} + MAVEN_GPG_PASSPHRASE: ${{ secrets.DATAVERSEBOT_GPG_PASSWORD }} diff --git a/.gitignore b/.gitignore index 2904bc578f2..bb9686ae629 100644 --- a/.gitignore +++ b/.gitignore @@ -18,7 +18,6 @@ GRTAGS .Trashes ehthumbs.db Thumbs.db -.vagrant *.pyc *.swp scripts/api/py_api_wrapper/demo-data/* @@ -34,17 +33,12 @@ oauth-credentials.md /src/main/webapp/oauth2/newAccount.html scripts/api/setup-all.sh* +scripts/api/setup-all.*.log +src/main/resources/edu/harvard/iq/dataverse/openapi/ # ctags generated tag file tags -# dependencies I'm not sure we're allowed to redistribute / have in version control -conf/docker-aio/dv/deps/ - -# no need to check aoi installer zip into vc -conf/docker-aio/dv/install/dvinstall.zip -# or copy of test data -conf/docker-aio/testdata/ scripts/installer/default.config *.pem @@ -60,3 +54,13 @@ scripts/installer/default.config tests/node_modules tests/package-lock.json venv + +# from thumbnail tests in SearchIT +scripts/search/data/binary/trees.png.thumb140 +src/main/webapp/resources/images/cc0.png.thumb140 +src/main/webapp/resources/images/dataverseproject.png.thumb140 + +# Docker development volumes +/conf/keycloak/docker-dev-volumes +/docker-dev-volumes +/.vs diff 
--git a/.readthedocs.yml b/.readthedocs.yml new file mode 100644 index 00000000000..cadaedc1448 --- /dev/null +++ b/.readthedocs.yml @@ -0,0 +1,21 @@ +version: 2 + +# HTML is always built, these are additional formats only +formats: + - pdf + +build: + os: ubuntu-22.04 + tools: + python: "3.10" + apt_packages: + - graphviz + +python: + install: + - requirements: doc/sphinx-guides/requirements.txt + + +sphinx: + configuration: doc/sphinx-guides/source/conf.py + fail_on_warning: true diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index b79030a217f..00000000000 --- a/.travis.yml +++ /dev/null @@ -1,5 +0,0 @@ -language: java -jdk: - - openjdk8 -script: mvn -DcompilerArgument=-Xlint:unchecked test -P all-unit-tests -after_success: mvn jacoco:report coveralls:report diff --git a/.travis.yml.future b/.travis.yml.future deleted file mode 100644 index 8bd747625e4..00000000000 --- a/.travis.yml.future +++ /dev/null @@ -1,42 +0,0 @@ -services: - - docker - -jobs: - include: - # Execute java unit- and integration tests - - stage: test - language: java - jdk: - - oraclejdk8 - script: mvn -DcompilerArgument=-Xlint:unchecked test -P all-unit-tests - after_success: mvn jacoco:report coveralls:report - - # Execute Cypress for UI testing - # see https://docs.cypress.io/guides/guides/continuous-integration.html - - stage: test - language: node_js - node_js: - - "10" - addons: - apt: - packages: - # Ubuntu 16+ does not install this dependency by default, so we need to install it ourselves - - libgconf-2-4 - cache: - # Caches $HOME/.npm when npm ci is default script command - # Caches node_modules in all other cases - npm: true - directories: - # we also need to cache folder with Cypress binary - - ~/.cache - # we want to cache the Glassfish and Solr dependencies as well - - conf/docker-aio/dv/deps - before_install: - - cd tests - install: - - npm ci - before_script: - - ./run_docker_dataverse.sh - script: - # --key needs to be injected using CYPRESS_RECORD_KEY to keep 
it secret - - $(npm bin)/cypress run --record diff --git a/CITATION.cff b/CITATION.cff new file mode 100644 index 00000000000..d425fe8bb34 --- /dev/null +++ b/CITATION.cff @@ -0,0 +1,20 @@ +cff-version: 1.2.0 +title: Dataverse +message: >- + If you use this software, please cite it using the + metadata from this file. +type: software +authors: + - given-names: Gary + family-names: King + orcid: 'https://orcid.org/0000-0002-5327-7631' + affiliation: Harvard University +identifiers: + - type: doi + value: 10.1177/0049124107306660 +repository-code: 'https://github.com/IQSS/dataverse' +url: 'https://dataverse.org' +abstract: >- + Dataverse is an open source software platform designed for + sharing, finding, citing, and preserving research data. +license: Apache-2.0 diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000000..4204a1fc85e --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,76 @@ +# Contributor Covenant Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to making participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. 
+ +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or + advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic + address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies both within project spaces and in public spaces +when an individual is representing the project or its community. Examples of +representing a project or community include using an official project e-mail +address, posting via an official social media account, or acting as an appointed +representative at an online or offline event. Representation of a project may be +further defined and clarified by project maintainers. 
+ +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at support at dataverse dot org. All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2227286d4d1..4fa6e955b70 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,71 +1,7 @@ # Contributing to Dataverse -Thank you for your interest in contributing to Dataverse! We are open to contributions from everyone. You don't need permission to participate. Just jump in. If you have questions, please reach out using one or more of the channels described below. +Thank you for your interest in contributing to Dataverse! We are open to contributions from everyone. -We aren't just looking for developers. There are many ways to contribute to Dataverse. We welcome contributions of ideas, bug reports, usability research/feedback, documentation, code, and more! +Please see our [Contributor Guide][] for how you can help! 
-## Ideas/Feature Requests - -Your idea or feature request might already be captured in the Dataverse [issue tracker] on GitHub but if not, the best way to bring it to the community's attention is by posting on the [dataverse-community Google Group][] or bringing it up on a [Community Call][]. You're also welcome make some noise in the [#dataverse IRC channel][] (which is [logged][]) or cram your idea into 280 characters and mention [@dataverseorg][] on Twitter. To discuss your idea privately, please email it to support@dataverse.org - -There's a chance your idea is already on our roadmap, which is available at https://www.iq.harvard.edu/roadmap-dataverse-project - -[#dataverse IRC channel]: http://chat.dataverse.org -[logged]: http://irclog.iq.harvard.edu/dataverse/today -[issue tracker]: https://github.com/IQSS/dataverse/issues -[@dataverseorg]: https://twitter.com/dataverseorg - -## Usability testing - -Please email us at support@dataverse.org if you are interested in participating in usability testing. - -## Bug Reports/Issues - -An issue is a bug (a feature is no longer behaving the way it should) or a feature (something new to Dataverse that helps users complete tasks). You can browse the Dataverse [issue tracker] on GitHub by open or closed issues or by milestones. - -Before submitting an issue, please search the existing issues by using the search bar at the top of the page. If there is an existing open issue that matches the issue you want to report, please add a comment to it. - -If there is no pre-existing issue or it has been closed, please click on the "New Issue" button, log in, and write in what the issue is (unless it is a security issue which should be reported privately to security@dataverse.org). - -If you do not receive a reply to your new issue or comment in a timely manner, please email support@dataverse.org with a link to the issue. 
- -We are aware of the new issue templates described at https://help.github.com/articles/about-issue-and-pull-request-templates but haven't converted over yet. - -### Writing an Issue - -For the subject of an issue, please start it by writing the feature or functionality it relates to, i.e. "Create Account:..." or "Dataset Page:...". In the body of the issue, please outline the issue you are reporting with as much detail as possible. In order for the Dataverse development team to best respond to the issue, we need as much information about the issue as you can provide. Include steps to reproduce bugs. Indicate which version you're using, which is shown at the bottom of the page. We love screenshots! - -### Issue Attachments - -You can attach certain files (images, screenshots, logs, etc.) by dragging and dropping, selecting them, or pasting from the clipboard. Files must be one of GitHub's [supported attachment formats] such as png, gif, jpg, txt, pdf, zip, etc. (Pro tip: A file ending in .log can be renamed to .txt so you can upload it.) If there's no easy way to attach your file, please include a URL that points to the file in question. - -[supported attachment formats]: https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/ - -## Documentation - -The source for the documentation at http://guides.dataverse.org/en/latest/ is in the GitHub repo under the "[doc][]" folder. If you find a typo or inaccuracy or something to clarify, please send us a pull request! For more on the tools used to write docs, please see the [documentation][] section of the Developer Guide. - -[doc]: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source -[documentation]: http://guides.dataverse.org/en/latest/developers/documentation.html - -## Code/Pull Requests - -We love code contributions. Developers are not limited to the main Dataverse code in this git repo. 
You can help with API client libraries in your favorite language that are mentioned in the [API Guide][] or create a new library. You can help work on configuration management code that's mentioned in the [Installation Guide][]. The Installation Guide also covers a relatively new concept called "external tools" that allows developers to create their own tools that are available from within an installation of Dataverse. - -[API Guide]: http://guides.dataverse.org/en/latest/api -[Installation Guide]: http://guides.dataverse.org/en/latest/installation - -If you are interested in working on the main Dataverse code, great! Before you start coding, please reach out to us either on the [dataverse-community Google Group][], the [dataverse-dev Google Group][], [IRC][] (#dataverse on freenode), or via support@dataverse.org to make sure the effort is well coordinated and we avoid merge conflicts. We maintain a list of [community contributors][] and [dev efforts][] the community is working on so please let us know if you'd like to be added or removed from either list. - -Please read http://guides.dataverse.org/en/latest/developers/version-control.html to understand how we use the "git flow" model of development and how we will encourage you to create a GitHub issue (if it doesn't exist already) to associate with your pull request. That page also includes tips on making a pull request. - -After making your pull request, your goal should be to help it advance through our kanban board at https://github.com/orgs/IQSS/projects/2 . If no one has moved your pull request to the code review column in a timely manner, please reach out. Note that once a pull request is created for an issue, we'll remove the issue from the board so that we only track one card (the pull request). - -Thanks for your contribution! 
- -[dataverse-community Google Group]: https://groups.google.com/group/dataverse-community -[Community Call]: https://dataverse.org/community-calls -[dataverse-dev Google Group]: https://groups.google.com/group/dataverse-dev -[IRC]: http://chat.dataverse.org -[community contributors]: https://docs.google.com/spreadsheets/d/1o9DD-MQ0WkrYaEFTD5rF_NtyL8aUISgURsAXSL7Budk/edit?usp=sharing -[dev efforts]: https://github.com/orgs/IQSS/projects/2#column-5298405 +[Contributor Guide]: https://guides.dataverse.org/en/latest/contributor/index.html diff --git a/Dockerfile b/Dockerfile deleted file mode 100644 index 5f492ea0594..00000000000 --- a/Dockerfile +++ /dev/null @@ -1 +0,0 @@ -# See `conf/docker` for Docker images diff --git a/ISSUE_TEMPLATE.md b/ISSUE_TEMPLATE.md deleted file mode 100644 index 33bff3a9b50..00000000000 --- a/ISSUE_TEMPLATE.md +++ /dev/null @@ -1 +0,0 @@ -Thank you for contributing an issue to the Dataverse Project! If this is a bug report, please let us know when the issue occurs, which page it occurs on, to whom it occurs, and which version of Dataverse you're using. If this is a feature request, please let us know what you'd like to see and give us some context - what kind of user is the feature intended for, the relevant use cases, and what inspired the request? No matter the issue, screenshots are always welcome. diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md deleted file mode 100644 index c3675b96bc5..00000000000 --- a/PULL_REQUEST_TEMPLATE.md +++ /dev/null @@ -1,23 +0,0 @@ -## New Contributors - -Welcome! New contributors should at least glance at [CONTRIBUTING.md](/CONTRIBUTING.md), especially the section on pull requests where we encourage you to reach out to other developers before you start coding. Also, please note that we measure code coverage and prefer you write unit tests. Pull requests can still be reviewed without tests or completion of the checklist outlined below. 
Note that we use the "closes" syntax below to trigger Github's automation to close the corresponding issue once the pull request is merged. - -Thanks for your contribution to Dataverse! - -## Related Issues - -- closes #ISSUE_NUMBER: ISSUE_TITLE - -## Pull Request Checklist - -- [ ] Unit [tests][] completed -- [ ] Integration [tests][]: None -- [ ] Deployment requirements, [SQL updates][], [Solr updates][], etc.: None -- [ ] [Documentation][docs] completed -- [ ] Merged latest from "develop" [branch][] and resolved conflicts - -[tests]: http://guides.dataverse.org/en/latest/developers/testing.html -[SQL updates]: http://guides.dataverse.org/en/latest/developers/sql-upgrade-scripts.html -[Solr updates]: https://github.com/IQSS/dataverse/blob/develop/conf/solr/7.3.0/schema.xml -[docs]: http://guides.dataverse.org/en/latest/developers/documentation.html -[branch]: http://guides.dataverse.org/en/latest/developers/branching-strategy.html diff --git a/README.md b/README.md index 2bdc0e8edde..2303c001d2c 100644 --- a/README.md +++ b/README.md @@ -1,38 +1,108 @@ -Dataverse® +Dataverse® =============== -Dataverse is an [open source][] software platform for sharing, finding, citing, and preserving research data (developed by the [Data Science and Products team](http://www.iq.harvard.edu/people/people/data-science-products) at the [Institute for Quantitative Social Science](http://iq.harvard.edu/) and the [Dataverse community][]). +![Dataverse-logo](https://github.com/IQSS/dataverse-frontend/assets/7512607/6c4d79e4-7be5-4102-88bd-dfa167dc79d3) -[dataverse.org][] is our home on the web and shows a map of Dataverse installations around the world, a list of [features][], [integrations][] that have been made possible through [REST APIs][], our development [roadmap][], and more. +## Table of Contents -We maintain a demo site at [demo.dataverse.org][] which you are welcome to use for testing and evaluating Dataverse. +1. [❓ What is Dataverse?](#what-is-dataverse) +2. 
[✔ Try Dataverse](#try-dataverse) +3. [🌐 Features, Integrations, Roadmaps, and More](#website) +4. [📥 Installation](#installation) +5. [🏘 Community and Support](#community-and-support) +6. [🧑‍💻️ Contributing](#contributing) +7. [⚖️ Legal Information](#legal-informations) -To install Dataverse, please see our [Installation Guide][] which will prompt you to download our [latest release][]. + -To discuss Dataverse with the community, please join our [mailing list][], participate in a [community call][], chat with us at [chat.dataverse.org][], or attend our annual [Dataverse Community Meeting][]. +## ❓ What is Dataverse? -We love contributors! Please see our [Contributing Guide][] for ways you can help. +Welcome to Dataverse®, the [open source][] software platform designed for sharing, finding, citing, and preserving research data. Developed by the Dataverse team at the [Institute for Quantitative Social Science](https://iq.harvard.edu/) and the [Dataverse community][], our platform makes it easy for research organizations to host, manage, and share their data with the world. + + + +## ✔ Try Dataverse + +We invite you to explore our demo site at [demo.dataverse.org][]. This site is ideal for testing and evaluating Dataverse in a risk-free environment. + + + +## 🌐 Features, Integrations, Roadmaps, and More + +Visit [dataverse.org][], our home on the web, for a comprehensive overview of Dataverse. Here, you will find: + +- An interactive map showcasing Dataverse installations worldwide. +- A detailed list of [features][]. +- Information on [integrations][] that have been made possible through our [REST APIs][]. +- Our [project board][] and development [roadmap][]. +- News, events, and more. + + + +## 📥 Installation + +Ready to get started? Follow our [Installation Guide][] to download and install the latest release of Dataverse. + +If you are using Docker, please refer to our [Container Guide][] for detailed instructions.
+ + + +## 🏘 Community and Support + +Engage with the vibrant Dataverse community through various channels: + +- **[Mailing List][]**: Join the conversation on our [mailing list][]. +- **[Community Calls][]**: Participate in our regular [community calls][] to discuss new features, ask questions, and share your experiences. +- **[Chat][]**: Connect with us and other users in real-time at [dataverse.zulipchat.com][]. +- **[Dataverse Community Meeting][]**: Attend our annual [Dataverse Community Meeting][] to network, learn, and collaborate with peers and experts. +- **[DataverseTV][]**: Watch the video content from the Dataverse community on [DataverseTV][] and on [Harvard's IQSS YouTube channel][]. + + +## 🧑‍💻️ Contribute to Dataverse + +We love contributors! Whether you are a developer, researcher, or enthusiast, there are many ways you can help. + +Visit our [Contributing Guide][] to learn how you can get involved. + +Join us in building and enhancing Dataverse to make research data more accessible and impactful. Your support and participation are crucial to our success! + + +## ⚖️ Legal Information Dataverse is a trademark of President and Fellows of Harvard College and is registered in the United States. -[![Dataverse Project logo](src/main/webapp/resources/images/dataverseproject_logo.jpg?raw=true "Dataverse Project")](http://dataverse.org) +--- +For more detailed information, visit our website at [dataverse.org][]. + +Feel free to [reach out] with any questions or feedback. Happy researching!
+ +[![Dataverse Project logo](src/main/webapp/resources/images/dataverseproject_logo.jpg "Dataverse Project")](http://dataverse.org) [![API Test Status](https://jenkins.dataverse.org/buildStatus/icon?job=IQSS-dataverse-develop&subject=API%20Test%20Status)](https://jenkins.dataverse.org/job/IQSS-dataverse-develop/) -[![Unit Test Status](https://img.shields.io/travis/IQSS/dataverse?label=Unit%20Test%20Status)](https://travis-ci.org/IQSS/dataverse) +[![API Test Coverage](https://img.shields.io/jenkins/coverage/jacoco?jobUrl=https%3A%2F%2Fjenkins.dataverse.org%2Fjob%2FIQSS-dataverse-develop&label=API%20Test%20Coverage)](https://jenkins.dataverse.org/job/IQSS-dataverse-develop/ws/target/coverage-it/index.html) +[![Unit Test Status](https://github.com/IQSS/dataverse/actions/workflows/maven_unit_test.yml/badge.svg?branch=develop)](https://github.com/IQSS/dataverse/actions/workflows/maven_unit_test.yml) [![Unit Test Coverage](https://img.shields.io/coveralls/github/IQSS/dataverse?label=Unit%20Test%20Coverage)](https://coveralls.io/github/IQSS/dataverse?branch=develop) +[![Guides Build Status](https://github.com/IQSS/dataverse/actions/workflows/guides_build_sphinx.yml/badge.svg)](https://github.com/IQSS/dataverse/actions/workflows/guides_build_sphinx.yml) [dataverse.org]: https://dataverse.org [demo.dataverse.org]: https://demo.dataverse.org [Dataverse community]: https://dataverse.org/developers -[Installation Guide]: http://guides.dataverse.org/en/latest/installation/index.html +[Installation Guide]: https://guides.dataverse.org/en/latest/installation/index.html [latest release]: https://github.com/IQSS/dataverse/releases +[Container Guide]: https://guides.dataverse.org/en/latest/container/index.html [features]: https://dataverse.org/software-features +[project board]: https://github.com/orgs/IQSS/projects/34 [roadmap]: https://www.iq.harvard.edu/roadmap-dataverse-project [integrations]: https://dataverse.org/integrations -[REST APIs]: 
http://guides.dataverse.org/en/latest/api/index.html +[REST APIs]: https://guides.dataverse.org/en/latest/api/index.html [Contributing Guide]: CONTRIBUTING.md [mailing list]: https://groups.google.com/group/dataverse-community [community call]: https://dataverse.org/community-calls -[chat.dataverse.org]: http://chat.dataverse.org +[Chat]: https://dataverse.zulipchat.com +[dataverse.zulipchat.com]: https://dataverse.zulipchat.com [Dataverse Community Meeting]: https://dataverse.org/events [open source]: LICENSE.md +[community calls]: https://dataverse.org/community-calls +[DataverseTV]: https://dataverse.org/dataversetv +[Harvard's IQSS YouTube channel]: https://www.youtube.com/@iqssatharvarduniversity8672 +[reach out]: https://dataverse.org/contact diff --git a/Vagrantfile b/Vagrantfile deleted file mode 100644 index b3c6e7b39a9..00000000000 --- a/Vagrantfile +++ /dev/null @@ -1,77 +0,0 @@ -# -*- mode: ruby -*- -# vi: set ft=ruby : - -# Vagrantfile API/syntax version. Don't touch unless you know what you're doing! -VAGRANTFILE_API_VERSION = "2" - -Vagrant.configure(VAGRANTFILE_API_VERSION) do |config| - - config.vm.define "standalone", primary: true do |standalone| - config.vm.hostname = "standalone" - # Uncomment this temporarily to get `vagrant destroy` to work - #standalone.vm.box = "puppetlabs/centos-7.2-64-puppet" - - operating_system = "centos" - if ENV['OPERATING_SYSTEM'].nil? - config.vm.box = "puppetlabs/centos-7.2-64-puppet" - config.vm.box_version = '1.0.1' - elsif ENV['OPERATING_SYSTEM'] == 'debian' - puts "WARNING: Debian specified. Here be dragons! 
https://github.com/IQSS/dataverse/issues/1059" - config.vm.box_url = "http://puppet-vagrant-boxes.puppetlabs.com/debian-73-x64-virtualbox-puppet.box" - config.vm.box = "puppet-vagrant-boxes.puppetlabs.com-debian-73-x64-virtualbox-puppet.box" - else - operating_system = ENV['OPERATING_SYSTEM'] - puts "Not sure what do to with operating system: #{operating_system}" - exit 1 - end - - mailserver = "localhost" - if ENV['MAIL_SERVER'].nil? - puts "MAIL_SERVER environment variable not specified. Using #{mailserver} by default.\nTo specify it in bash: export MAIL_SERVER=localhost" - else - mailserver = ENV['MAIL_SERVER'] - puts "MAIL_SERVER environment variable found, using #{mailserver}" - end - - config.vm.provider "virtualbox" do |v| - v.memory = 2048 - v.cpus = 1 - end - config.vm.provision "shell", path: "scripts/vagrant/setup.sh" - config.vm.provision "shell", path: "scripts/vagrant/setup-solr.sh" - config.vm.provision "shell", path: "scripts/vagrant/install-dataverse.sh", args: mailserver - # FIXME: get tests working and re-enable them! 
- #config.vm.provision "shell", path: "scripts/vagrant/test.sh" - - config.vm.network "private_network", type: "dhcp" - config.vm.network "forwarded_port", guest: 80, host: 8888 - config.vm.network "forwarded_port", guest: 443, host: 9999 - config.vm.network "forwarded_port", guest: 8983, host: 8993 - config.vm.network "forwarded_port", guest: 8080, host: 8088 - config.vm.network "forwarded_port", guest: 8181, host: 8188 - - # FIXME: use /dataverse/downloads instead - config.vm.synced_folder "downloads", "/downloads" - # FIXME: use /dataverse/conf instead - config.vm.synced_folder "conf", "/conf" - # FIXME: use /dataverse/scripts instead - config.vm.synced_folder "scripts", "/scripts" - config.vm.synced_folder ".", "/dataverse" - end - - config.vm.define "solr", autostart: false do |solr| - config.vm.hostname = "solr" - solr.vm.box = "puppet-vagrant-boxes.puppetlabs.com-centos-65-x64-virtualbox-puppet.box" - config.vm.synced_folder ".", "/dataverse" - config.vm.network "private_network", type: "dhcp" - config.vm.network "forwarded_port", guest: 8983, host: 9001 - end - - config.vm.define "test", autostart: false do |test| - config.vm.hostname = "test" - test.vm.box = "puppet-vagrant-boxes.puppetlabs.com-centos-65-x64-virtualbox-puppet.box" - config.vm.synced_folder ".", "/dataverse" - config.vm.network "private_network", type: "dhcp" - end - -end diff --git a/checkstyle.xml b/checkstyle.xml index 5a864136fea..c00fa3a8c0c 100644 --- a/checkstyle.xml +++ b/checkstyle.xml @@ -97,7 +97,9 @@ --> - + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -XX:MaxPermSize=192m - -client - -Djava.awt.headless=true - -Djdk.corba.allowOutputStreamSubclass=true - -Djavax.xml.accessExternalSchema=all - 
-Djavax.management.builder.initial=com.sun.enterprise.v3.admin.AppServerMBeanServerBuilder - -XX:+UnlockDiagnosticVMOptions - -Djava.endorsed.dirs=${com.sun.aas.installRoot}/modules/endorsed${path.separator}${com.sun.aas.installRoot}/lib/endorsed - -Djava.security.policy=${com.sun.aas.instanceRoot}/config/server.policy - -Djava.security.auth.login.config=${com.sun.aas.instanceRoot}/config/login.conf - -Dcom.sun.enterprise.security.httpsOutboundKeyAlias=s1as - -Xmx512m - -Djavax.net.ssl.keyStore=${com.sun.aas.instanceRoot}/config/keystore.jks - -Djavax.net.ssl.trustStore=${com.sun.aas.instanceRoot}/config/cacerts.jks - -Djava.ext.dirs=${com.sun.aas.javaRoot}/lib/ext${path.separator}${com.sun.aas.javaRoot}/jre/lib/ext${path.separator}${com.sun.aas.instanceRoot}/lib/ext - -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver - -DANTLR_USE_DIRECT_CLASS_LOADING=true - -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.AppserverConfigEnvironmentFactory - - -Dorg.glassfish.additionalOSGiBundlesToStart=org.apache.felix.shell,org.apache.felix.gogo.runtime,org.apache.felix.gogo.shell,org.apache.felix.gogo.command,org.apache.felix.shell.remote,org.apache.felix.fileinstall - - - -Dosgi.shell.telnet.port=6666 - - -Dosgi.shell.telnet.maxconn=1 - - -Dosgi.shell.telnet.ip=127.0.0.1 - - -Dgosh.args=--nointeractive - - -Dfelix.fileinstall.dir=${com.sun.aas.installRoot}/modules/autostart/ - - -Dfelix.fileinstall.poll=5000 - - -Dfelix.fileinstall.log.level=2 - - -Dfelix.fileinstall.bundles.new.start=true - - -Dfelix.fileinstall.bundles.startTransient=true - - -Dfelix.fileinstall.disableConfigSave=false - - -XX:NewRatio=2 - - -Dcom.ctc.wstx.returnNullForDefaultNamespace=true - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
- - -XX:MaxPermSize=192m - -server - -Djava.awt.headless=true - -Djdk.corba.allowOutputStreamSubclass=true - -XX:+UnlockDiagnosticVMOptions - -Djava.endorsed.dirs=${com.sun.aas.installRoot}/modules/endorsed${path.separator}${com.sun.aas.installRoot}/lib/endorsed - -Djava.security.policy=${com.sun.aas.instanceRoot}/config/server.policy - -Djava.security.auth.login.config=${com.sun.aas.instanceRoot}/config/login.conf - -Dcom.sun.enterprise.security.httpsOutboundKeyAlias=s1as - -Djavax.net.ssl.keyStore=${com.sun.aas.instanceRoot}/config/keystore.jks - -Djavax.net.ssl.trustStore=${com.sun.aas.instanceRoot}/config/cacerts.jks - -Djava.ext.dirs=${com.sun.aas.javaRoot}/lib/ext${path.separator}${com.sun.aas.javaRoot}/jre/lib/ext${path.separator}${com.sun.aas.instanceRoot}/lib/ext - -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver - -DANTLR_USE_DIRECT_CLASS_LOADING=true - -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.AppserverConfigEnvironmentFactory - -XX:NewRatio=2 - -Xmx512m - - -Dorg.glassfish.additionalOSGiBundlesToStart=org.apache.felix.shell,org.apache.felix.gogo.runtime,org.apache.felix.gogo.shell,org.apache.felix.gogo.command,org.apache.felix.fileinstall - - -Dosgi.shell.telnet.port=${OSGI_SHELL_TELNET_PORT} - - -Dosgi.shell.telnet.maxconn=1 - - -Dosgi.shell.telnet.ip=127.0.0.1 - - -Dgosh.args=--noshutdown -c noop=true - - -Dfelix.fileinstall.dir=${com.sun.aas.installRoot}/modules/autostart/ - - -Dfelix.fileinstall.poll=5000 - - -Dfelix.fileinstall.log.level=3 - - -Dfelix.fileinstall.bundles.new.start=true - - -Dfelix.fileinstall.bundles.startTransient=true - - -Dfelix.fileinstall.disableConfigSave=false - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/conf/docker-aio/dv/install/default.config b/conf/docker-aio/dv/install/default.config deleted file mode 100644 index 7c99866be17..00000000000 --- 
a/conf/docker-aio/dv/install/default.config +++ /dev/null @@ -1,16 +0,0 @@ -HOST_DNS_ADDRESS localhost -GLASSFISH_DIRECTORY /opt/glassfish4 -ADMIN_EMAIL -MAIL_SERVER mail.hmdc.harvard.edu -POSTGRES_ADMIN_PASSWORD secret -POSTGRES_SERVER db -POSTGRES_PORT 5432 -POSTGRES_DATABASE dvndb -POSTGRES_USER dvnapp -POSTGRES_PASSWORD secret -SOLR_LOCATION idx -TWORAVENS_LOCATION NOT INSTALLED -RSERVE_HOST localhost -RSERVE_PORT 6311 -RSERVE_USER rserve -RSERVE_PASSWORD rserve diff --git a/conf/docker-aio/dv/pg_hba.conf b/conf/docker-aio/dv/pg_hba.conf deleted file mode 100644 index 77feba5247d..00000000000 --- a/conf/docker-aio/dv/pg_hba.conf +++ /dev/null @@ -1,91 +0,0 @@ -# PostgreSQL Client Authentication Configuration File -# =================================================== -# -# Refer to the "Client Authentication" section in the PostgreSQL -# documentation for a complete description of this file. A short -# synopsis follows. -# -# This file controls: which hosts are allowed to connect, how clients -# are authenticated, which PostgreSQL user names they can use, which -# databases they can access. Records take one of these forms: -# -# local DATABASE USER METHOD [OPTIONS] -# host DATABASE USER ADDRESS METHOD [OPTIONS] -# hostssl DATABASE USER ADDRESS METHOD [OPTIONS] -# hostnossl DATABASE USER ADDRESS METHOD [OPTIONS] -# -# (The uppercase items must be replaced by actual values.) -# -# The first field is the connection type: "local" is a Unix-domain -# socket, "host" is either a plain or SSL-encrypted TCP/IP socket, -# "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a -# plain TCP/IP socket. -# -# DATABASE can be "all", "sameuser", "samerole", "replication", a -# database name, or a comma-separated list thereof. The "all" -# keyword does not match "replication". Access to replication -# must be enabled in a separate record (see example below). -# -# USER can be "all", a user name, a group name prefixed with "+", or a -# comma-separated list thereof. 
In both the DATABASE and USER fields -# you can also write a file name prefixed with "@" to include names -# from a separate file. -# -# ADDRESS specifies the set of hosts the record matches. It can be a -# host name, or it is made up of an IP address and a CIDR mask that is -# an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that -# specifies the number of significant bits in the mask. A host name -# that starts with a dot (.) matches a suffix of the actual host name. -# Alternatively, you can write an IP address and netmask in separate -# columns to specify the set of hosts. Instead of a CIDR-address, you -# can write "samehost" to match any of the server's own IP addresses, -# or "samenet" to match any address in any subnet that the server is -# directly connected to. -# -# METHOD can be "trust", "reject", "md5", "password", "gss", "sspi", -# "krb5", "ident", "peer", "pam", "ldap", "radius" or "cert". Note that -# "password" sends passwords in clear text; "md5" is preferred since -# it sends encrypted passwords. -# -# OPTIONS are a set of options for the authentication in the format -# NAME=VALUE. The available options depend on the different -# authentication methods -- refer to the "Client Authentication" -# section in the documentation for a list of which options are -# available for which authentication methods. -# -# Database and user names containing spaces, commas, quotes and other -# special characters must be quoted. Quoting one of the keywords -# "all", "sameuser", "samerole" or "replication" makes the name lose -# its special character, and just match a database or username with -# that name. -# -# This file is read on server startup and when the postmaster receives -# a SIGHUP signal. If you edit the file on a running system, you have -# to SIGHUP the postmaster for the changes to take effect. You can -# use "pg_ctl reload" to do that. 
- -# Put your actual configuration here -# ---------------------------------- -# -# If you want to allow non-local connections, you need to add more -# "host" records. In that case you will also need to make PostgreSQL -# listen on a non-local interface via the listen_addresses -# configuration parameter, or via the -i or -h command line switches. - - - -# TYPE DATABASE USER ADDRESS METHOD - -# "local" is for Unix domain socket connections only -#local all all peer -local all all trust -# IPv4 local connections: -#host all all 127.0.0.1/32 trust -host all all 0.0.0.0/0 trust -# IPv6 local connections: -host all all ::1/128 trust -# Allow replication connections from localhost, by a user with the -# replication privilege. -#local replication postgres peer -#host replication postgres 127.0.0.1/32 ident -#host replication postgres ::1/128 ident diff --git a/conf/docker-aio/entrypoint.bash b/conf/docker-aio/entrypoint.bash deleted file mode 100755 index da01ee56153..00000000000 --- a/conf/docker-aio/entrypoint.bash +++ /dev/null @@ -1,17 +0,0 @@ -#!/usr/bin/env bash -export LANG=en_US.UTF-8 -#sudo -u postgres /usr/bin/postgres -D /var/lib/pgsql/data & -sudo -u postgres /usr/pgsql-9.6/bin/postgres -D /var/lib/pgsql/data & -cd /opt/solr-7.3.1/ -# TODO: Run Solr as non-root and remove "-force". -bin/solr start -force -bin/solr create_core -c collection1 -d server/solr/collection1/conf -force - -# start apache, in both foreground and background... -apachectl -DFOREGROUND & - -# TODO: Run Glassfish as non-root. 
-cd /opt/glassfish4 -bin/asadmin start-domain --debug -sleep infinity - diff --git a/conf/docker-aio/httpd.conf b/conf/docker-aio/httpd.conf deleted file mode 100644 index bf3e244a34e..00000000000 --- a/conf/docker-aio/httpd.conf +++ /dev/null @@ -1,30 +0,0 @@ - -Include conf.d/*.conf -Include conf.modules.d/*.conf -ServerName localhost -Listen 80 443 -PidFile run/httpd.pid -DocumentRoot "/var/www/html" -TypesConfig /etc/mime.types -User apache -Group apache - - - ServerName localhost - LogLevel debug - ErrorLog logs/error_log - LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined - CustomLog logs/access_log combined - - # proxy config (aka - what to send to glassfish or not) - ProxyPassMatch ^/RApacheInfo$ ! - ProxyPassMatch ^/custom ! - ProxyPassMatch ^/dataexplore ! - ProxyPassMatch ^/Shibboleth.sso ! - ProxyPassMatch ^/shibboleth-ds ! - # pass everything else to Glassfish - ProxyPass / ajp://localhost:8009/ -# glassfish can be slow sometimes - ProxyTimeout 300 - - diff --git a/conf/docker-aio/install.bash b/conf/docker-aio/install.bash deleted file mode 100755 index 2b3275ad830..00000000000 --- a/conf/docker-aio/install.bash +++ /dev/null @@ -1,10 +0,0 @@ -#!/usr/bin/env bash -sudo -u postgres createuser --superuser dvnapp -#./entrypoint.bash & -unzip dvinstall.zip -cd dvinstall/ -echo "beginning installer" -./install -admin_email=dvAdmin@mailinator.com -y -f > install.out 2> install.err - -echo "installer complete" -cat install.err diff --git a/conf/docker-aio/prep_it.bash b/conf/docker-aio/prep_it.bash deleted file mode 100755 index e0078b679b2..00000000000 --- a/conf/docker-aio/prep_it.bash +++ /dev/null @@ -1,55 +0,0 @@ -#!/usr/bin/env bash - -# run through all the steps to setup docker-aio to run integration tests - -# hard-codes several assumptions: image is named dv0, container is named dv, port is 8084 - -# glassfish healthy/ready retries -n_wait=5 - -cd conf/docker-aio -./0prep_deps.sh -./1prep.sh -docker build -t dv0 -f 
c7.dockerfile .
-# cleanup from previous runs if necessary
-docker rm -f dv
-# start container
-docker run -d -p 8084:80 -p 8083:8080 -p 9010:9009 --name dv dv0
-# wait for glassfish to be healthy
-i_wait=0
-d_wait=10
-while [ $i_wait -lt $n_wait ]
-do
-	h=`docker inspect -f "{{.State.Health.Status}}" dv`
-	if [ "healthy" == "${h}" ]; then
-		break
-	else
-		sleep $d_wait
-	fi
-	i_wait=$(( $i_wait + 1 ))
-
-done
-# try setupIT.bash
-docker exec dv /opt/dv/setupIT.bash
-err=$?
-if [ $err -ne 0 ]; then
-	echo "error - setupIT failure"
-	exit 1
-fi
-# configure DOI provider based on docker build arguments / environmental variables
-docker exec dv /opt/dv/configure_doi.bash
-err=$?
-if [ $err -ne 0 ]; then
-	echo "error - DOI configuration failure"
-	exit 1
-fi
-# handle config for the private url test (and things like publishing...)
-./seturl.bash
-
-
-cd ../..
-#echo "docker-aio ready to run integration tests ($i_retry)"
-echo "docker-aio ready to run integration tests"
-curl http://localhost:8084/api/info/version
-echo $?
-
diff --git a/conf/docker-aio/readme.md b/conf/docker-aio/readme.md
deleted file mode 100644
index 80faa116966..00000000000
--- a/conf/docker-aio/readme.md
+++ /dev/null
@@ -1,60 +0,0 @@
-# Docker All-In-One
-
-First pass docker all-in-one image, intended for running integration tests against.
-Also usable for normal development and system evaluation; not intended for production.
-
-### Requirements:
- - java8 compiler, maven, make, wget, docker
-
-### Quickstart:
- - in the root of the repository, run `./conf/docker-aio/prep_it.bash`
- - if using DataCite test credentials, update the build args appropriately.
- - if all goes well, you should see the results of the `api/info/version` endpoint, including the deployed build (eg `{"status":"OK","data":{"version":"4.8.6","build":"develop-c3e9f40"}}`). If not, you may need to read the non-quickstart instructions.
- - run integration tests: `./conf/docker-aio/run-test-suite.sh`
-
-----
-
-## More in-depth documentation:
-
-
-### Initial setup (aka - do once):
-- `cd conf/docker-aio` and run `./0prep_deps.sh` to created Glassfish and Solr tarballs in `conf/docker-aio/dv/deps`.
-
-### Per-build:
-
-> Note: If you encounter any issues, see the Troubleshooting section at the end of this document.
-
-#### Setup
-
-- `cd conf/docker-aio`, and run `./1prep.sh` to copy files for integration test data into docker build context; `1prep.sh` will also build the war file and installation zip file
-- build the docker image: `docker build -t dv0 -f c7.dockerfile .`
-
-- Run image: `docker run -d -p 8083:8080 -p 8084:80 --name dv dv0` (aka - forward port 8083 locally to 8080 in the container for glassfish, and 8084 to 80 for apache); if you'd like to connect a java debugger to glassfish, use `docker run -d -p 8083:8080 -p 8084:80 -p 9010:9009 --name dv dv0`
-
-- Installation (integration test): `docker exec dv /opt/dv/setupIT.bash`
-  (Note that it's possible to customize the installation by editing `conf/docker-aio/default.config` and running `docker exec dv /opt/dv/install.bash` but for the purposes of integration testing, the `setupIT.bash` script above works fine.)
-
-- update `dataverse.siteUrl` (appears only necessary for `DatasetsIT.testPrivateUrl`): `docker exec dv /usr/local/glassfish4/bin/asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8084"` (or use the provided `seturl.bash`)
-
-#### Run integration tests:
-
-First, cd back to the root of the repo where the `pom.xml` file is (`cd ../..` assuming you're still in the `conf/docker-aio` directory). Then run the test suite with script below:
-
-`conf/docker-aio/run-test-suite.sh`
-
-There isn't any strict requirement on the local port (8083, 8084 in this doc), the name of the image (dv0) or container (dv), these can be changed as desired as long as they are consistent.
-
-### Troubleshooting Notes:
-
-* If Dataverse' build fails due to an error about `Module` being ambiguous, you might be using a Java 9 compiler.
-
-* If you see an error like this:
-  ```
-  docker: Error response from daemon: Conflict. The container name "/dv" is already in use by container "5f72a45b68c86c7b0f4305b83ce7d663020329ea4e30fa2a3ce9ddb05223533d"
-  You have to remove (or rename) that container to be able to reuse that name.
-  ```
-  run something like `docker ps -a | grep dv` to see the container left over from the last run and something like `docker rm 5f72a45b68c8` to remove it. Then try the `docker run` command above again.
-
-* `empty reply from server` or `Failed to connect to ::1: Cannot assign requested address` tend to indicate either that you haven't given glassfish enough time to start, or your docker setup is in an inconsistent state and should probably be restarted.
-
-* For manually fiddling around with the created dataverse, use user `dataverseAdmin` with password `admin1`.
diff --git a/conf/docker-aio/run-test-suite.sh b/conf/docker-aio/run-test-suite.sh
deleted file mode 100755
index 811bc579c6d..00000000000
--- a/conf/docker-aio/run-test-suite.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-#!/bin/sh
-# This is the canonical list of which "IT" tests are expected to pass.
-
-dvurl=$1
-if [ -z "$dvurl" ]; then
-	dvurl="http://localhost:8084"
-fi
-
-# Please note the "dataverse.test.baseurl" is set to run for "all-in-one" Docker environment.
-# TODO: Rather than hard-coding the list of "IT" classes here, add a profile to pom.xml.
-mvn test -Dtest=DataversesIT,DatasetsIT,SwordIT,AdminIT,BuiltinUsersIT,UsersIT,UtilIT,ConfirmEmailIT,FileMetadataIT,FilesIT,SearchIT,InReviewWorkflowIT,HarvestingServerIT,MoveIT,MakeDataCountApiIT,FileTypeDetectionIT,EditDDIIT,ExternalToolsIT,AccessIT -Ddataverse.test.baseurl=$dvurl
diff --git a/conf/docker-aio/setupIT.bash b/conf/docker-aio/setupIT.bash
deleted file mode 100755
index 528b8f3c5f8..00000000000
--- a/conf/docker-aio/setupIT.bash
+++ /dev/null
@@ -1,13 +0,0 @@
-#!/usr/bin/env bash
-
-# do integration-test install and test data setup
-
-cd /opt/dv
-unzip dvinstall.zip
-cd /opt/dv/testdata
-./scripts/deploy/phoenix.dataverse.org/prep
-./db.sh
-./install # modified from phoenix
-/usr/local/glassfish4/glassfish/bin/asadmin deploy /opt/dv/dvinstall/dataverse.war
-./post # modified from phoenix
-
diff --git a/conf/docker-aio/seturl.bash b/conf/docker-aio/seturl.bash
deleted file mode 100755
index a62fb6b3ea7..00000000000
--- a/conf/docker-aio/seturl.bash
+++ /dev/null
@@ -1,3 +0,0 @@
-#!/usr/bin/env bash
-
-docker exec dv /usr/local/glassfish4/bin/asadmin create-jvm-options "\"-Ddataverse.siteUrl=http\://localhost\:8084\""
diff --git a/conf/docker-aio/testdata/httpd.conf b/conf/docker-aio/testdata/httpd.conf
deleted file mode 100644
index bf3e244a34e..00000000000
--- a/conf/docker-aio/testdata/httpd.conf
+++ /dev/null
@@ -1,30 +0,0 @@
-
-Include conf.d/*.conf
-Include conf.modules.d/*.conf
-ServerName localhost
-Listen 80 443
-PidFile run/httpd.pid
-DocumentRoot "/var/www/html"
-TypesConfig /etc/mime.types
-User apache
-Group apache
-
-
- ServerName localhost
- LogLevel debug
- ErrorLog logs/error_log
- LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
- CustomLog logs/access_log combined
-
- # proxy config (aka - what to send to glassfish or not)
- ProxyPassMatch ^/RApacheInfo$ !
- ProxyPassMatch ^/custom !
- ProxyPassMatch ^/dataexplore !
- ProxyPassMatch ^/Shibboleth.sso !
- ProxyPassMatch ^/shibboleth-ds !
- # pass everything else to Glassfish - ProxyPass / ajp://localhost:8009/ -# glassfish can be slow sometimes - ProxyTimeout 300 - - diff --git a/conf/docker-aio/testscripts/db.sh b/conf/docker-aio/testscripts/db.sh deleted file mode 100755 index aeb09f0a7de..00000000000 --- a/conf/docker-aio/testscripts/db.sh +++ /dev/null @@ -1,3 +0,0 @@ -#!/bin/sh -psql -U postgres -c "CREATE ROLE dvnapp UNENCRYPTED PASSWORD 'secret' SUPERUSER CREATEDB CREATEROLE INHERIT LOGIN" template1 -psql -U dvnapp -c 'CREATE DATABASE "dvndb" WITH OWNER = "dvnapp"' template1 diff --git a/conf/docker-aio/testscripts/install b/conf/docker-aio/testscripts/install deleted file mode 100755 index a994fe2920d..00000000000 --- a/conf/docker-aio/testscripts/install +++ /dev/null @@ -1,22 +0,0 @@ -#!/bin/sh -export HOST_ADDRESS=localhost -export GLASSFISH_ROOT=/usr/local/glassfish4 -export FILES_DIR=/usr/local/glassfish4/glassfish/domains/domain1/files -export DB_NAME=dvndb -export DB_PORT=5432 -export DB_HOST=localhost -export DB_USER=dvnapp -export DB_PASS=secret -export RSERVE_HOST=localhost -export RSERVE_PORT=6311 -export RSERVE_USER=rserve -export RSERVE_PASS=rserve -export SMTP_SERVER=localhost -export MEM_HEAP_SIZE=2048 -export GLASSFISH_DOMAIN=domain1 -cd scripts/installer -cp pgdriver/postgresql-42.2.2.jar $GLASSFISH_ROOT/glassfish/lib -#cp ../../conf/jhove/jhove.conf $GLASSFISH_ROOT/glassfish/domains/$GLASSFISH_DOMAIN/config/jhove.conf -cp /opt/dv/testdata/jhove.conf $GLASSFISH_ROOT/glassfish/domains/$GLASSFISH_DOMAIN/config/jhove.conf -cp /opt/dv/testdata/jhoveConfig.xsd $GLASSFISH_ROOT/glassfish/domains/$GLASSFISH_DOMAIN/config/jhoveConfig.xsd -./glassfish-setup.sh diff --git a/conf/docker-aio/testscripts/post b/conf/docker-aio/testscripts/post deleted file mode 100755 index f38a704e454..00000000000 --- a/conf/docker-aio/testscripts/post +++ /dev/null @@ -1,14 +0,0 @@ -#/bin/sh -cd scripts/api -./setup-all.sh --insecure -p=admin1 | tee /tmp/setup-all.sh.out -cd ../.. 
-psql -U dvnapp dvndb -f scripts/database/reference_data.sql -psql -U dvnapp dvndb -f doc/sphinx-guides/source/_static/util/createsequence.sql -scripts/search/tests/publish-dataverse-root -#git checkout scripts/api/data/dv-root.json -scripts/search/tests/grant-authusers-add-on-root -scripts/search/populate-users -scripts/search/create-users -scripts/search/tests/create-all-and-test -scripts/search/tests/publish-spruce1-and-test -#java -jar downloads/schemaSpy_5.0.0.jar -t pgsql -host localhost -db dvndb -u postgres -p secret -s public -dp scripts/installer/pgdriver/postgresql-9.1-902.jdbc4.jar -o /var/www/html/schemaspy/latest diff --git a/conf/docker-dcm/.gitignore b/conf/docker-dcm/.gitignore deleted file mode 100644 index ac39981ce6a..00000000000 --- a/conf/docker-dcm/.gitignore +++ /dev/null @@ -1,2 +0,0 @@ -*.rpm -upload*.bash diff --git a/conf/docker-dcm/0prep.sh b/conf/docker-dcm/0prep.sh deleted file mode 100755 index 300aa39d567..00000000000 --- a/conf/docker-dcm/0prep.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/sh -DCM_VERSION=0.5 -RSAL_VERSION=0.1 - -if [ ! -e dcm-${DCM_VERSION}-0.noarch.rpm ]; then - wget https://github.com/sbgrid/data-capture-module/releases/download/${DCM_VERSION}/dcm-${DCM_VERSION}-0.noarch.rpm -fi - -if [ ! 
-e rsal-${RSAL_VERSION}-0.noarch.rpm ] ;then - wget https://github.com/sbgrid/rsal/releases/download/${RSAL_VERSION}/rsal-${RSAL_VERSION}-0.noarch.rpm -fi diff --git a/conf/docker-dcm/c6client.dockerfile b/conf/docker-dcm/c6client.dockerfile deleted file mode 100644 index e4d1ae7da82..00000000000 --- a/conf/docker-dcm/c6client.dockerfile +++ /dev/null @@ -1,7 +0,0 @@ -# build from repo root -FROM centos:6 -RUN yum install -y epel-release -RUN yum install -y rsync openssh-clients jq curl wget lynx -RUN useradd depositor -USER depositor -WORKDIR /home/depositor diff --git a/conf/docker-dcm/cfg/dcm/bashrc b/conf/docker-dcm/cfg/dcm/bashrc deleted file mode 100644 index 07137ab8471..00000000000 --- a/conf/docker-dcm/cfg/dcm/bashrc +++ /dev/null @@ -1,18 +0,0 @@ -# .bashrc - -# User specific aliases and functions - -alias rm='rm -i' -alias cp='cp -i' -alias mv='mv -i' - -# Source global definitions -if [ -f /etc/bashrc ]; then - . /etc/bashrc -fi - -# these are dummy values, obviously -export UPLOADHOST=dcmsrv -export DVAPIKEY=burrito -export DVHOSTINT=dvsrv -export DVHOST=dvsrv diff --git a/conf/docker-dcm/cfg/dcm/entrypoint-dcm.sh b/conf/docker-dcm/cfg/dcm/entrypoint-dcm.sh deleted file mode 100755 index 0db674bfac4..00000000000 --- a/conf/docker-dcm/cfg/dcm/entrypoint-dcm.sh +++ /dev/null @@ -1,6 +0,0 @@ -#!/bin/sh - -/etc/init.d/sshd start -/etc/init.d/redis start -/etc/init.d/rq start -lighttpd -D -f /etc/lighttpd/lighttpd.conf diff --git a/conf/docker-dcm/cfg/dcm/healthcheck-dcm.sh b/conf/docker-dcm/cfg/dcm/healthcheck-dcm.sh deleted file mode 100755 index 3964a79391e..00000000000 --- a/conf/docker-dcm/cfg/dcm/healthcheck-dcm.sh +++ /dev/null @@ -1,14 +0,0 @@ -#!/bin/sh - -r_rq=`/etc/init.d/rq status` -if [ "rq_worker running" != "$r_rq" ]; then - echo "rq failed" - exit 1 -fi -r_www=`/etc/init.d/lighttpd status` -e_www=$? 
-if [ 0 -ne $e_www ]; then - echo "lighttpd failed" - exit 2 -fi - diff --git a/conf/docker-dcm/cfg/dcm/rq-init-d b/conf/docker-dcm/cfg/dcm/rq-init-d deleted file mode 100755 index 093cd894376..00000000000 --- a/conf/docker-dcm/cfg/dcm/rq-init-d +++ /dev/null @@ -1,57 +0,0 @@ -#!/bin/bash - -# chkconfig: 2345 90 60 -# description: rq worker script (single worker process) - -# example rq configuration file (to be placed in /etc/init.d) - -# works on cent6 - -DAEMON=rq_worker -DAEMON_PATH=/opt/dcm/gen/ -export UPLOADHOST=dcmsrv -VIRTUALENV= -LOGFILE=/var/log/${DAEMON}.log -PIDFILE=/var/run/${DAEMON}.pid - -case "$1" in -start) - printf "%-50s" "starting $DAEMON..." - cd $DAEMON_PATH - if [ ! -z "$VIRTUALENV" ]; then - source $VIRTUALENV/bin/activate - fi - rq worker normal --pid $PIDFILE > ${LOGFILE} 2>&1 & -;; -status) - if [ -f $PIDFILE ]; then - PID=`cat $PIDFILE` - if [ -z "`ps axf | grep ${PID} | grep -v grep`" ]; then - printf "%s\n" "$DAEMON not running, but PID file ($PIDFILE) exists" - else - echo "$DAEMON running" - fi - else - printf "%s\n" "$DAEMON not running" - fi -;; -stop) - printf "%-50s" "stopping $DAEMON" - if [ -f $PIDFILE ]; then - PID=`cat $PIDFILE` - kill -HUP $PID - rm -f $PIDFILE - else - printf "%s\n" "no PID file ($PIDFILE) - maybe not running" - fi -;; -restart) - $0 stop - $0 start -;; - -*) - echo "Usage: $0 {status|start|stop|restart}" - exit 1 -esac - diff --git a/conf/docker-dcm/cfg/dcm/test_install.sh b/conf/docker-dcm/cfg/dcm/test_install.sh deleted file mode 100755 index 3026ceb9fa5..00000000000 --- a/conf/docker-dcm/cfg/dcm/test_install.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/sh - -cp /etc/dcm/rq-init-d /etc/init.d/rq -cp /etc/dcm/lighttpd-conf-dcm /etc/lighttpd/lighttpd.conf -cp /etc/dcm/lighttpd-modules-dcm /etc/lighttpd/modules.conf -cp /etc/dcm/dcm-rssh.conf /etc/rssh.conf - diff --git a/conf/docker-dcm/cfg/rsal/entrypoint-rsal.sh b/conf/docker-dcm/cfg/rsal/entrypoint-rsal.sh deleted file mode 100755 index 
92466c3bd4b..00000000000 --- a/conf/docker-dcm/cfg/rsal/entrypoint-rsal.sh +++ /dev/null @@ -1,5 +0,0 @@ -#!/bin/sh - -#/usr/bin/rsync --no-detach --daemon --config /etc/rsyncd.conf -/usr/bin/rsync --daemon --config /etc/rsyncd.conf -lighttpd -D -f /etc/lighttpd/lighttpd.conf diff --git a/conf/docker-dcm/cfg/rsal/lighttpd-modules.conf b/conf/docker-dcm/cfg/rsal/lighttpd-modules.conf deleted file mode 100644 index cdb1438af82..00000000000 --- a/conf/docker-dcm/cfg/rsal/lighttpd-modules.conf +++ /dev/null @@ -1,174 +0,0 @@ -####################################################################### -## -## ansible managed -# -## Modules to load -## ----------------- -## -## at least mod_access and mod_accesslog should be loaded -## all other module should only be loaded if really neccesary -## -## - saves some time -## - saves memory -## -## the default module set contains: -## -## "mod_indexfile", "mod_dirlisting", "mod_staticfile" -## -## you dont have to include those modules in your list -## -## Modules, which are pulled in via conf.d/*.conf -## -## NOTE: the order of modules is important. 
-## -## - mod_accesslog -> conf.d/access_log.conf -## - mod_compress -> conf.d/compress.conf -## - mod_status -> conf.d/status.conf -## - mod_webdav -> conf.d/webdav.conf -## - mod_cml -> conf.d/cml.conf -## - mod_evhost -> conf.d/evhost.conf -## - mod_simple_vhost -> conf.d/simple_vhost.conf -## - mod_mysql_vhost -> conf.d/mysql_vhost.conf -## - mod_trigger_b4_dl -> conf.d/trigger_b4_dl.conf -## - mod_userdir -> conf.d/userdir.conf -## - mod_rrdtool -> conf.d/rrdtool.conf -## - mod_ssi -> conf.d/ssi.conf -## - mod_cgi -> conf.d/cgi.conf -## - mod_scgi -> conf.d/scgi.conf -## - mod_fastcgi -> conf.d/fastcgi.conf -## - mod_proxy -> conf.d/proxy.conf -## - mod_secdownload -> conf.d/secdownload.conf -## - mod_expire -> conf.d/expire.conf -## - -server.modules = ( - "mod_access", -# "mod_alias", -# "mod_auth", -# "mod_evasive", -# "mod_redirect", -# "mod_rewrite", -# "mod_setenv", -# "mod_usertrack", -) - -## -####################################################################### - -####################################################################### -## -## Config for various Modules -## - -## -## mod_ssi -## -#include "conf.d/ssi.conf" - -## -## mod_status -## -#include "conf.d/status.conf" - -## -## mod_webdav -## -#include "conf.d/webdav.conf" - -## -## mod_compress -## -#include "conf.d/compress.conf" - -## -## mod_userdir -## -#include "conf.d/userdir.conf" - -## -## mod_magnet -## -#include "conf.d/magnet.conf" - -## -## mod_cml -## -#include "conf.d/cml.conf" - -## -## mod_rrdtool -## -#include "conf.d/rrdtool.conf" - -## -## mod_proxy -## -#include "conf.d/proxy.conf" - -## -## mod_expire -## -#include "conf.d/expire.conf" - -## -## mod_secdownload -## -#include "conf.d/secdownload.conf" - -## -####################################################################### - -####################################################################### -## -## CGI modules -## - -## -## SCGI (mod_scgi) -## -#include "conf.d/scgi.conf" - -## -## FastCGI (mod_fastcgi) -## 
-#include "conf.d/fastcgi.conf" - -## -## plain old CGI (mod_cgi) -## -include "conf.d/cgi.conf" - -## -####################################################################### - -####################################################################### -## -## VHost Modules -## -## Only load ONE of them! -## ======================== -## - -## -## You can use conditionals for vhosts aswell. -## -## see http://www.lighttpd.net/documentation/configuration.html -## - -## -## mod_evhost -## -#include "conf.d/evhost.conf" - -## -## mod_simple_vhost -## -#include "conf.d/simple_vhost.conf" - -## -## mod_mysql_vhost -## -#include "conf.d/mysql_vhost.conf" - -## -####################################################################### diff --git a/conf/docker-dcm/cfg/rsal/lighttpd.conf b/conf/docker-dcm/cfg/rsal/lighttpd.conf deleted file mode 100644 index 5874d60eb48..00000000000 --- a/conf/docker-dcm/cfg/rsal/lighttpd.conf +++ /dev/null @@ -1,43 +0,0 @@ -## lighttpd configuration customized for RSAL; centos7 - -# refuse connections not from frontend or localhost -# DO NOT HAVE THIS OPEN TO THE WORLD!!! -#$HTTP["remoteip"] !~ "192.168.2.2|127.0.0.1" { -#url.access-deny = ("") -#} -server.breakagelog = "/var/log/lighttpd/breakage.log" - -####################################################################### -## -## Some Variable definition which will make chrooting easier. -## -## if you add a variable here. Add the corresponding variable in the -## chroot example aswell. 
-## -var.log_root = "/var/log/lighttpd" -var.server_root = "/opt/rsal/api" -var.state_dir = "/var/run" -var.home_dir = "/var/lib/lighttpd" -var.conf_dir = "/etc/lighttpd" - -var.cache_dir = "/var/cache/lighttpd" -var.socket_dir = home_dir + "/sockets" -include "modules.conf" -server.port = 80 -server.use-ipv6 = "disable" -server.username = "lighttpd" -server.groupname = "lighttpd" -server.document-root = server_root -server.pid-file = state_dir + "/lighttpd.pid" -server.errorlog = log_root + "/error.log" -include "conf.d/access_log.conf" -include "conf.d/debug.conf" -server.event-handler = "linux-sysepoll" -server.network-backend = "linux-sendfile" -server.stat-cache-engine = "simple" -server.max-connections = 1024 -static-file.exclude-extensions = ( ".php", ".pl", ".fcgi", ".scgi" ) -include "conf.d/mime.conf" -include "conf.d/dirlisting.conf" -server.follow-symlink = "enable" -server.upload-dirs = ( "/var/tmp" ) diff --git a/conf/docker-dcm/cfg/rsal/rsyncd.conf b/conf/docker-dcm/cfg/rsal/rsyncd.conf deleted file mode 100644 index 5a15ab28a12..00000000000 --- a/conf/docker-dcm/cfg/rsal/rsyncd.conf +++ /dev/null @@ -1,8 +0,0 @@ -lock file=/var/run/rsync.lock -log file=/var/log/rsyncd.log -pid file=/var/log/rsyncd.pid - -[10.5072] - path=/public/ - read only=yes - diff --git a/conf/docker-dcm/configure_dcm.sh b/conf/docker-dcm/configure_dcm.sh deleted file mode 100755 index 5b65b0a0314..00000000000 --- a/conf/docker-dcm/configure_dcm.sh +++ /dev/null @@ -1,26 +0,0 @@ -#!/bin/sh - -echo "dcm configs on dv side to be done" - -# in homage to dataverse traditions, reset to insecure "burrito" admin API key -sudo -u postgres psql -c "update apitoken set tokenstring='burrito' where id=1;" dvndb -sudo -u postgres psql -c "update authenticateduser set superuser='t' where id=1;" dvndb - -# dataverse configs for DCM -curl -X PUT -d "SHA-1" "http://localhost:8080/api/admin/settings/:FileFixityChecksumAlgorithm" -curl -X PUT 
"http://localhost:8080/api/admin/settings/:UploadMethods" -d "dcm/rsync+ssh" -curl -X PUT "http://localhost:8080/api/admin/settings/:DataCaptureModuleUrl" -d "http://dcmsrv" - -# configure for RSAL downloads; but no workflows or RSAL yet -curl -X PUT "http://localhost:8080/api/admin/settings/:DownloadMethods" -d "rsal/rsync" - -# publish root dataverse -curl -X POST -H "X-Dataverse-key: burrito" "http://localhost:8080/api/dataverses/root/actions/:publish" - -# symlink `hold` volume -mkdir -p /usr/local/glassfish4/glassfish/domains/domain1/files/ -ln -s /hold /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072 - -# need to set siteUrl -cd /usr/local/glassfish4 -bin/asadmin create-jvm-options "\"-Ddataverse.siteUrl=http\://localhost\:8084\"" diff --git a/conf/docker-dcm/configure_rsal.sh b/conf/docker-dcm/configure_rsal.sh deleted file mode 100755 index 5db43a34381..00000000000 --- a/conf/docker-dcm/configure_rsal.sh +++ /dev/null @@ -1,21 +0,0 @@ -#!/bin/sh - -fn=rsal-workflow2.json -# needs an actual IP (vs a hostname) for whitelist -rsalip=`dig +short rsalsrv` - -# create workflow -curl -s -X POST -H "Content-type: application/json" -d @${fn} "http://localhost:8080/api/admin/workflows" - -# put rsal on the whitelist -curl -X PUT -d "127.0.0.1;${rsalip}" "http://localhost:8080/api/admin/workflows/ip-whitelist" - -# set workflow as default -curl -X PUT -d "1" "http://localhost:8080/api/admin/workflows/default/PrePublishDataset" - -# local access path -curl -X PUT -d "/hpc/storage" "http://localhost:8080/api/admin/settings/:LocalDataAccessPath" - -# storage sites -curl -X POST -H "Content-type: application/json" --upload-file site-primary.json "http://localhost:8080/api/admin/storageSites" -curl -X POST -H "Content-type: application/json" --upload-file site-remote.json "http://localhost:8080/api/admin/storageSites" diff --git a/conf/docker-dcm/create.bash b/conf/docker-dcm/create.bash deleted file mode 100755 index 58ae6e61dc7..00000000000 --- 
a/conf/docker-dcm/create.bash +++ /dev/null @@ -1,22 +0,0 @@ -#!/usr/bin/env bash - - -# user creates dataset -k_d=burrito -dv_d=root -h=http://dvsrv - -fn=dataset.json -#dset_id=`curl -s -H "X-Dataverse-key: $k_d" -X POST --upload-file $fn $h/api/dataverses/$dv_d/datasets | jq .data.id` -r=`curl -s -H "X-Dataverse-key: $k_d" -X POST --upload-file $fn $h/api/dataverses/$dv_d/datasets` -echo $r -dset_id=`echo $r | jq .data.id` -echo "dataset created with id: $dset_id" - -if [ "null" == "${dset_id}" ]; then - echo "error - no dataset id from create command" - exit 1 -fi -echo "dataset created; internal/db id: ${dset_id}" - - diff --git a/conf/docker-dcm/dataset.json b/conf/docker-dcm/dataset.json deleted file mode 100644 index fb1b734ed40..00000000000 --- a/conf/docker-dcm/dataset.json +++ /dev/null @@ -1,126 +0,0 @@ -{ - "datasetVersion": { - "metadataBlocks": { - "citation": { - "displayName": "Citation Metadata", - "fields": [ - { - "typeName": "title", - "multiple": false, - "typeClass": "primitive", - "value": "DCM test dataset" - }, - { - "typeName": "productionDate", - "multiple": false, - "typeClass": "primitive", - "value": "2017-04-01" - }, - { - "typeName": "dsDescription", - "multiple": true, - "typeClass": "compound", - "value": [ - { - "dsDescriptionValue": { - "typeName": "dsDescriptionValue", - "multiple": false, - "typeClass": "primitive", - "value": "this would normally be a dataset large enough to require a DCM" - } - } - ] - }, - { - "typeName": "depositor", - "multiple": false, - "typeClass": "primitive", - "value": "Doc, Bob" - }, - { - "typeName": "producer", - "multiple": true, - "typeClass": "compound", - "value": [ - { - "producerName": { - "typeName": "producerName", - "multiple": false, - "typeClass": "primitive", - "value": "Prof, Arthor" - }, - "producerAffiliation": { - "typeName": "producerAffiliation", - "multiple": false, - "typeClass": "primitive", - "value": "LibraScholar" - } - } - ] - }, - { - "typeName": "author", - "multiple": 
true, - "typeClass": "compound", - "value": [ - { - "authorName": { - "typeName": "authorName", - "multiple": false, - "typeClass": "primitive", - "value": "Student, Carol" - } - , - "authorAffiliation": { - "typeName": "authorAffiliation", - "multiple": false, - "typeClass": "primitive", - "value": "LibraScholar" - } - }, - { - "authorName": { - "typeName": "authorName", - "multiple": false, - "typeClass": "primitive", - "value": "Doc, Bob" - } - , - "authorAffiliation": { - "typeName": "authorAffiliation", - "multiple": false, - "typeClass": "primitive", - "value": "LibraScholar" - } - } - - ] - }, - { - "typeName": "datasetContact", - "multiple": true, - "typeClass": "compound", - "value": [ - { - "datasetContactEmail": { - "typeName": "datasetContactEmail", - "multiple": false, - "typeClass": "primitive", - "value": "dsContact@mailinator.com" - } - } - ] - }, - { - "typeName": "subject", - "multiple": true, - "typeClass": "controlledVocabulary", - "value": [ - "Medicine, Health and Life Sciences" - ] - } - ] - } - } - } -} diff --git a/conf/docker-dcm/dcmsrv.dockerfile b/conf/docker-dcm/dcmsrv.dockerfile deleted file mode 100644 index 9989fa3a89d..00000000000 --- a/conf/docker-dcm/dcmsrv.dockerfile +++ /dev/null @@ -1,21 +0,0 @@ -# build from repo root -FROM centos:6 -RUN yum install -y epel-release -ARG RPMFILE=dcm-0.5-0.noarch.rpm -COPY ${RPMFILE} /tmp/ -COPY cfg/dcm/bashrc /root/.bashrc -COPY cfg/dcm/test_install.sh /root/ -RUN yum localinstall -y /tmp/${RPMFILE} -RUN pip install -r /opt/dcm/requirements.txt -RUN pip install awscli==1.15.75 -run export PATH=~/.local/bin:$PATH -RUN /root/test_install.sh -COPY cfg/dcm/rq-init-d /etc/init.d/rq -RUN useradd glassfish -COPY cfg/dcm/entrypoint-dcm.sh / -COPY cfg/dcm/healthcheck-dcm.sh / -EXPOSE 80 -EXPOSE 22 -VOLUME /hold -HEALTHCHECK CMD /healthcheck-dcm.sh -CMD ["/entrypoint-dcm.sh"] diff --git a/conf/docker-dcm/docker-compose.yml b/conf/docker-dcm/docker-compose.yml deleted file mode 100644 index 
49d4467d349..00000000000 --- a/conf/docker-dcm/docker-compose.yml +++ /dev/null @@ -1,50 +0,0 @@ -# initial docker-compose file for combined Dataverse and DCM with shared filesystem - -version: '3' - -services: - dcmsrv: - build: - context: . - dockerfile: dcmsrv.dockerfile - container_name: dcmsrv - volumes: - - hold:/hold - rsalsrv: - build: - context: . - dockerfile: rsalsrv.dockerfile - container_name: rsalsrv -# image: rsalrepo_rsal - volumes: - - hold:/hold - - ./:/mnt - environment: - DV_HOST: http://dvsrv:8080 - DV_APIKEY: burrito - ports: - - "8889:80" - - "873:873" - dvsrv: - build: - context: . - dockerfile: dv0dcm.dockerfile - container_name: dvsrv - volumes: - - hold:/hold - - ./:/mnt - ports: - - "8083:8080" - - "8084:80" - client: - build: - context: . - dockerfile: c6client.dockerfile - command: sleep infinity - container_name: dcm_client - volumes: - - ./:/mnt - -volumes: - hold: - diff --git a/conf/docker-dcm/dv0dcm.dockerfile b/conf/docker-dcm/dv0dcm.dockerfile deleted file mode 100644 index 021534c8978..00000000000 --- a/conf/docker-dcm/dv0dcm.dockerfile +++ /dev/null @@ -1,7 +0,0 @@ -# dv0 assumed to be image name for docker-aio -FROM dv0 -RUN yum install -y bind-utils -COPY configure_dcm.sh /opt/dv/ -COPY configure_rsal.sh /opt/dv/ -COPY rsal-workflow2.json site-primary.json site-remote.json /opt/dv/ -VOLUME /hold diff --git a/conf/docker-dcm/get_transfer.bash b/conf/docker-dcm/get_transfer.bash deleted file mode 100755 index 42080f536e1..00000000000 --- a/conf/docker-dcm/get_transfer.bash +++ /dev/null @@ -1,19 +0,0 @@ -#!/usr/bin/env bash - -# user gets transfer script - -dset_id=$1 -if [ -z "$dset_id" ]; then - echo "no dataset id specified, bailing out" - exit 1 -fi - -k_d=burrito -dv_d=root - -h=http://dvsrv - -#get upload script from DCM -wget --header "X-Dataverse-key: ${k_d}" ${h}/api/datasets/${dset_id}/dataCaptureModule/rsync -O upload-${dset_id}.bash - - diff --git a/conf/docker-dcm/publish_major.bash 
b/conf/docker-dcm/publish_major.bash
deleted file mode 100755
index 6a3fd1288ca..00000000000
--- a/conf/docker-dcm/publish_major.bash
+++ /dev/null
@@ -1,17 +0,0 @@
-#!/usr/bin/env bash
-
-# publish dataset based on database id
-
-dset_id=$1
-if [ -z "$dset_id" ]; then
-	echo "no dataset id specified, bailing out"
-	exit 1
-fi
-
-k_d=burrito
-
-h=http://dvsrv
-
-curl -X POST -H "X-Dataverse-key: ${k_d}" "${h}/api/datasets/${dset_id}/actions/:publish?type=major"
-
-
diff --git a/conf/docker-dcm/readme.md b/conf/docker-dcm/readme.md
deleted file mode 100644
index 3e6a15e61d6..00000000000
--- a/conf/docker-dcm/readme.md
+++ /dev/null
@@ -1,26 +0,0 @@
-This docker-compose setup is intended for use in development, small scale evaluation, and potentially serve as an example of a working (although not production security level) configuration.
-
-Setup:
-
-- build docker-aio image with name dv0 as described in `../docker-aio` (don't start up the docker image or run setupIT.bash)
-- work in the `conf/docker-dcm` directory for below commands
-- download/prepare dependencies: `./0prep.sh`
-- build dcm/dv0dcm images with docker-compose: `docker-compose -f docker-compose.yml build`
-- start containers: `docker-compose -f docker-compose.yml up -d`
-- wait for container to show "healthy" (aka - `docker ps`), then run dataverse app installation: `docker exec dvsrv /opt/dv/install.bash`
-- for development, you probably want to use the `FAKE` DOI provider: `docker exec -it dvsrv /opt/dv/configure_doi.bash`
-- configure dataverse application to use DCM: `docker exec -it dvsrv /opt/dv/configure_dcm.sh`
-- configure dataverse application to use RSAL (if desired): `docker exec -it dvsrv /opt/dv/configure_rsal.sh`
-
-Operation:
-The dataverse installation is accessible at `http://localhost:8084`.
-The `dcm_client` container is intended to be used for executing transfer scripts, and `conf/docker-dcm` is available at `/mnt` inside the container; this container can be accessed with `docker exec -it dcm_client bash`. -The DCM cron job is NOT configured here; for development purposes the DCM checks can be run manually with `docker exec -it dcmsrv /opt/dcm/scn/post_upload.bash`. -The RSAL cron job is similarly NOT configured; for development purposes `docker exec -it rsalsrv /opt/rsal/scn/pub.py` can be run manually. - - -Cleanup: -- shutdown/cleanup `docker-compose -f docker-compose.yml down -v` - -For reference, this configuration was working with docker 17.09 / docker-compose 1.16. - diff --git a/conf/docker-dcm/rsal-workflow2.json b/conf/docker-dcm/rsal-workflow2.json deleted file mode 100644 index 322d3ecbcf7..00000000000 --- a/conf/docker-dcm/rsal-workflow2.json +++ /dev/null @@ -1,31 +0,0 @@ -{ - "name": "RSAL file move for publication", - "steps": [ - { - "provider":":internal", - "stepType":"log", - "parameters": { - "message": "Pre-http request" - } - }, - { - "provider":":internal", - "stepType":"http/sr", - "parameters": { - "url":"http://rsalsrv/rr.py", - "method":"POST", - "contentType":"text/plain", - "body":"${invocationId}\ndataset.id=${dataset.id}\ndataset.identifier=${dataset.identifier}\ndataset.globalId=${dataset.globalId}", - "expectedResponse":"OK.*", - "rollbackMethod":"DELETE" - } - }, - { - "provider":":internal", - "stepType":"log", - "parameters": { - "message": "Post-http request" - } - } - ] -} diff --git a/conf/docker-dcm/rsalsrv.dockerfile b/conf/docker-dcm/rsalsrv.dockerfile deleted file mode 100644 index 844432afe6b..00000000000 --- a/conf/docker-dcm/rsalsrv.dockerfile +++ /dev/null @@ -1,20 +0,0 @@ -FROM centos:7 -ARG RPMFILE=rsal-0.1-0.noarch.rpm -RUN yum update; yum install -y epel-release -COPY ${RPMFILE} /tmp/ -RUN yum localinstall -y /tmp/${RPMFILE} -COPY cfg/rsal/rsyncd.conf /etc/rsyncd.conf -COPY 
cfg/rsal/entrypoint-rsal.sh /entrypoint.sh -COPY cfg/rsal/lighttpd-modules.conf /etc/lighttpd/modules.conf -COPY cfg/rsal/lighttpd.conf /etc/lighttpd/lighttpd.conf -RUN mkdir -p /public/FK2 -RUN pip2 install -r /opt/rsal/scn/requirements.txt -#COPY doc/testdata/ /hold/ -ARG DV_HOST=http://dv_srv:8080 -ARG DV_API_KEY=burrito -ENV DV_HOST ${DV_HOST} -ENV DV_API_KEY ${DV_API_KEY} -EXPOSE 873 -EXPOSE 80 -HEALTHCHECK CMD curl --fail http://localhost/hw.py || exit 1 -CMD ["/entrypoint.sh"] diff --git a/conf/docker-dcm/site-primary.json b/conf/docker-dcm/site-primary.json deleted file mode 100644 index 35b217edffd..00000000000 --- a/conf/docker-dcm/site-primary.json +++ /dev/null @@ -1,6 +0,0 @@ -{ - "hostname": "rsalsrv", - "name": "LibraScholar University", - "primaryStorage": true, - "transferProtocols": "rsync,posix" -} diff --git a/conf/docker-dcm/site-remote.json b/conf/docker-dcm/site-remote.json deleted file mode 100644 index d47c3ef4dda..00000000000 --- a/conf/docker-dcm/site-remote.json +++ /dev/null @@ -1,6 +0,0 @@ -{ - "hostname": "remote.libra.research", - "name": "LibraResearch Institute", - "primaryStorage": false, - "transferProtocols": "rsync" -} diff --git a/conf/docker/build.sh b/conf/docker/build.sh deleted file mode 100755 index 27145d199e1..00000000000 --- a/conf/docker/build.sh +++ /dev/null @@ -1,110 +0,0 @@ -#!/bin/bash -# Creates images and pushes them to Docker Hub. -# The "latest" tag under "iqss" should be relatively stable. Don't push breaking changes there. -# None of the tags are suitable for production use. See https://github.com/IQSS/dataverse/issues/4040 -# To iterate on images, push to custom tags or tags based on branch names or a non-iqss Docker Hub org/username. Consider trying "internal" to push to the internal Minishift registry. - -# Docker Hub organization or username -HUBORG=iqss -# The most stable tag we have. -STABLE=latest -#FIXME: Use a real flag/argument parser. download-files.sh uses "getopts" for example. 
-if [ -z "$1" ]; then - echo "No argument supplied. For experiments, specify \"branch\" or \"custom my-custom-tag\" or \"huborg \" or \"internal\". Specify \"stable\" to push to the \"$STABLE\" tag under \"$HUBORG\" if your change won't break anything." - exit 1 -fi - -if [ "$1" == 'branch' ]; then - echo "We'll push a tag to the branch you're on." - GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD) - TAG=$GIT_BRANCH -elif [ "$1" == 'stable' ]; then - echo "We'll push a tag to the most stable tag (which isn't saying much!)." - TAG=$STABLE -elif [ "$1" == 'custom' ]; then - if [ -z "$2" ]; then - echo "You must provide a custom tag as the second argument. Something other than \"$STABLE\"." - exit 1 - else - echo "We'll push a custom tag." - TAG=$2 - fi -elif [ "$1" == 'huborg' ]; then - if [ -z "$2" ]; then - echo "You must provide your Docker Hub organization or username as the second argument. \"$USER\" or whatever." - exit 1 - else - HUBORG=$2 - TAG=$STABLE - echo "We'll push to the Docker Hub organization or username you specified: $HUBORG." - fi -elif [ "$1" == 'internal' ]; then - echo "Building for internal Minishift registry." - TAG=$STABLE -else - echo "Unexpected argument: $1. Exiting. Run with no arguments for help." - TAG=$STABLE - exit 1 -fi -echo Images are being built for registry org/username \"$HUBORG\" with the tag \"$TAG\". -# -# Build dataverse-solr -# -cp ../solr/7.3.0/schema.xml solr/ -# move solr*.tgz to the solr image -cp ../../downloads/solr-7.3.0.tgz solr/ -docker build -t $HUBORG/dataverse-solr:$TAG solr/ -if [ "$1" == 'internal' ]; then - echo "Skipping docker push because we're using the internal Minishift registry." -else - # FIXME: Check the output of "docker build" and only push on success. - docker push $HUBORG/dataverse-solr:$TAG -fi -# -# Build dataverse-glassfish -# -# TODO: Think about if we really need dataverse.war because it's in dvinstall.zip. -cd ../.. 
-mvn clean -scripts/installer/custom-build-number -mvn package -cd conf/docker -cp ../../target/dataverse*.war dataverse-glassfish/dataverse.war -if [[ "$?" -ne 0 ]]; then - echo "Unable to copy war file into place. Did 'mvn package' work?" - exit 1 -fi -cd ../../scripts/installer -make clean -make -cd ../../conf/docker -cp ../../scripts/installer/dvinstall.zip dataverse-glassfish -if [[ "$?" -ne 0 ]]; then - echo "Unable to copy dvinstall.zip file into place. Did 'make' work?" - exit 1 -fi -cp ../../downloads/glassfish-4.1.zip dataverse-glassfish -if [[ "$?" -ne 0 ]]; then - echo "Unable to copy Glassfish zip file into place. You must run the download script in that directory once. " - exit 1 -fi -# We'll assume at this point that the download script has been run. -cp ../../downloads/weld-osgi-bundle-2.2.10.Final-glassfish4.jar dataverse-glassfish -docker build -t $HUBORG/dataverse-glassfish:$TAG dataverse-glassfish -if [ "$1" == 'internal' ]; then - echo "Skipping docker push because we're using the internal Minishift registry." -else - # FIXME: Check the output of "docker build" and only push on success. - docker push $HUBORG/dataverse-glassfish:$TAG -fi -# -# Build init-container -# -cp ../../scripts/installer/install dataverse-glassfish/init-container -docker build -t $HUBORG/init-container:$TAG dataverse-glassfish/init-container -if [ "$1" == 'internal' ]; then - echo "Skipping docker push because we're using the internal Minishift registry." -else - # FIXME: Check the output of "docker build" and only push on success. 
- docker push $HUBORG/init-container:$TAG -fi diff --git a/conf/docker/dataverse-glassfish/.gitignore b/conf/docker/dataverse-glassfish/.gitignore deleted file mode 100644 index 2084aa8849e..00000000000 --- a/conf/docker/dataverse-glassfish/.gitignore +++ /dev/null @@ -1,5 +0,0 @@ -glassfish-4.1.zip -weld-osgi-bundle-2.2.10.Final-glassfish4.jar -dvinstall.zip -dataverse.war -init-container/postgres-setup diff --git a/conf/docker/dataverse-glassfish/Dockerfile b/conf/docker/dataverse-glassfish/Dockerfile deleted file mode 100644 index 367a9ca127c..00000000000 --- a/conf/docker/dataverse-glassfish/Dockerfile +++ /dev/null @@ -1,101 +0,0 @@ -FROM centos:7 -MAINTAINER Dataverse (support@dataverse.org) - -COPY glassfish-4.1.zip /tmp -COPY weld-osgi-bundle-2.2.10.Final-glassfish4.jar /tmp -COPY default.config /tmp -# Install dependencies -#RUN yum install -y unzip -RUN yum install -y \ - cronie \ - git \ - java-1.8.0-openjdk-devel \ - nc \ - perl \ - postgresql \ - sha1sum \ - unzip \ - wget - -ENV GLASSFISH_DOWNLOAD_SHA1 d1a103d06682eb08722fbc9a93089211befaa080 -ENV GLASSFISH_DIRECTORY "/usr/local/glassfish4" -ENV HOST_DNS_ADDRESS "localhost" -ENV POSTGRES_DB "dvndb" -ENV POSTGRES_USER "dvnapp" -ENV RSERVE_USER "rserve" -ENV RSERVE_PASSWORD "rserve" - -#RUN exitEarlyBeforeJq -RUN yum -y install epel-release \ - jq - -COPY dvinstall.zip /tmp - -#RUN ls /tmp -# -RUN find /tmp -# -#RUN exitEarly - -# Install Glassfish 4.1 -RUN cd /tmp \ - && unzip glassfish-4.1.zip \ - && mv glassfish4 /usr/local \ - && cd /usr/local/glassfish4/glassfish/modules \ - && rm weld-osgi-bundle.jar \ - && cp /tmp/weld-osgi-bundle-2.2.10.Final-glassfish4.jar . \ - && cd /tmp && unzip /tmp/dvinstall.zip \ - && chmod 777 -R /tmp/dvinstall/ \ - #FIXME: Patch Grizzly too! 
- && echo "Done installing and patching Glassfish" - -RUN chmod g=u /etc/passwd - -RUN mkdir -p /home/glassfish -RUN chgrp -R 0 /home/glassfish && \ - chmod -R g=u /home/glassfish - -RUN mkdir -p /usr/local/glassfish4 -RUN chgrp -R 0 /usr/local/glassfish4 && \ - chmod -R g=u /usr/local/glassfish4 - -#JHOVE -RUN cp /tmp/dvinstall/jhove* /usr/local/glassfish4/glassfish/domains/domain1/config - - -#SETUP JVM OPTIONS -ARG DOCKER_BUILD="true" -RUN echo $DOCKER_BUILD -RUN /tmp/dvinstall/glassfish-setup.sh -###glassfish-setup will handle everything in Dockerbuild - -##install jdbc driver -RUN cp /tmp/dvinstall/pgdriver/postgresql-42.2.2.jar /usr/local/glassfish4/glassfish/domains/domain1/lib - -# Customized persistence xml to avoid database recreation -#RUN mkdir -p /tmp/WEB-INF/classes/META-INF/ -#COPY WEB-INF/classes/META-INF/persistence.xml /tmp/WEB-INF/classes/META-INF/ - -# Install iRods iCommands -#RUN cd /tmp \ -# && yum -y install epel-release \ -# && yum -y install ftp://ftp.renci.org/pub/irods/releases/4.1.6/centos7/irods-icommands-4.1.6-centos7-x86_64.rpm - -#COPY config-glassfish /root/dvinstall -#COPY restart-glassfish /root/dvinstall -#COPY config-dataverse /root/dvinstall - -#RUN cd /root/dvinstall && ./config-dataverse -COPY ./entrypoint.sh / -#COPY ./ddl /root/dvinstall -#COPY ./init-postgres /root/dvinstall -#COPY ./init-glassfish /root/dvinstall -#COPY ./init-dataverse /root/dvinstall -#COPY ./setup-all.sh /root/dvinstall -#COPY ./setup-irods.sh /root/dvinstall -COPY ./Dockerfile / - -EXPOSE 8080 - -ENTRYPOINT ["/entrypoint.sh"] -CMD ["dataverse"] diff --git a/conf/docker/dataverse-glassfish/default.config b/conf/docker/dataverse-glassfish/default.config deleted file mode 100644 index 7af10336f30..00000000000 --- a/conf/docker/dataverse-glassfish/default.config +++ /dev/null @@ -1,16 +0,0 @@ -HOST_DNS_ADDRESS localhost -GLASSFISH_DIRECTORY /usr/local/glassfish4 -ADMIN_EMAIL -MAIL_SERVER mail.hmdc.harvard.edu -POSTGRES_ADMIN_PASSWORD secret 
-POSTGRES_SERVER dataverse-postgresql-0.dataverse-postgresql-service -POSTGRES_PORT 5432 -POSTGRES_DATABASE dvndb -POSTGRES_USER dvnapp -POSTGRES_PASSWORD secret -SOLR_LOCATION dataverse-solr-service:8983 -TWORAVENS_LOCATION NOT INSTALLED -RSERVE_HOST localhost -RSERVE_PORT 6311 -RSERVE_USER rserve -RSERVE_PASSWORD rserve diff --git a/conf/docker/dataverse-glassfish/entrypoint.sh b/conf/docker/dataverse-glassfish/entrypoint.sh deleted file mode 100755 index 55bbbdedbb7..00000000000 --- a/conf/docker/dataverse-glassfish/entrypoint.sh +++ /dev/null @@ -1,139 +0,0 @@ -#!/bin/bash -x - -# Entrypoint script for Dataverse web application. This script waits -# for dependent services (Rserve, Postgres, Solr) to start before -# initializing Glassfish. - -echo "whoami before..." -whoami -if ! whoami &> /dev/null; then - if [ -w /etc/passwd ]; then - # Make `whoami` return the glassfish user. # See https://docs.openshift.org/3.6/creating_images/guidelines.html#openshift-origin-specific-guidelines - # Fancy bash magic from https://github.com/RHsyseng/container-rhel-examples/blob/1208dcd7d4f431fc6598184dba6341b9465f4197/starter-arbitrary-uid/bin/uid_entrypoint#L4 - echo "${USER_NAME:-glassfish}:x:$(id -u):0:${USER_NAME:-glassfish} user:/home/glassfish:/bin/bash" >> /etc/passwd - fi -fi -echo "whoami after" -whoami - -set -e - -if [ "$1" = 'dataverse' ]; then - - export GLASSFISH_DIRECTORY=/usr/local/glassfish4 - export HOST_DNS_ADDRESS=localhost - - TIMEOUT=30 - - if [ -n "$RSERVE_SERVICE_HOST" ]; then - RSERVE_HOST=$RSERVE_SERVICE_HOST - elif [ -n "$RSERVE_PORT_6311_TCP_ADDR" ]; then - RSERVE_HOST=$RSERVE_PORT_6311_TCP_ADDR - elif [ -z "$RSERVE_HOST" ]; then - RSERVE_HOST="localhost" - fi - export RSERVE_HOST - - if [ -n "$RSERVE_SERVICE_PORT" ]; then - RSERVE_PORT=$RSERVE_SERVICE_PORT - elif [ -n "$RSERVE_PORT_6311_TCP_PORT" ]; then - RSERVE_PORT=$RSERVE_PORT_6311_TCP_PORT - elif [ -z "$RSERVE_PORT" ]; then - RSERVE_PORT="6311" - fi - export RSERVE_PORT - - echo "Using Rserve 
at $RSERVE_HOST:$RSERVE_PORT" - - if ncat $RSERVE_HOST $RSERVE_PORT -w $TIMEOUT --send-only < /dev/null > /dev/null 2>&1 ; then - echo Rserve running; - else - echo Optional service Rserve not running. - fi - - - # postgres - if [ -n "$POSTGRES_SERVICE_HOST" ]; then - POSTGRES_HOST=$POSTGRES_SERVICE_HOST - elif [ -n "$POSTGRES_PORT_5432_TCP_ADDR" ]; then - POSTGRES_HOST=$POSTGRES_PORT_5432_TCP_ADDR - elif [ -z "$POSTGRES_HOST" ]; then - POSTGRES_HOST="localhost" - fi - export POSTGRES_HOST - - if [ -n "$POSTGRES_SERVICE_PORT" ]; then - POSTGRES_PORT=$POSTGRES_SERVICE_PORT - elif [ -n "$POSTGRES_PORT_5432_TCP_PORT" ]; then - POSTGRES_PORT=$POSTGRES_PORT_5432_TCP_PORT - else - POSTGRES_PORT=5432 - fi - export POSTGRES_PORT - - echo "Using Postgres at $POSTGRES_HOST:$POSTGRES_PORT" - - if ncat $POSTGRES_HOST $POSTGRES_PORT -w $TIMEOUT --send-only < /dev/null > /dev/null 2>&1 ; then - echo Postgres running; - else - echo Required service Postgres not running. Have you started the required services? - exit 1 - fi - - # solr - if [ -n "$SOLR_SERVICE_HOST" ]; then - SOLR_HOST=$SOLR_SERVICE_HOST - elif [ -n "$SOLR_PORT_8983_TCP_ADDR" ]; then - SOLR_HOST=$SOLR_PORT_8983_TCP_ADDR - elif [ -z "$SOLR_HOST" ]; then - SOLR_HOST="localhost" - fi - export SOLR_HOST - - if [ -n "$SOLR_SERVICE_PORT" ]; then - SOLR_PORT=$SOLR_SERVICE_PORT - elif [ -n "$SOLR_PORT_8983_TCP_PORT" ]; then - SOLR_PORT=$SOLR_PORT_8983_TCP_PORT - else - SOLR_PORT=8983 - fi - export SOLR_PORT - - echo "Using Solr at $SOLR_HOST:$SOLR_PORT" - - if ncat $SOLR_HOST $SOLR_PORT -w $TIMEOUT --send-only < /dev/null > /dev/null 2>&1 ; then - echo Solr running; - else - echo Required service Solr not running. Have you started the required services? - exit 1 - fi - - GLASSFISH_INSTALL_DIR="/usr/local/glassfish4" - cd /tmp/dvinstall - echo Copying the non-interactive file into place - cp /tmp/default.config . 
- echo Looking at first few lines of default.config - head default.config - # non-interactive install - echo Running non-interactive install - #./install -y -f > install.out 2> install.err - ./install -y -f - -# if [ -n "$DVICAT_PORT_1247_TCP_PORT" ]; then -# ./setup-irods.sh -# fi - - # We do change the Solr server in Minishift/OpenShift, which is - # the primary target for all of the work under conf/docker. - # echo -e "\n\nRestarting Dataverse in case Solr host was changed..." - # /usr/local/glassfish4/glassfish/bin/asadmin stop-domain - # sleep 3 - # /usr/local/glassfish4/glassfish/bin/asadmin start-domain - - echo -e "\n\nDataverse started" - - sleep infinity -else - exec "$@" -fi - diff --git a/conf/docker/dataverse-glassfish/init-container/Dockerfile b/conf/docker/dataverse-glassfish/init-container/Dockerfile deleted file mode 100644 index 829ab03bcb2..00000000000 --- a/conf/docker/dataverse-glassfish/init-container/Dockerfile +++ /dev/null @@ -1,16 +0,0 @@ -FROM centos:7 -MAINTAINER Dataverse (support@dataverse.org) - -### init-container is an Init Container for glassfish service in OpenShift or other Kubernetes environment -# This initContainer will take care of setting up glassfish - -# Install dependencies -RUN yum install -y \ - nc \ - perl \ - postgresql \ - sha1sum - -COPY install / - -ENTRYPOINT ["/install", "--pg_only", "--yes"] diff --git a/conf/docker/dataverse-glassfish/init-container/default.config b/conf/docker/dataverse-glassfish/init-container/default.config deleted file mode 100644 index 7af10336f30..00000000000 --- a/conf/docker/dataverse-glassfish/init-container/default.config +++ /dev/null @@ -1,16 +0,0 @@ -HOST_DNS_ADDRESS localhost -GLASSFISH_DIRECTORY /usr/local/glassfish4 -ADMIN_EMAIL -MAIL_SERVER mail.hmdc.harvard.edu -POSTGRES_ADMIN_PASSWORD secret -POSTGRES_SERVER dataverse-postgresql-0.dataverse-postgresql-service -POSTGRES_PORT 5432 -POSTGRES_DATABASE dvndb -POSTGRES_USER dvnapp -POSTGRES_PASSWORD secret -SOLR_LOCATION 
dataverse-solr-service:8983 -TWORAVENS_LOCATION NOT INSTALLED -RSERVE_HOST localhost -RSERVE_PORT 6311 -RSERVE_USER rserve -RSERVE_PASSWORD rserve diff --git a/conf/docker/postgresql/Dockerfile b/conf/docker/postgresql/Dockerfile deleted file mode 100644 index 81ecf0fdeb8..00000000000 --- a/conf/docker/postgresql/Dockerfile +++ /dev/null @@ -1,3 +0,0 @@ -# PostgreSQL for Dataverse (but consider switching to the image from CentOS) -# -# See also conf/docker/dataverse-glassfish/Dockerfile diff --git a/conf/docker/solr/.gitignore b/conf/docker/solr/.gitignore deleted file mode 100644 index a6237a89914..00000000000 --- a/conf/docker/solr/.gitignore +++ /dev/null @@ -1,2 +0,0 @@ -solr-7.3.0.tgz -schema.xml diff --git a/conf/docker/solr/Dockerfile b/conf/docker/solr/Dockerfile deleted file mode 100644 index 993c2909f4d..00000000000 --- a/conf/docker/solr/Dockerfile +++ /dev/null @@ -1,31 +0,0 @@ -FROM centos:7 -MAINTAINER Dataverse (support@dataverse.org) - -RUN yum install -y unzip java-1.8.0-openjdk-devel lsof - -# Install Solr 7.3.0 -# The context of the build is the "conf" directory. 
-COPY solr-7.3.0.tgz /tmp -RUN cd /tmp \ - && tar xvfz solr-7.3.0.tgz \ - && rm solr-7.3.0.tgz \ - && mkdir /usr/local/solr \ - && mv solr-7.3.0 /usr/local/solr/ - -COPY schema.xml /tmp -COPY solrconfig_master.xml /tmp -COPY solrconfig_slave.xml /tmp - -RUN chmod g=u /etc/passwd - -RUN chgrp -R 0 /usr/local/solr && \ - chmod -R g=u /usr/local/solr - -EXPOSE 8983 - -COPY Dockerfile / -COPY entrypoint.sh / - -ENTRYPOINT ["/entrypoint.sh"] -USER 1001 -CMD ["solr"] diff --git a/conf/docker/solr/backup_cron.sh b/conf/docker/solr/backup_cron.sh deleted file mode 100644 index 95f31b4b53a..00000000000 --- a/conf/docker/solr/backup_cron.sh +++ /dev/null @@ -1 +0,0 @@ -0 */6 * * * curl 'http://localhost:8983/solr/collection1/replication?command=backup&location=/home/share' diff --git a/conf/docker/solr/entrypoint.sh b/conf/docker/solr/entrypoint.sh deleted file mode 100755 index 5d003f9c56b..00000000000 --- a/conf/docker/solr/entrypoint.sh +++ /dev/null @@ -1,38 +0,0 @@ -#!/bin/bash - -if ! whoami &> /dev/null; then - if [ -w /etc/passwd ]; then - echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd - fi -fi - -SOLR_DIR=/usr/local/solr/solr-7.3.0 - -if [ "$1" = 'solr' ]; then - - cp -r $SOLR_DIR/server/solr/configsets/_default $SOLR_DIR/server/solr/collection1 - cp /tmp/schema.xml $SOLR_DIR/server/solr/collection1/conf - - if [ $HOSTNAME = "dataverse-solr-0" ]; then - echo "I am the master" - mv /tmp/solrconfig_master.xml $SOLR_DIR/server/solr/collection1/conf/solrconfig.xml - cp /tmp/solrconfig_slave.xml $SOLR_DIR/server/solr/collection1/conf - - else - echo "I am the slave" - cp /tmp/solrconfig_slave.xml $SOLR_DIR/server/solr/collection1/conf - mv $SOLR_DIR/server/solr/collection1/conf/solrconfig_slave.xml $SOLR_DIR/server/solr/collection1/conf/solrconfig.xml - fi - cd $SOLR_DIR - bin/solr start - bin/solr create_core -c collection1 -d server/solr/collection1/conf - if [ $HOSTNAME = "dataverse-solr-0" ]; then - curl 
'http://localhost:8983/solr/collection1/replication?command=restore&location=/home/share' - fi - - sleep infinity -elif [ "$1" = 'usage' ]; then - echo 'docker run -d iqss/dataverse-solr solr' -else - exec "$@" -fi diff --git a/conf/docker/solr/solrconfig_master.xml b/conf/docker/solr/solrconfig_master.xml deleted file mode 100644 index d409f70b5d5..00000000000 --- a/conf/docker/solr/solrconfig_master.xml +++ /dev/null @@ -1,1431 +0,0 @@ [1,431 deleted lines omitted: the XML markup of this file was stripped during extraction and cannot be reconstructed. Recoverable details: a Solr 7.3.0 solrconfig with an edismax search handler (qf boosts dvName^400, authorName^180, dvSubject^190, dvDescription^180, dvAffiliation^170, title^130, subject^120, keyword^110, topicClassValue^100, dsDescriptionValue^90, authorAffiliation^80, publicationCitation^60, producerName^50, fileName^30, fileDescription^30, variableLabel^20, variableName^10, _text_^1.0; pf boosts dvName^200 down to producerName^75; boost isHarvested:false^25000), spellcheck/term-vector/terms/elevator components, and a master replication handler replicating schema.xml, stopwords.txt, elevate.xml, and solrconfig_slave.xml:solrconfig.xml on startup and commit.] diff --git a/conf/docker/solr/solrconfig_slave.xml b/conf/docker/solr/solrconfig_slave.xml deleted file mode 100644 index c31710ebace..00000000000 --- a/conf/docker/solr/solrconfig_slave.xml +++ /dev/null @@ -1,1442 +0,0 @@ [1,442 deleted lines omitted: XML markup stripped during extraction. Recoverable details: the same Solr 7.3.0 search configuration as solrconfig_master.xml, plus a slave replication handler polling http://dataverse-solr-0.dataverse-solr-service:8983/solr/collection1 every 00:00:20 with internal compression and username/password placeholders.] diff --git a/conf/httpd/conf.d/dataverse.conf b/conf/httpd/conf.d/dataverse.conf index d24d7c7893f..fe843ec60e4 100644 --- a/conf/httpd/conf.d/dataverse.conf +++ b/conf/httpd/conf.d/dataverse.conf @@ -1,8 +1,3 @@ -# don't pass paths used by rApache and TwoRavens to Glassfish -ProxyPassMatch ^/RApacheInfo$ ! -ProxyPassMatch ^/dataexplore ! -ProxyPassMatch ^/custom ! -ProxyPassMatch ^/rookzelig ! # don't pass paths used by Shibboleth to Glassfish ProxyPassMatch ^/Shibboleth.sso ! ProxyPassMatch ^/shibboleth-ds ! 
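As context for the dataverse.conf hunk above: `ProxyPassMatch <regex> !` tells Apache *not* to forward matching paths to the app server, and Apache evaluates these exclusions before the catch-all proxy rule. After this change only the Shibboleth paths bypass the proxy. The sketch below emulates that first-match behavior in plain bash; the `is_proxied` helper is hypothetical (not part of the repository), and the patterns are the two exclusions that remain in the file.

```shell
#!/usr/bin/env bash
# Sketch (not from the repo): emulate the ProxyPassMatch exclusions left in
# dataverse.conf after this change. A path matching an exclusion is served by
# Apache itself; everything else is proxied to the app server.
is_proxied() {
  local path=$1
  for pattern in '^/Shibboleth.sso' '^/shibboleth-ds'; do
    if [[ $path =~ $pattern ]]; then
      echo "no"   # matched an exclusion: not forwarded
      return
    fi
  done
  echo "yes"      # no exclusion matched: forwarded to the app server
}

is_proxied /Shibboleth.sso/Login   # -> no
is_proxied /shibboleth-ds/index.html   # -> no
is_proxied /dataexplore            # -> yes (its exclusion was removed in this hunk)
is_proxied /api/datasets           # -> yes
```

Note that after this hunk, requests to `/dataexplore` and the other removed rApache/TwoRavens paths fall through to the proxy like any other URL.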
diff --git a/conf/jhove/jhove.conf b/conf/jhove/jhove.conf index 17a7c5e0530..971c60acfaa 100644 --- a/conf/jhove/jhove.conf +++ b/conf/jhove/jhove.conf @@ -3,7 +3,7 @@ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig - file:///usr/local/glassfish4/glassfish/domains/domain1/config/jhoveConfig.xsd"> + file:///usr/local/payara6/glassfish/domains/domain1/config/jhoveConfig.xsd"> /usr/local/src/jhove utf-8 /tmp diff --git a/conf/keycloak/.env b/conf/keycloak/.env new file mode 100644 index 00000000000..6d99d85b3a7 --- /dev/null +++ b/conf/keycloak/.env @@ -0,0 +1,5 @@ +APP_IMAGE=gdcc/dataverse:unstable +POSTGRES_VERSION=17 +DATAVERSE_DB_USER=dataverse +SOLR_VERSION=9.8.0 +SKIP_DEPLOY=0 \ No newline at end of file diff --git a/conf/keycloak/Dockerfile b/conf/keycloak/Dockerfile new file mode 100644 index 00000000000..d2c85ee3335 --- /dev/null +++ b/conf/keycloak/Dockerfile @@ -0,0 +1,39 @@ +# ------------------------------------------ +# Stage 1: Build SPI with Maven +# ------------------------------------------ +FROM maven:3.9.5-eclipse-temurin-17 AS builder + +WORKDIR /app + +# Copy SPI source code +COPY ./builtin-users-spi /app + +# Build the SPI JAR +RUN mvn clean package + +# ------------------------------------------ +# Stage 2: Build Keycloak Image +# ------------------------------------------ +FROM quay.io/keycloak/keycloak:26.3.2 + +# Add the Oracle JDBC jars +ARG ORACLE_JDBC_VERSION=23.8.0.25.04 +ADD --chown=keycloak:keycloak https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc11/${ORACLE_JDBC_VERSION}/ojdbc11-${ORACLE_JDBC_VERSION}.jar /opt/keycloak/providers/ojdbc11.jar +ADD --chown=keycloak:keycloak https://repo1.maven.org/maven2/com/oracle/database/nls/orai18n/${ORACLE_JDBC_VERSION}/orai18n-${ORACLE_JDBC_VERSION}.jar /opt/keycloak/providers/orai18n.jar + +# Health build parameter +ENV KC_HEALTH_ENABLED=true + +# 
Copy SPI JAR from builder stage +COPY --from=builder /app/target/keycloak-dv-builtin-users-authenticator-1.0-SNAPSHOT.jar /opt/keycloak/providers/ + +# Copy additional configurations +COPY ./builtin-users-spi/conf/quarkus.properties /opt/keycloak/conf/ +COPY ./test-realm-include-spi.json /opt/keycloak/data/import/ + +# Set the Keycloak command +ENTRYPOINT ["/opt/keycloak/bin/kc.sh"] +CMD ["start-dev", "--import-realm", "--http-port=8090"] + +# Expose port 8090 +EXPOSE 8090 diff --git a/conf/keycloak/builtin-users-spi/conf/quarkus.properties b/conf/keycloak/builtin-users-spi/conf/quarkus.properties new file mode 100644 index 00000000000..64ce6d898c5 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/conf/quarkus.properties @@ -0,0 +1,15 @@ +quarkus.datasource.user-store.db-kind=postgresql +quarkus.datasource.user-store.jdbc.url=jdbc:postgresql://${DATAVERSE_DB_HOST}:${DATAVERSE_DB_PORT}/dataverse +quarkus.datasource.user-store.username=${DATAVERSE_DB_USER} +quarkus.datasource.user-store.password=${DATAVERSE_DB_PASSWORD} + +quarkus.datasource.user-store.jdbc.driver=org.postgresql.Driver +quarkus.datasource.user-store.jdbc.transactions=disabled +quarkus.transaction-manager.unsafe-multiple-last-resources=allow + +quarkus.datasource.user-store.jdbc.recovery.username=${DATAVERSE_DB_USER} +quarkus.datasource.user-store.jdbc.recovery.password=${DATAVERSE_DB_PASSWORD} + +quarkus.datasource.user-store.jdbc.xa-properties.serverName=${DATAVERSE_DB_HOST} +quarkus.datasource.user-store.jdbc.xa-properties.portNumber=${DATAVERSE_DB_PORT} +quarkus.datasource.user-store.jdbc.xa-properties.databaseName=dataverse diff --git a/conf/keycloak/builtin-users-spi/pom.xml b/conf/keycloak/builtin-users-spi/pom.xml new file mode 100644 index 00000000000..36cf6548d01 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/pom.xml @@ -0,0 +1,110 @@ [110 added lines omitted: the pom.xml markup was stripped during extraction and cannot be faithfully reconstructed. Recoverable details: modelVersion 4.0.0; groupId edu.harvard.iq.keycloak, artifactId keycloak-dv-builtin-users-authenticator, version 1.0-SNAPSHOT, jar packaging; provided-scope dependencies on org.keycloak keycloak-server-spi, keycloak-server-spi-private, keycloak-services, and keycloak-model-jpa (all ${keycloak.version}); jakarta.persistence:jakarta.persistence-api (${jakarta.persistence.version}); org.mindrot:jbcrypt (${mindrot.jbcrypt.version}, compile); test dependencies junit-jupiter-api (${junit.jupiter.version}) and mockito-core (${mockito.version}); maven-shade-plugin 3.2.4 bound to the package phase; maven-compiler-plugin with source/target 17; property values 26.3.2, 17, 3.2.0, 0.4, 5.15.2, 5.11.4.] diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/adapters/DataverseUserAdapter.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/adapters/DataverseUserAdapter.java new file mode 100644 index 00000000000..a47eeee1749 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/adapters/DataverseUserAdapter.java @@ -0,0 +1,71 @@ +package edu.harvard.iq.keycloak.auth.spi.adapters; + +import edu.harvard.iq.keycloak.auth.spi.models.DataverseUser; +import edu.harvard.iq.keycloak.auth.spi.providers.DataverseUserStorageProviderFactory; +import org.keycloak.component.ComponentModel; +import org.keycloak.models.GroupModel; +import org.keycloak.models.KeycloakSession; +import org.keycloak.models.RealmModel; +import org.keycloak.storage.StorageId; +import org.keycloak.storage.adapter.AbstractUserAdapterFederatedStorage; + +import java.util.stream.Stream; + +public class DataverseUserAdapter extends AbstractUserAdapterFederatedStorage { + + protected DataverseUser dataverseUser; + protected String keycloakId; + + private static final String ATTRIBUTE_NAME_IDP = "idp"; 
+ public DataverseUserAdapter(KeycloakSession session, RealmModel realm, ComponentModel model, DataverseUser dataverseUser) { + super(session, realm, model); + this.dataverseUser = dataverseUser; + keycloakId = StorageId.keycloakId(model, dataverseUser.getBuiltinUser().getId().toString()); + this.setSingleAttribute(ATTRIBUTE_NAME_IDP, DataverseUserStorageProviderFactory.PROVIDER_ID); + } + + @Override + public void setUsername(String s) { + } + + @Override + public String getUsername() { + return dataverseUser.getBuiltinUser().getUsername(); + } + + @Override + public String getEmail() { + return dataverseUser.getAuthenticatedUser().getEmail(); + } + + @Override + public String getFirstName() { + return dataverseUser.getAuthenticatedUser().getFirstName(); + } + + @Override + public String getLastName() { + return dataverseUser.getAuthenticatedUser().getLastName(); + } + + @Override + public Stream getGroupsStream(String search, Integer first, Integer max) { + return super.getGroupsStream(search, first, max); + } + + @Override + public long getGroupsCount() { + return super.getGroupsCount(); + } + + @Override + public long getGroupsCountByNameContaining(String search) { + return super.getGroupsCountByNameContaining(search); + } + + @Override + public String getId() { + return keycloakId; + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseAuthenticatedUser.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseAuthenticatedUser.java new file mode 100644 index 00000000000..d2d1e292ade --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseAuthenticatedUser.java @@ -0,0 +1,48 @@ +package edu.harvard.iq.keycloak.auth.spi.models; + +import jakarta.persistence.*; + +@NamedQueries({ + @NamedQuery(name = "DataverseAuthenticatedUser.findByEmail", + query = "select au from DataverseAuthenticatedUser au WHERE 
LOWER(au.email)=LOWER(:email)"), + @NamedQuery(name = "DataverseAuthenticatedUser.findByIdentifier", + query = "select au from DataverseAuthenticatedUser au WHERE LOWER(au.userIdentifier)=LOWER(:identifier)"), +}) +@Entity +@Table(name = "authenticateduser") +public class DataverseAuthenticatedUser { + @Id + private Integer id; + private String email; + private String lastName; + private String firstName; + private String userIdentifier; + + public void setId(Integer id) { + this.id = id; + } + + public void setEmail(String email) { + this.email = email; + } + + public void setUserIdentifier(String userIdentifier) { + this.userIdentifier = userIdentifier; + } + + public String getEmail() { + return email; + } + + public String getLastName() { + return lastName; + } + + public String getFirstName() { + return firstName; + } + + public String getUserIdentifier() { + return userIdentifier; + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseBuiltinUser.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseBuiltinUser.java new file mode 100644 index 00000000000..b4dd59339d2 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseBuiltinUser.java @@ -0,0 +1,48 @@ +package edu.harvard.iq.keycloak.auth.spi.models; + +import jakarta.persistence.*; + +@NamedQueries({ + @NamedQuery(name = "DataverseBuiltinUser.findByUsername", + query = "SELECT u FROM DataverseBuiltinUser u WHERE LOWER(u.username)=LOWER(:username)") +}) +@Entity +@Table(name = "builtinuser") +public class DataverseBuiltinUser { + @Id + private Integer id; + + private String username; + + private String encryptedPassword; + + private Integer passwordEncryptionVersion; + + public void setId(Integer id) { + this.id = id; + } + + public void setUsername(String username) { + this.username = username; + } + + public Integer getId() { + return id; 
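As an aside on the ID handling above: the adapter builds its Keycloak-side ID with `StorageId.keycloakId(...)`, and `DataverseUserService` later recovers the database primary key with `StorageId.externalId(...)`. Keycloak's federated storage IDs take the form `f:<component-id>:<external-id>`. The sketch below is a hypothetical stand-in for `org.keycloak.storage.StorageId` (class and method names here are illustrative, not the real Keycloak API surface) showing how that round trip works:

```java
// Hypothetical stand-in for org.keycloak.storage.StorageId, for illustration only.
// Federated user IDs are "f:<component-id>:<external-id>", so the numeric
// builtinuser primary key can be recovered from the Keycloak-side ID.
public class FederatedStorageId {

    // Mirrors StorageId.keycloakId(model, externalId): prefix with "f:" and the
    // storage provider's component ID.
    public static String keycloakId(String componentId, String externalId) {
        return "f:" + componentId + ":" + externalId;
    }

    // Mirrors StorageId.externalId(id): everything after the second colon.
    public static String externalId(String keycloakId) {
        int second = keycloakId.indexOf(':', keycloakId.indexOf(':') + 1);
        return keycloakId.substring(second + 1);
    }
}
```

This is why `getUserById` in `DataverseUserService` can hand the extracted external ID straight to `em.find(DataverseBuiltinUser.class, persistenceId)`.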
+ } + + public String getUsername() { + return username; + } + + public Integer getPasswordEncryptionVersion() { + return passwordEncryptionVersion; + } + + public void setEncryptedPassword(String encryptedPassword) { + this.encryptedPassword = encryptedPassword; + } + + public String getEncryptedPassword() { + return encryptedPassword; + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseUser.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseUser.java new file mode 100644 index 00000000000..d697fe52fc8 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/models/DataverseUser.java @@ -0,0 +1,20 @@ +package edu.harvard.iq.keycloak.auth.spi.models; + +public class DataverseUser { + + private final DataverseAuthenticatedUser authenticatedUser; + private final DataverseBuiltinUser builtinUser; + + public DataverseUser(DataverseAuthenticatedUser authenticatedUser, DataverseBuiltinUser builtinUser) { + this.authenticatedUser = authenticatedUser; + this.builtinUser = builtinUser; + } + + public DataverseAuthenticatedUser getAuthenticatedUser() { + return authenticatedUser; + } + + public DataverseBuiltinUser getBuiltinUser() { + return builtinUser; + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/providers/DataverseUserStorageProvider.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/providers/DataverseUserStorageProvider.java new file mode 100644 index 00000000000..f9de30cf845 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/providers/DataverseUserStorageProvider.java @@ -0,0 +1,86 @@ +package edu.harvard.iq.keycloak.auth.spi.providers; + +import edu.harvard.iq.keycloak.auth.spi.adapters.DataverseUserAdapter; +import edu.harvard.iq.keycloak.auth.spi.models.DataverseUser; +import 
edu.harvard.iq.keycloak.auth.spi.services.DataverseAuthenticationService; +import edu.harvard.iq.keycloak.auth.spi.services.DataverseUserService; +import org.jboss.logging.Logger; +import org.keycloak.component.ComponentModel; +import org.keycloak.credential.CredentialInput; +import org.keycloak.credential.CredentialInputValidator; +import org.keycloak.models.*; +import org.keycloak.models.credential.PasswordCredentialModel; +import org.keycloak.storage.UserStorageProvider; +import org.keycloak.storage.user.UserLookupProvider; + +/** + * DataverseUserStorageProvider integrates Keycloak with Dataverse user storage. + * It enables authentication and retrieval of users from a Dataverse-based user store. + */ +public class DataverseUserStorageProvider implements + UserStorageProvider, + UserLookupProvider, + CredentialInputValidator { + + private static final Logger logger = Logger.getLogger(DataverseUserStorageProvider.class); + + private final ComponentModel model; + private final KeycloakSession session; + private final DataverseUserService dataverseUserService; + + public DataverseUserStorageProvider(KeycloakSession session, ComponentModel model) { + this.session = session; + this.model = model; + + String datasource = model.getConfig().getFirst("datasource"); + logger.debugf("Using datasource: %s", datasource); + this.dataverseUserService = new DataverseUserService(session, datasource); + } + + @Override + public UserModel getUserById(RealmModel realm, String id) { + DataverseUser dataverseUser = dataverseUserService.getUserById(id); + return (dataverseUser != null) ? new DataverseUserAdapter(session, realm, model, dataverseUser) : null; + } + + @Override + public UserModel getUserByUsername(RealmModel realm, String username) { + DataverseUser dataverseUser = dataverseUserService.getUserByUsername(username); + return (dataverseUser != null) ? 
new DataverseUserAdapter(session, realm, model, dataverseUser) : null; + } + + @Override + public UserModel getUserByEmail(RealmModel realm, String email) { + DataverseUser dataverseUser = dataverseUserService.getUserByEmail(email); + return (dataverseUser != null) ? new DataverseUserAdapter(session, realm, model, dataverseUser) : null; + } + + @Override + public boolean supportsCredentialType(String credentialType) { + return PasswordCredentialModel.TYPE.equals(credentialType); + } + + @Override + public boolean isConfiguredFor(RealmModel realm, UserModel user, String credentialType) { + logger.debugf("Checking credential configuration for user: %s, credentialType: %s", user.getUsername(), credentialType); + return false; + } + + @Override + public boolean isValid(RealmModel realm, UserModel user, CredentialInput input) { + logger.debugf("Validating credentials for user: %s", user.getUsername()); + + if (!supportsCredentialType(input.getType()) || !(input instanceof UserCredentialModel userCredential)) { + return false; + } + + DataverseAuthenticationService dataverseAuthenticationService = new DataverseAuthenticationService(dataverseUserService); + return dataverseAuthenticationService.canLogInAsBuiltinUser(user.getUsername(), userCredential.getValue()); + } + + @Override + public void close() { + logger.debug("Closing DataverseUserStorageProvider"); + this.dataverseUserService.close(); + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/providers/DataverseUserStorageProviderFactory.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/providers/DataverseUserStorageProviderFactory.java new file mode 100644 index 00000000000..ceaeec46055 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/providers/DataverseUserStorageProviderFactory.java @@ -0,0 +1,52 @@ +package edu.harvard.iq.keycloak.auth.spi.providers; + +import 
org.jboss.logging.Logger; +import org.keycloak.component.ComponentModel; +import org.keycloak.models.KeycloakSession; +import org.keycloak.provider.ProviderConfigProperty; +import org.keycloak.storage.UserStorageProviderFactory; + +import java.util.ArrayList; +import java.util.List; + +public class DataverseUserStorageProviderFactory implements UserStorageProviderFactory { + + public static final String PROVIDER_ID = "dv-builtin-users-authenticator"; + + private static final Logger logger = Logger.getLogger(DataverseUserStorageProviderFactory.class); + + @Override + public DataverseUserStorageProvider create(KeycloakSession session, ComponentModel model) { + return new DataverseUserStorageProvider(session, model); + } + + @Override + public String getId() { + return PROVIDER_ID; + } + + @Override + public String getHelpText() { + return "A Keycloak Storage Provider to authenticate Dataverse Builtin Users"; + } + + @Override + public void close() { + logger.debug("<<<<<< Closing factory"); + } + + @Override + public List getConfigProperties() { + List configProperties = new ArrayList<>(); + + ProviderConfigProperty mySetting = new ProviderConfigProperty(); + mySetting.setName("datasource"); + mySetting.setLabel("Datasource"); + mySetting.setHelpText("This specifies the target datasource used by the SPI."); + mySetting.setType(ProviderConfigProperty.STRING_TYPE); + + configProperties.add(mySetting); + + return configProperties; + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseAuthenticationService.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseAuthenticationService.java new file mode 100644 index 00000000000..995662e1cb6 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseAuthenticationService.java @@ -0,0 +1,48 @@ +package edu.harvard.iq.keycloak.auth.spi.services; + +import 
edu.harvard.iq.keycloak.auth.spi.models.DataverseBuiltinUser; +import edu.harvard.iq.keycloak.auth.spi.models.DataverseUser; + +public class DataverseAuthenticationService { + + private final DataverseUserService dataverseUserService; + + private PasswordEncryption.Algorithm passwordEncryptionAlgorithm; + + public DataverseAuthenticationService(DataverseUserService dataverseUserService) { + this(dataverseUserService, null); + } + + // Just for testing purposes, do not use + public DataverseAuthenticationService(DataverseUserService dataverseUserService, PasswordEncryption.Algorithm passwordEncryptionAlgorithm) { + this.dataverseUserService = dataverseUserService; + this.passwordEncryptionAlgorithm = passwordEncryptionAlgorithm; + } + + /** + * Validates if a Dataverse built-in user can log in with the given credentials. + * + * @param usernameOrEmail The username or email of the Dataverse built-in user. + * @param password The password to be validated. + * @return {@code true} if the user can log in, {@code false} otherwise. 
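The lookup order documented here (try the identifier as a username first, fall back to email, then check the password against the stored hash) can be sketched without any Keycloak or Dataverse types. The stub store and class names below are hypothetical, purely to make the control flow testable in isolation:

```java
import java.util.Map;
import java.util.function.BiPredicate;

// Minimal sketch of the canLogInAsBuiltinUser flow, using an in-memory stub
// store instead of the real DataverseUserService. Hypothetical names throughout.
public class LoginCheck {
    private final Map<String, String> hashByUsername;   // username -> stored hash
    private final Map<String, String> usernameByEmail;  // email -> username
    private final BiPredicate<String, String> hashMatches; // (plaintext, hash)

    public LoginCheck(Map<String, String> hashByUsername,
                      Map<String, String> usernameByEmail,
                      BiPredicate<String, String> hashMatches) {
        this.hashByUsername = hashByUsername;
        this.usernameByEmail = usernameByEmail;
        this.hashMatches = hashMatches;
    }

    public boolean canLogIn(String usernameOrEmail, String password) {
        // First try the identifier as a username, then fall back to email.
        String username = hashByUsername.containsKey(usernameOrEmail)
                ? usernameOrEmail
                : usernameByEmail.get(usernameOrEmail);
        if (username == null) {
            return false; // unknown identifier: fail closed
        }
        // Only after resolving the user is the password checked against its hash.
        return hashMatches.test(password, hashByUsername.get(username));
    }
}
```

The password comparison is injected as a predicate here for the same reason the real class accepts a `PasswordEncryption.Algorithm` in its test-only constructor: it keeps the lookup logic testable without real hashes.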
+ */ + public boolean canLogInAsBuiltinUser(String usernameOrEmail, String password) { + DataverseUser dataverseUser = this.dataverseUserService.getUserByUsername(usernameOrEmail); + + if (dataverseUser == null) { + dataverseUser = this.dataverseUserService.getUserByEmail(usernameOrEmail); + } + + if (dataverseUser == null) { + return false; + } + + DataverseBuiltinUser builtinUser = dataverseUser.getBuiltinUser(); + + if (passwordEncryptionAlgorithm == null) { + passwordEncryptionAlgorithm = PasswordEncryption.getVersion(builtinUser.getPasswordEncryptionVersion()); + } + + return passwordEncryptionAlgorithm.check(password, builtinUser.getEncryptedPassword()); + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseUserService.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseUserService.java new file mode 100644 index 00000000000..5256d5df683 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseUserService.java @@ -0,0 +1,110 @@ +package edu.harvard.iq.keycloak.auth.spi.services; + +import edu.harvard.iq.keycloak.auth.spi.models.DataverseAuthenticatedUser; +import edu.harvard.iq.keycloak.auth.spi.models.DataverseBuiltinUser; +import edu.harvard.iq.keycloak.auth.spi.models.DataverseUser; +import jakarta.persistence.EntityManager; +import org.jboss.logging.Logger; +import org.keycloak.connections.jpa.JpaConnectionProvider; +import org.keycloak.models.KeycloakSession; +import org.keycloak.storage.StorageId; + +import java.util.List; + +public class DataverseUserService { + + private static final Logger logger = Logger.getLogger(DataverseUserService.class); + + private final EntityManager em; + + public DataverseUserService(KeycloakSession session, String datasource) { + this.em = session.getProvider(JpaConnectionProvider.class, datasource).getEntityManager(); + } + + public DataverseUser 
getUserById(String id) { + logger.debugf("Fetching user by ID: %s", id); + String persistenceId = StorageId.externalId(id); + + DataverseBuiltinUser builtinUser = em.find(DataverseBuiltinUser.class, persistenceId); + if (builtinUser == null) { + logger.debugf("Builtin user not found for external ID: %s", persistenceId); + return null; + } + + String username = builtinUser.getUsername(); + DataverseAuthenticatedUser authenticatedUser = getAuthenticatedUserByUsername(username); + if (authenticatedUser == null) { + logger.debugf("Authenticated user not found by username: %s", username); + return null; + } + + return new DataverseUser(authenticatedUser, builtinUser); + } + + public DataverseUser getUserByUsername(String username) { + logger.debugf("Fetching user by username: %s", username); + List users = em.createNamedQuery("DataverseBuiltinUser.findByUsername", DataverseBuiltinUser.class) + .setParameter("username", username) + .getResultList(); + + if (users.isEmpty()) { + logger.debugf("Builtin user not found by username: %s", username); + return null; + } + + DataverseAuthenticatedUser authenticatedUser = getAuthenticatedUserByUsername(username); + if (authenticatedUser == null) { + logger.debugf("Authenticated user not found by username: %s", username); + return null; + } + + return new DataverseUser(authenticatedUser, users.get(0)); + } + + public DataverseUser getUserByEmail(String email) { + logger.debugf("Fetching user by email: %s", email); + List authUsers = em.createNamedQuery("DataverseAuthenticatedUser.findByEmail", DataverseAuthenticatedUser.class) + .setParameter("email", email) + .getResultList(); + + if (authUsers.isEmpty()) { + logger.debugf("Authenticated user not found by email: %s", email); + return null; + } + + String username = authUsers.get(0).getUserIdentifier(); + List builtinUsers = em.createNamedQuery("DataverseBuiltinUser.findByUsername", DataverseBuiltinUser.class) + .setParameter("username", username) + .getResultList(); + + if 
(builtinUsers.isEmpty()) { + logger.debugf("Builtin user not found by username: %s", username); + return null; + } + + return new DataverseUser(authUsers.get(0), builtinUsers.get(0)); + } + + public void close() { + if (em != null) { + em.close(); + } + } + + /** + * Retrieves an authenticated user from Dataverse by username. + * + * @param username The username to look up. + * @return The authenticated user or null if not found. + */ + private DataverseAuthenticatedUser getAuthenticatedUserByUsername(String username) { + try { + return em.createNamedQuery("DataverseAuthenticatedUser.findByIdentifier", DataverseAuthenticatedUser.class) + .setParameter("identifier", username) + .getSingleResult(); + } catch (Exception e) { + logger.debugf("Could not find authenticated user by username: %s", username); + return null; + } + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/PasswordEncryption.java b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/PasswordEncryption.java new file mode 100644 index 00000000000..f8ecd4232b3 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/java/edu/harvard/iq/keycloak/auth/spi/services/PasswordEncryption.java @@ -0,0 +1,86 @@ +package edu.harvard.iq.keycloak.auth.spi.services; + +import org.mindrot.jbcrypt.BCrypt; + +import java.nio.charset.StandardCharsets; +import java.security.MessageDigest; +import java.security.NoSuchAlgorithmException; +import java.util.Base64; + +/** + * Password encryption, supporting multiple encryption algorithms to + * allow migrations between them. + *

+ * When adding a new password hashing algorithm, implement the {@link Algorithm} + * interface, and add an instance of the implementation as the last element + * of the {@link #algorithms} array. The rest should pretty much happen automatically + * (e.g system will detect outdated passwords for users and initiate the password reset breakout). + * + * NOTE: This class is a copy of the one in + * {@code edu.harvard.iq.dataverse.authorization.providers.builtin} + * within the Dataverse application and must stay in sync with it. + * + * @author Ellen Kraffmiller + * @author Michael Bar-Sinai + */ +public final class PasswordEncryption implements java.io.Serializable { + + public interface Algorithm { + boolean check(String plainText, String hashed); + } + + /** + * The SHA algorithm, now considered not secure enough. + */ + private static final Algorithm SHA = new Algorithm() { + + private String encrypt(String plainText) { + try { + MessageDigest md = MessageDigest.getInstance("SHA"); + md.update(plainText.getBytes(StandardCharsets.UTF_8)); + byte[] raw = md.digest(); + return Base64.getEncoder().encodeToString(raw); + + } catch (NoSuchAlgorithmException e) { + throw new RuntimeException(e); + } + } + + @Override + public boolean check(String plainText, String hashed) { + return hashed.equals(encrypt(plainText)); + } + }; + + /** + * BCrypt, using a complexity factor of 10 (considered safe by 2015 standards). + */ + private static final Algorithm BCRYPT_10 = new Algorithm() { + + @Override + public boolean check(String plainText, String hashed) { + try { + return BCrypt.checkpw(plainText, hashed); + } catch (IllegalArgumentException iae) { + // the password was probably not hashed using bcrypt. + return false; + } + } + }; + + private static final Algorithm[] algorithms; + + static { + algorithms = new Algorithm[]{SHA, BCRYPT_10}; + } + + /** + * Prevent people instantiating this class. 
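The legacy SHA path above needs only the JDK, so it can be exercised standalone: hash the plaintext with the `"SHA"` digest (the JDK alias for SHA-1), Base64-encode the raw bytes, and compare with the stored string. This is a self-contained restatement of the `SHA` algorithm in the class above, not new behavior; the BCrypt path is omitted because it needs the jbcrypt dependency:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Standalone copy of the legacy SHA check for illustration (class name is ours).
public class ShaCheck {

    static String encrypt(String plainText) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA"); // JDK alias for SHA-1
            byte[] raw = md.digest(plainText.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    // A candidate password matches if its digest equals the stored Base64 value.
    static boolean check(String plainText, String hashed) {
        return hashed.equals(encrypt(plainText));
    }
}
```

Because the stored value is an unsalted digest, two users with the same password share a hash, which is why the Javadoc calls SHA "not secure enough" and the class supports migrating users to BCrypt.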
+ */ + private PasswordEncryption() { + } + + public static Algorithm getVersion(int i) { + return algorithms[i]; + } +} diff --git a/conf/keycloak/builtin-users-spi/src/main/resources/META-INF/beans.xml b/conf/keycloak/builtin-users-spi/src/main/resources/META-INF/beans.xml new file mode 100644 index 00000000000..e69de29bb2d diff --git a/conf/keycloak/builtin-users-spi/src/main/resources/META-INF/persistence.xml b/conf/keycloak/builtin-users-spi/src/main/resources/META-INF/persistence.xml new file mode 100644 index 00000000000..f5134f48986 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/resources/META-INF/persistence.xml @@ -0,0 +1,47 @@ + + + + edu.harvard.iq.keycloak.auth.spi.models.DataverseBuiltinUser + edu.harvard.iq.keycloak.auth.spi.models.DataverseAuthenticatedUser + + + + + + + + + + + + + + + + + + + + edu.harvard.iq.keycloak.auth.spi.models.DataverseBuiltinUser + edu.harvard.iq.keycloak.auth.spi.models.DataverseAuthenticatedUser + + + + + + + + + + + + + + + + + + diff --git a/conf/keycloak/builtin-users-spi/src/main/resources/META-INF/services/org.keycloak.storage.UserStorageProviderFactory b/conf/keycloak/builtin-users-spi/src/main/resources/META-INF/services/org.keycloak.storage.UserStorageProviderFactory new file mode 100644 index 00000000000..4ec99f734db --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/main/resources/META-INF/services/org.keycloak.storage.UserStorageProviderFactory @@ -0,0 +1 @@ +edu.harvard.iq.keycloak.auth.spi.providers.DataverseUserStorageProviderFactory \ No newline at end of file diff --git a/conf/keycloak/builtin-users-spi/src/test/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseAuthenticationServiceTest.java b/conf/keycloak/builtin-users-spi/src/test/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseAuthenticationServiceTest.java new file mode 100644 index 00000000000..35f37973b71 --- /dev/null +++ 
b/conf/keycloak/builtin-users-spi/src/test/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseAuthenticationServiceTest.java @@ -0,0 +1,64 @@ +package edu.harvard.iq.keycloak.auth.spi.services; + +import edu.harvard.iq.keycloak.auth.spi.models.DataverseBuiltinUser; +import edu.harvard.iq.keycloak.auth.spi.models.DataverseUser; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.*; +import static org.mockito.Mockito.*; + +class DataverseAuthenticationServiceTest { + + private DataverseUserService dataverseUserServiceMock; + private PasswordEncryption.Algorithm passwordEncryptionAlgorithmMock; + private DataverseAuthenticationService sut; + + @BeforeEach + void setUp() { + dataverseUserServiceMock = mock(DataverseUserService.class); + passwordEncryptionAlgorithmMock = mock(PasswordEncryption.Algorithm.class); + sut = new DataverseAuthenticationService(dataverseUserServiceMock, passwordEncryptionAlgorithmMock); + } + + @Test + void canLogInAsBuiltinUser_userFoundByUsername_validCredentials() { + setupUserMock("username", true, true); + assertTrue(sut.canLogInAsBuiltinUser("username", "password")); + } + + @Test + void canLogInAsBuiltinUser_userFoundByUsername_invalidCredentials() { + setupUserMock("username", true, false); + assertFalse(sut.canLogInAsBuiltinUser("username", "password")); + } + + @Test + void canLogInAsBuiltinUser_userFoundByEmail_validCredentials() { + setupUserMock("user@dataverse.org", false, true); + assertTrue(sut.canLogInAsBuiltinUser("user@dataverse.org", "password")); + } + + @Test + void canLogInAsBuiltinUser_userFoundByEmail_invalidCredentials() { + setupUserMock("user@dataverse.org", false, false); + assertFalse(sut.canLogInAsBuiltinUser("user@dataverse.org", "password")); + } + + private void setupUserMock(String identifier, boolean foundByUsername, boolean validPassword) { + String encryptedPassword = "encryptedPassword"; + DataverseUser dataverseUserMock = 
mock(DataverseUser.class); + DataverseBuiltinUser dataverseBuiltinUser = new DataverseBuiltinUser(); + dataverseBuiltinUser.setEncryptedPassword(encryptedPassword); + + when(dataverseUserMock.getBuiltinUser()).thenReturn(dataverseBuiltinUser); + when(passwordEncryptionAlgorithmMock.check(anyString(), eq(encryptedPassword))).thenReturn(validPassword); + + if (foundByUsername) { + when(dataverseUserServiceMock.getUserByUsername(identifier)).thenReturn(dataverseUserMock); + } else { + when(dataverseUserServiceMock.getUserByUsername(identifier)).thenReturn(null); + when(dataverseUserServiceMock.getUserByEmail(identifier)).thenReturn(dataverseUserMock); + } + } +} diff --git a/conf/keycloak/builtin-users-spi/src/test/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseUserServiceTest.java b/conf/keycloak/builtin-users-spi/src/test/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseUserServiceTest.java new file mode 100644 index 00000000000..5f4de20f074 --- /dev/null +++ b/conf/keycloak/builtin-users-spi/src/test/java/edu/harvard/iq/keycloak/auth/spi/services/DataverseUserServiceTest.java @@ -0,0 +1,150 @@ +package edu.harvard.iq.keycloak.auth.spi.services; + +import edu.harvard.iq.keycloak.auth.spi.models.DataverseAuthenticatedUser; +import edu.harvard.iq.keycloak.auth.spi.models.DataverseBuiltinUser; +import edu.harvard.iq.keycloak.auth.spi.models.DataverseUser; +import jakarta.persistence.EntityManager; +import jakarta.persistence.TypedQuery; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.keycloak.connections.jpa.JpaConnectionProvider; +import org.keycloak.models.KeycloakSession; + +import java.util.Collections; + +import static org.junit.jupiter.api.Assertions.*; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +class DataverseUserServiceTest { + + private static final String TEST_USER_STORE = "user-store"; + + private EntityManager entityManagerMock; + private DataverseUserService 
sut; + + @BeforeEach + void setUp() { + entityManagerMock = mock(EntityManager.class); + KeycloakSession sessionMock = mock(KeycloakSession.class); + + JpaConnectionProvider jpaConnectionProviderMock = mock(JpaConnectionProvider.class); + when(sessionMock.getProvider(JpaConnectionProvider.class, TEST_USER_STORE)).thenReturn(jpaConnectionProviderMock); + when(jpaConnectionProviderMock.getEntityManager()).thenReturn(entityManagerMock); + + sut = new DataverseUserService(sessionMock, TEST_USER_STORE); + } + + @Test + void getUserById_userExists() { + String testUserId = "123"; + String testUsername = "testuser"; + + DataverseBuiltinUser builtinUser = new DataverseBuiltinUser(); + builtinUser.setId(1); + builtinUser.setUsername(testUsername); + + when(entityManagerMock.find(DataverseBuiltinUser.class, "123")).thenReturn(builtinUser); + TypedQuery authUserQuery = mock(TypedQuery.class); + when(entityManagerMock.createNamedQuery("DataverseAuthenticatedUser.findByIdentifier", DataverseAuthenticatedUser.class)) + .thenReturn(authUserQuery); + when(authUserQuery.setParameter("identifier", testUsername)).thenReturn(authUserQuery); + + DataverseAuthenticatedUser authUser = new DataverseAuthenticatedUser(); + authUser.setUserIdentifier(testUsername); + when(authUserQuery.getSingleResult()).thenReturn(authUser); + + DataverseUser user = sut.getUserById(testUserId); + assertNotNull(user); + assertEquals(testUsername, user.getBuiltinUser().getUsername()); + } + + @Test + void getUserById_userNotFound() { + when(entityManagerMock.find(DataverseBuiltinUser.class, "123")).thenReturn(null); + assertNull(sut.getUserById("123")); + } + + @Test + void getUserByUsername_userExists() { + String testUsername = "testuser"; + + DataverseBuiltinUser builtinUser = new DataverseBuiltinUser(); + builtinUser.setUsername(testUsername); + builtinUser.setId(1); + + DataverseAuthenticatedUser authUser = new DataverseAuthenticatedUser(); + authUser.setUserIdentifier(testUsername); + + TypedQuery 
builtinUserQuery = mock(TypedQuery.class); + TypedQuery authUserQuery = mock(TypedQuery.class); + + when(entityManagerMock.createNamedQuery("DataverseBuiltinUser.findByUsername", DataverseBuiltinUser.class)) + .thenReturn(builtinUserQuery); + when(builtinUserQuery.setParameter("username", testUsername)).thenReturn(builtinUserQuery); + when(builtinUserQuery.getResultList()).thenReturn(Collections.singletonList(builtinUser)); + + when(entityManagerMock.createNamedQuery("DataverseAuthenticatedUser.findByIdentifier", DataverseAuthenticatedUser.class)) + .thenReturn(authUserQuery); + when(authUserQuery.setParameter("identifier", testUsername)).thenReturn(authUserQuery); + when(authUserQuery.getSingleResult()).thenReturn(authUser); + + DataverseUser user = sut.getUserByUsername(testUsername); + assertNotNull(user); + assertEquals(testUsername, user.getBuiltinUser().getUsername()); + } + + @Test + void getUserByUsername_userNotFound() { + TypedQuery query = mock(TypedQuery.class); + when(entityManagerMock.createNamedQuery("DataverseBuiltinUser.findByUsername", DataverseBuiltinUser.class)) + .thenReturn(query); + when(query.setParameter("username", "unknown")).thenReturn(query); + when(query.getResultList()).thenReturn(Collections.emptyList()); + + assertNull(sut.getUserByUsername("unknown")); + } + + @Test + void getUserByEmail_userExists() { + String testEmail = "test@dataverse.org"; + String testUsername = "testuser"; + + DataverseAuthenticatedUser authUser = new DataverseAuthenticatedUser(); + authUser.setEmail(testEmail); + authUser.setId(1); + authUser.setUserIdentifier(testUsername); + + DataverseBuiltinUser builtinUser = new DataverseBuiltinUser(); + builtinUser.setUsername(testUsername); + builtinUser.setId(1); + + TypedQuery authUserQuery = mock(TypedQuery.class); + TypedQuery builtinUserQuery = mock(TypedQuery.class); + + when(entityManagerMock.createNamedQuery("DataverseAuthenticatedUser.findByEmail", DataverseAuthenticatedUser.class)) + 
.thenReturn(authUserQuery); + when(authUserQuery.setParameter("email", testEmail)).thenReturn(authUserQuery); + when(authUserQuery.getResultList()).thenReturn(Collections.singletonList(authUser)); + + when(entityManagerMock.createNamedQuery("DataverseBuiltinUser.findByUsername", DataverseBuiltinUser.class)) + .thenReturn(builtinUserQuery); + when(builtinUserQuery.setParameter("username", testUsername)).thenReturn(builtinUserQuery); + when(builtinUserQuery.getResultList()).thenReturn(Collections.singletonList(builtinUser)); + + DataverseUser user = sut.getUserByEmail(testEmail); + assertNotNull(user); + assertEquals(testUsername, user.getBuiltinUser().getUsername()); + } + + @Test + void getUserByEmail_userNotFound() { + TypedQuery query = mock(TypedQuery.class); + when(entityManagerMock.createNamedQuery("DataverseAuthenticatedUser.findByEmail", DataverseAuthenticatedUser.class)) + .thenReturn(query); + when(query.setParameter("email", "unknown@dataverse.org")).thenReturn(query); + when(query.getResultList()).thenReturn(Collections.emptyList()); + + assertNull(sut.getUserByEmail("unknown@dataverse.org")); + } +} diff --git a/conf/keycloak/docker-compose-dev.yml b/conf/keycloak/docker-compose-dev.yml new file mode 100644 index 00000000000..7356161ec47 --- /dev/null +++ b/conf/keycloak/docker-compose-dev.yml @@ -0,0 +1,318 @@ +# This file is designed for testing Keycloak authentication using the +# Dataverse Builtin Users SPI. +# +# Keycloak is deployed using a custom-built image, defined by a Dockerfile +# located in this directory. This allows for a controlled +# and flexible development setup. Note that this image is currently +# intended for development and testing purposes only and should be used +# accordingly in non-production environments. 
+ +version: "2.4" + +services: + + dev_dataverse: + container_name: "dev_dataverse" + hostname: dataverse + image: ${APP_IMAGE} + restart: on-failure + user: payara + environment: + DATAVERSE_DB_HOST: postgres + DATAVERSE_DB_PASSWORD: secret + DATAVERSE_DB_USER: ${DATAVERSE_DB_USER} + ENABLE_JDWP: "1" + ENABLE_RELOAD: "1" + SKIP_DEPLOY: "${SKIP_DEPLOY}" + DATAVERSE_JSF_REFRESH_PERIOD: "1" + DATAVERSE_FEATURE_API_BEARER_AUTH: "1" + DATAVERSE_FEATURE_INDEX_HARVESTED_METADATA_SOURCE: "1" + DATAVERSE_FEATURE_API_BEARER_AUTH_PROVIDE_MISSING_CLAIMS: "1" + DATAVERSE_FEATURE_API_BEARER_AUTH_USE_BUILTIN_USER_ON_ID_MATCH: "1" + DATAVERSE_MAIL_SYSTEM_EMAIL: "dataverse@localhost" + DATAVERSE_MAIL_MTA_HOST: "smtp" + DATAVERSE_AUTH_OIDC_ENABLED: "1" + DATAVERSE_AUTH_OIDC_CLIENT_ID: test + DATAVERSE_AUTH_OIDC_CLIENT_SECRET: 94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8 + DATAVERSE_AUTH_OIDC_AUTH_SERVER_URL: http://keycloak.mydomain.com:8090/realms/test + DATAVERSE_SPI_EXPORTERS_DIRECTORY: "/dv/exporters" + # These two oai settings are here to get HarvestingServerIT to pass + dataverse_oai_server_maxidentifiers: "2" + dataverse_oai_server_maxrecords: "2" + JVM_ARGS: -Ddataverse.files.storage-driver-id=file1 + -Ddataverse.files.file1.type=file + -Ddataverse.files.file1.label=Filesystem + -Ddataverse.files.file1.directory=${STORAGE_DIR}/store + -Ddataverse.files.localstack1.type=s3 + -Ddataverse.files.localstack1.label=LocalStack + -Ddataverse.files.localstack1.custom-endpoint-url=http://localstack:4566 + -Ddataverse.files.localstack1.custom-endpoint-region=us-east-2 + -Ddataverse.files.localstack1.bucket-name=mybucket + -Ddataverse.files.localstack1.path-style-access=true + -Ddataverse.files.localstack1.upload-redirect=true + -Ddataverse.files.localstack1.download-redirect=true + -Ddataverse.files.localstack1.access-key=default + -Ddataverse.files.localstack1.secret-key=default + -Ddataverse.files.minio1.type=s3 + -Ddataverse.files.minio1.label=MinIO + 
-Ddataverse.files.minio1.custom-endpoint-url=http://minio:9000 + -Ddataverse.files.minio1.custom-endpoint-region=us-east-1 + -Ddataverse.files.minio1.bucket-name=mybucket + -Ddataverse.files.minio1.path-style-access=true + -Ddataverse.files.minio1.upload-redirect=false + -Ddataverse.files.minio1.download-redirect=false + -Ddataverse.files.minio1.access-key=4cc355_k3y + -Ddataverse.files.minio1.secret-key=s3cr3t_4cc355_k3y + -Ddataverse.pid.providers=fake + -Ddataverse.pid.default-provider=fake + -Ddataverse.pid.fake.type=FAKE + -Ddataverse.pid.fake.label=FakeDOIProvider + -Ddataverse.pid.fake.authority=10.5072 + -Ddataverse.pid.fake.shoulder=FK2/ + #-Ddataverse.lang.directory=/dv/lang + ports: + - "8080:8080" # HTTP (Dataverse Application) + - "4949:4848" # HTTPS (Payara Admin Console) + - "9009:9009" # JDWP + - "8686:8686" # JMX + networks: + - dataverse + depends_on: + - dev_postgres + - dev_solr + - dev_dv_initializer + volumes: + - ./docker-dev-volumes/app/data:/dv + - ./docker-dev-volumes/app/secrets:/secrets + - ../../target/dataverse:/opt/payara/deployments/dataverse:ro + tmpfs: + - /dumps:mode=770,size=2052M,uid=1000,gid=1000 + - /tmp:mode=770,size=2052M,uid=1000,gid=1000 + mem_limit: 2147483648 # 2 GiB + mem_reservation: 1024m + privileged: false + + dev_bootstrap: + container_name: "dev_bootstrap" + image: gdcc/configbaker:unstable + restart: "no" + command: + - bootstrap.sh + - dev + networks: + - dataverse + volumes: + - ./docker-dev-volumes/solr/data:/var/solr + + dev_dv_initializer: + container_name: "dev_dv_initializer" + image: gdcc/configbaker:unstable + restart: "no" + command: + - sh + - -c + - "fix-fs-perms.sh dv" + volumes: + - ./docker-dev-volumes/app/data:/dv + + dev_postgres: + container_name: "dev_postgres" + hostname: postgres + image: postgres:${POSTGRES_VERSION} + restart: on-failure + environment: + - POSTGRES_USER=${DATAVERSE_DB_USER} + - POSTGRES_PASSWORD=secret + ports: + - "5432:5432" + networks: + - dataverse + volumes: + - 
./docker-dev-volumes/postgresql/data:/var/lib/postgresql/data + + dev_solr_initializer: + container_name: "dev_solr_initializer" + image: gdcc/configbaker:unstable + restart: "no" + command: + - sh + - -c + - "fix-fs-perms.sh solr && cp -a /template/* /solr-template" + volumes: + - ./docker-dev-volumes/solr/data:/var/solr + - ./docker-dev-volumes/solr/conf:/solr-template + + dev_solr: + container_name: "dev_solr" + hostname: "solr" + image: solr:${SOLR_VERSION} + depends_on: + - dev_solr_initializer + restart: on-failure + ports: + - "8983:8983" + networks: + - dataverse + command: + - "solr-precreate" + - "collection1" + - "/template" + volumes: + - ./docker-dev-volumes/solr/data:/var/solr + - ./docker-dev-volumes/solr/conf:/template + + dev_smtp: + container_name: "dev_smtp" + hostname: "smtp" + image: maildev/maildev:2.0.5 + restart: on-failure + ports: + - "25:25" # smtp server + - "1080:1080" # web ui + environment: + - MAILDEV_SMTP_PORT=25 + - MAILDEV_MAIL_DIRECTORY=/mail + networks: + - dataverse + #volumes: + # - ./docker-dev-volumes/smtp/data:/mail + tmpfs: + - /mail:mode=770,size=128M,uid=1000,gid=1000 + + dev_keycloak: + container_name: "dev_keycloak" + build: + context: . 
+ dockerfile: Dockerfile + image: gdcc/keycloak + hostname: keycloak + environment: + - KEYCLOAK_ADMIN=kcadmin + - KEYCLOAK_ADMIN_PASSWORD=kcpassword + - KEYCLOAK_LOGLEVEL=DEBUG + - KC_HOSTNAME_STRICT=false + - KC_DB=postgres + - KC_DB_URL=jdbc:postgresql://postgres:5432/dataverse + - KC_DB_USERNAME=${DATAVERSE_DB_USER} + - KC_DB_PASSWORD=secret + - DATAVERSE_DB_HOST=postgres + - DATAVERSE_DB_PORT=5432 + - DATAVERSE_DB_USER=${DATAVERSE_DB_USER} + - DATAVERSE_DB_PASSWORD=secret + - DATAVERSE_BASE_URL=http://dataverse:8080 + networks: + dataverse: + aliases: + - keycloak.mydomain.com #create a DNS alias within the network (add the same alias to your /etc/hosts to get a working OIDC flow) + command: start-dev --verbose --import-realm --http-port=8090 # change port to 8090, so within the network and external the same port is used + expose: + - "9000" + ports: + - "8090:8090" + + dev_keycloak_initializer: + image: alpine:latest + container_name: "dev_keycloak_initializer" + depends_on: + - dev_keycloak + environment: + - KEYCLOAK_ADMIN=kcadmin + - KEYCLOAK_ADMIN_PASSWORD=kcpassword + volumes: + - ./setup-spi.sh:/usr/local/bin/setup-spi.sh + command: [ "/bin/sh", "-c", "apk add --no-cache curl jq && /usr/local/bin/setup-spi.sh" ] + networks: + - dataverse + + # This proxy configuration is only intended to be used for development purposes! + # DO NOT USE IN PRODUCTION! HIGH SECURITY RISK! + dev_proxy: + image: caddy:2-alpine + # The command below is enough to enable using the admin gui, but it will not rewrite location headers to HTTP. 
+ # To achieve rewriting from https:// to http://, we need a simple configuration file + #command: ["caddy", "reverse-proxy", "-f", ":4848", "-t", "https://dataverse:4848", "--insecure"] + command: ["caddy", "run", "-c", "/Caddyfile"] + ports: + - "4848:4848" # Will expose Payara Admin Console (HTTPS) as HTTP + restart: always + volumes: + - ../proxy/Caddyfile:/Caddyfile:ro + depends_on: + - dev_dataverse + networks: + - dataverse + + dev_localstack: + container_name: "dev_localstack" + hostname: "localstack" + image: localstack/localstack:2.3.2 + restart: on-failure + ports: + - "127.0.0.1:4566:4566" + environment: + - DEBUG=${DEBUG-} + - DOCKER_HOST=unix:///var/run/docker.sock + - HOSTNAME_EXTERNAL=localstack + networks: + - dataverse + volumes: + - ../localstack:/etc/localstack/init/ready.d + tmpfs: + - /localstack:mode=770,size=128M,uid=1000,gid=1000 + + dev_minio: + container_name: "dev_minio" + hostname: "minio" + image: minio/minio + restart: on-failure + ports: + - "9000:9000" + - "9001:9001" + networks: + - dataverse + volumes: + - ./docker-dev-volumes/minio_storage:/data + environment: + MINIO_ROOT_USER: 4cc355_k3y + MINIO_ROOT_PASSWORD: s3cr3t_4cc355_k3y + command: server /data + + previewers-provider: + container_name: previewers-provider + hostname: previewers-provider + image: trivadis/dataverse-previewers-provider:latest + ports: + - "9080:9080" + networks: + - dataverse + environment: + # have nginx match the port we run previewers on + - NGINX_HTTP_PORT=9080 + - PREVIEWERS_PROVIDER_URL=http://localhost:9080 + - VERSIONS="v1.4,betatest" + # https://docs.docker.com/reference/compose-file/services/#platform + # https://github.com/fabric8io/docker-maven-plugin/issues/1750 + platform: linux/amd64 + + register-previewers: + container_name: register-previewers + hostname: register-previewers + image: trivadis/dataverse-deploy-previewers:latest + networks: + - dataverse + environment: + - DATAVERSE_URL=http://dataverse:8080 + - TIMEOUT=10m + - 
PREVIEWERS_PROVIDER_URL=http://localhost:9080 + # Uncomment to specify which previewers you want. Otherwise you get all of them. + #- INCLUDE_PREVIEWERS=text,html,pdf,csv,comma-separated-values,tsv,tab-separated-values,jpeg,png,gif,markdown,x-markdown + - EXCLUDE_PREVIEWERS= + - REMOVE_EXISTING=true + command: + - deploy + restart: "no" + platform: linux/amd64 + +networks: + dataverse: + driver: bridge diff --git a/conf/keycloak/docker-compose.yml b/conf/keycloak/docker-compose.yml new file mode 100644 index 00000000000..4b5cf80daca --- /dev/null +++ b/conf/keycloak/docker-compose.yml @@ -0,0 +1,17 @@ +version: "3.9" + +services: + + keycloak: + image: 'quay.io/keycloak/keycloak:26.3.2' + command: + - "start-dev" + - "--import-realm" + environment: + - KEYCLOAK_ADMIN=kcadmin + - KEYCLOAK_ADMIN_PASSWORD=kcpassword + - KEYCLOAK_LOGLEVEL=DEBUG + ports: + - "8090:8080" + volumes: + - './test-realm.json:/opt/keycloak/data/import/test-realm.json' diff --git a/conf/keycloak/oidc-keycloak-auth-provider.json b/conf/keycloak/oidc-keycloak-auth-provider.json new file mode 100644 index 00000000000..7e01bd4c325 --- /dev/null +++ b/conf/keycloak/oidc-keycloak-auth-provider.json @@ -0,0 +1,8 @@ +{ + "id": "oidc-keycloak", + "factoryAlias": "oidc", + "title": "OIDC-Keycloak", + "subtitle": "OIDC-Keycloak", + "factoryData": "type: oidc | issuer: http://keycloak.mydomain.com:8090/realms/test | clientId: test | clientSecret: 94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8", + "enabled": true +} diff --git a/conf/keycloak/rm-keycloak.sh b/conf/keycloak/rm-keycloak.sh new file mode 100755 index 00000000000..ea29fbb37c0 --- /dev/null +++ b/conf/keycloak/rm-keycloak.sh @@ -0,0 +1,11 @@ +#!/usr/bin/env bash + +if [ "$(docker ps -aq -f name=^/keycloak$)" ]; then + if [ "$(docker ps -aq -f status=running -f name=^/keycloak$)" ]; then + docker kill keycloak + fi + docker rm keycloak + echo "INFO - Keycloak container removed" +else + echo "INFO - No Keycloak container available to remove" +fi diff --git 
a/conf/keycloak/run-keycloak.sh b/conf/keycloak/run-keycloak.sh new file mode 100755 index 00000000000..9ea34a99416 --- /dev/null +++ b/conf/keycloak/run-keycloak.sh @@ -0,0 +1,19 @@ +#!/usr/bin/env bash + +DOCKER_IMAGE="quay.io/keycloak/keycloak:26.3.2" +KEYCLOAK_ADMIN="kcadmin" +KEYCLOAK_ADMIN_PASSWORD="kcpassword" +KEYCLOAK_PORT=8090 + +if [ ! "$(docker ps -q -f name=^/keycloak$)" ]; then + if [ "$(docker ps -aq -f status=exited -f name=^/keycloak$)" ]; then + echo "INFO - An exited Keycloak container already exists, restarting..." + docker start keycloak + echo "INFO - Keycloak container restarted" + else + docker run -d --name keycloak -p $KEYCLOAK_PORT:8080 -e KEYCLOAK_ADMIN=$KEYCLOAK_ADMIN -e KEYCLOAK_ADMIN_PASSWORD=$KEYCLOAK_ADMIN_PASSWORD -v "$(pwd)"/test-realm.json:/opt/keycloak/data/import/test-realm.json $DOCKER_IMAGE start-dev --import-realm + echo "INFO - Keycloak container created and running" + else + echo "INFO - Keycloak container is already running" + fi +fi diff --git a/conf/keycloak/setup-spi.sh b/conf/keycloak/setup-spi.sh new file mode 100755 index 00000000000..12fac19fc01 --- /dev/null +++ b/conf/keycloak/setup-spi.sh @@ -0,0 +1,44 @@ +#!/bin/sh + +echo "Waiting for Keycloak to be fully up..." + +# Loop until the health check returns 200 +while true; do + RESPONSE=$(curl -s -w "\n%{http_code}" "http://keycloak:9000/health") + HTTP_BODY=$(echo "$RESPONSE" | head -n -1) # Extract response body + HTTP_CODE=$(echo "$RESPONSE" | tail -n 1) # Extract HTTP status code + + if [ "$HTTP_CODE" -eq 200 ]; then + echo "Keycloak is up! (HTTP $HTTP_CODE)" + break + else + echo "Health check failed (HTTP $HTTP_CODE). Response: $HTTP_BODY" + sleep 5 + fi +done + +echo "Keycloak is up and running! Executing SPI setup script..."
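+# (Illustrative, for orientation: the same password grant can be exercised by
+# hand from the host, assuming Keycloak is published on port 8090, e.g.
+#   curl -s -X POST http://localhost:8090/realms/master/protocol/openid-connect/token \
+#     -d "username=kcadmin" -d "password=kcpassword" \
+#     -d "grant_type=password" -d "client_id=admin-cli"
+# The response is a JSON body; jq extracts its .access_token field below.)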
+ +# Obtain admin token +ADMIN_TOKEN=$(curl -s -X POST "http://keycloak:8090/realms/master/protocol/openid-connect/token" \ + -H "Content-Type: application/x-www-form-urlencoded" \ + -d "username=$KEYCLOAK_ADMIN" \ + -d "password=$KEYCLOAK_ADMIN_PASSWORD" \ + -d "grant_type=password" \ + -d "client_id=admin-cli" | jq -r .access_token) + +# Create user storage provider using the components endpoint +curl -X POST "http://keycloak:8090/admin/realms/test/components" \ + -H "Authorization: Bearer $ADMIN_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "Dataverse built-in users authentication", + "providerId": "dv-builtin-users-authenticator", + "providerType": "org.keycloak.storage.UserStorageProvider", + "parentId": null, + "config": { + "datasource": ["user-store"] + } + }' + +echo "Keycloak SPI configured in realm." diff --git a/conf/keycloak/test-realm-include-spi.json b/conf/keycloak/test-realm-include-spi.json new file mode 100644 index 00000000000..3ab6ac2477e --- /dev/null +++ b/conf/keycloak/test-realm-include-spi.json @@ -0,0 +1,2383 @@ +{ + "id": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "realm": "test", + "displayName": "", + "displayNameHtml": "", + "notBefore": 0, + "defaultSignatureAlgorithm": "RS256", + "revokeRefreshToken": false, + "refreshTokenMaxReuse": 0, + "accessTokenLifespan": 300, + "accessTokenLifespanForImplicitFlow": 900, + "ssoSessionIdleTimeout": 1800, + "ssoSessionMaxLifespan": 36000, + "ssoSessionIdleTimeoutRememberMe": 0, + "ssoSessionMaxLifespanRememberMe": 0, + "offlineSessionIdleTimeout": 2592000, + "offlineSessionMaxLifespanEnabled": false, + "offlineSessionMaxLifespan": 5184000, + "clientSessionIdleTimeout": 0, + "clientSessionMaxLifespan": 0, + "clientOfflineSessionIdleTimeout": 0, + "clientOfflineSessionMaxLifespan": 0, + "accessCodeLifespan": 60, + "accessCodeLifespanUserAction": 300, + "accessCodeLifespanLogin": 1800, + "actionTokenGeneratedByAdminLifespan": 43200, + "actionTokenGeneratedByUserLifespan": 300, + 
"oauth2DeviceCodeLifespan": 600, + "oauth2DevicePollingInterval": 5, + "enabled": true, + "sslRequired": "none", + "registrationAllowed": false, + "registrationEmailAsUsername": false, + "rememberMe": false, + "verifyEmail": false, + "loginWithEmailAllowed": true, + "duplicateEmailsAllowed": false, + "resetPasswordAllowed": false, + "editUsernameAllowed": false, + "bruteForceProtected": false, + "permanentLockout": false, + "maxTemporaryLockouts": 0, + "bruteForceStrategy": "MULTIPLE", + "maxFailureWaitSeconds": 900, + "minimumQuickLoginWaitSeconds": 60, + "waitIncrementSeconds": 60, + "quickLoginCheckMilliSeconds": 1000, + "maxDeltaTimeSeconds": 43200, + "failureFactor": 30, + "roles": { + "realm": [ + { + "id": "075daee1-5ab2-44b5-adbf-fa49a3da8305", + "name": "uma_authorization", + "description": "${role_uma_authorization}", + "composite": false, + "clientRole": false, + "containerId": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "attributes": {} + }, + { + "id": "b4ff9091-ddf9-4536-b175-8cfa3e331d71", + "name": "default-roles-test", + "description": "${role_default-roles}", + "composite": true, + "composites": { + "realm": [ + "offline_access", + "uma_authorization" + ], + "client": { + "account": [ + "view-profile", + "manage-account" + ] + } + }, + "clientRole": false, + "containerId": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "attributes": {} + }, + { + "id": "e6d31555-6be6-4dee-bc6a-40a53108e4c2", + "name": "offline_access", + "description": "${role_offline-access}", + "composite": false, + "clientRole": false, + "containerId": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "attributes": {} + } + ], + "client": { + "realm-management": [ + { + "id": "1955bd12-5f86-4a74-b130-d68a8ef6f0ee", + "name": "impersonation", + "description": "${role_impersonation}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "1109c350-9ab1-426c-9876-ef67d4310f35", + "name": "view-authorization", + 
"description": "${role_view-authorization}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "980c3fd3-1ae3-4b8f-9a00-d764c939035f", + "name": "query-users", + "description": "${role_query-users}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "5363e601-0f9d-4633-a8c8-28cb0f859b7b", + "name": "query-groups", + "description": "${role_query-groups}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "59aa7992-ad78-48db-868a-25d6e1d7db50", + "name": "realm-admin", + "description": "${role_realm-admin}", + "composite": true, + "composites": { + "client": { + "realm-management": [ + "impersonation", + "view-authorization", + "query-users", + "query-groups", + "manage-clients", + "manage-realm", + "view-identity-providers", + "query-realms", + "manage-authorization", + "manage-identity-providers", + "manage-users", + "view-users", + "view-realm", + "create-client", + "view-clients", + "manage-events", + "query-clients", + "view-events" + ] + } + }, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "112f53c2-897d-4c01-81db-b8dc10c5b995", + "name": "manage-clients", + "description": "${role_manage-clients}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "c7f57bbd-ef32-4a64-9888-7b8abd90777a", + "name": "manage-realm", + "description": "${role_manage-realm}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "8885dac8-0af3-45af-94ce-eff5e801bb80", + "name": "view-identity-providers", + "description": "${role_view-identity-providers}", + "composite": false, + "clientRole": 
true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "2673346c-b0ef-4e01-8a90-be03866093af", + "name": "manage-authorization", + "description": "${role_manage-authorization}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "b7182885-9e57-445f-8dae-17c16eb31b5d", + "name": "manage-identity-providers", + "description": "${role_manage-identity-providers}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "ba7bfe0c-cb07-4a47-b92c-b8132b57e181", + "name": "manage-users", + "description": "${role_manage-users}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "13a8f0fc-647d-4bfe-b525-73956898e550", + "name": "query-realms", + "description": "${role_query-realms}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "ef4c57dc-78c2-4f9a-8d2b-0e97d46fc842", + "name": "view-realm", + "description": "${role_view-realm}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "2875da34-006c-4b7f-bfc8-9ae8e46af3a2", + "name": "view-users", + "description": "${role_view-users}", + "composite": true, + "composites": { + "client": { + "realm-management": [ + "query-users", + "query-groups" + ] + } + }, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "c8c8f7dc-876b-4263-806f-3329f7cd5fd3", + "name": "create-client", + "description": "${role_create-client}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "21b84f90-5a9a-4845-a7ba-bbd98ac0fcc4", + "name": 
"view-clients", + "description": "${role_view-clients}", + "composite": true, + "composites": { + "client": { + "realm-management": [ + "query-clients" + ] + } + }, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "6fd64c94-d663-4501-ad77-0dcf8887d434", + "name": "manage-events", + "description": "${role_manage-events}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "b321927a-023c-4d2a-99ad-24baf7ff6d83", + "name": "query-clients", + "description": "${role_query-clients}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "2fc21160-78de-457b-8594-e5c76cde1d5e", + "name": "view-events", + "description": "${role_view-events}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + } + ], + "test": [], + "security-admin-console": [], + "admin-cli": [], + "account-console": [], + "broker": [ + { + "id": "07ee59b5-dca6-48fb-83d4-2994ef02850e", + "name": "read-token", + "description": "${role_read-token}", + "composite": false, + "clientRole": true, + "containerId": "b57d62bb-77ff-42bd-b8ff-381c7288f327", + "attributes": {} + } + ], + "account": [ + { + "id": "17d2f811-7bdf-4c73-83b4-1037001797b8", + "name": "view-applications", + "description": "${role_view-applications}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "f4999a71-d4c3-4d7c-afb0-ba522e457bd3", + "name": "view-groups", + "description": "${role_view-groups}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "d1ff44f9-419e-42fd-98e8-1add1169a972", + "name": "delete-account", + "description": "${role_delete-account}", + 
"composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "14c23a18-ae2d-43c9-b0c0-aaf6e0c7f5b0", + "name": "manage-account-links", + "description": "${role_manage-account-links}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "6fbe58af-d2fe-4d66-95fe-a2e8a818cb55", + "name": "view-profile", + "description": "${role_view-profile}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "bdfd02bc-6f6a-47d2-82bc-0ca52d78ff48", + "name": "manage-consent", + "description": "${role_manage-consent}", + "composite": true, + "composites": { + "client": { + "account": [ + "view-consent" + ] + } + }, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "782f3b0c-a17b-4a87-988b-1a711401f3b0", + "name": "manage-account", + "description": "${role_manage-account}", + "composite": true, + "composites": { + "client": { + "account": [ + "manage-account-links" + ] + } + }, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "8a3bfe15-66d9-4f3d-83ac-801d682d42b0", + "name": "view-consent", + "description": "${role_view-consent}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + } + ] + } + }, + "groups": [ + { + "id": "d46f94c2-3b47-4288-b937-9cf918e54f0a", + "name": "admins", + "path": "/admins", + "subGroups": [], + "attributes": {}, + "realmRoles": [], + "clientRoles": {} + }, + { + "id": "e992ce15-baac-48a0-8834-06f6fcf6c05b", + "name": "curators", + "path": "/curators", + "subGroups": [], + "attributes": {}, + "realmRoles": [], + "clientRoles": {} + }, + { + "id": "531cf81d-a700-4336-808f-37a49709b48c", + "name": "members", + "path": 
"/members", + "subGroups": [], + "attributes": {}, + "realmRoles": [], + "clientRoles": {} + } + ], + "defaultRole": { + "id": "b4ff9091-ddf9-4536-b175-8cfa3e331d71", + "name": "default-roles-test", + "description": "${role_default-roles}", + "composite": true, + "clientRole": false, + "containerId": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983" + }, + "requiredCredentials": [ + "password" + ], + "otpPolicyType": "totp", + "otpPolicyAlgorithm": "HmacSHA1", + "otpPolicyInitialCounter": 0, + "otpPolicyDigits": 6, + "otpPolicyLookAheadWindow": 1, + "otpPolicyPeriod": 30, + "otpPolicyCodeReusable": false, + "otpSupportedApplications": [ + "totpAppFreeOTPName", + "totpAppGoogleName", + "totpAppMicrosoftAuthenticatorName" + ], + "localizationTexts": {}, + "webAuthnPolicyRpEntityName": "keycloak", + "webAuthnPolicySignatureAlgorithms": [ + "ES256" + ], + "webAuthnPolicyRpId": "", + "webAuthnPolicyAttestationConveyancePreference": "not specified", + "webAuthnPolicyAuthenticatorAttachment": "not specified", + "webAuthnPolicyRequireResidentKey": "not specified", + "webAuthnPolicyUserVerificationRequirement": "not specified", + "webAuthnPolicyCreateTimeout": 0, + "webAuthnPolicyAvoidSameAuthenticatorRegister": false, + "webAuthnPolicyAcceptableAaguids": [], + "webAuthnPolicyExtraOrigins": [], + "webAuthnPolicyPasswordlessRpEntityName": "keycloak", + "webAuthnPolicyPasswordlessSignatureAlgorithms": [ + "ES256" + ], + "webAuthnPolicyPasswordlessRpId": "", + "webAuthnPolicyPasswordlessAttestationConveyancePreference": "not specified", + "webAuthnPolicyPasswordlessAuthenticatorAttachment": "not specified", + "webAuthnPolicyPasswordlessRequireResidentKey": "not specified", + "webAuthnPolicyPasswordlessUserVerificationRequirement": "not specified", + "webAuthnPolicyPasswordlessCreateTimeout": 0, + "webAuthnPolicyPasswordlessAvoidSameAuthenticatorRegister": false, + "webAuthnPolicyPasswordlessAcceptableAaguids": [], + "webAuthnPolicyPasswordlessExtraOrigins": [], + "scopeMappings": [ + { + 
"clientScope": "offline_access", + "roles": [ + "offline_access" + ] + } + ], + "clientScopeMappings": { + "account": [ + { + "client": "account-console", + "roles": [ + "manage-account", + "view-groups" + ] + } + ] + }, + "clients": [ + { + "id": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "clientId": "account", + "name": "${client_account}", + "rootUrl": "${authBaseUrl}", + "baseUrl": "/realms/test/account/", + "surrogateAuthRequired": false, + "enabled": true, + "alwaysDisplayInConsole": false, + "clientAuthenticatorType": "client-secret", + "redirectUris": [ + "/realms/test/account/*" + ], + "webOrigins": [], + "notBefore": 0, + "bearerOnly": false, + "consentRequired": false, + "standardFlowEnabled": true, + "implicitFlowEnabled": false, + "directAccessGrantsEnabled": false, + "serviceAccountsEnabled": false, + "publicClient": true, + "frontchannelLogout": false, + "protocol": "openid-connect", + "attributes": { + "realm_client": "false", + "post.logout.redirect.uris": "+" + }, + "authenticationFlowBindingOverrides": {}, + "fullScopeAllowed": false, + "nodeReRegistrationTimeout": 0, + "defaultClientScopes": [ + "web-origins", + "acr", + "roles", + "profile", + "basic", + "email" + ], + "optionalClientScopes": [ + "address", + "phone", + "offline_access", + "microprofile-jwt" + ] + }, + { + "id": "5d99f721-027c-478d-867d-61114e0a8192", + "clientId": "account-console", + "name": "${client_account-console}", + "rootUrl": "${authBaseUrl}", + "baseUrl": "/realms/test/account/", + "surrogateAuthRequired": false, + "enabled": true, + "alwaysDisplayInConsole": false, + "clientAuthenticatorType": "client-secret", + "redirectUris": [ + "/realms/test/account/*" + ], + "webOrigins": [], + "notBefore": 0, + "bearerOnly": false, + "consentRequired": false, + "standardFlowEnabled": true, + "implicitFlowEnabled": false, + "directAccessGrantsEnabled": false, + "serviceAccountsEnabled": false, + "publicClient": true, + "frontchannelLogout": false, + "protocol": "openid-connect", 
+ "attributes": { + "realm_client": "false", + "post.logout.redirect.uris": "+", + "pkce.code.challenge.method": "S256" + }, + "authenticationFlowBindingOverrides": {}, + "fullScopeAllowed": false, + "nodeReRegistrationTimeout": 0, + "protocolMappers": [ + { + "id": "e181a0ce-9a04-4468-a38a-aaef9f78f989", + "name": "audience resolve", + "protocol": "openid-connect", + "protocolMapper": "oidc-audience-resolve-mapper", + "consentRequired": false, + "config": {} + } + ], + "defaultClientScopes": [ + "web-origins", + "acr", + "roles", + "profile", + "basic", + "email" + ], + "optionalClientScopes": [ + "address", + "phone", + "offline_access", + "microprofile-jwt" + ] + }, + { + "id": "5eccc178-121e-4d0f-bcb2-04ae3c2e52ed", + "clientId": "admin-cli", + "name": "${client_admin-cli}", + "surrogateAuthRequired": false, + "enabled": true, + "alwaysDisplayInConsole": false, + "clientAuthenticatorType": "client-secret", + "redirectUris": [], + "webOrigins": [], + "notBefore": 0, + "bearerOnly": false, + "consentRequired": false, + "standardFlowEnabled": false, + "implicitFlowEnabled": false, + "directAccessGrantsEnabled": true, + "serviceAccountsEnabled": false, + "publicClient": true, + "frontchannelLogout": false, + "protocol": "openid-connect", + "attributes": { + "realm_client": "false", + "client.use.lightweight.access.token.enabled": "true", + "post.logout.redirect.uris": "+" + }, + "authenticationFlowBindingOverrides": {}, + "fullScopeAllowed": true, + "nodeReRegistrationTimeout": 0, + "defaultClientScopes": [ + "web-origins", + "acr", + "roles", + "profile", + "basic", + "email" + ], + "optionalClientScopes": [ + "address", + "phone", + "offline_access", + "microprofile-jwt" + ] + }, + { + "id": "b57d62bb-77ff-42bd-b8ff-381c7288f327", + "clientId": "broker", + "name": "${client_broker}", + "surrogateAuthRequired": false, + "enabled": true, + "alwaysDisplayInConsole": false, + "clientAuthenticatorType": "client-secret", + "redirectUris": [], + "webOrigins": [], + 
"notBefore": 0, + "bearerOnly": true, + "consentRequired": false, + "standardFlowEnabled": true, + "implicitFlowEnabled": false, + "directAccessGrantsEnabled": false, + "serviceAccountsEnabled": false, + "publicClient": false, + "frontchannelLogout": false, + "protocol": "openid-connect", + "attributes": { + "realm_client": "true", + "post.logout.redirect.uris": "+" + }, + "authenticationFlowBindingOverrides": {}, + "fullScopeAllowed": false, + "nodeReRegistrationTimeout": 0, + "defaultClientScopes": [ + "web-origins", + "acr", + "roles", + "profile", + "email" + ], + "optionalClientScopes": [ + "address", + "phone", + "offline_access", + "microprofile-jwt" + ] + }, + { + "id": "dada0ae8-ee9f-415a-9685-42da7c563660", + "clientId": "realm-management", + "name": "${client_realm-management}", + "surrogateAuthRequired": false, + "enabled": true, + "alwaysDisplayInConsole": false, + "clientAuthenticatorType": "client-secret", + "redirectUris": [], + "webOrigins": [], + "notBefore": 0, + "bearerOnly": true, + "consentRequired": false, + "standardFlowEnabled": true, + "implicitFlowEnabled": false, + "directAccessGrantsEnabled": false, + "serviceAccountsEnabled": false, + "publicClient": false, + "frontchannelLogout": false, + "protocol": "openid-connect", + "attributes": { + "realm_client": "true", + "post.logout.redirect.uris": "+" + }, + "authenticationFlowBindingOverrides": {}, + "fullScopeAllowed": false, + "nodeReRegistrationTimeout": 0, + "defaultClientScopes": [ + "web-origins", + "acr", + "roles", + "profile", + "email" + ], + "optionalClientScopes": [ + "address", + "phone", + "offline_access", + "microprofile-jwt" + ] + }, + { + "id": "bf7cf550-3875-4f97-9878-b2419a854058", + "clientId": "security-admin-console", + "name": "${client_security-admin-console}", + "rootUrl": "${authAdminUrl}", + "baseUrl": "/admin/test/console/", + "surrogateAuthRequired": false, + "enabled": true, + "alwaysDisplayInConsole": false, + "clientAuthenticatorType": "client-secret", + 
"redirectUris": [ + "/admin/test/console/*" + ], + "webOrigins": [ + "+" + ], + "notBefore": 0, + "bearerOnly": false, + "consentRequired": false, + "standardFlowEnabled": true, + "implicitFlowEnabled": false, + "directAccessGrantsEnabled": false, + "serviceAccountsEnabled": false, + "publicClient": true, + "frontchannelLogout": false, + "protocol": "openid-connect", + "attributes": { + "realm_client": "false", + "client.use.lightweight.access.token.enabled": "true", + "post.logout.redirect.uris": "+", + "pkce.code.challenge.method": "S256" + }, + "authenticationFlowBindingOverrides": {}, + "fullScopeAllowed": true, + "nodeReRegistrationTimeout": 0, + "protocolMappers": [ + { + "id": "ff845e16-e200-4894-ab51-37d8b9f2a445", + "name": "locale", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "locale", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "locale", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + } + ], + "defaultClientScopes": [ + "web-origins", + "acr", + "roles", + "profile", + "basic", + "email" + ], + "optionalClientScopes": [ + "address", + "phone", + "offline_access", + "microprofile-jwt" + ] + }, + { + "id": "9c27faa8-4b8d-4ad9-9cd1-880032ef06aa", + "clientId": "test", + "name": "A Test Client", + "description": "Use for hacking and testing away a confidential client", + "rootUrl": "", + "adminUrl": "", + "baseUrl": "", + "surrogateAuthRequired": false, + "enabled": true, + "secret": "94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8", + "alwaysDisplayInConsole": false, + "clientAuthenticatorType": "client-secret", + "redirectUris": [ + "*" + ], + "webOrigins": [], + "notBefore": 0, + "bearerOnly": false, + "consentRequired": false, + "standardFlowEnabled": true, + "implicitFlowEnabled": false, + "directAccessGrantsEnabled": true, + "serviceAccountsEnabled": false, + "publicClient": true, + "frontchannelLogout": true, + 
"protocol": "openid-connect", + "attributes": { + "realm_client": "false", + "oidc.ciba.grant.enabled": "false", + "client.secret.creation.time": "1684735831", + "backchannel.logout.session.required": "true", + "post.logout.redirect.uris": "+", + "display.on.consent.screen": "false", + "oauth2.device.authorization.grant.enabled": "false", + "backchannel.logout.revoke.offline.tokens": "false" + }, + "authenticationFlowBindingOverrides": {}, + "fullScopeAllowed": true, + "nodeReRegistrationTimeout": -1, + "protocolMappers": [ + { + "id": "480de80c-656e-401c-b146-f777db45340b", + "name": "idp", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "introspection.token.claim": "true", + "userinfo.token.claim": "true", + "user.attribute": "idp", + "id.token.claim": "true", + "lightweight.claim": "false", + "access.token.claim": "true", + "claim.name": "idp", + "jsonType.label": "String" + } + } + ], + "defaultClientScopes": [ + "web-origins", + "acr", + "roles", + "profile", + "basic", + "email" + ], + "optionalClientScopes": [ + "address", + "phone", + "offline_access", + "microprofile-jwt" + ] + } + ], + "clientScopes": [ + { + "id": "72f29e57-92fa-437b-828c-2b9d6fe56192", + "name": "address", + "description": "OpenID Connect built-in scope: address", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "true", + "consent.screen.text": "${addressScopeConsentText}", + "display.on.consent.screen": "true" + }, + "protocolMappers": [ + { + "id": "59581aea-70d6-4ee8-bec2-1fea5fc497ae", + "name": "address", + "protocol": "openid-connect", + "protocolMapper": "oidc-address-mapper", + "consentRequired": false, + "config": { + "user.attribute.formatted": "formatted", + "user.attribute.country": "country", + "user.attribute.postal_code": "postal_code", + "userinfo.token.claim": "true", + "user.attribute.street": "street", + "id.token.claim": "true", + "user.attribute.region": 
"region", + "access.token.claim": "true", + "user.attribute.locality": "locality" + } + } + ] + }, + { + "id": "0f4cc9ba-5a00-429f-b90c-611734b324bc", + "name": "service_account", + "description": "Specific scope for a client enabled for service accounts", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "false", + "display.on.consent.screen": "false" + }, + "protocolMappers": [ + { + "id": "35d9b373-4794-4306-b264-336dfc37e726", + "name": "Client ID", + "protocol": "openid-connect", + "protocolMapper": "oidc-usersessionmodel-note-mapper", + "consentRequired": false, + "config": { + "user.session.note": "client_id", + "id.token.claim": "true", + "introspection.token.claim": "true", + "access.token.claim": "true", + "claim.name": "client_id", + "jsonType.label": "String" + } + }, + { + "id": "4be0bf73-e37f-447d-9c65-caec81826c9b", + "name": "Client Host", + "protocol": "openid-connect", + "protocolMapper": "oidc-usersessionmodel-note-mapper", + "consentRequired": false, + "config": { + "user.session.note": "clientHost", + "id.token.claim": "true", + "introspection.token.claim": "true", + "access.token.claim": "true", + "claim.name": "clientHost", + "jsonType.label": "String" + } + }, + { + "id": "179dc118-75cf-408d-9369-49bbde8557d3", + "name": "Client IP Address", + "protocol": "openid-connect", + "protocolMapper": "oidc-usersessionmodel-note-mapper", + "consentRequired": false, + "config": { + "user.session.note": "clientAddress", + "id.token.claim": "true", + "introspection.token.claim": "true", + "access.token.claim": "true", + "claim.name": "clientAddress", + "jsonType.label": "String" + } + } + ] + }, + { + "id": "33f9a8b9-c57b-4d08-ad64-f78649d6ea55", + "name": "basic", + "description": "OpenID Connect scope for add all basic claims to the token", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "false", + "display.on.consent.screen": "false" + }, + "protocolMappers": [ + { + "id": 
"f033292b-ae53-4ebd-a6b7-e51ee324a2a1", + "name": "auth_time", + "protocol": "openid-connect", + "protocolMapper": "oidc-usersessionmodel-note-mapper", + "consentRequired": false, + "config": { + "user.session.note": "AUTH_TIME", + "id.token.claim": "true", + "introspection.token.claim": "true", + "access.token.claim": "true", + "claim.name": "auth_time", + "jsonType.label": "long" + } + }, + { + "id": "12d4bacb-cac0-4cae-b033-22de18adcdd2", + "name": "sub", + "protocol": "openid-connect", + "protocolMapper": "oidc-sub-mapper", + "consentRequired": false, + "config": { + "introspection.token.claim": "true", + "access.token.claim": "true" + } + } + ] + }, + { + "id": "f515ec81-3c1b-4d4d-b7a2-e7e8d47b6447", + "name": "roles", + "description": "OpenID Connect scope for add user roles to the access token", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "false", + "consent.screen.text": "${rolesScopeConsentText}", + "display.on.consent.screen": "true" + }, + "protocolMappers": [ + { + "id": "26d299a8-69e2-4864-9595-17a5b417fc61", + "name": "realm roles", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-realm-role-mapper", + "consentRequired": false, + "config": { + "user.attribute": "foo", + "access.token.claim": "true", + "claim.name": "realm_access.roles", + "jsonType.label": "String", + "multivalued": "true" + } + }, + { + "id": "d2998083-a8db-4f4e-9aaa-9cad68d65b97", + "name": "audience resolve", + "protocol": "openid-connect", + "protocolMapper": "oidc-audience-resolve-mapper", + "consentRequired": false, + "config": {} + }, + { + "id": "7a4cb2e5-07a0-4c16-a024-71df7ddd6868", + "name": "client roles", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-client-role-mapper", + "consentRequired": false, + "config": { + "user.attribute": "foo", + "access.token.claim": "true", + "claim.name": "resource_access.${client_id}.roles", + "jsonType.label": "String", + "multivalued": "true" + } + } + ] + }, + { + 
"id": "8f1eafef-92d6-434e-b9ec-6edec1fddd0a", + "name": "offline_access", + "description": "OpenID Connect built-in scope: offline_access", + "protocol": "openid-connect", + "attributes": { + "consent.screen.text": "${offlineAccessScopeConsentText}", + "display.on.consent.screen": "true" + } + }, + { + "id": "c03095aa-b656-447a-9767-0763c2ccb070", + "name": "acr", + "description": "OpenID Connect scope for add acr (authentication context class reference) to the token", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "false", + "display.on.consent.screen": "false" + }, + "protocolMappers": [ + { + "id": "948b230c-56d0-4000-937c-841cd395d3f9", + "name": "acr loa level", + "protocol": "openid-connect", + "protocolMapper": "oidc-acr-mapper", + "consentRequired": false, + "config": { + "id.token.claim": "true", + "access.token.claim": "true", + "userinfo.token.claim": "true" + } + } + ] + }, + { + "id": "cdf35f63-8ec7-41a0-ae12-f05d415818cc", + "name": "phone", + "description": "OpenID Connect built-in scope: phone", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "true", + "consent.screen.text": "${phoneScopeConsentText}", + "display.on.consent.screen": "true" + }, + "protocolMappers": [ + { + "id": "ba4348ff-90b1-4e09-89a8-e5c08b04d3d1", + "name": "phone number", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "phoneNumber", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "phone_number", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "e6cceae5-8392-4348-b302-f610ece6056e", + "name": "phone number verified", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "phoneNumberVerified", + "id.token.claim": "true", + "access.token.claim": "true", + 
"claim.name": "phone_number_verified", + "jsonType.label": "boolean", + "userinfo.token.claim": "true" + } + } + ] + }, + { + "id": "4318001c-2970-41d3-91b9-e31c08569872", + "name": "email", + "description": "OpenID Connect built-in scope: email", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "true", + "consent.screen.text": "${emailScopeConsentText}", + "display.on.consent.screen": "true" + }, + "protocolMappers": [ + { + "id": "406d02a6-866a-4962-8838-e8c58ada1505", + "name": "email", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-property-mapper", + "consentRequired": false, + "config": { + "user.attribute": "email", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "email", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "33baabc1-9bf2-42e4-8b8e-a53c13f0b744", + "name": "email verified", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-property-mapper", + "consentRequired": false, + "config": { + "user.attribute": "emailVerified", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "email_verified", + "jsonType.label": "boolean", + "userinfo.token.claim": "true" + } + } + ] + }, + { + "id": "5277a84f-d727-4c64-8432-d513127beee1", + "name": "profile", + "description": "OpenID Connect built-in scope: profile", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "true", + "consent.screen.text": "${profileScopeConsentText}", + "display.on.consent.screen": "true" + }, + "protocolMappers": [ + { + "id": "0a609875-2678-4056-93ef-dd5c03e6059d", + "name": "given name", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-property-mapper", + "consentRequired": false, + "config": { + "user.attribute": "firstName", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "given_name", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { 
+ "id": "7c510d18-07ee-4b78-8acd-24b777d11b3c", + "name": "website", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "website", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "website", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "0bb6d0ea-195f-49e8-918c-c419a26a661c", + "name": "username", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-property-mapper", + "consentRequired": false, + "config": { + "user.attribute": "username", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "preferred_username", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "5f1e644c-1acf-440c-b1a6-b5f65bcebfd9", + "name": "profile", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "profile", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "profile", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "c710bdb2-6cfd-4f60-9c4e-730188fc62f7", + "name": "family name", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-property-mapper", + "consentRequired": false, + "config": { + "user.attribute": "lastName", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "family_name", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "012d5038-0e13-42ba-9df7-2487c8e2eead", + "name": "nickname", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "nickname", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "nickname", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": 
"21590b19-517d-4b6d-92f6-d4f71238677e", + "name": "updated at", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "updatedAt", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "updated_at", + "jsonType.label": "long", + "userinfo.token.claim": "true" + } + }, + { + "id": "e4cddca7-1360-42f3-9854-da6cbe00c71e", + "name": "birthdate", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "birthdate", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "birthdate", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "afee328f-c64c-43e6-80d0-be2721c2ed0e", + "name": "locale", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "locale", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "locale", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "780a1e2c-5b63-46f4-a5bf-dc3fd8ce0cbb", + "name": "full name", + "protocol": "openid-connect", + "protocolMapper": "oidc-full-name-mapper", + "consentRequired": false, + "config": { + "id.token.claim": "true", + "access.token.claim": "true", + "userinfo.token.claim": "true" + } + }, + { + "id": "aeebffff-f776-427e-83ed-064707ffce57", + "name": "zoneinfo", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "zoneinfo", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "zoneinfo", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "b3e840a2-1794-4da1-bf69-31905cbff0d6", + "name": "middle name", + "protocol": "openid-connect", + "protocolMapper": 
"oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "middleName", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "middle_name", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "0607e0e4-4f7f-4214-996d-3599772ce1c7", + "name": "picture", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "picture", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "picture", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "426a609b-4e28-4132-af0d-13297b8cb63a", + "name": "gender", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-attribute-mapper", + "consentRequired": false, + "config": { + "user.attribute": "gender", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "gender", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + } + ] + }, + { + "id": "a1ebde82-ce21-438f-a3ad-261d3eeb1c01", + "name": "role_list", + "description": "SAML role list", + "protocol": "saml", + "attributes": { + "consent.screen.text": "${samlRoleListScopeConsentText}", + "display.on.consent.screen": "true" + }, + "protocolMappers": [ + { + "id": "64653ac7-7ffc-4f7c-a589-03e3b68bbd25", + "name": "role list", + "protocol": "saml", + "protocolMapper": "saml-role-list-mapper", + "consentRequired": false, + "config": { + "single": "false", + "attribute.nameformat": "Basic", + "attribute.name": "Role" + } + } + ] + }, + { + "id": "aeb5b852-dfec-4e67-9d9e-104abe9b3bf2", + "name": "web-origins", + "description": "OpenID Connect scope for add allowed web origins to the access token", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "false", + "consent.screen.text": "", + "display.on.consent.screen": "false" + }, + "protocolMappers": [ + { + "id": 
"e2fa8437-a0f1-46fc-af9c-c40fc09cd6a1", + "name": "allowed web origins", + "protocol": "openid-connect", + "protocolMapper": "oidc-allowed-origins-mapper", + "consentRequired": false, + "config": {} + } + ] + }, + { + "id": "4fecd0d7-d4ad-457e-90f2-c7202bf01ff5", + "name": "microprofile-jwt", + "description": "Microprofile - JWT built-in scope", + "protocol": "openid-connect", + "attributes": { + "include.in.token.scope": "true", + "display.on.consent.screen": "false" + }, + "protocolMappers": [ + { + "id": "a9536634-a9f6-4ed5-a8e7-8379d3b002ca", + "name": "upn", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-property-mapper", + "consentRequired": false, + "config": { + "user.attribute": "username", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "upn", + "jsonType.label": "String", + "userinfo.token.claim": "true" + } + }, + { + "id": "2ce1a702-9458-4926-9b8a-f82c07215755", + "name": "groups", + "protocol": "openid-connect", + "protocolMapper": "oidc-usermodel-realm-role-mapper", + "consentRequired": false, + "config": { + "multivalued": "true", + "userinfo.token.claim": "true", + "user.attribute": "foo", + "id.token.claim": "true", + "access.token.claim": "true", + "claim.name": "groups", + "jsonType.label": "String" + } + } + ] + } + ], + "defaultDefaultClientScopes": [ + "role_list", + "profile", + "email", + "roles", + "web-origins", + "acr", + "basic" + ], + "defaultOptionalClientScopes": [ + "offline_access", + "address", + "phone", + "microprofile-jwt" + ], + "browserSecurityHeaders": { + "contentSecurityPolicyReportOnly": "", + "xContentTypeOptions": "nosniff", + "referrerPolicy": "no-referrer", + "xRobotsTag": "none", + "xFrameOptions": "SAMEORIGIN", + "contentSecurityPolicy": "frame-src 'self'; frame-ancestors 'self'; object-src 'none';", + "xXSSProtection": "1; mode=block", + "strictTransportSecurity": "max-age=31536000; includeSubDomains" + }, + "smtpServer": {}, + "eventsEnabled": false, + 
"eventsListeners": [ + "jboss-logging" + ], + "enabledEventTypes": [], + "adminEventsEnabled": false, + "adminEventsDetailsEnabled": false, + "identityProviders": [], + "identityProviderMappers": [], + "components": { + "org.keycloak.services.clientregistration.policy.ClientRegistrationPolicy": [ + { + "id": "8115796f-8f1f-4d6a-88f8-ca2938451260", + "name": "Allowed Client Scopes", + "providerId": "allowed-client-templates", + "subType": "authenticated", + "subComponents": {}, + "config": { + "allow-default-scopes": [ + "true" + ] + } + }, + { + "id": "044bd055-714d-478e-aa93-303d2161c427", + "name": "Allowed Protocol Mapper Types", + "providerId": "allowed-protocol-mappers", + "subType": "authenticated", + "subComponents": {}, + "config": { + "allowed-protocol-mapper-types": [ + "saml-user-property-mapper", + "oidc-full-name-mapper", + "oidc-address-mapper", + "saml-user-attribute-mapper", + "saml-role-list-mapper", + "oidc-sha256-pairwise-sub-mapper", + "oidc-usermodel-property-mapper", + "oidc-usermodel-attribute-mapper" + ] + } + }, + { + "id": "be465734-3b0f-4370-a144-73db756e23f8", + "name": "Allowed Protocol Mapper Types", + "providerId": "allowed-protocol-mappers", + "subType": "anonymous", + "subComponents": {}, + "config": { + "allowed-protocol-mapper-types": [ + "saml-user-attribute-mapper", + "oidc-address-mapper", + "oidc-full-name-mapper", + "oidc-usermodel-property-mapper", + "oidc-sha256-pairwise-sub-mapper", + "saml-role-list-mapper", + "saml-user-property-mapper", + "oidc-usermodel-attribute-mapper" + ] + } + }, + { + "id": "42a2f64d-ac9e-4221-9cf6-40ff8c868629", + "name": "Trusted Hosts", + "providerId": "trusted-hosts", + "subType": "anonymous", + "subComponents": {}, + "config": { + "host-sending-registration-request-must-match": [ + "true" + ], + "client-uris-must-match": [ + "true" + ] + } + }, + { + "id": "7ca08915-6c33-454c-88f2-20e1d6553b26", + "name": "Max Clients Limit", + "providerId": "max-clients", + "subType": "anonymous", + 
"subComponents": {}, + "config": { + "max-clients": [ + "200" + ] + } + }, + { + "id": "f01f2b6f-3f01-4d01-b2f4-70577c6f599c", + "name": "Allowed Client Scopes", + "providerId": "allowed-client-templates", + "subType": "anonymous", + "subComponents": {}, + "config": { + "allow-default-scopes": [ + "true" + ] + } + }, + { + "id": "516d7f21-f21a-4690-831e-36ad313093b2", + "name": "Consent Required", + "providerId": "consent-required", + "subType": "anonymous", + "subComponents": {}, + "config": {} + }, + { + "id": "c79df6a0-d4d8-4866-b9e6-8ddb5d1bd38e", + "name": "Full Scope Disabled", + "providerId": "scope", + "subType": "anonymous", + "subComponents": {}, + "config": {} + } + ], + "org.keycloak.storage.UserStorageProvider": [ + { + "id": "6b14f48d-4b83-4d6e-bd8d-dc7541124927", + "name": "Dataverse built-in users authentication", + "providerId": "dv-builtin-users-authenticator", + "subComponents": {}, + "config": { + "datasource": [ + "user-store" + ] + } + } + ], + "org.keycloak.userprofile.UserProfileProvider": [ + { + "id": "cf47a21f-c8fb-42f2-9bff-feca967db183", + "providerId": "declarative-user-profile", + "subComponents": {}, + "config": { + "kc.user.profile.config": [ + 
"{\"attributes\":[{\"name\":\"username\",\"displayName\":\"${username}\",\"validations\":{\"length\":{\"min\":3,\"max\":255},\"username-prohibited-characters\":{},\"up-username-not-idn-homograph\":{}},\"permissions\":{\"view\":[\"admin\",\"user\"],\"edit\":[\"admin\",\"user\"]},\"multivalued\":false},{\"name\":\"email\",\"displayName\":\"${email}\",\"validations\":{\"email\":{},\"length\":{\"max\":255}},\"required\":{\"roles\":[\"user\"]},\"permissions\":{\"view\":[\"admin\",\"user\"],\"edit\":[\"admin\",\"user\"]},\"multivalued\":false},{\"name\":\"firstName\",\"displayName\":\"${firstName}\",\"validations\":{\"length\":{\"max\":255},\"person-name-prohibited-characters\":{}},\"required\":{\"roles\":[\"user\"]},\"permissions\":{\"view\":[\"admin\",\"user\"],\"edit\":[\"admin\",\"user\"]},\"multivalued\":false},{\"name\":\"lastName\",\"displayName\":\"${lastName}\",\"validations\":{\"length\":{\"max\":255},\"person-name-prohibited-characters\":{}},\"required\":{\"roles\":[\"user\"]},\"permissions\":{\"view\":[\"admin\",\"user\"],\"edit\":[\"admin\",\"user\"]},\"multivalued\":false}],\"groups\":[{\"name\":\"user-metadata\",\"displayHeader\":\"User metadata\",\"displayDescription\":\"Attributes, which refer to user metadata\"}],\"unmanagedAttributePolicy\":\"ENABLED\"}" + ] + } + } + ], + "org.keycloak.keys.KeyProvider": [ + { + "id": "6b4a2281-a9e8-43ab-aee7-190ae91b2842", + "name": "aes-generated", + "providerId": "aes-generated", + "subComponents": {}, + "config": { + "priority": [ + "100" + ] + } + }, + { + "id": "68e2d2b0-4976-480f-ab76-f84a17686b05", + "name": "rsa-enc-generated", + "providerId": "rsa-enc-generated", + "subComponents": {}, + "config": { + "priority": [ + "100" + ], + "algorithm": [ + "RSA-OAEP" + ] + } + }, + { + "id": "728769a3-99a4-4cca-959d-28181dfee7e8", + "name": "rsa-generated", + "providerId": "rsa-generated", + "subComponents": {}, + "config": { + "priority": [ + "100" + ] + } + }, + { + "id": "33274da1-1e5b-4190-bda4-5912dfde073b", + 
"name": "hmac-generated-hs512", + "providerId": "hmac-generated", + "subComponents": {}, + "config": { + "priority": [ + "100" + ], + "algorithm": [ + "HS512" + ] + } + }, + { + "id": "f30af2d2-d042-43b8-bc6d-22f6bab6934c", + "name": "hmac-generated", + "providerId": "hmac-generated", + "subComponents": {}, + "config": { + "priority": [ + "100" + ], + "algorithm": [ + "HS256" + ] + } + } + ] + }, + "internationalizationEnabled": false, + "authenticationFlows": [ + { + "id": "94c65ba1-ba50-4be2-94c4-de656145eb67", + "alias": "Account verification options", + "description": "Method with which to verity the existing account", + "providerId": "basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "idp-email-verification", + "authenticatorFlow": false, + "requirement": "ALTERNATIVE", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "ALTERNATIVE", + "priority": 20, + "autheticatorFlow": true, + "flowAlias": "Verify Existing Account by Re-authentication", + "userSetupAllowed": false + } + ] + }, + { + "id": "9ea0b8f6-882c-45ad-9110-78adf5a5d233", + "alias": "Browser - Conditional OTP", + "description": "Flow to determine if the OTP is required for the authentication", + "providerId": "basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "conditional-user-configured", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "auth-otp-form", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + } + ] + }, + { + "id": "99c5ba83-b585-4601-b740-1a26670bf4e9", + "alias": "Direct Grant - Conditional OTP", + "description": "Flow to determine if the OTP is required for the authentication", + "providerId": 
"basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "conditional-user-configured", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "direct-grant-validate-otp", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + } + ] + }, + { + "id": "65b73dec-7dd1-4de8-b542-a023b7104afc", + "alias": "First broker login - Conditional OTP", + "description": "Flow to determine if the OTP is required for the authentication", + "providerId": "basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "conditional-user-configured", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "auth-otp-form", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + } + ] + }, + { + "id": "9a26b76f-da95-43f1-8da3-16c4a0654f07", + "alias": "Handle Existing Account", + "description": "Handle what to do if there is existing account with same email/username like authenticated identity provider", + "providerId": "basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "idp-confirm-link", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": true, + "flowAlias": "Account verification options", + "userSetupAllowed": false + } + ] + }, + { + "id": "0a77285e-d7d5-4b6c-aa9a-3eadb5e7e3d3", + "alias": "Reset - Conditional OTP", + "description": "Flow to determine if the OTP should be reset 
or not. Set to REQUIRED to force.", + "providerId": "basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "conditional-user-configured", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "reset-otp", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + } + ] + }, + { + "id": "cb6c0b3b-2f5f-4493-9d14-6130f8b58dd7", + "alias": "User creation or linking", + "description": "Flow for the existing/non-existing user alternatives", + "providerId": "basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticatorConfig": "create unique user config", + "authenticator": "idp-create-user-if-unique", + "authenticatorFlow": false, + "requirement": "ALTERNATIVE", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "ALTERNATIVE", + "priority": 20, + "autheticatorFlow": true, + "flowAlias": "Handle Existing Account", + "userSetupAllowed": false + } + ] + }, + { + "id": "0fd3db1b-e93d-4768-82ca-a1498ddc11d0", + "alias": "Verify Existing Account by Re-authentication", + "description": "Reauthentication of existing account", + "providerId": "basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "idp-username-password-form", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "CONDITIONAL", + "priority": 20, + "autheticatorFlow": true, + "flowAlias": "First broker login - Conditional OTP", + "userSetupAllowed": false + } + ] + }, + { + "id": "86610e70-f9f5-4c11-8a9e-9de1770565fb", + "alias": "browser", + "description": "browser based 
authentication", + "providerId": "basic-flow", + "topLevel": true, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "auth-cookie", + "authenticatorFlow": false, + "requirement": "ALTERNATIVE", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "auth-spnego", + "authenticatorFlow": false, + "requirement": "DISABLED", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "identity-provider-redirector", + "authenticatorFlow": false, + "requirement": "ALTERNATIVE", + "priority": 25, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "ALTERNATIVE", + "priority": 30, + "autheticatorFlow": true, + "flowAlias": "forms", + "userSetupAllowed": false + } + ] + }, + { + "id": "f6aa23dd-8532-4d92-9780-3ea226481e3b", + "alias": "clients", + "description": "Base authentication for clients", + "providerId": "client-flow", + "topLevel": true, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "client-secret", + "authenticatorFlow": false, + "requirement": "ALTERNATIVE", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "client-jwt", + "authenticatorFlow": false, + "requirement": "ALTERNATIVE", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "client-secret-jwt", + "authenticatorFlow": false, + "requirement": "ALTERNATIVE", + "priority": 30, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "client-x509", + "authenticatorFlow": false, + "requirement": "ALTERNATIVE", + "priority": 40, + "autheticatorFlow": false, + "userSetupAllowed": false + } + ] + }, + { + "id": "4d2caf65-1703-4ddb-8890-70232e91bcd8", + "alias": "direct grant", + "description": "OpenID Connect Resource Owner Grant", + "providerId": "basic-flow", + "topLevel": true, + 
"builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "direct-grant-validate-username", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "direct-grant-validate-password", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "CONDITIONAL", + "priority": 30, + "autheticatorFlow": true, + "flowAlias": "Direct Grant - Conditional OTP", + "userSetupAllowed": false + } + ] + }, + { + "id": "eaa20c41-5334-4fb4-8c45-fb9cc71f7f74", + "alias": "docker auth", + "description": "Used by Docker clients to authenticate against the IDP", + "providerId": "basic-flow", + "topLevel": true, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "docker-http-basic-authenticator", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + } + ] + }, + { + "id": "b9febfb1-f0aa-4590-b782-272a4aa11575", + "alias": "first broker login", + "description": "Actions taken after first broker login with identity provider account, which is not yet linked to any Keycloak account", + "providerId": "basic-flow", + "topLevel": true, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticatorConfig": "review profile config", + "authenticator": "idp-review-profile", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": true, + "flowAlias": "User creation or linking", + "userSetupAllowed": false + } + ] + }, + { + "id": "03bb6ff4-eccb-4f2f-8953-3769f78c3bf3", + "alias": "forms", + "description": "Username, password, otp and other auth forms.", + 
"providerId": "basic-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "auth-username-password-form", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "CONDITIONAL", + "priority": 20, + "autheticatorFlow": true, + "flowAlias": "Browser - Conditional OTP", + "userSetupAllowed": false + } + ] + }, + { + "id": "1022f3c2-0469-41c9-861e-918908f103df", + "alias": "registration", + "description": "registration flow", + "providerId": "basic-flow", + "topLevel": true, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "registration-page-form", + "authenticatorFlow": true, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": true, + "flowAlias": "registration form", + "userSetupAllowed": false + } + ] + }, + { + "id": "00d36c3b-e1dc-41f8-bfd0-5f8c80ea07e8", + "alias": "registration form", + "description": "registration form", + "providerId": "form-flow", + "topLevel": false, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "registration-user-creation", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "registration-password-action", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 50, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "registration-recaptcha-action", + "authenticatorFlow": false, + "requirement": "DISABLED", + "priority": 60, + "autheticatorFlow": false, + "userSetupAllowed": false + } + ] + }, + { + "id": "4374c16e-8c65-4168-94c2-df1ab3f3e6ad", + "alias": "reset credentials", + "description": "Reset credentials for a user if they forgot their password or something", + "providerId": "basic-flow", + "topLevel": true, + "builtIn": true, + 
"authenticationExecutions": [ + { + "authenticator": "reset-credentials-choose-user", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "reset-credential-email", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 20, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticator": "reset-password", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 30, + "autheticatorFlow": false, + "userSetupAllowed": false + }, + { + "authenticatorFlow": true, + "requirement": "CONDITIONAL", + "priority": 40, + "autheticatorFlow": true, + "flowAlias": "Reset - Conditional OTP", + "userSetupAllowed": false + } + ] + }, + { + "id": "04d6ed6a-76c9-41fb-9074-bff8a80c2286", + "alias": "saml ecp", + "description": "SAML ECP Profile Authentication Flow", + "providerId": "basic-flow", + "topLevel": true, + "builtIn": true, + "authenticationExecutions": [ + { + "authenticator": "http-basic-authenticator", + "authenticatorFlow": false, + "requirement": "REQUIRED", + "priority": 10, + "autheticatorFlow": false, + "userSetupAllowed": false + } + ] + } + ], + "authenticatorConfig": [ + { + "id": "e7bad67d-1236-430a-a327-9194f9d1e2b0", + "alias": "create unique user config", + "config": { + "require.password.update.after.registration": "false" + } + }, + { + "id": "287b5989-a927-4cf5-8067-74594ce19bc1", + "alias": "review profile config", + "config": { + "update.profile.on.first.login": "missing" + } + } + ], + "requiredActions": [ + { + "alias": "CONFIGURE_TOTP", + "name": "Configure OTP", + "providerId": "CONFIGURE_TOTP", + "enabled": true, + "defaultAction": false, + "priority": 10, + "config": {} + }, + { + "alias": "TERMS_AND_CONDITIONS", + "name": "Terms and Conditions", + "providerId": "TERMS_AND_CONDITIONS", + "enabled": false, + "defaultAction": false, + "priority": 20, + "config": {} + }, + { + "alias": 
"UPDATE_PASSWORD", + "name": "Update Password", + "providerId": "UPDATE_PASSWORD", + "enabled": true, + "defaultAction": false, + "priority": 30, + "config": {} + }, + { + "alias": "UPDATE_PROFILE", + "name": "Update Profile", + "providerId": "UPDATE_PROFILE", + "enabled": true, + "defaultAction": false, + "priority": 40, + "config": {} + }, + { + "alias": "VERIFY_EMAIL", + "name": "Verify Email", + "providerId": "VERIFY_EMAIL", + "enabled": true, + "defaultAction": false, + "priority": 50, + "config": {} + }, + { + "alias": "delete_account", + "name": "Delete Account", + "providerId": "delete_account", + "enabled": false, + "defaultAction": false, + "priority": 60, + "config": {} + }, + { + "alias": "webauthn-register", + "name": "Webauthn Register", + "providerId": "webauthn-register", + "enabled": true, + "defaultAction": false, + "priority": 70, + "config": {} + }, + { + "alias": "webauthn-register-passwordless", + "name": "Webauthn Register Passwordless", + "providerId": "webauthn-register-passwordless", + "enabled": true, + "defaultAction": false, + "priority": 80, + "config": {} + }, + { + "alias": "delete_credential", + "name": "Delete Credential", + "providerId": "delete_credential", + "enabled": true, + "defaultAction": false, + "priority": 100, + "config": {} + }, + { + "alias": "idp_link", + "name": "Linking Identity Provider", + "providerId": "idp_link", + "enabled": true, + "defaultAction": false, + "priority": 110, + "config": {} + }, + { + "alias": "update_user_locale", + "name": "Update User Locale", + "providerId": "update_user_locale", + "enabled": true, + "defaultAction": false, + "priority": 1000, + "config": {} + } + ], + "browserFlow": "browser", + "registrationFlow": "registration", + "directGrantFlow": "direct grant", + "resetCredentialsFlow": "reset credentials", + "clientAuthenticationFlow": "clients", + "dockerAuthenticationFlow": "docker auth", + "firstBrokerLoginFlow": "first broker login", + "attributes": { + 
"cibaBackchannelTokenDeliveryMode": "poll", + "cibaAuthRequestedUserHint": "login_hint", + "clientOfflineSessionMaxLifespan": "0", + "oauth2DevicePollingInterval": "5", + "clientSessionIdleTimeout": "0", + "clientOfflineSessionIdleTimeout": "0", + "cibaInterval": "5", + "realmReusableOtpCode": "false", + "cibaExpiresIn": "120", + "oauth2DeviceCodeLifespan": "600", + "parRequestUriLifespan": "60", + "clientSessionMaxLifespan": "0", + "frontendUrl": "" + }, + "keycloakVersion": "26.3.2", + "userManagedAccessAllowed": false, + "organizationsEnabled": false, + "verifiableCredentialsEnabled": false, + "adminPermissionsEnabled": false, + "clientProfiles": { + "profiles": [] + }, + "clientPolicies": { + "policies": [] + } +} diff --git a/conf/keycloak/test-realm.json b/conf/keycloak/test-realm.json new file mode 100644 index 00000000000..42432a17a30 --- /dev/null +++ b/conf/keycloak/test-realm.json @@ -0,0 +1,2063 @@ +{ + "id" : "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "realm" : "test", + "displayName" : "", + "displayNameHtml" : "", + "notBefore" : 0, + "defaultSignatureAlgorithm" : "RS256", + "revokeRefreshToken" : false, + "refreshTokenMaxReuse" : 0, + "accessTokenLifespan" : 300, + "accessTokenLifespanForImplicitFlow" : 900, + "ssoSessionIdleTimeout" : 1800, + "ssoSessionMaxLifespan" : 36000, + "ssoSessionIdleTimeoutRememberMe" : 0, + "ssoSessionMaxLifespanRememberMe" : 0, + "offlineSessionIdleTimeout" : 2592000, + "offlineSessionMaxLifespanEnabled" : false, + "offlineSessionMaxLifespan" : 5184000, + "clientSessionIdleTimeout" : 0, + "clientSessionMaxLifespan" : 0, + "clientOfflineSessionIdleTimeout" : 0, + "clientOfflineSessionMaxLifespan" : 0, + "accessCodeLifespan" : 60, + "accessCodeLifespanUserAction" : 300, + "accessCodeLifespanLogin" : 1800, + "actionTokenGeneratedByAdminLifespan" : 43200, + "actionTokenGeneratedByUserLifespan" : 300, + "oauth2DeviceCodeLifespan" : 600, + "oauth2DevicePollingInterval" : 5, + "enabled" : true, + "sslRequired" : "none", + 
"registrationAllowed" : false, + "registrationEmailAsUsername" : false, + "rememberMe" : false, + "verifyEmail" : false, + "loginWithEmailAllowed" : true, + "duplicateEmailsAllowed" : false, + "resetPasswordAllowed" : false, + "editUsernameAllowed" : false, + "bruteForceProtected" : false, + "permanentLockout" : false, + "maxFailureWaitSeconds" : 900, + "minimumQuickLoginWaitSeconds" : 60, + "waitIncrementSeconds" : 60, + "quickLoginCheckMilliSeconds" : 1000, + "maxDeltaTimeSeconds" : 43200, + "failureFactor" : 30, + "roles": { + "realm": [ + { + "id": "075daee1-5ab2-44b5-adbf-fa49a3da8305", + "name": "uma_authorization", + "description": "${role_uma_authorization}", + "composite": false, + "clientRole": false, + "containerId": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "attributes": {} + }, + { + "id": "b4ff9091-ddf9-4536-b175-8cfa3e331d71", + "name": "default-roles-test", + "description": "${role_default-roles}", + "composite": true, + "composites": { + "realm": [ + "offline_access", + "uma_authorization" + ], + "client": { + "account": [ + "view-profile", + "manage-account" + ] + } + }, + "clientRole": false, + "containerId": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "attributes": {} + }, + { + "id": "131ff85b-0c25-491b-8e13-dde779ec0854", + "name": "admin", + "description": "", + "composite": true, + "composites": { + "client": { + "realm-management": [ + "impersonation", + "view-authorization", + "query-users", + "manage-realm", + "view-identity-providers", + "manage-authorization", + "view-clients", + "manage-events", + "query-clients", + "view-events", + "query-groups", + "realm-admin", + "manage-clients", + "query-realms", + "manage-identity-providers", + "manage-users", + "view-users", + "view-realm", + "create-client" + ], + "broker": [ + "read-token" + ], + "account": [ + "delete-account", + "manage-consent", + "view-consent", + "view-applications", + "view-groups", + "manage-account-links", + "view-profile", + "manage-account" + ] + } + }, + 
"clientRole": false, + "containerId": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "attributes": {} + }, + { + "id": "e6d31555-6be6-4dee-bc6a-40a53108e4c2", + "name": "offline_access", + "description": "${role_offline-access}", + "composite": false, + "clientRole": false, + "containerId": "80a7e04b-a2b5-4891-a2d1-5ad4e915f983", + "attributes": {} + } + ], + "client": { + "realm-management": [ + { + "id": "1955bd12-5f86-4a74-b130-d68a8ef6f0ee", + "name": "impersonation", + "description": "${role_impersonation}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "1109c350-9ab1-426c-9876-ef67d4310f35", + "name": "view-authorization", + "description": "${role_view-authorization}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "980c3fd3-1ae3-4b8f-9a00-d764c939035f", + "name": "query-users", + "description": "${role_query-users}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "5363e601-0f9d-4633-a8c8-28cb0f859b7b", + "name": "query-groups", + "description": "${role_query-groups}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "59aa7992-ad78-48db-868a-25d6e1d7db50", + "name": "realm-admin", + "description": "${role_realm-admin}", + "composite": true, + "composites": { + "client": { + "realm-management": [ + "impersonation", + "view-authorization", + "query-users", + "query-groups", + "manage-clients", + "manage-realm", + "view-identity-providers", + "query-realms", + "manage-authorization", + "manage-identity-providers", + "manage-users", + "view-users", + "view-realm", + "create-client", + "view-clients", + "manage-events", + "query-clients", + "view-events" + ] + } + }, + "clientRole": true, + "containerId": 
"dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "112f53c2-897d-4c01-81db-b8dc10c5b995", + "name": "manage-clients", + "description": "${role_manage-clients}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "c7f57bbd-ef32-4a64-9888-7b8abd90777a", + "name": "manage-realm", + "description": "${role_manage-realm}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "8885dac8-0af3-45af-94ce-eff5e801bb80", + "name": "view-identity-providers", + "description": "${role_view-identity-providers}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "2673346c-b0ef-4e01-8a90-be03866093af", + "name": "manage-authorization", + "description": "${role_manage-authorization}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "b7182885-9e57-445f-8dae-17c16eb31b5d", + "name": "manage-identity-providers", + "description": "${role_manage-identity-providers}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "ba7bfe0c-cb07-4a47-b92c-b8132b57e181", + "name": "manage-users", + "description": "${role_manage-users}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "13a8f0fc-647d-4bfe-b525-73956898e550", + "name": "query-realms", + "description": "${role_query-realms}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "ef4c57dc-78c2-4f9a-8d2b-0e97d46fc842", + "name": "view-realm", + "description": "${role_view-realm}", + "composite": false, + "clientRole": true, 
+ "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "2875da34-006c-4b7f-bfc8-9ae8e46af3a2", + "name": "view-users", + "description": "${role_view-users}", + "composite": true, + "composites": { + "client": { + "realm-management": [ + "query-users", + "query-groups" + ] + } + }, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "c8c8f7dc-876b-4263-806f-3329f7cd5fd3", + "name": "create-client", + "description": "${role_create-client}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "21b84f90-5a9a-4845-a7ba-bbd98ac0fcc4", + "name": "view-clients", + "description": "${role_view-clients}", + "composite": true, + "composites": { + "client": { + "realm-management": [ + "query-clients" + ] + } + }, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "6fd64c94-d663-4501-ad77-0dcf8887d434", + "name": "manage-events", + "description": "${role_manage-events}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "b321927a-023c-4d2a-99ad-24baf7ff6d83", + "name": "query-clients", + "description": "${role_query-clients}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + }, + { + "id": "2fc21160-78de-457b-8594-e5c76cde1d5e", + "name": "view-events", + "description": "${role_view-events}", + "composite": false, + "clientRole": true, + "containerId": "dada0ae8-ee9f-415a-9685-42da7c563660", + "attributes": {} + } + ], + "test": [], + "security-admin-console": [], + "admin-cli": [], + "account-console": [], + "broker": [ + { + "id": "07ee59b5-dca6-48fb-83d4-2994ef02850e", + "name": "read-token", + "description": "${role_read-token}", + "composite": false, + "clientRole": true, 
+ "containerId": "b57d62bb-77ff-42bd-b8ff-381c7288f327", + "attributes": {} + } + ], + "account": [ + { + "id": "17d2f811-7bdf-4c73-83b4-1037001797b8", + "name": "view-applications", + "description": "${role_view-applications}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "f5918d56-bd4d-4035-8fa7-8622075ed690", + "name": "view-groups", + "description": "${role_view-groups}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "d1ff44f9-419e-42fd-98e8-1add1169a972", + "name": "delete-account", + "description": "${role_delete-account}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "14c23a18-ae2d-43c9-b0c0-aaf6e0c7f5b0", + "name": "manage-account-links", + "description": "${role_manage-account-links}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "6fbe58af-d2fe-4d66-95fe-a2e8a818cb55", + "name": "view-profile", + "description": "${role_view-profile}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "bdfd02bc-6f6a-47d2-82bc-0ca52d78ff48", + "name": "manage-consent", + "description": "${role_manage-consent}", + "composite": true, + "composites": { + "client": { + "account": [ + "view-consent" + ] + } + }, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + }, + { + "id": "782f3b0c-a17b-4a87-988b-1a711401f3b0", + "name": "manage-account", + "description": "${role_manage-account}", + "composite": true, + "composites": { + "client": { + "account": [ + "manage-account-links" + ] + } + }, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} 
+ }, + { + "id": "8a3bfe15-66d9-4f3d-83ac-801d682d42b0", + "name": "view-consent", + "description": "${role_view-consent}", + "composite": false, + "clientRole": true, + "containerId": "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "attributes": {} + } + ] + } + }, + "groups" : [ { + "id" : "d46f94c2-3b47-4288-b937-9cf918e54f0a", + "name" : "admins", + "path" : "/admins", + "attributes" : { }, + "realmRoles" : [ ], + "clientRoles" : { }, + "subGroups" : [ ] + }, { + "id" : "e992ce15-baac-48a0-8834-06f6fcf6c05b", + "name" : "curators", + "path" : "/curators", + "attributes" : { }, + "realmRoles" : [ ], + "clientRoles" : { }, + "subGroups" : [ ] + }, { + "id" : "531cf81d-a700-4336-808f-37a49709b48c", + "name" : "members", + "path" : "/members", + "attributes" : { }, + "realmRoles" : [ ], + "clientRoles" : { }, + "subGroups" : [ ] + } ], + "defaultRole" : { + "id" : "b4ff9091-ddf9-4536-b175-8cfa3e331d71", + "name" : "default-roles-test", + "description" : "${role_default-roles}", + "composite" : true, + "clientRole" : false, + "containerId" : "80a7e04b-a2b5-4891-a2d1-5ad4e915f983" + }, + "requiredCredentials" : [ "password" ], + "otpPolicyType" : "totp", + "otpPolicyAlgorithm" : "HmacSHA1", + "otpPolicyInitialCounter" : 0, + "otpPolicyDigits" : 6, + "otpPolicyLookAheadWindow" : 1, + "otpPolicyPeriod" : 30, + "otpSupportedApplications" : [ "FreeOTP", "Google Authenticator" ], + "webAuthnPolicyRpEntityName" : "keycloak", + "webAuthnPolicySignatureAlgorithms" : [ "ES256" ], + "webAuthnPolicyRpId" : "", + "webAuthnPolicyAttestationConveyancePreference" : "not specified", + "webAuthnPolicyAuthenticatorAttachment" : "not specified", + "webAuthnPolicyRequireResidentKey" : "not specified", + "webAuthnPolicyUserVerificationRequirement" : "not specified", + "webAuthnPolicyCreateTimeout" : 0, + "webAuthnPolicyAvoidSameAuthenticatorRegister" : false, + "webAuthnPolicyAcceptableAaguids" : [ ], + "webAuthnPolicyPasswordlessRpEntityName" : "keycloak", + 
"webAuthnPolicyPasswordlessSignatureAlgorithms" : [ "ES256" ], + "webAuthnPolicyPasswordlessRpId" : "", + "webAuthnPolicyPasswordlessAttestationConveyancePreference" : "not specified", + "webAuthnPolicyPasswordlessAuthenticatorAttachment" : "not specified", + "webAuthnPolicyPasswordlessRequireResidentKey" : "not specified", + "webAuthnPolicyPasswordlessUserVerificationRequirement" : "not specified", + "webAuthnPolicyPasswordlessCreateTimeout" : 0, + "webAuthnPolicyPasswordlessAvoidSameAuthenticatorRegister" : false, + "webAuthnPolicyPasswordlessAcceptableAaguids" : [ ], + "users" : [ { + "id" : "52cddd46-251c-4534-acc8-0580eeafb577", + "createdTimestamp" : 1684736014759, + "username" : "admin", + "enabled" : true, + "totp" : false, + "emailVerified" : true, + "firstName" : "Dataverse", + "lastName" : "Admin", + "email" : "dataverse-admin@mailinator.com", + "credentials" : [ { + "id" : "28f1ece7-26fb-40f1-9174-5ffce7b85c0a", + "type" : "password", + "userLabel" : "Set to \"admin\"", + "createdDate" : 1684736057302, + "secretData" : "{\"value\":\"ONI7fl6BmooVTUgwN1W3m7hsRjMAYEr2l+Fp5+7IOYw1iIntwvZ3U3W0ZBcCFJ7uhcKqF101+rueM3dZfoshPQ==\",\"salt\":\"Hj7co7zYVei7xwx8EaYP3A==\",\"additionalParameters\":{}}", + "credentialData" : "{\"hashIterations\":27500,\"algorithm\":\"pbkdf2-sha256\",\"additionalParameters\":{}}" + } ], + "disableableCredentialTypes" : [ ], + "requiredActions" : [ ], + "realmRoles" : [ "default-roles-test", "admin" ], + "notBefore" : 0, + "groups" : [ "/admins" ] + }, { + "id" : "a3d8e76d-7e7b-42dc-bbd7-4258818a8a1b", + "createdTimestamp" : 1684755806552, + "username" : "affiliate", + "enabled" : true, + "totp" : false, + "emailVerified" : true, + "firstName" : "Dataverse", + "lastName" : "Affiliate", + "email" : "dataverse-affiliate@mailinator.com", + "credentials" : [ { + "id" : "31c8eb1e-b2a8-4f86-833b-7c0536cd61a1", + "type" : "password", + "userLabel" : "My password", + "createdDate" : 1684755821743, + "secretData" : 
"{\"value\":\"T+RQ4nvmjknj7ds8NU7782j6PJ++uCu98zNoDQjIe9IKXah+13q4EcXO9IHmi2BJ7lgT0OIzwIoac4JEQLxhjQ==\",\"salt\":\"fnRmE9WmjAp4tlvGh/bxxQ==\",\"additionalParameters\":{}}", + "credentialData" : "{\"hashIterations\":27500,\"algorithm\":\"pbkdf2-sha256\",\"additionalParameters\":{}}" + } ], + "disableableCredentialTypes" : [ ], + "requiredActions" : [ ], + "realmRoles" : [ "default-roles-test" ], + "notBefore" : 0, + "groups" : [ ] + }, { + "id" : "e5531496-cfb8-498c-a902-50c98d649e79", + "createdTimestamp" : 1684755721064, + "username" : "curator", + "enabled" : true, + "totp" : false, + "emailVerified" : true, + "firstName" : "Dataverse", + "lastName" : "Curator", + "email" : "dataverse-curator@mailinator.com", + "credentials" : [ { + "id" : "664546b4-b936-45cf-a4cf-5e98b743fc7f", + "type" : "password", + "userLabel" : "My password", + "createdDate" : 1684755740776, + "secretData" : "{\"value\":\"AvVqybCNtCBVAdLEeJKresy9tc3c4BBUQvu5uHVQw4IjVagN6FpKGlDEKOrxhzdSM8skEvthOEqJkloPo1w+NQ==\",\"salt\":\"2em2DDRRlNEYsNR3xDqehw==\",\"additionalParameters\":{}}", + "credentialData" : "{\"hashIterations\":27500,\"algorithm\":\"pbkdf2-sha256\",\"additionalParameters\":{}}" + } ], + "disableableCredentialTypes" : [ ], + "requiredActions" : [ ], + "realmRoles" : [ "default-roles-test" ], + "notBefore" : 0, + "groups" : [ "/curators" ] + }, { + "id" : "c0082e7e-a3e9-45e6-95e9-811a34adce9d", + "createdTimestamp" : 1684755585802, + "username" : "user", + "enabled" : true, + "totp" : false, + "emailVerified" : true, + "firstName" : "Dataverse", + "lastName" : "User", + "email" : "dataverse-user@mailinator.com", + "credentials" : [ { + "id" : "00d6d67f-2e30-4da6-a567-bec38a1886a0", + "type" : "password", + "userLabel" : "My password", + "createdDate" : 1684755599597, + "secretData" : "{\"value\":\"z991rnjznAgosi5nX962HjM8/gN5GLJTdrlvi6G9cj8470X2/oZUb4Lka6s8xImgtEloCgWiKqH0EH9G4Y3a5A==\",\"salt\":\"/Uz7w+2IqDo+fQUGqxjVHw==\",\"additionalParameters\":{}}", + "credentialData" : 
"{\"hashIterations\":27500,\"algorithm\":\"pbkdf2-sha256\",\"additionalParameters\":{}}" + } ], + "disableableCredentialTypes" : [ ], + "requiredActions" : [ ], + "realmRoles" : [ "default-roles-test" ], + "notBefore" : 0, + "groups" : [ "/members" ] + } ], + "scopeMappings" : [ { + "clientScope" : "offline_access", + "roles" : [ "offline_access" ] + } ], + "clientScopeMappings" : { + "account" : [ { + "client" : "account-console", + "roles" : [ "manage-account" ] + } ] + }, + "clients" : [ { + "id" : "77f8127a-261e-4cd8-a77d-b74a389f7fd4", + "clientId" : "account", + "name" : "${client_account}", + "rootUrl" : "${authBaseUrl}", + "baseUrl" : "/realms/test/account/", + "surrogateAuthRequired" : false, + "enabled" : true, + "alwaysDisplayInConsole" : false, + "clientAuthenticatorType" : "client-secret", + "redirectUris" : [ "/realms/test/account/*" ], + "webOrigins" : [ ], + "notBefore" : 0, + "bearerOnly" : false, + "consentRequired" : false, + "standardFlowEnabled" : true, + "implicitFlowEnabled" : false, + "directAccessGrantsEnabled" : false, + "serviceAccountsEnabled" : false, + "publicClient" : true, + "frontchannelLogout" : false, + "protocol" : "openid-connect", + "attributes" : { + "post.logout.redirect.uris" : "+" + }, + "authenticationFlowBindingOverrides" : { }, + "fullScopeAllowed" : false, + "nodeReRegistrationTimeout" : 0, + "defaultClientScopes" : [ "web-origins", "acr", "roles", "profile", "email" ], + "optionalClientScopes" : [ "address", "phone", "offline_access", "microprofile-jwt" ] + }, { + "id" : "5d99f721-027c-478d-867d-61114e0a8192", + "clientId" : "account-console", + "name" : "${client_account-console}", + "rootUrl" : "${authBaseUrl}", + "baseUrl" : "/realms/test/account/", + "surrogateAuthRequired" : false, + "enabled" : true, + "alwaysDisplayInConsole" : false, + "clientAuthenticatorType" : "client-secret", + "redirectUris" : [ "/realms/test/account/*" ], + "webOrigins" : [ ], + "notBefore" : 0, + "bearerOnly" : false, + "consentRequired" 
: false, + "standardFlowEnabled" : true, + "implicitFlowEnabled" : false, + "directAccessGrantsEnabled" : false, + "serviceAccountsEnabled" : false, + "publicClient" : true, + "frontchannelLogout" : false, + "protocol" : "openid-connect", + "attributes" : { + "post.logout.redirect.uris" : "+", + "pkce.code.challenge.method" : "S256" + }, + "authenticationFlowBindingOverrides" : { }, + "fullScopeAllowed" : false, + "nodeReRegistrationTimeout" : 0, + "protocolMappers" : [ { + "id" : "e181a0ce-9a04-4468-a38a-aaef9f78f989", + "name" : "audience resolve", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-audience-resolve-mapper", + "consentRequired" : false, + "config" : { } + } ], + "defaultClientScopes" : [ "web-origins", "acr", "roles", "profile", "email" ], + "optionalClientScopes" : [ "address", "phone", "offline_access", "microprofile-jwt" ] + }, { + "id" : "5eccc178-121e-4d0f-bcb2-04ae3c2e52ed", + "clientId" : "admin-cli", + "name" : "${client_admin-cli}", + "surrogateAuthRequired" : false, + "enabled" : true, + "alwaysDisplayInConsole" : false, + "clientAuthenticatorType" : "client-secret", + "redirectUris" : [ ], + "webOrigins" : [ ], + "notBefore" : 0, + "bearerOnly" : false, + "consentRequired" : false, + "standardFlowEnabled" : false, + "implicitFlowEnabled" : false, + "directAccessGrantsEnabled" : true, + "serviceAccountsEnabled" : false, + "publicClient" : true, + "frontchannelLogout" : false, + "protocol" : "openid-connect", + "attributes" : { }, + "authenticationFlowBindingOverrides" : { }, + "fullScopeAllowed" : false, + "nodeReRegistrationTimeout" : 0, + "defaultClientScopes" : [ "web-origins", "acr", "roles", "profile", "email" ], + "optionalClientScopes" : [ "address", "phone", "offline_access", "microprofile-jwt" ] + }, { + "id" : "b57d62bb-77ff-42bd-b8ff-381c7288f327", + "clientId" : "broker", + "name" : "${client_broker}", + "surrogateAuthRequired" : false, + "enabled" : true, + "alwaysDisplayInConsole" : false, + 
"clientAuthenticatorType" : "client-secret", + "redirectUris" : [ ], + "webOrigins" : [ ], + "notBefore" : 0, + "bearerOnly" : true, + "consentRequired" : false, + "standardFlowEnabled" : true, + "implicitFlowEnabled" : false, + "directAccessGrantsEnabled" : false, + "serviceAccountsEnabled" : false, + "publicClient" : false, + "frontchannelLogout" : false, + "protocol" : "openid-connect", + "attributes" : { }, + "authenticationFlowBindingOverrides" : { }, + "fullScopeAllowed" : false, + "nodeReRegistrationTimeout" : 0, + "defaultClientScopes" : [ "web-origins", "acr", "roles", "profile", "email" ], + "optionalClientScopes" : [ "address", "phone", "offline_access", "microprofile-jwt" ] + }, { + "id" : "dada0ae8-ee9f-415a-9685-42da7c563660", + "clientId" : "realm-management", + "name" : "${client_realm-management}", + "surrogateAuthRequired" : false, + "enabled" : true, + "alwaysDisplayInConsole" : false, + "clientAuthenticatorType" : "client-secret", + "redirectUris" : [ ], + "webOrigins" : [ ], + "notBefore" : 0, + "bearerOnly" : true, + "consentRequired" : false, + "standardFlowEnabled" : true, + "implicitFlowEnabled" : false, + "directAccessGrantsEnabled" : false, + "serviceAccountsEnabled" : false, + "publicClient" : false, + "frontchannelLogout" : false, + "protocol" : "openid-connect", + "attributes" : { }, + "authenticationFlowBindingOverrides" : { }, + "fullScopeAllowed" : false, + "nodeReRegistrationTimeout" : 0, + "defaultClientScopes" : [ "web-origins", "acr", "roles", "profile", "email" ], + "optionalClientScopes" : [ "address", "phone", "offline_access", "microprofile-jwt" ] + }, { + "id" : "bf7cf550-3875-4f97-9878-b2419a854058", + "clientId" : "security-admin-console", + "name" : "${client_security-admin-console}", + "rootUrl" : "${authAdminUrl}", + "baseUrl" : "/admin/test/console/", + "surrogateAuthRequired" : false, + "enabled" : true, + "alwaysDisplayInConsole" : false, + "clientAuthenticatorType" : "client-secret", + "redirectUris" : [ 
"/admin/test/console/*" ], + "webOrigins" : [ "+" ], + "notBefore" : 0, + "bearerOnly" : false, + "consentRequired" : false, + "standardFlowEnabled" : true, + "implicitFlowEnabled" : false, + "directAccessGrantsEnabled" : false, + "serviceAccountsEnabled" : false, + "publicClient" : true, + "frontchannelLogout" : false, + "protocol" : "openid-connect", + "attributes" : { + "post.logout.redirect.uris" : "+", + "pkce.code.challenge.method" : "S256" + }, + "authenticationFlowBindingOverrides" : { }, + "fullScopeAllowed" : false, + "nodeReRegistrationTimeout" : 0, + "protocolMappers" : [ { + "id" : "ff845e16-e200-4894-ab51-37d8b9f2a445", + "name" : "locale", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "locale", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "locale", + "jsonType.label" : "String" + } + } ], + "defaultClientScopes" : [ "web-origins", "acr", "roles", "profile", "email" ], + "optionalClientScopes" : [ "address", "phone", "offline_access", "microprofile-jwt" ] + }, { + "id" : "9c27faa8-4b8d-4ad9-9cd1-880032ef06aa", + "clientId" : "test", + "name" : "A Test Client", + "description" : "Use for hacking and testing away a confidential client", + "rootUrl" : "", + "adminUrl" : "", + "baseUrl" : "", + "surrogateAuthRequired" : false, + "enabled" : true, + "alwaysDisplayInConsole" : false, + "clientAuthenticatorType" : "client-secret", + "secret" : "94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8", + "redirectUris" : [ "*" ], + "webOrigins" : [ ], + "notBefore" : 0, + "bearerOnly" : false, + "consentRequired" : false, + "standardFlowEnabled" : true, + "implicitFlowEnabled" : false, + "directAccessGrantsEnabled" : true, + "serviceAccountsEnabled" : false, + "publicClient" : false, + "frontchannelLogout" : true, + "protocol" : "openid-connect", + "attributes" : { + "oidc.ciba.grant.enabled" : "false", + 
"client.secret.creation.time" : "1684735831", + "backchannel.logout.session.required" : "true", + "display.on.consent.screen" : "false", + "oauth2.device.authorization.grant.enabled" : "false", + "backchannel.logout.revoke.offline.tokens" : "false" + }, + "authenticationFlowBindingOverrides" : { }, + "fullScopeAllowed" : true, + "nodeReRegistrationTimeout" : -1, + "defaultClientScopes" : [ "web-origins", "acr", "roles", "profile", "email" ], + "optionalClientScopes" : [ "address", "phone", "offline_access", "microprofile-jwt" ] + } ], + "clientScopes" : [ { + "id" : "72f29e57-92fa-437b-828c-2b9d6fe56192", + "name" : "address", + "description" : "OpenID Connect built-in scope: address", + "protocol" : "openid-connect", + "attributes" : { + "include.in.token.scope" : "true", + "display.on.consent.screen" : "true", + "consent.screen.text" : "${addressScopeConsentText}" + }, + "protocolMappers" : [ { + "id" : "59581aea-70d6-4ee8-bec2-1fea5fc497ae", + "name" : "address", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-address-mapper", + "consentRequired" : false, + "config" : { + "user.attribute.formatted" : "formatted", + "user.attribute.country" : "country", + "user.attribute.postal_code" : "postal_code", + "userinfo.token.claim" : "true", + "user.attribute.street" : "street", + "id.token.claim" : "true", + "user.attribute.region" : "region", + "access.token.claim" : "true", + "user.attribute.locality" : "locality" + } + } ] + }, { + "id" : "f515ec81-3c1b-4d4d-b7a2-e7e8d47b6447", + "name" : "roles", + "description" : "OpenID Connect scope for add user roles to the access token", + "protocol" : "openid-connect", + "attributes" : { + "include.in.token.scope" : "false", + "display.on.consent.screen" : "true", + "consent.screen.text" : "${rolesScopeConsentText}" + }, + "protocolMappers" : [ { + "id" : "26d299a8-69e2-4864-9595-17a5b417fc61", + "name" : "realm roles", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-realm-role-mapper", + 
"consentRequired" : false, + "config" : { + "user.attribute" : "foo", + "access.token.claim" : "true", + "claim.name" : "realm_access.roles", + "jsonType.label" : "String", + "multivalued" : "true" + } + }, { + "id" : "d2998083-a8db-4f4e-9aaa-9cad68d65b97", + "name" : "audience resolve", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-audience-resolve-mapper", + "consentRequired" : false, + "config" : { } + }, { + "id" : "7a4cb2e5-07a0-4c16-a024-71df7ddd6868", + "name" : "client roles", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-client-role-mapper", + "consentRequired" : false, + "config" : { + "user.attribute" : "foo", + "access.token.claim" : "true", + "claim.name" : "resource_access.${client_id}.roles", + "jsonType.label" : "String", + "multivalued" : "true" + } + } ] + }, { + "id" : "8f1eafef-92d6-434e-b9ec-6edec1fddd0a", + "name" : "offline_access", + "description" : "OpenID Connect built-in scope: offline_access", + "protocol" : "openid-connect", + "attributes" : { + "consent.screen.text" : "${offlineAccessScopeConsentText}", + "display.on.consent.screen" : "true" + } + }, { + "id" : "c03095aa-b656-447a-9767-0763c2ccb070", + "name" : "acr", + "description" : "OpenID Connect scope for add acr (authentication context class reference) to the token", + "protocol" : "openid-connect", + "attributes" : { + "include.in.token.scope" : "false", + "display.on.consent.screen" : "false" + }, + "protocolMappers" : [ { + "id" : "948b230c-56d0-4000-937c-841cd395d3f9", + "name" : "acr loa level", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-acr-mapper", + "consentRequired" : false, + "config" : { + "id.token.claim" : "true", + "access.token.claim" : "true" + } + } ] + }, { + "id" : "cdf35f63-8ec7-41a0-ae12-f05d415818cc", + "name" : "phone", + "description" : "OpenID Connect built-in scope: phone", + "protocol" : "openid-connect", + "attributes" : { + "include.in.token.scope" : "true", + "display.on.consent.screen" : 
"true", + "consent.screen.text" : "${phoneScopeConsentText}" + }, + "protocolMappers" : [ { + "id" : "ba4348ff-90b1-4e09-89a8-e5c08b04d3d1", + "name" : "phone number", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "phoneNumber", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "phone_number", + "jsonType.label" : "String" + } + }, { + "id" : "e6cceae5-8392-4348-b302-f610ece6056e", + "name" : "phone number verified", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "phoneNumberVerified", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "phone_number_verified", + "jsonType.label" : "boolean" + } + } ] + }, { + "id" : "4318001c-2970-41d3-91b9-e31c08569872", + "name" : "email", + "description" : "OpenID Connect built-in scope: email", + "protocol" : "openid-connect", + "attributes" : { + "include.in.token.scope" : "true", + "display.on.consent.screen" : "true", + "consent.screen.text" : "${emailScopeConsentText}" + }, + "protocolMappers" : [ { + "id" : "406d02a6-866a-4962-8838-e8c58ada1505", + "name" : "email", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-property-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "email", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "email", + "jsonType.label" : "String" + } + }, { + "id" : "33baabc1-9bf2-42e4-8b8e-a53c13f0b744", + "name" : "email verified", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-property-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "emailVerified", + "id.token.claim" : 
"true", + "access.token.claim" : "true", + "claim.name" : "email_verified", + "jsonType.label" : "boolean" + } + } ] + }, { + "id" : "5277a84f-d727-4c64-8432-d513127beee1", + "name" : "profile", + "description" : "OpenID Connect built-in scope: profile", + "protocol" : "openid-connect", + "attributes" : { + "include.in.token.scope" : "true", + "display.on.consent.screen" : "true", + "consent.screen.text" : "${profileScopeConsentText}" + }, + "protocolMappers" : [ { + "id" : "0a609875-2678-4056-93ef-dd5c03e6059d", + "name" : "given name", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-property-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "firstName", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "given_name", + "jsonType.label" : "String" + } + }, { + "id" : "7c510d18-07ee-4b78-8acd-24b777d11b3c", + "name" : "website", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "website", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "website", + "jsonType.label" : "String" + } + }, { + "id" : "0bb6d0ea-195f-49e8-918c-c419a26a661c", + "name" : "username", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-property-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "username", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "preferred_username", + "jsonType.label" : "String" + } + }, { + "id" : "5f1e644c-1acf-440c-b1a6-b5f65bcebfd9", + "name" : "profile", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "profile", + "id.token.claim" : "true", + 
"access.token.claim" : "true", + "claim.name" : "profile", + "jsonType.label" : "String" + } + }, { + "id" : "c710bdb2-6cfd-4f60-9c4e-730188fc62f7", + "name" : "family name", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-property-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "lastName", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "family_name", + "jsonType.label" : "String" + } + }, { + "id" : "012d5038-0e13-42ba-9df7-2487c8e2eead", + "name" : "nickname", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "nickname", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "nickname", + "jsonType.label" : "String" + } + }, { + "id" : "21590b19-517d-4b6d-92f6-d4f71238677e", + "name" : "updated at", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "updatedAt", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "updated_at", + "jsonType.label" : "long" + } + }, { + "id" : "e4cddca7-1360-42f3-9854-da6cbe00c71e", + "name" : "birthdate", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "birthdate", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "birthdate", + "jsonType.label" : "String" + } + }, { + "id" : "afee328f-c64c-43e6-80d0-be2721c2ed0e", + "name" : "locale", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "locale", 
+ "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "locale", + "jsonType.label" : "String" + } + }, { + "id" : "780a1e2c-5b63-46f4-a5bf-dc3fd8ce0cbb", + "name" : "full name", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-full-name-mapper", + "consentRequired" : false, + "config" : { + "id.token.claim" : "true", + "access.token.claim" : "true", + "userinfo.token.claim" : "true" + } + }, { + "id" : "aeebffff-f776-427e-83ed-064707ffce57", + "name" : "zoneinfo", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "zoneinfo", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "zoneinfo", + "jsonType.label" : "String" + } + }, { + "id" : "b3e840a2-1794-4da1-bf69-31905cbff0d6", + "name" : "middle name", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "middleName", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "middle_name", + "jsonType.label" : "String" + } + }, { + "id" : "0607e0e4-4f7f-4214-996d-3599772ce1c7", + "name" : "picture", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "picture", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "picture", + "jsonType.label" : "String" + } + }, { + "id" : "426a609b-4e28-4132-af0d-13297b8cb63a", + "name" : "gender", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-attribute-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "gender", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" 
: "gender", + "jsonType.label" : "String" + } + } ] + }, { + "id" : "a1ebde82-ce21-438f-a3ad-261d3eeb1c01", + "name" : "role_list", + "description" : "SAML role list", + "protocol" : "saml", + "attributes" : { + "consent.screen.text" : "${samlRoleListScopeConsentText}", + "display.on.consent.screen" : "true" + }, + "protocolMappers" : [ { + "id" : "64653ac7-7ffc-4f7c-a589-03e3b68bbd25", + "name" : "role list", + "protocol" : "saml", + "protocolMapper" : "saml-role-list-mapper", + "consentRequired" : false, + "config" : { + "single" : "false", + "attribute.nameformat" : "Basic", + "attribute.name" : "Role" + } + } ] + }, { + "id" : "aeb5b852-dfec-4e67-9d9e-104abe9b3bf2", + "name" : "web-origins", + "description" : "OpenID Connect scope for add allowed web origins to the access token", + "protocol" : "openid-connect", + "attributes" : { + "include.in.token.scope" : "false", + "display.on.consent.screen" : "false", + "consent.screen.text" : "" + }, + "protocolMappers" : [ { + "id" : "e2fa8437-a0f1-46fc-af9c-c40fc09cd6a1", + "name" : "allowed web origins", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-allowed-origins-mapper", + "consentRequired" : false, + "config" : { } + } ] + }, { + "id" : "4fecd0d7-d4ad-457e-90f2-c7202bf01ff5", + "name" : "microprofile-jwt", + "description" : "Microprofile - JWT built-in scope", + "protocol" : "openid-connect", + "attributes" : { + "include.in.token.scope" : "true", + "display.on.consent.screen" : "false" + }, + "protocolMappers" : [ { + "id" : "a9536634-a9f6-4ed5-a8e7-8379d3b002ca", + "name" : "upn", + "protocol" : "openid-connect", + "protocolMapper" : "oidc-usermodel-property-mapper", + "consentRequired" : false, + "config" : { + "userinfo.token.claim" : "true", + "user.attribute" : "username", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "upn", + "jsonType.label" : "String" + } + }, { + "id" : "2ce1a702-9458-4926-9b8a-f82c07215755", + "name" : "groups", + "protocol" : 
"openid-connect", + "protocolMapper" : "oidc-usermodel-realm-role-mapper", + "consentRequired" : false, + "config" : { + "multivalued" : "true", + "user.attribute" : "foo", + "id.token.claim" : "true", + "access.token.claim" : "true", + "claim.name" : "groups", + "jsonType.label" : "String" + } + } ] + } ], + "defaultDefaultClientScopes" : [ "role_list", "profile", "email", "roles", "web-origins", "acr" ], + "defaultOptionalClientScopes" : [ "offline_access", "address", "phone", "microprofile-jwt" ], + "browserSecurityHeaders" : { + "contentSecurityPolicyReportOnly" : "", + "xContentTypeOptions" : "nosniff", + "xRobotsTag" : "none", + "xFrameOptions" : "SAMEORIGIN", + "contentSecurityPolicy" : "frame-src 'self'; frame-ancestors 'self'; object-src 'none';", + "xXSSProtection" : "1; mode=block", + "strictTransportSecurity" : "max-age=31536000; includeSubDomains" + }, + "smtpServer" : { }, + "eventsEnabled" : false, + "eventsListeners" : [ "jboss-logging" ], + "enabledEventTypes" : [ ], + "adminEventsEnabled" : false, + "adminEventsDetailsEnabled" : false, + "identityProviders" : [ ], + "identityProviderMappers" : [ ], + "components" : { + "org.keycloak.services.clientregistration.policy.ClientRegistrationPolicy" : [ { + "id" : "8115796f-8f1f-4d6a-88f8-ca2938451260", + "name" : "Allowed Client Scopes", + "providerId" : "allowed-client-templates", + "subType" : "authenticated", + "subComponents" : { }, + "config" : { + "allow-default-scopes" : [ "true" ] + } + }, { + "id" : "044bd055-714d-478e-aa93-303d2161c427", + "name" : "Allowed Protocol Mapper Types", + "providerId" : "allowed-protocol-mappers", + "subType" : "authenticated", + "subComponents" : { }, + "config" : { + "allowed-protocol-mapper-types" : [ "saml-user-property-mapper", "oidc-address-mapper", "oidc-sha256-pairwise-sub-mapper", "saml-role-list-mapper", "saml-user-attribute-mapper", "oidc-usermodel-property-mapper", "oidc-usermodel-attribute-mapper", "oidc-full-name-mapper" ] + } + }, { + "id" : 
"be465734-3b0f-4370-a144-73db756e23f8", + "name" : "Allowed Protocol Mapper Types", + "providerId" : "allowed-protocol-mappers", + "subType" : "anonymous", + "subComponents" : { }, + "config" : { + "allowed-protocol-mapper-types" : [ "oidc-usermodel-attribute-mapper", "saml-user-property-mapper", "oidc-address-mapper", "oidc-sha256-pairwise-sub-mapper", "saml-user-attribute-mapper", "oidc-full-name-mapper", "oidc-usermodel-property-mapper", "saml-role-list-mapper" ] + } + }, { + "id" : "42a2f64d-ac9e-4221-9cf6-40ff8c868629", + "name" : "Trusted Hosts", + "providerId" : "trusted-hosts", + "subType" : "anonymous", + "subComponents" : { }, + "config" : { + "host-sending-registration-request-must-match" : [ "true" ], + "client-uris-must-match" : [ "true" ] + } + }, { + "id" : "7ca08915-6c33-454c-88f2-20e1d6553b26", + "name" : "Max Clients Limit", + "providerId" : "max-clients", + "subType" : "anonymous", + "subComponents" : { }, + "config" : { + "max-clients" : [ "200" ] + } + }, { + "id" : "f01f2b6f-3f01-4d01-b2f4-70577c6f599c", + "name" : "Allowed Client Scopes", + "providerId" : "allowed-client-templates", + "subType" : "anonymous", + "subComponents" : { }, + "config" : { + "allow-default-scopes" : [ "true" ] + } + }, { + "id" : "516d7f21-f21a-4690-831e-36ad313093b2", + "name" : "Consent Required", + "providerId" : "consent-required", + "subType" : "anonymous", + "subComponents" : { }, + "config" : { } + }, { + "id" : "c79df6a0-d4d8-4866-b9e6-8ddb5d1bd38e", + "name" : "Full Scope Disabled", + "providerId" : "scope", + "subType" : "anonymous", + "subComponents" : { }, + "config" : { } + } ], + "org.keycloak.userprofile.UserProfileProvider" : [ { + "id" : "cf47a21f-c8fb-42f2-9bff-feca967db183", + "providerId" : "declarative-user-profile", + "subComponents" : { }, + "config" : { } + } ], + "org.keycloak.keys.KeyProvider" : [ { + "id" : "6b4a2281-a9e8-43ab-aee7-190ae91b2842", + "name" : "aes-generated", + "providerId" : "aes-generated", + "subComponents" : { }, + 
"config" : { + "kid" : [ "47b9c2c2-32dc-4317-bd8b-1c4e5bb740ca" ], + "secret" : [ "9VWsVSqbj5zWa8Mq-rRzOw" ], + "priority" : [ "100" ] + } + }, { + "id" : "68e2d2b0-4976-480f-ab76-f84a17686b05", + "name" : "rsa-enc-generated", + "providerId" : "rsa-enc-generated", + "subComponents" : { }, + "config" : { + "privateKey" : [ "MIIEpQIBAAKCAQEAwuIcVVJDncorsQcFef4M/J9dsaNNmwEv/+4pCSZuco7IlA9uCfvwjYgfwQlWoCHCc7JFEtUOXhpLNR0SJ9w2eCC9A/0horjLmiVGU5sGACGrAxSgipt399k83mtkPBTikT1BXumPrX51ovdEPVPQSO0hIBwFn4ZDwA9P/00jNzzswyLC2UDdQrwIjm2xWjq1X82d8mL3+Yp8lF9qD1w305+XPiqCC+TUunKsuCQq5sddet+UoCDsFQyxsJi6cWJrryDvQmiDgM2wm68jn6hyzDE76J1az0wKEGqoMEwIy0juqZCyAqgsm3xA+zHpTcI3EyTwDGpMvWNJp8AWqXPNaQIDAQABAoIBAAethL1+n/6WpUBEaoHcVrq5/2+vo0+dfTyVZNKRFqtG0WOWPzOflFd1HZV7YVPuJI+uPi8ANmsnbh9YcaYg9JiTZ0hMZ++giBf0ID2hZxv995NyXnf7fkoFKghevYG+9mVPtHRmxKlKiPFWfHQjP1ACNKAD2UZdcdbzxicaIkPV/hP996mZA3xaaudggAJq7u/W67H2Q6ofGqW4TI5241d8T+6yobbvXRe4n8FKz4eK2aZv+N+zwh5JDMsJ8050+lCDsyoyakEPf+4veuPkewx4FemAiotDNcmoUQSDL26wLw8kk1uZ9JY0M88OL5pMyBuxTqy0F6BWBltq80mlefECgYEA4vZ8Agu2plXOzWASn0dyhCel3QoeUqNY8D8A+0vK9qWxUE9jMG13jAZmsL2I38SuwRN1DhJezbrn4QTuxTukxgSjLDv/pBp9UnXnCz/fg4yPTYsZ0zHqTMbwvdtfIzBHTCYyIJ+unxVYoenC0XZKSQXA3NN2zNqYpLhjStWdEZECgYEA29DznJxpDZsRUieRxFgZ+eRCjbQ9Q2A46preqMo1KOZ6bt9avxG3uM7pUC+UOeIizeRzxPSJ2SyptYPzdaNwKN3Lq+RhjHe1zYLngXb0CIQaRwNHqePxXF1sg0dTbmcxf+Co7yPG+Nd5nrQq9SQHC3tLTyL6x3VU/yAfMQqUklkCgYEAyVl8iGAV6RkE/4R04OOEv6Ng7WkVn6CUvYZXe5kw9YHnfWUAjS0AOrRPFAsBy+r0UgvN8+7uNjvTjPhQT5/rPVVN4WdVEyQA/E/m6j7/LvhbBaMbBRcqUnTHjNd6XoBtMCxOmkyvoShR2krE8AiuPHwjLoVXxsNDWhbO18wMrVECgYEAlmkICOXNzI2K8Jg62gse2yshjy0BrpSs3XtTWFPkxDPRGwSiZ5OMD10lsMSdvG3MOu5TeTWLDZvOFHJRqPFI0e3Sa7A+P4u6TwF/v8rRePJLuMO5ybo7cWRL2Bh6MlVSPZpQfjIQ+D0Y70uBCXS5jVW0VlYtG0Zh/qDQNxJyTyECgYEAuRINlZ0ag+1QTITapSatbFWd/KquGLpMjZyF4k5gVHs+4zHnnTi1YIDUInp1FJBqKD27z2byy7KFgbMBZQmsDs8i4fgzQrJHe3D4WFFHCjiClbeReejbas9bOnqhSQCiIy1Ck8vMAriAtctSA/g/qq6dQApSgcWaKvTVL2Ywa7E=" ], + "keyUse" : [ "ENC" ], + "certificate" : [ 
"MIIClzCCAX8CBgGIQhOIijANBgkqhkiG9w0BAQsFADAPMQ0wCwYDVQQDDAR0ZXN0MB4XDTIzMDUyMjA2MDczNloXDTMzMDUyMjA2MDkxNlowDzENMAsGA1UEAwwEdGVzdDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAMLiHFVSQ53KK7EHBXn+DPyfXbGjTZsBL//uKQkmbnKOyJQPbgn78I2IH8EJVqAhwnOyRRLVDl4aSzUdEifcNnggvQP9IaK4y5olRlObBgAhqwMUoIqbd/fZPN5rZDwU4pE9QV7pj61+daL3RD1T0EjtISAcBZ+GQ8APT/9NIzc87MMiwtlA3UK8CI5tsVo6tV/NnfJi9/mKfJRfag9cN9Oflz4qggvk1LpyrLgkKubHXXrflKAg7BUMsbCYunFia68g70Jog4DNsJuvI5+ocswxO+idWs9MChBqqDBMCMtI7qmQsgKoLJt8QPsx6U3CNxMk8AxqTL1jSafAFqlzzWkCAwEAATANBgkqhkiG9w0BAQsFAAOCAQEAIEIfjqOr2m+8s2RR8VW/nBgOgu9HtPRda4qNhGbgBkZ8NDy7TwHqlHo1ujKW5RO438pRyLJmOibWN4a/rkUsSjin6vgy4l8KpQy+7a4cQCQHyl34TmPjbtiw1jKgiOjzRQY54NVwIJNMIMc1ZyQo4u0U30/FxgUv6akXfS5O1ePD+5xKOOC/Af9AletjhQMPwVxXDwFqfQf/p+SM4Pyn4L633MESfDrH8v9FjJd0lV5ZlEI4hpPtnbi9U+CInqCy3VDNlZjsXswaDRujjg3LERfOMvCgj+Dck3FzWG7EiCwXWNEPvdMzv4w7M6KXuiPPQkST8DUWjgkjUCeLBzT3yw==" ], + "priority" : [ "100" ], + "algorithm" : [ "RSA-OAEP" ] + } + }, { + "id" : "728769a3-99a4-4cca-959d-28181dfee7e8", + "name" : "rsa-generated", + "providerId" : "rsa-generated", + "subComponents" : { }, + "config" : { + "privateKey" : [ 
"MIIEowIBAAKCAQEAxIszQCv8bX3sKXJVtuLJV6cH/uhkzxcTEIcDe7y2Y2SFM0x2nF6wRLk8QkvIrRmelilegUIJttqZxLXMpxwUJGizehHQMrOCzNoGBZdVanoK7nNa5+FOYtlvL4GxNfwzS36sp3PnKQiGv5Q7RGuPthjLFfqTmYx/7GTDJC4vLEW5S01Vy/Xc9FE4FsT0hnm91lRWjppc9893M5QUy/TPu8udIuNV87Ko5yiIxQqcPiAQXJaN4CyGaDcYhhzzHdxVptIk2FvtxhpmNxrbtmBCx/o9/rBDQNTis8Ex6ItWC2PvC17UPvyOcZ4Fv/qO0L6JZ0mrpH95CeDU1kEP+KKZrwIDAQABAoIBAGGl6SYiVG1PyTQEXqqY/UCjt3jBnEg5ZhrpgWUKKrGyAO2uOSXSc5AJWfN0NHUwC9b+IbplhW8IJ6qQSmfiLu2x6S2mSQLPphZB4gkIGYNntCOpQ0p+aZP6BGAddt5j+VYyTvR5RKlh15S6QEHrkMB/i/LVBl0c7XeUzlEc8wnyj8DGvlmpcQzIcbWfqEZ/FciDdKGNN0M4V/r1uQiOUVZ69SWDBBwu41YwF7PYUsX83q8zn0nBeMqz0ggSf33lW4w31fox9c7EjIF01gPArE5uT+d+AwjVKHpd08LWGR9W9NSXVOPUKkzOM+PyvKGvzjMnlrm/feqowKQbL2q/GP0CgYEA/EsrvUojkFIWxHc19KJdJvqlYgLeWq6P/J7UmHgpl+S3nG6b9HH4/aM/ICDa5hxd5bmP5p2V3EuZWnyb6/QB5eipC7Ss3oM7XeS/PwvTp6NTC1fypx2zHKse3iuLeCGneRxiw15mB02ArJ/qJw/VSQK2J7RiR4+b6HYpdzQnIysCgYEAx25dTQqskQqsx/orJzuUqfNv/C0W4vqfz1eL3akFrdK+YqghXKFsDmh61JpTrTKnRLAdQeyOrhKwbNsdxSEEaeeLayKLVlimoFXGd/LZb5LQiwFcrvTzhnB+FLmFgqTnuLkpfY1woHEwSW9TpJewjbT9S6g0L2uh223nVXuLMY0CgYEA3pMOlmMGtvbEoTSuRBDNb2rmZm4zbfrcijgxRAWWZCtiFL68FU5LJLBVK2nw09sot1cabZCOuhdzxhFymRneZs73+5y8eV17DV2VnvA3HIiI5dQD/YzFDECm7ceqtiOylLUHKGZqSn0ETMaTkzxzpIKg4qxPm+RE3jMIZ+J5uJsCgYBk2iUIrtsxxgo2Xwavomu9vkPlbQ/j3QYwHn+2qqEalDZ/QbMNWvyAFMn49cpXDgSUsdM54V0OHpllkzFs3ROUUumoViHMmqw47OefBQp8Z+xaP2gVef4lAIJiDKe9t5MPUWPwADTyjgrzN/8+fw9juiFVv0wUpwOFKgEQs5diiQKBgC6RpZESc5Nl4nHrDvIl5n/zYED6BaXoLl15NhcoBudt5SIRO/RpvBW69A7aE/UK6p7WXjq4mP1ssIWz4KgATCoXUgYvn0a7Ql79r/CMce6/FvcuweED6u6bD0kdXuYhe8fR9IPmLfnnb4Cx3JOJeRZbiBSP5HOZJ7nsKibxcgPm" ], + "keyUse" : [ "SIG" ], + "certificate" : [ 
"MIIClzCCAX8CBgGIQhOHjjANBgkqhkiG9w0BAQsFADAPMQ0wCwYDVQQDDAR0ZXN0MB4XDTIzMDUyMjA2MDczNloXDTMzMDUyMjA2MDkxNlowDzENMAsGA1UEAwwEdGVzdDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAMSLM0Ar/G197ClyVbbiyVenB/7oZM8XExCHA3u8tmNkhTNMdpxesES5PEJLyK0ZnpYpXoFCCbbamcS1zKccFCRos3oR0DKzgszaBgWXVWp6Cu5zWufhTmLZby+BsTX8M0t+rKdz5ykIhr+UO0Rrj7YYyxX6k5mMf+xkwyQuLyxFuUtNVcv13PRROBbE9IZ5vdZUVo6aXPfPdzOUFMv0z7vLnSLjVfOyqOcoiMUKnD4gEFyWjeAshmg3GIYc8x3cVabSJNhb7cYaZjca27ZgQsf6Pf6wQ0DU4rPBMeiLVgtj7wte1D78jnGeBb/6jtC+iWdJq6R/eQng1NZBD/iima8CAwEAATANBgkqhkiG9w0BAQsFAAOCAQEAe0Bo1UpGfpOlJiVhp0XWExm8bdxFgXOU2M5XeZBsWAqBehvJkzn+tbAtlVNiIiN58XFFpH+xLZ2nJIZR5FHeCD3bYAgK72j5k45HJI95vPyslelfT/m3Np78+1iUa1U1WxN40JaowP1EeTkk5O8Pk4zTQ1Ne1usmKd+SJxI1KWN0kKuVFMmdNRb5kQKWeQvOSlWl7rd4bvHGvVnxgcPC1bshEJKRt+VpaUjpm6CKd8C3Kt7IWfIX4HTVhKZkmLn7qv6aSfwWelwZfLdaXcLXixqzqNuUk/VWbF9JT4iiag9F3mt7xryIkoRp1AEjCA82HqK72F4JCFyOhCiGrMfKJw==" ], + "priority" : [ "100" ] + } + }, { + "id" : "f30af2d2-d042-43b8-bc6d-22f6bab6934c", + "name" : "hmac-generated", + "providerId" : "hmac-generated", + "subComponents" : { }, + "config" : { + "kid" : [ "6f0d9688-e974-42b4-9d84-8d098c51007c" ], + "secret" : [ "8nruwD66Revr9k21e-BHtcyvNzAMFOsstxSAB0Gdy2qe2qGRm2kYOwsPzrH9ZQSdj2041SraKo6a3SHvCyTBAQ" ], + "priority" : [ "100" ], + "algorithm" : [ "HS256" ] + } + } ] + }, + "internationalizationEnabled" : false, + "supportedLocales" : [ ], + "authenticationFlows" : [ { + "id" : "94c65ba1-ba50-4be2-94c4-de656145eb67", + "alias" : "Account verification options", + "description" : "Method with which to verity the existing account", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "idp-email-verification", + "authenticatorFlow" : false, + "requirement" : "ALTERNATIVE", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "ALTERNATIVE", + "priority" : 20, + "autheticatorFlow" : true, + 
"flowAlias" : "Verify Existing Account by Re-authentication", + "userSetupAllowed" : false + } ] + }, { + "id" : "3b706ddf-c4b6-498a-803c-772878bc9bc3", + "alias" : "Authentication Options", + "description" : "Authentication options.", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "basic-auth", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "basic-auth-otp", + "authenticatorFlow" : false, + "requirement" : "DISABLED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "auth-spnego", + "authenticatorFlow" : false, + "requirement" : "DISABLED", + "priority" : 30, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + }, { + "id" : "9ea0b8f6-882c-45ad-9110-78adf5a5d233", + "alias" : "Browser - Conditional OTP", + "description" : "Flow to determine if the OTP is required for the authentication", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "conditional-user-configured", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "auth-otp-form", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + }, { + "id" : "99c5ba83-b585-4601-b740-1a26670bf4e9", + "alias" : "Direct Grant - Conditional OTP", + "description" : "Flow to determine if the OTP is required for the authentication", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "conditional-user-configured", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + 
"autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "direct-grant-validate-otp", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + }, { + "id" : "65b73dec-7dd1-4de8-b542-a023b7104afc", + "alias" : "First broker login - Conditional OTP", + "description" : "Flow to determine if the OTP is required for the authentication", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "conditional-user-configured", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "auth-otp-form", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + }, { + "id" : "9a26b76f-da95-43f1-8da3-16c4a0654f07", + "alias" : "Handle Existing Account", + "description" : "Handle what to do if there is existing account with same email/username like authenticated identity provider", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "idp-confirm-link", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : true, + "flowAlias" : "Account verification options", + "userSetupAllowed" : false + } ] + }, { + "id" : "0a77285e-d7d5-4b6c-aa9a-3eadb5e7e3d3", + "alias" : "Reset - Conditional OTP", + "description" : "Flow to determine if the OTP should be reset or not. 
Set to REQUIRED to force.", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "conditional-user-configured", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "reset-otp", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + }, { + "id" : "cb6c0b3b-2f5f-4493-9d14-6130f8b58dd7", + "alias" : "User creation or linking", + "description" : "Flow for the existing/non-existing user alternatives", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticatorConfig" : "create unique user config", + "authenticator" : "idp-create-user-if-unique", + "authenticatorFlow" : false, + "requirement" : "ALTERNATIVE", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "ALTERNATIVE", + "priority" : 20, + "autheticatorFlow" : true, + "flowAlias" : "Handle Existing Account", + "userSetupAllowed" : false + } ] + }, { + "id" : "0fd3db1b-e93d-4768-82ca-a1498ddc11d0", + "alias" : "Verify Existing Account by Re-authentication", + "description" : "Reauthentication of existing account", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "idp-username-password-form", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "CONDITIONAL", + "priority" : 20, + "autheticatorFlow" : true, + "flowAlias" : "First broker login - Conditional OTP", + "userSetupAllowed" : false + } ] + }, { + "id" : "86610e70-f9f5-4c11-8a9e-9de1770565fb", + "alias" : "browser", + 
"description" : "browser based authentication", + "providerId" : "basic-flow", + "topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "auth-cookie", + "authenticatorFlow" : false, + "requirement" : "ALTERNATIVE", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "auth-spnego", + "authenticatorFlow" : false, + "requirement" : "DISABLED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "identity-provider-redirector", + "authenticatorFlow" : false, + "requirement" : "ALTERNATIVE", + "priority" : 25, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "ALTERNATIVE", + "priority" : 30, + "autheticatorFlow" : true, + "flowAlias" : "forms", + "userSetupAllowed" : false + } ] + }, { + "id" : "f6aa23dd-8532-4d92-9780-3ea226481e3b", + "alias" : "clients", + "description" : "Base authentication for clients", + "providerId" : "client-flow", + "topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "client-secret", + "authenticatorFlow" : false, + "requirement" : "ALTERNATIVE", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "client-jwt", + "authenticatorFlow" : false, + "requirement" : "ALTERNATIVE", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "client-secret-jwt", + "authenticatorFlow" : false, + "requirement" : "ALTERNATIVE", + "priority" : 30, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "client-x509", + "authenticatorFlow" : false, + "requirement" : "ALTERNATIVE", + "priority" : 40, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + }, { + "id" : "4d2caf65-1703-4ddb-8890-70232e91bcd8", + "alias" : "direct grant", + "description" : "OpenID Connect 
Resource Owner Grant", + "providerId" : "basic-flow", + "topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "direct-grant-validate-username", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "direct-grant-validate-password", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "CONDITIONAL", + "priority" : 30, + "autheticatorFlow" : true, + "flowAlias" : "Direct Grant - Conditional OTP", + "userSetupAllowed" : false + } ] + }, { + "id" : "eaa20c41-5334-4fb4-8c45-fb9cc71f7f74", + "alias" : "docker auth", + "description" : "Used by Docker clients to authenticate against the IDP", + "providerId" : "basic-flow", + "topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "docker-http-basic-authenticator", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + }, { + "id" : "b9febfb1-f0aa-4590-b782-272a4aa11575", + "alias" : "first broker login", + "description" : "Actions taken after first broker login with identity provider account, which is not yet linked to any Keycloak account", + "providerId" : "basic-flow", + "topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticatorConfig" : "review profile config", + "authenticator" : "idp-review-profile", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : true, + "flowAlias" : "User creation or linking", + "userSetupAllowed" : false + } ] + }, { + "id" : 
"03bb6ff4-eccb-4f2f-8953-3769f78c3bf3", + "alias" : "forms", + "description" : "Username, password, otp and other auth forms.", + "providerId" : "basic-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "auth-username-password-form", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "CONDITIONAL", + "priority" : 20, + "autheticatorFlow" : true, + "flowAlias" : "Browser - Conditional OTP", + "userSetupAllowed" : false + } ] + }, { + "id" : "38385189-246b-4ea0-ac05-d49dfe1709da", + "alias" : "http challenge", + "description" : "An authentication flow based on challenge-response HTTP Authentication Schemes", + "providerId" : "basic-flow", + "topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "no-cookie-redirect", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : true, + "flowAlias" : "Authentication Options", + "userSetupAllowed" : false + } ] + }, { + "id" : "1022f3c2-0469-41c9-861e-918908f103df", + "alias" : "registration", + "description" : "registration flow", + "providerId" : "basic-flow", + "topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "registration-page-form", + "authenticatorFlow" : true, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : true, + "flowAlias" : "registration form", + "userSetupAllowed" : false + } ] + }, { + "id" : "00d36c3b-e1dc-41f8-bfd0-5f8c80ea07e8", + "alias" : "registration form", + "description" : "registration form", + "providerId" : "form-flow", + "topLevel" : false, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" 
: "registration-user-creation", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "registration-profile-action", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 40, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "registration-password-action", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 50, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "registration-recaptcha-action", + "authenticatorFlow" : false, + "requirement" : "DISABLED", + "priority" : 60, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + }, { + "id" : "4374c16e-8c65-4168-94c2-df1ab3f3e6ad", + "alias" : "reset credentials", + "description" : "Reset credentials for a user if they forgot their password or something", + "providerId" : "basic-flow", + "topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "reset-credentials-choose-user", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "reset-credential-email", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 20, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticator" : "reset-password", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 30, + "autheticatorFlow" : false, + "userSetupAllowed" : false + }, { + "authenticatorFlow" : true, + "requirement" : "CONDITIONAL", + "priority" : 40, + "autheticatorFlow" : true, + "flowAlias" : "Reset - Conditional OTP", + "userSetupAllowed" : false + } ] + }, { + "id" : "04d6ed6a-76c9-41fb-9074-bff8a80c2286", + "alias" : "saml ecp", + "description" : "SAML ECP Profile Authentication Flow", + "providerId" : "basic-flow", + 
"topLevel" : true, + "builtIn" : true, + "authenticationExecutions" : [ { + "authenticator" : "http-basic-authenticator", + "authenticatorFlow" : false, + "requirement" : "REQUIRED", + "priority" : 10, + "autheticatorFlow" : false, + "userSetupAllowed" : false + } ] + } ], + "authenticatorConfig" : [ { + "id" : "e7bad67d-1236-430a-a327-9194f9d1e2b0", + "alias" : "create unique user config", + "config" : { + "require.password.update.after.registration" : "false" + } + }, { + "id" : "287b5989-a927-4cf5-8067-74594ce19bc1", + "alias" : "review profile config", + "config" : { + "update.profile.on.first.login" : "missing" + } + } ], + "requiredActions" : [ { + "alias" : "CONFIGURE_TOTP", + "name" : "Configure OTP", + "providerId" : "CONFIGURE_TOTP", + "enabled" : true, + "defaultAction" : false, + "priority" : 10, + "config" : { } + }, { + "alias" : "terms_and_conditions", + "name" : "Terms and Conditions", + "providerId" : "terms_and_conditions", + "enabled" : false, + "defaultAction" : false, + "priority" : 20, + "config" : { } + }, { + "alias" : "UPDATE_PASSWORD", + "name" : "Update Password", + "providerId" : "UPDATE_PASSWORD", + "enabled" : true, + "defaultAction" : false, + "priority" : 30, + "config" : { } + }, { + "alias" : "UPDATE_PROFILE", + "name" : "Update Profile", + "providerId" : "UPDATE_PROFILE", + "enabled" : true, + "defaultAction" : false, + "priority" : 40, + "config" : { } + }, { + "alias" : "VERIFY_EMAIL", + "name" : "Verify Email", + "providerId" : "VERIFY_EMAIL", + "enabled" : true, + "defaultAction" : false, + "priority" : 50, + "config" : { } + }, { + "alias" : "delete_account", + "name" : "Delete Account", + "providerId" : "delete_account", + "enabled" : false, + "defaultAction" : false, + "priority" : 60, + "config" : { } + }, { + "alias" : "webauthn-register", + "name" : "Webauthn Register", + "providerId" : "webauthn-register", + "enabled" : true, + "defaultAction" : false, + "priority" : 70, + "config" : { } + }, { + "alias" : 
"webauthn-register-passwordless", + "name" : "Webauthn Register Passwordless", + "providerId" : "webauthn-register-passwordless", + "enabled" : true, + "defaultAction" : false, + "priority" : 80, + "config" : { } + }, { + "alias" : "update_user_locale", + "name" : "Update User Locale", + "providerId" : "update_user_locale", + "enabled" : true, + "defaultAction" : false, + "priority" : 1000, + "config" : { } + } ], + "browserFlow" : "browser", + "registrationFlow" : "registration", + "directGrantFlow" : "direct grant", + "resetCredentialsFlow" : "reset credentials", + "clientAuthenticationFlow" : "clients", + "dockerAuthenticationFlow" : "docker auth", + "attributes" : { + "cibaBackchannelTokenDeliveryMode" : "poll", + "cibaAuthRequestedUserHint" : "login_hint", + "oauth2DevicePollingInterval" : "5", + "clientOfflineSessionMaxLifespan" : "0", + "clientSessionIdleTimeout" : "0", + "clientOfflineSessionIdleTimeout" : "0", + "cibaInterval" : "5", + "cibaExpiresIn" : "120", + "oauth2DeviceCodeLifespan" : "600", + "parRequestUriLifespan" : "60", + "clientSessionMaxLifespan" : "0", + "frontendUrl" : "" + }, + "keycloakVersion" : "19.0.3", + "userManagedAccessAllowed" : false, + "clientProfiles" : { + "profiles" : [ ] + }, + "clientPolicies" : { + "policies" : [ ] + } +} diff --git a/conf/localstack/buckets.sh b/conf/localstack/buckets.sh new file mode 100755 index 00000000000..fe940d9890d --- /dev/null +++ b/conf/localstack/buckets.sh @@ -0,0 +1,3 @@ +#!/usr/bin/env bash +# https://stackoverflow.com/questions/53619901/auto-create-s3-buckets-on-localstack +awslocal s3 mb s3://mybucket diff --git a/conf/openshift/openshift.json b/conf/openshift/openshift.json deleted file mode 100644 index 583079a5260..00000000000 --- a/conf/openshift/openshift.json +++ /dev/null @@ -1,646 +0,0 @@ -{ - "kind": "Template", - "apiVersion": "v1", - "metadata": { - "name": "dataverse", - "labels": { - "name": "dataverse" - }, - "annotations": { - "openshift.io/description": "Dataverse is open 
source research data repository software: https://dataverse.org", - "openshift.io/display-name": "Dataverse" - } - }, - "objects": [ - { - "kind": "Secret", - "apiVersion": "v1", - "metadata": { - "name": "dataverse-admin-secret" - }, - "stringData" : { - "admin-password" : "${ADMIN_PASSWORD}" - } - }, - { - "kind": "Secret", - "apiVersion": "v1", - "metadata": { - "name": "dataverse-postgresql-secret" - }, - "stringData" : { - "postgresql-user" : "${POSTGRESQL_USER}", - "postgresql-password" : "${POSTGRESQL_PASSWORD}" - } - }, - { - "kind": "Secret", - "apiVersion": "v1", - "metadata": { - "name": "dataverse-postgresql-master-secret" - }, - "stringData" : { - "postgresql-master-user" : "${POSTGRESQL_MASTER_USER}", - "postgresql-master-password" : "${POSTGRESQL_MASTER_PASSWORD}" - } - }, - { - "kind": "Secret", - "apiVersion": "v1", - "metadata": { - "name": "dataverse-postgresql-admin-secret" - }, - "stringData" : { - "postgresql-admin-password" : "${POSTGRESQL_ADMIN_PASSWORD}" - } - }, - { - "kind": "Service", - "apiVersion": "v1", - "metadata": { - "name": "dataverse-glassfish-service" - }, - "spec": { - "selector": { - "name": "iqss-dataverse-glassfish" - }, - "ports": [ - { - "name": "web", - "protocol": "TCP", - "port": 8080, - "targetPort": 8080 - } - ] - } - }, - { - "kind": "Service", - "apiVersion": "v1", - "metadata": { - "name": "dataverse-postgresql-service" - }, - "spec": { - "selector": { - "name": "iqss-dataverse-postgresql" - }, - "clusterIP": "None", - "ports": [ - { - "name": "database", - "protocol": "TCP", - "port": 5432, - "targetPort": 5432 - } - ] - } - }, - { - "kind": "Service", - "apiVersion": "v1", - "metadata": { - "name": "dataverse-solr-service" - }, - "spec": { - "selector": { - "name": "iqss-dataverse-solr" - }, - "clusterIP": "None", - "ports": [ - { - "name": "search", - "protocol": "TCP", - "port": 8983, - "targetPort": 8983 - } - ] - } - }, - { - "apiVersion": "v1", - "kind": "Route", - "metadata": { - "annotations": { - 
"openshift.io/host.generated": "true" - }, - "name": "dataverse" - }, - "spec": { - "port": { - "targetPort": "web" - }, - "to": { - "kind": "Service", - "name": "dataverse-glassfish-service", - "weight": 100 - } - } - }, - { - "kind": "ImageStream", - "apiVersion": "v1", - "metadata": { - "name": "dataverse-plus-glassfish" - }, - "spec": { - "dockerImageRepository": "iqss/dataverse-glassfish" - } - }, - { - "kind": "ImageStream", - "apiVersion": "v1", - "metadata": { - "name": "centos-postgresql-94-centos7" - }, - "spec": { - "dockerImageRepository": "centos/postgresql-94-centos7" - } - }, - { - "kind": "ImageStream", - "apiVersion": "v1", - "metadata": { - "name": "iqss-dataverse-solr" - }, - "spec": { - "dockerImageRepository": "iqss/dataverse-solr" - } - }, - { - "kind": "StatefulSet", - "apiVersion": "apps/v1beta1", - "metadata": { - "name": "dataverse-glassfish", - "annotations": { - "template.alpha.openshift.io/wait-for-ready": "true", - "alpha.image.policy.openshift.io/resolve-names": "*" - } - }, - "spec": { - "serviceName": "dataverse-glassfish", - "replicas": 1, - "template": { - "metadata": { - "labels": { - "name": "iqss-dataverse-glassfish" - } - }, - "spec": { - "initContainers": [ - { - "name": "start-glassfish", - "image": "iqss/init-container:latest", - "imagePullPolicy": "IfNotPresent", - "env": [ - { - "name": "CONTAINER_NAME", - "valueFrom": { - "fieldRef": { - "fieldPath": "metadata.name" - } - } - }, - { - "name": "MY_POD_NAME", - "value": "start-glassfish" - }, - { - "name": "POSTGRES_ADMIN_PASSWORD", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-admin-secret", - "key" : "postgresql-admin-password" - } - } - }, - { - "name": "POSTGRES_SERVER", - "value": "dataverse-postgresql-0" - }, - - { - "name": "POSTGRES_SERVICE_HOST", - "value": "dataverse-postgresql-service" - }, - { - "name": "POSTGRES_USER", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-secret", - "key" : "postgresql-user" - } - } - }, 
- { - "name": "POSTGRES_PASSWORD", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-secret", - "key" : "postgresql-password" - } - } - }, - { - "name": "POSTGRES_DATABASE", - "value": "dvndb" - } - ] - } - ], - "containers": [ - { - "name": "dataverse-plus-glassfish", - "image": "iqss/dataverse-glassfish", - "ports": [ - { - "containerPort": 8080, - "protocol": "TCP" - } - ], - "resources": { - "limits": { - "memory": "2048Mi" - } - }, - "env": [ - { - "name": "MY_POD_NAME", - "valueFrom": { - "fieldRef": { - "fieldPath": "metadata.name" - } - } - }, - { - "name": "POSTGRES_SERVER", - "value": "dataverse-postgresql-0" - }, - { - "name": "POSTGRESQL_ADMIN_PASSWORD", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-admin-secret", - "key" : "postgresql-admin-password" - } - } - }, - { - "name": "POSTGRES_SERVICE_HOST", - "value": "dataverse-postgresql-service" - }, - { - "name": "SOLR_SERVICE_HOST", - "value": "dataverse-solr-service" - }, - { - "name": "ADMIN_PASSWORD", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-admin-secret", - "key" : "admin-password" - } - } - }, - { - "name": "SMTP_HOST", - "value": "localhost" - }, - { - "name": "POSTGRES_USER", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-secret", - "key" : "postgresql-user" - } - } - }, - { - "name": "POSTGRES_PASSWORD", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-secret", - "key" : "postgresql-password" - } - } - }, - { - "name": "POSTGRES_DATABASE", - "value": "dvndb" - } - ], - "imagePullPolicy": "IfNotPresent", - "securityContext": { - "capabilities": {}, - "privileged": false - } - } - ] - } - }, - "strategy": { - "type": "Rolling", - "rollingParams": { - "updatePeriodSeconds": 1, - "intervalSeconds": 1, - "timeoutSeconds": 300 - }, - "resources": { - "limits": { - "memory": "512Mi" - } - } - }, - "triggers": [ - { - "type": "ImageChange", - "imageChangeParams": { - "automatic": true, - 
"containerNames": [ - "dataverse-plus-glassfish" - ], - "from": { - "kind": "ImageStreamTag", - "name": "dataverse-plus-glassfish:latest" - } - } - }, - { - "type": "ConfigChange" - } - ], - "selector": { - "name": "iqss-dataverse-glassfish", - "matchLabels": { - "name": "iqss-dataverse-glassfish" - } - } - } - }, - { - "kind": "StatefulSet", - "apiVersion": "apps/v1beta1", - "metadata": { - "name": "dataverse-postgresql", - "annotations": { - "template.alpha.openshift.io/wait-for-ready": "true" - } - }, - "spec": { - "serviceName": "dataverse-postgresql-service", - "replicas": 1, - "template": { - "metadata": { - "labels": { - "name": "iqss-dataverse-postgresql" - } - }, - "spec": { - "containers": [ - { - "name": "centos-postgresql-94-centos7", - "image": "centos/postgresql-94-centos7", - "command": [ - "sh", - "-c", - "echo 'Setting up Postgres Master/Slave replication...'; [[ `hostname` =~ -([0-9]+)$ ]] || exit 1; ordinal=${BASH_REMATCH[1]}; if [[ $ordinal -eq 0 ]]; then run-postgresql-master; else run-postgresql-slave; fi;" - ], - "ports": [ - { - "containerPort": 5432, - "protocol": "TCP" - } - ], - "env": [ - { - "name": "POSTGRESQL_USER", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-secret", - "key" : "postgresql-user" - } - } - }, - { - "name": "POSTGRESQL_MASTER_USER", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-master-secret", - "key" : "postgresql-master-user" - } - } - }, - { - "name": "POSTGRESQL_PASSWORD", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-secret", - "key" : "postgresql-password" - } - } - }, - { - "name": "POSTGRESQL_MASTER_PASSWORD", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-master-secret", - "key" : "postgresql-master-password" - } - } - }, - { - "name": "POSTGRESQL_MASTER_SERVICE_NAME", - "value": "dataverse-postgresql-service" - }, - { - "name": "POSTGRESQL_MASTER_IP", - "value": "dataverse-postgresql-0.dataverse-postgresql-service" 
- }, - { - "name": "postgresql_master_addr", - "value": "dataverse-postgresql-0.dataverse-postgresql-service" - }, - { - "name": "master_fqdn", - "value": "dataverse-postgresql-0.dataverse-postgresql-service" - }, - { - "name": "POSTGRESQL_DATABASE", - "value": "dvndb" - }, - { - "name": "POSTGRESQL_ADMIN_PASSWORD", - "valueFrom": { - "secretKeyRef": { - "name" : "dataverse-postgresql-admin-secret", - "key" : "postgresql-admin-password" - } - } - } - ], - "resources": { - "limits": { - "memory": "256Mi" - } - }, - "imagePullPolicy": "IfNotPresent", - "securityContext": { - "capabilities": {}, - "privileged": false - } - } - ] - } - }, - "strategy": { - "type": "Rolling", - "rollingParams": { - "updatePeriodSeconds": 1, - "intervalSeconds": 1, - "timeoutSeconds": 300 - }, - "resources": {} - }, - "triggers": [ - { - "type": "ImageChange", - "imageChangeParams": { - "automatic": true, - "containerNames": [ - "centos-postgresql-94-centos7" - ], - "from": { - "kind": "ImageStreamTag", - "name": "centos/postgresql-94-centos7:latest" - } - } - }, - { - "type": "ConfigChange" - } - ], - "selector": { - "name": "iqss-dataverse-postgresql", - "matchLabels": { - "name": "iqss-dataverse-postgresql" - } - } - } - }, - { - "kind": "StatefulSet", - "apiVersion": "apps/v1beta1", - "metadata": { - "name": "dataverse-solr", - "annotations": { - "template.alpha.openshift.io/wait-for-ready": "true" - } - }, - "spec": { - "serviceName" : "dataverse-solr-service", - "template": { - "metadata": { - "labels": { - "name": "iqss-dataverse-solr" - } - }, - "spec": { - "containers": [ - { - "name": "iqss-dataverse-solr", - "image": "iqss/dataverse-solr", - "ports": [ - { - "containerPort": 8983, - "protocol": "TCP" - } - ], - "resources": { - "limits": { - "memory": "1024Mi" - } - }, - "imagePullPolicy": "IfNotPresent", - "securityContext": { - "capabilities": {}, - "privileged": false - } - } - ] - } - }, - "strategy": { - "type": "Rolling", - "rollingParams": { - "updatePeriodSeconds": 1, 
- "intervalSeconds": 1, - "timeoutSeconds": 300 - }, - "resources": {} - }, - "triggers": [ - { - "type": "ImageChange", - "imageChangeParams": { - "automatic": true, - "containerNames": [ - "iqss-dataverse-solr" - ], - "from": { - "kind": "ImageStreamTag", - "name": "iqss-dataverse-solr:latest" - } - } - }, - { - "type": "ConfigChange" - } - ], - "replicas": 1, - "selector": { - "name": "iqss-dataverse-solr", - "matchLabels" : { - "name" : "iqss-dataverse-solr" - } - } - } - } - ], - "parameters": [ - { - "name": "ADMIN_PASSWORD", - "description": "admin password", - "generate": "expression", - "from": "[a-zA-Z0-9]{8}" - }, - { - "name": "POSTGRESQL_USER", - "description": "postgresql user", - "generate": "expression", - "from": "user[A-Z0-9]{3}" - }, - { - "name": "POSTGRESQL_PASSWORD", - "description": "postgresql password", - "generate": "expression", - "from": "[a-zA-Z0-9]{8}" - }, - { - "name": "POSTGRESQL_MASTER_USER", - "description": "postgresql master user", - "generate": "expression", - "from": "user[A-Z0-9]{3}" - }, - { - "name": "POSTGRESQL_MASTER_PASSWORD", - "description": "postgresql master password", - "generate": "expression", - "from": "[a-zA-Z0-9]{8}" - }, - { - "name": "POSTGRESQL_ADMIN_PASSWORD", - "description": "postgresql admin password", - "generate": "expression", - "from": "[a-zA-Z0-9]{8}" - } - ] -} diff --git a/conf/proxy/Caddyfile b/conf/proxy/Caddyfile new file mode 100644 index 00000000000..70e6904d26e --- /dev/null +++ b/conf/proxy/Caddyfile @@ -0,0 +1,12 @@ +# This configuration is intended to be used with Caddy, a very small high perf proxy. +# It will serve the application containers Payara Admin GUI via HTTP instead of HTTPS, +# avoiding the trouble of self signed certificates for local development. 
+
+:4848 {
+    reverse_proxy https://dataverse:4848 {
+        transport http {
+            tls_insecure_skip_verify
+        }
+        header_down Location "^https://" "http://"
+    }
+}
diff --git a/conf/solr/7.3.1/readme.md b/conf/solr/7.3.1/readme.md
deleted file mode 100644
index 4457cf9a7df..00000000000
--- a/conf/solr/7.3.1/readme.md
+++ /dev/null
@@ -1 +0,0 @@
-Please see the dev guide for what to do with Solr config files.
\ No newline at end of file
diff --git a/conf/solr/7.3.1/schema.xml b/conf/solr/7.3.1/schema.xml
deleted file mode 100644
index fd307a32f07..00000000000
--- a/conf/solr/7.3.1/schema.xml
+++ /dev/null
@@ -1,1186 +0,0 @@
[1186 deleted lines of the 7.3.1 schema.xml: the XML markup was lost in extraction; of the deleted content, only the uniqueKey "id" survives.]
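The `header_down Location "^https://" "http://"` directive in the Caddyfile above rewrites the Payara admin backend's HTTPS redirects so the browser stays on the plain-HTTP proxy port. As a toy illustration of that substitution (not Caddy's actual implementation):

```python
import re

# Toy illustration of Caddy's `header_down Location "^https://" "http://"`:
# a regex replacement applied to the Location response header, downgrading
# HTTPS redirects from the backend to HTTP so the client keeps talking to
# the plain-HTTP proxy on :4848.
def rewrite_location(location: str) -> str:
    return re.sub(r"^https://", "http://", location)

print(rewrite_location("https://dataverse:4848/common/index.jsf"))
# -> http://dataverse:4848/common/index.jsf
```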
[remaining deleted lines of the 7.3.1 schema.xml: XML markup lost in extraction]
diff --git a/conf/solr/7.3.1/schema_dv_mdb_copies.xml b/conf/solr/7.3.1/schema_dv_mdb_copies.xml
deleted file mode 100644
index 0208fdf3910..00000000000
--- a/conf/solr/7.3.1/schema_dv_mdb_copies.xml
+++ /dev/null
@@ -1,157 +0,0 @@
[157 deleted lines of copyField definitions: XML markup lost in extraction]
\ No newline at end of file
diff --git a/conf/solr/7.3.1/schema_dv_mdb_fields.xml b/conf/solr/7.3.1/schema_dv_mdb_fields.xml
deleted file mode 100644
index 6caa7c6de69..00000000000
--- a/conf/solr/7.3.1/schema_dv_mdb_fields.xml
+++ /dev/null
@@ -1,157 +0,0 @@
[157 deleted lines of field definitions: XML markup lost in extraction]
\ No newline at end of file
diff --git a/conf/solr/7.3.1/solrconfig.xml b/conf/solr/7.3.1/solrconfig.xml
deleted file mode 100644
index 51100af7d6e..00000000000
--- a/conf/solr/7.3.1/solrconfig.xml
+++ /dev/null
@@ -1,1410 +0,0 @@
[1410 deleted lines of the 7.3.1 solrconfig.xml: XML markup lost in extraction. Recoverable settings: luceneMatchVersion 7.3.0, native lock type, autoCommit maxTime 15000 with openSearcher false, autoSoftCommit maxTime -1, a search handler using edismax with qf boosts dvName^400 through _text_^1.0, pf boosts dvName^200 through producerName^75, bq isHarvested:false^25000, a DirectSolrSpellChecker with its threshold settings, term vector, terms, and elevator components, and highlighting fragmenter settings.]
[tail of the deleted solrconfig.xml: date parsing patterns (yyyy-MM-dd'T'HH:mm:ss.SSSZ and variants), field-name cleanup, and velocity response writer settings; XML markup lost in extraction]
diff --git a/conf/solr/7.3.1/updateSchemaMDB.sh b/conf/solr/7.3.1/updateSchemaMDB.sh
deleted file mode 100755
index e4446083442..00000000000
--- a/conf/solr/7.3.1/updateSchemaMDB.sh
+++ /dev/null
@@ -1,79 +0,0 @@
-#!/bin/sh
-set -euo pipefail
-
-# This script updates the <field> and <copyField> schema configuration necessary to properly
-# index custom metadata fields in Solr.
-# 1. Retrieve <field> and <copyField> config from Dataverse API endpoint
-# 2. Parse and write Solr schema files (which might replace the included files)
-# 3. Reload Solr
-#
-# List of variables:
-# ${DATAVERSE_URL}: URL to Dataverse. Defaults to http://localhost:8080
-# ${SOLR_URL}: URL to Solr. Defaults to http://localhost:8983
-# ${UNBLOCK_KEY}: File path to secret or unblock key as string. Only necessary on k8s or when you secured your installation.
-# ${TARGET}: Directory where to write the XML files. Defaults to /tmp
-#
-# Programs used (need to be available on your PATH):
-# coreutils: mktemp, csplit
-# curl
-
-usage() {
-    echo "usage: updateSchemaMDB.sh [options]"
-    echo "options:"
-    echo "    -d  Dataverse URL, defaults to http://localhost:8080"
-    echo "    -h  Show this help text"
-    echo "    -s  Solr URL, defaults to http://localhost:8983"
-    echo "    -t  Directory where to write the XML files. Defaults to /tmp"
-    echo "    -u  Dataverse unblock key either as key string or path to keyfile"
-}
-
-### Init (with sane defaults)
-DATAVERSE_URL=${DATAVERSE_URL:-"http://localhost:8080"}
-SOLR_URL=${SOLR_URL:-"http://localhost:8983"}
-TARGET=${TARGET:-"/tmp"}
-UNBLOCK_KEY=${UNBLOCK_KEY:-""}
-
-# if cmdline args are given, override any env var setting (or defaults)
-while getopts ":d:hs:t:u:" opt
-do
-    case $opt in
-        d) DATAVERSE_URL=${OPTARG};;
-        h) usage; exit 0;;
-        s) SOLR_URL=${OPTARG};;
-        t) TARGET=${OPTARG};;
-        u) UNBLOCK_KEY=${OPTARG};;
-        :) echo "Missing option argument for -${OPTARG}. Use -h for help." >&2; exit 1;;
-        \?) echo "Unknown option -${OPTARG}." >&2; usage; exit 1;;
-    esac
-done
-
-# Special handling of unblock key depending on referencing a secret file or key in var
-if [ ! -z "${UNBLOCK_KEY}" ]; then
-    if [ -f "${UNBLOCK_KEY}" ]; then
-        UNBLOCK_KEY="?unblock-key=$(cat ${UNBLOCK_KEY})"
-    else
-        UNBLOCK_KEY="?unblock-key=${UNBLOCK_KEY}"
-    fi
-fi
-
-### Retrieval
-echo "Retrieve schema data from ${DATAVERSE_URL}/api/admin/index/solr/schema"
-TMPFILE=`mktemp`
-curl -f -sS "${DATAVERSE_URL}/api/admin/index/solr/schema${UNBLOCK_KEY}" > $TMPFILE
-
-### Processing
-echo "Writing ${TARGET}/schema_dv_mdb_fields.xml"
-echo "" > ${TARGET}/schema_dv_mdb_fields.xml
-cat ${TMPFILE} | grep ".*<field" >> ${TARGET}/schema_dv_mdb_fields.xml
-echo "" >> ${TARGET}/schema_dv_mdb_fields.xml
-
-echo "Writing ${TARGET}/schema_dv_mdb_copies.xml"
-echo "" > ${TARGET}/schema_dv_mdb_copies.xml
-cat ${TMPFILE} | grep ".*<copyField" >> ${TARGET}/schema_dv_mdb_copies.xml
-echo "" >> ${TARGET}/schema_dv_mdb_copies.xml
-
-rm ${TMPFILE}*
-
-### Reloading
-echo "Triggering Solr RELOAD at ${SOLR_URL}/solr/admin/cores?action=RELOAD&core=collection1"
-curl -f -sS "${SOLR_URL}/solr/admin/cores?action=RELOAD&core=collection1"
\ No newline at end of file
diff --git a/conf/solr/schema.xml b/conf/solr/schema.xml
new file mode 100644
index 00000000000..34f888acec4
--- /dev/null
+++ b/conf/solr/schema.xml
@@ -0,0 +1,1607 @@
[1607 added lines of the new conf/solr/schema.xml: XML markup lost in extraction; of the added content, only the uniqueKey "id" survives.]
[extraction residue: stripped markup continues, "+" markers only]
diff --git a/conf/solr/solrconfig.xml b/conf/solr/solrconfig.xml new file mode 100644 index 00000000000..003b71c85c1 --- /dev/null +++ b/conf/solr/solrconfig.xml @@ -0,0 +1,1129 @@
[stripped XML, 1,129 added lines; only element text survives, in order: 9.11; ${solr.data.dir:}; ${solr.lock.type:native}; ${solr.ulog.dir:}; ${solr.autoCommit.maxTime:300000} false; ${solr.autoSoftCommit.maxTime:1000}; ${solr.max.booleanClauses:1024}; ${solr.query.minPrefixLength:-1}; true; 20; 200; false; explicit 10 edismax 0.075; query field boosts: dvName^400 authorName^180 dvSubject^190 dvDescription^180 dvAffiliation^170 title^130 subject^120 keyword^110 topicClassValue^100 dsDescriptionValue^90 authorAffiliation^80 publicationCitation^60 producerName^50 fileName^30 fileDescription^30 variableLabel^20 variableName^10 _text_^1.0; phrase boosts: dvName^200 authorName^100 dvSubject^100 dvDescription^100 dvAffiliation^100 title^75 subject^75 keyword^75 topicClassValue^75 dsDescriptionValue^75 authorAffiliation^75 publicationCitation^75 producerName^75; isHarvested:false^25000; explicit json true]
[stripped XML continues; surviving element text, in order: _text_; text_general; default _text_ solr.DirectSolrSpellChecker internal 0.5 2 1 5 4 0.01; 100; 70 0.5 [-\w ,/\n\"']{20,200}; ]]> ]]>; ,, ,, ,, ,, ,]]> ]]>; 10 .,!?; WORD; en US; [^\w-\.] _ 1000 true; yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z; yyyy-MM-dd['T'[HH:mm[:ss[,SSS]][z; yyyy-MM-dd HH:mm[:ss[.SSS]][z; yyyy-MM-dd HH:mm[:ss[,SSS]][z; [EEE, ]dd MMM yyyy HH:mm[:ss] z; EEEE, dd-MMM-yy HH:mm:ss z; EEE MMM ppd HH:mm:ss [z ]yyyy; java.lang.String text_general *_str 256 true; java.lang.Boolean booleans; java.util.Date pdates; java.lang.Long java.lang.Integer plongs; java.lang.Number pdoubles]
diff --git a/conf/solr/update-fields.sh b/conf/solr/update-fields.sh new file mode 100755 index 00000000000..5dbe062026e --- /dev/null +++ b/conf/solr/update-fields.sh @@ -0,0 +1,220 @@ +#!/usr/bin/env bash + +set -euo pipefail + +# [INFO]: Update a prepared Solr schema.xml for Dataverse with a given list of metadata fields + +#### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### +# This script will +# 1. take a file (or read it from STDIN) with all <field> and <copyField> definitions +# 2. and replace the sections between the include guards with those in a given +# schema.xml file +# The script validates the presence, uniqueness and order of the include guards.
+#### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### #### + + +### Variables +# Internal use only (fork to change) +VERSION="0.1" +INPUT="" +FIELDS="" +COPY_FIELDS="" +TRIGGER_CHAIN=0 +ED_DELETE_FIELDS="'a+,'b-d" +ED_DELETE_COPYFIELDS="'a+,'b-d" + +SOLR_SCHEMA_FIELD_BEGIN_MARK="SCHEMA-FIELDS::BEGIN" +SOLR_SCHEMA_FIELD_END_MARK="SCHEMA-FIELDS::END" +SOLR_SCHEMA_COPYFIELD_BEGIN_MARK="SCHEMA-COPY-FIELDS::BEGIN" +SOLR_SCHEMA_COPYFIELD_END_MARK="SCHEMA-COPY-FIELDS::END" +MARKS_ORDERED="${SOLR_SCHEMA_FIELD_BEGIN_MARK} ${SOLR_SCHEMA_FIELD_END_MARK} ${SOLR_SCHEMA_COPYFIELD_BEGIN_MARK} ${SOLR_SCHEMA_COPYFIELD_END_MARK}" + +### Common functions +function error { + echo "ERROR:" "$@" >&2 + exit 2 +} + +function exists { + type "$1" >/dev/null 2>&1 && return 0 + ( IFS=:; for p in $PATH; do [ -x "${p%/}/$1" ] && return 0; done; return 1 ) +} + +function usage { + cat << EOF +$(basename "$0") ${VERSION} +Usage: $(basename "$0") [-hp] [ schema file ] [ source file ] + +-h Print usage (this text) +-p Chained printing: write all metadata schema related <field> + and <copyField> present in Solr XML to stdout + +Provide target Solr Schema XML via argument or \$SCHEMA env var. + +Provide source file via argument, \$SOURCE env var or piped input +(wget/curl, chained). Source file = "-" means read STDIN. +EOF + exit 0 +} + +### Options +while getopts ":hp" opt; do + case $opt in + h) usage ;; + p) TRIGGER_CHAIN=1 ;; + \?) echo "Invalid option: -$OPTARG" >&2; exit 1 ;; + :) echo "Option -$OPTARG requires an argument."
>&2; exit 1 ;; + esac +done + +# Check for ed and bc being present +exists ed || error "Please ensure ed, bc, sed + awk are installed (ed is missing)" +exists bc || error "Please ensure ed, bc, sed + awk are installed (bc is missing)" +exists awk || error "Please ensure ed, bc, sed + awk are installed (awk is missing)" +exists sed || error "Please ensure ed, bc, sed + awk are installed (sed is missing)" + +# remove all the parsed options +shift $((OPTIND-1)) + +# User overrideable locations +SCHEMA=${SCHEMA:-${1:-schema.xml}} +SOURCE=${SOURCE:-${2:-"-"}} + + +### VERIFY SCHEMA FILE EXISTS AND CONTAINS INCLUDE GUARDS ### +# Check for schema file & writeable +if [ ! -w "${SCHEMA}" ]; then + error "Cannot find or write to a XML schema at ${SCHEMA}" +else + # Check schema file for include guards + CHECKS=$( + for MARK in ${MARKS_ORDERED} + do + grep -c "${MARK}" "${SCHEMA}" || error "Missing ${MARK} from ${SCHEMA}" + done + ) + + # Check guards are unique (count occurrences and sum calc via bc) + # Note: fancy workaround to re-add closing \n on Linux & MacOS or no calculation + [ "$( (echo -n "${CHECKS}" | tr '\n' '+' ; echo ) | bc)" -eq 4 ] || \ + error "Some include guards are not unique in ${SCHEMA}" + + # Check guards are in order (line number comparison via bc tricks) + CHECKS=$( + for MARK in ${MARKS_ORDERED} + do + grep -n "${MARK}" "${SCHEMA}" | cut -f 1 -d ":" + done + ) + # Actual comparison of line numbers + echo "${CHECKS}" | tr '\n' '<' | awk -F'<' '{ if ($1 < $2 && $2 < $3 && $3 < $4) {exit 0} else {exit 1} }' || \ + error "Include guards are not in correct order in ${SCHEMA}" + + # Check guards are exclusively in their lines + # (no <field> or <copyField> on same line) + for MARK in ${MARKS_ORDERED} + do + grep "${MARK}" "${SCHEMA}" | grep -q -v -e '\(" "${SOURCE}" | sed -e 's#^\s\+##' -e 's#\s\+$##' || true) + + +### DATA HANDLING ### +# Split input into different types +if [ -z "${INPUT}" ]; then + error "No <field> or <copyField> in input" +else + # Check for <field> definitions (if
nomatch, avoid failing pipe) + FIELDS=$(mktemp) + echo "${INPUT}" | grep -e "<field" | sed -e 's#^# #' > "${FIELDS}" || true + # If file actually contains output, write to schema + if [ -s "${FIELDS}" ]; then + # Use an ed script to replace all <field> + cat << EOF | grep -v -e "^#" | ed -s "${SCHEMA}" +H +# Mark field begin as 'a' +/${SOLR_SCHEMA_FIELD_BEGIN_MARK}/ka +# Mark field end as 'b' +/${SOLR_SCHEMA_FIELD_END_MARK}/kb +# Delete all between lines a and b +${ED_DELETE_FIELDS} +# Read fields file and paste after line a +'ar ${FIELDS} +# Write fields to schema +w +q +EOF + fi + rm "${FIELDS}" + + # Check for <copyField> definitions (if nomatch, avoid failing pipe) + COPY_FIELDS=$(mktemp) + echo "${INPUT}" | grep -e "<copyField" | sed -e 's#^# #' > "${COPY_FIELDS}" || true + # If file actually contains output, write to schema + if [ -s "${COPY_FIELDS}" ]; then + # Use an ed script to replace all <copyField>, filter comments (BSD ed does not support comments) + cat << EOF | grep -v -e "^#" | ed -s "${SCHEMA}" +H +# Mark copyField begin as 'a' +/${SOLR_SCHEMA_COPYFIELD_BEGIN_MARK}/ka +# Mark copyField end as 'b' +/${SOLR_SCHEMA_COPYFIELD_END_MARK}/kb +# Delete all between lines a and b +${ED_DELETE_COPYFIELDS} +# Read fields file and paste after line a +'ar ${COPY_FIELDS} +# Write copyFields to schema +w +q +EOF + fi + rm "${COPY_FIELDS}" +fi + + +### CHAINING OUTPUT +# Scripts following this one might want to use the field definitions now present +if [ "${TRIGGER_CHAIN}" -eq 1 ]; then + grep -A1000 "${SOLR_SCHEMA_FIELD_BEGIN_MARK}" "${SCHEMA}" | grep -B1000 "${SOLR_SCHEMA_FIELD_END_MARK}" + grep -A1000 "${SOLR_SCHEMA_COPYFIELD_BEGIN_MARK}" "${SCHEMA}" | grep -B1000 "${SOLR_SCHEMA_COPYFIELD_END_MARK}" +fi diff --git a/conf/vagrant/etc/shibboleth/attribute-map.xml b/conf/vagrant/etc/shibboleth/attribute-map.xml deleted file mode 100644 index f6386b620f5..00000000000 --- a/conf/vagrant/etc/shibboleth/attribute-map.xml +++ /dev/null @@ -1,141 +0,0 @@ [extraction residue: the 141 deleted lines of stripped XML markup survive only as "-" markers]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/conf/vagrant/etc/shibboleth/dataverse-idp-metadata.xml b/conf/vagrant/etc/shibboleth/dataverse-idp-metadata.xml deleted file mode 100644 index 67225b5e670..00000000000 --- a/conf/vagrant/etc/shibboleth/dataverse-idp-metadata.xml +++ /dev/null @@ -1,298 +0,0 @@ - - - - - - - - - - - - - - - - - - - - testshib.org - - TestShib Test IdP - TestShib IdP. Use this as a source of attributes - for your test SP. - https://www.testshib.org/images/testshib-transp.png - - - - - - - - MIIEDjCCAvagAwIBAgIBADANBgkqhkiG9w0BAQUFADBnMQswCQYDVQQGEwJVUzEV - MBMGA1UECBMMUGVubnN5bHZhbmlhMRMwEQYDVQQHEwpQaXR0c2J1cmdoMREwDwYD - VQQKEwhUZXN0U2hpYjEZMBcGA1UEAxMQaWRwLnRlc3RzaGliLm9yZzAeFw0wNjA4 - MzAyMTEyMjVaFw0xNjA4MjcyMTEyMjVaMGcxCzAJBgNVBAYTAlVTMRUwEwYDVQQI - EwxQZW5uc3lsdmFuaWExEzARBgNVBAcTClBpdHRzYnVyZ2gxETAPBgNVBAoTCFRl - c3RTaGliMRkwFwYDVQQDExBpZHAudGVzdHNoaWIub3JnMIIBIjANBgkqhkiG9w0B - AQEFAAOCAQ8AMIIBCgKCAQEArYkCGuTmJp9eAOSGHwRJo1SNatB5ZOKqDM9ysg7C - yVTDClcpu93gSP10nH4gkCZOlnESNgttg0r+MqL8tfJC6ybddEFB3YBo8PZajKSe - 3OQ01Ow3yT4I+Wdg1tsTpSge9gEz7SrC07EkYmHuPtd71CHiUaCWDv+xVfUQX0aT - NPFmDixzUjoYzbGDrtAyCqA8f9CN2txIfJnpHE6q6CmKcoLADS4UrNPlhHSzd614 - kR/JYiks0K4kbRqCQF0Dv0P5Di+rEfefC6glV8ysC8dB5/9nb0yh/ojRuJGmgMWH - gWk6h0ihjihqiu4jACovUZ7vVOCgSE5Ipn7OIwqd93zp2wIDAQABo4HEMIHBMB0G - A1UdDgQWBBSsBQ869nh83KqZr5jArr4/7b+QazCBkQYDVR0jBIGJMIGGgBSsBQ86 - 9nh83KqZr5jArr4/7b+Qa6FrpGkwZzELMAkGA1UEBhMCVVMxFTATBgNVBAgTDFBl - bm5zeWx2YW5pYTETMBEGA1UEBxMKUGl0dHNidXJnaDERMA8GA1UEChMIVGVzdFNo - aWIxGTAXBgNVBAMTEGlkcC50ZXN0c2hpYi5vcmeCAQAwDAYDVR0TBAUwAwEB/zAN - BgkqhkiG9w0BAQUFAAOCAQEAjR29PhrCbk8qLN5MFfSVk98t3CT9jHZoYxd8QMRL - I4j7iYQxXiGJTT1FXs1nd4Rha9un+LqTfeMMYqISdDDI6tv8iNpkOAvZZUosVkUo - 93pv1T0RPz35hcHHYq2yee59HJOco2bFlcsH8JBXRSRrJ3Q7Eut+z9uo80JdGNJ4 - 
/SJy5UorZ8KazGj16lfJhOBXldgrhppQBb0Nq6HKHguqmwRfJ+WkxemZXzhediAj - Geka8nz8JjwxpUjAiSWYKLtJhGEaTqCYxCCX2Dw+dOTqUzHOZ7WKv4JXPK5G/Uhr - 8K/qhmFT2nIQi538n6rVYLeWj8Bbnl+ev0peYzxFyF5sQA== - - - - - - - - - - - - - - - urn:mace:shibboleth:1.0:nameIdentifier - urn:oasis:names:tc:SAML:2.0:nameid-format:transient - - - - - - - - - - - - - - - - MIIEDjCCAvagAwIBAgIBADANBgkqhkiG9w0BAQUFADBnMQswCQYDVQQGEwJVUzEV - MBMGA1UECBMMUGVubnN5bHZhbmlhMRMwEQYDVQQHEwpQaXR0c2J1cmdoMREwDwYD - VQQKEwhUZXN0U2hpYjEZMBcGA1UEAxMQaWRwLnRlc3RzaGliLm9yZzAeFw0wNjA4 - MzAyMTEyMjVaFw0xNjA4MjcyMTEyMjVaMGcxCzAJBgNVBAYTAlVTMRUwEwYDVQQI - EwxQZW5uc3lsdmFuaWExEzARBgNVBAcTClBpdHRzYnVyZ2gxETAPBgNVBAoTCFRl - c3RTaGliMRkwFwYDVQQDExBpZHAudGVzdHNoaWIub3JnMIIBIjANBgkqhkiG9w0B - AQEFAAOCAQ8AMIIBCgKCAQEArYkCGuTmJp9eAOSGHwRJo1SNatB5ZOKqDM9ysg7C - yVTDClcpu93gSP10nH4gkCZOlnESNgttg0r+MqL8tfJC6ybddEFB3YBo8PZajKSe - 3OQ01Ow3yT4I+Wdg1tsTpSge9gEz7SrC07EkYmHuPtd71CHiUaCWDv+xVfUQX0aT - NPFmDixzUjoYzbGDrtAyCqA8f9CN2txIfJnpHE6q6CmKcoLADS4UrNPlhHSzd614 - kR/JYiks0K4kbRqCQF0Dv0P5Di+rEfefC6glV8ysC8dB5/9nb0yh/ojRuJGmgMWH - gWk6h0ihjihqiu4jACovUZ7vVOCgSE5Ipn7OIwqd93zp2wIDAQABo4HEMIHBMB0G - A1UdDgQWBBSsBQ869nh83KqZr5jArr4/7b+QazCBkQYDVR0jBIGJMIGGgBSsBQ86 - 9nh83KqZr5jArr4/7b+Qa6FrpGkwZzELMAkGA1UEBhMCVVMxFTATBgNVBAgTDFBl - bm5zeWx2YW5pYTETMBEGA1UEBxMKUGl0dHNidXJnaDERMA8GA1UEChMIVGVzdFNo - aWIxGTAXBgNVBAMTEGlkcC50ZXN0c2hpYi5vcmeCAQAwDAYDVR0TBAUwAwEB/zAN - BgkqhkiG9w0BAQUFAAOCAQEAjR29PhrCbk8qLN5MFfSVk98t3CT9jHZoYxd8QMRL - I4j7iYQxXiGJTT1FXs1nd4Rha9un+LqTfeMMYqISdDDI6tv8iNpkOAvZZUosVkUo - 93pv1T0RPz35hcHHYq2yee59HJOco2bFlcsH8JBXRSRrJ3Q7Eut+z9uo80JdGNJ4 - /SJy5UorZ8KazGj16lfJhOBXldgrhppQBb0Nq6HKHguqmwRfJ+WkxemZXzhediAj - Geka8nz8JjwxpUjAiSWYKLtJhGEaTqCYxCCX2Dw+dOTqUzHOZ7WKv4JXPK5G/Uhr - 8K/qhmFT2nIQi538n6rVYLeWj8Bbnl+ev0peYzxFyF5sQA== - - - - - - - - - - - - - - - - urn:mace:shibboleth:1.0:nameIdentifier - urn:oasis:names:tc:SAML:2.0:nameid-format:transient - - - - - TestShib Two Identity Provider - TestShib Two - 
http://www.testshib.org/testshib-two/ - - - Nate - Klingenstein - ndk@internet2.edu - - - - - - - - - - - - - - - - - - - - - - - - - TestShib Test SP - TestShib SP. Log into this to test your machine. - Once logged in check that all attributes that you expected have been - released. - https://www.testshib.org/images/testshib-transp.png - - - - - - - - MIIEPjCCAyagAwIBAgIBADANBgkqhkiG9w0BAQUFADB3MQswCQYDVQQGEwJVUzEV - MBMGA1UECBMMUGVubnN5bHZhbmlhMRMwEQYDVQQHEwpQaXR0c2J1cmdoMSIwIAYD - VQQKExlUZXN0U2hpYiBTZXJ2aWNlIFByb3ZpZGVyMRgwFgYDVQQDEw9zcC50ZXN0 - c2hpYi5vcmcwHhcNMDYwODMwMjEyNDM5WhcNMTYwODI3MjEyNDM5WjB3MQswCQYD - VQQGEwJVUzEVMBMGA1UECBMMUGVubnN5bHZhbmlhMRMwEQYDVQQHEwpQaXR0c2J1 - cmdoMSIwIAYDVQQKExlUZXN0U2hpYiBTZXJ2aWNlIFByb3ZpZGVyMRgwFgYDVQQD - Ew9zcC50ZXN0c2hpYi5vcmcwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIB - AQDJyR6ZP6MXkQ9z6RRziT0AuCabDd3x1m7nLO9ZRPbr0v1LsU+nnC363jO8nGEq - sqkgiZ/bSsO5lvjEt4ehff57ERio2Qk9cYw8XCgmYccVXKH9M+QVO1MQwErNobWb - AjiVkuhWcwLWQwTDBowfKXI87SA7KR7sFUymNx5z1aoRvk3GM++tiPY6u4shy8c7 - vpWbVfisfTfvef/y+galxjPUQYHmegu7vCbjYP3On0V7/Ivzr+r2aPhp8egxt00Q - XpilNai12LBYV3Nv/lMsUzBeB7+CdXRVjZOHGuQ8mGqEbsj8MBXvcxIKbcpeK5Zi - JCVXPfarzuriM1G5y5QkKW+LAgMBAAGjgdQwgdEwHQYDVR0OBBYEFKB6wPDxwYrY - StNjU5P4b4AjBVQVMIGhBgNVHSMEgZkwgZaAFKB6wPDxwYrYStNjU5P4b4AjBVQV - oXukeTB3MQswCQYDVQQGEwJVUzEVMBMGA1UECBMMUGVubnN5bHZhbmlhMRMwEQYD - VQQHEwpQaXR0c2J1cmdoMSIwIAYDVQQKExlUZXN0U2hpYiBTZXJ2aWNlIFByb3Zp - ZGVyMRgwFgYDVQQDEw9zcC50ZXN0c2hpYi5vcmeCAQAwDAYDVR0TBAUwAwEB/zAN - BgkqhkiG9w0BAQUFAAOCAQEAc06Kgt7ZP6g2TIZgMbFxg6vKwvDL0+2dzF11Onpl - 5sbtkPaNIcj24lQ4vajCrrGKdzHXo9m54BzrdRJ7xDYtw0dbu37l1IZVmiZr12eE - Iay/5YMU+aWP1z70h867ZQ7/7Y4HW345rdiS6EW663oH732wSYNt9kr7/0Uer3KD - 9CuPuOidBacospDaFyfsaJruE99Kd6Eu/w5KLAGG+m0iqENCziDGzVA47TngKz2v - PVA+aokoOyoz3b53qeti77ijatSEoKjxheBWpO+eoJeGq/e49Um3M2ogIX/JAlMa - Inh+vYSYngQB2sx9LGkR9KHaMKNIGCDehk93Xla4pWJx1w== - - - - - - - - - - - - - - - - - - - - - urn:oasis:names:tc:SAML:2.0:nameid-format:transient - 
urn:mace:shibboleth:1.0:nameIdentifier - - - - - - - - - - - - - - - - - - - - TestShib Two Service Provider - TestShib Two - http://www.testshib.org/testshib-two/ - - - Nate - Klingenstein - ndk@internet2.edu - - - - - - - diff --git a/conf/vagrant/etc/shibboleth/shibboleth2.xml b/conf/vagrant/etc/shibboleth/shibboleth2.xml deleted file mode 100644 index 946e73bdf6a..00000000000 --- a/conf/vagrant/etc/shibboleth/shibboleth2.xml +++ /dev/null @@ -1,85 +0,0 @@ - - - - - - - - - - - - - - - - - - - - SAML2 SAML1 - - - - SAML2 Local - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/conf/vagrant/etc/yum.repos.d/epel-apache-maven.repo b/conf/vagrant/etc/yum.repos.d/epel-apache-maven.repo deleted file mode 100644 index 1e0f8200040..00000000000 --- a/conf/vagrant/etc/yum.repos.d/epel-apache-maven.repo +++ /dev/null @@ -1,15 +0,0 @@ -# Place this file in your /etc/yum.repos.d/ directory - -[epel-apache-maven] -name=maven from apache foundation. -baseurl=http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-$releasever/$basearch/ -enabled=1 -skip_if_unavailable=1 -gpgcheck=0 - -[epel-apache-maven-source] -name=maven from apache foundation. 
- Source -baseurl=http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-$releasever/SRPMS -enabled=0 -skip_if_unavailable=1 -gpgcheck=0 diff --git a/conf/vagrant/etc/yum.repos.d/shibboleth.repo b/conf/vagrant/etc/yum.repos.d/shibboleth.repo deleted file mode 100644 index ebbe3747a10..00000000000 --- a/conf/vagrant/etc/yum.repos.d/shibboleth.repo +++ /dev/null @@ -1,7 +0,0 @@ -[security_shibboleth] -name=Shibboleth (CentOS_CentOS-6) -type=rpm-md -baseurl=http://download.opensuse.org/repositories/security:/shibboleth/CentOS_CentOS-6/ -gpgcheck=1 -gpgkey=http://download.opensuse.org/repositories/security:/shibboleth/CentOS_CentOS-6/repodata/repomd.xml.key -enabled=1 diff --git a/conf/vagrant/var/lib/pgsql/data/pg_hba.conf b/conf/vagrant/var/lib/pgsql/data/pg_hba.conf deleted file mode 100644 index e3244686066..00000000000 --- a/conf/vagrant/var/lib/pgsql/data/pg_hba.conf +++ /dev/null @@ -1,74 +0,0 @@ -# PostgreSQL Client Authentication Configuration File -# =================================================== -# -# Refer to the "Client Authentication" section in the -# PostgreSQL documentation for a complete description -# of this file. A short synopsis follows. -# -# This file controls: which hosts are allowed to connect, how clients -# are authenticated, which PostgreSQL user names they can use, which -# databases they can access. Records take one of these forms: -# -# local DATABASE USER METHOD [OPTIONS] -# host DATABASE USER CIDR-ADDRESS METHOD [OPTIONS] -# hostssl DATABASE USER CIDR-ADDRESS METHOD [OPTIONS] -# hostnossl DATABASE USER CIDR-ADDRESS METHOD [OPTIONS] -# -# (The uppercase items must be replaced by actual values.) -# -# The first field is the connection type: "local" is a Unix-domain socket, -# "host" is either a plain or SSL-encrypted TCP/IP socket, "hostssl" is an -# SSL-encrypted TCP/IP socket, and "hostnossl" is a plain TCP/IP socket. -# -# DATABASE can be "all", "sameuser", "samerole", a database name, or -# a comma-separated list thereof. 
-# -# USER can be "all", a user name, a group name prefixed with "+", or -# a comma-separated list thereof. In both the DATABASE and USER fields -# you can also write a file name prefixed with "@" to include names from -# a separate file. -# -# CIDR-ADDRESS specifies the set of hosts the record matches. -# It is made up of an IP address and a CIDR mask that is an integer -# (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that specifies -# the number of significant bits in the mask. Alternatively, you can write -# an IP address and netmask in separate columns to specify the set of hosts. -# -# METHOD can be "trust", "reject", "md5", "password", "gss", "sspi", "krb5", -# "ident", "pam", "ldap" or "cert". Note that "password" sends passwords -# in clear text; "md5" is preferred since it sends encrypted passwords. -# -# OPTIONS are a set of options for the authentication in the format -# NAME=VALUE. The available options depend on the different authentication -# methods - refer to the "Client Authentication" section in the documentation -# for a list of which options are available for which authentication methods. -# -# Database and user names containing spaces, commas, quotes and other special -# characters must be quoted. Quoting one of the keywords "all", "sameuser" or -# "samerole" makes the name lose its special character, and just match a -# database or username with that name. -# -# This file is read on server startup and when the postmaster receives -# a SIGHUP signal. If you edit the file on a running system, you have -# to SIGHUP the postmaster for the changes to take effect. You can use -# "pg_ctl reload" to do that. - -# Put your actual configuration here -# ---------------------------------- -# -# If you want to allow non-local connections, you need to add more -# "host" records. 
In that case you will also need to make PostgreSQL listen -# on a non-local interface via the listen_addresses configuration parameter, -# or via the -i or -h command line switches. -# - - - -# TYPE DATABASE USER CIDR-ADDRESS METHOD - -# "local" is for Unix domain socket connections only -local all all trust -# IPv4 local connections: -host all all 127.0.0.1/32 trust -# IPv6 local connections: -host all all ::1/128 trust diff --git a/conf/vagrant/var/www/dataverse/error-documents/503.html b/conf/vagrant/var/www/dataverse/error-documents/503.html deleted file mode 100644 index 95a7dea4107..00000000000 --- a/conf/vagrant/var/www/dataverse/error-documents/503.html +++ /dev/null @@ -1 +0,0 @@ -

Custom "site is unavailable" 503 page.

diff --git a/doc/Architecture/components.uml b/doc/Architecture/components.uml index 5dd65fc714a..ad8119755c0 100644 --- a/doc/Architecture/components.uml +++ b/doc/Architecture/components.uml @@ -40,14 +40,11 @@ node "DatabaseServer2" { } node "RserveServer1" { - component "rApache" { - } database "Rserve" { } } Clients --> LoadBalancer -Clients --> rApache LoadBalancer --> Apache1 LoadBalancer --> Apache2 diff --git a/doc/Architecture/update-user-account-info.png b/doc/Architecture/update-user-account-info.png index a372104438c..aa7d5f881f1 100644 Binary files a/doc/Architecture/update-user-account-info.png and b/doc/Architecture/update-user-account-info.png differ diff --git a/doc/JAVADOC_GUIDE.md b/doc/JAVADOC_GUIDE.md index 8001abda248..997c40e1624 100644 --- a/doc/JAVADOC_GUIDE.md +++ b/doc/JAVADOC_GUIDE.md @@ -88,7 +88,7 @@ Here's a better approach: /** The dataverse we move the dataset from */ private Dataverse sourceDataverse; - /** The dataverse we movet the dataset to */ + /** The dataverse we move the dataset to */ private Dataverse destinationDataverse; diff --git a/doc/mergeParty/readme.md b/doc/mergeParty/readme.md index f97b17f7430..6f3af8511dc 100644 --- a/doc/mergeParty/readme.md +++ b/doc/mergeParty/readme.md @@ -1,5 +1,5 @@ -# Merge Party Readme -Welcome to the merge party! This document is intended to give a short overview of why we need this party, when was changed and how to change it. There's much work to do, so we'll keep it short. Hopefully. +# Merge Party +Welcome to the merge party! This document is intended to give a short overview of why we need this party, when was it changed and how to change it. There's much work to do, so we'll keep it short, hopefully. ## What Just Happened In order to allow users to log into Dataverse using credentials from other systems (e.g. institutional Shibboleth server), we had to refactor out the internal user management sub-system (formerly known as "DataverseUser") and introduce a new user system. 
The existing system was taken out of Dataverse but kept in the .war file, as we also need to support standalone instances. @@ -16,7 +16,7 @@ From a merge standpoint, this means that code that previously referenced `Datave Most of these changes have been done by Michael/Phil - otherwise, the `auth` branch would not compile. -Since the guest user does not live in the database, it does not have an id. Moreover, JPA classes cannot link directly to it\*. But have no fear - all users (and, really, all `RoleAssignee`s, which are users or groups) have an identifier. When you need to reference a user (and later, a group) just use the identifier (it's of type `String`). When needing to convert an identifier to a user, call `RoleAssigneeServiceBean.getRoleAssignee( identifier )` in the general case, or `AuthenticationServiceBean.getAuthenticatedUser(identifier)` if you're certain the identifier is of an authenticated user. +The guest user does not live in the database so it does not have an id. Moreover, JPA classes cannot link directly to it\*. But have no fear - all users (and, really, all `RoleAssignee`s, which are users or groups) have an identifier. When you need to reference a user (and later, a group) just use the identifier (it's of type `String`). When needing to convert an identifier to a user, call `RoleAssigneeServiceBean.getRoleAssignee( identifier )` in the general case, or `AuthenticationServiceBean.getAuthenticatedUser(identifier)` if you're certain the identifier is of an authenticated user. \* We have debated this for a while, since we could have created a dummy record, like we've done so far. We went with this solution, as it is cleaner, can't be messed up by SQL scripts, and will make even more sense once groups arrive. 
@@ -73,10 +73,10 @@ Note that before we were asking `isGuest` and now we ask `isAuthenticated`, so t ## Other Added Things ### Settings bean -Settings (in `edu.harvard.iq.dataverse.settings`) are where the application stores its more complex, admin-editable configuration. Technically, its a persistent `Map`, that can be accessed via API (`edu.harvard.iq.dataverse.api.Admin`, on path `{server}/api/s/settings`). Currenly used for the signup mechanism. +Settings (in `edu.harvard.iq.dataverse.settings`) are where the application stores its more complex, admin-editable configuration. Technically, its a persistent `Map`, that can be accessed via API (`edu.harvard.iq.dataverse.api.Admin`, on path `{server}/api/s/settings`). Currently used for the signup mechanism. ### Admin API -Accessible under url `{server}/api/s/`, API calls to this bean should be editing confugurations, allowing full indexing and more. The idea behing putting all of them under the `/s/` path is that we can later block these calls using a filter. This way, we could, say, allow access from localhost only. Or, we could block this completely based on some environemnt variable. +Accessible under url `{server}/api/s/`, API calls to this bean should be editing configurations, allowing full indexing and more. The idea behind putting all of them under the `/s/` path is that we can later block these calls using a filter. This way, we could, say, allow access from localhost only. Or, we could block this completely based on some environment variable. ### `setup-all.sh` script A new script that sets up the users and the dataverses, sets the system up for built-in signup, and then indexes the dataverses using solr. Requires the [jq utility](http://stedolan.github.io/jq/). On Macs with [homebrew](http://brew.sh) installed, getting this utility is a `brew install jq` command away. 
@@ -84,4 +84,4 @@ A new script that sets up the users and the dataverses, sets the system up for b ## Undoing the undoing the merge When merging back to master, we need to undo commit 8ae3e6a482b87b52a1745bb06f340875803d2c5b (a.k.a 8ae3e6a), which is the commit that undid the erroneous merge. -More at http://www.christianengvall.se/undo-pushed-merge-git/ \ No newline at end of file +More at http://www.christianengvall.se/undo-pushed-merge-git/ diff --git a/doc/release-notes/10190-dataset-count.md b/doc/release-notes/10190-dataset-count.md new file mode 100644 index 00000000000..a3d3a052f87 --- /dev/null +++ b/doc/release-notes/10190-dataset-count.md @@ -0,0 +1,2 @@ +The search index now includes datasetCount for each collection, counting published, linked, and harvested datasets. +Collections can be filtered using datasetCount (e.g., `datasetCount:[1000 TO *]`), and the value is returned in Dataverse search results via the Search API. \ No newline at end of file diff --git a/doc/release-notes/11243-editmetadata-api-extension.md b/doc/release-notes/11243-editmetadata-api-extension.md new file mode 100644 index 00000000000..3666d8bc30a --- /dev/null +++ b/doc/release-notes/11243-editmetadata-api-extension.md @@ -0,0 +1,7 @@ +### Edit Dataset Metadata API extension + +- This endpoint now allows removing fields (by sending empty values), as long as they are not required by the dataset. +- New ``sourceLastUpdateTime`` optional query parameter, which prevents inconsistencies by managing updates that + may occur from other users while a dataset is being edited. 
+ +NOTE: This release note was updated to conform to the refactoring of the validation as part of issue #11392 diff --git a/doc/release-notes/11392-edit-file-metadata-empty-values.md b/doc/release-notes/11392-edit-file-metadata-empty-values.md new file mode 100644 index 00000000000..5839fa100af --- /dev/null +++ b/doc/release-notes/11392-edit-file-metadata-empty-values.md @@ -0,0 +1,7 @@ +### Edit File Metadata empty values should clear data + +Previously the API POST /files/{id}/metadata would ignore fields with empty values. Now the API updates the fields with the empty values essentially clearing the data. Missing fields will still be ignored. + +An optional query parameter (sourceLastUpdateTime) was added to ensure the metadata update doesn't overwrite stale data. + +See also [the guides](https://dataverse-guide--11359.org.readthedocs.build/en/11359/api/native-api.html#updating-file-metadata), #11392, and #11359. diff --git a/doc/release-notes/11448-api-endpoint-for-analytics-html.md b/doc/release-notes/11448-api-endpoint-for-analytics-html.md new file mode 100644 index 00000000000..ac62a1b5257 --- /dev/null +++ b/doc/release-notes/11448-api-endpoint-for-analytics-html.md @@ -0,0 +1,5 @@ +### Feature Request: API endpoint for analytics.html + +New API to get the analytics.html from settings for SPA (Also can be used to get homePage, header, footer, style, and logo) + +See also [the guides](https://dataverse-guide--11359.org.readthedocs.build/en/11359/installation/config.html#web-analytics-code), #11448. diff --git a/doc/release-notes/11485-mpconfig-personororg.md b/doc/release-notes/11485-mpconfig-personororg.md new file mode 100644 index 00000000000..c30ef3829c1 --- /dev/null +++ b/doc/release-notes/11485-mpconfig-personororg.md @@ -0,0 +1,7 @@ +The settings `dataverse.personOrOrg.assumeCommaInPersonName` and `dataverse.personOrOrg.orgPhraseArray` now support configuration via MicroProfile Config. 
+ +They have been renamed to `dataverse.person-or-org.assume-comma-in-person-name` and `dataverse.person-or-org.org-phrase-array` for consistency with naming conventions. + +In addition to the existing `asadmin` JVM option method, any [supported MicroProfile Config API source](https://docs.payara.fish/community/docs/Technical%20Documentation/MicroProfile/Config/Overview.html) can now be used to set their values. + +For backwards compatibility, `dataverse.personOrOrg.assumeCommaInPersonName` is still supported. However, `dataverse.personOrOrg.orgPhraseArray` is not, due to a change in the expected value format. `dataverse.person-or-org.org-phrase-array` now expects a comma-separated list of phrases as a value instead of a JsonArray of strings. Please update both the name and value format if using the old setting. \ No newline at end of file diff --git a/doc/release-notes/11492-list-dataset-links.md b/doc/release-notes/11492-list-dataset-links.md new file mode 100644 index 00000000000..0a5a6f7a198 --- /dev/null +++ b/doc/release-notes/11492-list-dataset-links.md @@ -0,0 +1 @@ +The [API for listing the collections a dataset has been linked to](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#list-collections-that-are-linked-from-a-dataset) (`api/datasets/$linked-dataset-id/links`) is no longer restricted to superusers. For unpublished datasets, users need the "View Unpublished Dataset" permission to access the API. Unpublished collections in the list require the "View Unpublished Dataverse" permission; otherwise, they are hidden. \ No newline at end of file diff --git a/doc/release-notes/11534-link-permissions.md b/doc/release-notes/11534-link-permissions.md new file mode 100644 index 00000000000..29251c1b7d9 --- /dev/null +++ b/doc/release-notes/11534-link-permissions.md @@ -0,0 +1,3 @@ +Linking or unlinking a dataset or dataverse now requires the new "Link Dataset/Dataverse" permission. 
+Previously, this action was covered by the "Publish Dataset/Dataverse" permission. +Linking and publishing permissions can now be granted separately, allowing for more fine-grained access control. \ No newline at end of file diff --git a/doc/release-notes/11558-show-collections.md b/doc/release-notes/11558-show-collections.md new file mode 100644 index 00000000000..b204da62387 --- /dev/null +++ b/doc/release-notes/11558-show-collections.md @@ -0,0 +1,12 @@ +The Search API now supports a `show_collections` parameter for dataset results. +When the parameter is set, each result includes a `collections` array showing the dataset’s parent and linked collections. Each entry includes `id`, `name`, and `alias`, for example: + +```json +"collections": [ + { + "id": 11, + "name": "My cool collection", + "alias": "dvcb50a190" + } +] +``` \ No newline at end of file diff --git a/doc/release-notes/11562-templates-api.md b/doc/release-notes/11562-templates-api.md new file mode 100644 index 00000000000..30e35687a33 --- /dev/null +++ b/doc/release-notes/11562-templates-api.md @@ -0,0 +1,4 @@ +New endpoints have been implemented in the Dataverses API for the management of dataverse templates: + +- POST `/dataverses/{id}/templates`: Creates a template for a given Dataverse collection ``id``. +- GET `/dataverses/{id}/templates`: Lists the templates for a given Dataverse collection ``id``. diff --git a/doc/release-notes/11592-HandleParsing_fix.md b/doc/release-notes/11592-HandleParsing_fix.md new file mode 100644 index 00000000000..087655d77c4 --- /dev/null +++ b/doc/release-notes/11592-HandleParsing_fix.md @@ -0,0 +1 @@ +A bug introduced in v6.5 broke Handle parsing when using a lower-case shoulder. This is now fixed.
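As a sketch of the template-management endpoints described above (the server URL, collection identifier, token handling, and payload file are placeholder assumptions, not taken from this PR):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values -- adjust for your installation.
SERVER_URL="https://demo.dataverse.org"
DATAVERSE_ID="root"

# Both new endpoints share one URL; the HTTP method selects list vs. create.
TEMPLATES_URL="${SERVER_URL}/api/dataverses/${DATAVERSE_ID}/templates"
echo "${TEMPLATES_URL}"

# Real calls would pass an API token, e.g.:
#   curl -H "X-Dataverse-key: ${API_TOKEN}" "${TEMPLATES_URL}"              # GET: list templates
#   curl -H "X-Dataverse-key: ${API_TOKEN}" -X POST "${TEMPLATES_URL}" \
#        -H "Content-Type: application/json" --upload-file template.json    # POST: create a template
```

The `{id}` segment accepts a collection alias as well as a numeric database id, as elsewhere in the Dataverses API.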
\ No newline at end of file diff --git a/doc/release-notes/11605-existing-shib-external-users-auth.md b/doc/release-notes/11605-existing-shib-external-users-auth.md new file mode 100644 index 00000000000..a836c7737bb --- /dev/null +++ b/doc/release-notes/11605-existing-shib-external-users-auth.md @@ -0,0 +1 @@ +Implemented a new feature flag ``dataverse.feature.api-bearer-auth-use-shib-user-on-id-match``, which supports the use of the new Dataverse client in instances that have historically allowed login via Shibboleth. Specifically, with this flag enabled, when an OIDC bridge is configured to allow OIDC login with validation by the bridged Shibboleth providers, users with existing Shibboleth-based accounts in Dataverse can log in to those accounts, thereby maintaining access to their existing content and retaining their roles. (For security reasons, Dataverse's current support for direct login via Shibboleth cannot be used in browser-based clients.) \ No newline at end of file diff --git a/doc/release-notes/11614-include-isAdvancedSeachField-property.md b/doc/release-notes/11614-include-isAdvancedSeachField-property.md new file mode 100644 index 00000000000..a544570c52e --- /dev/null +++ b/doc/release-notes/11614-include-isAdvancedSeachField-property.md @@ -0,0 +1,3 @@ +The API endpoints `api/{dataverse-alias}/metadatablocks` and `/api/metadatablocks/{block_id}` have been extended to include the following field: + +- `isAdvancedSearchFieldType`: Whether the field can be used in advanced search or not. \ No newline at end of file diff --git a/doc/release-notes/11629-CSLFix.md b/doc/release-notes/11629-CSLFix.md new file mode 100644 index 00000000000..374a0dc78be --- /dev/null +++ b/doc/release-notes/11629-CSLFix.md @@ -0,0 +1 @@ +The styled citations available through the "View Styled Citations" menu were including extra characters, e.g. 'doi:' in the URL form of the PIDs in the citation. This is now fixed. 
\ No newline at end of file diff --git a/doc/release-notes/11632-commons-lang3-update.md b/doc/release-notes/11632-commons-lang3-update.md new file mode 100644 index 00000000000..c03929dfe02 --- /dev/null +++ b/doc/release-notes/11632-commons-lang3-update.md @@ -0,0 +1,20 @@ +Due to changes in how the commons-lang3 library handles a non-ASCII character, two keys in the citation.properties and citation.tsv files have changed to include i instead of ɨ. Translations will need to address this. + +controlledvocabulary.language.magɨ_(madang_province) => controlledvocabulary.language.magi_(madang_province) +controlledvocabulary.language.magɨyi => controlledvocabulary.language.magiyi + +## Upgrade Instructions + +x\. Update metadata blocks + +These changes reflect incremental improvements made to the handling of core metadata fields. + +Reload the citation.tsv file to handle the commons-lang3 change mentioned above. + +Expect the loading of the citation block to take several seconds because of its size (especially due to the number of languages). + +```shell +wget https://raw.githubusercontent.com/IQSS/dataverse/v6.8/scripts/api/data/metadatablocks/citation.tsv + +curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv +``` diff --git a/doc/release-notes/11633-list-dataverse-links-api-change.md b/doc/release-notes/11633-list-dataverse-links-api-change.md new file mode 100644 index 00000000000..9026349045b --- /dev/null +++ b/doc/release-notes/11633-list-dataverse-links-api-change.md @@ -0,0 +1 @@ +The [API for listing the collections a dataverse has been linked to](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#list-dataverse-collection-links) (`api/dataverses/$dataverse-alias/links`) has been refactored to return a new JSON format. This is a breaking API change.
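As a sketch of calling the refactored endpoint (the alias `root`, the server URL, and the API token below are illustrative placeholders; the exact shape of the new JSON is described in the guides):

```shell
# List the collections a dataverse has been linked to.
# SERVER_URL, API_TOKEN, and the "root" alias are placeholders for your installation.
export SERVER_URL=http://localhost:8080
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/root/links"
```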
diff --git a/doc/release-notes/11634-get-dataset-file-available-categories-api.md b/doc/release-notes/11634-get-dataset-file-available-categories-api.md new file mode 100644 index 00000000000..38fe61b8eed --- /dev/null +++ b/doc/release-notes/11634-get-dataset-file-available-categories-api.md @@ -0,0 +1,4 @@ +### Get Dataset File Available Categories API + +- This new endpoint allows the user to get all of the available file categories for a dataset, both built-in and custom. + diff --git a/doc/release-notes/11645-existing-oauth-external-users-api-auth.md b/doc/release-notes/11645-existing-oauth-external-users-api-auth.md new file mode 100644 index 00000000000..4afc5a38fb2 --- /dev/null +++ b/doc/release-notes/11645-existing-oauth-external-users-api-auth.md @@ -0,0 +1,6 @@ +Implemented a new feature flag ``dataverse.feature.api-bearer-auth-use-oauth-user-on-id-match``, which supports the use of the new Dataverse client in instances that have historically allowed login via GitHub, ORCID, or Google. Specifically, with this flag enabled, when an OIDC bridge is configured to allow OIDC login with validation by the bridged OAuth providers, users with existing GitHub, ORCID, or Google accounts in Dataverse can log in to those accounts, thereby maintaining access to their existing content and retaining their roles. 
+ +## New Settings + +- dataverse.feature.api-bearer-auth-use-oauth-user-on-id-match + + diff --git a/doc/release-notes/11648-notifications-api-extension.md b/doc/release-notes/11648-notifications-api-extension.md new file mode 100644 index 00000000000..ee5aa22863d --- /dev/null +++ b/doc/release-notes/11648-notifications-api-extension.md @@ -0,0 +1,7 @@ +# getAllNotificationsForUser API extension + +- Extended endpoint getAllNotificationsForUser (``/notifications/all``), which now supports an optional query parameter ``inAppNotificationFormat``; if sent as ``true``, it retrieves the fields needed to build the in-app notifications for the Notifications section of the Dataverse UI, omitting fields related to email notifications. See also #11648 and #11696. + +# Notifications triggered by API endpoints + +The addDataset and addDataverse API endpoints now trigger user notifications upon successful execution. See also #1342 and #11696. diff --git a/doc/release-notes/11650-unread.md b/doc/release-notes/11650-unread.md new file mode 100644 index 00000000000..07ab852e24d --- /dev/null +++ b/doc/release-notes/11650-unread.md @@ -0,0 +1,11 @@ +## API Updates + +### Support read/unread status for notifications + +The API for managing notifications has been extended. + +- displayAsRead boolean added to "get all" +- new GET unreadCount API endpoint +- new PUT markAsRead API endpoint + +See also [the guides](https://dataverse-guide--11664.org.readthedocs.build/en/11664/api/native-api.html#notifications), #11650, and #11664. diff --git a/doc/release-notes/11685-CurationStatus_fix.md b/doc/release-notes/11685-CurationStatus_fix.md new file mode 100644 index 00000000000..18bb6c8ad1d --- /dev/null +++ b/doc/release-notes/11685-CurationStatus_fix.md @@ -0,0 +1,6 @@ +The updates to support keeping the history of curation status labels added in #11268 +could incorrectly show curation statuses added prior to v6.7 as the current one, regardless of +whether newer statuses exist.
This PR corrects the problem. + +(As a work-around for 6.7, admins can add createtime dates (which must be prior to when 6.7 was installed) to the curationstatus table + for entries that have null createtimes. The code fix in this version properly handles null dates as indicating older/pre-v6.7 curation statuses.) \ No newline at end of file diff --git a/doc/release-notes/11689-builtin-users-api-bearer-auth-enhance.md b/doc/release-notes/11689-builtin-users-api-bearer-auth-enhance.md new file mode 100644 index 00000000000..671a5225781 --- /dev/null +++ b/doc/release-notes/11689-builtin-users-api-bearer-auth-enhance.md @@ -0,0 +1,9 @@ +## Security improvements for `api-bearer-auth-use-builtin-user-on-id-match` + +We’ve strengthened the security of the `api-bearer-auth-use-builtin-user-on-id-match` feature flag. It will now only work when the provided bearer token includes an `idp` claim that matches the Keycloak Service Provider identifier. + +By enforcing this check, the risk of impersonation from other identity providers is significantly reduced, since they would need to be explicitly configured with this specific, non-standard identifier. + +See: +- [#11622 (comment)](https://github.com/IQSS/dataverse/pull/11622#discussion_r2216017175) +- [#11689](https://github.com/IQSS/dataverse/issues/11689) diff --git a/doc/release-notes/11722-configbaker-bc-missing.md b/doc/release-notes/11722-configbaker-bc-missing.md new file mode 100644 index 00000000000..3a2074bbf44 --- /dev/null +++ b/doc/release-notes/11722-configbaker-bc-missing.md @@ -0,0 +1 @@ +When following the container demo tutorial, it was not possible to update Solr fields after adding additional metadata blocks. This has been fixed.
See #11722 and #11723 diff --git a/doc/release-notes/11724-extend-list-dataverse-collection-links.md b/doc/release-notes/11724-extend-list-dataverse-collection-links.md new file mode 100644 index 00000000000..9026349045b --- /dev/null +++ b/doc/release-notes/11724-extend-list-dataverse-collection-links.md @@ -0,0 +1 @@ +The [API for listing the collections a dataverse has been linked to](https://guides.dataverse.org/en/latest/admin/dataverses-datasets.html#list-dataverse-collection-links) (`api/dataverses/$dataverse-alias/links`) has been refactored to return a new JSON format. This is a breaking API change. diff --git a/doc/release-notes/11776- index fix.md b/doc/release-notes/11776- index fix.md new file mode 100644 index 00000000000..959b1ca3e95 --- /dev/null +++ b/doc/release-notes/11776- index fix.md @@ -0,0 +1 @@ +A bug, introduced in v6.7, that affected the indexing of files added to draft versions after the initial dataset version was published, has been fixed. \ No newline at end of file diff --git a/doc/release-notes/4.16-release-notes.md b/doc/release-notes/4.16-release-notes.md index 66241a42777..8feb263d2ab 100644 --- a/doc/release-notes/4.16-release-notes.md +++ b/doc/release-notes/4.16-release-notes.md @@ -91,6 +91,7 @@ If this is a new installation, please see our Qualitative Data Repository Github Repository. The spreadsheet viewer was contributed by the [Dataverse SSHOC][] project. + +[Dataverse SSHOC]: https://www.sshopencloud.eu/news/developing-sshoc-dataverse + +### Microsoft Login + +Users can now create Dataverse accounts and log in using self-provisioned Microsoft accounts such as live.com and outlook.com. Users can also use Microsoft accounts managed by their institutions. This new feature not only makes it easier to log in to Dataverse but will also streamline the interaction with any external tools that utilize Azure services that require login.
+ +### Add Data and Host Dataverse + +More workflows to add data have been added across the UI, including a new button on the My Data tab of the Account page, as well as a link in the Dataverse navbar, which will display on every page. This will provide users much easier access to start depositing data. By default, the Host Dataverse will be the installation root dataverse for these new Add Data workflows, but there is now a dropdown component allowing creators to select a dataverse in which they have the proper permissions to create a new dataverse or dataset. + +### Primefaces 7 + +Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements. + +### Integration Test Pipeline and Test Health Reporting + +As part of the Dataverse Community's ongoing efforts to provide more robust automated testing infrastructure, and in support of the project's desire to have the develop branch constantly in a "release ready" state, API-based integration tests are now run every time a branch is merged to develop. The status of the last test run is available as a badge at the bottom of the README.md file that serves as the homepage of Dataverse Github Repository. + +### Make Data Count Metrics Updates + +A new configuration option has been added that allows Make Data Count metrics to be collected, but not reflected in the front end. This option was designed to allow installations to collect and verify metrics for a period before turning on the display to users. + +### Search API Enhancements + +The Dataverse Search API will now display unpublished content when an API token is passed (and appropriate permissions exist).
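A minimal sketch of such a search that includes unpublished content (the server URL and API token are placeholders for your installation):

```shell
# Search as an authenticated user; results include unpublished content
# that the token's user is permitted to see.
export SERVER_URL=http://localhost:8080
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/search?q=*"
```

Without the `X-Dataverse-key` header, the same query returns only published content.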
+ +### Additional Dataset Author Identifiers + +The following dataset author identifiers are now supported: + +- DAI: https://en.wikipedia.org/wiki/Digital_Author_Identifier +- ResearcherID: http://researcherid.com +- ScopusID: https://www.scopus.com + +## Major Use Cases + +Newly-supported use cases in this release include: + +- Users can view previews of several common file types, eliminating the need to download or explore a file just to get a quick look. +- Users can log in using self-provisioned Microsoft accounts and also can log in using Microsoft accounts managed by an organization. +- Dataverse administrators can now revoke and regenerate API tokens with an API call. +- Users will receive notifications when their ingests complete, and will be informed if the ingest was a success or failure. +- Dataverse developers will receive feedback about the health of the develop branch after their pull request is merged. +- Dataverse tool developers will be able to query the Dataverse API for unpublished data as well as published data. +- Dataverse administrators will be able to collect Make Data Count metrics without turning on the display for users. +- Users with a DAI, ResearcherID, or ScopusID can use these author identifiers in their datasets. + +## Notes for Dataverse Installation Administrators + +### API Token Management + +- You can now delete a user's API token, recreate a user's API token, and find a token's expiration date. See the Native API guide for more information. + +### New JVM Options + +[:mdcbaseurlstring](http://guides.dataverse.org/en/4.18/installation/config.html#mdcbaseurlstring) allows dataverse administrators to use a test base URL for Make Data Count. + +### New Database Settings + +[:DisplayMDCMetrics](http://guides.dataverse.org/en/4.18/installation/config.html#DisplayMDCMetrics) can be set to false to disable display of MDC metrics.
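A sketch of toggling this database setting via the admin settings API (assuming the API is reachable on localhost, as in the other examples in these notes):

```shell
# Collect MDC metrics but hide them from users.
curl -X PUT -d false http://localhost:8080/api/admin/settings/:DisplayMDCMetrics

# Later, delete the setting to restore the default behavior (metrics displayed).
curl -X DELETE http://localhost:8080/api/admin/settings/:DisplayMDCMetrics
```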
+ +## Notes for Tool Developers and Integrators + +### Preview Mode + +Tool Developers can now add the `hasPreviewMode` parameter to their file level external tools. This setting provides an embedded, simplified view of the tool on the file pages for any installation that installs the tool. See Building External Tools for more information. + +### API Token Management + +If your tool writes content back to Dataverse, you can now take advantage of administrative endpoints that delete and re-create API tokens. You can also use an endpoint that provides the expiration date of a specific API token. See the Native API guide for more information. + +### View Unpublished Data Using Search API + +If you pass a token, the search API output will include unpublished content. + +## Complete List of Changes + +For the complete list of code changes in this release, see the 4.18 milestone in Github. + +For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org. + +## Installation + +If this is a new installation, please see our Installation Guide. + +## Upgrade + +1. Undeploy the previous version. + +- <glassfish install path>/glassfish4/bin/asadmin list-applications +- <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse + +2. Stop glassfish and remove the generated directory, start. + +- service glassfish stop +- remove the generated directory: rm -rf <glassfish install path>glassfish4/glassfish/domains/domain1/generated +- service glassfish start + +3. Deploy this version. + +- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.war + +4. Restart glassfish. + +5. 
Update Citation Metadata Block + +- `wget https://github.com/IQSS/dataverse/releases/download/v4.18/citation.tsv` +- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"` diff --git a/doc/release-notes/4.18.1-release-notes.md b/doc/release-notes/4.18.1-release-notes.md new file mode 100644 index 00000000000..99db66464a8 --- /dev/null +++ b/doc/release-notes/4.18.1-release-notes.md @@ -0,0 +1,45 @@ +# Dataverse 4.18.1 + +This release provides a fix for a regression introduced in 4.18 and implements a few other small changes. + +## Release Highlights + +### Proper Validation Messages + +When creating or editing dataset metadata, users were not receiving field-level indications about what entries failed validation and were only receiving a message at the top of the page. This fix restores field-level indications. + +## Major Use Cases + +Use cases in this release include: + +- Users will receive the proper messaging when dataset metadata entries are not valid. +- Users can now view the expiration date of an API token and revoke a token on the API Token tab of the account page. + +## Complete List of Changes + +For the complete list of code changes in this release, see the 4.18.1 milestone in Github. + +For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org. + +## Installation + +If this is a new installation, please see our Installation Guide. + +## Upgrade + +1. Undeploy the previous version. + +- <glassfish install path>/glassfish4/bin/asadmin list-applications +- <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse + +2. Stop glassfish and remove the generated directory, start. + +- service glassfish stop +- remove the generated directory: rm -rf <glassfish install path>glassfish4/glassfish/domains/domain1/generated +- service glassfish start + +3. Deploy this version. 
+ +- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.1.war + +4. Restart glassfish. diff --git a/doc/release-notes/4.19-release-notes.md b/doc/release-notes/4.19-release-notes.md new file mode 100644 index 00000000000..70c8711582c --- /dev/null +++ b/doc/release-notes/4.19-release-notes.md @@ -0,0 +1,125 @@ +# Dataverse 4.19 + +This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. + +## Release Highlights + +### Open ID Connect Support + +Dataverse now provides basic support for any OpenID Connect (OIDC) compliant authentication provider. + +Prior to supporting this standard, new authentication methods needed to be added by pull request. OIDC support provides a standardized way to handle authentication, share user information, and more. You are able to use any compliant provider just by loading a configuration file, without touching the codebase. While prominent providers like Google feature OIDC support, there are also plenty of options for attaching your installation to a custom authentication provider using enterprise-grade software. + +See the [OpenID Connect Login Options documentation](http://guides.dataverse.org/en/4.19/installation/oidc.html) in the Installation Guide for more details. + +This support is to be extended with attribute mapping, group syncing, and more in future versions of the code. + +### Python Installer + +We are introducing a new installer script, written in Python. It is intended to eventually replace the old installer (written in Perl). For now it is being offered as an (experimental) alternative. + +See [README_python.txt](https://github.com/IQSS/dataverse/blob/v4.19/scripts/installer/README_python.txt) in scripts/installer and/or in the installer bundle for more information.
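As a rough sketch of the "loading a configuration file" step for OIDC described above (the provider id, issuer URL, client id, and secret below are illustrative placeholders; see the linked OIDC documentation for the authoritative format):

```shell
# Register an OIDC provider via the admin API (illustrative values only).
cat > my-oidc-provider.json <<'EOF'
{
  "id": "my-oidc",
  "factoryAlias": "oidc",
  "title": "My OIDC Provider",
  "subtitle": "",
  "factoryData": "type: oidc | issuer: https://auth.example.org | clientId: my-client | clientSecret: changeme",
  "enabled": true
}
EOF

curl -X POST -H 'Content-type: application/json' \
  --upload-file my-oidc-provider.json \
  http://localhost:8080/api/admin/authenticationProviders
```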
+ +## Major Use Cases + +Newly-supported use cases in this release include: + +- Dataverse installation administrators will be able to experiment with a Python Installer (Issue #3937, PR #6484) +- Dataverse installation administrators will be able to set up OIDC-compliant login options by editing a configuration file, with no need for a code change (Issue #6432, PR #6433) +- Following setup by a Dataverse administrator, users will be able to log in using OIDC-compliant methods (Issue #6432, PR #6433) +- Users of the Search API will see additional fields in the JSON output (Issues #6300, #6396, PR #6441) +- Users loading the support form will now be presented with the math challenge as expected and will be able to successfully send an email to support (Issue #6307, PR #6462) +- Users of https://mybinder.org can now spin up Jupyter Notebooks and other computational environments from Dataverse DOIs (Issue #4714, PR #6453) + +## Notes for Dataverse Installation Administrators + +### Security vulnerability in Solr + +A serious security issue has recently been identified in multiple versions of the Solr search engine, including v7.3, which Dataverse currently uses. Follow the instructions below to verify that your installation is safe from a potential attack. You can also consult the following link for a detailed description of the issue: + +RCE in Solr via Velocity Template. + +The vulnerability allows an intruder to execute arbitrary code on the system running Solr. Fortunately, it can only be exploited if the Solr API access point is open to direct access from public networks (aka, "the outside world"), which is NOT needed in a Dataverse installation. + +We have always recommended having Solr (port 8983) firewalled off from public access in our installation guides. But we recommend that you double-check your firewall settings and verify that the port is not accessible from outside networks.
The simplest quick test is to try the following URL in your browser: + + `http://:8983` + +and confirm that you get "access denied" or that it times out, etc. + +In most cases, when Solr runs on the same server as the Dataverse web application, you will only want the port accessible from localhost. We also recommend that you add the following arguments to the Solr startup command: `-j jetty.host=127.0.0.1`. This will make Solr accept connections from localhost only, adding redundancy in case of a firewall failure. + +In a case where Solr needs to run on a different host, make sure that the firewall limits access to the port only to the Dataverse web host(s), by specific IP address(es). + +We would also like to reiterate that it is simply never a good idea to run Solr as root! Running the process as a non-privileged user would substantially minimize any potential damage even in the event that the instance is compromised. + +### Citation and Geospatial Metadata Block Updates + +We updated two metadata blocks in this release. Updating these metadata blocks is mentioned in the step-by-step upgrade instructions below. + +### Run ReExportall + +We made changes to the JSON Export in this release (#6246). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below. + +### BinderHub + +https://mybinder.org now supports spinning up Jupyter Notebooks and other computational environments from Dataverse DOIs. + +### Widgets update for OpenScholar + +We updated the code for widgets so that they will keep working in OpenScholar sites after the upcoming OpenScholar upgrade to Drupal 8. If users of your dataverse have embedded widgets on an OpenScholar site that upgrades to Drupal 8, you will need to run this Dataverse version (or later) for the widgets to keep working.
+ +### Payara tech preview + +Dataverse 4 has always run on Glassfish 4.1 but changes in this release (PR #6523) should open the door to upgrading to Payara 5 eventually. Production installations of Dataverse should remain on Glassfish 4.1 but feedback from any experiments running Dataverse on Payara 5 is welcome via the [usual channels](https://dataverse.org/contact). + +## Notes for Tool Developers and Integrators + +### Search API + +The boolean parameter `query_entities` has been removed from the Search API. The former "true" behavior of "whether entities are queried via direct database calls (for developer use)" is now always true. + +Additional fields are now available via the Search API, mostly related to information about specific dataset versions. + +## Complete List of Changes + +For the complete list of code changes in this release, see the 4.19 milestone in Github. + +For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org. + +## Installation + +If this is a new installation, please see our Installation Guide. + +## Upgrade + +1. Undeploy the previous version. + +- <glassfish install path>/glassfish4/bin/asadmin list-applications +- <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse + +2. Stop glassfish and remove the generated directory, start. + +- service glassfish stop +- remove the generated directory: rm -rf <glassfish install path>glassfish4/glassfish/domains/domain1/generated +- service glassfish start + +3. Deploy this version. + +- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.19.war + +4. Restart glassfish. + +5. Update Citation Metadata Block + +- `wget https://github.com/IQSS/dataverse/releases/download/v4.19/citation.tsv` +- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"` + +6. 
Update Geospatial Metadata Block + +- `wget https://github.com/IQSS/dataverse/releases/download/v4.19/geospatial.tsv` +- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @geospatial.tsv -H "Content-type: text/tab-separated-values"` + +7. (Optional) Run ReExportall to update JSON Exports + + diff --git a/doc/release-notes/4.20-release-notes.md b/doc/release-notes/4.20-release-notes.md new file mode 100644 index 00000000000..e29953db101 --- /dev/null +++ b/doc/release-notes/4.20-release-notes.md @@ -0,0 +1,224 @@ +# Dataverse 4.20 + +This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. + +## Release Highlights + +### Multiple Store Support + +Dataverse can now be configured to store files in more than one place at the same time (multiple file, s3, and/or swift stores). + +General information about this capability can be found below and in the Configuration Guide - File Storage section. + +### S3 Direct Upload support + +S3 stores can now optionally be configured to support direct upload of files, as one option for supporting upload of larger files. In the current implementation, each file is uploaded in a single HTTP call. For AWS, this limits file size to 5 GB. With Minio, the theoretical limit should be 5 TB, and 50+ GB file uploads have been tested successfully. (In practice, other factors such as network timeouts may prevent a successful upload of a multi-TB file, and Minio instances may be configured with a < 5 TB single HTTP call limit.) No other S3 service providers have been tested yet. Their limits should be the lower of the maximum object size allowed and any single HTTP call upload limit. + +General information about this capability can be found in the Big Data Support Guide with specific information about how to enable it in the Configuration Guide - File Storage section.
+ +To support large data uploads, installations can now configure direct upload to S3, bypassing the application server. This will allow for larger uploads over a more resilient transfer method. + +General information about this capability can be found below and in the Configuration Guide. + +### Integration Test Coverage Reporting + +The percentage of code covered by the API-based integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of Dataverse Github Repository. + +### New APIs + +New APIs for Role Management and Dataset Size have been added. Previously, managing roles at the dataset and file level was only possible through the UI. API users can now also retrieve the size of a dataset through an API call, with specific parameters depending on the type of information needed. + +More information can be found in the API Guide. + +## Major Use Cases + +Newly-supported use cases in this release include: + +- Users will now be able to see the number of linked datasets and dataverses accurately reflected in the facet counts on the Dataverse search page. (Issue #6564, PR #6262) +- Users will be able to upload large files directly to S3. (Issue #6489, PR #6490) +- Users will be able to see the PIDs of datasets and files in the Guestbook export. (Issue #6534, PR #6628) +- Administrators will be able to configure multiple stores per Dataverse installation, which allow dataverse-level setting of storage location, upload size limits, and supported data transfer methods (Issue #6485, PR #6488) +- Administrators and integrators will be able to manage roles using a new API. (Issue #6290, PR #6622) +- Administrators and integrators will be able to determine a dataset's size. (Issue #6524, PR #6609) +- Integrators will now be able to retrieve the number of files in a dataset as part of a single API call instead of needing to count the number of files in the response. 
(Issue #6601, PR #6623) + +## Notes for Dataverse Installation Administrators + +### Potential Data Integrity Issue + +We recently discovered two *potential* data integrity issues in Dataverse databases. One manifests itself as duplicate DataFile objects created for the same uploaded file (https://github.com/IQSS/dataverse/issues/6522); the other as duplicate DataTable (tabular metadata) objects linked to the same DataFile (https://github.com/IQSS/dataverse/issues/6510). These issues impacted approximately 0.03% of datasets in Harvard's Dataverse. + +To see if any datasets in your installation have been impacted by this data integrity issue, we've provided a diagnostic script here: + +https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6510/check_datafiles_6522_6510.sh + +The script relies on the PostgreSQL utility psql to access the database. You will need to edit the credentials at the top of the script to match your database configuration. + +If neither of the two issues is present in your database, you will see a message "... no duplicate DataFile objects in your database" and "no tabular files affected by this issue in your database". + +If either or both kinds of duplicates are detected, the script will provide further instructions. We will need you to send us the produced output. We will then assist you in resolving the issues in your database. + +### Multiple Store Support Changes + +**Existing installations will need to make configuration changes to adopt this version, regardless of whether additional stores are to be added or not.** + +Multistore support requires that each store be assigned a label, id, and type - see the Configuration Guide for a more complete explanation. For an existing store, the recommended upgrade path is to assign the store id based on its type, i.e. a 'file' store would get id 'file', an 's3' store would have the id 's3'.
+ +With this choice, no manual changes to datafile 'storageidentifier' entries are needed in the database. If you do not name your existing store using this convention, you will need to edit the database to maintain access to existing files. + +The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade: +For a file store: + + ./asadmin create-jvm-options "\-Ddataverse.files.file.type=file" + ./asadmin create-jvm-options "\-Ddataverse.files.file.label=file" + ./asadmin create-jvm-options "\-Ddataverse.files.file.directory=" + +For an s3 store: + + ./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3" + ./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3" + ./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=" + ./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=" + +Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above - delete the option including a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured. + +Once these options are set, restarting the Glassfish service is all that is needed to complete the change. + +Note that the "\-Ddataverse.files.directory", if defined, continues to control where temporary files are stored (in the /temp subdir of that directory), independent of the location of any 'file' store defined above. + +Also note that the :MaxFileUploadSizeInBytes property has a new option to provide independent limits for each store instead of a single value for the whole installation. The default is to apply any existing limit defined by this property to all stores.
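To illustrate the multistore idea, a sketch of defining a second, hypothetical S3 store alongside the existing ones, following the same option pattern as above (the id `s3archive` and bucket name are placeholders):

```shell
# Define an additional S3 store with a hypothetical id "s3archive".
./asadmin create-jvm-options "\-Ddataverse.files.s3archive.type=s3"
./asadmin create-jvm-options "\-Ddataverse.files.s3archive.label=archive"
./asadmin create-jvm-options "-Ddataverse.files.s3archive.bucket-name=my-archive-bucket"
```

Each store id becomes the middle segment of its `dataverse.files.<id>.*` options, which is why renaming an existing store requires database edits while adding a new one does not.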
+ +### Direct S3 Upload Changes + +Direct upload to S3 is enabled per store by one new jvm option: + + ./asadmin create-jvm-options "\-Ddataverse.files.<id>.upload-redirect=true" + +The existing :MaxFileUploadSizeInBytes property and ```dataverse.files.<id>.url-expiration-minutes``` jvm option for the same store also apply to direct upload. + +Direct upload via the Dataverse web interface is transparent to the user and handled automatically by the browser. Some minor differences in file upload exist: directly uploaded files are not unzipped and Dataverse does not scan their content to help in assigning a MIME type. Ingest of tabular files and metadata extraction from FITS files will occur, but can be turned off for files above a specified size limit through the new dataverse.files.<id>.ingestsizelimit jvm option. + +API calls to support direct upload also exist, and, if direct upload is enabled for a store in Dataverse, the latest DVUploader (v1.0.8) provides a '-directupload' flag that enables its use. + +### Solr Update + +With this release we upgrade to the latest available stable release in the Solr 7.x branch. We recommend a fresh installation of Solr 7.7.2 (the index will be empty) +followed by an "index all". + +Before you start the "index all", Dataverse will appear to be empty because +the search results come from Solr. As indexing progresses, results will appear +until indexing is complete. + +### Dataverse Linking Fix + +The fix implemented for #6262 will display the datasets contained in linked dataverses in the linking dataverse. The full reindex described above will correct these dataset counts. Going forward, this will happen automatically whenever a dataverse is linked. + +### Google Analytics Download Tracking Bug + +The button tracking capability discussed in the installation guide (http://guides.dataverse.org/en/4.20/installation/config.html#id88) relies on an analytics-code.html file that must be configured using the :WebAnalyticsCode setting.
The example file provided in the installation guide is no longer compatible with recent Dataverse releases (>v4.16). Installations using this feature should update their analytics-code.html file by following the installation instructions using the updated example file. Alternatively, sites can modify their existing files to include the one-line change made in the example file at line 120. + +### Run ReExportall + +We made changes to the JSON Export in this release (Issue #6650, PR #6669). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below. + +### New JVM Options and Database Settings + +#### New JVM Options for file storage drivers + +- The JVM option dataverse.files.file.directory= controls where temporary files are stored (in the /temp subdir of the defined directory), independent of the location of any 'file' store defined above. +- The JVM option dataverse.files.<id>.upload-redirect enables direct upload of files added to a dataset to the S3 bucket. (S3 stores only!) +- The JVM option dataverse.files.<id>.MaxFileUploadSizeInBytes controls the maximum size of file uploads allowed for the given file store. +- The JVM option dataverse.files.<id>.ingestsizelimit controls the maximum size of files for which ingest will be attempted, for the given file store. + +#### New Database Settings for Shibboleth + +- The database setting :ShibAffiliationAttribute can now be set to prevent affiliations for Shibboleth users from being reset upon each login. + +## Notes for Tool Developers and Integrators + +### Integration Test Coverage Reporting + +API-based integration tests are run every time a branch is merged to develop, and the percentage of code covered by these integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository.
+ +### Guestbook Column Changes + +Users of downloaded guestbooks should note that two new columns have been added: + +- Dataset PID +- File PID + +If you are expecting the columns in the CSV file to be in a particular order, you will need to make adjustments. + +Old columns: Guestbook, Dataset, Date, Type, File Name, File Id, User Name, Email, Institution, Position, Custom Questions + +New columns: Guestbook, Dataset, Dataset PID, Date, Type, File Name, File Id, File PID, User Name, Email, Institution, Position, Custom Questions + +### API Changes + +As reported in #6570, the affiliation for dataset contacts was wrapped in parentheses in the JSON output from the Search API. These parentheses have now been removed. This is a backward incompatible change but it's expected that this will not cause issues for integrators. + +### Role Name Change + +The role alias provided in API responses has changed, so if anything was hard-coded to "editor" instead of "contributor" it will need to be updated. + +## Complete List of Changes + +For the complete list of code changes in this release, see the 4.20 milestone in GitHub. + +For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org. + +## Installation + +If this is a new installation, please see our Installation Guide. + +## Upgrade + +1. Undeploy the previous version. + +- <glassfish install path>/glassfish4/bin/asadmin list-applications +- <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse + +2. Stop Glassfish, remove the generated directory, then start Glassfish again. + +- service glassfish stop +- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated +- service glassfish start + +3. Install and configure Solr v7.7.2 + +See http://guides.dataverse.org/en/4.20/installation/prerequisites.html#installing-solr + +4. Deploy this version.
+ +- <glassfish install path>/glassfish4/bin/asadmin deploy <path>/dataverse-4.20.war + +5. The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade: +For a file store: + + ./asadmin create-jvm-options "\-Ddataverse.files.file.type=file" + ./asadmin create-jvm-options "\-Ddataverse.files.file.label=file" + ./asadmin create-jvm-options "\-Ddataverse.files.file.directory=<your existing storage directory>" + +For an s3 store: + + ./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3" + ./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3" + ./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>" + ./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>" + +Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above - deleting the option that includes a '-' after 's3' and creating the same option with the '-' replaced by a '.', using the same value you currently have configured. + +6. Restart Glassfish. + +7. Update Citation Metadata Block + +- `wget https://github.com/IQSS/dataverse/releases/download/4.20/citation.tsv` +- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"` + +8. Kick off full reindex + +http://guides.dataverse.org/en/4.20/admin/solr-search-index.html + +9. (Recommended) Run ReExportall to update JSON Exports + + diff --git a/doc/release-notes/5.0-release-notes.md b/doc/release-notes/5.0-release-notes.md new file mode 100644 index 00000000000..f87d428f30a --- /dev/null +++ b/doc/release-notes/5.0-release-notes.md @@ -0,0 +1,353 @@ +# Dataverse 5.0 + +This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. + +Please note that this is a major release and these are long release notes.
We offer no apologies. :) + +## Release Highlights + +### Continued Dataset and File Redesign: Dataset and File Button Redesign, Responsive Layout + +The buttons available on the Dataset and File pages have been redesigned. This change is to provide more scalability for future expanded options for data access and exploration, and to provide a consistent experience between the two pages. The dataset and file pages have also been redesigned to be more responsive and function better across multiple devices. + +This is an important step in the incremental process of the Dataset and File Redesign project, following the release of on-page previews, filtering and sorting options, tree view, and other enhancements. Additional features in support of these redesign efforts will follow in later 5.x releases. + +### Payara 5 + +A major upgrade of the application server provides security updates, access to new features like the MicroProfile Config API, and will enable upgrades to other core technologies. + +Note that moving from Glassfish to Payara will be required as part of the move to Dataverse 5. + +### Download Dataset + +Users can now more easily download all files in a dataset through both the UI and API. If this causes server instability, it's suggested that Dataverse Installation Administrators take advantage of the new Standalone Zipper Service described below. + +#### Download All Option on the Dataset Page + +In previous versions of Dataverse, downloading all files from a dataset meant several clicks to select files and initiate the download. The Dataset Page now includes a Download All option for both the original and archival formats of the files in a dataset under the "Access Dataset" button. + +#### Download All Files in a Dataset by API + +In previous versions of Dataverse, downloading all files from a dataset via API was a two-step process: + +- Find all the database ids of the files. +- Download all the files, using those ids (comma-separated).
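Sketched as curl commands, the old two-step process looked roughly like this (the DOI and database ids shown are hypothetical):

```shell
# Step 1: list the files in the latest version to find their database ids
curl -H "X-Dataverse-key:$API_TOKEN" \
  "http://localhost:8080/api/datasets/:persistentId/versions/:latest/files?persistentId=doi:10.5072/FK2/EXAMPLE"

# Step 2: download those files as a zip bundle, using the comma-separated ids
curl -O -J -H "X-Dataverse-key:$API_TOKEN" \
  "http://localhost:8080/api/access/datafiles/101,102,103"
```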
+ +Now you can download all files from a dataset (assuming you have access to them) via API by passing the dataset persistent ID (PID such as DOI or Handle) or the dataset's database id. Versions are also supported, and you can pass :draft, :latest, :latest-published, or numbers (1.1, 2.0) similar to the "download metadata" API. + +### A Multi-File, Zipped Download Optimization + +In this release we are offering an experimental optimization for the multi-file, download-as-zip functionality. If this option is enabled, instead of enforcing size limits, we attempt to serve all the files that the user requested (that they are authorized to download), but the request is redirected to a standalone zipper service running as a cgi executable. This moves these potentially long-running jobs completely outside the application server (Payara) and prevents service threads from becoming locked serving them. Since zipping is also a CPU-intensive task, it is possible to have this service running on a different host system, thus freeing the cycles on the main application server. The system running the service needs to have access to the database as well as to the storage filesystem and/or S3 bucket. + +Please consult scripts/zipdownload/README.md in the Dataverse 5 source tree. + +The components of the standalone "zipper tool" can also be downloaded +here: + +https://github.com/IQSS/dataverse/releases/download/v5.0/zipper.zip + +### Updated File Handling + +Files without extensions can now be uploaded through the UI. This release also changes the way Dataverse handles duplicate (filename or checksum) files in a dataset. Specifically: + +- Files with the same checksum can be included in a dataset, even if the files are in the same directory. +- Files with the same filename can be included in a dataset as long as the files are in different directories.
+- If a user uploads a file to a directory where a file already exists with that directory/filename combination, Dataverse will adjust the file path and names by adding "-1" or "-2" as applicable. This change will be visible in the list of files being uploaded. +- If the directory or name of an existing or newly uploaded file is edited in such a way that would create a directory/filename combination that already exists, Dataverse will display an error. +- If a user attempts to replace a file with another file that has the same checksum, an error message will be displayed and the file will not be replaced. +- If a user attempts to replace a file with a file that has the same checksum as a different file in the dataset, a warning will be displayed. + +### Pre-Publish DOI Reservation with DataCite + +Dataverse installations using DataCite will be able to reserve the persistent identifiers for datasets with DataCite ahead of publishing time. This allows the DOI to be reserved earlier in the data sharing process and makes the step of publishing datasets simpler and less error-prone. + +### Primefaces 8 + +Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements. + +## Major Use Cases + +Newly-supported use cases in this release include: + +- Users will be presented with a new workflow around dataset and file access and exploration. (Issue #6684, PR #6909) +- Users will experience a UI appropriate across a variety of device sizes. (Issue #6684, PR #6909) +- Users will be able to download an entire dataset without needing to select all the files in that dataset. (Issue #6564, PR #6262) +- Users will be able to download all files in a dataset with a single API call.
(Issue #4529, PR #7086) +- Users will have DOIs reserved for their datasets upon dataset create instead of at publish time. (Issue #5093, PR #6901) +- Users will be able to upload files without extensions. (Issue #6634, PR #6804) +- Users will be able to upload files with the same name in a dataset, as long as those files are in different file paths. (Issue #4813, PR #6924) +- Users will be able to upload files with the same checksum in a dataset. (Issue #4813, PR #6924) +- Users will be less likely to encounter locks during the publishing process due to PID providers being unavailable. (Issue #6918, PR #7118) +- Users will now have their files validated during publish, and in the unlikely event that anything has happened to the files between deposit and publish, they will be able to take corrective action. (Issue #6558, PR #6790) +- Administrators will likely see more success with Harvesting, as many minor harvesting issues have been resolved. (Issues #7127, #7128, #4597, #7056, #7052, #7023, #7009, and #7003) +- Administrators can now enable an external zip service that frees up application server resources and allows the zip download limit to be increased. (Issue #6505, PR #6986) +- Administrators can now create groups based on users' email domains. (Issue #6936, PR #6974) +- Administrators can now set date facets to be organized chronologically. (Issue #4977, PR #6958) +- Administrators can now link harvested datasets using an API. (Issue #5886, PR #6935) +- Administrators can now destroy datasets with mapped shapefiles. (Issue #4093, PR #6860) + +## Notes for Dataverse Installation Administrators + +### Glassfish to Payara + +This upgrade requires a few extra steps. See the detailed upgrade instructions below.
+ +### Dataverse Installations Using DataCite: Upgrade Action Required + +If you are using DataCite as your DOI provider you must add a new JVM option called "doi.dataciterestapiurlstring" with a value of "https://api.datacite.org" for production environments and "https://api.test.datacite.org" for test environments. More information about this JVM option can be found in the [Installation Guide](http://guides.dataverse.org/en/5.0/installation/). + +"doi.mdcbaseurlstring" should be deleted if it was previously set. + +### Dataverse Installations Using DataCite: Upgrade Action Recommended + +For installations that are using DataCite, Dataverse v5.0 introduces a change in the process of registering the Persistent Identifier (DOI) for a dataset. Instead of registering it when the dataset is published for the first time, Dataverse will try to "reserve" the DOI when it's created (by registering it as a "draft", using DataCite terminology). When the user publishes the dataset, the DOI will be publicized as well (by switching the registration status to "findable"). This approach makes the process of publishing datasets simpler and less error-prone. + +New APIs have been provided for finding any unreserved DataCite-issued DOIs in your Dataverse, and for reserving them (see below). While not required - the user can still attempt to publish a dataset with an unreserved DOI - having all the identifiers reserved ahead of time is recommended. If you are upgrading an installation that uses DataCite, we specifically recommend that you reserve the DOIs for all your pre-existing unpublished drafts as soon as Dataverse v5.0 is deployed, since none of them were registered at create time. This can be done using the following API calls: + +- `/api/pids/unreserved` will report the ids of the datasets +- `/api/pids/:persistentId/reserve` reserves the assigned DOI with DataCite (will need to be run on every id reported by the first API).
+ +See the [Native API Guide](http://guides.dataverse.org/en/5.0/api/native-api.html) for more information. + +Scripted, the whole process would look as follows (adjust as needed): + +``` + API_TOKEN='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' + + curl -s -H "X-Dataverse-key:$API_TOKEN" http://localhost:8080/api/pids/unreserved | + # the API outputs JSON; note the use of jq to parse it: + jq '.data.count[].pid' | tr -d '"' | + while read doi + do + curl -s -H "X-Dataverse-key:$API_TOKEN" -X POST http://localhost:8080/api/pids/:persistentId/reserve?persistentId=$doi + done +``` + +Going forward, once all the DOIs have been reserved for the legacy drafts, you may still get an occasional dataset with an unreserved identifier. DataCite service instability would be a potential cause. There is no reason to expect that to happen often, but it is not impossible. You may consider running the script above (perhaps with some extra diagnostics added) regularly, from a cron job or otherwise, to address this preemptively. + +### Terms of Use Display Updates + +In this release we’ve fixed an issue that would cause the Application Terms of Use to not display when the user's language is set to a language that does not match one of the languages for which terms were created and registered for that Dataverse installation. Instead of the expected Terms of Use, users signing up could receive the “There are no Terms of Use for this Dataverse installation” message. This could potentially result in some users signing up for an account without having the proper Terms of Use displayed. This will only affect installations that use the :ApplicationTermsOfUse setting. + +Please note that there is not currently a native workflow in Dataverse to display updated Terms of Use to a user or to force re-agreement. This would only potentially affect users that have signed up since the upgrade to 4.17 (or a following release if 4.17 was skipped). 
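For installations affected by the Terms of Use display issue, it may be worth confirming that terms are registered for each language the installation supports. Loading them is done through the admin settings API, roughly as follows (a sketch; the file names are hypothetical, and the per-language `/lang/` endpoint is assumed from the settings internationalization support):

```shell
# Register the default Application Terms of Use, then a French variant
curl -X PUT -d @apptou.html \
  http://localhost:8080/api/admin/settings/:ApplicationTermsOfUse
curl -X PUT -d @apptou_fr.html \
  http://localhost:8080/api/admin/settings/:ApplicationTermsOfUse/lang/fr
```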
+ +### Datafiles Validation when Publishing Datasets + +When a user requests to publish a dataset, Dataverse will now attempt to validate the physical files in the dataset, by recalculating the checksums and verifying them against the values in the database. The goal is to prevent any corrupted files in published datasets. Most of the instances of actual damage to physical files that we've seen in the past happened while the datafiles were still in the Draft state. (Physical files become essentially read-only once published.) So this is the logical place to catch any such issues. + +If any files in the dataset fail the validation, the dataset does not get published, and the user is notified that they need to contact their Dataverse support in order to address the issue before another attempt to publish can be made. See the "Troubleshooting" section of the Guide on how to fix such problems. + +This validation will be performed asynchronously, the same way as the registration of the file-level persistent ids. Similarly to the file PID registration, this validation process can be disabled on your system, with the setting `:FileValidationOnPublishEnabled`. (A Dataverse admin may choose to disable it if, for example, they are already running an external auditing system to monitor the integrity of the files in their Dataverse, and would prefer the publishing process to take less time.) See the Configuration section of the [Installation Guide](http://guides.dataverse.org/en/5.0/installation/config.rst). + +Please note that we are not aware of any bugs in the current versions of Dataverse that would result in damage to users' files. But you may have some legacy files in your archive that were affected by some issue in the past, or perhaps affected by something outside Dataverse, so we are adding this feature out of an abundance of caution.
An example of a problem we've experienced in the early versions of Dataverse was a possible scenario where a user actually attempted to delete a Draft file from an unpublished version, where the database transaction would fail for whatever reason, but only after the physical file had already been deleted from the filesystem, resulting in a datafile entry remaining in the dataset with the corresponding physical file missing. The fix for this case, since the user wanted to delete the file in the first place, is simply to confirm it and purge the datafile entity from the database. + +### The Setting :PIDAsynchRegFileCount is Deprecated as of 5.0 + +It used to specify the minimum number of datafiles in a dataset that would warrant adding a lock during publishing. As of v5.0 all datasets get locked for the duration of the publishing process. The setting will be ignored if present. + +### Location Changes for Related Projects + +The dataverse-ansible and dataverse-previewers repositories have been moved to the GDCC Organization on GitHub. If you have been referencing the dataverse-ansible repository from IQSS and the dataverse-previewers from QDR, please instead use them from their new locations: + + + + +### Harvesting Improvements + +Many updates have been made to address common Harvesting failures. You may see Harvests complete more often and have a higher success rate on a dataset-by-dataset basis. + +### New JVM Options and Database Settings + +Several new JVM options and DB Settings have been added in this release. More documentation about each of these settings can be found in the Configuration section of the [Installation Guide](http://guides.dataverse.org/en/5.0/installation/config.rst). + +#### New JVM Options + +- doi.dataciterestapiurlstring: Set with a value of "https://api.datacite.org" for production environments and "https://api.test.datacite.org" for test environments. Must be set if you are using DataCite as your DOI provider.
+- dataverse.useripaddresssourceheader: If set, specifies an HTTP Header such as X-Forwarded-For to use to retrieve the user's IP address. This setting is useful in cases such as running Dataverse behind load balancers where the default option of getting the Remote Address from the servlet isn't correct (e.g. it would be the load balancer IP address). Note that unless your installation always sets the header you configure here, this could be used as a way to spoof the user's address. See the Configuration section of the [Installation Guide](http://guides.dataverse.org/en/5.0/installation/config.rst) for more information about proper use and security concerns. +- http.request-timeout-seconds: To facilitate large file upload and download, the Dataverse installer bumps the Payara **server-config.network-config.protocols.protocol.http-listener-1.http.request-timeout-seconds** setting from its default 900 seconds (15 minutes) to 1800 (30 minutes). + +#### New Database Settings + +- :CustomZipDownloadServiceUrl: If defined, this is the URL of the zipping service outside the main application server where zip downloads should be directed (instead of /api/access/datafiles/). +- :ShibAttributeCharacterSetConversionEnabled: By default, all attributes received from Shibboleth are converted from ISO-8859-1 to UTF-8. You can disable this behavior by setting to false. +- :ChronologicalDateFacets: Facets with Date/Year are sorted chronologically by default, with the most recent value first. To have them sorted by number of hits, e.g. with the year with the most results first, set this to false. +- :NavbarGuidesUrl: Set to a fully-qualified URL which will be used for the "User Guide" link in the navbar. +- :FileValidationOnPublishEnabled: Toggles validation of the physical files in the dataset when it's published, by recalculating the checksums and comparing against the values stored in the DataFile table. By default this setting is absent and Dataverse assumes it to be true. 
If enabled, the validation will be performed asynchronously, similarly to how we handle assigning persistent identifiers to datafiles, with the dataset locked for the duration of the publishing process. + +### Custom Analytics Code Changes + +You should update your custom analytics code to implement necessary changes for tracking updated dataset and file buttons. There was also a fix to the analytics code that will now properly track downloads for tabular files. + +For more information, see the documentation and sample analytics code snippet provided in [Installation Guide > Configuration > Web Analytics Code](http://guides.dataverse.org/en/5.0/installation/config.html#web-analytics-code) to reflect the changes implemented in this version (#6938/#6684). + +### Tracking Users' IP Addresses Behind an Address-Masking Proxy + +It is now possible to collect real user IP addresses in MDC logs and/or set up an IP group on a system running behind a proxy/load balancer that hides the addresses of incoming requests. See "Recording User IP Addresses" in the Configuration section of the [Installation Guide](http://guides.dataverse.org/en/5.0/installation/config.rst). + +### Reload Astrophysics Metadata Block (if used) + +Tooltips have been updated for the Astrophysics Metadata Block. If you'd like these updated Tooltips to be displayed to users of your installation, you should update the Astrophysics Metadata Block: + +`curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @astrophysics.tsv -H "Content-type: text/tab-separated-values"` + +We've included this in the step-by-step instructions below. + +### Run ReExportall + +We made changes to the JSON Export in this release. 
If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process following the steps in the [Admin Guide](http://guides.dataverse.org/en/5.0/admin/metadataexport.html?highlight=export#batch-exports-through-the-api) + +We've included this in the step-by-step instructions below. + +## Notes for Tool Developers and Integrators + +## Complete List of Changes + +For the complete list of code changes in this release, see the [5.0 Milestone](https://github.com/IQSS/dataverse/milestone/89?closed=1) in GitHub. + +For help with upgrading, installing, or general questions please post to the [Dataverse Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email support@dataverse.org. + +## Installation + +If this is a new installation, please see our [Installation Guide](http://guides.dataverse.org/en/5.0/installation/) + +## Upgrade Instructions + +### Prerequisite: Retroactively store original file size + +Starting with release 4.10 the size of the saved original file (for an ingested tabular datafile) is stored in the database. We provided the following API that retrieves and permanently stores the sizes for any already existing saved originals: + +`/api/admin/datafiles/integrity/fixmissingoriginalsizes` + +(See the documentation note in the Native API guide, under "[Datafile Integrity](https://guides.dataverse.org/en/5.0/api/native-api.html#datafile-integrity)"). + +To check your installation, issue this command: + + `curl http://localhost:8080/api/admin/datafiles/integrity/fixmissingoriginalsizes` + +### Upgrade from Glassfish 4.1 to Payara 5 + +The instructions below describe the upgrade procedure based on moving an existing glassfish4 domain directory under Payara. We recommend this method instead of setting up a brand-new Payara domain using the installer because it appears to be the easiest way to recreate your current configuration and preserve all your data. + +1.
Download Payara, v5.2020.2 as of this writing: + + `curl -L -O https://github.com/payara/Payara/releases/download/payara-server-5.2020.2/payara-5.2020.2.zip` + `sha256sum payara-5.2020.2.zip` + 1f5f7ea30901b1b4c7bcdfa5591881a700c9b7e2022ae3894192ba97eb83cc3e + +2. Unzip it somewhere (/usr/local is a safe bet) + + `sudo unzip payara-5.2020.2.zip -d /usr/local/` + +3. Copy the Postgres driver to /usr/local/payara5/glassfish/lib + + `sudo cp /usr/local/glassfish4/glassfish/lib/postgresql-42.2.9.jar /usr/local/payara5/glassfish/lib/` + +4. Move payara5/glassfish/domains/domain1 out of the way + + `sudo mv /usr/local/payara5/glassfish/domains/domain1 /usr/local/payara5/glassfish/domains/domain1.orig` + +5. Undeploy the Dataverse web application (if deployed; version 4.20 is assumed in the example below) + + `sudo /usr/local/glassfish4/bin/asadmin list-applications` + `sudo /usr/local/glassfish4/bin/asadmin undeploy dataverse-4.20` + +6. Stop Glassfish; copy domain1 to Payara + + `sudo /usr/local/glassfish4/bin/asadmin stop-domain` + `sudo cp -ar /usr/local/glassfish4/glassfish/domains/domain1 /usr/local/payara5/glassfish/domains/` + +7. Remove the cache directories + + `sudo rm -rf /usr/local/payara5/glassfish/domains/domain1/generated/` + `sudo rm -rf /usr/local/payara5/glassfish/domains/domain1/osgi-cache/` + +8. Make the following changes in **domain.xml**: + + Replace the `-XX:PermSize` and `-XX:MaxPermSize` JVM options with `-XX:MetaspaceSize` and `-XX:MaxMetaspaceSize` + + ``` + -XX:MetaspaceSize=256m + -XX:MaxMetaspaceSize=512m + ``` + + Add the below JVM options beneath the -Ddataverse settings: + + ``` + -Dfish.payara.classloading.delegate=false + -XX:+UseG1GC + -XX:+UseStringDeduplication + -XX:+DisableExplicitGC + ``` + + Replace the following element: + + ``` + + + + + ``` + + with + + ``` + + + + ``` + +9. Change any full pathnames matching `/usr/local/glassfish4/...` to `/usr/local/payara5/...` or whatever it is in your case. 
Specifically check the `-Ddataverse.files.directory` and `-Ddataverse.files.file.directory` JVM options. + +10. In domain1/config/jhove.conf, change the hard-coded /usr/local/glassfish4 path, as above. + + (Optional): If you renamed your service account from glassfish to payara or appserver, update the ownership permissions. The Installation Guide recommends a service account of `dataverse`: + + `sudo chown -R dataverse /usr/local/payara5/glassfish/domains/domain1` + `sudo chown -R dataverse /usr/local/payara5/glassfish/lib` + +11. You will also need to check that the service account has write permission on the files directory, if they are located outside the old Glassfish domain. And/or make sure the service account has the correct AWS credentials, if you are using S3 for storage. + +12. Finally, start Payara: + + `sudo -u dataverse /usr/local/payara5/bin/asadmin start-domain` + +13. Deploy the Dataverse 5 warfile: + + `sudo -u dataverse /usr/local/payara5/bin/asadmin deploy /path/to/dataverse-5.0.war` + +14. Then restart Payara: + + `sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain` + `sudo -u dataverse /usr/local/payara5/bin/asadmin start-domain` + +### Additional Upgrade Steps + +1. Update Astrophysics Metadata Block (if used) + + `wget https://github.com/IQSS/dataverse/releases/download/v5.0/astrophysics.tsv` + `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @astrophysics.tsv -H "Content-type: text/tab-separated-values"` + +2. (Recommended) Run ReExportall to update JSON Exports + + + +3. 
(Required for installations using DataCite) Add the JVM option doi.dataciterestapiurlstring + + For production environments: + + `/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https\://api.datacite.org"` + + For test environments: + + `/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https\://api.test.datacite.org"` + + The JVM option `doi.mdcbaseurlstring` should be deleted if it was previously set, for example: + + `/usr/local/payara5/bin/asadmin delete-jvm-options "\-Ddoi.mdcbaseurlstring=https\://api.test.datacite.org"` + +4. (Recommended for installations using DataCite) Pre-register DOIs + + Execute the script described in the section "Dataverse Installations Using DataCite: Upgrade Action Recommended" earlier in the Release Note. + + Please consult the earlier sections of the Release Note for any additional configuration options that may apply to your installation. diff --git a/doc/release-notes/5.1-release-notes.md b/doc/release-notes/5.1-release-notes.md new file mode 100644 index 00000000000..3d106b2df7b --- /dev/null +++ b/doc/release-notes/5.1-release-notes.md @@ -0,0 +1,99 @@ +# Dataverse 5.1 + +This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. + +## Release Highlights + +### Large File Upload for Installations Using AWS S3 + +The added support for multipart upload through the API and UI (Issue #6763) will allow files larger than 5 GB to be uploaded to Dataverse when an installation is running on AWS S3. Previously, only non-AWS S3 storage configurations would allow uploads larger than 5 GB. + +### Dataset-Specific Stores + +In previous releases, configuration options were added that allow each dataverse to have a specific store enabled. This release adds even more granularity, with the ability to set a dataset-level store. 
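For orientation, assigning a store to a single dataset can be scripted via the API. The sketch below assumes the `storageDriver` endpoint described in the Admin Guide's Dataverses and Datasets section; the token, dataset ID, and store label are placeholders, so double-check against the guide for your release before running it:

```shell
# Placeholders -- substitute your own superuser API token, dataset
# database ID, and the label of a store configured on the server.
API_TOKEN="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
DATASET_ID=1234
STORE_LABEL="s3archival"

# The actual call (run against your installation):
#   curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d "$STORE_LABEL" \
#     "http://localhost:8080/api/datasets/$DATASET_ID/storageDriver"
# Assembled here so the pieces are visible:
URL="http://localhost:8080/api/datasets/$DATASET_ID/storageDriver"
echo "$URL"
```

This is a sketch, not authoritative syntax; the Managing Dataverses and Datasets page of the Admin Guide is the source of truth for the endpoint and its permissions.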
+
+## Major Use Cases
+
+Newly-supported use cases in this release include:
+
+- Users can now upload files larger than 5 GB on installations running AWS S3 (Issue #6763, PR #6995)
+- Administrators will now be able to specify a store at the dataset level in addition to the Dataverse level (Issue #6872, PR #7272)
+- Users will have their dataset's directory structure retained when uploading a dataset with shapefiles (Issue #6873, PR #7279)
+- Users will now be able to download zip files through the experimental Zipper service when the set of downloaded files has duplicate names (Issue [#80](https://github.com/IQSS/dataverse.harvard.edu/issues/80), PR #7276)
+- Users will now be able to download zip files with the proper file structure through the experimental Zipper service (Issue #7255, PR #7258)
+- Administrators will be able to use new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load (Issue #4225, PR #7211)
+
+## Notes for Dataverse Installation Administrators
+
+### New API for setting a Dataset-level Store
+
+- This release adds a new API for setting a dataset-specific store. Learn more in the Managing Dataverses and Datasets section of the [Admin Guide](http://guides.dataverse.org/en/5.1/admin/dataverses-datasets.html).
+
+### Multipart Upload Storage Monitoring, Recommended Use for Multipart Upload
+
+Charges may be incurred for storage reserved for multipart uploads that are not completed or cancelled. Administrators may want to run periodic manual or automated checks for open multipart uploads. Learn more in the Big Data Support section of the [Developers Guide](http://guides.dataverse.org/en/5.1/developers/big-data-support.html).
+
+While multipart uploads can support much larger files and can have advantages in terms of robust transfer and speed, they are more complex than single-part direct uploads.
Administrators should consider taking advantage of the options to limit use of multipart uploads to specific users by using multiple stores and configuring access to stores with high file size limits to specific Dataverses (added in 4.20) or Datasets (added in this release). + +### New APIs for keeping Solr records in sync + +This release adds new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load. Learn more in the Solr section of the [Admin Guide](http://guides.dataverse.org/en/5.1/admin/solr-search-index.html). + +### Documentation for Purging the Ingest Queue + +At times, it may be necessary to cancel long-running Ingest jobs in the interest of system stability. The Troubleshooting section of the [Admin Guide](http://guides.dataverse.org/en/5.1/admin/) now has specific steps. + +### Biomedical Metadata Block Updated + +The Life Science Metadata block (biomedical.tsv) was updated. "Other Design Type", "Other Factor Type", "Other Technology Type", "Other Technology Platform" boxes were added. See the "Additional Upgrade Steps" below if you use this in your installation. + +## Notes for Tool Developers and Integrators + +### Spaces in File Names + +Dataverse Installations using S3 storage will no longer replace spaces in file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.1+. + +## Complete List of Changes + +For the complete list of code changes in this release, see the [5.1 Milestone](https://github.com/IQSS/dataverse/milestone/90?closed=1) in Github. + +For help with upgrading, installing, or general questions please post to the [Dataverse Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email support@dataverse.org. 
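The periodic checks for open multipart uploads mentioned above can be done with the AWS CLI. A sketch (the bucket name is a placeholder; `list-multipart-uploads` and `abort-multipart-upload` are standard `aws s3api` subcommands):

```shell
# Placeholder bucket name -- use the bucket backing your S3 store.
BUCKET="my-dataverse-bucket"

# List uploads that were started but never completed or aborted:
#   aws s3api list-multipart-uploads --bucket "$BUCKET"
#
# Abort a stale one ("Key" and "UploadId" come from the listing above):
#   aws s3api abort-multipart-upload --bucket "$BUCKET" \
#     --key "path/to/file" --upload-id "EXAMPLE_UPLOAD_ID"
echo "bucket to check: $BUCKET"
```

An S3 lifecycle rule with the `AbortIncompleteMultipartUpload` action can automate this cleanup; see the AWS documentation for details.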
+
+## Installation
+
+If this is a new installation, please see our [Installation Guide](http://guides.dataverse.org/en/5.1/installation/).
+
+## Upgrade Instructions
+
+0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the [Dataverse 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0).
+
+1. Undeploy the previous version.
+
+`<payara install path>/bin/asadmin list-applications`
+`<payara install path>/bin/asadmin undeploy dataverse`
+
+2. Stop Payara, remove the generated directory, and start Payara again.
+
+- `service payara stop`
+- remove the generated directory: `rm -rf <payara install path>/glassfish/domains/domain1/generated`
+- `service payara start`
+
+3. Deploy this version.
+
+`<payara install path>/bin/asadmin deploy dataverse-5.1.war`
+
+4. Restart Payara.
+
+### Additional Upgrade Steps
+
+1. Update Biomedical Metadata Block (if used), Reload Solr, ReExportAll
+
+   `wget https://github.com/IQSS/dataverse/releases/download/v5.1/biomedical.tsv`
+   `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @biomedical.tsv -H "Content-type: text/tab-separated-values"`
+
+- copy schema_dv_mdb_fields.xml and schema_dv_mdb_copies.xml to the Solr server, for example into the /usr/local/solr/solr-7.7.2/server/solr/collection1/conf/ directory
+- Restart Solr, or tell Solr to reload its configuration:
+
+   `curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"`
+
+- Run ReExportall to update JSON Exports
+
diff --git a/doc/release-notes/5.1.1-release-notes.md b/doc/release-notes/5.1.1-release-notes.md new file mode 100644 index 00000000000..739bd3da800 --- /dev/null +++ b/doc/release-notes/5.1.1-release-notes.md @@ -0,0 +1,63 @@
+# Dataverse 5.1.1
+
+This minor release adds important scaling improvements for installations running on AWS S3. It is recommended that 5.1.1 be used in production instead of 5.1.
+
+## Release Highlights
+
+### Connection Pool Size Configuration Option, Connection Optimizations
+
+Dataverse 5.1 improved the efficiency of making S3 connections through the use of an HTTP connection pool. This release adds optimizations around closing streams and channels that may hold S3 HTTP connections open and exhaust the connection pool. In parallel, this release increases the default pool size from 50 to 256 and adds the ability to increase the size of the connection pool, so a larger pool can be configured if needed.
+
+## Major Use Cases
+
+Newly-supported use cases in this release include:
+
+- Administrators of installations using S3 will be able to define the connection pool size, allowing better resource scaling for larger installations (Issue #7309, PR #7313)
+
+## Notes for Dataverse Installation Administrators
+
+### 5.1.1 vs. 5.1 for Production Use
+
+As mentioned above, we encourage 5.1.1 instead of 5.1 for production use.
+
+### New JVM Option for Connection Pool Size
+
+Larger installations may want to increase the number of open S3 connections allowed (default is 256). For example, to set the value to 4096:
+
+`./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"`
+
+(where `<id>` is the identifier of your S3 file store, likely `"s3"`). The JVM Options section of the [Configuration Guide](http://guides.dataverse.org/en/5.1.1/installation/config/) has more information.
+
+### New S3 Bucket CORS setting for Direct Upload/Download
+
+When using S3 storage with direct upload and/or download enabled, one must now expose the ETag header as documented in the [updated cors.json example](https://guides.dataverse.org/en/5.1.1/developers/big-data-support.html?highlight=etag#s3-direct-upload-and-download).
+
+## Complete List of Changes
+
+For the complete list of code changes in this release, see the [5.1.1 Milestone](https://github.com/IQSS/dataverse/milestone/91?closed=1) in Github.
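To supplement the CORS note above: a bucket CORS policy that exposes `ETag` could look like the following sketch. The bucket name and the permissive origins/headers here are illustrative choices, not the authoritative cors.json from the guide:

```shell
# Write an illustrative CORS policy that exposes the ETag header.
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedMethods": ["GET", "PUT"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders": ["ETag"]
    }
  ]
}
EOF

# Apply it to the bucket (requires AWS CLI credentials; the bucket
# name is a placeholder):
#   aws s3api put-bucket-cors --bucket my-dataverse-bucket \
#     --cors-configuration file://cors.json
grep '"ETag"' cors.json
```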
+
+For help with upgrading, installing, or general questions please post to the [Dataverse Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email support@dataverse.org.
+
+## Installation
+
+If this is a new installation, please see our [Installation Guide](http://guides.dataverse.org/en/5.1.1/installation/).
+
+## Upgrade Instructions
+
+0. These instructions assume that you've already successfully upgraded to Dataverse 5.1 following the instructions in the [Dataverse 5.1 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.1).
+
+1. Undeploy the previous version.
+
+`<payara install path>/bin/asadmin list-applications`
+`<payara install path>/bin/asadmin undeploy dataverse<-version>`
+
+2. Stop Payara, remove the generated directory, and start Payara again.
+
+- `service payara stop`
+- remove the generated directory:
+`rm -rf <payara install path>/glassfish/domains/domain1/generated`
+- `service payara start`
+
+3. Deploy this version.
+
+`<payara install path>/bin/asadmin deploy dataverse-5.1.1.war`
+
+4. Restart Payara.
diff --git a/doc/release-notes/5.10-release-notes.md b/doc/release-notes/5.10-release-notes.md new file mode 100644 index 00000000000..c13ae8a6b78 --- /dev/null +++ b/doc/release-notes/5.10-release-notes.md @@ -0,0 +1,344 @@
+# Dataverse Software 5.10
+
+This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
+
+## Release Highlights
+
+### Multiple License Support
+
+Users can now select from a set of configured licenses in addition to or instead of the previous Creative Commons CC0 choice or provide custom terms of use (if configured) for their datasets. Administrators can configure their Dataverse instance via API to allow any desired license as a choice and can enable or disable the option to allow custom terms. Administrators can also mark licenses as "inactive" to disallow future use while keeping that license for existing datasets.
For upgrades, only the CC0 license will be preinstalled. New installations will have both CC0 and CC BY preinstalled. The [Configuring Licenses](https://guides.dataverse.org/en/5.10/installation/config.html#configuring-licenses) section of the Installation Guide shows how to add or remove licenses. + +**Note: Datasets in existing installations will automatically be updated to conform to new requirements that custom terms cannot be used with a standard license and that custom terms cannot be empty. Administrators may wish to manually update datasets with these conditions if they do not like the automated migration choices. See the "Notes for Dataverse Installation Administrators" section below for details.** + +This release also makes the license selection and/or custom terms more prominent when publishing and viewing a dataset and when downloading files. + +### Ingest and File Upload Messaging Improvements + +Messaging around ingest failure has been softened to prevent support tickets. In addition, messaging during file upload has been improved, especially with regard to showing size limits and providing links to the guides about tabular ingest. For screenshots and additional details see PR #8271. + +### Downloading of Guestbook Responses with Fewer Clicks + +A download button has been added to the page that lists guestbooks. This saves a click but you can still download responses from the "View Responses" page, as before. + +Also, links to the guides about guestbooks have been added in additional places. + +### Dynamically Request Arbitrary Metadata Fields from Search API + +The Search API now allows arbitrary metadata fields to be requested when displaying results from datasets. You can request all fields from metadata blocks or pick and choose certain fields. 
+ +The new parameter is called `metadata_fields` and the Search API documentation contains details and examples: + +### Solr 8 Upgrade + +The Dataverse Software now runs on Solr 8.11.1, the latest available stable release in the Solr 8.x series. + +### PostgreSQL Upgrade + +A PostgreSQL upgrade is not required for this release but is planned for the next release. See below for details. + +## Major Use Cases and Infrastructure Enhancements + +Changes and fixes in this release include: + +- When creating or updating datasets, users can select from a set of licenses configured by the administrator (CC, CC BY, custom licenses, etc.) or provide custom terms (if the installation is configured to allow them). (Issue #7440, PR #7920) +- Users can get better feedback on tabular ingest errors and more information about size limits when uploading files. (Issue #8205, PR #8271) +- Users can more easily download guestbook responses and learn how guestbooks work. (Issue #8244, PR #8402) +- Search API users can specify additional metadata fields to be returned in the search results. (Issue #7863, PR #7942) +- The "Preview" tab on the file page can now show restricted files. (Issue #8258, PR #8265) +- Users wanting to upload files from GitHub to Dataverse can learn about a new GitHub Action called "Dataverse Uploader". (PR #8416) +- Users requesting access to files now get feedback that it was successful. (Issue #7469, PR #8341) +- Users may notice various accessibility improvements. (Issue #8321, PR #8322) +- Users of the Social Science metadata block can now add multiples of the "Collection Mode" field. (Issue #8452, PR #8473) +- Guestbooks now support multi-line text area fields. (Issue #8288, PR #8291) +- Guestbooks can better handle commas in responses. (Issue #8193, PR #8343) +- Dataset editors can now deselect a guestbook. (Issue #2257, PR #8403) +- Administrators with a large `actionlogrecord` table can read docs on archiving and then trimming it. 
(Issue #5916, PR #8292) +- Administrators can list locks across all datasets. (PR #8445) +- Administrators can run a version of Solr that doesn't include a version of log4j2 with serious known vulnerabilities. We trust that you have patched the version of Solr you are running now following the instructions that were sent out. An upgrade to the latest version is recommended for extra peace of mind. (PR #8415) +- Administrators can run a version of Dataverse that doesn't include a version of log4j with known vulnerabilities. (PR #8377) + +## Notes for Dataverse Installation Administrators + +### Updating for Multiple License Support + +#### Adding and Removing Licenses and How Existing Datasets Will Be Automatically Updated + +As part of installing or upgrading an existing installation, administrators may wish to add additional license choices and/or configure Dataverse to allow custom terms. Adding additional licenses is managed via API, as explained in the [Configuring Licenses](https://guides.dataverse.org/en/5.10/installation/config.html#configuring-licenses) section of the Installation Guide. Licenses are described via a JSON structure providing a name, URL, short description, and optional icon URL. Additionally licenses may be marked as active (selectable for new or updated datasets) or inactive (only allowed on existing datasets) and one license can be marked as the default. Custom Terms are allowed by default (backward compatible with the current option to select "No" to using CC0) and can be disabled by setting `:AllowCustomTermsOfUse` to false. + +Further, administrators should review the following automated migration of existing licenses and terms into the new license framework and, if desired, should manually find and update any datasets for which the automated update is problematic. 
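For orientation, registering an additional license through the API looks roughly like the sketch below. The endpoint and JSON fields follow the Configuring Licenses section linked above; the specific license values, file name, and token are illustrative, so verify against the guide for your release:

```shell
# An illustrative license definition; field values are examples only.
cat > licenseCC-BY-4.0.json <<'EOF'
{
  "name": "CC BY 4.0",
  "uri": "https://creativecommons.org/licenses/by/4.0",
  "shortDescription": "Creative Commons Attribution 4.0 International License.",
  "active": true
}
EOF

# Register it as a superuser (run against your installation):
#   curl -X POST -H "X-Dataverse-key: $API_TOKEN" \
#     -H "Content-Type: application/json" \
#     http://localhost:8080/api/licenses --upload-file licenseCC-BY-4.0.json
grep '"name"' licenseCC-BY-4.0.json
```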
+
+To understand the migration process, it is useful to understand how the multiple license feature works in this release:
+
+"Custom Terms", aka a custom license, are defined through entries in the following fields of the dataset "Terms" tab:
+
+- Terms of Use
+- Confidentiality Declaration
+- Special Permissions
+- Restrictions
+- Citation Requirements
+- Depositor Requirements
+- Conditions
+- Disclaimer
+
+"Custom Terms" require, at a minimum, a non-blank entry in the "Terms of Use" field. Entries in other fields are optional.
+
+Since these fields are intended for terms/conditions that would potentially conflict with or modify the terms in a standard license, they are no longer shown when a standard license is selected.
+
+In earlier Dataverse releases, it was possible to select the CC0 license and have entries in the fields above. It was also possible to say "No" to using CC0 and leave all of these terms fields blank.
+
+The automated process will update existing datasets as follows.
+
+- "CC0 Waiver" and no entries in the fields above -> CC0 License (no change)
+- No CC0 Waiver and an entry in the "Terms of Use" field and possibly other fields listed above -> "Custom Terms" with the same entries in these fields (no change)
+- CC0 Waiver and an entry in some of the fields listed -> "Custom Terms" with the following text prepended in the "Terms of Use" field: "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions:"
+- No CC0 Waiver and an entry in one or more fields other than the "Terms of Use" field -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available with limited information on how it can be used. You may wish to communicate with the Contact(s) specified before use."
+- No CC0 Waiver and no entry in any of the listed fields -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available without information on how it can be used.
You should communicate with the Contact(s) specified before use."
+
+Administrators who have datasets where CC0 has been selected along with additional terms, or datasets where the Terms of Use field is empty, may wish to modify those datasets prior to upgrading to avoid the automated changes above. This is discussed next.
+
+#### Handling Datasets that No Longer Comply With Licensing Rules
+
+In most Dataverse installations, one would expect the vast majority of datasets to either use the CC0 Waiver or have non-empty Terms of Use. As noted above, these will be migrated without any issue. Administrators may, however, wish to find and manually update datasets that specified a CC0 license but also had terms (no longer allowed) or had no license and no terms of use (also no longer allowed) rather than accept the default migrations for these datasets listed above.
+
+##### Finding and Modifying Datasets with a CC0 License and Non-Empty Terms
+
+To find datasets with a CC0 license and non-empty terms:
+
+```
+select CONCAT('doi:', dvo.authority, '/', dvo.identifier) as datasetdoi, v.alias as dataverse_alias, case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate END as version, dv.id as datasetversion_id, t.id as termsofuseandaccess_id, t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions, t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v where dv.dataset_id=dvo.id and dv.termsofuseandaccess_id=t.id and dvo.owner_id=v.id and t.license='CC0' and not (t.termsofuse is null and t.confidentialitydeclaration is null and t.specialpermissions is null and t.restrictions is null and t.citationrequirements is null and t.depositorrequirements is null and t.conditions is null and t.disclaimer is null);
+```
+
+The `datasetdoi` column will let you find and view the affected dataset in the Dataverse web interface.
The `version` column will indicate which version(s) are relevant. The `dataverse_alias` column will tell you which Dataverse collection the dataset is in (and may be useful if you want to adjust all datasets in a given collection). The `termsofuseandaccess_id` column indicates which specific entry in that table is associated with the dataset/version. The remaining columns show the values of any terms fields.
+
+There are two options to migrate such datasets:
+
+Option 1: Set all terms fields to null:
+
+```
+update termsofuseandaccess set termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id=<id>;
+```
+
+or to change several at once:
+
+```
+update termsofuseandaccess set termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id in (<comma-separated list of ids>);
+```
+
+Option 2: Change the dataset version(s) to not use the CC0 waiver and modify the Terms of Use (and/or other fields) as you wish to indicate that the CC0 waiver was previously selected:
+
+```
+ update termsofuseandaccess set license='NONE', termsofuse=concat('New text. ', termsofuse) where id=<id>;
+```
+
+or
+
+```
+ update termsofuseandaccess set license='NONE', termsofuse=concat('New text.
', termsofuse) where id in (<comma-separated list of ids>);
+```
+
+##### Finding and Modifying Datasets without a CC0 License and with Empty Terms
+
+To find datasets without a CC0 license and with empty terms:
+
+```
+select CONCAT('doi:', dvo.authority, '/', dvo.identifier) as datasetdoi, v.alias as dataverse_alias, case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate END as version, dv.id as datasetversion_id, t.id as termsofuseandaccess_id, t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions, t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v where dv.dataset_id=dvo.id and dv.termsofuseandaccess_id=t.id and dvo.owner_id=v.id and (t.license='NONE' or t.license is null) and t.termsofuse is null;
+```
+
+As before, there are a couple of options.
+
+Option 1: These datasets could be updated to use CC0:
+
+```
+update termsofuseandaccess set license='CC0', confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id=<id>;
+```
+
+Option 2: Terms of Use could be added:
+
+```
+update termsofuseandaccess set termsofuse='New text. ' where id=<id>;
+```
+
+In both cases, the same `where id in (<comma-separated list of ids>);` ending could be used to change multiple datasets/versions at once.
+
+#### Standardizing Custom Licenses
+
+If many datasets use the same set of Custom Terms, it may make sense to create and register a standard license including those terms. Doing this would include:
+
+- Creating and posting an external document that includes the custom terms, i.e. an HTML document with sections corresponding to the terms fields that are used.
+
+- Defining a name, short description, URL (where it is posted), and optionally an icon URL for this license
+- Using the Dataverse API to register the new license as one of the options available in your installation
+- Using the API to make sure the license is active and deciding whether the license should also be the default
+- Once the license is registered with Dataverse, making an SQL update to change datasets/versions using that license to reference it instead of having their own copy of those custom terms.
+
+The benefits of this approach are:
+
+- usability: the license can be selected for new datasets without allowing custom terms and without users having to cut/paste terms or collection administrators having to configure templates with those terms
+- efficiency: custom terms are stored per dataset, whereas licenses are registered once and all uses of a license refer to the same object and external URL
+- security: with the license terms maintained external to Dataverse, users cannot edit specific terms and curators do not need to check for edits
+
+Once a standardized version of your Custom Terms is registered as a license, an SQL update like the following can be used to have datasets use it:
+
+```
+UPDATE termsofuseandaccess
+SET license_id = (SELECT license.id FROM license WHERE license.name = '<license name>'), termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null
+WHERE termsofuseandaccess.termsofuse LIKE '%<unique text from your custom terms>%';
+```
+
+Note that this information is also available in the [Configuring Licenses](https://guides.dataverse.org/en/5.10/installation/config.html#configuring-licenses) section of the Installation Guide. Look for "Standardizing Custom Licenses".
+
+### PostgreSQL Version 10+ Required
+
+If you are still using PostgreSQL 9.x, now is the time to upgrade.
PostgreSQL 9.x is now EOL (no longer supported as of January 2022), and in the next version of the Dataverse Software we plan to upgrade the Flyway library (used for database migrations) to a version that will no longer work with versions prior to PostgreSQL 10. See PR #8296 for more on this upcoming Flyway upgrade.
+
+The Dataverse Software has been tested with PostgreSQL versions up to 13. The current stable version 13.5 is recommended. If that's not an option for reasons specific to your installation (for example, if PostgreSQL 13.5 is not available for the OS distribution you are using), any 10+ version should work.
+
+See the upgrade section below for more information.
+
+### Providing S3 Storage Credentials via MicroProfile Config
+
+With this release, you may use two new JVM options (`dataverse.files.<id>.access-key` and `dataverse.files.<id>.secret-key`) to pass an access key identifier and a secret access key for S3-based storage definitions without creating the files used by the AWS CLI tools (`~/.aws/config` & `~/.aws/credentials`).
+
+This has been added to ease setups using containers (Docker, Podman, Kubernetes, OpenShift) or testing and development installations. Find additional [documentation and a word of warning in the Installation Guide](https://guides.dataverse.org/en/5.10/installation/config.html#s3-mpconfig).
+
+## New JVM Options and DB Settings
+
+The following JVM settings have been added:
+
+- `dataverse.files.<id>.access-key` - S3 access key ID.
+- `dataverse.files.<id>.secret-key` - S3 secret access key.
+
+See the [JVM Options](https://guides.dataverse.org/en/5.10/installation/config.html#jvm-options) section of the Installation Guide for more information.
+
+The following DB settings have been added:
+
+- `:AllowCustomTermsOfUse` (default: true) - allow users to provide Custom Terms instead of choosing one of the configured standard licenses.
+ +See the [Database Settings](https://guides.dataverse.org/en/5.10/installation/config.html#database-settings) section of the Guides for more information. + +## Notes for Developers and Integrators + +In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the native JSON format. + +## Backward Incompatibilities + +With the change to support multiple licenses, which can include cases where CC0 is not an option, and the decision to prohibit two previously possible cases (no license and no entry in the "Terms of Use" field, a standard license and entries in "Terms of Use", "Special Permissions", and related fields), this release contains changes to the display, API payloads, and export metadata that are not backward compatible. These include: + +- "CC0 Waiver" has been replaced by "CC0 1.0" (the short name specified by Creative Commons) in the web interface, API payloads, and export formats that include a license name. (Note that installation admins can alter the license name in the database to maintain the original "CC0 Waiver" text, if desired.) +- Schema.org metadata in page headers and the Schema.org JSON-LD metadata export now reference the license via URL (which should avoid the current warning from Google about an invalid license object in the page metadata). +- Metadata exports and import methods (including SWORD) use either the license name (e.g. in the JSON export) or URL (e.g. in the OAI_ORE export) rather than a hardcoded value of "CC0" or "CC0 Waiver" currently (if the CC0 license is available, its default name would be "CC0 1.0"). +- API calls (e.g. for import, migrate) that specify both a license and custom terms will be considered an error, as would having no license and an empty/blank value for "Terms of Use". +- Rollback. In general, one should not deploy an earlier release over a database that has been modified by deployment of a later release. 
(Make a db backup before upgrading and use that copy if you go back to a prior version.) Due to the nature of the db changes in this release, attempts to deploy an earlier version of Dataverse will fail unless the database is also restored to its pre-release state.
+
+Also, note that since CC0 Waiver is no longer a hardcoded option, text strings that reference it have been edited or removed from `Bundle.properties`. This means that the ability to provide translations of the CC0 license name/description has been removed. The initial release of multiple license functionality doesn't include an alternative mechanism to provide translations of license names/descriptions, so this is a regression in capability (see #8346). The instructions and help information about licenses and terms remain internationalizable; it is only the name/description of the licenses themselves that cannot yet be translated.
+
+An update to the Social Science metadata block changes the collectionMode field to allow multiple values. This changes the way the field is encoded in the native JSON format. From
+
+```
+"typeName": "collectionMode",
+"multiple": false,
+"typeClass": "primitive",
+"value": "some text"
+```
+
+to
+
+```
+"typeName": "collectionMode",
+"multiple": true,
+"typeClass": "primitive",
+"value": ["some text", "more text"]
+```
+
+## Complete List of Changes
+
+For the complete list of code changes in this release, see the [5.10 Milestone](https://github.com/IQSS/dataverse/milestone/101?closed=1) in Github.
+
+For help with upgrading, installing, or general questions please post to the [Dataverse Community Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email support@dataverse.org.
+
+## Installation
+
+If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.10/installation/).
Please also contact us to get added to the [Dataverse Project Map](https://guides.dataverse.org/en/5.10/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org) if you have not done so already. + +## Upgrade Instructions + +0\. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the [Dataverse Software 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0). After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10. + +If you are running Payara as a non-root user (and you should be!), **remember not to execute the commands below as root**. Use `sudo` to change to that user first. For example, `sudo -i -u dataverse` if `dataverse` is your dedicated application user. + +In the following commands we assume that Payara 5 is installed in `/usr/local/payara5`. If not, adjust as needed. + +`export PAYARA=/usr/local/payara5` + +(or `setenv PAYARA /usr/local/payara5` if you are using a `csh`-like shell) + +1\. Undeploy the previous version. + +- `$PAYARA/bin/asadmin list-applications` +- `$PAYARA/bin/asadmin undeploy dataverse<-version>` + +2\. Stop Payara and remove the generated directory + +- `service payara stop` +- `rm -rf $PAYARA/glassfish/domains/domain1/generated` + +3\. Start Payara + +- `service payara start` + +4\. Deploy this version. + +- `$PAYARA/bin/asadmin deploy dataverse-5.10.war` + +5\. Restart payara + +- `service payara stop` +- `service payara start` + +6\. Update the Social Science metadata block + +- `wget https://github.com/IQSS/dataverse/releases/download/v5.10/social_science.tsv` +- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @social_science.tsv -H "Content-type: text/tab-separated-values"` +- Note that this update also requires an updated Solr schema. 
We strongly recommend that you upgrade Solr as part of this release by installing the latest stable release from scratch (see below). In the process you will configure it with the latest version of the schema as distributed with this Dataverse release, so no further steps will be needed. If you have already upgraded, or have some **very** good reason to stay on the old version a little longer, please refer to for information on updating your Solr schema in place.
+
+7\. Run ReExportall to update Exports
+
+Follow the directions in the [Admin Guide](http://guides.dataverse.org/en/5.10/admin/metadataexport.html#batch-exports-through-the-api).
+
+8\. Upgrade Solr
+
+See "Additional Release Steps" below for how to upgrade Solr.
+
+## Additional Release Steps
+
+### Solr Upgrade
+
+With this release we upgrade to the latest available stable release in the Solr 8.x branch. We recommend a fresh installation of Solr (the index will be empty) followed by an "index all".
+
+Before you start the "index all", the Dataverse installation will appear to be empty because the search results come from Solr. As indexing progresses, partial results will appear until indexing is complete.
+
+See for more information.
+
+Please note that after you have followed the instructions above you will have Solr installed with the default schema that lists all the fields in the standard Dataverse metadata blocks. If your installation uses any custom metadata blocks, please refer to for information on updating your Solr schema to include these extra fields.
+
+### PostgreSQL Upgrade
+
+The tested and recommended way of upgrading an existing database is as follows:
+
+- Export your current database with ``pg_dumpall``.
+- Install the new version of PostgreSQL (make sure it's running on the same port, so that no changes are needed in the Payara configuration).
+- Re-import the database with ``psql``, as the user ``postgres``.
+
+It is strongly recommended to use the ``pg_dumpall`` and ``psql`` binaries from the old and new versions of PostgreSQL, respectively. For example, the commands below were used to migrate a database running under PostgreSQL 9.6 to 13.5. Adjust the versions and the path names to match your environment.
+
+Back up/export:
+
+``/usr/pgsql-9.6/bin/pg_dumpall -U postgres > /tmp/backup.sql``
+
+Restore/import:
+
+``/usr/pgsql-13/bin/psql -U postgres -f /tmp/backup.sql``
+
+When upgrading the production database here at Harvard IQSS, we were able to go from version 9.6 all the way to 13.3 without any issues.
+
+You may want to try these backup and restore steps on a test server to get an accurate estimate of how much downtime to expect with the final production upgrade. That, of course, will depend on the size of your database.
+
+Consult the PostgreSQL upgrade documentation for more information, for example .
diff --git a/doc/release-notes/5.10.1-release-notes.md b/doc/release-notes/5.10.1-release-notes.md
new file mode 100644
index 00000000000..a7f2250dac9
--- /dev/null
+++ b/doc/release-notes/5.10.1-release-notes.md
@@ -0,0 +1,96 @@
+# Dataverse Software 5.10.1
+
+This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
+
+## Release Highlights
+
+### Bug Fix for Request Access
+
+Dataverse Software 5.10 contains a bug where the "Request Access" button doesn't work from the file listing on the dataset page if the dataset contains custom terms. This has been fixed in PR #8555.
+
+### Bug Fix for Searching and Selecting Controlled Vocabulary Values
+
+Dataverse Software 5.10 contains a bug where the search option is no longer present when selecting from more than ten controlled vocabulary values. This has been fixed in PR #8521.
+ +## Major Use Cases and Infrastructure Enhancements + +Changes and fixes in this release include: + +- Users can use the "Request Access" button when the dataset has custom terms. (Issue #8553, PR #8555) +- Users can search when selecting from more than ten controlled vocabulary values. (Issue #8519, PR #8521) +- The default file categories ("Documentation", "Data", and "Code") can be redefined through the `:FileCategories` database setting. (Issue #8461, PR #8478) +- Documentation on troubleshooting Excel ingest errors was improved. (PR #8541) +- Internationalized controlled vocabulary values can now be searched. (Issue #8286, PR #8435) +- Curation labels can be internationalized. (Issue #8381, PR #8466) +- "NONE" is no longer accepted as a license using the SWORD API (since 5.10). See "Backward Incompatibilities" below for details. (Issue #8551, PR #8558). + +## Notes for Dataverse Installation Administrators + +### PostgreSQL Version 10+ Required Soon + +Because 5.10.1 is a bug fix release, an upgrade to PostgreSQL is not required. However, this upgrade is still coming in the next non-bug fix release. For details, please see the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10 + +### Payara Upgrade + +You may notice that the Payara version used in the install scripts has been updated from 5.2021.5 to 5.2021.6. This was to address a bug where it was not possible to easily update the logging level. For existing installations, this release does not require upgrading Payara and a Payara upgrade is not part of the Upgrade Instructions below. For more information, see PR #8508. + +## New JVM Options and DB Settings + +The following DB settings have been added: + +- `:FileCategories` - The default list of the pre-defined file categories ("Documentation", "Data" and "Code") can now be redefined with a comma-separated list (e.g. `'Docs,Data,Code,Workflow'`). 
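+
+As a sketch (assuming a local installation reachable on port 8080; if your installation blocks the admin API, append the `unblock-key` query parameter), the list could be redefined like this:
+
+```
+curl -X PUT -d 'Docs,Data,Code,Workflow' http://localhost:8080/api/admin/settings/:FileCategories
+```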
+ +See the [Database Settings](https://guides.dataverse.org/en/5.10.1/installation/config.html#database-settings) section of the Guides for more information. + +## Notes for Developers and Integrators + +In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the SWORD API. + +## Backward Incompatibilities + +As of Dataverse 5.10, "NONE" is no longer supported as a valid license when creating a dataset using the SWORD API. The API Guide has been updated to reflect this. Additionally, if you specify an invalid license, a list of available licenses will be returned in the response. + +## Complete List of Changes + +For the complete list of code changes in this release, see the [5.10.1 Milestone](https://github.com/IQSS/dataverse/milestone/102?closed=1) in Github. + +For help with upgrading, installing, or general questions please post to the [Dataverse Community Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email support@dataverse.org. + +## Installation + +If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.10.1/installation/). Please also contact us to get added to the [Dataverse Project Map](https://guides.dataverse.org/en/5.10.1/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org) if you have not done so already. + +## Upgrade Instructions + +0\. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the [Dataverse Software 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0). After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10.1. + +If you are running Payara as a non-root user (and you should be!), **remember not to execute the commands below as root**. Use `sudo` to change to that user first. 
For example, `sudo -i -u dataverse` if `dataverse` is your dedicated application user.
+
+In the following commands we assume that Payara 5 is installed in `/usr/local/payara5`. If not, adjust as needed.
+
+`export PAYARA=/usr/local/payara5`
+
+(or `setenv PAYARA /usr/local/payara5` if you are using a `csh`-like shell)
+
+1\. Undeploy the previous version.
+
+- `$PAYARA/bin/asadmin list-applications`
+- `$PAYARA/bin/asadmin undeploy dataverse<-version>`
+
+2\. Stop Payara and remove the generated directory
+
+- `service payara stop`
+- `rm -rf $PAYARA/glassfish/domains/domain1/generated`
+
+3\. Start Payara
+
+- `service payara start`
+
+4\. Deploy this version.
+
+- `$PAYARA/bin/asadmin deploy dataverse-5.10.1.war`
+
+5\. Restart Payara
+
+- `service payara stop`
+- `service payara start`
diff --git a/doc/release-notes/5.11-release-notes.md b/doc/release-notes/5.11-release-notes.md
new file mode 100644
index 00000000000..a51bcec2dac
--- /dev/null
+++ b/doc/release-notes/5.11-release-notes.md
@@ -0,0 +1,208 @@
+# Dataverse Software 5.11
+
+This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
+
+## Release Highlights
+
+### Terms of Access or Request Access Required for Restricted Files
+
+Beginning in this release, datasets with restricted files must have either Terms of Access or Request Access enabled. This change is to ensure that for each file in a Dataverse installation there is a clear path to get to the data, either by requesting access to the data or by providing context about why requesting access is not enabled.
+
+Published datasets are not affected by this change. Datasets that are in draft and that have neither Terms of Access nor Request Access enabled must be updated to select one or the other (or both). Otherwise, datasets cannot be further edited or published.
Dataset authors will be able to tell if their dataset is affected by the presence of the following message at the top of their dataset (when they are logged in):
+
+"Datasets with restricted files are required to have Request Access enabled or Terms of Access to help people access the data. Please edit the dataset to confirm Request Access or provide Terms of Access to be in compliance with the policy."
+
+At this point, authors should click "Edit Dataset" then "Terms" and then check the box for "Request Access" or fill in "Terms of Access for Restricted Files" (or both). Afterwards, authors will be able to further edit metadata and publish.
+
+In the "Notes for Dataverse Installation Administrators" section, we have provided a query to help proactively identify datasets that need to be updated.
+
+See also Issue #8191 and PR #8308.
+
+### Muting Notifications
+
+Users can control which notifications they receive if the system is [configured to allow this](https://guides.dataverse.org/en/5.11/admin/user-administration.html#letting-users-manage-receiving-notifications). See also Issue #7492 and PR #8530.
+
+## Major Use Cases and Infrastructure Enhancements
+
+Changes and fixes in this release include:
+
+- Terms of Access or Request Access required for restricted files. (Issue #8191, PR #8308)
+- Users can control which notifications they receive if the system is [configured to allow this](https://guides.dataverse.org/en/5.11/admin/user-administration.html#letting-users-manage-receiving-notifications). (Issue #7492, PR #8530)
+- A 500 error was occurring when creating a dataset if a template did not have an associated "termsofuseandaccess". See "Legacy Templates Issue" below for details. (Issue #8599, PR #8789)
+- Tabular ingest can be skipped via API.
(Issue #8525, PR #8532)
+- The "Verify Email" button has been changed to "Send Verification Email" and, rather than sometimes showing a popup, it now always sends a fresh verification email (and invalidates previous verification emails). (Issue #8227, PR #8579)
+- For Shibboleth users, the `emailconfirmed` timestamp is now set on login and the UI should show "Verified". (Issue #5663, PR #8579)
+- Information about the license selection (or custom terms) is now available in the confirmation popup when contributors click "Submit for Review". Previously, this was only available in the confirmation popup for the "Publish" button, which contributors do not see. (Issue #8561, PR #8691)
+- For installations configured to support multiple languages, controlled vocabulary fields that do not allow multiple entries (e.g. journalArticleType) are now indexed properly. (Issue #8595, PR #8601, PR #8624)
+- Two-letter ISO-639-1 codes for languages are now supported in metadata imports and harvesting. (Issue #8139, PR #8689)
+- The API endpoint for listing notifications has been enhanced to show the subject, text, and timestamp of notifications. (Issue #8487, PR #8530)
+- The API Guide has been updated to explain that the `Content-type` header is now (as of Dataverse 5.6) necessary to create datasets via the native API. (Issue #8663, PR #8676)
+- Admin API endpoints have been added to find and delete dataset templates. (Issue #8600, PR #8706)
+- The BagIt file handler detects and transforms zip files with a BagIt package format into Dataverse data files, validating checksums along the way. See the [BagIt File Handler](https://guides.dataverse.org/en/5.11/installation/config.html#bagit-file-handler) section of the Installation Guide for details. (Issue #8608, PR #8677)
+- For BagIt Export, the number of threads used when zipping data files into an archival bag is now configurable using the `:BagGeneratorThreads` database setting.
(Issue #8602, PR #8606)
+- PostgreSQL 14 can now be used (though we've tested mostly with 13). PostgreSQL 10+ is required. (Issue #8295, PR #8296)
+- As always, widgets can be embedded in the `
+
+    .. group-tab:: IntelliJ
+
+        Choose "Run" or "Debug" in the toolbar.
+
+        .. image:: img/intellij-payara-run-toolbar.png
+
+        Watch the WAR build and the deployment unfold.
+        Note the "Update" action button (see config to change its behavior).
+
+        .. image:: img/intellij-payara-run-output.png
+
+        Manually hotswap classes in "Debug" mode via "Run" > "Debugging Actions" > "Reload Changed Classes".
+
+        .. image:: img/intellij-payara-run-menu-reload.png
+
+Note: in the background, the bootstrap job will wait for Dataverse to be deployed and responsive.
+When your IDE automatically opens the URL of a newly deployed, not yet bootstrapped Dataverse application, it might take some more time and a few page refreshes until the job finishes.
+
+IDE Triggered Non-Code Re-Deployments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Either redeploy the WAR (see above), use JRebel, or look into copying files into the exploded WAR within the running container.
+The steps below describe options to enable the latter in different IDEs.
+
+.. tabs::
+
+    .. group-tab:: IntelliJ
+
+        This imitates the NetBeans built-in function that copies changes to files under ``src/main/webapp`` into a destination folder.
+        It differs in that it copies the files into the running container deployment without using a bind mount.
+
+        1. Install the `File Watchers plugin `_
+        2. Import the :download:`watchers.xml <../../../../docker/util/intellij/watchers.xml>` file at *File > Settings > Tools > File Watchers*
+        3. Once you have the deployment running (see above), edited files under ``src/main/webapp`` will be copied into the container when you save them.
+           Note: by default, IDE auto-saves will not trigger the copy.
+        4. Changes are visible once you reload the browser window.
+
+        **IMPORTANT**: This tool assumes you are using the :ref:`ide-trigger-code-deploy` method to run Dataverse.
+
+        **IMPORTANT**: This tool uses a Bash shell script and is thus limited to macOS and Linux.
+
+Exploring the Database
+----------------------
+
+See :ref:`db-name-creds` in the Developer Guide.
+
+Using a Debugger
+----------------
+
+The :doc:`base-image` enables usage of the `Java Debugging Wire Protocol `_
+for remote debugging if you set ``ENABLE_JDWP=1`` as an environment variable for the application container.
+The default configuration when executing containers with the commands listed at :ref:`dev-run` already enables this.
+
+There are many tutorials on how to connect your IDE's debugger to a remote endpoint. Please use ``localhost:9009``
+as the endpoint. Here are links to the most common IDEs' docs on remote debugging:
+`Eclipse `_,
+`IntelliJ `_
+
+Building Your Own Base Image
+----------------------------
+
+If you find yourself tasked with upgrading Payara, you will need to create your own base image before running the :ref:`container-dev-quickstart`. For instructions, see :doc:`base-image`.
diff --git a/doc/sphinx-guides/source/container/img/intellij-compose-add-new-config.png b/doc/sphinx-guides/source/container/img/intellij-compose-add-new-config.png new file mode 100644 index 00000000000..cec9bb357fe Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-compose-add-new-config.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-compose-run.png b/doc/sphinx-guides/source/container/img/intellij-compose-run.png new file mode 100644 index 00000000000..e01744134f9 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-compose-run.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-compose-services.png b/doc/sphinx-guides/source/container/img/intellij-compose-services.png new file mode 100644 index 00000000000..1c500c54201 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-compose-services.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-compose-setup.png b/doc/sphinx-guides/source/container/img/intellij-compose-setup.png new file mode 100644 index 00000000000..42c2accf2b4 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-compose-setup.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-add-new-config.png b/doc/sphinx-guides/source/container/img/intellij-payara-add-new-config.png new file mode 100644 index 00000000000..d1c7a8f2777 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-add-new-config.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-add-server.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-add-server.png new file mode 100644 index 00000000000..54ffbd1b713 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-add-server.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-deployment.png 
b/doc/sphinx-guides/source/container/img/intellij-payara-config-deployment.png new file mode 100644 index 00000000000..52adee056b5 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-deployment.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-server-behaviour.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-server-behaviour.png new file mode 100644 index 00000000000..5d23672e614 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-server-behaviour.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-server.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-server.png new file mode 100644 index 00000000000..614bda6f6d7 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-server.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-startup.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-startup.png new file mode 100644 index 00000000000..35b87148859 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-startup.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-plugin-install.png b/doc/sphinx-guides/source/container/img/intellij-payara-plugin-install.png new file mode 100644 index 00000000000..7c6896574de Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-plugin-install.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-run-menu-reload.png b/doc/sphinx-guides/source/container/img/intellij-payara-run-menu-reload.png new file mode 100644 index 00000000000..b1fd8bea260 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-run-menu-reload.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-run-output.png 
b/doc/sphinx-guides/source/container/img/intellij-payara-run-output.png new file mode 100644 index 00000000000..aa139485a9d Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-run-output.png differ diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-run-toolbar.png b/doc/sphinx-guides/source/container/img/intellij-payara-run-toolbar.png new file mode 100644 index 00000000000..2aecb27c5f3 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-run-toolbar.png differ diff --git a/doc/sphinx-guides/source/container/img/netbeans-compile.png b/doc/sphinx-guides/source/container/img/netbeans-compile.png new file mode 100644 index 00000000000..e429695ccb0 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/netbeans-compile.png differ diff --git a/doc/sphinx-guides/source/container/img/netbeans-run.png b/doc/sphinx-guides/source/container/img/netbeans-run.png new file mode 100644 index 00000000000..00f8af23cc5 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/netbeans-run.png differ diff --git a/doc/sphinx-guides/source/container/img/netbeans-servers-common.png b/doc/sphinx-guides/source/container/img/netbeans-servers-common.png new file mode 100644 index 00000000000..a9ded5dbec3 Binary files /dev/null and b/doc/sphinx-guides/source/container/img/netbeans-servers-common.png differ diff --git a/doc/sphinx-guides/source/container/img/netbeans-servers-java.png b/doc/sphinx-guides/source/container/img/netbeans-servers-java.png new file mode 100644 index 00000000000..2593cacc5ae Binary files /dev/null and b/doc/sphinx-guides/source/container/img/netbeans-servers-java.png differ diff --git a/doc/sphinx-guides/source/container/index.rst b/doc/sphinx-guides/source/container/index.rst new file mode 100644 index 00000000000..38641cce642 --- /dev/null +++ b/doc/sphinx-guides/source/container/index.rst @@ -0,0 +1,14 @@ +Container Guide +=============== + +**Contents:** + +.. 
toctree:: + :maxdepth: 2 + + intro + running/index + dev-usage + base-image + app-image + configbaker-image diff --git a/doc/sphinx-guides/source/container/intro.rst b/doc/sphinx-guides/source/container/intro.rst new file mode 100644 index 00000000000..5099531dcc9 --- /dev/null +++ b/doc/sphinx-guides/source/container/intro.rst @@ -0,0 +1,28 @@ +Introduction +============ + +Dataverse in containers! + +.. contents:: |toctitle| + :local: + +Intended Audience +----------------- + +This guide is intended for anyone who wants to run Dataverse in containers. This is potentially a wide audience, from sysadmins interested in running Dataverse in production in containers (not recommended yet) to contributors working on a bug fix (encouraged!). See :doc:`running/index` for various scenarios and please let us know if your use case is not covered. + +.. _getting-help-containers: + +Getting Help +------------ + +Please ask in #containers at https://chat.dataverse.org + +Alternatively, you can try one or more of the channels under :ref:`support`. + +.. _helping-containers: + +Helping with the Containerization Effort +---------------------------------------- + +In 2023 the Containerization Working Group started meeting regularly. All are welcome to join! We talk in #containers at https://chat.dataverse.org and have a regular video call. For details, please visit https://ct.gdcc.io diff --git a/doc/sphinx-guides/source/container/running/backend-dev.rst b/doc/sphinx-guides/source/container/running/backend-dev.rst new file mode 100644 index 00000000000..8b2dab956ad --- /dev/null +++ b/doc/sphinx-guides/source/container/running/backend-dev.rst @@ -0,0 +1,10 @@ +Backend Development +=================== + +.. contents:: |toctitle| + :local: + +Intro +----- + +See :doc:`../dev-usage`. 
diff --git a/doc/sphinx-guides/source/container/running/demo.rst b/doc/sphinx-guides/source/container/running/demo.rst new file mode 100644 index 00000000000..d4afee8a18a --- /dev/null +++ b/doc/sphinx-guides/source/container/running/demo.rst @@ -0,0 +1,350 @@ +Demo or Evaluation +================== + +In the following tutorial we'll walk through spinning up Dataverse in containers for demo or evaluation purposes. + +.. contents:: |toctitle| + :local: + +Quickstart +---------- + +First, let's confirm that we can get Dataverse running on your system. + +- Download :download:`compose.yml <../../../../../docker/compose/demo/compose.yml>` +- Run ``docker compose up`` in the directory where you put ``compose.yml`` +- Visit http://localhost:8080 and try logging in: + + - username: dataverseAdmin + - password: admin1 + +If you can log in, great! Please continue through the tutorial. If you have any trouble, please consult the sections below on troubleshooting and getting help. + +Stopping and Starting the Containers +------------------------------------ + +Let's practice stopping the containers and starting them up again. Your data, stored in a directory called ``data``, will remain intact + +To stop the containers hit ``Ctrl-c`` (hold down the ``Ctrl`` key and then hit the ``c`` key). + +To start the containers, run ``docker compose up``. + +.. _starting-over: + +Deleting Data and Starting Over +------------------------------- + +Again, data related to your Dataverse installation such as the database is stored in a directory called ``data`` that gets created in the directory where you ran ``docker compose`` commands. + +You may reach a point during your demo or evaluation that you'd like to start over with a fresh database. Simply make sure the containers are not running and then remove the ``data`` directory. Now, as before, you can run ``docker compose up`` to spin up the containers. 
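+
+As a sketch, the reset described above boils down to three commands, run from the directory containing ``compose.yml`` (``docker compose down`` is an alternative to ``Ctrl-c`` if the containers are still running):
+
+.. code-block:: bash
+
+   docker compose down
+   rm -rf data
+   docker compose up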
+ +Setting Up for a Demo +--------------------- + +Now that you are familiar with the basics of running Dataverse in containers, let's move on to a better setup for a demo or evaluation. + +Starting Fresh +++++++++++++++ + +For this exercise, please start fresh by stopping all containers and removing the ``data`` directory. + +.. _demo-persona: + +Creating and Running a Demo Persona ++++++++++++++++++++++++++++++++++++ + +Previously we used the "dev" persona to bootstrap Dataverse, but for security reasons, we should create a persona more suited to demos and evaluations. + +Edit the ``compose.yml`` file and look for the following section. + +.. code-block:: bash + + bootstrap: + container_name: "bootstrap" + image: gdcc/configbaker:latest + restart: "no" + environment: + - TIMEOUT=3m + command: + - bootstrap.sh + - dev + #- demo + #volumes: + # - ./demo:/scripts/bootstrap/demo + networks: + - dataverse + +Comment out "dev" and uncomment "demo". + +Uncomment the "volumes" section. + +Create a directory called "demo" and copy :download:`init.sh <../../../../../modules/container-configbaker/scripts/bootstrap/demo/init.sh>` into it. You are welcome to edit this demo init script, customizing the final message, for example. + +Note that the init script contains a key for using the admin API once it is blocked. You should change it in the script from "unblockme" to something only you know. + +Now run ``docker compose up``. The "bootstrap" container should exit with the message from the init script and Dataverse should be running on http://localhost:8080 as before during the quickstart exercise. + +One of the main differences between the "dev" persona and our new "demo" persona is that we are now running the setup-all script without the ``--insecure`` flag. This makes our installation more secure, though it does block "admin" APIs that are useful for configuration. 
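+
+As a quick check that the admin API is now blocked (a sketch; replace ``unblockme`` with the key you put in ``init.sh``), you can try listing the database settings with and without the unblock key:
+
+.. code-block:: bash
+
+   # blocked without the key
+   curl http://localhost:8080/api/admin/settings
+
+   # succeeds with the unblock key
+   curl "http://localhost:8080/api/admin/settings?unblock-key=unblockme"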
+
+Smoke Testing
+-------------
+
+At this point, please try the following basic operations within your installation:
+
+- logging in as dataverseAdmin (password "admin1")
+- publishing the "root" collection (dataverse)
+- creating a collection
+- creating a dataset
+- uploading a data file
+- publishing the dataset
+
+If anything isn't working, please see the sections below on troubleshooting, giving feedback, and getting help.
+
+Further Configuration
+---------------------
+
+Now that we've verified through a smoke test that basic operations are working, let's configure our installation of Dataverse.
+
+Please refer to the :doc:`/installation/config` section of the Installation Guide for various configuration options.
+
+Below we'll explain some specifics for configuration in containers.
+
+JVM Options/MicroProfile Config
++++++++++++++++++++++++++++++++
+
+:ref:`jvm-options` can be configured under ``JVM_ARGS`` in the ``compose.yml`` file. Here's an example:
+
+.. code-block:: bash
+
+   environment:
+     JVM_ARGS: -Ddataverse.files.storage-driver-id=file1
+
+Some JVM options can be configured as environment variables. For example, you can configure the database host like this:
+
+.. code-block:: bash
+
+   environment:
+     DATAVERSE_DB_HOST: postgres
+
+We are in the process of making more JVM options configurable as environment variables. Look for the term "MicroProfile Config" under :doc:`/installation/config` in the Installation Guide to see if you can use them this way.
+
+There is a final way to configure JVM options that we plan to deprecate once all JVM options have been converted to MicroProfile Config. Look for "magic trick" under "tunables" at :doc:`../app-image` for more information.
+
+Database Settings
++++++++++++++++++
+
+Generally, you should be able to look at the list of :ref:`database-settings` and configure them, but the "demo" persona above secured your installation to the point that you'll need an "unblock key" to access the "admin" API and change database settings.
+
+In the example below of configuring :ref:`:FooterCopyright` we use the default unblock key of "unblockme", but you should use the key you set above.
+
+``curl -X PUT -d ", My Org" "http://localhost:8080/api/admin/settings/:FooterCopyright?unblock-key=unblockme"``
+
+Once you make this change, it should be visible in the copyright in the bottom left of every page.
+
+Root Collection Customization (Alias, Name, etc.)
++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Before running ``docker compose up`` for the first time, you can customize the root collection by placing a JSON file in the right place.
+
+First, in the "demo" directory you created (see :ref:`demo-persona`), create a subdirectory called "config":
+
+``mkdir demo/config``
+
+Next, download :download:`dataverse-complete.json <../../_static/api/dataverse-complete.json>` and put it in the "config" directory you just created. The contents of your "demo" directory should look something like this:
+
+.. code-block:: bash
+
+   % find demo
+   demo
+   demo/config
+   demo/config/dataverse-complete.json
+   demo/init.sh
+
+Edit ``dataverse-complete.json`` to have the values you want. You'll want to refer to :ref:`update-dataverse-api` in the API Guide to understand the format. In that documentation you can find optional parameters as well.
+
+To test your JSON file, run ``docker compose up``. Again, this only works when you are running ``docker compose up`` for the first time. (You can always start over. See :ref:`starting-over`.)
+
+Multiple Languages
+++++++++++++++++++
+
+Generally speaking, you'll want to follow :ref:`i18n` in the Installation Guide to set up multiple languages.
(You need to create your own "languages.zip" file, for example.) Here we'll give you guidance specific to this demo tutorial. We'll be setting up a toggle between English and French.

First, edit the ``compose.yml`` file and uncomment the following line:

.. code-block:: text

    #-Ddataverse.lang.directory=/dv/lang

Next, upload "languages.zip" to the "loadpropertyfiles" API endpoint as shown below. This will place files ending in ".properties" into the ``/dv/lang`` directory configured above.

Please note that we are using a slight variation on the command in the instructions above, adding the unblock key we created above:

``curl "http://localhost:8080/api/admin/datasetfield/loadpropertyfiles?unblock-key=unblockme" -X POST --upload-file /tmp/languages/languages.zip -H "Content-Type: application/zip"``

Next, set up the UI toggle between English and French, again using the unblock key:

``curl "http://localhost:8080/api/admin/settings/:Languages?unblock-key=unblockme" -X PUT -d '[{"locale":"en","title":"English"},{"locale":"fr","title":"Français"}]'``

Stop and start the Dataverse container in order for the language toggle to work.

PID Providers
+++++++++++++

Dataverse supports multiple Persistent ID (PID) providers. The ``compose.yml`` file uses the Permalink PID provider. Follow :ref:`pids-configuration` to reconfigure as needed.

.. _file-previewers-ct:

File Previewers
+++++++++++++++

By default, all available file previewers are enabled (see :ref:`file-previews` in the User Guide for details). Specifically, we enable all the previewers that are available in the `trivadis/dataverse-previewers-provider `_ image (`code `_). You can run the following command to see a list of available previewers:

``docker run --rm trivadis/dataverse-deploy-previewers:latest previewers``

You should expect to see output like this:

.. code-block:: text

    name    description
    ----------------------------
    text    Read the text file.
+ html View the html file. + ... + +If you want to specify fewer previewers, you can edit the ``compose.yml`` file. Uncomment "INCLUDE_PREVIEWERS" and list the previewers you want, separated by commas, like this: + +``INCLUDE_PREVIEWERS=text,html,pdf,csv`` + + +.. _additional-metadata-blocks: + +Additional Metadata Blocks +++++++++++++++++++++++++++ + +Metadata fields such as "Title" are part of a metadata block such as "Citation". See :ref:`metadata-references` in the User Guide for the metadata blocks that ship with Dataverse. + +At a high level, we will be loading a metadata block and then adjusting our Solr config to know about it. + +Care should be taken when adding additional metadata blocks. There is no way to `preview `_ or `delete `_ a metadata block so please use a throwaway environment. + +:ref:`metadata-references` lists some experimental metadata blocks. In the example below, we'll use the CodeMeta block. + +First, download a metadata block or create one by following :doc:`/admin/metadatacustomization` in the Admin Guide. + +Load the metadata block like this: + +``curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file codemeta.tsv`` + +Next, reconfigure Solr to know about the new metadata block. 

You can back up your existing Solr schema like this:

``cp docker-dev-volumes/solr/data/data/collection1/conf/schema.xml docker-dev-volumes/solr/data/data/collection1/conf/schema.xml.orig``

You can see the existing fields Solr knows about like this:

``curl http://localhost:8983/solr/collection1/schema/fields``

Update your Solr schema with the following command:

``curl http://localhost:8080/api/admin/index/solr/schema | docker run -i --rm -v ./docker-dev-volumes/solr/data:/var/solr gdcc/configbaker:unstable update-fields.sh /var/solr/data/collection1/conf/schema.xml``

Then, reload Solr:

``curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"``

You can get a diff of your old and new Solr schema like this:

``diff docker-dev-volumes/solr/data/data/collection1/conf/schema.xml.orig docker-dev-volumes/solr/data/data/collection1/conf/schema.xml``

You should be able to see the new fields from the metadata block you added in the following output:

``curl http://localhost:8983/solr/collection1/schema/fields``

At this point you can proceed with testing the metadata block in the Dataverse UI. First you'll need to enable it for a collection (see :ref:`general-information` in the User Guide section about collections). Afterwards, create a new dataset, save it, and then edit the metadata for that dataset. Your metadata block should appear.

Next Steps
----------

From here, you are encouraged to continue poking around, configuring, and testing. You will probably spend a lot of time reading the :doc:`/installation/config` section of the Installation Guide.

Please consider giving feedback using the methods described below. Good luck with your demo!

About the Containers
--------------------

Now that you've gone through the tutorial, you might be interested in the various containers you've spun up and what they do.

Container List
++++++++++++++

If you run ``docker ps``, you'll see that multiple containers are spun up in a demo or evaluation. Here are the most important ones:

- dataverse
- postgres
- solr
- smtp
- bootstrap

Most are self-explanatory, and correspond to components listed under :doc:`/installation/prerequisites` in the (traditional) Installation Guide, but "bootstrap" refers to :doc:`../configbaker-image`.

Additional containers are used in development (see :doc:`../dev-usage`), but for the purposes of a demo or evaluation, fewer moving (sometimes pointy) parts are included.

Tags and Versions
+++++++++++++++++

The compose file references a tag called "latest", which corresponds to the latest released version of Dataverse.
This means that if a release of Dataverse comes out while you are demoing or evaluating, the version of Dataverse you are using could change if you do a ``docker pull``.
Feel free to change it to a specific version to avoid this.
For more on available tags, see the supported tags sections for the :ref:`Application ` and :ref:`Config Baker ` images.

Once Dataverse is running, you can check which version you have through the normal methods:

- Check the bottom right in a web browser.
- Check http://localhost:8080/api/info/version via API.

Troubleshooting
---------------

Hardware and Software Requirements
++++++++++++++++++++++++++++++++++

- 8 GB RAM (if not much else is running)
- Mac, Linux, or Windows (experimental)
- Docker

Windows support is experimental but we are very interested in supporting Windows better. Please report bugs (see :ref:`helping-containers`).

Bootstrapping Did Not Complete
++++++++++++++++++++++++++++++

In the compose file, try increasing the timeout for the bootstrap container:

.. code-block:: yaml

    environment:
      - TIMEOUT=10m

As described above, you'll want to stop containers, delete data, and start over with ``docker compose up``.
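As a sketch, the start-over cycle might look like this (assuming you are in the directory containing ``compose.yml`` and that your demo data lives in the ``data`` directory as set up earlier; deleting it cannot be undone):

.. code-block:: bash

    docker compose down   # stop and remove the containers
    rm -rf data           # delete the demo data (destructive!)
    docker compose up     # start fresh, now with the larger TIMEOUT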
To make sure the increased timeout is in effect, you can run ``docker logs bootstrap`` and look for the new value in the output: + +``Waiting for http://dataverse:8080 to become ready in max 10m.`` + +Wrapping Up +----------- + +Deleting the Containers and Data +++++++++++++++++++++++++++++++++ + +If you no longer need the containers because your demo or evaluation is finished and you want to reclaim disk space, run ``docker compose down`` in the directory where you put ``compose.yml``. + +You might also want to delete the ``data`` directory, as described above. + +Giving Feedback +--------------- + +Your feedback is extremely valuable to us! To let us know what you think, please see :ref:`helping-containers`. + +Getting Help +------------ + +Please do not be shy about reaching out for help. We very much want you to have a pleasant demo or evaluation experience. For ways to contact us, please see :ref:`getting-help-containers`. diff --git a/doc/sphinx-guides/source/container/running/frontend-dev.rst b/doc/sphinx-guides/source/container/running/frontend-dev.rst new file mode 100644 index 00000000000..88d40c12053 --- /dev/null +++ b/doc/sphinx-guides/source/container/running/frontend-dev.rst @@ -0,0 +1,10 @@ +Frontend Development +==================== + +.. contents:: |toctitle| + :local: + +Intro +----- + +The frontend (web interface) of Dataverse is being decoupled from the backend. This evolving codebase has its own repo at https://github.com/IQSS/dataverse-frontend which includes docs and scripts for running the backend of Dataverse in Docker. diff --git a/doc/sphinx-guides/source/container/running/github-action.rst b/doc/sphinx-guides/source/container/running/github-action.rst new file mode 100644 index 00000000000..ae42dd494d1 --- /dev/null +++ b/doc/sphinx-guides/source/container/running/github-action.rst @@ -0,0 +1,18 @@ +GitHub Action +============= + +.. 
contents:: |toctitle| + :local: + +Intro +----- + +A GitHub Action is under development that will spin up a Dataverse instance within the context of GitHub CI workflows: https://github.com/gdcc/dataverse-action + +Use Cases +--------- + +Use cases for the GitHub Action include: + +- Testing :doc:`/api/client-libraries` that interact with Dataverse APIs +- Testing :doc:`/admin/integrations` of third party software with Dataverse diff --git a/doc/sphinx-guides/source/container/running/index.rst b/doc/sphinx-guides/source/container/running/index.rst new file mode 100755 index 00000000000..a02266f7cba --- /dev/null +++ b/doc/sphinx-guides/source/container/running/index.rst @@ -0,0 +1,13 @@ +Running Dataverse in Docker +=========================== + +Contents: + +.. toctree:: + + production + demo + metadata-blocks + github-action + frontend-dev + backend-dev diff --git a/doc/sphinx-guides/source/container/running/metadata-blocks.rst b/doc/sphinx-guides/source/container/running/metadata-blocks.rst new file mode 100644 index 00000000000..fcc80ce1909 --- /dev/null +++ b/doc/sphinx-guides/source/container/running/metadata-blocks.rst @@ -0,0 +1,15 @@ +Editing Metadata Blocks +======================= + +.. contents:: |toctitle| + :local: + +Intro +----- + +The Admin Guide has a section on :doc:`/admin/metadatacustomization` and suggests running Dataverse in containers (Docker) for this purpose. + +Status +------ + +For now, please see :doc:`demo`, which should also provide a suitable Dockerized Dataverse environment. diff --git a/doc/sphinx-guides/source/container/running/production.rst b/doc/sphinx-guides/source/container/running/production.rst new file mode 100644 index 00000000000..4fe16447d7e --- /dev/null +++ b/doc/sphinx-guides/source/container/running/production.rst @@ -0,0 +1,39 @@ +Production (Future) +=================== + +.. 
contents:: |toctitle| + :local: + +Status +------ + +The images described in this guide are not yet recommended for production usage, but we think we are close. (Tagged releases are done; see the "supported image tags" section for :ref:`Application ` and :ref:`Config Baker ` images.) For now, please see :doc:`demo`. + +We'd like to make the following improvements: + +- More docs on setting up additional features + + - How to set up Rserve. + +- Go through all the features in docs and check what needs to be done differently with containers + + - Check ports, for example. + +To join the discussion on what else might be needed before declaring images ready for production, please comment on https://dataverse.zulipchat.com/#narrow/stream/375812-containers/topic/containers.20for.20production/near/434979159 + +You are also very welcome to join our meetings. See "how to help" below. + +Limitations +----------- + +- Multiple apps servers are not supported. See :ref:`multiple-app-servers` for more on this topic. + +How to Help +----------- + +You can help the effort to support these images in production by trying them out (see :doc:`demo`) and giving feedback (see :ref:`helping-containers`). + +Alternatives +------------ + +Until the images are ready for production, please use the traditional installation method described in the :doc:`/installation/index`. diff --git a/doc/sphinx-guides/source/contributor/code.md b/doc/sphinx-guides/source/contributor/code.md new file mode 100644 index 00000000000..c7154d14169 --- /dev/null +++ b/doc/sphinx-guides/source/contributor/code.md @@ -0,0 +1,59 @@ +# Contributing Code + +We love code contributions! There are lots of ways you can help. + +```{contents} Contents: +:local: +:depth: 3 +``` + +## Finding an Issue to Work On + +New contributors often wonder what issues they should work on first. + +### Many Codebases, Many Languages + +The primary codebase and issue tracker for Dataverse is . It's mostly backend code written in Java. 
However, there are many other codebases you can work on in a variety of languages. Here are a few that are especially active:

- (Java)
- (React)
- (TypeScript)
- (Javascript)
- (Python)
- (Rust)
- (Ansible)
- (Javascript)

If nothing above sparks joy, you can find more projects to work on under {doc}`/api/client-libraries`, {doc}`/api/external-tools`, {ref}`related-projects`, and {doc}`/api/apps`.

### Picking a Good First Issue

Once you've decided which codebase suits you, you should try to identify an issue to work on. Some codebases use a label like "good first issue" to suggest issues for newcomers.

For the main codebase, please see {ref}`finding-github-issues-to-work-on`, which includes information on labels like "good first issue".

Other codebases may use different labels. Check the README or other documentation for that codebase.

If there is a linked pull request that is trying to close the issue, you should probably find another issue.

If you are having trouble finding an issue or have any questions at all, please do not hesitate to reach out. See {ref}`getting-help-developers`.

## Making a Pull Request

For the main codebase, please see {ref}`how-to-make-a-pull-request`.

For other codebases, consult the README.

## Reviewing Code

Reviewing code is a great way to learn about a codebase. For any codebase you can browse open pull requests, of course, but for the primary codebases, you can take a look at the "Ready for Review" and "In Review" columns at .

You are welcome to review code informally or to leave an actual review. We're interested in what you think.

## Reproducing Bugs

At times, bugs are reported that we haven't had time to confirm. You can help out by reproducing bugs and commenting on the issue with the results you find.

## Getting Help

If you have any questions at all, please do not hesitate to reach out. See {ref}`getting-help-developers`.
diff --git a/doc/sphinx-guides/source/contributor/documentation.md b/doc/sphinx-guides/source/contributor/documentation.md new file mode 100644 index 00000000000..2a8d6794921 --- /dev/null +++ b/doc/sphinx-guides/source/contributor/documentation.md @@ -0,0 +1,207 @@ +# Writing Documentation + +Thank you for your interest in contributing documentation to Dataverse! Good documentation is absolutely critical to the success of software. + +```{contents} Contents: +:local: +:depth: 3 +``` + +## Overview + +The Dataverse guides are written using [Sphinx](https://sphinx-doc.org). + +The source files are stored under [doc/sphinx-guides](https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides) in the main "dataverse" repo on GitHub. + +Historically, guides have been written in the default Sphinx format, [reStructuredText][] (.rst), but newer guides such as the {doc}`/contributor/index` are written in [Markdown][] (.md). + +[reStructuredText]: https://en.wikipedia.org/wiki/ReStructuredText +[Markdown]: https://en.wikipedia.org/wiki/Markdown + +Below we'll present a technique for making quick edits to the guides using GitHub's web editor ("quick fix"). We'll also describe how to install Sphinx locally for more significant edits. + +Finally, we'll provide some guidelines on writing content. + +We could use some help on writing this very page and helping newcomers get started. Please don't be shy about suggesting improvements! You can open an issue at , post to , write to the [mailing list](https://groups.google.com/g/dataverse-community), or suggest a change with a pull request. + +## Quick Fix + +If you find a typo or a small error in the documentation you can fix it using GitHub's online web editor. Generally speaking, we will be following [GitHub's guidance on editing files in another user's repository](https://docs.github.com/en/repositories/working-with-files/managing-files/editing-files#editing-files-in-another-users-repository). 

- Navigate to where you will see folders for each of the guides: [admin][], [api][], [container][], etc.
- Find the file you want to edit under one of the folders above.
- Click the pencil icon in the upper-right corner. If this is your first contribution to Dataverse, the hover text over the pencil icon will say "Fork this project and edit this file".
- Make changes to the file and preview them.
- In the **Commit changes** box, enter a description of the changes you have made and click **Propose file change**.
- Under the **Write** tab, delete the long welcome message and write a few words about what you fixed.
- Click **Create Pull Request**.

That's it! Thank you for your contribution! Your pull request will be added manually to the main Dataverse project board at and will go through code review and QA before it is merged into the "develop" branch. Along the way, developers might suggest changes or make them on your behalf. Once your pull request has been merged you will be listed as a contributor at ! 🎉

Please see for an example of a quick fix that was merged (the "Files changed" tab shows how a typo was fixed).

Your documentation changes will be built automatically as part of your pull request in GitHub, so you can preview them there. The build shows up as a check entitled `docs/readthedocs.org:dataverse-guide — Read the Docs build succeeded!`. For example, this PR built to .

If you would like to read more about the Dataverse Project's use of GitHub, please see the {doc}`/developers/version-control` section of the Developer Guide. For bug fixes and features we request that you create an issue before making a pull request, but this is not at all necessary for quick fixes to the documentation.
[admin]: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/admin
[api]: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/api
[container]: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/container

## Building the Guides with Sphinx

While the "quick fix" technique shown above should work fine for minor changes, for larger changes we recommend installing Sphinx on your computer or using a Sphinx Docker container to build the guides locally so you can get an accurate preview of your changes.

In case you decide to use a Sphinx Docker container to build the guides, you can skip the next two installation sections, but you will need to have Docker installed.

### Installing Sphinx

First, make a fork of and clone your fork locally. Then change to the ``doc/sphinx-guides`` directory.

``cd doc/sphinx-guides``

Create a Python virtual environment, activate it, then install dependencies:

``python3 -m venv venv``

``source venv/bin/activate``

``pip install -r requirements.txt``

### Installing GraphViz

In some parts of the documentation, graphs are rendered as images using the Sphinx GraphViz extension.

Building the guides requires the ``dot`` executable from GraphViz.

This requires having [GraphViz](https://graphviz.org) installed and either having ``dot`` on the path or
[adding options to the `make` call](https://groups.google.com/forum/#!topic/sphinx-users/yXgNey_0M3I).

On a Mac we recommend installing GraphViz through [Homebrew](). Once you have Homebrew installed and configured to work with your shell, you can type `brew install graphviz`.

### Editing and Building the Guides

To edit the existing documentation:

- Create a branch (see {ref}`how-to-make-a-pull-request`).
- In ``doc/sphinx-guides/source`` you will find the .rst files that correspond to https://guides.dataverse.org.

- Using your preferred text editor, open and edit the necessary files, or create new ones.

Once you are done, you can preview the changes by building the guides locally. As explained, you can build the guides with Sphinx installed locally, or with a Docker container.

#### Building the Guides with Sphinx Installed Locally

Open a terminal, change directories to `doc/sphinx-guides`, activate (or reactivate) your Python virtual environment, and build the guides.

`cd doc/sphinx-guides`

`source venv/bin/activate`

`make clean`

`make html`

#### Building the Guides with a Sphinx Docker Container and a Makefile

We have added a Makefile to simplify the process of building the guides using a Docker container. You can use the following commands from the repository root:

- `make docs-html`
- `make docs-pdf`
- `make docs-epub`
- `make docs-all`

#### Building the Guides with a Sphinx Docker Container and CLI

If you want to build the guides using a Docker container, execute the following command in the repository root:

`docker run -it --rm -v $(pwd):/docs sphinxdoc/sphinx:7.2.6 bash -c "cd doc/sphinx-guides && pip3 install -r requirements.txt && make html"`

#### Previewing the Guides

After Sphinx is done processing the files, you should notice that the `html` folder in the `doc/sphinx-guides/build` directory has been updated. You can click on the files in the `html` folder to preview the changes.

Now you can make a commit with the changes to your own fork in GitHub and submit a pull request. See {ref}`how-to-make-a-pull-request`.

## Writing Guidelines

### Writing Style Guidelines

Please observe the following when writing documentation:

- Use American English spelling.
- Use examples when possible.
- Break up longer paragraphs.
- Use Title Case in Headings.
- Use "double quotes" instead of 'single quotes'.
- Favor "and" (data and code) over slashes (data/code).

### Table of Contents

Every non-index page should use the following code to display a table of contents of internal sub-headings. This code should be placed below any introductory text and directly above the first subheading, much like a Wikipedia page.

If the page is written in reStructuredText (.rst), use this form:

    .. contents:: |toctitle|
       :local:

If the page is written in Markdown (.md), use this form:

    ```{contents} Contents:
    :local:
    :depth: 3
    ```

### Links

Getting links right with .rst files can be tricky.

#### Custom Titles

You can use a custom title when linking to a document like this:

    :doc:`Custom title `

See also

### Images

Good documentation, like a good website, is enhanced by high-quality, self-explanatory images, which can often convey a lot of information in a simple way. Within our Sphinx docs, you can add images in two ways: a) add a PNG image directly and include it, or b) use an inline description language like GraphViz (currently the only option).

While PNGs in the git repo can be linked directly via URL, Sphinx-generated images require no manual step and might provide higher visual quality. In particular, generated images can be extended and improved through text-based, reviewable commits, with no raw data or source files to manage and no binary diffs.

### Cross References

When adding ReStructured Text (.rst) [cross references](https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#ref-role), use the hyphen character (`-`) as the word separator for the cross reference label. For example, `my-reference-label` would be the preferred label for a cross reference as opposed to, for example, `my_reference_label`.

## PDF Version of the Guides

The HTML version of the guides is the official one. Any other formats are maintained on a best-effort basis.

If you would like to build a PDF version of the guides and have Docker installed, please try the command below from the root of the git repo:

`docker run -it --rm -v $(pwd):/docs sphinxdoc/sphinx-latexpdf:7.2.6 bash -c "cd doc/sphinx-guides && pip3 install -r requirements.txt && make latexpdf LATEXMKOPTS=\"-interaction=nonstopmode\"; cd ../.. && ls -1 doc/sphinx-guides/build/latex/Dataverse.pdf"`

A few notes about the command above:

- Hopefully the PDF was created at `doc/sphinx-guides/build/latex/Dataverse.pdf`.
- For now, we are using "nonstopmode" but this masks some errors.
- See requirements.txt for a note regarding the version of Sphinx we are using.

Also, as of this writing we have enabled PDF builds from the "develop" branch. You can download the PDF from

If you would like to help improve the PDF version of the guides, please get in touch! Please see {ref}`getting-help-developers` for ways to contact the developer community.


## Hosting Your Own Version of the Guides

Some installations of Dataverse maintain their own versions of the guides and use settings like {ref}`:NavbarGuidesUrl` or {ref}`:GuidesBaseUrl` to point their users to them.

### Having Google Index the Latest Version

As each version of the Dataverse software is released, there is an updated version of the guides released with it. Google and other search engines index all versions, which may confuse users who discover your guides in the search results as to which version they should be looking at. When users learn about your installation from the search results, it is best for them to be viewing the *latest* version.
+ +In order to make it clear to the crawlers that we only want the latest version discoverable in their search results, we suggest adding this to your `robots.txt` file: + + User-agent: * + Allow: /en/latest/ + Disallow: /en/ diff --git a/doc/sphinx-guides/source/contributor/index.md b/doc/sphinx-guides/source/contributor/index.md new file mode 100644 index 00000000000..f7979b1dd0c --- /dev/null +++ b/doc/sphinx-guides/source/contributor/index.md @@ -0,0 +1,105 @@ +# Contributor Guide + +Thank you for your interest in contributing to Dataverse! We are open to contributions from everyone. We welcome contributions of ideas, bug reports, documentation, code, and more! + +```{contents} Contents: +:local: +:depth: 2 +``` + +## Ideas and Feature Requests + +We would love to hear your ideas!💡 + +1. Please check if your idea or feature request is already captured in our [issue tracker][] or [roadmap][]. +1. Bring your idea to the community by posting on our [Google Group][] or [chat.dataverse.org][]. +1. To discuss privately, email us at . + +[issue tracker]: https://github.com/IQSS/dataverse/issues +[roadmap]: https://www.iq.harvard.edu/roadmap-dataverse-project +[chat.dataverse.org]: http://chat.dataverse.org +[Google Group]: https://groups.google.com/group/dataverse-community + +## Bug Reports + +Before submitting an issue, please search for existing issues in our [issue tracker][]. If there is an existing open issue that matches the issue you want to report, please add a comment to it or give it a 👍 (thumbs up). + +If there is no pre-existing issue or it has been closed, please open a new issue (unless it is a security issue which should be reported privately to as discussed under {ref}`reporting-security-issues` in the Installation Guide). + +If you do not receive a reply to your new issue or comment in a timely manner, please ping us at [chat.dataverse.org][]. + +## Documentation + +Documentation is such a large topic (and important!) 
that we have a dedicated section on it:

```{toctree}
:maxdepth: 1
documentation.md
```

## Translations

If you speak multiple languages, you are very welcome to help us translate Dataverse! Please see {ref}`help-translate` for details.

## Code

Dataverse is open source and we love code contributions. Developers are not limited to the main Dataverse code in this git repo. We have projects in C, C++, Go, Java, JavaScript, Julia, PHP, Python, R, Ruby, Rust, TypeScript, and more. To get started, please see the following pages:

```{toctree}
:maxdepth: 1
code.md
```

## Usability Testing

Please email us at or fill in our [feedback form][] if you are interested in participating in usability testing.

[feedback form]: https://goo.gl/forms/p7uu3GfiWYSlJrsi1

## Answering Questions

People come along with questions on the [mailing list](https://groups.google.com/g/dataverse-community) and in [chat][] all the time. You are very welcome to help out by answering these questions to the best of your ability.

[chat]: https://chat.dataverse.org

## Sample Data

Consider contributing to , a git repo of realistic-looking data that is used for testing.

## Issue Triage

New issues come in all the time, especially for the main issue tracker at .

You can help by leaving comments. You can mention related issues or answer questions.

If you are interested in adding issue labels or related curation, please get in touch!

## Giving Talks

If you give a recorded talk about Dataverse, we are happy to add it to [DataverseTV](https://dataverse.org/dataversetv). You can just leave a comment on the [spreadsheet](https://docs.google.com/spreadsheets/d/1uVk_57Ek_A49sLZ5OKdI6QASKloWNzykni3kcYNzpxA/edit#gid=0) or make some noise in [chat][].

For non-recorded talks, we are happy to upload your slides to . Please email .

## Working Groups

Most working groups are wide open to participation. For the current list of groups, please see .

You're welcome to start your own working group, of course. We can help you get the word out.

## GDCC

The popularity of the Dataverse software has resulted in a continuously growing community with different needs and requirements. The Global Dataverse Community Consortium (GDCC) helps coordinate community efforts and sustain the software and community in the long term. Please consider contributing to GDCC by joining as an institutional member ().

## Artwork

Surely we can put artistic talent to use. A contributor [drew cartoon animals chatting about Dataverse](https://github.com/IQSS/chat.dataverse.org/issues/18), for example.

As [announced](https://groups.google.com/g/dataverse-community/c/pM39_9O5Rug/m/CK-gJqZFBgAJ), we have a [sticker template](https://dataverse.org/sites/projects.iq.harvard.edu/files/dataverseorg/files/dataverse_community_stickers_template.zip) you can use.

See [Illustrations from The Turing Way](https://zenodo.org/doi/10.5281/zenodo.3332807) for how that community has created artwork. Perhaps we can create a similar collection.

## Other Contributions

We consulted but no list is comprehensive.

What else should we list here? How do YOU want to contribute to Dataverse? 🎉

diff --git a/doc/sphinx-guides/source/developers/api-design.rst b/doc/sphinx-guides/source/developers/api-design.rst
new file mode 100755
index 00000000000..d51481fece4
--- /dev/null
+++ b/doc/sphinx-guides/source/developers/api-design.rst
@@ -0,0 +1,78 @@
==========
API Design
==========

API design is a large topic. We expect this page to grow over time.

.. contents:: |toctitle|
   :local:

.. _openapi-dev:

OpenAPI
-------

As you add API endpoints, please be conscious that we are exposing these endpoints as an OpenAPI document at ``/openapi`` (e.g. http://localhost:8080/openapi ). See :ref:`openapi` in the API Guide for the user-facing documentation.

We've played around with validation tools such as https://quobix.com/vacuum/ and https://pb33f.io/doctor/ only to discover that our OpenAPI output is less than ideal, generating various warnings and errors.

You can prevent additional problems in our OpenAPI document by observing the following practices:

- When creating a method name within an API class, make it unique.

If you are looking for a reference about the annotations used to generate the OpenAPI document, you can find it in the `MicroProfile OpenAPI Specification `_.

Paths
-----

A reminder `from Wikipedia `_ of what a path is:

.. code-block:: text

              userinfo       host      port
          ┌──┴───┐ ┌──────┴──────┐ ┌┴┐
    https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top
    └─┬─┘ └─────────────┬────────────┘└───────┬───────┘ └────────────┬────────────┘ └┬┘
    scheme          authority                path                  query           fragment

Exposing Settings
~~~~~~~~~~~~~~~~~

Since Dataverse 4, database settings have been exposed via API at http://localhost:8080/api/admin/settings

(JVM options are probably available via the Payara REST API, but this is out of scope.)

Settings sometimes need to be exposed to API clients outside of ``/api/admin`` (which is typically restricted to localhost). Here are some guidelines to follow when exposing settings.

- When you are exposing a database setting as-is:

  - Use ``/api/info/settings`` as the root path.

  - Append the name of the setting including the colon (e.g. ``:DatasetPublishPopupCustomText``)

  - Final path example: ``/api/info/settings/:DatasetPublishPopupCustomText``

- If the absence of the database setting is filled in by a default value (e.g. ``:ZipDownloadLimit`` or ``:ApiTermsOfUse``):

  - Use ``/api/info`` as the root path.

  - Append the setting but remove the colon and downcase the first character (e.g.
``zipDownloadLimit``)
+
+  - Final path example: ``/api/info/zipDownloadLimit``
+
+- If the database setting you're exposing makes more sense outside of ``/api/info`` because there's more context (e.g. ``:CustomDatasetSummaryFields``):
+
+  - Feel free to use a path outside of ``/api/info`` as the root path.
+
+  - Given additional context, append a shortened name (e.g. ``/api/datasets/summaryFieldNames``).
+
+  - Final path example: ``/api/datasets/summaryFieldNames``
+
+- If you need to expose a JVM option (MicroProfile setting) such as ``dataverse.api.allow-incomplete-metadata``:
+
+  - Use ``/api/info`` as the root path.
+
+  - Append a meaningful name for the setting (e.g. ``incompleteMetadataViaApi``).
+
+  - Final path example: ``/api/info/incompleteMetadataViaApi``
+
diff --git a/doc/sphinx-guides/source/developers/aux-file-support.rst b/doc/sphinx-guides/source/developers/aux-file-support.rst
new file mode 100644
index 00000000000..9b2734b3a25
--- /dev/null
+++ b/doc/sphinx-guides/source/developers/aux-file-support.rst
@@ -0,0 +1,69 @@
+Auxiliary File Support
+======================
+
+Auxiliary file support is experimental and, as such, related APIs may be added, changed, or removed without standard backward compatibility. Auxiliary files in the Dataverse Software are being added to support depositing and downloading differentially private metadata, as part of the `OpenDP project `_. In future versions, this approach will likely become more broadly used and supported.
+
+Adding an Auxiliary File to a Datafile
+--------------------------------------
+To add an auxiliary file, specify the primary key of the datafile (FILE_ID), and the formatTag and formatVersion (if applicable) associated with the auxiliary file. There are multiple form parameters. "Origin" specifies the application/entity that created the auxiliary file, and "isPublic" controls access to downloading the file.
If "isPublic" is true, any user can download the file if the dataset has been published, else, access authorization is based on the access rules as defined for the DataFile itself. The "type" parameter is used to group similar auxiliary files in the UI. Currently, auxiliary files with type "DP" appear under "Differentially Private Statistics", while all other auxiliary files appear under "Other Auxiliary Files". + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export FILENAME='auxfile.json' + export FILETYPE='application/json' + export FILE_ID='12345' + export FORMAT_TAG='dpJson' + export FORMAT_VERSION='v1' + export TYPE='DP' + export SERVER_URL=https://demo.dataverse.org + + curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME;type=$FILETYPE" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION" + +You should expect a 200 ("OK") response and JSON with information about your newly uploaded auxiliary file. + +Downloading an Auxiliary File that Belongs to a Datafile +-------------------------------------------------------- +To download an auxiliary file, use the primary key of the datafile, and the formatTag and formatVersion (if applicable) associated with the auxiliary file. An API token is shown in the example below but it is not necessary if the auxiliary file was uploaded with isPublic=true and the dataset has been published. + +.. 
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export FILE_ID='12345' + export FORMAT_TAG='dpJson' + export FORMAT_VERSION='v1' + + curl -H X-Dataverse-key:$API_TOKEN "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION" + +Listing Auxiliary Files for a Datafile by Origin +------------------------------------------------ +To list auxiliary files, specify the primary key of the datafile (FILE_ID), and the origin associated with the auxiliary files to list (the application/entity that created them). + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export FILE_ID='12345' + export SERVER_URL=https://demo.dataverse.org + export ORIGIN='app1' + + curl -H X-Dataverse-key:$API_TOKEN "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$ORIGIN" + +You should expect a 200 ("OK") response and a JSON array with objects representing the auxiliary files found, or a 404/Not Found response if no auxiliary files exist with that origin. + +Deleting an Auxiliary File that Belongs to a Datafile +----------------------------------------------------- +To delete an auxiliary file, use the primary key of the datafile, and the +formatTag and formatVersion (if applicable) associated with the auxiliary file: + +.. 
code-block:: bash

    export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    export SERVER_URL=https://demo.dataverse.org
    export FILE_ID='12345'
    export FORMAT_TAG='dpJson'
    export FORMAT_VERSION='v1'

    curl -H X-Dataverse-key:$API_TOKEN -X DELETE "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst
index 37a794e804e..75a50e2513d 100644
--- a/doc/sphinx-guides/source/developers/big-data-support.rst
+++ b/doc/sphinx-guides/source/developers/big-data-support.rst
@@ -1,270 +1,192 @@
 Big Data Support
 ================
 
-Big data support is highly experimental. Eventually this content will move to the Installation Guide.
+Big data support includes some experimental options. Eventually more of this content will move to the Installation Guide.
 
 .. contents:: |toctitle|
    :local:
 
-Various components need to be installed and configured for big data support.
+Various components will need to be installed and/or configured for big data support via the methods described below.
 
-Data Capture Module (DCM)
--------------------------
+S3 Direct Upload and Download
+-----------------------------
 
-Data Capture Module (DCM) is an experimental component that allows users to upload large datasets via rsync over ssh.
+A lightweight option for supporting file sizes beyond a few gigabytes - a size that can cause performance issues when uploaded through a Dataverse installation itself - is to configure an S3 store to provide direct upload and download via 'pre-signed URLs'. When these options are configured, file uploads and downloads are made directly to and from a configured S3 store using secure (https) connections that enforce a Dataverse installation's access controls.
(The upload and download URLs are signed with a unique key that only allows access for a short time period and a Dataverse installation will only generate such a URL if the user has permission to upload/download the specific file in question.) -Install a DCM -~~~~~~~~~~~~~ +This option can handle files >300GB and could be appropriate for files up to a TB or larger. Other options can scale farther, but this option has the advantages that it is simple to configure and does not require any user training - uploads and downloads are done via the same interface as normal uploads to a Dataverse installation. -Installation instructions can be found at https://github.com/sbgrid/data-capture-module/blob/master/doc/installation.md. Note that shared storage (posix or AWS S3) between Dataverse and your DCM is required. You cannot use a DCM with Swift at this point in time. +To configure these options, an administrator must set two JVM options for the Dataverse installation using the same process as for other configuration options: -.. FIXME: Explain what ``dataverse.files.dcm-s3-bucket-name`` is for and what it has to do with ``dataverse.files.s3-bucket-name``. +``./asadmin create-jvm-options "-Ddataverse.files..download-redirect=true"`` -Once you have installed a DCM, you will need to configure two database settings on the Dataverse side. These settings are documented in the :doc:`/installation/config` section of the Installation Guide: +``./asadmin create-jvm-options "-Ddataverse.files..upload-redirect=true"`` -- ``:DataCaptureModuleUrl`` should be set to the URL of a DCM you installed. -- ``:UploadMethods`` should include ``dcm/rsync+ssh``. - -This will allow your Dataverse installation to communicate with your DCM, so that Dataverse can download rsync scripts for your users. - -Downloading rsync scripts via Dataverse API -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The rsync script can be downloaded from Dataverse via API using an authorized API token. 
In the curl example below, substitute ``$PERSISTENT_ID`` with a DOI or Handle: - -``curl -H "X-Dataverse-key: $API_TOKEN" $DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/rsync?persistentId=$PERSISTENT_ID`` - -How a DCM reports checksum success or failure to Dataverse -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Once the user uploads files to a DCM, that DCM will perform checksum validation and report to Dataverse the results of that validation. The DCM must be configured to pass the API token of a superuser. The implementation details, which are subject to change, are below. - -The JSON that a DCM sends to Dataverse on successful checksum validation looks something like the contents of :download:`checksumValidationSuccess.json <../_static/installation/files/root/big-data-support/checksumValidationSuccess.json>` below: - -.. literalinclude:: ../_static/installation/files/root/big-data-support/checksumValidationSuccess.json - :language: json - -- ``status`` - The valid strings to send are ``validation passed`` and ``validation failed``. -- ``uploadFolder`` - This is the directory on disk where Dataverse should attempt to find the files that a DCM has moved into place. There should always be a ``files.sha`` file and a least one data file. ``files.sha`` is a manifest of all the data files and their checksums. The ``uploadFolder`` directory is inside the directory where data is stored for the dataset and may have the same name as the "identifier" of the persistent id (DOI or Handle). For example, you would send ``"uploadFolder": "DNXV2H"`` in the JSON file when the absolute path to this directory is ``/usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/DNXV2H/DNXV2H``. -- ``totalSize`` - Dataverse will use this value to represent the total size in bytes of all the files in the "package" that's created. If 360 data files and one ``files.sha`` manifest file are in the ``uploadFolder``, this value is the sum of the 360 data files. 
- - -Here's the syntax for sending the JSON. - -``curl -H "X-Dataverse-key: $API_TOKEN" -X POST -H 'Content-type: application/json' --upload-file checksumValidationSuccess.json $DV_BASE_URL/api/datasets/:persistentId/dataCaptureModule/checksumValidation?persistentId=$PERSISTENT_ID`` - - -Steps to set up a DCM mock for Development -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -See instructions at https://github.com/sbgrid/data-capture-module/blob/master/doc/mock.md - - -Add Dataverse settings to use mock (same as using DCM, noted above): - -- ``curl http://localhost:8080/api/admin/settings/:DataCaptureModuleUrl -X PUT -d "http://localhost:5000"`` -- ``curl http://localhost:8080/api/admin/settings/:UploadMethods -X PUT -d "dcm/rsync+ssh"`` - -At this point you should be able to download a placeholder rsync script. Dataverse is then waiting for news from the DCM about if checksum validation has succeeded or not. First, you have to put files in place, which is usually the job of the DCM. You should substitute "X1METO" for the "identifier" of the dataset you create. You must also use the proper path for where you store files in your dev environment. - -- ``mkdir /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/X1METO`` -- ``mkdir /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/X1METO/X1METO`` -- ``cd /usr/local/glassfish4/glassfish/domains/domain1/files/10.5072/FK2/X1METO/X1METO`` -- ``echo "hello" > file1.txt`` -- ``shasum file1.txt > files.sha`` - - - -Now the files are in place and you need to send JSON to Dataverse with a success or failure message as described above. Make a copy of ``doc/sphinx-guides/source/_static/installation/files/root/big-data-support/checksumValidationSuccess.json`` and put the identifier in place such as "X1METO" under "uploadFolder"). Then use curl as described above to send the JSON. 
- -Troubleshooting -~~~~~~~~~~~~~~~ -The following low level command should only be used when troubleshooting the "import" code a DCM uses but is documented here for completeness. +With multiple stores configured, it is possible to configure one S3 store with direct upload and/or download to support large files (in general or for specific Dataverse collections) while configuring only direct download, or no direct access for another store. -``curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$DV_BASE_URL/api/batch/jobs/import/datasets/files/$DATASET_DB_ID?uploadFolder=$UPLOAD_FOLDER&totalSize=$TOTAL_SIZE"`` - -Steps to set up a DCM via Docker for Development -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If you need a fully operating DCM client for development purposes, these steps will guide you to setting one up. This includes steps to set up the DCM on S3 variant. - -Docker Image Set-up -^^^^^^^^^^^^^^^^^^^ - -See https://github.com/IQSS/dataverse/blob/develop/conf/docker-dcm/readme.md - -- Install docker if you do not have it - -Optional steps for setting up the S3 Docker DCM Variant -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -- Before: the default bucket for DCM to hold files in S3 is named test-dcm. It is coded into `post_upload_s3.bash` (line 30). Change to a different bucket if needed. 
- - - Add AWS bucket info to dcmsrv - - Add AWS credentials to ``~/.aws/credentials`` - - - ``[default]`` - - ``aws_access_key_id =`` - - ``aws_secret_access_key =`` - -- Dataverse configuration (on dvsrv): - - - Set S3 as the storage driver - - - ``cd /opt/glassfish4/bin/`` - - ``./asadmin delete-jvm-options "\-Ddataverse.files.storage-driver-id=file"`` - - ``./asadmin create-jvm-options "\-Ddataverse.files.storage-driver-id=s3"`` - - - Add AWS bucket info to Dataverse - - Add AWS credentials to ``~/.aws/credentials`` - - - ``[default]`` - - ``aws_access_key_id =`` - - ``aws_secret_access_key =`` - - - Also: set region in ``~/.aws/config`` to create a region file. Add these contents: - - - ``[default]`` - - ``region = us-east-1`` - - - Add the S3 bucket names to Dataverse - - - S3 bucket for Dataverse - - - ``/usr/local/glassfish4/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.s3-bucket-name=iqsstestdcmbucket"`` - - - S3 bucket for DCM (as Dataverse needs to do the copy over) +The direct upload option now switches between uploading the file in one piece (up to 1 GB by default) and sending it as multiple parts. The default can be changed by setting: + +``./asadmin create-jvm-options "-Ddataverse.files..min-part-size="`` - - ``/usr/local/glassfish4/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.dcm-s3-bucket-name=test-dcm"`` +For AWS, the minimum allowed part size is 5*1024*1024 bytes and the maximum is 5 GB (5*1024**3). Other providers may set different limits. - - Set download method to be HTTP, as DCM downloads through S3 are over this protocol ``curl -X PUT "http://localhost:8080/api/admin/settings/:DownloadMethods" -d "native/http"`` +It is also possible to set file upload size limits per store. See the :MaxFileUploadSizeInBytes setting described in the :doc:`/installation/config` guide. 
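To make the part-size arithmetic concrete, here is a small sketch (illustration only; Dataverse performs the chunking internally, and the 1 GB default and the AWS bounds are taken from the text above):

```shell
# AWS multipart limits: each part (except the last) must be >= 5 MiB and <= 5 GiB.
MIN_PART=$((5 * 1024 * 1024))          # 5242880 bytes
MAX_PART=$((5 * 1024 * 1024 * 1024))   # 5 GiB
PART_SIZE=$((1024 * 1024 * 1024))      # 1 GiB, the default single-part threshold

if [ "$PART_SIZE" -ge "$MIN_PART" ] && [ "$PART_SIZE" -le "$MAX_PART" ]; then
  echo "part size OK: $PART_SIZE bytes"
fi

# A 300 GiB file split into 1 GiB parts (rounding up) needs this many parts:
FILE_SIZE=$((300 * 1024 * 1024 * 1024))
PARTS=$(( (FILE_SIZE + PART_SIZE - 1) / PART_SIZE ))
echo "parts: $PARTS"
```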
-Using the DCM Docker Containers -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +At present, one potential drawback for direct-upload is that files are only partially 'ingested' - tabular and FITS files are processed, but zip files are not unzipped, and the file contents are not inspected to evaluate their mimetype. This could be appropriate for large files, or it may be useful to completely turn off ingest processing for performance reasons (ingest processing requires a copy of the file to be retrieved by the Dataverse installation from the S3 store). A store using direct upload can be configured to disable all ingest processing for files above a given size limit: -For using these commands, you will need to connect to the shell prompt inside various containers (e.g. ``docker exec -it dvsrv /bin/bash``) +``./asadmin create-jvm-options "-Ddataverse.files..ingestsizelimit="`` -- Create a dataset and download rsync upload script +.. _s3-direct-upload-features-disabled: - - connect to client container: ``docker exec -it dcm_client bash`` - - create dataset: ``cd /mnt ; ./create.bash`` ; this will echo the database ID to stdout - - download transfer script: ``./get_transfer.bash $database_id_from_create_script`` - - execute the transfer script: ``bash ./upload-${database_id_from-create_script}.bash`` , and follow instructions from script. +Features that are Disabled if S3 Direct Upload is Enabled +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Run script +The following features are disabled when S3 direct upload is enabled. - - e.g. ``bash ./upload-3.bash`` (``3`` being the database id from earlier commands in this example). +- Unzipping of zip files. (See :ref:`compressed-files`.) +- Detection of file type based on JHOVE and custom code that reads the first few bytes except for the refinement of Stata file types to include the version. (See :ref:`redetect-file-type`.) +- Extraction of metadata from FITS files. (See :ref:`fits`.) 
+- Creation of NcML auxiliary files (See :ref:`netcdf-and-hdf5`.)
+- Extraction of a geospatial bounding box from NetCDF and HDF5 files (see :ref:`netcdf-and-hdf5`) unless :ref:`dataverse.netcdf.geo-extract-s3-direct-upload` is set to true.

-Manually run post upload script on dcmsrv
.. _cors-s3-bucket:

-  - for posix implementation: ``docker exec -it dcmsrv /opt/dcm/scn/post_upload.bash``
-  - for S3 implementation: ``docker exec -it dcmsrv /opt/dcm/scn/post_upload_s3.bash``
+Allow CORS for S3 Buckets
+~~~~~~~~~~~~~~~~~~~~~~~~~

-Additional DCM docker development tips
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+**IMPORTANT:** To enable direct upload via a Dataverse installation, to let direct download work with previewers, and to let direct upload work with dvwebloader (:ref:`folder-upload`), you must allow cross-site (CORS) requests on your S3 store.
+The example below shows how to enable CORS rules (to support upload and download) on a bucket using the AWS CLI command line tool. Note that you may want to limit the AllowedOrigins and/or AllowedHeaders further. https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3 has some additional information about doing this.

-- You can completely blow away all the docker images with these commands (including non DCM ones!)
-  - ``docker-compose -f docmer-compose.yml down -v``
+If you'd like to check the CORS configuration on your bucket before making changes:

-- There are a few logs to tail
+``aws s3api get-bucket-cors --bucket ``

-  - dvsrv : ``tail -n 2000 -f /opt/glassfish4/glassfish/domains/domain1/logs/server.log``
+To proceed with making changes:

-  - dcmsrv : ``tail -n 2000 -f /var/log/lighttpd/breakage.log``
-  - dcmsrv : ``tail -n 2000 -f /var/log/lighttpd/access.log``

-- You may have to restart the glassfish domain occasionally to deal with memory filling up. If deployment is getting reallllllly slow, its a good time.
+``aws s3api put-bucket-cors --bucket --cors-configuration file://cors.json`` -Repository Storage Abstraction Layer (RSAL) -------------------------------------------- +with the contents of the file cors.json as follows: -Steps to set up a DCM via Docker for Development -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. code-block:: json -See https://github.com/IQSS/dataverse/blob/develop/conf/docker-dcm/readme.md + { + "CORSRules": [ + { + "AllowedOrigins": ["*"], + "AllowedHeaders": ["*"], + "AllowedMethods": ["PUT", "GET"], + "ExposeHeaders": ["ETag", "Accept-Ranges", "Content-Encoding", "Content-Range"] + } + ] + } -Using the RSAL Docker Containers -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above. -- Create a dataset (either with the procedure mentioned in DCM Docker Containers, or another process) -- Publish the dataset (from the client container): ``cd /mnt; ./publish_major.bash ${database_id}`` -- Run the RSAL component of the workflow (from the host): ``docker exec -it rsalsrv /opt/rsal/scn/pub.py`` -- If desired, from the client container you can download the dataset following the instructions in the dataset access section of the dataset page. +.. _s3-tags-and-direct-upload: -Configuring the RSAL Mock +S3 Tags and Direct Upload ~~~~~~~~~~~~~~~~~~~~~~~~~ -Info for configuring the RSAL Mock: https://github.com/sbgrid/rsal/tree/master/mocks - -Also, to configure Dataverse to use the new workflow you must do the following (see also the :doc:`workflows` section): +Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving or canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 tags to aid in identifying/removing such files. 
Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and new files are added in the Dataverse installation. Note that not all S3 implementations support tags. Minio, for example, does not. With such stores, direct upload may not work and you might need to disable tagging. For details, see :ref:`s3-tagging` in the Installation Guide. -1. Configure the RSAL URL: +Trusted Remote Storage with the ``remote`` Store Type +----------------------------------------------------- -``curl -X PUT -d 'http://:5050' http://localhost:8080/api/admin/settings/:RepositoryStorageAbstractionLayerUrl`` +For very large, and/or very sensitive data, it may not make sense to transfer or copy files to Dataverse at all. The experimental ``remote`` store type in the Dataverse software now supports this use case. -2. Update workflow json with correct URL information: +With this storage option Dataverse stores a URL reference for the file rather than transferring the file bytes to a store managed directly by Dataverse. Basic configuration for a remote store is described at :ref:`file-storage` in the Configuration Guide. -Edit internal-httpSR-workflow.json and replace url and rollbackUrl to be the url of your RSAL mock. +Once the store is configured, it can be assigned to a collection or individual datasets as with other stores. In a dataset using this store, users can reference remote files which will then appear the same basic way as other datafiles. -3. Create the workflow: +Currently, remote files can only be added via the API. Users can also upload smaller files via the UI or API which will be stored in the configured base store. -``curl http://localhost:8080/api/admin/workflows -X POST --data-binary @internal-httpSR-workflow.json -H "Content-type: application/json"`` +If the store has been configured with a remote-store-name or remote-store-url, the dataset file table will include this information for remote files. 
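For orientation, configuring such a store might look like the following sketch. The store id ``trs``, the label, and the URL below are placeholder values, and the exact option names should be verified against :ref:`file-storage`:

```shell
# Hypothetical "trs" remote store; all values are placeholders.
# type=remote marks the store as reference-based; base-url constrains which
# remote URLs may be registered; base-store names an existing store (e.g. "file")
# that holds any smaller files users upload directly.
./asadmin create-jvm-options "-Ddataverse.files.trs.type=remote"
./asadmin create-jvm-options "-Ddataverse.files.trs.label=trs"
./asadmin create-jvm-options "-Ddataverse.files.trs.base-url=https://example.org/data"
./asadmin create-jvm-options "-Ddataverse.files.trs.base-store=file"
```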
These provide a visual indicator that the files are not managed directly by Dataverse and are stored/managed by a remote trusted store. -4. List available workflows: +Rather than sending the file bytes, metadata for the remote file is added using the "jsonData" parameter. +jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, etc. For remote references, the jsonData object must also include values for: -``curl http://localhost:8080/api/admin/workflows`` +* "storageIdentifier" - String, as specified in prior calls +* "fileName" - String +* "mimeType" - String +* fixity/checksum: either: -5. Set the workflow (id) as the default workflow for the appropriate trigger: + * "md5Hash" - String with MD5 hash value, or + * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings -``curl http://localhost:8080/api/admin/workflows/default/PrePublishDataset -X PUT -d 2`` +The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512 -6. Check that the trigger has the appropriate default workflow set: +(The remote store leverages the same JSON upload syntax as the last step in direct upload to S3 described in the :ref:`Adding the Uploaded file to the Dataset ` section of the :doc:`/developers/s3-direct-upload-api`.) -``curl http://localhost:8080/api/admin/workflows/default/PrePublishDataset`` +.. code-block:: bash -7. 
Add RSAL to whitelist

+    export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+    export SERVER_URL=https://demo.dataverse.org
+    export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+    export JSON_DATA='{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"trs://images/dataverse_project_logo.svg", "fileName":"dataverse_logo.svg", "mimeType":"image/svg+xml", "checksum": {"@type": "SHA-1", "@value": "123456"}}'

-8. When finished testing, unset the workflow:
-
-``curl -X DELETE http://localhost:8080/api/admin/workflows/default/PrePublishDataset``
+    curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
+
+The variant allowing multiple files to be added at once, discussed in the :doc:`/developers/s3-direct-upload-api` document, can also be used.

-Configuring download via rsync
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Considerations:

-In order to see the rsync URLs, you must run this command:
+* Remote stores are configured with a base-url which limits what files can be referenced, i.e. the absolute URL for the file is /.
+* The current store will not prevent you from providing a relative URL that results in a 404 when resolved (i.e. if you make a typo). You should check to make sure the file exists at the location you specify - by trying to download it in Dataverse, by checking to see that Dataverse was able to get the file size (which it does by doing a HEAD call to that location), or just by manually trying the URL in your browser.
+* Admins are trusting the organization managing the site/service at base-url to maintain the referenced files for as long as the Dataverse instance needs them. Formal agreements are recommended for production use.
+* For large files, direct-download should always be used with a remote store. (Otherwise the Dataverse installation will be involved in the download.)
+* For simple websites, a remote store should be marked public, which will turn off restriction and embargo functionality in Dataverse (since Dataverse cannot restrict access to the file on the remote website).
+* Remote stores can be configured with a secret-key. This key will be used to sign URLs when Dataverse retrieves the file content or redirects a user for download. If the remote service is able to validate the signature and reject invalid requests, the remote store mechanism can be used to manage restricted and embargoed files, access requests in Dataverse, etc. Dataverse contains Java code that validates these signatures, which could be used, for example, to create a validation proxy in front of a web server to allow Dataverse to manage access. The secret-key is a shared secret between Dataverse and the remote service and is not shared with/is not accessible by users or those with access to users' machines.
+* Sophisticated remote services may wish to register file URLs that do not directly reference the file contents (bytes) but instead direct the user to a website where further information about the remote service's download process can be found.
+* Due to the current design, ingest cannot be done on remote files and administrators should disable ingest when using a remote store. This can be done by setting the ingest size limit for the store to 0 and/or using the recently added option to not perform tabular ingest on upload.
+* Dataverse will normally try to access the file contents itself, i.e. for ingest (in future versions), full-text indexing, thumbnail creation, etc. This processing may not be desirable for large/sensitive data, and, for the case where the URL does not reference the file itself, would not be possible. At present, administrators should configure the relevant size limits to avoid such actions.
+* The current implementation of remote stores is experimental in the sense that future work to enhance it is planned.
This work may result in changes to how the store works and lead to additional work when upgrading for sites that start using this mechanism now.

-``curl -X PUT -d 'rsal/rsync' http://localhost:8080/api/admin/settings/:DownloadMethods``
+To configure the options mentioned above, an administrator must set the following JVM options for the Dataverse installation using the same process as for other configuration options:

-.. TODO: Document these in the Installation Guide once they're final.
+``./asadmin create-jvm-options "-Ddataverse.files..download-redirect=true"``
+``./asadmin create-jvm-options "-Ddataverse.files..secret-key=somelongrandomalphanumerickeythelongerthebetter123456"``
+``./asadmin create-jvm-options "-Ddataverse.files..public=true"``
+``./asadmin create-jvm-options "-Ddataverse.files..ingestsizelimit="``

-To specify replication sites that appear in rsync URLs:
+.. _globus-support:

-Download :download:`add-storage-site.json <../../../../scripts/api/data/storageSites/add-storage-site.json>` and adjust it to meet your needs. The file should look something like this:
+Globus File Transfer
+--------------------

-.. literalinclude:: ../../../../scripts/api/data/storageSites/add-storage-site.json
+Note: Globus file transfer is still experimental but feedback is welcome! See :ref:`support`.

-Then add the storage site using curl:
+Users can transfer files via `Globus `_ into and out of datasets, or reference files on a remote Globus endpoint, when their Dataverse installation is configured to use a Globus accessible store(s)
+and a community-developed `dataverse-globus `_ app has been properly installed and configured.

-``curl -H "Content-type:application/json" -X POST http://localhost:8080/api/admin/storageSites --upload-file add-storage-site.json``
+Globus endpoints can be in a variety of places, from data centers to personal computers.
+This means that from within the Dataverse software, a Globus transfer can feel like an upload or a download (with Globus Connect Personal running on your laptop, for example) or it can feel like a true transfer from one server to another (from a cluster in a data center into a Dataverse dataset or vice versa).

-You make a storage site the primary site by passing "true". Pass "false" to make it not the primary site. (id "1" in the example):
+Globus transfer uses an efficient transfer mechanism and has additional features that make it suitable for large files and large numbers of files:

-``curl -X PUT -d true http://localhost:8080/api/admin/storageSites/1/primaryStorage``
+* robust file transfer capable of restarting after network or endpoint failures
+* third-party transfer, which enables a user accessing a Dataverse installation in their desktop browser to initiate transfer of their files from a remote endpoint (i.e. on a local high-performance computing cluster), directly to an S3 store managed by the Dataverse installation

-You can delete a storage site like this (id "1" in the example):
+Note: Due to differences in the access control models of a Dataverse installation and Globus and the current Globus store model, Dataverse cannot enforce per-file-access restrictions.
+It is therefore recommended that a store be configured as public, which disables the ability to restrict and embargo files in that store, when Globus access is allowed.

-``curl -X DELETE http://localhost:8080/api/admin/storageSites/1``
+Dataverse supports three options for using Globus, two involving transfer to Dataverse-managed endpoints and one allowing Dataverse to reference files on remote endpoints.
+Dataverse-managed endpoints must be Globus 'guest collections' hosted on either a file-system-based endpoint or an S3-based endpoint (the latter requires use of the Globus
+S3 connector, which requires a paid Globus subscription at the host institution).
In either case, Dataverse is configured with the Globus credentials of a user account that can manage the endpoint. +Users will need a Globus account, which can be obtained via their institution or directly from Globus (at no cost). -You can view a single storage site like this: (id "1" in the example): +With the file-system endpoint, Dataverse does not currently have access to the file contents. Thus, functionality related to ingest, previews, fixity hash validation, etc., is not available. (Using the S3-based endpoint, Dataverse has access via S3 and all functionality normally associated with direct uploads to S3 is available.) -``curl http://localhost:8080/api/admin/storageSites/1`` +For the reference use case, Dataverse must be configured with a list of allowed endpoint/base paths from which files may be referenced. In this case, since Dataverse is not accessing the remote endpoint itself, it does not need Globus credentials. +Users will need a Globus account in this case, and the remote endpoint must be configured to allow them access (e.g., be publicly readable, or involve some out-of-band mechanism to request access that could be described in the dataset's Terms of Use and Access). -You can view all storage site like this: +All of Dataverse's Globus capabilities are now store-based (see the store documentation) and therefore different collections/datasets can be configured to use different Globus-capable stores (or normal file or S3 stores, etc.). -``curl http://localhost:8080/api/admin/storageSites`` +More details of the setup required to enable Globus are described in the `Community Dataverse-Globus Setup and Configuration document `_ and the references therein. -In the GUI, this is called "Local Access". It's where you can compute on files on your cluster. +As described in that document, Globus transfers can be initiated by choosing the Globus option in the dataset upload panel.
(Globus, which does asynchronous transfers, is not available during dataset creation.) Analogously, "Globus Transfer" is one of the download options in the "Access Dataset" menu and optionally the file landing page download menu (if/when supported in the dataverse-globus app). -``curl http://localhost:8080/api/admin/settings/:LocalDataAccessPath -X PUT -d "/programs/datagrid"`` +An overview of the control and data transfer interactions between components was presented at the 2022 Dataverse Community Meeting and can be viewed in the `Integrations and Tools Session Video `_ around the 1 hr 28 min mark. +See also :ref:`Globus settings <:GlobusSettings>`. +An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`. diff --git a/doc/sphinx-guides/source/developers/classic-dev-env.rst b/doc/sphinx-guides/source/developers/classic-dev-env.rst new file mode 100755 index 00000000000..d1f54fd9d5f --- /dev/null +++ b/doc/sphinx-guides/source/developers/classic-dev-env.rst @@ -0,0 +1,232 @@ +======================= +Classic Dev Environment +======================= + +These are the old instructions we used for Dataverse 4 and 5. They should still work but these days we favor running Dataverse in Docker as described in :doc:`dev-environment`. + +These instructions are purposefully opinionated and terse to help you get your development environment up and running as quickly as possible! Please note that familiarity with running commands from the terminal is assumed. + +.. 
contents:: |toctitle| + :local: + +Quick Start (Docker) +-------------------- + +The quickest way to get Dataverse running is in Docker as explained in :doc:`../container/dev-usage` section of the Container Guide. + + +Classic Dev Environment +----------------------- + +Since before Docker existed, we have encouraged installing Dataverse and all its dependencies directly on your development machine, as described below. This can be thought of as the "classic" development environment for Dataverse. + +However, in 2023 we decided that we'd like to encourage all developers to start using Docker instead and opened https://github.com/IQSS/dataverse/issues/9616 to indicate that we plan to rewrite this page to recommend the use of Docker. + +There's nothing wrong with the classic instructions below and we don't plan to simply delete them. They are a valid alternative to running Dataverse in Docker. We will likely move them to another page. + +Set Up Dependencies +------------------- + +Supported Operating Systems +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Mac OS X or Linux is required because the setup scripts assume the presence of standard Unix utilities. + +Windows is gaining support through Docker as described in the :doc:`windows` section. + +Install Java +~~~~~~~~~~~~ + +The Dataverse Software requires Java 17. + +We suggest downloading OpenJDK from https://adoptopenjdk.net + +On Linux, you are welcome to use the OpenJDK available from package managers. + +Install Netbeans or Maven +~~~~~~~~~~~~~~~~~~~~~~~~~ + +NetBeans IDE is recommended, and can be downloaded from https://netbeans.org . Developers may use any editor or IDE. We recommend NetBeans because it is free, works cross platform, has good support for Jakarta EE projects, and includes a required build tool, Maven. + +Below we describe how to build the Dataverse Software war file with Netbeans but if you prefer to use only Maven, you can find installation instructions in the :doc:`tools` section. 
+ +Install Homebrew (Mac Only) +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +On Mac, install Homebrew to simplify the steps below: https://brew.sh + +Clone the Dataverse Software Git Repo +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Fork https://github.com/IQSS/dataverse and then clone your fork like this: + +``git clone git@github.com:[YOUR GITHUB USERNAME]/dataverse.git`` + +Build the Dataverse Software War File +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you installed Netbeans, follow these steps: + +- Launch Netbeans and click "File" and then "Open Project". Navigate to where you put the Dataverse Software code and double-click "Dataverse" to open the project. +- If you see "resolve project problems," go ahead and let Netbeans try to resolve them. This will probably include downloading dependencies, which can take a while. +- Allow Netbeans to install nb-javac (required for Java 8 and below). +- Select "Dataverse" under Projects and click "Run" in the menu and then "Build Project (Dataverse)". Check back for "BUILD SUCCESS" at the end. + +If you installed Maven instead of Netbeans, run ``mvn package``. Check for "BUILD SUCCESS" at the end. + +NOTE: Do you use a locale different than ``en_US.UTF-8`` on your development machine? Are you in a different timezone +than Harvard (Eastern Time)? You might experience issues while running tests that were written with these settings +in mind. The Maven ``pom.xml`` tries to handle this for you by setting the locale to ``en_US.UTF-8`` and timezone +``UTC``, but other, not-yet-discovered build or test problems might lurk in the shadows. + +Install jq +~~~~~~~~~~ + +On Mac, run this command: + +``brew install jq`` + +On Linux, install ``jq`` from your package manager or download a binary from https://stedolan.github.io/jq/ + +.. _install-payara-dev: + +Install Payara +~~~~~~~~~~~~~~ + +Payara 6.2025.3 or higher is required.
+ +To install Payara, run the following commands: + +``cd /usr/local`` + +``sudo curl -O -L https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2025.3/payara-6.2025.3.zip`` + +``sudo unzip payara-6.2025.3.zip`` + +``sudo chown -R $USER /usr/local/payara6`` + +If nexus.payara.fish is ever down for maintenance, Payara distributions are also available from https://repo1.maven.org/maven2/fish/payara/distributions/payara/ + +Install Service Dependencies Directly on localhost +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Install PostgreSQL +^^^^^^^^^^^^^^^^^^ + +The Dataverse Software has been tested with PostgreSQL versions up to 17. PostgreSQL version 10+ is required. + +On Mac, go to https://www.postgresql.org/download/macosx/ and choose the "Interactive installer by EDB" option. Note that version 16 is used in the command line examples below, but the process should be similar for other versions. When prompted to set a password for the "database superuser (postgres)" just enter "password". + +After installation is complete, make a backup of the ``pg_hba.conf`` file like this: + +``sudo cp /Library/PostgreSQL/16/data/pg_hba.conf /Library/PostgreSQL/16/data/pg_hba.conf.orig`` + +Then edit ``pg_hba.conf`` with an editor such as vi: + +``sudo vi /Library/PostgreSQL/16/data/pg_hba.conf`` + +In the "METHOD" column, change all instances of "scram-sha-256" (or whatever is in that column) to "trust". This will make it so PostgreSQL doesn't require a password. + +In the Finder, click "Applications" then "PostgreSQL 16" and launch the "Reload Configuration" app. Click "OK" after you see "server signaled". + +Next, to confirm the edit worked, launch the "pgAdmin" application from the same folder. Under "Browser", expand "Servers" and double click "PostgreSQL 16". When you are prompted for a password, leave it blank and click "OK". If you have successfully edited "pg_hba.conf", you can get in without a password.
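If you prefer to script the edit instead of using vi, the same METHOD change can be made with ``sed``. A sketch, demonstrated on a throwaway copy rather than the live config (the real path and the original method vary by PostgreSQL version):

```shell
# work on a throwaway copy; the real file is e.g. /Library/PostgreSQL/16/data/pg_hba.conf
printf 'host all all 127.0.0.1/32 scram-sha-256\nhost all all ::1/128 scram-sha-256\n' > /tmp/pg_hba.conf

# change every authentication METHOD to "trust" (keeps a .orig backup alongside)
sed -i.orig 's/scram-sha-256/trust/g' /tmp/pg_hba.conf

cat /tmp/pg_hba.conf
```

Remember to reload the PostgreSQL configuration afterwards, as described above.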
+ +On Linux, you should just install PostgreSQL using your favorite package manager, such as ``yum``. (Consult the PostgreSQL section of :doc:`/installation/prerequisites` in the main Installation guide for more info and command line examples). Find ``pg_hba.conf`` and set the authentication method to "trust" and restart PostgreSQL. + +Install Solr +^^^^^^^^^^^^ + +`Solr `_ 9.8.0 is required. + +Follow the instructions in the "Installing Solr" section of :doc:`/installation/prerequisites` in the main Installation guide. + +Install Service Dependencies Using Docker Compose +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +To avoid having to install service dependencies like PostgreSQL or Solr directly on your localhost, there is the alternative of using the ``docker-compose-dev.yml`` file available in the repository root. For this option you need to have Docker and Docker Compose installed on your machine. + +The ``docker-compose-dev.yml`` can be configured to only run the service dependencies necessary to support a Dataverse installation running directly on localhost. In addition to PostgreSQL and Solr, it also runs an SMTP server. + +Before running the Docker Compose file, you need to update the value of the ``DATAVERSE_DB_USER`` environment variable to ``postgres``. The variable can be found inside the ``.env`` file in the repository root. This step is required as the Dataverse installation script expects that database user. + +To run the Docker Compose file, go to the Dataverse repository root, then run: + +``docker-compose -f docker-compose-dev.yml up -d --scale dev_dataverse=0`` + +Note that this command omits the Dataverse container defined in the Docker Compose file, since Dataverse is going to be installed directly on localhost in the next section. + +The command runs the containers in detached mode, but if you want to run them attached and thus view container logs in real time, remove the ``-d`` option from the command.
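The ``DATAVERSE_DB_USER`` change described above can also be scripted. A sketch, shown on a throwaway copy of ``.env`` (the real file sits in the repository root):

```shell
# throwaway copy standing in for the repository-root .env
printf 'APP_IMAGE=gdcc/dataverse:unstable\nDATAVERSE_DB_USER=dataverse\nSOLR_VERSION=9.8.0\n' > /tmp/dotenv

# the installation script expects the "postgres" database user (keeps a .orig backup)
sed -i.orig 's/^DATAVERSE_DB_USER=.*/DATAVERSE_DB_USER=postgres/' /tmp/dotenv

grep '^DATAVERSE_DB_USER=' /tmp/dotenv
```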
+ +Data volumes of each dependency will be persisted inside the ``docker-dev-volumes`` folder, inside the repository root. + +If you want to stop the containers, then run (for detached mode only, otherwise use ``Ctrl + C``): + +``docker-compose -f docker-compose-dev.yml stop`` + +If you want to remove the containers, then run: + +``docker-compose -f docker-compose-dev.yml down`` + +If you want to run a single container (the mail server, for example) then run: + +``docker-compose -f docker-compose-dev.yml up dev_smtp`` + +For a fresh installation, and before running the Software Installer Script, it is recommended to delete the ``docker-dev-volumes`` folder to avoid installation problems due to existing data in the containers. + +Run the Dataverse Software Installer Script +------------------------------------------- + +Navigate to the directory where you cloned the Dataverse Software git repo, then change directories to the ``scripts/installer`` directory like this: + +``cd scripts/installer`` + +Create a Python virtual environment, activate it, then install dependencies: + +``python3 -m venv venv`` + +``source venv/bin/activate`` + +``pip install psycopg2-binary`` + +The installer will try to connect to the SMTP server you tell it to use. If you haven't used the Docker Compose option for setting up the dependencies, or you don't have a mail server handy, you can run ``nc -l 25`` in another terminal and choose "localhost" (the default) to get past this check.
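The virtual environment steps above can be rolled into a single snippet. A sketch using a throwaway location (``/tmp/dv-venv``) instead of ``scripts/installer/venv``:

```shell
# one-shot setup of the Python virtual environment used by the installer
# (demo path /tmp/dv-venv; in practice run "python3 -m venv venv" inside scripts/installer)
python3 -m venv /tmp/dv-venv
. /tmp/dv-venv/bin/activate

# inside the venv, "python3" and "pip" now point at the venv's copies;
# next you would run: pip install psycopg2-binary
command -v python3
```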
+ +Finally, run the installer (see also :download:`README_python.txt <../../../../scripts/installer/README_python.txt>` if necessary): + +``python3 install.py`` + +Verify the Dataverse Software is Running +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +After the script has finished, you should be able to log into your Dataverse installation with the following credentials: + +- http://localhost:8080 +- username: dataverseAdmin +- password: admin + +Configure Your Development Environment for Publishing +----------------------------------------------------- + +Run the following command: + +``curl http://localhost:8080/api/admin/settings/:DoiProvider -X PUT -d FAKE`` + +This will disable DOI registration by using a fake (in-code) DOI provider. Please note that this feature is only available in Dataverse Software 4.10+ and that at present, the UI will give no indication that the DOIs thus minted are fake. + +Developers may also wish to consider using :ref:`PermaLinks ` + +Configure Your Development Environment for GUI Edits +---------------------------------------------------- + +Out of the box, a JSF setting is configured for production use and prevents edits to the GUI (xhtml files) from being visible unless you do a full deployment. + +It is recommended that you run the following command so that simply saving the xhtml file in Netbeans is enough for the change to show up. + +``asadmin create-system-properties "dataverse.jsf.refresh-period=1"`` + +For more on JSF settings like this, see :ref:`jsf-config`. + +Next Steps +---------- + +If you can log in to the Dataverse installation, great! If not, please see the :doc:`troubleshooting` section. For further assistance, please see "Getting Help" in the :doc:`intro` section. + +You're almost ready to start hacking on code. Now that the installer script has you up and running, you need to continue on to the :doc:`tips` section to get set up to deploy code from your IDE or the command line. 
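As a quick smoke test of the setup described above, you can hit the API endpoints mentioned in this page. A sketch; it assumes the installation from the previous steps is running on localhost:8080:

```shell
# should return a JSON document containing the deployed version
curl -s http://localhost:8080/api/info/version

# should show the FAKE DOI provider set earlier
curl -s http://localhost:8080/api/admin/settings/:DoiProvider
```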
diff --git a/doc/sphinx-guides/source/developers/coding-style.rst b/doc/sphinx-guides/source/developers/coding-style.rst index 1ae11a2f4ca..2a1c0d5d232 100755 --- a/doc/sphinx-guides/source/developers/coding-style.rst +++ b/doc/sphinx-guides/source/developers/coding-style.rst @@ -2,7 +2,7 @@ Coding Style ============ -Like all development teams, the `Dataverse developers at IQSS `_ have their habits and styles when it comes to writing code. Let's attempt to get on the same page. :) +Like all development teams, the `Dataverse Project developers at IQSS `_ have their habits and styles when it comes to writing code. Let's attempt to get on the same page. :) .. contents:: |toctitle| :local: @@ -18,6 +18,11 @@ Tabs vs. Spaces Don't use tabs. Use 4 spaces. +Imports +^^^^^^^ + +Wildcard imports are neither encouraged nor discouraged. + Braces Placement ^^^^^^^^^^^^^^^^ @@ -57,12 +62,12 @@ Place curly braces according to the style below, which is an example you can see Format Code You Changed with Netbeans ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -As you probably gathered from the :doc:`dev-environment` section, IQSS has standardized on Netbeans. It is much appreciated when you format your code (but only the code you touched!) using the out-of-the-box Netbeans configuration. If you have created an entirely new Java class, you can just click Source -> Format. If you are adjusting code in an existing class, highlight the code you changed and then click Source -> Format. Keeping the "diff" in your pull requests small makes them easier to code review. +IQSS has standardized on Netbeans. It is much appreciated when you format your code (but only the code you touched!) using the out-of-the-box Netbeans configuration. If you have created an entirely new Java class, you can just click Source -> Format. If you are adjusting code in an existing class, highlight the code you changed and then click Source -> Format. 
Keeping the "diff" in your pull requests small makes them easier to code review. Checking Your Formatting With Checkstyle ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The easiest way to adopt Dataverse coding style is to use Netbeans as your IDE, avoid change the default Netbeans formatting settings, and only reformat code you've changed, as described above. +The easiest way to adopt the Dataverse Project coding style is to use Netbeans as your IDE, avoid changing the default Netbeans formatting settings, and only reformat code you've changed, as described above. If you do not use Netbeans, you are encouraged to check the formatting of your code using Checkstyle. @@ -85,7 +90,7 @@ Use this ``logger`` field with varying levels such as ``fine`` or ``info`` like logger.fine("will get thumbnail from dataset logo"); -Generally speaking you should use ``fine`` for everything that you don't want to show up in Glassfish's ``server.log`` file by default. If you use a higher level such as ``info`` for common operations, you will probably hear complaints that your code is too "chatty" in the logs. These logging levels can be controlled at runtime both on your development machine and in production as explained in the :doc:`debugging` section. +Generally speaking you should use ``fine`` for everything that you don't want to show up by default in the app server's log file. If you use a higher level such as ``info`` for common operations, you will probably hear complaints that your code is too "chatty" in the logs. These logging levels can be controlled at runtime both on your development machine and in production as explained in the :doc:`debugging` section. When adding logging, do not simply add ``System.out.println()`` lines because the logging level cannot be controlled. @@ -97,7 +102,7 @@ Special strings should be defined as public constants.
For example, ``DatasetFie Avoid Hard-Coding User-Facing Messaging in English ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -There is an ongoing effort to translate Dataverse into various languages. Look for "lang" or "languages" in the :doc:`/installation/config` section of the Installation Guide for details if you'd like to help or play around with this feature. +There is an ongoing effort to translate the Dataverse Software into various languages. Look for "lang" or "languages" in the :doc:`/installation/config` section of the Installation Guide for details if you'd like to help or play around with this feature. The translation effort is hampered if you hard code user-facing messages in English in the Java code. Put English strings in ``Bundle.properties`` and use ``BundleUtil`` to pull them out. This is especially important for messages that appear in the UI. We are aware that the API has many, many hard coded English strings in it. If you touch a method in the API and notice English strings, you are strongly encouraged to use that opportunity to move the English to ``Bundle.properties``. @@ -131,7 +136,3 @@ Bike Shedding What color should the `bike shed `_ be? :) Come debate with us about coding style in this Google doc that has public comments enabled: https://docs.google.com/document/d/1KTd3FpM1BI3HlBofaZjMmBiQEJtFf11jiiGpQeJzy7A/edit?usp=sharing - ----- - -Previous: :doc:`debugging` | Next: :doc:`deployment` diff --git a/doc/sphinx-guides/source/developers/configuration.rst b/doc/sphinx-guides/source/developers/configuration.rst new file mode 100644 index 00000000000..d342c28efc6 --- /dev/null +++ b/doc/sphinx-guides/source/developers/configuration.rst @@ -0,0 +1,126 @@ +Consuming Configuration +======================= + +.. contents:: |toctitle| + :local: + +The Dataverse Software uses different types of configuration: + +1. JVM system properties +2. Simple database value settings +3. 
Complex database stored data structures + +1 and 2 are usually simple text strings, boolean switches, or numbers. All of those can be found in :doc:`/installation/config`. + +Anything for 3 is configured via the API using either TSV or JSON structures. Examples are metadata blocks, +authentication providers, harvesters and others. + +Simple Configuration Options +---------------------------- + +Developers can access simple properties via: + +1. ``JvmSettings..lookup(...)`` for JVM system property settings. +2. ``SettingsServiceBean.get(...)`` for database settings. +3. ``SystemConfig.xxx()`` for specially treated settings, possibly mixed from 1 and 2 and other sources. +4. ``SettingsWrapper`` for use in frontend JSF (xhtml) pages to obtain settings from 2 and 3. Using the wrapper is a must for performance as explained in :ref:`avoid common efficiency issues with JSF render logic expressions + `. +5. ``System.getProperty()`` only for very special use cases not covered by ``JvmSettings``. + +As of Dataverse Software 5.3, we have started streamlining our efforts into a more consistent approach, also bringing joy and +happiness to all the system administrators out there. This will be done by adopting the use of +`MicroProfile Config `_ over time. + +So far we have streamlined configuration of these Dataverse Software parts: + +- ✅ Database Connection + +Complex Configuration Options +----------------------------- + +We should enable variable substitution in JSON configuration. Example: using substitution to retrieve values from +MicroProfile Config and insert them into the authentication provider would allow much easier provisioning of secrets +into the providers. + +Why should I care about MicroProfile Config API? +------------------------------------------------ + +Developers benefit from: + +- A streamlined API to retrieve configuration, backward-compatible renaming strategies and easier testbed configurations.
+- Config API is also pushing for validation of configuration, as it's typesafe and converters for non-standard types + can be added within our codebase. +- Defaults in code or bundled in ``META-INF/microprofile-config.properties`` allow for optional values without much hassle. +- A single place to look up any existing JVM setting in code, easier to keep in sync with the documentation. + +System administrators benefit from: + +- Lots of database settings have been introduced in the past, but should be more easily configurable and not rely on a + database connection. +- Running a Dataverse installation in containers gets much easier when configuration can be provisioned in a + streamlined fashion, mitigating the need for scripting glue and distinguishing between setting types. +- Classic installations benefit, too: we can enable using a single config file, e.g. living in + ``/etc/dataverse/config.properties`` by adding our own, hot-reload config source. +- Features for monitoring resources and others are easier to use with this streamlined configuration, as we can + avoid people having to deal with ``asadmin`` commands and change a setting comfortably instead. + +Adopting MicroProfile Config API +--------------------------------- + +This technology is introduced on a step-by-step basis. There will not be one big, disruptive change that breaks upgrades for everyone. +Instead, we will provide backward compatibility by deprecating renamed or moved config options, while still +supporting the old way of setting them. + +- Introducing a new setting or moving an old one should result in a scoped key + ``dataverse..``. That way we enable sys admins to recognize the meaning of an option + and avoid name conflicts. + Starting with ``dataverse`` makes it perfectly clear that this is a setting meant for this application, which is + important when using environment variables, system properties or other MPCONFIG sources.
+- Replace ``System.getProperty()`` calls with ``JvmSettings..lookup(...)``, adding the setting there first. + This might be paired with renaming and providing backward-compatible aliases. +- Database settings need to be refactored in multiple steps and it is not yet clear how this will be done. + Many database settings are of a very static nature and might be moved to JVM settings (in backward compatible ways). + +Adding a JVM Setting +^^^^^^^^^^^^^^^^^^^^ + +Whenever a new option gets added or an existing configuration gets migrated to +``edu.harvard.iq.dataverse.settings.JvmSettings``, you will attach the setting to an existing scope or create new +sub-scopes first. + +- Scopes and settings are organised in a tree-like structure within a single enum ``JvmSettings``. +- The root scope is "dataverse". +- All sub-scopes are below that. +- Scopes are separated by dots (periods). +- A scope may be a placeholder, filled with a variable during lookup. (Named object mapping.) +- The setting should be in kebab case (``signing-secret``) rather than camel case (``signingSecret``). + +Any consumer of the setting can choose to use one of the fluent ``lookup()`` methods, which hides away alias handling, +conversion, etc., from consuming code. See also the detailed Javadoc for these methods. + +Moving or Replacing a JVM Setting +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When moving an old key to a new one (especially when doing so with a former JVM system property setting), you should +add an alias to the ``JvmSettings`` definition to enable backward compatibility. Old names given there are capable of +being used with patterned lookups. + +Another option is to add the alias in ``src/main/resources/META-INF/microprofile-aliases.properties``. The format is +always like ``dataverse..newname...=old.property.name``. Note this doesn't provide support for patterned +aliases.
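For illustration, a hypothetical entry in ``microprofile-aliases.properties`` following that format might look like this (both names are made up for the example):

```properties
# new scoped key on the left, old property name on the right
dataverse.example.signing-secret=dataverse.old.signingSecret
```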
+ +Details can be found in ``edu.harvard.iq.dataverse.settings.source.AliasConfigSource`` + +Adding a Feature Flag +^^^^^^^^^^^^^^^^^^^^^ + +Some parts of our codebase might be opt-in only. Experimental or optional feature previews can be switched on using our +usual configuration mechanism, a JVM setting. + +Feature flags are implemented in the enumeration ``edu.harvard.iq.dataverse.settings.FeatureFlags``, which allows for +convenient usage of it anywhere in the codebase. When adding a flag, please add it to the enum, think of a default +status, add some Javadocs about the flagged feature and add a ``@since`` tag to make it easier to identify when a flag +has been introduced. + +We want to maintain a list of all :ref:`feature flags ` in the :ref:`configuration guide `, +please add yours to the list. \ No newline at end of file diff --git a/doc/sphinx-guides/source/developers/containers.rst b/doc/sphinx-guides/source/developers/containers.rst index 31d13b38314..ed477ccefea 100755 --- a/doc/sphinx-guides/source/developers/containers.rst +++ b/doc/sphinx-guides/source/developers/containers.rst @@ -1,412 +1,31 @@ -================================= -Docker, Kubernetes, and OpenShift -================================= +================================== +Docker, Kubernetes, and Containers +================================== -Dataverse is exploring the use of Docker, Kubernetes, OpenShift and other container-related technologies. +The Dataverse community is exploring the use of Docker, Kubernetes, and other container-related technologies. .. contents:: |toctitle| :local: -OpenShift ---------- +Container Guide +--------------- -From the Dataverse perspective, we are in the business of providing a "template" for OpenShift that describes how the various components we build our application on (Glassfish, PostgreSQL, Solr, the Dataverse war file itself, etc.) work together. 
We publish Docker images to DockerHub at https://hub.docker.com/u/iqss/ that are used in this OpenShift template. +We recommend starting with the :doc:`/container/index`. The core Dataverse development team, with lots of help from the community, is iterating on containerizing the Dataverse software and its dependencies there. -Dataverse's (light) use of Docker is documented below in a separate section. We actually started with Docker in the context of OpenShift, which is why OpenShift is listed first but we can imagine rearranging this in the future. +Help Containerize Dataverse +--------------------------- -The OpenShift template for Dataverse can be found at ``conf/openshift/openshift.json`` and if you need to hack on the template or related files under ``conf/docker`` it is recommended that you iterate on them using Minishift. +If you would like to contribute to the containerization effort, please consider joining the `Containerization Working Group `_. -The instructions below will walk you through spinning up Dataverse within Minishift. It is recommended that you do this on the "develop" branch to make sure everything is working before changing anything. +Community-Led Projects +----------------------- -Install Minishift -~~~~~~~~~~~~~~~~~ +The primary community-led projects (which the core team is drawing inspiration from!) are: -Minishift requires a hypervisor and since we already use VirtualBox for Vagrant, you should install VirtualBox from http://virtualbox.org . +- https://github.com/IQSS/dataverse-docker +- https://github.com/IQSS/dataverse-kubernetes (especially the https://github.com/EOSC-synergy/dataverse-kubernetes fork) -Download the Minishift tarball from https://docs.openshift.org/latest/minishift/getting-started/installing.html and put the ``minishift`` binary in ``/usr/local/bin`` or somewhere in your ``$PATH``. This assumes Mac or Linux. These instructions were last tested on version ``v1.14.0+1ec5877`` of Minishift. 
+Using Containers for Reproducible Research +------------------------------------------ -At this point, you might want to consider going through the Minishift quickstart to get oriented: https://docs.openshift.org/latest/minishift/getting-started/quickstart.html - -Start Minishift -~~~~~~~~~~~~~~~ - -``minishift start --vm-driver=virtualbox --memory=8GB`` - -Make the OpenShift Client Binary (oc) Executable -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -``eval $(minishift oc-env)`` - -Log in to Minishift from the Command Line -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Note that if you just installed and started Minishift, you are probably logged in already. This ``oc login`` step is included in case you aren't logged in anymore. - -``oc login --username developer --password=whatever`` - -Use "developer" as the username and a couple characters as the password. - -Create a Minishift Project -~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Calling the project "project1" is fairly arbitrary. We'll probably want to revisit this name in the future. A project is necessary in order to create an OpenShift app. - -``oc new-project project1`` - -Note that ``oc projects`` will return a list of projects. - -Create a Dataverse App within the Minishift Project -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The following command operates on the ``conf/openshift/openshift.json`` file that resides in the main Dataverse git repo. It will download images from Docker Hub and use them to spin up Dataverse within Minishift/OpenShift. Later we will cover how to make changes to the images on Docker Hub. - -``oc new-app conf/openshift/openshift.json`` - -Log into Minishift and Visit Dataverse in your Browser -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -After running the ``oc new-app`` command above, deployment of Dataverse within Minishift/OpenShift will begin. You should log into the OpenShift web interface to check on the status of the deployment. 
If you just created the Minishift VM with the ``minishift start`` command above, the ``oc new-app`` step is expected to take a while because the images need to be downloaded from Docker Hub. Also, the installation of Dataverse takes a while. - -Typing ``minishift console`` should open the OpenShift web interface in your browser. The IP address might not be "192.168.99.100" but it's used below as an example. - -- https://192.168.99.100:8443 (or URL from ``minishift console``) -- username: developer -- password: - -In the OpenShift web interface you should see a link that looks something like http://dataverse-project1.192.168.99.100.nip.io but the IP address will vary and will match the output of ``minishift ip``. Eventually, after deployment is complete, the Dataverse web interface will appear at this URL and you will be able to log in with the username "dataverseAdmin" and the password "admin". - -Another way to verify that Dataverse has been succesfully deployed is to make sure that the Dataverse "info" API endpoint returns a version (note that ``minishift ip`` is used because the IP address will vary): - -``curl http://dataverse-project1.`minishift ip`.nip.io/api/info/version`` - -From the perspective of OpenShift and the ``openshift.json`` config file, the HTTP link to Dataverse in called a route. See also documentation for ``oc expose``. - -Troubleshooting -~~~~~~~~~~~~~~~ - -Here are some tips on troubleshooting your deployment of Dataverse to Minishift. - -Check Status of Dataverse Deployment to Minishift -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -``oc status`` - -Once images have been downloaded from Docker Hub, the output below will change from ``Pulling`` to ``Pulled``. - -``oc get events | grep Pull`` - -This is a deep dive: - -``oc get all`` - -Review Logs of Dataverse Deployment to Minishift -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Logs are provided in the web interface to each of the deployment configurations. 
The URLs should be something like this (but the IP address) will vary and you should click "View Log". The installation of Dataverse is done within the one Glassfish deployment configuration: - -- https://192.168.99.100:8443/console/project/project1/browse/dc/dataverse-glassfish -- https://192.168.99.100:8443/console/project/project1/browse/dc/dataverse-postgresql -- https://192.168.99.100:8443/console/project/project1/browse/dc/dataverse-solr - -You can also see logs from each of the components (Glassfish, PostgreSQL, and Solr) from the command line with ``oc logs`` like this (just change the ``grep`` at the end): - -``oc logs $(oc get po -o json | jq '.items[] | select(.kind=="Pod").metadata.name' -r | grep glassfish)`` - -Get a Shell (ssh/rsh) on Containers Deployed to Minishift -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -You can get a shell on any of the containers for each of the components (Glassfish, PostgreSQL, and Solr) with ``oc rc`` (just change the ``grep`` at the end): - -``oc rsh $(oc get po -o json | jq '.items[] | select(.kind=="Pod").metadata.name' -r | grep glassfish)`` - -From the ``rsh`` prompt of the Glassfish container you could run something like the following to make sure that Dataverse is running on port 8080: - -``curl http://localhost:8080/api/info/version`` - -Cleaning up -~~~~~~~~~~~ - -If you simply wanted to try out Dataverse on Minishift and want to clean up, you can run ``oc delete project project1`` to delete the project or ``minishift stop`` and ``minishift delete`` to delete the entire Minishift VM and all the Docker containers inside it. - -Making Changes -~~~~~~~~~~~~~~ - -Making Changes to Docker Images -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -If you're interested in using Minishift for development and want to change the Dataverse code, you will need to get set up to create Docker images based on your changes and make them available within Minishift. 
- -It is recommended to add experimental images to Minishift's internal registry. Note that despite what https://docs.openshift.org/latest/minishift/openshift/openshift-docker-registry.html says you will not use ``docker push`` because we have seen "unauthorized: authentication required” when trying to push to it as reported at https://github.com/minishift/minishift/issues/817 . Rather you will run ``docker build`` and run ``docker images`` to see that your newly build images are listed in Minishift's internal registry. - -First, set the Docker environment variables so that ``docker build`` and ``docker images`` refer to the internal Minishift registry rather than your normal Docker setup: - -``eval $(minishift docker-env)`` - -When you're ready to build, change to the right directory: - -``cd conf/docker`` - -And then run the build script in "internal" mode: - -``./build.sh internal`` - -Note that ``conf/openshift/openshift.json`` must not have ``imagePullPolicy`` set to ``Always`` or it will pull from "iqss" on Docker Hub. Changing it to ``IfNotPresent`` allow Minishift to use the images shown from ``docker images`` rather than the ones on Docker Hub. - -Using Minishift for day to day Dataverse development might be something we want to investigate in the future. 
These blog posts talk about developing Java applications using Minishift/OpenShift: - -- https://blog.openshift.com/fast-iterative-java-development-on-openshift-kubernetes-using-rsync/ -- https://blog.openshift.com/debugging-java-applications-on-openshift-kubernetes/ - -Making Changes to the OpenShift Config -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -If you are interested in changing the OpenShift config file for Dataverse at ``conf/openshift/openshift.json`` note that in many cases once you have Dataverse running in Minishift you can use ``oc process`` and ``oc apply`` like this (but please note that some errors and warnings are expected): - -``oc process -f conf/openshift/openshift.json | oc apply -f -`` - -The slower way to iterate on the ``openshift.json`` file is to delete the project and re-create it. - -Making Changes to the PostgreSQL Database from the Glassfish Pod -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -You can access and modify the PostgreSQL database via an interactive terminal called psql. - -To log in to psql from the command line of the Glassfish pod, type the following command: - -``PGPASSWORD=$POSTGRES_PASSWORD; export PGPASSWORD; /usr/bin/psql -h $POSTGRES_SERVER.$POSTGRES_SERVICE_HOST -U $POSTGRES_USER -d $POSTGRES_DATABASE`` - -To log in as an admin, type this command instead: - -``PGPASSWORD=$POSTGRESQL_ADMIN_PASSWORD; export PGPASSWORD; /usr/bin/psql -h $POSTGRES_SERVER.$POSTGRES_SERVICE_HOST -U postgres -d $POSTGRES_DATABASE`` - -Scaling Dataverse by Increasing Replicas in a StatefulSet -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Glassfish, Solr and PostgreSQL Pods are in a "StatefulSet" which is a concept from OpenShift and Kubernetes that you can read about at https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ - -As of this writing, the ``openshift.json`` file has a single "replica" for each of these two stateful sets. 
It's possible to increase the number of replicas from 1 to 3, for example, with this command: - -``oc scale statefulset/dataverse-glassfish --replicas=3`` - -The command above should result in two additional Glassfish pods being spun up. The name of the pods is significant and there is special logic in the "zeroth" pod ("dataverse-glassfish-0" and "dataverse-postgresql-0"). For example, only "dataverse-glassfish-0" makes itself the dedicated timer server as explained in :doc:`/admin/timers` section of the Admin Guide. "dataverse-glassfish-1" and other higher number pods will not be configured as a timer server. - -Once you have multiple Glassfish servers you may notice bugs that will require additional configuration to fix. One such bug has to do with Dataverse logos which are stored at ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/logos`` on each of the Glassfish servers. This means that the logo will look fine when you just uploaded it because you're on the server with the logo on the local file system but when you visit that dataverse in the future and you're on a differernt Glassfish server, you will see a broken image. (You can find some discussion of this logo bug at https://github.com/IQSS/dataverse-aws/issues/10 and http://irclog.iq.harvard.edu/dataverse/2016-10-21 .) This is all "advanced" installation territory (see the :doc:`/installation/advanced` section of the Installation Guide) and OpenShift might be a good environment in which to work on some of these bugs. - -Multiple PostgreSQL servers are possible within the OpenShift environment as well and have been set up with some amount of replication. "dataverse-postgresql-0" is the master and non-zero pods are the slaves. We have just scratched the surface of this configuration but replication from master to slave seems to we working. Future work could include failover and making Dataverse smarter about utilizing multiple PostgreSQL servers for reads. 
Right now we assume Dataverse is only being used with a single PostgreSQL server and that it's the master. - -Solr supports index distribution and replication for scaling. For OpenShift use, we choose replication. It's possible to scale up Solr using the method method similar to Glassfish, as mentioned aboved -In OpenShift, the first Solr pod, dataverse-solr-0, will be the master node, and the rest will be slave nodes - - -Configuring Persistent Volumes and Solr master node recovery -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Solr requires backing up the search index to persistent storage. For our proof of concept, we configure a hostPath, which allows Solr containers to access the hosts' file system, for our Solr containers backups. To read more about OpenShift/Kubernetes' persistent volumes, please visit: https://kubernetes.io/docs/concepts/storage/persistent-volumes - -To allow containers to use a host's storage, we need to allow access to that directory first. In this example, we expose /tmp/share to the containers:: - -# mkdir /tmp/share -# chcon -R -t svirt_sandbox_file_t -# chgrp root -R /tmp/share -# oc login -u system:admin -# oc edit scc restricted # Update allowHostDirVolumePlugin to true and runAsUser type to RunAsAny - - -To add a persistent volume and persistent volume claim, in conf/docker/openshift/openshift.json, add the following to objects in openshift.json. -Here, we are using hostPath for development purposes. 
Since OpenShift supports many types of cluster storages, -if the administrator wishes to use any cluster storage like EBS, Google Cloud Storage, etc, they would have to use a different type of Persistent Storage:: - - { - "kind" : "PersistentVolume", - "apiVersion" : "v1", - "metadata":{ - "name" : "solr-index-backup", - "labels":{ - "name" : "solr-index-backup", - "type" : "local" - } - }, - "spec":{ - "capacity":{ - "storage" : "8Gi" - }, - "accessModes":[ - "ReadWriteMany", "ReadWriteOnce", "ReadOnlyMany" - ], - "hostPath": { - "path" : "/tmp/share" - } - } - }, - { - "kind" : "PersistentVolumeClaim", - "apiVersion": "v1", - "metadata": { - "name": "solr-claim" - }, - "spec": { - "accessModes": [ - "ReadWriteMany", "ReadWriteOnce", "ReadOnlyMany" - ], - "resources": { - "requests": { - "storage": "3Gi" - } - }, - "selector":{ - "matchLabels":{ - "name" : "solr-index-backup", - "type" : "local" - } - } - } - } - - -To make solr container mount the hostPath, add the following part under .spec.spec (for Solr StatefulSet):: - - { - "kind": "StatefulSet", - "apiVersion": "apps/v1beta1", - "metadata": { - "name": "dataverse-solr", - .... - - "spec": { - "serviceName" : "dataverse-solr-service", - ..... - - "spec": { - "volumes": [ - { - "name": "solr-index-backup", - "persistentVolumeClaim": { - "claimName": "solr-claim" - } - } - ], - - "containers": [ - .... - - "volumeMounts":[ - { - "mountPath" : "/var/share", - "name" : "solr-index-backup" - } - - - -Solr is now ready for backup and recovery. In order to backup:: - - oc rsh dataverse-solr-0 - curl 'http://localhost:8983/solr/collection1/replication?command=backup&location=/var/share' - - -In solr entrypoint.sh, it's configured so that if dataverse-solr-0 failed, it will get the latest version of the index in the backup and restore. All backups are stored in /tmp/share in the host, or /home/share in solr containers. 
- -Running Containers to Run as Root in Minishift -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -It is **not** recommended to run containers as root in Minishift because for security reasons OpenShift doesn't support running containers as root. However, it's good to know how to allow containers to run as root in case you need to work on a Docker image to make it run as non-root. - -For more information on improving Docker images to run as non-root, see "Support Arbitrary User IDs" at https://docs.openshift.org/latest/creating_images/guidelines.html#openshift-origin-specific-guidelines - -Let's say you have a container that you suspect works fine when it runs as root. You want to see it working as-is before you start hacking on the Dockerfile and entrypoint file. You can configure Minishift to allow containers to run as root with this command: - -``oc adm policy add-scc-to-user anyuid -z default --as system:admin`` - -Once you are done testing you can revert Minishift back to not allowing containers to run as root with this command: - -``oc adm policy remove-scc-from-user anyuid -z default --as system:admin`` - -Minishift Resources -~~~~~~~~~~~~~~~~~~~ - -The following resources might be helpful. - -- https://blog.openshift.com/part-1-from-app-to-openshift-runtimes-and-templates/ -- https://blog.openshift.com/part-2-creating-a-template-a-technical-walkthrough/ -- https://docs.openshift.com/enterprise/3.0/architecture/core_concepts/templates.html - -Docker ------- - -From the Dataverse perspective, Docker is important for a few reasons: - -- There is interest from the community in running Dataverse on OpenShift and some initial work has been done to get Dataverse running on Minishift in Docker containers. Minishift makes use of Docker images on Docker Hub. To build new Docker images and push them to Docker Hub, you'll need to install Docker. The main issue to follow is https://github.com/IQSS/dataverse/issues/4040 . 
-- Docker may aid in testing efforts if we can easily spin up Docker images based on code in pull requests and run the full integration suite against those images. See the :doc:`testing` section for more information on integration tests. - -Installing Docker -~~~~~~~~~~~~~~~~~ - -On Linux, you can probably get Docker from your package manager. - -On Mac, download the ``.dmg`` from https://www.docker.com and install it. As of this writing is it known as Docker Community Edition for Mac. - -On Windows, we have heard reports of success using Docker on a Linux VM running in VirtualBox or similar. There's something called "Docker Community Edition for Windows" but we haven't tried it. See also the :doc:`windows` section. - -As explained above, we use Docker images in two different contexts: - -- Testing using an "all in one" Docker image (ephemeral, unpublished) -- Future production use on Minishift/OpenShift/Kubernetes (published to Docker Hub) - -All In One Docker Images for Testing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The "all in one" Docker files are in ``conf/docker-aio`` and you should follow the readme in that directory for more information on how to use them. - -Future production use on Minishift/OpenShift/Kubernetes -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -FIXME: rewrite this section to talk about only pushing stable images to Docker Hub. - -When working with Docker in the context of Minishift, follow the instructions above and make sure you get the Dataverse Docker images running in Minishift before you start messing with them. - -As of this writing, the Dataverse Docker images we publish under https://hub.docker.com/u/iqss/ are highly experimental. They were originally tagged with branch names like ``kick-the-tires`` and as of this writing the ``latest`` tag should be considered highly experimental and not for production use. 
See https://github.com/IQSS/dataverse/issues/4040 for the latest status and please reach out if you'd like to help! - -Change to the docker directory: - -``cd conf/docker`` - -Edit one of the files: - -``vim dataverse-glassfish/Dockerfile`` - -At this point you want to build the image and run it. We are assuming you want to run it in your Minishift environment. We will be building your image and pushing it to Docker Hub. - -Log in to Docker Hub with an account that has access to push to the ``iqss`` organization: - -``docker login`` - -(If you don't have access to push to the ``iqss`` organization, you can push elsewhere and adjust your ``openshift.json`` file accordingly.) - -Build and push the images to Docker Hub: - -``./build.sh`` - -Note that you will see output such as ``digest: sha256:213b6380e6ee92607db5d02c9e88d7591d81f4b6d713224d47003d5807b93d4b`` that should later be reflected in Minishift to indicate that you are using the latest image you just pushed to Docker Hub. - -You can get a list of all repos under the ``iqss`` organization with this: - -``curl https://hub.docker.com/v2/repositories/iqss/`` - -To see a specific repo: - -``curl https://hub.docker.com/v2/repositories/iqss/dataverse-glassfish/`` - -Known Issues with Dataverse Images on Docker Hub -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Again, Dataverse Docker images on Docker Hub are highly experimental at this point. As of this writing, their purpose is primarily for kicking the tires on Dataverse. Here are some known issues: - -- The Dataverse installer is run in the entrypoint script every time you run the image. Ideally, Dataverse would be installed in the Dockerfile instead. Dataverse is being installed in the entrypoint script because it needs PosgreSQL to be up already so that database tables can be created when the war file is deployed. -- The storage should be abstracted. Storage of data files and PostgreSQL data. Probably Solr data. 
-- Better tuning of memory by examining ``/sys/fs/cgroup/memory/memory.limit_in_bytes`` and incorporating this into the Dataverse installation script. -- Only a single Glassfish server can be used. See "Dedicated timer server in a Dataverse server cluster" in the :doc:`/admin/timers` section of the Installation Guide. -- Only a single PostgreSQL server can be used. -- Only a single Solr server can be used. - ----- - -Previous: :doc:`deployment` | Next: :doc:`making-releases` +Please see :ref:`research-code` in the User Guide for this related topic. diff --git a/doc/sphinx-guides/source/developers/dataset-migration-api.rst b/doc/sphinx-guides/source/developers/dataset-migration-api.rst new file mode 100644 index 00000000000..941527133ef --- /dev/null +++ b/doc/sphinx-guides/source/developers/dataset-migration-api.rst @@ -0,0 +1,69 @@ +Dataset Migration API +===================== + +The Dataverse software includes several ways to add Datasets originally created elsewhere (not to mention Harvesting capabilities). These include the Sword API (see the :doc:`/api/sword` guide) and the /dataverses/{id}/datasets/:import methods (json and ddi) (see the :doc:`/api/native-api` guide). + +This experimental migration API offers an additional option with some potential advantages: + +* Metadata can be specified using the json-ld format used in the OAI-ORE metadata export. Please note that the json-ld generated by the OAI-ORE metadata export is not directly compatible with the Migration API: the OAI-ORE export nests resource metadata under an :code:`ore:describes` wrapper, while the Dataset Migration API requires the metadata to be at the root level. Please see the example file below for reference.
+ + * If you need a tool to convert OAI-ORE exported json-ld into a format compatible with the Dataset Migration API, or if you need to generate compatible json-ld from sources other than an existing Dataverse installation, the `BaseX `_ database engine, used together with the XQuery language, provides an efficient solution. Please see the example script :download:`transform-oai-ore-jsonld.xq <../_static/api/transform-oai-ore-jsonld.xq>` for a simple conversion from exported OAI-ORE json-ld to a Dataset Migration API-compatible version. + +* Existing publication dates and PIDs are maintained (currently limited to the case where the PID can be managed by the Dataverse software, e.g. where the authority and shoulder match those the software is configured for). + +* Updating the PID at the provider can be done immediately or later (with other existing APIs). + +* Adding files can be done via the standard APIs, including using direct-upload to S3. + +This API consists of two calls: one to create an initial Dataset version, and one to 'republish' the dataset through Dataverse with a specified publication date. +Both calls require superuser privileges. + +These calls can be used in concert with other API calls to add files, update metadata, etc. before the 'republish' step is done. + + +Start Migrating a Dataset into a Dataverse Collection +----------------------------------------------------- + +.. note:: This action requires a Dataverse installation account with superuser permissions. + +To import a dataset with an existing persistent identifier (PID), the provided json-ld metadata should include it. + +..
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export DATAVERSE_ID=root + + curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets/:startmigration --upload-file dataset-migrate.jsonld + +An example jsonld file is available at :download:`dataset-migrate.jsonld <../_static/api/dataset-migrate.jsonld>`. Note that you would need to replace the PID in the sample file with one supported in your Dataverse instance. + +You also need to replace the :code:`dataverse.siteUrl` in the json-ld :code:`@context` with your current Dataverse site URL. This is necessary to define a local URI for metadata terms originating from community metadata blocks (in the case of the example file, from the Social Sciences and Humanities and Geospatial blocks). + +As of Dataverse 6.5 and earlier, community metadata blocks do not assign default global URIs to the terms they define, in contrast to the citation metadata block, whose terms do have global URIs. + + + +Publish a Migrated Dataset +-------------------------- + +The call above creates a Dataset. Once it is created, other APIs can be used to add files, add additional metadata, etc. When a version is complete, the following call can be used to publish it with its original publication date. + +.. note:: This action requires a Dataverse installation account with superuser permissions. + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + + curl -H 'Content-Type: application/ld+json' -H X-Dataverse-key:$API_TOKEN -X POST -d '{"schema:datePublished": "2020-10-26","@context":{ "schema":"http://schema.org/"}}' "$SERVER_URL/api/datasets/{id}/actions/:releasemigrated" + +datePublished is the only metadata supported in this call.
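Spelled out as a standalone document, the JSON-LD body passed via ``-d`` in the call above is simply:

```json
{
  "@context": { "schema": "http://schema.org/" },
  "schema:datePublished": "2020-10-26"
}
```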
+ +An optional query parameter, updatepidatprovider (default: false), can be set to true to automatically update the metadata and targetUrl of the PID at the provider. With this set to true, the PID will redirect to this dataset rather than to the dataset in the source repository. + +.. code-block:: bash + + curl -H 'Content-Type: application/ld+json' -H X-Dataverse-key:$API_TOKEN -X POST -d '{"schema:datePublished": "2020-10-26","@context":{ "schema":"http://schema.org/"}}' "$SERVER_URL/api/datasets/{id}/actions/:releasemigrated?updatepidatprovider=true" + +If the parameter is not set to true, other existing APIs can be used to update the PID at the provider later, e.g. :ref:`send-metadata-to-pid-provider`. diff --git a/doc/sphinx-guides/source/developers/dataset-semantic-metadata-api.rst b/doc/sphinx-guides/source/developers/dataset-semantic-metadata-api.rst new file mode 100644 index 00000000000..4f374bdc039 --- /dev/null +++ b/doc/sphinx-guides/source/developers/dataset-semantic-metadata-api.rst @@ -0,0 +1,121 @@ +Dataset Semantic Metadata API +============================= +.. contents:: |toctitle| + :local: + + +The OAI_ORE metadata export format represents Dataset metadata using json-ld (see the :doc:`/admin/metadataexport` section). As part of an RDA-supported effort to allow import of Datasets exported as Bags with an included OAI_ORE metadata file, +an experimental API has been created that provides a json-ld alternative to the v1.0 API calls to get/set/delete Dataset metadata in the :doc:`/api/native-api`. + +You may prefer to work with this API if you are building a tool to import from a Bag/OAI-ORE source or already work with json-ld representations of metadata, or if you prefer the flatter json-ld representation to the Dataverse software's json representation (which includes structure related to the metadata blocks involved and the type/multiplicity of the metadata fields).
+ +You may not want to use this API if you need stability and backward compatibility (the 'experimental' designation for this API implies that community feedback is desired and that, in future Dataverse software versions, the API may be modified based on that feedback). + +Note: The examples use the 'application/ld+json' mimetype. For compatibility reasons, the APIs can also be used with the mimetype "application/json-ld". + +Get Dataset Metadata +-------------------- + +To get the json-ld formatted metadata for a Dataset, specify the Dataset ID (DATASET_ID) or Persistent identifier (DATASET_PID), and, for specific versions, the version number. + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export DATASET_ID='12345' + export DATASET_PID='doi:10.5072/FK2A1B2C3' + export VERSION='1.0' + export SERVER_URL=https://demo.dataverse.org + + Example 1: Get metadata for version '1.0' + + curl -H X-Dataverse-key:$API_TOKEN -H 'Accept: application/ld+json' "$SERVER_URL/api/datasets/$DATASET_ID/versions/$VERSION/metadata" + + Example 2: Get metadata for the latest version using the DATASET PID + + curl -H X-Dataverse-key:$API_TOKEN -H 'Accept: application/ld+json' "$SERVER_URL/api/datasets/:persistentId/metadata?persistentId=$DATASET_PID" + +You should expect a 200 ("OK") response and JSON-LD mirroring the OAI-ORE representation in the returned 'data' object. + + +.. _add-semantic-metadata: + +Add Dataset Metadata +-------------------- + +To add json-ld formatted metadata for a Dataset, specify the Dataset ID (DATASET_ID) or Persistent identifier (DATASET_PID). Adding '?replace=true' will overwrite an existing metadata value. The default (replace=false) will only add new metadata or add a new value to a multi-valued field. + +..
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export DATASET_ID='12345' + export DATASET_PID='doi:10.5072/FK2A1B2C3' + export VERSION='1.0' + export SERVER_URL=https://demo.dataverse.org + + Example 1: Change the Dataset title + + curl -X PUT -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -d '{"title": "Submit menu test", "@context":{"title": "http://purl.org/dc/terms/title"}}' "$SERVER_URL/api/datasets/$DATASET_ID/metadata?replace=true" + + Example 2: Add a description using the DATASET PID + + curl -X PUT -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -d '{"citation:dsDescription": {"citation:dsDescriptionValue": "New description"}, "@context":{"citation": "https://dataverse.org/schema/citation/"}}' "$SERVER_URL/api/datasets/:persistentId/metadata?persistentId=$DATASET_PID" + +You should expect a 200 ("OK") response indicating whether a draft Dataset version was created or an existing draft was updated. + + +Delete Dataset Metadata +----------------------- + +To delete metadata for a Dataset, send a json-ld representation of the fields to delete and specify the Dataset ID (DATASET_ID) or Persistent identifier (DATASET_PID). + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export DATASET_ID='12345' + export DATASET_PID='doi:10.5072/FK2A1B2C3' + export VERSION='1.0' + export SERVER_URL=https://demo.dataverse.org + + Example: Delete the TermsOfUseAndAccess 'restrictions' value 'No restrictions' for the latest version using the DATASET PID + + curl -X PUT -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -d '{"https://dataverse.org/schema/core#restrictions":"No restrictions"}' "$SERVER_URL/api/datasets/:persistentId/metadata/delete?persistentId=$DATASET_PID" + +Note that this example uses the term URI directly rather than adding an ``@context`` element. You can use either form in any of these API calls.
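To illustrate the two equivalent forms, the delete body above names the term by its full URI; a sketch of the same request body using an ``@context`` prefix instead (the prefix name ``core`` is arbitrary) would be:

```json
{
  "@context": { "core": "https://dataverse.org/schema/core#" },
  "core:restrictions": "No restrictions"
}
```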
+ +You should expect a 200 ("OK") response indicating whether a draft Dataset version was created or an existing draft was updated. + +.. _api-semantic-create-dataset: + +Create a Dataset +---------------- + +Specifying the Content-Type as application/ld+json with the existing /api/dataverses/{id}/datasets API call (see :ref:`create-dataset-command`) supports using the same metadata format when creating a Dataset. + +With curl, this is done by adding the following header: + +.. code-block:: bash + + -H 'Content-Type: application/ld+json' + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export DATAVERSE_ID=root + export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV + + curl -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets --upload-file dataset-create.jsonld + +An example jsonld file is available at :download:`dataset-create.jsonld <../_static/api/dataset-create.jsonld>` (:download:`dataset-create_en.jsonld <../_static/api/dataset-create_en.jsonld>` is a version that sets the metadata language (see :ref:`:MetadataLanguages`) to English (en).) + +.. _api-semantic-create-dataset-with-type: + +Create a Dataset with a Dataset Type +------------------------------------ + +By default, datasets are given the type "dataset" but if your installation has added additional types (see :ref:`api-add-dataset-type`), you can specify the type. + +An example JSON-LD file is available at :download:`dataset-create-software.jsonld <../_static/api/dataset-create-software.jsonld>`. + +You can use this file with the normal :ref:`api-semantic-create-dataset` endpoint above. + +See also :ref:`dataset-types`.
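As a client-side illustration of the calls in this section, here is a minimal Python sketch (the token, server URL, and PID are the same placeholders used in the curl examples) that assembles, but does not send, the Get Dataset Metadata request:

```python
import urllib.request

API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder API token
SERVER_URL = "https://demo.dataverse.org"
DATASET_PID = "doi:10.5072/FK2A1B2C3"

# Latest-version metadata via the persistent identifier, mirroring the
# curl example under "Get Dataset Metadata" above.
url = f"{SERVER_URL}/api/datasets/:persistentId/metadata?persistentId={DATASET_PID}"
req = urllib.request.Request(url, headers={
    "X-Dataverse-key": API_TOKEN,
    "Accept": "application/ld+json",  # request the JSON-LD representation
})
# urllib.request.urlopen(req) would perform the call; a 200 ("OK") response
# carries the JSON-LD metadata in the returned 'data' object.
print(req.full_url)
```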
diff --git a/doc/sphinx-guides/source/developers/debugging.rst b/doc/sphinx-guides/source/developers/debugging.rst index e2cfd0ac9f4..ffee6764b7f 100644 --- a/doc/sphinx-guides/source/developers/debugging.rst +++ b/doc/sphinx-guides/source/developers/debugging.rst @@ -8,8 +8,58 @@ Debugging Logging ------- -By default, Glassfish logs at the "INFO" level but logging can be increased to "FINE" on the fly with (for example) ``./asadmin set-log-levels edu.harvard.iq.dataverse.api.Datasets=FINE``. Running ``./asadmin list-log-levels`` will show the current logging levels. +By default, the app server logs at the "INFO" level but logging can be increased to "FINE" on the fly with (for example) ``./asadmin set-log-levels edu.harvard.iq.dataverse.api.Datasets=FINE``. Running ``./asadmin list-log-levels`` will show the current logging levels. ----- +.. _jsf-config: -Previous: :doc:`documentation` | Next: :doc:`coding-style` +Java Server Faces (JSF) Configuration Options +--------------------------------------------- + +Some JSF options can be easily changed via MicroProfile Config (using environment variables, system properties, etc.) +during development without recompiling. Changing an option will require at least a redeployment, depending on +how you provide the option. (Variable substitution only happens during deployment; when using system properties +or environment variables, you'll need to pass these into the domain, which will usually require an app server restart.) + +Please note that you can use +`MicroProfile Config `_ +to maintain your settings more easily for different environments. + +.. list-table:: + :widths: 15 15 60 10 + :header-rows: 1 + :align: left + + * - JSF Option + - MPCONFIG Key + - Description + - Default + * - javax.faces.PROJECT_STAGE + - dataverse.jsf.project-stage + - Switch to different levels to make JSF more verbose, disable caches, etc. + Read more `at `_ + `various `_ `places `_.
+ - ``Production`` + * - javax.faces.INTERPRET_EMPTY + _STRING_SUBMITTED_VALUES_AS_NULL + - dataverse.jsf.empty-string-null + - See `Jakarta Server Faces 3.0 Spec`_ + - ``true`` + * - javax.faces.FACELETS_SKIP_COMMENTS + - dataverse.jsf.skip-comments + - See `Jakarta Server Faces 3.0 Spec`_ + - ``true`` + * - javax.faces.FACELETS_BUFFER_SIZE + - dataverse.jsf.buffer-size + - See `Jakarta Server Faces 3.0 Spec`_ + - ``102400`` (100 KB) + * - javax.faces.FACELETS_REFRESH_PERIOD + - dataverse.jsf.refresh-period + - See `Jakarta Server Faces 3.0 Spec`_ + - ``-1`` + * - primefaces.THEME + - dataverse.jsf.primefaces.theme + - See `PrimeFaces Configuration Docs`_ + - ``bootstrap`` + +.. _Jakarta Server Faces 3.0 Spec: https://jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0.html#a6088 +.. _PrimeFaces Configuration Docs: https://primefaces.github.io/primefaces/11_0_0/#/gettingstarted/configuration diff --git a/doc/sphinx-guides/source/developers/dependencies.rst b/doc/sphinx-guides/source/developers/dependencies.rst index 72fdad03e43..26880374f23 100644 --- a/doc/sphinx-guides/source/developers/dependencies.rst +++ b/doc/sphinx-guides/source/developers/dependencies.rst @@ -3,27 +3,35 @@ Dependency Management ===================== .. contents:: |toctitle| - :local: + :local: -Dataverse is (currently) a Java EE 7 based application, that uses a lot of additional libraries for special purposes. -This includes features like support for SWORD-API, S3 storage and many others. +Introduction +------------ + +As explained under :ref:`core-technologies`, the Dataverse Software is a Jakarta EE 8 based application that uses a lot of additional libraries for +special purposes. This includes support for the SWORD API, S3 storage, and many other features. + +Besides the code that glues together individual pieces, any developer needs to describe dependencies used within the +Maven-based build system. 
As is familiar to any Maven user, this happens inside the "Project Object Model" (POM) file, ``pom.xml``. + +Recursive and convergent dependency resolution makes dependency management with Maven quite easy, but sometimes, in +projects with many complex dependencies like the Dataverse Software, you have to help Maven make the right choices. + +Maven can foster good development practices by enabling modulithic (modular monolithic) architecture: splitting +functionalities into different Maven submodules while expressing dependencies between them. But there's more: the +parent-child model allows you to create consistent dependency versioning (see below) within children. -Besides the code that glues together the single pieces, any developer needs to describe used dependencies for the -Maven-based build system. As is familiar to any Maven user, this happens inside the "Project Object Model" (POM) living in -``pom.xml`` at the root of the project repository. Recursive and convergent dependency resolution makes dependency -management with Maven very easy. But sometimes, in projects with many complex dependencies like Dataverse, you have -to help Maven make the right choices. Terms ----- As a developer, you should familiarize yourself with the following terms: -- **Direct dependencies**: things *you use* yourself in your own code for Dataverse. +- **Direct dependencies**: things *you use* yourself in your own code for the Dataverse Software. - **Transitive dependencies**: things *others use* for things you use, pulled in recursively. See also: `Maven docs `_. -.. graphviz:: + .. graphviz:: digraph { rankdir="LR"; @@ -44,6 +52,94 @@ As a developer, you should familiarize yourself with the following terms: yc -> dtz; } +- **Project Object Model** (POM): the basic XML file unit to describe a Maven-based project. +- **Bill Of Materials** (BOM): larger projects like Payara, Amazon SDK etc. provide lists of their direct dependencies. 
+  This comes in handy when adding these dependencies (transitive for us) as direct dependencies, see below.
+
+  .. graphviz::
+
+    digraph {
+      rankdir="TD";
+      node [fontsize=10]
+      edge [fontsize=8]
+
+      msp [label="Maven Super POM"]
+      sp [label="Your POM"]
+      bom [label="Some BOM"]
+      td [label="Direct & Transitive\nDependency"]
+
+      msp -> sp [label="inherit", dir="back"];
+      bom -> sp [label="import", dir="back"];
+      bom -> td [label="depend on"];
+      sp -> td [label="depend on\n(same version)", constraint=false];
+    }
+
+- **Parent POM**, **Super POM**: any project may be a child of a parent.
+
+  Projects silently inherit from a "super POM", which is the global Maven standard parent POM.
+  Children may also be aggregated by a parent (without them knowing) for convenient builds of larger projects.
+
+  .. graphviz::
+
+    digraph {
+      rankdir="TD";
+      node [fontsize=10]
+      edge [fontsize=8]
+
+      msp [label="Maven Super POM"]
+      ap [label="Any POM"]
+      msp -> ap [label="inherit", dir="back"];
+
+      pp [label="Parent 1 POM"]
+      cp1 [label="Submodule 1 POM"]
+      cp2 [label="Submodule 2 POM"]
+
+      msp -> pp [label="inherit", dir="back", constraint=false];
+      pp -> cp1 [label="aggregate"];
+      pp -> cp2 [label="aggregate"];
+    }
+
+  Children may inherit dependencies, properties, settings, plugins etc. from the parent (making it possible to share
+  common ground). Both approaches may be combined. Children may import as many BOMs as they want, but can have only a
+  single parent to inherit from at a time.
+
+  .. graphviz::
+
+    digraph {
+      rankdir="TD";
+      node [fontsize=10]
+      edge [fontsize=8]
+
+      msp [label="Maven Super POM"]
+      pp [label="Parent POM"]
+      cp1 [label="Submodule 1 POM"]
+      cp2 [label="Submodule 2 POM"]
+
+      msp -> pp [label="inherit", dir="back", constraint=false];
+      pp -> cp1 [label="aggregate"];
+      pp -> cp2 [label="aggregate"];
+      cp1 -> pp [label="inherit"];
+      cp2 -> pp [label="inherit"];
+
+      d [label="Dependency"]
+      pp -> d [label="depends on"]
+      cp1 -> d [label="inherit:\ndepends on", style=dashed];
+      cp2 -> d [label="inherit:\ndepends on", style=dashed];
+    }
+
+- **Modules**: when using parents and children, these are officially called "modules", each having its own POM.
+
+  Using modules allows bundling different aspects of (Dataverse) software in their own domains, with their own
+  behavior, dependencies etc. Parent modules allow for sharing of common settings, properties, dependencies and more.
+  Submodules may also be used as parent modules for a lower level of submodules.
+
+  Maven modules within the same software project may also depend on each other, allowing you to create complex
+  structures of packages and projects. Each module may be released on its own (e.g. on Maven Central) and other
+  projects may rely on and reuse it. This is especially useful for parent POMs: they may be reused as BOMs or to
+  share a standard between independent software projects.
+
+  Maven modules should not be confused with the `Java Platform Module System (JPMS) `_ introduced in Java 9 under Project Jigsaw.
+
 Direct dependencies
 -------------------
 
@@ -62,24 +158,34 @@ Within the POM, any direct dependencies reside within the ``<dependencies>`` tag
 Anytime you add a ``<dependency>``, Maven will try to fetch it from defined/configured repositories and use it
-within the build lifecycle.
+within the build lifecycle. You have to define a ``<version>`` (note exception below), but ``<scope>`` is optional for
+``compile``. (See `Maven docs: Dep. Scope `_)
 
-During fetching, Maven will analyse all transitive dependencies (see graph above) and, if necessary, fetch those, too.
+During fetching, Maven will analyze all transitive dependencies (see graph above) and, if necessary, fetch those too.
 Everything downloaded once is cached locally by default, so nothing needs to be fetched again and again, as long as
 the dependency definition does not change.
 
 **Rules to follow:**
 
 1. You should only use direct dependencies for **things you are actually using** in your code.
-2. **Clean up** direct dependencies no longer in use. It will bloat the deployment package otherwise!
-3. Care about the **scope**. Do not include "testing only" dependencies in the package - it will hurt you in IDEs and bloat things. [#f1]_
-4. Avoid using different dependencies for the **same purpose**, e. g. different JSON parsing libraries.
-5. Refactor your code to **use Java EE** standards as much as possible.
-6. When you rely on big SDKs or similar big cool stuff, try to **include the smallest portion possible**. Complete SDK
+2. When declaring a direct dependency whose **version** is managed by ``<dependencyManagement>``, a BOM, or a parent
+   POM, you should not provide a version unless you want to explicitly override it!
+3. **Clean up** direct dependencies no longer in use. It will bloat the deployment package otherwise!
+4. Care about the **scope** [#f1]_:
+
+   * Do not include "testing only" dependencies in the final package - it will hurt you in IDEs and bloat things.
+     There is the ``test`` scope for this!
+   * Make sure to use the ``runtime`` scope when you need to ensure a library is present on our classpath at runtime.
+     An example is the SLF4J JUL bridge: we want to route logs from SLF4J into ``java.util.logging``, so it needs to
+     be present on the classpath, although we aren't using SLF4J ourselves, unlike some of our dependencies.
+ * Some dependencies might be ``provided`` by the runtime environment. Good example: everything from Jakarta EE! + We use the Payara BOM to ensure using the same version during development and runtime. + +5. Avoid using different dependencies for the **same purpose**, e. g. different JSON parsing libraries. +6. Refactor your code to **use Jakarta EE** standards as much as possible. +7. When you rely on big SDKs or similar big cool stuff, try to **include the smallest portion possible**. Complete SDK bundles are typically heavyweight and most of the time unnecessary. -7. **Don't include transitive dependencies.** [#f2]_ +8. **Don't include transitive dependencies.** [#f2]_ * Exception: if you are relying on it in your code (see *Z* in the graph above), you must declare it. See below for proper handling in these (rare) cases. @@ -92,8 +198,8 @@ Maven is comfortable for developers; it handles recursive resolution, downloadin However, as life is a box of chocolates, you might find yourself in *version conflict hell* sooner than later without even knowing, but experiencing unintended side effects. -When you look at the graph above, imagine *B* and *TB* rely on different *versions* of *TC*. How does Maven decide -which version it will include? Easy: the dependent version of the nearest version wins: +When you look at the topmost graph above, imagine *B* and *TB* rely on different *versions* of *TC*. How does Maven +decide which version it will include? Easy: the version of the dependency nearest to our project ("Your Code)" wins. The following graph gives an example: .. graphviz:: @@ -110,19 +216,19 @@ which version it will include? Easy: the dependent version of the nearest versio yc -> dtz2; } -In this case, version "2.0" will be included. If you know something about semantic versioning, a red alert should ring in your mind right now. -How do we know that *B* is compatible with *Z v2.0* when depending on *Z v1.0*? +In this case, version "2.0" will be included. 
If you know something about semantic versioning, a red alert should ring +in your mind right now. How do we know that *B* is compatible with *Z v2.0* when depending on *Z v1.0*? Another scenario getting us in trouble: indirect use of transitive dependencies. Imagine the following: we rely on *Z* -in our code, but do not include a direct dependency for it within the POM. Now *B* is updated and removed its dependency -on *Z*. You definitely don't want to head down that road. +in our code, but do not include a direct dependency for it within the POM. Now assume *B* is updated and removed its +dependency on *Z*. You definitely don't want to head down that road. **Follow the rules to be safe:** -1. Do **not use transitive deps implicit**: add a direct dependency for transitive deps you re-use in your code. -2. On every build check that no implicit usage was added by accident. +1. Do **not use transitive deps implicitly**: add a direct dependency for transitive deps you re-use in your code. +2. On every build, check that no implicit usage was added by accident. 3. **Explicitly declare versions** of transitive dependencies in use by multiple direct dependencies. -4. On every build check that there are no convergence problems hiding in the shadows. +4. On every build, check that there are no convergence problems hiding in the shadows. 5. **Do special tests** on every build to verify these explicit combinations work. Managing transitive dependencies in ``pom.xml`` @@ -130,15 +236,24 @@ Managing transitive dependencies in ``pom.xml`` Maven can manage versions of transitive dependencies in four ways: -1. Make a transitive-only dependency not used in your code a direct one and add a ```` tag. - Typically a bad idea, don't do that. -2. Use ```` or ```` tags on direct dependencies that request the transitive dependency. - *Last resort*, you really should avoid this. Not explained or used here. - `See Maven docs `_. -3. 
Explicitly declare the transitive dependency in ```` and add a ```` tag. -4. For more complex transitive dependencies, reuse a "Bill of Materials" (BOM) within ```` - and add a ```` tag. Many bigger and standard use projects provide those, making the POM much less bloated - compared to adding every bit yourself. +.. list-table:: + :align: left + :stub-columns: 1 + :widths: 12 40 40 + + * - Safe Good Practice + - (1) Explicitly declare the transitive dependency in ```` with a ```` tag. + - (2) For more complex transitive dependencies, reuse a "Bill of Materials" (BOM) within ````. + Many bigger projects provide them, making the POM much less bloated compared to adding every bit yourself. + * - Better Avoid or Don't + - (3) Use ```` or ```` tags on direct dependencies that request the transitive dependency. + *Last resort*, you really should avoid this. Not explained or used here, but sometimes unavoidable. + `See Maven docs `_. + - (4) Make a transitive-only dependency not used in your code a direct one and add a ```` tag. + Typically a bad idea; don't do that. + +**Note:** when the same transitive dependency is used in multiple Maven modules of a software project, it might be added +to a common ```` section of an inherited parent POM instead. (Overrides are still possible.) A reduced example, only showing bits relevant to the above cases and usage of an explicit transitive dep directly: @@ -167,7 +282,7 @@ A reduced example, only showing bits relevant to the above cases and usage of an import + Jackson is used by AWS SDK and others, but we also use it in the Dataverse Software. 
--> com.fasterxml.jackson jackson-bom @@ -175,7 +290,7 @@ A reduced example, only showing bits relevant to the above cases and usage of an import pom - + joda-time joda-time @@ -194,13 +309,13 @@ A reduced example, only showing bits relevant to the above cases and usage of an aws-java-sdk-s3 - + com.fasterxml.jackson.core jackson-core - + com.fasterxml.jackson.core jackson-databind @@ -214,12 +329,12 @@ Helpful tools Maven provides some plugins that are of great help to detect possible conflicts and implicit usage. -For *implicit usage detection*, use `mvn dependency:analyze`. Examine the output with great care. Sometimes you will +For *implicit usage detection*, use ``mvn dependency:analyze``. Examine the output with great care. Sometimes you will see implicit usages that do no harm, especially if you are using bigger SDKs having some kind of `core` package. This will also report on any direct dependency which is not in use and can be removed from the POM. Again, do this with great caution and double check. -If you want to see the dependencies both direct and transitive in a *dependency tree format*, use `mvn dependency:tree`. +If you want to see the dependencies both direct and transitive in a *dependency tree format*, use ``mvn dependency:tree``. This will however not help you with detecting possible version conflicts. For this you need to use the `Enforcer Plugin `_ with its built in `dependency convergence rule @@ -228,11 +343,10 @@ This will however not help you with detecting possible version conflicts. For th Repositories ------------ -Maven receives all dependencies from *repositories*. Those can be public like `Maven Central `_ -and others, but you can also use a private repository on premises or in the cloud. Last but not least, you can use -local repositories, which can live next to your application code (see ``local_lib`` dir within Dataverse codebase). +Maven receives all dependencies from *repositories*. 
These can be public like `Maven Central `_
+and others, but you can also use a private repository on premises or in the cloud.
 
-Repositories are defined within the Dataverse POM like this:
+Repositories are defined within the Dataverse Software POM like this:
 
 .. code:: xml
 
@@ -249,11 +363,6 @@ Repositories are defined within the Dataverse POM like this:
         <url>http://repository.primefaces.org</url>
         <layout>default</layout>
       </repository>
-      <repository>
-        <id>dvn.private</id>
-        <name>Local repository for hosting jars not available from network repositories.</name>
-        <url>file://${project.basedir}/local_lib</url>
-      </repository>
 
 You can also add repositories to your local Maven settings, see `docs `_.
 
@@ -262,13 +371,76 @@ Typically you will skip the addition of the central repository, but adding it to
 dependencies are first looked up there (which in theory can speed up downloads). You should keep in mind that
 repositories are used in the order they appear.
 
+
+Dataverse Parent POM
+--------------------
+
+Within ``modules/dataverse-parent`` you will find the parent POM for the Dataverse codebase. It serves different
+purposes:
+
+1. Provide the common version number for a Dataverse release (may be overridden where necessary).
+2. Provide common metadata necessary for releasing modules to repositories like Maven Central.
+3. Declare aggregated submodules via ``<modules>``.
+4. Collate common BOMs and transitive dependencies within ``<dependencyManagement>``.
+   (Remember: a direct dependency declaration may omit the version element when defined in that area!)
+5. Collect common ``<properties>`` regarding the Maven project (encoding, ...), dependency versions, target Java version, etc.
+6. Gather common ``<repositories>`` and ``<pluginRepositories>`` - no need to repeat those in submodules.
+7. Make submodules use current Maven plugin release versions via ``<pluginManagement>``.
+
+As of this writing (2022-02-10), our parent module looks like this:
+
+.. graphviz::
+
+    digraph {
+      rankdir="TD";
+      node [fontsize=10]
+      edge [fontsize=8]
+
+      dvp [label="Dataverse Parent"]
+      dvw [label="Submodule:\nDataverse WAR"]
+      zip [label="Submodule:\nZipdownloader JAR"]
+
+      dvw -> dvp [label="inherit"];
+      dvp -> dvw [label="aggregate"];
+      zip -> dvp [label="inherit"];
+      dvp -> zip [label="aggregate"];
+
+      pay [label="Payara BOM"]
+      aws [label="AWS SDK BOM"]
+      ggl [label="Google Cloud BOM"]
+      tc [label="Testcontainers BOM"]
+      td [label="Multiple (transitive) dependencies\n(PSQL, Logging, Apache Commons, ...)"]
+
+      dvp -> td [label="manage"];
+
+      pay -> dvp [label="import", dir="back"];
+      aws -> dvp [label="import", dir="back"];
+      ggl -> dvp [label="import", dir="back"];
+      tc -> dvp [label="import", dir="back"];
+    }
+
+The codebase is structured like this:
+
+.. code-block::
+
+    # Dataverse WAR Module
+    ├── pom.xml                  # (POM file of WAR module)
+    ├── modules                  #
+    │   └── dataverse-parent     # Dataverse Parent Module
+    │       └── pom.xml          # (POM file of Parent Module)
+    └── scripts                  #
+        └── zipdownload          # Zipdownloader JAR Module
+            └── pom.xml          # (POM file of Zipdownloader Module)
+
+- Any developer cloning the project and running ``mvn`` within the project root will interact with the Dataverse WAR
+  module, which has been the behavior since the Dataverse 4.0 release.
+- Running ``mvn`` targets within the parent module will execute all aggregated submodules in one go.
+
 
 ----
 
 .. rubric:: Footnotes
 
 .. [#f1] Modern IDEs import your Maven POM and offer import autocompletion for classes based on direct dependencies in the model. You might end up using legacy or repackaged classes because of a wrong scope.
 .. [#f2] This is going to bite back in modern IDEs when importing classes from transitive dependencies by "autocompletion accident".
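As a quick recap of the tooling described under "Helpful tools" above, the sketch below collects those Maven invocations into variables and prints them as a dry run (nothing is built); run the printed commands yourself from the project root with Maven installed. Note that ``enforcer:enforce`` is only useful once Enforcer rules such as dependency convergence are configured in the POM.

```shell
# Dry run: print the dependency hygiene commands discussed above.
analyze_cmd="mvn dependency:analyze"  # used-but-undeclared and declared-but-unused deps
tree_cmd="mvn dependency:tree"        # full tree of direct and transitive deps
enforce_cmd="mvn enforcer:enforce"    # runs configured rules, e.g. dependency convergence

printf '%s\n' "$analyze_cmd" "$tree_cmd" "$enforce_cmd"
```

Running these on every build, as the rules above suggest, catches implicit transitive usage and version conflicts early.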
- ----- - -Previous: :doc:`documentation` | Next: :doc:`debugging` diff --git a/doc/sphinx-guides/source/developers/deployment.rst b/doc/sphinx-guides/source/developers/deployment.rst index 9532e7c769f..89ae9ac4c2e 100755 --- a/doc/sphinx-guides/source/developers/deployment.rst +++ b/doc/sphinx-guides/source/developers/deployment.rst @@ -2,15 +2,15 @@ Deployment ========== -Developers often only deploy Dataverse to their :doc:`dev-environment` but it can be useful to deploy Dataverse to cloud services such as Amazon Web Services (AWS). +Developers often only deploy the Dataverse Software to their :doc:`dev-environment` but it can be useful to deploy the Dataverse Software to cloud services such as Amazon Web Services (AWS). .. contents:: |toctitle| :local: -Deploying Dataverse to Amazon Web Services (AWS) ------------------------------------------------- +Deploying the Dataverse Software to Amazon Web Services (AWS) +------------------------------------------------------------- -We have written scripts to deploy Dataverse to Amazon Web Services (AWS) but they require some setup. +We have written scripts to deploy the Dataverse Software to Amazon Web Services (AWS) but they require some setup. Install AWS CLI ~~~~~~~~~~~~~~~ @@ -40,10 +40,10 @@ After all this, you can try the "version" command again. Note that it's possible to add an ``export`` line like the one above to your ``~/.bash_profile`` file so you don't have to run it yourself when you open a new terminal. -Configure AWS CLI -~~~~~~~~~~~~~~~~~ +Configure AWS CLI with Stored Credentials +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Next you need to configure AWS CLI. +Dataverse can access S3 using credentials stored as described below, or using an IAM role described a little further below. 
Create a ``.aws`` directory in your home directory (which is called ``~``) like this: @@ -70,36 +70,73 @@ Then update the file and replace the values for "aws_access_key_id" and "aws_sec If you are having trouble configuring the files manually as described above, see https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html which documents the ``aws configure`` command. +Configure Role-Based S3 Access +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Amazon offers instructions on using an IAM role to grant permissions to applications running in EC2 at https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html + Configure Ansible File (Optional) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In order to configure Dataverse settings such as the password of the dataverseAdmin user, download https://raw.githubusercontent.com/IQSS/dataverse-ansible/master/defaults/main.yml and edit the file to your liking. +In order to configure Dataverse installation settings such as the password of the dataverseAdmin user, download https://raw.githubusercontent.com/GlobalDataverseCommunityConsortium/dataverse-ansible/master/defaults/main.yml and edit the file to your liking. You can skip this step if you're fine with the values in the "main.yml" file in the link above. Download and Run the "Create Instance" Script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Once you have done the configuration above, you are ready to try running the "ec2-create-instance.sh" script to spin up Dataverse in AWS. +Once you have done the configuration above, you are ready to try running the "ec2-create-instance.sh" script to spin up a Dataverse installation in AWS. + +Download `ec2-create-instance.sh`_ and put it somewhere reasonable. For the purpose of these instructions we'll assume it's in the "Downloads" directory in your home directory. + +.. 
_ec2-create-instance.sh: https://raw.githubusercontent.com/GlobalDataverseCommunityConsortium/dataverse-ansible/master/ec2/ec2-create-instance.sh + +To run the script, you can make it executable (``chmod 755 ec2-create-instance.sh``) or run it with bash, like this with ``-h`` as an argument to print the help: + +``bash ~/Downloads/ec2-create-instance.sh -h`` + +If you run the script without any arguments, it should spin up the latest version of Dataverse. + +You will need to wait for 15 minutes or so until the deployment is finished, longer if you've enabled sample data and/or the API test suite. Eventually, the output should tell you how to access the Dataverse installation in a web browser or via SSH. It will also provide instructions on how to delete the instance when you are finished with it. Please be aware that AWS charges per minute for a running instance. You may also delete your instance from https://console.aws.amazon.com/console/home?region=us-east-1 . + +Caveat Recipiens +~~~~~~~~~~~~~~~~ + +Please note that while the script should work well on new-ish branches, older branches that have different dependencies such as an older version of Solr may not produce a working Dataverse installation. Your mileage may vary. + + +Migrating Datafiles from Local Storage to S3 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A number of pilot Dataverse installations start on local storage, then administrators are tasked with migrating datafiles into S3 or similar object stores. The files may be copied with a command-line utility such as `s3cmd `_. You will want to retain the local file hierarchy, keeping the authority (for example: 10.5072) at the bucket "root." + +The below example queries may assist with updating dataset and datafile locations in the Dataverse installation's PostgresQL database. 
Depending on the initial version of the Dataverse Software and subsequent upgrade path, Datafile storage identifiers may or may not include a ``file://`` prefix, so you'll want to catch both cases. -Download :download:`ec2-create-instance.sh <../../../../scripts/installer/ec2-create-instance.sh>` and put it somewhere reasonable. For the purpose of these instructions we'll assume it's in the "Downloads" directory in your home directory. +To Update Dataset Location to S3, Assuming a ``file://`` Prefix +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -ec2-create-instance accepts a number few command-line switches: +:: -* -r: GitHub Repository URL (defaults to https://github.com/IQSS/dataverse.git) -* -b: branch to build (defaults to develop) -* -p: pemfile directory (defaults to $HOME) -* -g: Ansible GroupVars file (if you wish to override role defaults) + UPDATE dvobject SET storageidentifier=REPLACE(storageidentifier,'file://','s3://') + WHERE dtype='Dataset'; -``bash ~/Downloads/ec2-create-instance.sh -b develop -r https://github.com/scholarsportal/dataverse.git -g main.yml`` +To Update Datafile Location to your-s3-bucket, Assuming a ``file://`` Prefix +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Now you will need to wait around 15 minutes until the deployment is finished. Eventually, the output should tell you how to access the installation of Dataverse in a web browser or via ssh. It will also provide instructions on how to delete the instance when you are finished with it. Please be aware that AWS charges per minute for a running instance. You can also delete your instance from https://console.aws.amazon.com/console/home?region=us-east-1 . 
+:: -Caveats -~~~~~~~ + UPDATE dvobject + SET storageidentifier=REPLACE(storageidentifier,'file://','s3://your-s3-bucket:') + WHERE id IN (SELECT o.id FROM dvobject o, dataset s WHERE o.dtype = 'DataFile' + AND s.id = o.owner_id AND s.harvestingclient_id IS null + AND o.storageidentifier NOT LIKE 's3://%'); -Please note that while the script should work fine on newish branches, older branches that have different dependencies such as an older version of Solr may not produce a working Dataverse installation. Your mileage may vary. +To Update Datafile Location to your-s3-bucket, Assuming no ``file://`` Prefix +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ----- +:: -Previous: :doc:`coding-style` | Next: :doc:`containers` + UPDATE dvobject SET storageidentifier=CONCAT('s3://your-s3-bucket:', storageidentifier) + WHERE id IN (SELECT o.id FROM dvobject o, dataset s WHERE o.dtype = 'DataFile' + AND s.id = o.owner_id AND s.harvestingclient_id IS null + AND o.storageidentifier NOT LIKE '%://%'); diff --git a/doc/sphinx-guides/source/developers/dev-environment.rst b/doc/sphinx-guides/source/developers/dev-environment.rst index 82b2f0bcc56..26f09417af2 100755 --- a/doc/sphinx-guides/source/developers/dev-environment.rst +++ b/doc/sphinx-guides/source/developers/dev-environment.rst @@ -2,209 +2,89 @@ Development Environment ======================= -These instructions are purposefully opinionated and terse to help you get your development environment up and running as quickly as possible! Please note that familiarity with running commands from the terminal is assumed. +These instructions are oriented around Docker but the "classic" instructions we used for Dataverse 4 and 5 are still available at :doc:`classic-dev-env`. .. contents:: |toctitle| :local: -Quick Start ------------ +.. 
_container-dev-quickstart: -The quickest way to get Dataverse running is to use Vagrant as described in the :doc:`tools` section, but for day to day development work, we recommended the following setup. - -Set Up Dependencies -------------------- - -Supported Operating Systems -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Mac OS X or Linux is required because the setup scripts assume the presence of standard Unix utilities. - -Windows is not well supported, unfortunately, but Vagrant and Minishift environments are described in the :doc:`windows` section. - -Install Java -~~~~~~~~~~~~ - -Dataverse requires Java 8. - -On Mac, we recommend Oracle's version of the JDK, which can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html - -On Linux, you are welcome to use the OpenJDK available from package managers. - -Install Netbeans or Maven -~~~~~~~~~~~~~~~~~~~~~~~~~ - -NetBeans IDE (Java EE bundle) is recommended, and can be downloaded from http://netbeans.org . Developers may use any editor or IDE. We recommend NetBeans because it is free, works cross platform, has good support for Java EE projects, and includes a required build tool, Maven. - -Below we describe how to build the Dataverse war file with Netbeans but if you prefer to use only Maven, you can find installation instructions in the :doc:`tools` section. - -Install Homebrew (Mac Only) -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -On Mac, install Homebrew to simplify the steps below: https://brew.sh - -Clone the Dataverse Git Repo -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Fork https://github.com/IQSS/dataverse and then clone your fork like this: - -``git clone git@github.com:[YOUR GITHUB USERNAME]/dataverse.git`` - -Build the Dataverse War File -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To build the Dataverse war file using versions of Netbeans newer than 8.2 requires some setup because Java EE support is not enabled by default. An alternative is to build the war file with Maven, which is explained below. 
- -Launch Netbeans and click "File" and then "Open Project". Navigate to where you put the Dataverse code and double-click "dataverse" to open the project. - -If you are using Netbeans 8.2, Java EE support should "just work" but if you are using a newer version of Netbeans, you will see "dataverse (broken)". If you see "broken", click "Tools", "Plugins", and "Installed". Check the box next to "Java Web and EE" and click "Activate". Let Netbeans install all the dependencies. You will observe that the green "Active" checkmark does not appear next to "Java Web and EE". Restart Netbeans. - -In Netbeans, select "dataverse" under Projects and click "Run" in the menu and then "Build Project (dataverse)". The first time you build the war file, it will take a few minutes while dependencies are downloaded from Maven Central. Feel free to move on to other steps but check back for "BUILD SUCCESS" at the end. - -If you installed Maven instead of Netbeans, run ``mvn package``. - -NOTE: Do you use a locale different than ``en_US.UTF-8`` on your development machine? Are you in a different timezone -than Harvard (Eastern Time)? You might experience issues while running tests that were written with these settings -in mind. The Maven ``pom.xml`` tries to handle this for you by setting the locale to ``en_US.UTF-8`` and timezone -``UTC``, but more, not yet discovered building or testing problems might lurk in the shadows. - -Install jq -~~~~~~~~~~ - -On Mac, run this command: - -``brew install jq`` - -On Linux, install ``jq`` from your package manager or download a binary from http://stedolan.github.io/jq/ - -Install Glassfish -~~~~~~~~~~~~~~~~~ - -Glassfish 4.1 is required. 
- -To install Glassfish, run the following commands: - -``cd /usr/local`` - -``sudo curl -O http://download.oracle.com/glassfish/4.1/release/glassfish-4.1.zip`` - -``sudo unzip glassfish-4.1.zip`` - -``sudo chown -R $USER /usr/local/glassfish4`` - -Test Glassfish Startup Time on Mac -++++++++++++++++++++++++++++++++++ - -``cd /usr/local/glassfish4/glassfish/bin`` - -``./asadmin start-domain`` - -``grep "startup time" /usr/local/glassfish4/glassfish/domains/domain1/logs/server.log`` - -If you are seeing startup times in the 30 second range (31,584ms for "Felix" for example) please be aware that startup time can be greatly reduced (to less than 1.5 seconds in our testing) if you make a small edit to your ``/etc/hosts`` file as described at https://stackoverflow.com/questions/39636792/jvm-takes-a-long-time-to-resolve-ip-address-for-localhost/39698914#39698914 and https://thoeni.io/post/macos-sierra-java/ - -Look for a line that says ``127.0.0.1 localhost`` and add a space followed by the output of ``hostname`` which should be something like ``foobar.local`` depending on the name of your Mac. For example, the line would say ``127.0.0.1 localhost foobar.local`` if your Mac's name is "foobar". - -Install PostgreSQL -~~~~~~~~~~~~~~~~~~ - -PostgreSQL 9.6 is recommended to match the version in the Installation Guide. - -On Mac, go to https://www.postgresql.org/download/macosx/ and choose "Interactive installer by EnterpriseDB" option. We've tested version 9.6.9. When prompted to set a password for the "database superuser (postgres)" just enter "password". - -After installation is complete, make a backup of the ``pg_hba.conf`` file like this: +Quickstart +---------- -``sudo cp /Library/PostgreSQL/9.6/data/pg_hba.conf /Library/PostgreSQL/9.6/data/pg_hba.conf.orig`` +First, install Java 17, Maven, and Docker. 
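The new Quickstart asks you to install Java 17, Maven, and Docker before building. As a purely illustrative sanity check (not part of the official docs), you can confirm the three tools are on your ``PATH`` first:

```shell
# Illustrative prerequisite check for the Quickstart (not from the official docs).
# Reports, for each required tool, whether it is on the PATH.
check=$(for tool in java mvn docker; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done)
echo "$check"
```

If ``java`` is reported as found, also confirm that ``java -version`` prints a 17.x build before running ``mvn -Pct clean package docker:run``.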
-Then edit ``pg_hba.conf`` with an editor such as vi: +After cloning the `dataverse repo `_, run this: -``sudo vi /Library/PostgreSQL/9.6/data/pg_hba.conf`` +``mvn -Pct clean package docker:run`` -In the "METHOD" column, change all instances of "md5" to "trust". +(Note that if you are on Windows, you must run the command above in `WSL `_ rather than cmd.exe. See :doc:`windows`.) -In the Finder, click "Applications" then "PostgreSQL 9.6" and launch the "Reload Configuration" app. Click "OK" after you see "server signaled". +After some time you should be able to log in: -Next, launch the "pgAdmin" application from the same folder. Under "Browser", expand "Servers" and double click "PostgreSQL 9.6". When you are prompted for a password, leave it blank and click "OK". If you have successfully edited "pg_hba.conf", you can get in without a password. +- url: http://localhost:8080 +- username: dataverseAdmin +- password: admin1 -On Linux, you should just install PostgreSQL from your package manager without worrying about the version as long as it's 9.x. Find ``pg_hba.conf`` and set the authentication method to "trust" and restart PostgreSQL. +Detailed Steps +-------------- -Install Solr +Install Java ~~~~~~~~~~~~ -`Solr `_ 7.3.1 is required. - -To install Solr, execute the following commands: - -``sudo mkdir /usr/local/solr`` - -``sudo chown $USER /usr/local/solr`` - -``cd /usr/local/solr`` - -``curl -O http://archive.apache.org/dist/lucene/solr/7.3.1/solr-7.3.1.tgz`` - -``tar xvfz solr-7.3.1.tgz`` - -``cd solr-7.3.1/server/solr`` +The recommended version is Java 17 because it's the version we test with. See https://github.com/IQSS/dataverse/pull/9764. -``cp -r configsets/_default collection1`` +On Mac and Windows, we suggest using `SDKMAN `_ to install Temurin (Eclipse's name for its OpenJDK distribution). Type ``sdk install java 17`` and then hit the "tab" key until you get to a version that ends with ``-tem`` and then hit enter.
-``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.3.1/schema.xml`` +Alternatively you can download Temurin from https://adoptium.net (formerly `AdoptOpenJDK `_). -``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.3.1/schema_dv_mdb_fields.xml`` +On Linux, you are welcome to use the OpenJDK available from package managers. -``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.3.1/schema_dv_mdb_copies.xml`` +Install Maven +~~~~~~~~~~~~~ -``mv schema*.xml collection1/conf`` +If you are using SDKMAN, run this command: -``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.3.1/solrconfig.xml`` +``sdk install maven`` -``mv solrconfig.xml collection1/conf/solrconfig.xml`` +Otherwise, follow instructions at https://maven.apache.org. -``cd /usr/local/solr/solr-7.3.1`` +Install and Start Docker +~~~~~~~~~~~~~~~~~~~~~~~~ -``bin/solr start`` +Follow instructions at https://www.docker.com -``bin/solr create_core -c collection1 -d server/solr/collection1/conf`` +Be sure to start Docker. -Run the Dataverse Installer Script ----------------------------------- +Git Clone Repo +~~~~~~~~~~~~~~ -Navigate to the directory where you cloned the Dataverse git repo and run these commands: +Fork https://github.com/IQSS/dataverse and then clone your fork like this: -``cd scripts/installer`` +``git clone git@github.com:[YOUR GITHUB USERNAME]/dataverse.git`` -``./install`` +Build and Run +~~~~~~~~~~~~~ -It's fine to accept the default values. +Change into the ``dataverse`` directory you just cloned and run the following command: -After a while you will see ``Enter admin user name [Enter to accept default]>`` and you can just hit Enter.
+``mvn -Pct clean package docker:run`` -Verify Dataverse is Running -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Verify +~~~~~~ -After the script has finished, you should be able to log into Dataverse with the following credentials: +After some time you should be able to log in: -- http://localhost:8080 +- url: http://localhost:8080 - username: dataverseAdmin -- password: admin - -Configure Your Development Environment for Publishing ------------------------------------------------------ - -Run the following command: - -``curl http://localhost:8080/api/admin/settings/:DoiProvider -X PUT -d FAKE`` - -This will disable DOI registration by using a fake (in-code) DOI provider. Please note that this feature is only available in version >= 4.10 and that at present, the UI will give no indication that the DOIs thus minted are fake. +- password: admin1 Next Steps ---------- -If you can log in to Dataverse, great! If not, please see the :doc:`troubleshooting` section. For further assitance, please see "Getting Help" in the :doc:`intro` section. - -You're almost ready to start hacking on code. Now that the installer script has you up and running, you need to continue on to the :doc:`tips` section to get set up to deploy code from your IDE or the command line. +See the :doc:`/container/dev-usage` section of the Container Guide for tips on fast redeployment, viewing logs, and more. ----- +Getting Help +------------ -Previous: :doc:`intro` | Next: :doc:`tips` +Please feel free to reach out at https://chat.dataverse.org or https://groups.google.com/g/dataverse-dev if you have any difficulty setting up a dev environment! diff --git a/doc/sphinx-guides/source/developers/documentation.rst b/doc/sphinx-guides/source/developers/documentation.rst deleted file mode 100755 index f16c0d311ba..00000000000 --- a/doc/sphinx-guides/source/developers/documentation.rst +++ /dev/null @@ -1,138 +0,0 @@ -===================== -Writing Documentation -===================== - -.. 
contents:: |toctitle| - :local: - -Quick Fix ------------ - -If you find a typo or a small error in the documentation you can fix it using GitHub's online web editor. Generally speaking, we will be following https://help.github.com/en/articles/editing-files-in-another-users-repository - -- Navigate to https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source where you will see folders for each of the guides: - - - `admin`_ - - `api`_ - - `developers`_ - - `installation`_ - - `user`_ - -- Find the file you want to edit under one of the folders above. -- Click the pencil icon in the upper-right corner. If this is your first contribution to Dataverse, the hover text over the pencil icon will say "Fork this project and edit this file". -- Make changes to the file and preview them. -- In the **Commit changes** box, enter a description of the changes you have made and click **Propose file change**. -- Under the **Write** tab, delete the long welcome message and write a few words about what you fixed. -- Click **Create Pull Request**. - -That's it! Thank you for your contribution! Your pull request will be added manually to the main Dataverse project board at https://github.com/orgs/IQSS/projects/2 and will go through code review and QA before it is merged into the "develop" branch. Along the way, developers might suggest changes or make them on your behalf. Once your pull request has been merged you will be listed as a contributor at https://github.com/IQSS/dataverse/graphs/contributors - -Please see https://github.com/IQSS/dataverse/pull/5857 for an example of a quick fix that was merged (the "Files changed" tab shows how a typo was fixed). - -If you would like to read more about the Dataverse project's use of GitHub, please see the :doc:`version-control` section. For bug fixes and features we request that you create an issue before making a pull request but this is not at all necessary for quick fixes to the documentation. - -.. 
_admin: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/admin -.. _api: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/api -.. _developers: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/developers -.. _installation: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/installation -.. _user: https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source/user - -Other Changes (Sphinx) ----------------------- - -The documentation for Dataverse was written using Sphinx (http://sphinx-doc.org/). -If you are interested in suggesting changes or updates we recommend that you create -the html files using Sphinx locally and then submit a pull request through GitHub. Here are the instructions on how to proceed: - - -Installing Sphinx -~~~~~~~~~~~~~~~~~ - -On a Mac: - -Download the sphinx zip file from http://sphinx-doc.org/install.html - -Unzip it somewhere. In the unzipped directory, do the following as -root, (sudo -i): - -python setup.py build -python setup.py install - -Alternative option (Mac/Unix/Windows): - -Unless you already have it, install pip (https://pip.pypa.io/en/latest/installing.html) - -run ``pip install sphinx`` in a terminal - -This is all you need. You should now be able to build HTML/pdf documentation from git sources locally. - -Using Sphinx -~~~~~~~~~~~~ - -First, you will need to make a fork of the dataverse repository in GitHub. Then, you will need to make a clone of your fork so you can manipulate the files outside GitHub. - -To edit the existing documentation: - -- Create a branch (refer to http://guides.dataverse.org/en/latest/developers/version-control.html > *Create a New Branch off the develop Branch*) to record the changes you are about to perform. -- Go to ~/dataverse/doc/sphinx-guides/source directory inside your clone. 
There, you will find the .rst files that correspond to the guides in the dataverse page (http://guides.dataverse.org/en/latest/). -- Using your preferred text editor, open and edit the necessary files, or create new ones. - -**NOTE:** When adding ReStructured Text (RST) `cross references `_, use the hyphen character (``-``) as the word separator for the cross reference label. For example, ``my-reference-label`` would be the preferred label for a cross reference as opposed to, for example, ``my_reference_label``. - -Once you are done, open a terminal and change directories to ~/dataverse/doc/sphinx-guides . Then, run the following commands: - -- ``make clean`` - -- ``make html`` - -After sphinx is done processing the files you should notice that the html folder in ~/dataverse/doc/sphinx-guides/build directory has been updated. -You can click on the files in the html folder to preview the changes. - -Now you can make a commit with the changes to your own fork in GitHub and submit a pull request to the original (upstream) dataverse repository. - -Table of Contents ------------------ - -Every non-index page should use the following code to display a table of contents of internal sub-headings: :: - - .. contents:: |toctitle| - :local: - -This code should be placed below any introductory text/images and directly above the first subheading, much like a Wikipedia page. - -Images ------- - -A good documentation is just like a website enhanced and upgraded by adding high quality and self-explanatory images. -Often images depict a lot of written text in a simple manner. Within our Sphinx docs, you can add them in two ways: a) add a -PNG image directly and include or b) use inline description languages like GraphViz (current only option). - -While PNGs in the git repo can be linked directly via URL, Sphinx-generated images do not need a manual step and might -provide higher visual quality. 
Especially in terms of quality of content, generated images can be extendend and improved -by a textbased and reviewable commit, without needing raw data or source files and no diff around. - -GraphViz based images -~~~~~~~~~~~~~~~~~~~~~ - -In some parts of the documentation, graphs are rendered as images via Sphinx GraphViz extension. - -This requires `GraphViz `_ installed and either ``dot`` on the path or -`adding options to the make call `_. - -This has been tested and works on Mac, Linux, and Windows. If you have not properly configured GraphViz, then the worst thing that might happen is a warning and missing images in your local documentation build. - - -Versions --------- - -For installations hosting their own copies of the guides, note that as each version of Dataverse is released, there is an updated version of the guides released with it. Google and other search engines index all versions, which may confuse users who discover your guides in the search results as to which version they should be looking at. When learning about your installation from the search results, it is best to be viewing the *latest* version. - -In order to make it clear to the crawlers that we only want the latest version discoverable in their search results, we suggest adding this to your ``robots.txt`` file:: - - User-agent: * - Allow: /en/latest/ - Disallow: /en/ - ----- - -Previous: :doc:`testing` | Next: :doc:`dependencies` diff --git a/doc/sphinx-guides/source/developers/fontcustom.rst b/doc/sphinx-guides/source/developers/fontcustom.rst new file mode 100755 index 00000000000..edcda1e69ab --- /dev/null +++ b/doc/sphinx-guides/source/developers/fontcustom.rst @@ -0,0 +1,102 @@ +=========== +Font Custom +=========== + +As mentioned under :ref:`style-guide-fontcustom` in the Style Guide, Dataverse uses `Font Custom`_ to create custom icon fonts. + +.. _Font Custom: https://github.com/FontCustom/fontcustom + +.. 
contents:: |toctitle| + :local: + +Previewing Icons +---------------- + +The process below updates a `preview page`_ that you can use to see how the icons look. + +.. _preview page: ../_static/fontcustom-preview.html + +In `scripts/icons/svg`_ in the source tree, you can see the SVG files that the icons are built from. + +.. _scripts/icons/svg: https://github.com/IQSS/dataverse/tree/develop/scripts/icons + +Install Font Custom +------------------- + +You'll need Font Custom and its dependencies installed if you want to update the icons. + +Ruby Version +~~~~~~~~~~~~ + +Font Custom is written in Ruby. Ruby 3.0 didn't "just work" with Font Custom but Ruby 2.7 was fine. + +RVM is a good way to install a specific version of Ruby: https://rvm.io + +Install Dependencies and Font Custom Gem +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The brew commands below assume you are on a Mac. + +.. code-block:: bash + + brew tap bramstein/webfonttools + brew update + brew install woff2 + brew install sfnt2woff + brew install fontforge + brew install eot-utils + gem install fontcustom + + +(``brew install sfnt2woff`` isn't currently listed in the FontCustom README but it's mentioned in https://github.com/FontCustom/fontcustom/pull/385) + +If ``fontcustom --help`` works now, you have it installed. + +Updating Icons +-------------- + +Navigate to ``scripts/icons`` in the source tree (or `online`_) and you will find: + +- An ``svg`` directory containing the "source" for the icons. +- Scripts to update the icons. + +.. _online: https://github.com/IQSS/dataverse/tree/develop/scripts/icons + +There is a copy of these icons in both the app and the guides. We'll update the guides first because it's much quicker to iterate and notice any problems with the icons. + +Updating the Guides Icons +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Run ``docs.sh`` and then open ``../../doc/sphinx-guides/source/_static/fontcustom-preview.html`` in a browser to look at the icons.
(This is the `preview page`_ mentioned above that gets incorporated in the next Sphinx build.) + +Update any files in the ``svg`` directory and run the script again to see any differences. + +Note that Font Custom creates font files with unique names. For this reason, we should remove the old files from git as we add the new ones. The script deletes the old files for you but in a step below we'll do a ``git add`` to stage this change. + +Updating the App Icons +~~~~~~~~~~~~~~~~~~~~~~ + +Assuming you're happy with how the icons look in the preview page in the guides, you can move on to updating the icons in the Dataverse app itself. + +This time the script is called ``app.sh`` and it works the same way with the addition of tweaking some URLs. Go ahead and run this script and do a full "clean and build" before making sure the changes are visible in the application. + +Committing Changes to Git +~~~~~~~~~~~~~~~~~~~~~~~~~ + +As mentioned above, icons are in both the app and the docs. Again, because the filenames change, we should make sure the old files are removed from git. + +From the root of the repo, run the following: + +.. code-block:: bash + + git add doc/sphinx-guides/source/_static + git add src/main/webapp/resources + +That should be enough to make sure old files are replaced by new ones. At this point, you can commit and make a pull request. + +Caveats About Font Custom +------------------------- + +Font Custom is a useful tool and has an order of magnitude more stars on GitHub than its competitors. However, an `issue`_ suggests that the tool is somewhat abandoned. Its domain has expired but you can still get at what used to be its website at https://fontcustom.github.io/fontcustom/ + +.. 
_issue: https://github.com/FontCustom/fontcustom/issues/321 diff --git a/doc/sphinx-guides/source/developers/geospatial.rst b/doc/sphinx-guides/source/developers/geospatial.rst index 2857f7df9bf..48d300524c2 100644 --- a/doc/sphinx-guides/source/developers/geospatial.rst +++ b/doc/sphinx-guides/source/developers/geospatial.rst @@ -5,19 +5,12 @@ Geospatial Data .. contents:: |toctitle| :local: -Geoconnect ----------- +How The Dataverse Software Ingests Shapefiles +--------------------------------------------- -Geoconnect works as a middle layer, allowing geospatial data files in Dataverse to be visualized with Harvard WorldMap. To set up a Geoconnect development environment, you can follow the steps outlined in the `local_setup.md `_ guide. You will need Python and a few other prerequisites. +A shapefile is a set of files, often uploaded/transferred in ``.zip`` format. This set may contain up to fifteen files. A minimum of three specific files (``.shp``, ``.shx``, ``.dbf``) are needed to be a valid shapefile and a fourth file (``.prj``) is required for some applications -- or any type of meaningful visualization. -As mentioned under "Architecture and Components" in the :doc:`/installation/prep` section of the Installation Guide, Geoconnect is an optional component of Dataverse, so this section is only necessary to follow it you are working on an issue related to this feature. - -How Dataverse Ingests Shapefiles --------------------------------- - -A shapefile is a set of files, often uploaded/transferred in ``.zip`` format. This set may contain up to fifteen files. A minimum of three specific files (``.shp``, ``.shx``, ``.dbf``) are needed to be a valid shapefile and a fourth file (``.prj``) is required for WorldMap -- or any type of meaningful visualization. 
- -For ingest and connecting to WorldMap, four files are the minimum required: +For ingest, four files are the minimum required: - ``.shp`` - shape format; the feature geometry itself - ``.shx`` - shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly @@ -27,7 +20,7 @@ For ingest and connecting to WorldMap, four files are the minimum required: Ingest ~~~~~~ -When uploaded to Dataverse, the ``.zip`` is unpacked (same as all ``.zip`` files). Shapefile sets are recognized by the same base name and specific extensions. These individual files constitute a shapefile set. The first four are the minimum required (``.shp``, ``.shx``, ``.dbf``, ``.prj``) +When uploaded to a Dataverse installation, the ``.zip`` is unpacked (same as all ``.zip`` files). Shapefile sets are recognized by the same base name and specific extensions. These individual files constitute a shapefile set. The first four are the minimum required (``.shp``, ``.shx``, ``.dbf``, ``.prj``) For example: @@ -38,19 +31,19 @@ For example: - bicycles.sbx (NOT required extension) - bicycles.sbn (NOT required extension) -Upon recognition of the four required files, Dataverse will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set: +Upon recognition of the four required files, the Dataverse installation will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set: - Required: ``.shp``, ``.shx``, ``.dbf``, ``.prj`` -- Optional: ``.sbn``, ``.sbx``, ``.fbn``, ``.fbx``, ``.ain``, ``.aih``, ``.ixs``, ``.mxs``, ``.atx``, ``.cpg``, ``shp.xml`` +- Optional: ``.sbn``, ``.sbx``, ``.fbn``, ``.fbx``, ``.ain``, ``.aih``, ``.ixs``, ``.mxs``, ``.atx``, ``.cpg``, ``.qpj``, ``.qmd``, ``shp.xml`` -Then Dataverse creates a new ``.zip`` with mimetype as a shapefile. The shapefile set will persist as this new ``.zip``. 
+Then the Dataverse installation creates a new ``.zip`` with mimetype as a shapefile. The shapefile set will persist as this new ``.zip``. Example ~~~~~~~ **1a.** Original ``.zip`` contents: -A file named ``bikes_and_subways.zip`` is uploaded to the Dataverse. This ``.zip`` contains the following files. +A file named ``bikes_and_subways.zip`` is uploaded to the Dataverse installation. This ``.zip`` contains the following files. - ``bicycles.shp`` (shapefile set #1) - ``bicycles.shx`` (shapefile set #1) @@ -66,9 +59,9 @@ A file named ``bikes_and_subways.zip`` is uploaded to the Dataverse. This ``.zip - ``subway_line.prj`` (shapefile set #2) - ``subway_line.dbf`` (shapefile set #2) -**1b.** Dataverse unzips and re-zips files: +**1b.** The Dataverse installation unzips and re-zips files: -Upon ingest, Dataverse unpacks the file ``bikes_and_subways.zip``. Upon recognizing the shapefile sets, it groups those files together into new ``.zip`` files: +Upon ingest, the Dataverse installation unpacks the file ``bikes_and_subways.zip``. Upon recognizing the shapefile sets, it groups those files together into new ``.zip`` files: - files making up the "bicycles" shapefile become a new ``.zip`` - files making up the "subway_line" shapefile become a new ``.zip`` @@ -76,7 +69,7 @@ Upon ingest, Dataverse unpacks the file ``bikes_and_subways.zip``. Upon recogniz To ensure that a shapefile set remains intact, individual files such as ``bicycles.sbn`` are kept in the set -- even though they are not used for mapping. 
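The set-recognition rule described above (group files by base name; a set is complete when ``.shp``, ``.shx``, ``.dbf``, and ``.prj`` all share that base name) can be sketched in shell. This is a simplified illustration only; the real ingest logic lives in the Dataverse Java code:

```shell
# Simplified sketch (not the actual Java ingest code) of shapefile-set
# recognition: collect unique base names, then keep only those bases that
# have all four required extensions (.shp, .shx, .dbf, .prj).
files="bicycles.shp bicycles.shx bicycles.prj bicycles.dbf bicycles.sbn bicycles.txt subway_line.shp subway_line.shx subway_line.prj subway_line.dbf"
sets=$(for base in $(for f in $files; do echo "${f%.*}"; done | sort -u); do
  complete=yes
  for ext in shp shx dbf prj; do
    found=no
    for f in $files; do [ "$f" = "$base.$ext" ] && found=yes; done
    [ "$found" = no ] && complete=no
  done
  [ "$complete" = yes ] && echo "$base"
done)
echo "$sets"
```

This prints the two complete set base names, ``bicycles`` and ``subway_line``. Note that ``bicycles.txt`` shares the ``bicycles`` base name but neither completes nor breaks the set; as in the example above, it stays outside the re-zipped set.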
-**1c.** Dataverse final file listing: +**1c.** The Dataverse installation final file listing: - ``bicycles.zip`` (contains shapefile set #1: ``bicycles.shp``, ``bicycles.shx``, ``bicycles.prj``, ``bicycles.dbf``, ``bicycles.sbx``, ``bicycles.sbn``) - ``bicycles.txt`` (separate, not part of a shapefile set) @@ -88,128 +81,3 @@ For two "final" shapefile sets, ``bicycles.zip`` and ``subway_line.zip``, a new - Mimetype: ``application/zipped-shapefile`` - Mimetype Label: "Shapefile as ZIP Archive" - -WorldMap JoinTargets + API Endpoint ------------------------------------ - -WorldMap supplies target layers -- or JoinTargets -- that a tabular file may be mapped against. A JSON description of these `CGA `_-curated JoinTargets may be retrieved via API at ``http://worldmap.harvard.edu/datatables/api/jointargets/``. Please note: login is required. You may use any WorldMap account credentials via HTTP Basic Auth. - -Example of JoinTarget information returned via the API: - -.. code-block:: json - - { - "data":[ - { - "layer":"geonode:census_tracts_2010_boston_6f6", - "name":"Census Tracts, Boston (GEOID10: State+County+Tract)", - "geocode_type_slug":"us-census-tract", - "geocode_type":"US Census Tract", - "attribute":{ - "attribute":"CT_ID_10", - "type":"xsd:string" - }, - "abstract":"As of the 2010 census, Boston, MA contains 7,288 city blocks [truncated for example]", - "title":"Census Tracts 2010, Boston (BARI)", - "expected_format":{ - "expected_zero_padded_length":-1, - "is_zero_padded":false, - "description":"Concatenation of state, county and tract for 2010 Census Tracts. 
Reference: https://www.census.gov/geo/maps-data/data/tract_rel_layout.html\r\n\r\nNote: Across the US, this can be a zero-padded \"string\" but the original Boston layer has this column as \"numeric\" ", - "name":"2010 Census Boston GEOID10 (State+County+Tract)" - }, - "year":2010, - "id":28 - }, - { - "layer":"geonode:addresses_2014_boston_1wr", - "name":"Addresses, Boston", - "geocode_type_slug":"boston-administrative-geography", - "geocode_type":"Boston, Administrative Geography", - "attribute":{ - "attribute":"LocationID", - "type":"xsd:int" - }, - "abstract":"Unique addresses present in the parcels data set, which itself is derived from [truncated for example]", - "title":"Addresses 2015, Boston (BARI)", - "expected_format":{ - "expected_zero_padded_length":-1, - "is_zero_padded":false, - "description":"Boston, Administrative Geography, Boston Address Location ID. Example: 1, 2, 3...nearly 120000", - "name":"Boston Address Location ID (integer)" - }, - "year":2015, - "id":18 - }, - { - "layer":"geonode:bra_neighborhood_statistical_areas_2012__ug9", - "name":"BRA Neighborhood Statistical Areas, Boston", - "geocode_type_slug":"boston-administrative-geography", - "geocode_type":"Boston, Administrative Geography", - "attribute":{ - "attribute":"BOSNA_R_ID", - "type":"xsd:double" - }, - "abstract":"BRA Neighborhood Statistical Areas 2015, Boston. Provided by [truncated for example]", - "title":"BRA Neighborhood Statistical Areas 2015, Boston (BARI)", - "expected_format":{ - "expected_zero_padded_length":-1, - "is_zero_padded":false, - "description":"Boston, Administrative Geography, Boston BRA Neighborhood Statistical Area ID (integer). Examples: 1, 2, 3, ... 
68, 69", - "name":"Boston BRA Neighborhood Statistical Area ID (integer)" - }, - "year":2015, - "id":17 - } - ], - "success":true - } - -How Geoconnect Uses Join Target Information -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -When a user attempts to map a tabular file, the application looks in the Geoconnect database for ``JoinTargetInformation``. If this information is more than 10 minutes* old, the application will retrieve fresh information and save it to the db. - -(* Change the timing via the Django settings variable ``JOIN_TARGET_UPDATE_TIME``.) - -This JoinTarget info is used to populate HTML forms used to match a tabular file column to a JoinTarget column. Once a JoinTarget is chosen, the JoinTarget ID is an essential piece of information used to make an API call to the WorldMap and attempt to map the file. - -Retrieving Join Target Information from WorldMap API -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The ``get_join_targets()`` function in ``dataverse_layer_services.py`` uses the WorldMap API, retrieves a list of available tabular file JointTargets. (See the `dataverse_layer_services code in GitHub `_.) - -Saving Join Target Information to Geoconnect Database -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The ``get_latest_jointarget_information()`` in ``utils.py`` retrieves recent JoinTarget Information from the database. (See the `utils code in GitHub `_.) - -Setting Up WorldMap Test Data -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -For the dataset page, this script gives a query to add test WorldMap map data. After the query is run, the "Explore Map" button should appear for a tabular file or shapefile. In the example SQL queries below, substitute ``$DATASET_ID`` and ``$DATAFILE_ID`` with the appropriate ID's. - -To add sample map data for a tabular file: - -.. 
code:: - - INSERT INTO maplayermetadata (id, isjoinlayer, joindescription, embedmaplink, layerlink, layername, mapimagelink, worldmapusername, dataset_id, datafile_id) - VALUES (DEFAULT, true, 'This file was joined with WorldMap layer x, y, z', - 'https://worldmap.harvard.edu/maps/embed/?layer=geonode:zip_codes_2015_zip_s9i','https://worldmap.harvard.edu/data/geonode:zip_codes_2015_zip_s9i', - 'geonode:zip_codes_2015_zip_s9i', - 'http://worldmap.harvard.edu/download/wms/27289/png?layers=geonode%3Azip_codes_2015_zip_s9i&width=865&bbox=-71.1911091251%2C42.2270382738%2C-70.9228275369%2C42.3976144794&service=WMS&format=image%2Fpng&srs=EPSG%3A4326&request=GetMap&height=550', - 'admin',$DATASET_ID,$DATAFILE_ID}); - -To add sample map data for a tabular shapefile: - -.. code:: - - INSERT INTO maplayermetadata (id, isjoinlayer, embedmaplink, layerlink, layername, mapimagelink, worldmapusername, dataset_id, datafile_id) - VALUES (DEFAULT, false, - 'https://worldmap.harvard.edu/maps/embed/?layer=geonode:zip_codes_2015_zip_s9i','https://worldmap.harvard.edu/data/geonode:zip_codes_2015_zip_s9i', - 'geonode:zip_codes_2015_zip_s9i', - 'http://worldmap.harvard.edu/download/wms/27289/png?layers=geonode%3Azip_codes_2015_zip_s9i&width=865&bbox=-71.1911091251%2C42.2270382738%2C-70.9228275369%2C42.3976144794&service=WMS&format=image%2Fpng&srs=EPSG%3A4326&request=GetMap&height=550', - 'admin',$DATASET_ID,$DATAFILE_ID); - ----- - -Previous: :doc:`unf/index` | Next: :doc:`remote-users` diff --git a/doc/sphinx-guides/source/developers/globus-api.rst b/doc/sphinx-guides/source/developers/globus-api.rst new file mode 100644 index 00000000000..43c237546be --- /dev/null +++ b/doc/sphinx-guides/source/developers/globus-api.rst @@ -0,0 +1,249 @@ +Globus Transfer API +=================== + +.. 
contents:: |toctitle| + :local: + +The Globus API addresses three use cases: + +* Transfer to a Dataverse-managed Globus endpoint (File-based or using the Globus S3 Connector) +* Reference of files that will remain in a remote Globus endpoint +* Transfer from a Dataverse-managed Globus endpoint + +The ability for Dataverse to interact with Globus endpoints is configured via a Globus store - see :ref:`globus-storage`. + +Globus transfers (or referencing a remote endpoint) for upload and download involve a series of steps. These can be accomplished using the Dataverse and Globus APIs. (These are used internally by the `dataverse-globus app `_ when transfers are done via the Dataverse UI.) + +Requesting Upload or Download Parameters +---------------------------------------- + +The first step in preparing for a Globus transfer/reference operation is to request the parameters relevant for a given dataset: + +.. code-block:: bash + + curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusUploadParameters?persistentId=$PERSISTENT_IDENTIFIER&locale=$LOCALE" + +The response will be of the form: + +..
code-block:: bash + + { + "status": "OK", + "data": { + "queryParameters": { + "datasetId": 29, + "siteUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com", + "datasetVersion": ":draft", + "dvLocale": "en", + "datasetPid": "doi:10.5072/FK2/ILLPXE", + "managed": "true", + "fileSizeLimit": 100000000000, + "remainingQuota": 1000000000000, + "endpoint": "d8c42580-6528-4605-9ad8-116a61982644" + }, + "signedUrls": [ + { + "name": "requestGlobusTransferPaths", + "httpMethod": "POST", + "signedUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com/api/v1/datasets/29/requestGlobusUploadPaths?until=2023-11-22T01:52:03.648&user=dataverseAdmin&method=POST&token=63ac4bb748d12078dded1074916508e19e6f6b61f64294d38e0b528010b07d48783cf2e975d7a1cb6d4a3c535f209b981c7c6858bc63afdfc0f8ecc8a139b44a", + "timeOut": 300 + }, + { + "name": "addGlobusFiles", + "httpMethod": "POST", + "signedUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com/api/v1/datasets/29/addGlobusFiles?until=2023-11-22T01:52:03.648&user=dataverseAdmin&method=POST&token=2aaa03f6b9f851a72e112acf584ffc0758ed0cc8d749c5a6f8c20494bb7bc13197ab123e1933f3dde2711f13b347c05e6cec1809a8f0b5484982570198564025", + "timeOut": 300 + }, + { + "name": "getDatasetMetadata", + "httpMethod": "GET", + "signedUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com/api/v1/datasets/29/versions/:draft?until=2023-11-22T01:52:03.649&user=dataverseAdmin&method=GET&token=1878d6a829cd5540e89c07bdaf647f1bea5314cc7a55433b0b506350dd330cad61ade3714a8ee199a7b464fb3b8cddaea0f32a89ac3bfc4a86cd2ea3004ecbb8", + "timeOut": 300 + }, + { + "name": "getFileListing", + "httpMethod": "GET", + "signedUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com/api/v1/datasets/29/versions/:draft/files?until=2023-11-22T01:52:03.650&user=dataverseAdmin&method=GET&token=78e8ca8321624f42602af659227998374ef3788d0feb43d696a0e19086e0f2b3b66b96981903a1565e836416c504b6248cd3c6f7c2644566979bd16e23a99622", + "timeOut": 300 + } + ] + } + } + +The response includes the id 
for the Globus endpoint to use along with several parameters and signed URLs. The parameters include whether the Globus endpoint is "managed" by Dataverse and, +if so, if there is a "fileSizeLimit" (see :ref:`:MaxFileUploadSizeInBytes`) that will be enforced and/or, if there is a quota (see :doc:`/admin/collectionquotas`) on the overall size of data +that can be uploaded, what the "remainingQuota" is. Both are in bytes. + +Note that while Dataverse will not add files that violate the size or quota rules, Globus itself doesn't enforce these during the transfer. API users should thus check the size of the files +they intend to transfer before submitting a transfer request to Globus. + +The getDatasetMetadata and getFileListing URLs are just signed versions of the standard Dataset metadata and file listing API calls. The other two are Globus-specific. + +If called for a dataset using a store that is configured with remote Globus endpoint(s), the response is similar, but +the "managed" parameter will be false, the "endpoint" parameter is replaced with a JSON array of "referenceEndpointsWithPaths", and the +requestGlobusTransferPaths and addGlobusFiles URLs are replaced with ones for requestGlobusReferencePaths and addFiles. All of these calls are +described further below. + +The call to set up for a transfer out (download) is similar: + +.. code-block:: bash + + curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusDownloadParameters?persistentId=$PERSISTENT_IDENTIFIER&locale=$LOCALE" + +Note that this API call supports an additional downloadId query parameter. This is only used when the globus-dataverse app is called from the Dataverse user interface. There is no need to use it when calling the API directly. + +The returned response includes the same getDatasetMetadata and getFileListing URLs as in the upload case and includes "monitorGlobusDownload" and "requestGlobusDownload" URLs.
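As an illustrative client-side sketch (not part of Dataverse itself), the ``signedUrls`` array in these parameter responses can be indexed by ``name`` so that later steps can look up the signed URL they need:

```python
import json

def index_signed_urls(response_text):
    """Map each entry in a parameters response's "signedUrls" array by its "name"."""
    data = json.loads(response_text)["data"]
    return {entry["name"]: entry["signedUrl"] for entry in data.get("signedUrls", [])}

# Abbreviated example response, following the structure shown above.
sample = '''{
  "status": "OK",
  "data": {
    "queryParameters": {"datasetId": 29, "managed": "true"},
    "signedUrls": [
      {"name": "addGlobusFiles", "httpMethod": "POST",
       "signedUrl": "https://demo.dataverse.org/api/v1/datasets/29/addGlobusFiles?until=2023-11-22T01:52:03.648&token=abc",
       "timeOut": 300}
    ]
  }
}'''

urls = index_signed_urls(sample)
print(urls["addGlobusFiles"])
```

A client would then POST to the looked-up URL without sending an API token, since the URL itself carries the signature.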
The response will also indicate whether the store is "managed" and will provide the "endpoint" from which downloads can be made. + + +Performing an Upload/Transfer In +-------------------------------- + +The information from the API call above can be used to provide a user with information about the dataset and to prepare to transfer (managed=true) or to reference files (managed=false). + +Once the user identifies which files are to be added, the requestGlobusTransferPaths or requestGlobusReferencePaths URLs can be called. These both reference the same API call but must be used with different entries in the JSON body sent: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV + export LOCALE=en-US + export JSON_DATA="... (SEE BELOW)" + + curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths?persistentId=$PERSISTENT_IDENTIFIER" + +Note that when using the dataverse-globus app or the URLs returned from the previous call, the URL for this call will be signed and no API_TOKEN is needed. + +In the managed case, the JSON body sent must include the id of the Globus user that will perform the transfer and the number of files that will be transferred: + +.. code-block:: bash + + { + "principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75", + "numberOfFiles":2 + } + +In the remote reference case, the JSON body sent must include the Globus endpoint/paths that will be referenced: + +.. code-block:: bash + + { + "referencedFiles":[ + "d8c42580-6528-4605-9ad8-116a61982644/hdc1/test1.txt" + ] + } + +The response will include a JSON object. In the managed case, the map is from newly assigned file storageIdentifiers to specific paths on the managed Globus endpoint: + +..
code-block:: bash + + { + "status":"OK", + "data":{ + "globusm://18b49d3688c-62137dcb06e4":"/hdc1/10.5072/FK2/ILLPXE/18b49d3688c-62137dcb06e4", + "globusm://18b49d3688c-5c17d575e820":"/hdc1/10.5072/FK2/ILLPXE/18b49d3688c-5c17d575e820" + } + } + +In the managed case, the specified Globus principal is granted write permission to the specified endpoint/path, +which will allow initiation of a transfer from the external endpoint to the managed endpoint using the Globus API. +The permission will be revoked if the transfer is not started and the next call to Dataverse to finish the transfer is not made within a short time (configurable, default of 5 minutes). + +In the remote/reference case, the map is from the initially supplied endpoint/paths to the newly assigned file storageIdentifiers: + +.. code-block:: bash + + { + "status":"OK", + "data":{ + "d8c42580-6528-4605-9ad8-116a61982644/hdc1/test1.txt":"globus://18bf8c933f4-ed2661e7d19b//d8c42580-6528-4605-9ad8-116a61982644/hdc1/test1.txt" + } + } + + + +Adding Files to the Dataset +--------------------------- + +In the managed case, you must initiate a Globus transfer and take note of its task identifier. As in the JSON example below, you will pass it as ``taskIdentifier`` along with details about the files you are transferring: + +..
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV + export JSON_DATA='{"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521", + "files": [{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b3972213f-f6b5c2221423", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "1234"}}, + {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b39722140-50eb7d3c5ece", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "2345"}}]}' + + curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA" + +Note that the Content-Type is multipart/form-data, matching the /addFiles API call. Also note that the API_TOKEN is not needed when using a signed URL. + +With this information, Dataverse will begin to monitor the transfer and, when it completes, will add all files for which the transfer succeeded. +As the transfer can take significant time and the API call is asynchronous, the only way to determine if the transfer succeeded via API is to use the standard calls to check the dataset lock state and contents. + +Once the transfer completes, Dataverse will remove the write permission for the principal. + +An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This new framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature, it is not enabled by default.
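The lock-state check mentioned above can be scripted. The sketch below is illustrative only; it assumes the standard ``/api/datasets/{id}/locks`` endpoint and that an in-progress Globus upload holds a lock of type ``GlobusUpload`` (verify the lock type on your installation):

```python
import json

def globus_upload_in_progress(locks_response_text):
    """Return True while a /api/datasets/{id}/locks response still shows a Globus upload lock."""
    locks = json.loads(locks_response_text).get("data", [])
    return any(lock.get("lockType") == "GlobusUpload" for lock in locks)

# Example lock responses; a real poller would fetch these with curl or requests.
busy = '{"status":"OK","data":[{"lockType":"GlobusUpload","user":"dataverseAdmin"}]}'
idle = '{"status":"OK","data":[]}'

print(globus_upload_in_progress(busy))  # True
print(globus_upload_in_progress(idle))  # False
```

Once the lock disappears, the file listing call can confirm which files were actually added.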
See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`. + +Note that when using a managed endpoint that uses the Globus S3 Connector, the checksum should be correct as Dataverse can validate it. For file-based endpoints, the checksum should be included if available but Dataverse cannot verify it. + +In the remote/reference case, where there is no transfer to monitor, the standard /addFiles API call (see :ref:`direct-add-to-dataset-api`) is used instead. There are no changes for the Globus case. + +Downloading/Transfer Out Via Globus +----------------------------------- + +To begin downloading files, the requestGlobusDownload URL is used: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV + + curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/requestGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER" + +The JSON body sent should include a list of file ids to download and, for a managed endpoint, the Globus principal that will make the transfer: + +.. code-block:: bash + + export JSON_DATA='{ + "principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75", + "fileIds":[60, 61] + }' + +Note that this API call takes an optional downloadId parameter that is used with the dataverse-globus app. When downloadId is included, the list of fileIds is not needed. + +The response is a JSON object mapping the requested file Ids to Globus endpoint/paths. In the managed case, the principal will have been given read permissions for the specified paths: + +..
code-block:: bash + + { + "status":"OK", + "data":{ + "60": "d8c42580-6528-4605-9ad8-116a61982644/hdc1/10.5072/FK2/ILLPXE/18bf3af9c78-92b8e168090e", + "61": "d8c42580-6528-4605-9ad8-116a61982644/hdc1/10.5072/FK2/ILLPXE/18bf3af9c78-c8d81569305c" + } + } + +For the remote case, the user can perform the transfer without further contact with Dataverse. In the managed case, the user must initiate the transfer via the Globus API and then inform Dataverse. +Dataverse will then monitor the transfer and revoke the read permission when the transfer is complete. (Not making this last call could result in failure of the transfer.) + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV + + curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST -d "$JSON_DATA" "$SERVER_URL/api/datasets/:persistentId/monitorGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER" + +The JSON body sent just contains the task identifier for the transfer: + +.. code-block:: bash + + export JSON_DATA='{ + "taskIdentifier":"b5fd01aa-8963-11ee-83ae-d5484943e99a" + }' + + diff --git a/doc/sphinx-guides/source/developers/index.rst b/doc/sphinx-guides/source/developers/index.rst index 96595220e07..28b1fbaae82 100755 --- a/doc/sphinx-guides/source/developers/index.rst +++ b/doc/sphinx-guides/source/developers/index.rst @@ -4,11 +4,12 @@ contain the root `toctree` directive. Developer Guide -======================================================= +=============== **Contents:** ..
toctree:: + :maxdepth: 2 intro dev-environment @@ -18,13 +19,18 @@ Developer Guide version-control sql-upgrade-scripts testing - documentation + api-design + security + performance dependencies debugging coding-style + configuration deployment containers making-releases + making-library-releases + metadataexport tools unf/index make-data-count @@ -32,4 +38,13 @@ Developer Guide geospatial selinux big-data-support + aux-file-support + s3-direct-upload-api + globus-api + dataset-semantic-metadata-api + dataset-migration-api workflows + fontcustom + classic-dev-env + search-services + diff --git a/doc/sphinx-guides/source/developers/intro.rst b/doc/sphinx-guides/source/developers/intro.rst index ea8e924b4ef..0e74dc1c36f 100755 --- a/doc/sphinx-guides/source/developers/intro.rst +++ b/doc/sphinx-guides/source/developers/intro.rst @@ -2,7 +2,7 @@ Introduction ============ -Welcome! `Dataverse `_ is an `open source `_ project that loves `contributors `_! +Welcome! `The Dataverse Project `_ is an `open source `_ project that loves contributors! .. contents:: |toctitle| :local: @@ -10,7 +10,7 @@ Welcome! `Dataverse `_ is an `open source `_ mailing list, `community calls `_, or support@dataverse.org. +If you have any questions at all, please reach out to other developers via https://chat.dataverse.org, the `dataverse-dev `_ mailing list, the `dataverse-community `_ mailing list, or `community calls `_. + +.. _core-technologies: Core Technologies ----------------- -Dataverse is a `Java EE `_ application that is compiled into a war file and deployed to an application server (Glassfish) which is configured to work with a relational database (PostgreSQL) and a search engine (Solr). +Dataverse is a `Jakarta EE `_ application that is compiled into a WAR file and deployed to an application server (app server) which is configured to work with a relational database (PostgreSQL) and a search engine (Solr). 
+ +We make use of a variety of Jakarta EE technologies such as JPA, JAX-RS, JMS, and JSF. In addition, we use parts of Eclipse MicroProfile such as `MicroProfile Config `_. -We make use of a variety of Java EE technologies such as JPA, JAX-RS, JMS, and JSF. The front end is built using PrimeFaces and Bootstrap. +The frontend is built using PrimeFaces and Bootstrap. A new frontend is being built using React at https://github.com/IQSS/dataverse-frontend Roadmap ------- -For the Dataverse development roadmap, please see https://www.iq.harvard.edu/roadmap-dataverse-project +For the roadmap, please see https://www.iq.harvard.edu/roadmap-dataverse-project + +.. _kanban-board: Kanban Board ------------ -You can get a sense of what's currently in flight (in dev, in QA, etc.) by looking at https://github.com/orgs/IQSS/projects/2 +You can get a sense of what's currently in flight (in dev, in QA, etc.) by looking at https://github.com/orgs/IQSS/projects/34 Issue Tracker ------------- -We use GitHub Issues as our issue tracker: https://github.com/IQSS/dataverse/issues +The main issue tracker is https://github.com/IQSS/dataverse/issues but note that individual projects have their own issue trackers. Related Guides -------------- -If you are a developer who wants to make use of Dataverse APIs, please see the :doc:`/api/index`. If you have front-end UI questions, please see the :doc:`/style/index`. +If you are wondering about how to contribute generally, please see the :doc:`/contributor/index`. + +If you are a developer who wants to make use of the Dataverse APIs, please see the :doc:`/api/index`. + +If you have frontend UI questions, please see the :doc:`/style/index`. For the new frontend, see https://github.com/IQSS/dataverse-frontend -If you are a sysadmin who likes to code, you may be interested in hacking on installation scripts mentioned in the :doc:`/installation/index`. 
We validate the installation scripts with :doc:`/developers/tools` such as `Vagrant `_ and Docker (see the :doc:`containers` section). +If you are a Docker enthusiast, please check out the :doc:`/container/index`. + +.. _related-projects: Related Projects ---------------- +Note: this list is somewhat old. Please see also the :doc:`/contributor/code` section of the Contributor Guide. + As a developer, you also may be interested in these projects related to Dataverse: -- External Tools - add additional features to Dataverse without modifying the core: :doc:`/api/external-tools` -- Dataverse API client libraries - use Dataverse APIs from various languages: :doc:`/api/client-libraries` -- DVUploader - a stand-alone command-line Java application that uses the Dataverse API to support upload of files from local disk to a Dataset: https://github.com/IQSS/dataverse-uploader +- External Tools - add additional features to the Dataverse Software without modifying the core: :doc:`/api/external-tools` +- Dataverse Software API client libraries - use Dataverse Software APIs from various languages: :doc:`/api/client-libraries` +- DVUploader - a stand-alone command-line Java application that uses the Dataverse Software API to support upload of files from local disk to a Dataset: https://github.com/IQSS/dataverse-uploader - dataverse-sample-data - populate your Dataverse installation with sample data: https://github.com/IQSS/dataverse-sample-data -- dataverse-metrics - aggregate and visualize metrics for installations of Dataverse around the world: https://github.com/IQSS/dataverse-metrics -- Configuration management scripts - Ansible, Puppet, etc.: See "Advanced Installation" in the :doc:`/installation/prep` section of the Installation Guide.
+- dataverse-metrics - aggregate and visualize metrics for Dataverse installations around the world: https://github.com/IQSS/dataverse-metrics +- Configuration management scripts - Ansible, Puppet, etc.: See :ref:`advanced` section in the Installation Guide. - :doc:`/developers/unf/index` (Java) - a Universal Numerical Fingerprint: https://github.com/IQSS/UNF -- GeoConnect (Python) - create a map by uploading files to Dataverse: https://github.com/IQSS/geoconnect - `DataTags `_ (Java and Scala) - tag datasets with privacy levels: https://github.com/IQSS/DataTags -- `TwoRavens `_ (Javascript) - a `d3.js `_ interface for exploring data and running Zelig models: https://github.com/IQSS/TwoRavens -- `Zelig `_ (R) - run statistical models on files uploaded to Dataverse: https://github.com/IQSS/Zelig - `Matrix `_ - a visualization showing the connectedness and collaboration between authors and their affiliations. -- Third party apps - make use of Dataverse APIs: :doc:`/api/apps` -- chat.dataverse.org - chat interface for Dataverse users and developers: https://github.com/IQSS/chat.dataverse.org +- Third party apps - make use of Dataverse installation APIs: :doc:`/api/apps` +- chat.dataverse.org - chat interface for Dataverse Project users and developers: https://github.com/IQSS/chat.dataverse.org - [Your project here] :) - ----- - -Next: :doc:`dev-environment` diff --git a/doc/sphinx-guides/source/developers/make-data-count.rst b/doc/sphinx-guides/source/developers/make-data-count.rst index 0bb7e9e0ffd..9fb41f67be4 100644 --- a/doc/sphinx-guides/source/developers/make-data-count.rst +++ b/doc/sphinx-guides/source/developers/make-data-count.rst @@ -1,7 +1,7 @@ Make Data Count =============== -Support for Make Data Count is a feature of Dataverse that is described in the :doc:`/admin/make-data-count` section of the Admin Guide. In order for developers to work on the feature, they must install Counter Processor, a Python 3 application, as described below. 
Counter Processor can be found at https://github.com/CDLUC3/counter-processor +Support for Make Data Count is a feature of the Dataverse Software that is described in the :doc:`/admin/make-data-count` section of the Admin Guide. In order for developers to work on the feature, they must install Counter Processor, a Python 3 application, as described below. Counter Processor can be found at https://github.com/gdcc/counter-processor .. contents:: |toctitle| :local: @@ -9,7 +9,7 @@ Support for Make Data Count is a feature of Dataverse that is described in the : Architecture ------------ -There are many components involved in Dataverse's architecture for Make Data Count as shown in the diagram below. +There are many components involved in the Dataverse Software's architecture for Make Data Count as shown in the diagram below. |makedatacount_components| @@ -28,20 +28,18 @@ To insert a citation you could insert a row like below, changing "72" in the exa Full Setup ~~~~~~~~~~ -The recommended way to work on the Make Data Count feature is to spin up an EC2 instance that has both Dataverse and Counter Processor installed. Go to the :doc:`deployment` page for details on how to spin up an EC2 instance and make sure that your Ansible file is configured to install Counter Processor before running the "create" script. +The recommended way to work on the Make Data Count feature is to spin up an EC2 instance that has both the Dataverse Software and Counter Processor installed. Go to the :doc:`deployment` page for details on how to spin up an EC2 instance and make sure that your Ansible file is configured to install Counter Processor before running the "create" script. -(Alternatively, you can try installing Counter Processor in Vagrant. :download:`setup-counter-processor.sh <../../../../scripts/vagrant/setup-counter-processor.sh>` might help you get it installed.) 
+After you have spun up your EC2 instance, set ``:MDCLogPath`` so that the Dataverse installation creates a log for Counter Processor to operate on. For more on this database setting, see the :doc:`/installation/config` section of the Installation Guide. -After you have spun to your EC2 instance, set ``:MDCLogPath`` so that Dataverse creates a log for Counter Processor to operate on. For more on this database setting, see the :doc:`/installation/config` section of the Installation Guide. +Next you need to have the Dataverse installation add some entries to the log that Counter Processor will operate on. To do this, click on some published datasets and download some files. -Next you need to have Dataverse add some entries to the log that Counter Processor will operate on. To do this, click on some published datasets and download some files. +Next you should run Counter Processor to convert the log into a SUSHI report, which is in JSON format. Before running Counter Processor, you need to put a configuration file into place. As a starting point use :download:`counter-processor-config.yaml <../_static/developers/counter-processor-config.yaml>` and edit the file, paying particular attention to the following settings: -Next you should run Counter Processor to convert the log into a SUSHI report, which is in JSON format. Before running Counter Processor, you need to put a configuration file into place. As a starting point use :download:`counter-processor-config.yaml <../../../../scripts/vagrant/counter-processor-config.yaml>` and edit the file, paying particular attention to the following settings: - -- ``log_name_pattern`` You might want something like ``/usr/local/glassfish4/glassfish/domains/domain1/logs/counter_(yyyy-mm-dd).log`` +- ``log_name_pattern`` You might want something like ``/usr/local/payara6/glassfish/domains/domain1/logs/counter_(yyyy-mm-dd).log`` - ``year_month`` You should probably set this to the current month.
-- ``output_file`` This needs to be a directory that the "glassfish" Unix user can read but that the "counter" user can write to. In dev, you can probably get away with "/tmp" as the directory. -- ``platform`` Out of the box from Counter Processor this is set to ``Dash`` but we're not 100% sure if this should be "Dataverse" or a branch for a Dataverse installation like "LibreScholar". +- ``output_file`` This needs to be a directory that the "dataverse" Unix user can read but that the "counter" user can write to. In dev, you can probably get away with "/tmp" as the directory. +- ``platform`` Out of the box from Counter Processor this is set to ``Dash`` but this should be changed to match the name of your Dataverse installation. Examples are "Harvard Dataverse Repository" for Harvard University or "LibraData" for the University of Virginia. - ``upload_to_hub`` This should be "False" unless you are testing sending SUSHI reports to the DataCite hub. - ``simulate_date`` You should probably set this to tomorrow. @@ -51,9 +49,9 @@ Once you are done with your configuration, you can run Counter Processor like th ``su - counter`` -``cd /usr/local/counter-processor-0.0.1`` +``cd /usr/local/counter-processor-1.06`` -``CONFIG_FILE=counter-processor-config.yaml python36 main.py`` +``CONFIG_FILE=counter-processor-config.yaml python39 main.py`` (Please note that the Counter Processor README says you can also pass in values like ``START_DATE``, ``END_DATE`` etc. at the command line if you find this to be more convenient.) 
@@ -71,12 +69,12 @@ If all this is working and you want to send data to the test instance of the Dat ``curl --header "Content-Type: application/json; Accept: application/json" -H "Authorization: Bearer $JSON_WEB_TOKEN" -X POST https://api.test.datacite.org/reports/ -d @sushi_report.json`` -For how to put citations into your dev database and how to get them out again, see "Configuring Dataverse for Make Data Count Citations" in the :doc:`/admin/make-data-count` section of the Admin Guide. +For how to put citations into your dev database and how to get them out again, see the :ref:`MDC-updateCitationsForDataset` section of the Make Data Count page in the Admin Guide. -Testing Make Data Count and Dataverse ------------------------------------- +Testing Make Data Count and Your Dataverse Installation +------------------------------------------------------- -A developer running Counter Processor alongside Dataverse for development or testing purposes will notice that once the raw Dataverse logs have been processed, there is no straightforward way to re-test those same logs. +A developer running Counter Processor alongside the Dataverse installation for development or testing purposes will notice that once the raw Dataverse installation logs have been processed, there is no straightforward way to re-test those same logs.
The first thing to fix is to clear two files from the Counter Processor ``state`` folder, ``statefile.json`` and ``counter_db_[yyyy-mm].sqlite3`` @@ -84,11 +82,42 @@ Second, if you are also sending your SUSHI report to Make Data Count, you will n ``curl -H "Authorization: Bearer $JSON_WEB_TOKEN" -X DELETE https://$MDC_SERVER/reports/$REPORT_ID`` -To get the ``REPORT_ID``, look at the logs generated in ``/usr/local/counter-processor-0.0.1/tmp/datacite_response_body.txt`` +To get the ``REPORT_ID``, look at the logs generated in ``/usr/local/counter-processor-1.06/tmp/datacite_response_body.txt`` To read more about the Make Data Count API, see https://github.com/datacite/sashimi -You can compare the MDC metrics display with Dataverse's original by toggling the ``:DisplayMDCMetrics`` setting (true by default to display MDC metrics). +You can compare the MDC metrics display with the Dataverse installation's original by toggling the ``:DisplayMDCMetrics`` setting (true by default to display MDC metrics). + +Processing Archived Logs +------------------------ + +A new script (release date TBD) will be available for processing archived Dataverse log files. Monthly logs that are zipped, TARed, and copied to an archive can be processed by this script running nightly or weekly. + +The script will keep track of the state of each tar file as it is processed and will make use of the following "processingState" API endpoints, which allow the state of each file to be checked or modified. + +The possible states are new, done, skip, processing, and failed. + +Setting the state to "skip" will prevent the file from being processed if the developer needs to analyze the contents. + +"failed" files will be re-tried in a later run. + +"done" files are successful and will be ignored going forward. + +The files currently being processed will have the state "processing". + +The script will process the newest set of log files (merging files from multiple nodes) and call Counter Processor.
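The state handling described above can be sketched as a small predicate (hypothetical helper code, not part of the actual script): "new" and "failed" files are eligible for a run, while "done", "skip", and "processing" are not:

```python
# Hypothetical sketch of the processingState handling described above;
# the real script's internals may differ.
ELIGIBLE = {"new", "failed"}              # will be (re-)processed
INELIGIBLE = {"done", "skip", "processing"}

def should_process(state):
    """Return True if a tar file with this processingState should be picked up."""
    if state in ELIGIBLE:
        return True
    if state in INELIGIBLE:
        return False
    raise ValueError(f"unknown processingState: {state}")

for s in ("new", "failed", "done", "skip", "processing"):
    print(s, should_process(s))
```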
+ +APIs to manage the states include GET, POST, and DELETE (for testing), as shown below. + +Note: ``yearMonth`` must be in the format ``yyyymm`` or ``yyyymmdd``. +Note: If running the new script on multiple servers, add the query parameter &server=serverName on the first POST call. The server name cannot be changed once set. To clear the name out, you must delete the state and post a new one. + +``curl -X GET http://localhost:8080/api/admin/makeDataCount/{yearMonth}/processingState`` + +``curl -X POST http://localhost:8080/api/admin/makeDataCount/{yearMonth}/processingState?state=processing&server=server1`` +``curl -X POST http://localhost:8080/api/admin/makeDataCount/{yearMonth}/processingState?state=done`` + +``curl -X DELETE http://localhost:8080/api/admin/makeDataCount/{yearMonth}/processingState`` Resources --------- diff --git a/doc/sphinx-guides/source/developers/making-library-releases.rst b/doc/sphinx-guides/source/developers/making-library-releases.rst new file mode 100755 index 00000000000..be867f9196a --- /dev/null +++ b/doc/sphinx-guides/source/developers/making-library-releases.rst @@ -0,0 +1,146 @@ +======================= +Making Library Releases +======================= + +.. contents:: |toctitle| + :local: + +Introduction +------------ + +Note: See :doc:`making-releases` for Dataverse itself. + +We release Java libraries to Maven Central that are used by Dataverse (and perhaps `other `_ `software `_!): + +- https://central.sonatype.com/namespace/org.dataverse +- https://central.sonatype.com/namespace/io.gdcc + +We release JavaScript/TypeScript libraries to npm: + +- https://www.npmjs.com/package/@iqss/dataverse-design-system + +Maven Central (Java) +-------------------- + +From the perspective of Maven Central, we are both `producers `_ because we publish/release libraries there and `consumers `_ because we pull down those libraries (and many others) when we build Dataverse.
+ +Releasing Existing Libraries to Maven Central +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you need to release an existing library, all the setup should be done already. The steps below assume that GitHub Actions are in place to do the heavy lifting for you, such as signing artifacts with GPG. + +Releasing a Snapshot Version to Maven Central +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +`Snapshot `_ releases are published automatically through GitHub Actions (e.g. through a `snapshot workflow `_ for the SWORD library) every time a pull request is merged (or the default branch, typically ``main``, is otherwise updated). + +That is to say, to make a snapshot release, you only need to get one or more commits into the default branch. + +Releasing a Release (Non-Snapshot) Version to Maven Central +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +From a pom.xml it may not be apparent that snapshots like ``6.0-SNAPSHOT`` might be changing under your feet. Browsing the snapshot repository (e.g. our `UNF 6.0-SNAPSHOT `_) may reveal versions changing over time. To finalize the code and stop it from changing, we publish/release what Maven calls a "`release version `_". This will remove ``-SNAPSHOT`` from the version (through an ``mvn`` command). + +Non-snapshot releases (`release `_ versions) are published automatically through GitHub Actions (e.g. through a `release workflow `_), kicked off locally by an ``mvn`` command that invokes the `Maven Release Plugin `_. + +First, run a clean: + +``mvn release:clean`` + +Then run a prepare: + +``mvn release:prepare`` + +The prepare step is interactive. You will be prompted for the following information: + +- the release version (e.g. `2.0.0 `_) +- the git tag to create and push (e.g. `sword2-server-2.0.0 `_) +- the next development (snapshot) version (e.g. `2.0.1-SNAPSHOT `_) + +These examples are from the SWORD library. Below is what to expect from the interactive session.
In many cases, you can just hit enter to accept the defaults. + +.. code-block:: bash + + [INFO] 5/17 prepare:map-release-versions + What is the release version for "SWORD v2 Common Server Library (forked)"? (sword2-server) 2.0.0: : + [INFO] 6/17 prepare:input-variables + What is the SCM release tag or label for "SWORD v2 Common Server Library (forked)"? (sword2-server) sword2-server-2.0.0: : + [INFO] 7/17 prepare:map-development-versions + What is the new development version for "SWORD v2 Common Server Library (forked)"? (sword2-server) 2.0.1-SNAPSHOT: : + [INFO] 8/17 prepare:rewrite-poms-for-release + +Note that a commit or two will be made and pushed but if you do a ``git status`` you will see that locally you are behind by that number of commits. To fix this, you can just do a ``git pull``. + +It can take some time for the jar to be visible on Maven Central. You can start by looking on the repo1 server, like this: https://repo1.maven.org/maven2/io/gdcc/sword2-server/2.0.0/ + +Don't bother putting the new version in a pom.xml until you see it on repo1. + +Note that the next snapshot release should be available as well, like this: https://s01.oss.sonatype.org/content/groups/staging/io/gdcc/sword2-server/2.0.1-SNAPSHOT/ + +Releasing a New Library to Maven Central +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +At a high level: + +- Start with a snapshot release. +- Use an existing pom.xml as a starting point. +- Use existing GitHub Actions workflows as a starting point. +- Create secrets in the new library's GitHub repo used by the workflow. +- If you need an entire new namespace, look at previous issues such as https://issues.sonatype.org/browse/OSSRH-94575 and https://issues.sonatype.org/browse/OSSRH-94577 + +Updating pom.xml for a Snapshot Release +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Before publishing a final version to Maven Central, you should publish a snapshot release or two. 
For each snapshot release you publish, the jar name will be unique each time (e.g. ``foobar-0.0.1-20240430.175110-3.jar``), so you can safely publish over and over with the same version number.
+
+We use the `Nexus Staging Maven Plugin `_ to push snapshot releases to https://s01.oss.sonatype.org/content/groups/staging/io/gdcc/ and https://s01.oss.sonatype.org/content/groups/staging/org/dataverse/
+
+Add the following to your pom.xml:
+
+.. code-block:: xml
+
+    <version>0.0.1-SNAPSHOT</version>
+
+    <distributionManagement>
+        <snapshotRepository>
+            <id>ossrh</id>
+            <url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
+        </snapshotRepository>
+        <repository>
+            <id>ossrh</id>
+            <url>https://s01.oss.sonatype.org/service/local/staging/deploy/maven2/</url>
+        </repository>
+    </distributionManagement>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.sonatype.plugins</groupId>
+                <artifactId>nexus-staging-maven-plugin</artifactId>
+                <version>${nexus-staging.version}</version>
+                <extensions>true</extensions>
+                <configuration>
+                    <serverId>ossrh</serverId>
+                    <nexusUrl>https://s01.oss.sonatype.org</nexusUrl>
+                    <autoReleaseAfterClose>true</autoReleaseAfterClose>
+                </configuration>
+            </plugin>
+        </plugins>
+    </build>
+
+Configuring Secrets
+~~~~~~~~~~~~~~~~~~~
+
+In GitHub, you will likely need to configure the following secrets:
+
+- DATAVERSEBOT_GPG_KEY
+- DATAVERSEBOT_GPG_PASSWORD
+- DATAVERSEBOT_SONATYPE_TOKEN
+- DATAVERSEBOT_SONATYPE_USERNAME
+
+Note that some of these secrets might be configured at the org level (e.g. gdcc or IQSS).
+
+Many of the automated tasks are performed by the dataversebot account on GitHub: https://github.com/dataversebot
+
+npm (JavaScript/TypeScript)
+---------------------------
+
+Currently, publishing `@iqss/dataverse-design-system `_ to npm is done manually. We plan to automate this as part of https://github.com/IQSS/dataverse-frontend/issues/140
+
+https://www.npmjs.com/package/js-dataverse is the previous 1.0 version of js-dataverse. No 1.x releases are planned.
We plan to publish 2.0 (used by the new frontend) as discussed in https://github.com/IQSS/dataverse-frontend/issues/13 diff --git a/doc/sphinx-guides/source/developers/making-releases.rst b/doc/sphinx-guides/source/developers/making-releases.rst index cbd88b1a357..028b80e2892 100755 --- a/doc/sphinx-guides/source/developers/making-releases.rst +++ b/doc/sphinx-guides/source/developers/making-releases.rst @@ -5,75 +5,405 @@ Making Releases .. contents:: |toctitle| :local: -Use the number of the milestone with a "v" in front for the release tag. For example: ``v4.6.2``. +Introduction +------------ -Create the release GitHub issue and branch ------------------------------------------- +This document is about releasing the main Dataverse app (https://github.com/IQSS/dataverse). See :doc:`making-library-releases` for how to release our various libraries. Other projects have their own release documentation. -Use the GitHub issue number and the release tag for the name of the branch. -For example: 4734-update-v-4.8.6-to-4.9 +Below you'll see branches like "develop" and "master" mentioned. For more on our branching strategy, see :doc:`version-control`. -**Note:** the changes below must be the very last commits merged into the develop branch before it is merged into master and tagged for the release! +Regular or Hotfix? +------------------ -Make the following changes in the release branch: +Early on, make sure it's clear what type of release this is. The steps below describe making both regular releases and hotfix releases. -1. Bump Version Numbers -======================= +- regular -Increment the version number to the milestone (e.g. 4.6.2) in the following two files: + - e.g. 6.5 (minor) + - e.g. 7.0 (major) -- pom.xml -- doc/sphinx-guides/source/conf.py (two places) +- hotfix -Add the version being released to the lists in the following two files: + - e.g. 6.4.1 (patch) + - e.g. 
7.0.1 (patch) -- doc/sphinx-guides/source/versions.rst -- scripts/database/releases.txt +Ensure Issues Have Been Created +------------------------------- -Here's an example commit where three of the four files above were updated at once: https://github.com/IQSS/dataverse/commit/99e23f96ec362ac2f524cb5cd80ca375fa13f196 +Some of the steps in this document are well-served by having their own dedicated GitHub issue. You'll see a label like this on them: -2. Check in the Changes Above... -================================ +|dedicated| -... into the release branch, make a pull request and merge the release branch into develop. +There are a variety of reasons why a step might deserve its own dedicated issue: +- The step can be done by a team member other than the person doing the release. +- Stakeholders might be interested in the status of a step (e.g. has the release been deployed to the demo site). -Merge "develop" into "master" ------------------------------ +Steps don't get their own dedicated issue if it would be confusing to have multiple people involved. Too many cooks in the kitchen, as they say. Also, some steps are so small the overhead of an issue isn't worth it. + +Before the release even begins you can coordinate with the project manager about the creation of these issues. + +.. |dedicated| raw:: html + + + Dedicated Issue +   + +Declare a Code Freeze +--------------------- + +The following steps are made more difficult if code is changing in the "develop" branch. Declare a code freeze until the release is out. Do not allow pull requests to be merged. + +For a hotfix, a code freeze (no merging) is necessary not because we want code to stop changing in the branch being hotfix released, but because bumping the version used in Jenkins/Ansible means that API tests will fail in pull requests until the version is bumped in those pull requests. 
+ +Conduct Performance Testing +--------------------------- -The "develop" branch should be merged into "master" before tagging. See also the branching strategy described in the :doc:`version-control` section. +|dedicated| + +See :doc:`/qa/performance-tests` for details. + +Conduct Regression Testing +--------------------------- + +|dedicated| + +See :doc:`/qa/testing-approach` for details. +Refer to the provided regression checklist for the list of items to verify during the testing process: `Regression Checklist `_. + +.. _write-release-notes: Write Release Notes ------------------- -Developers should express the need for an addition to release notes by creating a file in ``/doc/release-notes`` containing the name of the issue they're working on. The name of the branch could be used for the filename with ".md" appended (release notes are written in Markdown) such as ``5053-apis-custom-homepage.md``. +|dedicated| + +Developers express the need for an addition to release notes by creating a "release note snippet" in ``/doc/release-notes`` containing the name of the issue they're working on. The name of the branch could be used for the filename with ".md" appended (release notes are written in Markdown) such as ``5053-apis-custom-homepage.md``. See :ref:`writing-release-note-snippets` for how this is described for contributors. + +The task at or near release time is to collect these snippets into a single file. + +- Find the issue in GitHub that tracks the work of creating release notes for the upcoming release. +- Create a branch, add a .md file for the release (ex. 5.10.1 Release Notes) in ``/doc/release-notes`` and write the release notes, making sure to pull content from the release note snippets mentioned above. Snippets may not include any issue number or pull request number in the text so be sure to copy the number from the filename of the snippet into the final release note. 
+- Delete (``git rm``) the release note snippets as the content is added to the main release notes file. +- Include instructions describing the steps required to upgrade the application from the previous version. These must be customized for release numbers and special circumstances such as changes to metadata blocks and infrastructure. +- Make a pull request. Here's an example: https://github.com/IQSS/dataverse/pull/11613 +- Note that we won't merge the release notes until after we have confirmed that the upgrade instructions are valid by performing a couple upgrades. + +For a hotfix, don't worry about release notes yet. + +Deploy Release Candidate to Internal +------------------------------------ + +|dedicated| + +First, build the release candidate. For a regular release, you will use the "develop" branch, as shown below. For a hotfix, you will use whatever branch name is used for the hotfix. + +Go to https://jenkins.dataverse.org/job/IQSS_Dataverse_Internal/ and make the following adjustments to the config: + +- Repository URL: ``https://github.com/IQSS/dataverse.git`` +- Branch Specifier (blank for 'any'): ``*/develop`` +- Execute shell: Update version in filenames to ``dataverse-5.10.war`` (for example) + +Click "Save" then "Build Now". The release candidate war file will be available at https://jenkins.dataverse.org/job/IQSS_Dataverse_Internal/ws/target/ + +ssh into the dataverse-internal server and download the release candidate war file from the URL above. + +Go to /doc/release-notes, open the release-notes.md file for the release we're working on, and perform all the steps under "Upgrade Instructions". Note that for regular releases, we haven't bumped the version yet so you won't be able to follow the steps exactly. (For hotfix releases, the version will be bumped already.) 
+
+Deploy Release Candidate to Demo
+--------------------------------
+
+|dedicated|
+
+Deploy the same war file to https://demo.dataverse.org using the same upgrade instructions as above.
+
+Merge Release Notes (Once Ready)
+--------------------------------
+
+If the upgrade instructions are perfect, simply merge the release notes.
+
+If the upgrade instructions aren't quite right, work with the authors of the release notes until they are good enough, and then merge.
+
+For a hotfix, there are no release notes to merge yet.
+
+Prepare Release Branch
+----------------------
+
+|dedicated|
+
+The release branch will have the final changes such as bumping the version number.
+
+Usually we branch from the "develop" branch to create the release branch. If we are creating a hotfix for a particular version (5.11, for example), we branch from the tag (e.g. ``v5.11``).
+
+Create a release branch named after the issue that tracks bumping the version, with a descriptive name like "10852-bump-to-6.4" from https://github.com/IQSS/dataverse/pull/10871.
+
+**Note:** the changes below must be the very last commits merged into the develop branch before it is merged into master and tagged for the release!
+
+Make the following changes in the release branch.
+
+Increment the version number to the milestone (e.g. 5.10.1) in the following two files:
+
+- modules/dataverse-parent/pom.xml -> ``<properties>`` -> ``<revision>`` (e.g. `pom.xml commit `_)
+- doc/sphinx-guides/source/conf.py
+
+In the following ``versions.rst`` file:
+
+- doc/sphinx-guides/source/versions.rst - Below the ``- |version|`` bullet (``|version|`` comes from the ``conf.py`` file you just edited), add a bullet for what is soon to be the previous release.
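The version bumps above are plain string edits, so they are easy to script and eyeball. Here is a throwaway sketch in a scratch directory (the file contents and version numbers are illustrative, not the real ``conf.py``; always review the diff before committing anything like this):

```shell
# Rehearse the version bump on a fake conf.py in a scratch directory.
# Illustrative only: 6.4 -> 6.5 stand in for the real version numbers.
cd "$(mktemp -d)"
printf "version = '6.4'\nrelease = '6.4'\n" > conf.py
sed -i "s/'6\.4'/'6.5'/g" conf.py   # GNU sed; on macOS use: sed -i '' ...
cat conf.py
```

In the real repo you would run the equivalent edit against ``doc/sphinx-guides/source/conf.py`` and the parent pom, then check ``git diff``.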
+
+Return to the parent pom and make the following change, which is necessary for proper tagging of images:
+
+- modules/dataverse-parent/pom.xml -> ``<profiles>`` -> profile "ct" -> ``<properties>`` -> Set ``<base.image.version>`` to ``${revision}``
+
+When testing the version change in Docker, note that you will have to build the base image manually. See :ref:`base-image-build-instructions`.
+
+(Before you make this change the value should be ``${parsedVersion.majorVersion}.${parsedVersion.nextMinorVersion}``. Later on, after cutting a release, we'll change it back to that value.)
+
+For a regular release, make the changes above in the release branch you created, but hold off for a moment on making a pull request, because Jenkins will fail while it is still testing the previous release.
+
+In the dataverse-ansible repo, bump the version in `jenkins.yml `_ and make a pull request such as https://github.com/gdcc/dataverse-ansible/pull/386. Wait for it to be merged. Note that bumping on the Jenkins side like this will mean that all pull requests will show failures in Jenkins until they are updated to the version we are releasing.
+
+Once dataverse-ansible has been merged, return to the branch you created above ("10852-bump-to-6.4" or whatever) and make a pull request. Ensure that all tests are passing and then put the PR through the normal review and QA process.
+
+If you are making a hotfix release, ``<base.image.version>`` should already be set to ``${revision}``. If so, leave it alone. Go ahead and do the normal bumping of version numbers described above. Make the pull request against the "master" branch. Put it through review and QA. Do not delete the branch after merging because we will later merge it into the "develop" branch to pick up the hotfix. More on this later.
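For orientation, the property flip described above looks roughly like this inside the "ct" profile (a pom.xml sketch; the property name ``base.image.version`` is an assumption based on :ref:`base-image-supported-tags` — confirm it against the actual parent pom):

```xml
<!-- during normal development (and again after the release is cut): -->
<base.image.version>${parsedVersion.majorVersion}.${parsedVersion.nextMinorVersion}</base.image.version>

<!-- in the release branch: -->
<base.image.version>${revision}</base.image.version>
```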
+ +Merge "develop" into "master" (non-hotfix only) +----------------------------------------------- + +If this is a regular (non-hotfix) release, create a pull request to merge the "develop" branch into the "master" branch using this "compare" link: https://github.com/IQSS/dataverse/compare/master...develop + +Once important tests have passed (compile, unit tests, etc.), merge the pull request (skipping code review is ok). Don't worry about style tests failing such as for shell scripts. + +If this is a hotfix release, skip this whole "merge develop to master" step (the "develop" branch is not involved until later). + +Add Milestone to Pull Requests and Issues +----------------------------------------- + +Often someone is making sure that the proper milestone (e.g. 5.10.1) is being applied to pull requests and issues, but sometimes this falls between the cracks. + +Check for merged pull requests that have no milestone by going to https://github.com/IQSS/dataverse/pulls and entering `is:pr is:merged no:milestone `_ as a query. If you find any, add the milestone to the pull request and any issues it closes. This includes the "merge develop into master" pull request above. + +(Optional) Test Docker Images +----------------------------- + +After the "master" branch has been updated and the GitHub Action to build and push Docker images has run (see `PR #9776 `_), go to https://hub.docker.com/u/gdcc and make sure the "latest" tag for the following images has been updated: + +- https://hub.docker.com/r/gdcc/base +- https://hub.docker.com/r/gdcc/dataverse +- https://hub.docker.com/r/gdcc/configbaker + +TODO: Get https://github.com/gdcc/api-test-runner working. + +.. 
_build-guides: + +Build the Guides for the Release +-------------------------------- + +Go to https://jenkins.dataverse.org/job/guides.dataverse.org/ and make the following adjustments to the config: + +- Repository URL: ``https://github.com/IQSS/dataverse.git`` +- Branch Specifier (blank for 'any'): ``*/master`` +- ``VERSION`` (under "Build Steps"): bump to the next release. Don't prepend a "v". Use ``5.10.1`` (for example) + +Click "Save" then "Build Now". + +Make sure the guides directory appears in the expected location such as https://guides.dataverse.org/en/5.10.1/ + +As described below, we'll soon point the "latest" symlink to that new directory. + +Create a Draft Release on GitHub +-------------------------------- -At or near release time: +Go to https://github.com/IQSS/dataverse/releases/new to start creating a draft release. -- Create an issue in Github to track the work of creating release notes for the upcoming release -- Create a branch, add a .md file for the release (ex. 4.16 Release Notes) in ``/doc/release-notes`` and write the release notes, making sure to pull content from the issue-specific release notes mentioned above -- Delete the previously-created, issue-specific release notes as the content is added to the main release notes file -- Take the release notes .md through the regular Code Review and QA process -- Create a draft release at https://github.com/IQSS/dataverse/releases/new -- The "tag version" and "title" should be the number of the milestone with a "v" in front (i.e. v4.16). -- Copy in the content from the .md file -- For the description, follow post-4.16 examples at https://github.com/IQSS/dataverse/releases +- Under "Choose a tag" you will be creating a new tag. Have it start with a "v" such as ``v5.10.1``. Click "Create new tag on publish". +- Under "Target", choose "master". This commit will appear in ``/api/info/version`` from a running installation. +- Under "Release title" use the same name as the tag such as ``v5.10.1``. 
+- In the description, copy and paste the content from the release notes .md file created in the "Write Release Notes" steps above.
+- Click "Save draft" because we do not want to publish the release yet.
+
+At this point you can send around the draft release for any final feedback. Links to the guides for this release should be working now, since you built them above.
+
+Make corrections to the draft, if necessary. It will be out of sync with the .md file, but that's ok (`#7988 `_ is tracking this).
+
+.. _run-build-create-war:
+
+Run a Build to Create the War File
+----------------------------------
+
+ssh into the dataverse-internal server and undeploy the current war file.
+
+Go to https://jenkins.dataverse.org/job/IQSS_Dataverse_Internal/ and make the following adjustments to the config:
+
+- Repository URL: ``https://github.com/IQSS/dataverse.git``
+- Branch Specifier (blank for 'any'): ``*/master``
+- Execute shell: Update version in filenames to ``dataverse-5.10.1.war`` (for example)
+
+Click "Save" then "Build Now".
+
+This will build the war file, and then automatically deploy it on dataverse-internal. Verify that the application has deployed successfully.
+
+The build number will appear in ``/api/info/version`` (along with the commit mentioned above) from a running installation (e.g. ``{"version":"5.10.1","build":"907-b844672"}``).
+
+Note that the build number comes from the following script in an early Jenkins build step...
+
+.. code-block:: bash
+
+   COMMIT_SHA1=`echo $GIT_COMMIT | cut -c-7`
+   echo "build.number=${BUILD_NUMBER}-${COMMIT_SHA1}" > $WORKSPACE/src/main/java/BuildNumber.properties
+
+... but we can explore alternative methods of specifying the build number, as described in :ref:`auto-custom-build-number`.
+
+Build Installer (dvinstall.zip)
+-------------------------------
+
+ssh into the dataverse-internal server and do the following:
+
+- In a git checkout of the dataverse source, switch to the master branch and pull the latest.
+- Copy the war file from the previous step to the ``target`` directory in the root of the repo (create it, if necessary):
+- ``mkdir target``
+- ``cp /tmp/dataverse-5.10.1.war target``
+- ``cd scripts/installer``
+- ``make clean``
+- ``make``
+
+A zip file called ``dvinstall.zip`` should be produced.
+
+Alternatively, you can build the installer on your own dev instance. But make sure you use the war file produced in the step above, not a war file built from master on your own system! That's because we want the released application war file to contain the build number described above. Download the war file directly from Jenkins, or from dataverse-internal.
 
 Make Artifacts Available for Download
 -------------------------------------
 
 Upload the following artifacts to the draft release you created:
 
-- war file (``mvn package`` from Jenkins)
-- installer (``cd scripts/installer && make``)
-- other files as needed, such as updated Solr schema and config files
+- the war file (e.g. ``dataverse-5.10.1.war``, from above)
+- the installer (``dvinstall.zip``, from above)
+- other files as needed:
-Publish Release
----------------
+
+  - updated Solr schema
+  - metadata block tsv files
+  - config files
+
+Publish the Release
+-------------------
 
 Click the "Publish release" button.
 
-----
+Update Guides Link
+------------------
+
+"latest" at https://guides.dataverse.org/en/latest/ is a symlink to the directory with the latest release. That directory (e.g. ``5.10.1``) was put into place by the Jenkins "guides" job described above.
+
+ssh into the guides server and update the symlink to point to the latest release, as in the example below.
+
+.. code-block:: bash
+
+   cd /var/www/html/en
+   ln -s 5.10.1 latest
+
+This step could be done before publishing the release if you'd like to double check that links in the release notes work.
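You can rehearse the symlink swap in a scratch directory first. One caveat worth knowing: a plain ``ln -s`` fails if ``latest`` already exists (as it will on the guides server), so ``ln -sfn`` is used below to replace it in place (directory names are illustrative):

```shell
# Rehearse the "latest" symlink update in a throwaway directory.
# 5.10 and 5.10.1 stand in for the per-release guides directories.
cd "$(mktemp -d)"
mkdir 5.10 5.10.1
ln -s 5.10 latest        # simulate the existing symlink
ln -sfn 5.10.1 latest    # -f replaces it; -n avoids descending into the old target
readlink latest          # prints: 5.10.1
```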
+
+Close Milestone on GitHub and Create a New One
+----------------------------------------------
+
+You can find our milestones at https://github.com/IQSS/dataverse/milestones
+
+Now that we've published the release, close its milestone and create a new one for the **next** release (the release **after** the one we're working on).
+
+Note that for milestones we use just the number without the "v" (e.g. "5.10.1").
+
+On the project board at https://github.com/orgs/IQSS/projects/34 edit the tab (view) that shows the milestone to show the next milestone.
+
+.. _base_image_post_release:
+
+Update the Container Base Image Version Property
+------------------------------------------------
+
+|dedicated|
+
+Create a new branch (any name is fine but ``prepare-next-iteration`` is suggested) and update the following files to prepare for the next development cycle:
+
+- modules/dataverse-parent/pom.xml -> ``<profiles>`` -> profile "ct" -> ``<properties>`` -> Set ``<base.image.version>`` to ``${parsedVersion.majorVersion}.${parsedVersion.nextMinorVersion}``
+
+Create a pull request and put it through code review, like usual. Give it a milestone of the next release, the one **after** the one we're working on. Once the pull request has been approved, merge it. It should be the first PR merged of the next release.
+
+For more background, see :ref:`base-image-supported-tags`. For an example, see https://github.com/IQSS/dataverse/pull/10896
+
+For a hotfix, we will do this later and in a different branch. See below.
+
+Deploy Final Release on Demo
+----------------------------
+
+|dedicated|
+
+Above you already did the hard work of deploying a release candidate to https://demo.dataverse.org. It should be relatively straightforward to undeploy the release candidate and deploy the final release.
+
+.. 
_update-schemaspy: + +Update SchemaSpy +---------------- + +We maintain SchemaSpy at URLs like https://guides.dataverse.org/en/latest/schemaspy/index.html and (for example) https://guides.dataverse.org/en/6.6/schemaspy/index.html + +Get the attention of the core team and ask someone to update it for the new release. + +Consider updating `the thread `_ on the mailing list once the update is in place. + +See also :ref:`schemaspy`. + +Alert Translators About the New Release +--------------------------------------- + +Create an issue at https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs/issues to say a new release is out and that we would love for the properties files for English to be added. + +For example, for 6.4 we wrote "Update en_US/Bundle.properties etc. for Dataverse 6.4" at https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs/issues/125 + +Add the Release to the Dataverse Roadmap +---------------------------------------- + +Add an entry to the list of releases at https://www.iq.harvard.edu/roadmap-dataverse-project + +Announce the Release on the Dataverse Blog +------------------------------------------ + +Make a blog post at https://dataverse.org/blog + +Announce the Release on the Mailing List +---------------------------------------- + +Post a message at https://groups.google.com/g/dataverse-community + +Announce the Release on Zulip +----------------------------- + +Post a message under #community at https://dataverse.zulipchat.com + +For Hotfixes, Merge Hotfix Branch into "develop" +------------------------------------------------ + +Note: this only applies to hotfixes! + +We've merged the hotfix into the "master" branch but now we need the fixes (and version bump) in the "develop" branch. + +Make a new branch off the hotfix branch. You can call it something like "6.7.1-merge-hotfix-to-develop". + +In that branch, do the :ref:`base_image_post_release` step you skipped above. Now is the time. 
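The branch topology for the hotfix merge-back can be rehearsed in a toy repository (all branch names, version numbers, and file contents below are illustrative):

```shell
# Toy repo sketching the hotfix flow: hotfix released from master,
# then merged back into develop via a dedicated branch.
cd "$(mktemp -d)"
git init -q
git config user.email "release@example.com"
git config user.name "Release Demo"
echo "6.7" > version.txt
git add version.txt && git commit -qm "release 6.7"
git branch -M master
git branch develop                         # develop starts at the same commit
git checkout -q -b hotfix-6.7.1            # hotfix branch off master
echo "6.7.1" > version.txt
git commit -qam "hotfix 6.7.1"
git checkout -q master
git merge -q --no-edit hotfix-6.7.1        # hotfix released from master
git checkout -q -b 6.7.1-merge-hotfix-to-develop hotfix-6.7.1
git checkout -q develop
git merge -q --no-edit 6.7.1-merge-hotfix-to-develop   # the PR against develop
cat version.txt                            # develop now carries the hotfix
```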
+
+Create a pull request against develop. Merge conflicts are possible and this pull request should go through review and QA like normal. Afterwards it's fine to delete this branch and the hotfix branch that was merged into master.
+
+For Hotfixes, Tell Developers to Merge "develop" into Their Branches and Rename SQL Scripts
+-------------------------------------------------------------------------------------------
+
+Note: this only applies to hotfixes!
+
+Because we have merged a version bump from the hotfix into the "develop" branch, any SQL scripts in the "develop" branch should be renamed (from "5.11.0" to "5.11.1" for example). (To read more about our naming conventions for SQL scripts, see :doc:`sql-upgrade-scripts`.)
+
+Look at ``src/main/resources/db/migration`` in the "develop" branch and if any SQL scripts have the wrong version, make a pull request (or ask a developer to) to update them (all at once in a single PR is fine).
+
+Tell developers to merge the "develop" branch into their open pull requests (to pick up the new version and any fixes) and rename SQL scripts (if any) with the new version.
+
+Lift the Code Freeze and Encourage Developers to Update Their Branches
+----------------------------------------------------------------------
+
+It's now safe to lift the code freeze. We can start merging pull requests into the "develop" branch for the next release.
 
-Previous: :doc:`containers` | Next: :doc:`tools`
+Let developers know that they should merge the latest from the "develop" branch into any branches they are working on. (For hotfixes we've already told them this.)
diff --git a/doc/sphinx-guides/source/developers/metadataexport.rst b/doc/sphinx-guides/source/developers/metadataexport.rst
new file mode 100644
index 00000000000..63630b64c44
--- /dev/null
+++ b/doc/sphinx-guides/source/developers/metadataexport.rst
@@ -0,0 +1,91 @@
+=======================
+Metadata Export Formats
+=======================
+
+.. 
contents:: |toctitle|
+   :local:
+
+Introduction
+------------
+
+Dataverse ships with a number of metadata export formats available for published datasets. A given metadata export
+format may be available for user download (via the UI and API) and/or be available for use in Harvesting between
+Dataverse instances.
+
+As of v5.14, Dataverse provides a mechanism for third-party developers to create new metadata Exporters that implement
+new metadata formats or that replace existing formats. All the necessary dependencies are packaged in an interface JAR file
+available from Maven Central. Developers can distribute their new Exporters as JAR files which can be dynamically loaded
+into Dataverse instances - see :ref:`external-exporters`. Developers are encouraged to work with the core Dataverse team
+(see :ref:`getting-help-developers`) to distribute these JAR files via Maven Central. See the
+`Croissant `_ and
+`Debug `_ artifacts as examples. You may find other examples
+under :ref:`inventory-of-external-exporters` in the Installation Guide.
+
+Exporter Basics
+---------------
+
+New Exporters must implement the ``io.gdcc.spi.export.Exporter`` interface. The interface includes a few methods for the Exporter
+to provide Dataverse with the format it produces, a display name, format mimetype, and whether the format is for download
+and/or harvesting use, etc. It also includes a main ``exportDataset(ExportDataProvider dataProvider, OutputStream outputStream)``
+method through which the Exporter receives metadata about the given dataset (via the ``ExportDataProvider``, described further
+below) and writes its output (as an OutputStream).
+
+Exporters that create an XML format must implement the ``io.gdcc.spi.export.XMLExporter`` interface (which extends the Exporter
+interface). XMLExporter adds a few methods through which the XMLExporter provides information to Dataverse about the XML
+namespace and version being used.
+
+Exporters also need to use the ``@AutoService(Exporter.class)`` annotation, which makes the class discoverable as an Exporter implementation.
+
+The ``ExportDataProvider`` interface provides several methods through which your Exporter can receive dataset and file metadata
+in various formats. Your exporter would parse the information in one or more of these inputs to retrieve the values needed to
+generate the Exporter's output format.
+
+The most important methods/input formats are:
+
+- ``getDatasetJson()`` - metadata in the internal Dataverse JSON format used in the native API and available via the built-in JSON metadata export.
+- ``getDatasetORE()`` - metadata in the OAI_ORE format available as a built-in metadata format and as used in Dataverse's BagIT-based Archiving capability.
+- ``getDatasetFileDetails()`` - detailed file-level metadata for ingested tabular files.
+
+The first two of these provide nearly complete metadata about the dataset along with the metadata common to all files. This includes all metadata
+entries from all metadata blocks, PIDs, tags, Licenses and custom terms, etc. Almost all built-in exporters today use the JSON input.
+The newer OAI_ORE export, which is JSON-LD-based, provides a flatter structure and references metadata terms by their external vocabulary ids
+(e.g. http://purl.org/dc/terms/title), which may make it a preferable starting point in some cases.
+
+The last method above provides a new JSON-formatted serialization of the variable-level file metadata Dataverse generates during ingest of tabular files.
+This information has only been included in the built-in DDI export, as the content of a ``dataDscr`` element. (Hence inspecting the edu.harvard.iq.dataverse.export.DDIExporter and related classes would be a good way to explore how the JSON is structured.)
+
+The interface also provides
+
+- ``getDatasetSchemaDotOrg()`` and
+- ``getDataCiteXml()``.
+
+These provide subsets of metadata in the indicated formats.
They may be useful starting points if your exporter will, for example, only add one or two additional fields to the given format.
+
+If an Exporter cannot create a requested metadata format for some reason, it should throw an ``io.gdcc.spi.export.ExportException``.
+
+Building an Exporter
+--------------------
+
+The examples at https://github.com/gdcc/exporter-croissant and https://github.com/gdcc/exporter-debug provide a Maven pom.xml file suitable for building an Exporter JAR file, and those repositories provide additional development guidance.
+
+There are four dependencies needed to build an Exporter:
+
+- ``io.gdcc dataverse-spi`` library containing the interfaces discussed above and the ExportException class
+- ``com.google.auto.service auto-service``, which provides the @AutoService annotation
+- ``jakarta.json jakarta.json-api`` for JSON classes
+- ``jakarta.ws.rs jakarta.ws.rs-api``, which provides a MediaType enumeration for specifying mime types.
+
+Specifying a Prerequisite Export
+--------------------------------
+
+An advanced feature of the Exporter mechanism allows a new Exporter to specify that it requires, as input,
+the output of another Exporter. An example of this is the built-in HTMLExporter which requires the output
+of the DDI XML Exporter to produce an HTML document with the same DDI content.
+
+This is configured by providing the metadata format name via the ``Exporter.getPrerequisiteFormatName()`` method.
+When this method returns a non-empty format name, Dataverse will provide the requested format to the Exporter via
+the ``ExportDataProvider.getPrerequisiteInputStream()`` method.
+
+Developers and administrators deploying Exporters using this mechanism should be aware that, since metadata formats
+can be changed by other Exporters, the InputStream received may not hold the expected metadata. Developers should clearly
+document their compatibility with the built-in or third-party Exporters they support as prerequisites.
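Written out as a pom.xml fragment, the four dependencies listed above look roughly like this (the version properties are placeholders, not real values — check Maven Central and the example repositories for current versions and any recommended scopes):

```xml
<dependencies>
    <dependency>
        <groupId>io.gdcc</groupId>
        <artifactId>dataverse-spi</artifactId>
        <version>${dataverse-spi.version}</version> <!-- placeholder -->
    </dependency>
    <dependency>
        <groupId>com.google.auto.service</groupId>
        <artifactId>auto-service</artifactId>
        <version>${auto-service.version}</version> <!-- placeholder -->
    </dependency>
    <dependency>
        <groupId>jakarta.json</groupId>
        <artifactId>jakarta.json-api</artifactId>
        <version>${jakarta.json-api.version}</version> <!-- placeholder -->
    </dependency>
    <dependency>
        <groupId>jakarta.ws.rs</groupId>
        <artifactId>jakarta.ws.rs-api</artifactId>
        <version>${jakarta.ws.rs-api.version}</version> <!-- placeholder -->
    </dependency>
</dependencies>
```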
diff --git a/doc/sphinx-guides/source/developers/performance.rst b/doc/sphinx-guides/source/developers/performance.rst new file mode 100644 index 00000000000..6c864bec257 --- /dev/null +++ b/doc/sphinx-guides/source/developers/performance.rst @@ -0,0 +1,201 @@ +Performance +=========== + +`Performance is a feature `_ was a mantra when Stack Overflow was being developed. We endeavor to do the same with Dataverse! + +In this section we collect ideas and share practices for improving performance. + +.. contents:: |toctitle| + :local: + +Problem Statement +----------------- + +Performance has always been important to the Dataverse Project, but results have been uneven. We've seen enough success in the marketplace that performance must be adequate, but internally we sometimes refer to Dataverse as a pig. 🐷 + +Current Practices +----------------- + +We've adopted a number of practices to help us maintain our current level of performance, and most should absolutely continue in some form, but challenges mentioned throughout should be addressed to further improve performance. + +Cache When You Can +~~~~~~~~~~~~~~~~~~ + +The Metrics API, for example, caches values for 7 days by default. We took a look at JSR 107 (JCache - Java Temporary Caching API) in `#2100 `_. We're aware of the benefits of caching. + +Use Async +~~~~~~~~~ + +We index datasets (and all objects) asynchronously. That is, we let changes persist in the database and afterward copy the data into Solr. + +Use a Queue +~~~~~~~~~~~ + +We use a JMS queue when ingesting tabular files. We've talked about adding a queue (even `an external queue `_) for indexing, DOI registration, and other services. + +Offload Expensive Operations Outside the App Server +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When operations are computationally expensive, we have realized performance gains by offloading them to systems outside of the core code. 
For example, rather than having files pass through our application server when they are downloaded, we use direct download so that client machines download files directly from S3. (We use the same trick with upload.) When a client downloads multiple files, rather than zipping them within the application server as before, we now have a separate "zipper" process that does this work out of band. + +Drop to Raw SQL as Necessary +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We aren't shy about writing raw SQL queries when necessary. We've written `querycount `_ scripts to help identify problematic queries and mention the slow query log at :doc:`/admin/monitoring`. + +Add Indexes to Database Tables +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There was a concerted effort in `#1880 `_ to add indexes to a large number of columns, but it's something we're mindful of, generally. Perhaps we could use some better detection of when indexes would be valuable. + +Find Bottlenecks with a Profiler +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +VisualVM is popular and bundled with NetBeans. Many options are available, including `JProfiler `_. + +Warn Developers in Code Comments +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For code that has been optimized for performance, warnings are sometimes inserted in the form of comments for future developers to prevent backsliding. + +Write Docs for Devs about Perf +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Like this doc. :) + +Sometimes perf is written about in other places, such as :ref:`avoid-efficiency-issues-with-render-logic-expressions`. + +Horizontal Scaling of App Server +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We've made it possible to run more than one application server, though it requires some special configuration. This way load can be spread out across multiple servers. For details, see :ref:`multiple-app-servers` in the Installation Guide. 
+ +Code Review and QA +~~~~~~~~~~~~~~~~~~ + +Before code is merged, while it is in review or QA, if a performance problem is detected (usually on an ad hoc basis), the code is returned to the developer for improvement. Developers and reviewers typically do not have many tools at their disposal to test code changes against anything close to production data. QA maintains a machine with a copy of production data but tests against smaller data unless a performance problem is suspected. + +A new QA guide is coming in https://github.com/IQSS/dataverse/pull/10103. + +Locust Testing at Release Time +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As one of the final steps in preparing for a release, QA runs performance tests using a tool called Locust as explained in the Developer Guide (see :ref:`locust`). The tests are not comprehensive, testing only a handful of pages with anonymous users, but they increase confidence that the upcoming release is not drastically slower than previous releases. + +Issue Tracking and Prioritization +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Performance issues are tracked in our issue tracker under the `Feature: Performance & Stability `_ label (e.g. `#7788 `_). That way, we can track performance problems throughout the application. Unfortunately, the pain is often felt by users in production before we realize there is a problem. As needed, performance issues are prioritized to be included in a sprint, to \ `speed up the collection page `_, for example. + +Document Performance Tools +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In the :doc:`/admin/monitoring` section of the Admin Guide we describe how to set up Munin for monitoring performance of an operating system. We also explain how to set up Performance Insights to monitor AWS RDS (PostgreSQL as a service, in our case). In the :doc:`/developers/tools` section of the Developer Guide, we have documented how to use Eclipse Memory Analyzer Tool (MAT), SonarQube, jmap, and jstat. 
+ +Google Analytics +~~~~~~~~~~~~~~~~ + +Emails go to a subset of the team monthly with subjects like "Your September Search performance for https://dataverse.harvard.edu" with a link to a report, but it's mostly about the number of clicks, not how fast the site is. It's unclear if it provides any value with regard to performance. + +Abandoned Tools and Practices +----------------------------- + +New Relic +~~~~~~~~~ + +For many years Harvard Dataverse was hooked up to New Relic, a tool that promises all-in-one observability, according to their `website `_. In practice, we didn't do much with `the data `_. + +Areas of Particular Concern +--------------------------- + +Command Engine Execution Rate Metering +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We'd like to rate limit commands (CreateDataset, etc.) so that we can keep them at a reasonable level (`#9356 `_). This is similar to how many APIs are rate limited, such as the GitHub API. + +Solr +~~~~ + +While in the past Solr performance hasn't been much of a concern, in recent years we've noticed performance problems when Harvard Dataverse is under load. Improvements were made in `PR #10050 `_, for example. + +We are tracking performance problems in `#10469 `_. + +In a meeting with a Solr expert on 2024-05-10 we were advised to avoid joins as much as possible. (It was acknowledged that many Solr users make use of joins because they have to, like we do, to keep some documents private.) Toward that end we have added two feature flags called ``avoid-expensive-solr-join`` and ``add-publicobject-solr-field`` as explained under :ref:`feature-flags`. It was confirmed experimentally that performing the join on all the public objects (published collections, datasets and files), i.e., the bulk of the content in the search index, was indeed very expensive, particularly on a large instance the size of the IQSS production archive, especially under indexing load. 
We confirmed that it was in fact unnecessary and were able to replace it with a boolean field directly in the indexed documents, which is achieved by the two feature flags above. However, as of this writing, this mechanism should still be considered experimental. +Another flag, ``reduce-solr-deletes``, avoids deleting Solr documents for files in a dataset prior to sending updates. It also eliminates several causes of orphan permission documents. This is expected to improve indexing performance to some extent and is a step towards avoiding unnecessary updates (i.e. when a doc would not change). + +Datasets with Large Numbers of Files or Versions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We'd like to scale Dataverse to better handle large numbers of files or versions. Progress was made in `PR #9883 `_. + +Withstanding Bots +~~~~~~~~~~~~~~~~~ + +Google bot, etc. + +Suggested Practices +------------------- + +Many of our current practices should remain in place unaltered. Others could use some refinement. Some new practices should be adopted as well. Here are some suggestions. + +Implement the Frontend Plan for Performance +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The `Dataverse - SPA MVP Definition doc `_ has some ideas around how to achieve good performance for the new front end in the areas of rendering, monitoring, file upload/download, pagination, and caching. We should create as many issues as necessary in the frontend repo and work on them in time. The doc recommends the use of `React Profiler `_ and other tools. Not mentioned is https://pagespeed.web.dev but we can investigate it as well. See also `#183 `_, a parent issue about performance. In `#184 `_ we plan to compare the performance of the old JSF UI vs. the new React UI. Cypress plugins for load testing could be investigated. + +Set up Query Counter in Jenkins +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +See the querycount scripts above. 
See also https://jenkins.dataverse.org/job/IQSS-dataverse-develop/ws/target/query_count.out + +Show the plot over time. Make spikes easily apparent. 320,035 queries as of this writing. + +Count Database Queries per API Test +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Is it possible? Just a thought. + +Teach Developers How to Do Performance Testing Locally +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Do developers know how to use a profiler? Should they use `JMeter `_? `statsd-jvm-profiler `_? How do you run our :ref:`locust` tests? Should we continue using that tool? Give developers time and space to try out tools and document any tips along the way. For this stage, small data is fine. + +Automate Performance Testing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We are already using two excellent continuous integration (CI) tools, Jenkins and GitHub Actions, to test our code. We should add performance testing into the mix (`#4201 `_ is an old issue for this but we can open a fresh one). Currently we test every commit on every PR and we should consider if this model makes sense since performance testing will likely take longer to run than regular tests. Once developers are comfortable with their favorite tools, we can pick which ones to automate. + +Make Production Data or Equivalent Available to Developers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If developers are only testing small amounts of data on their laptops, it's hard to detect performance problems. Not every bug fix requires access to data similar to production, but it should be made available. This is not a trivial task! If we are to use actual production data, we need to be very careful to de-identify it. If we start with our `sample-data `_  repo instead, we'll need to figure out how to make sure we cover cases like many files, many versions, etc. 
+ +Automate Performance Testing with Production Data or Equivalent +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Hopefully the environment developers use with production data or equivalent can be made available to our CI tools. Perhaps these tests don't need to be run on every commit to every pull request, but they should be run regularly. + +Use Monitoring as Performance Testing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Monitoring can be seen as a form of testing. How long is a round trip ping to production? What is the Time to First Byte? First Contentful Paint? Largest Contentful Paint? Time to Interactive? We now have a beta server that we could monitor continuously to know if our app is getting faster or slower over time. Should our monitoring of production servers be improved? + +Learn from Training and Conferences +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Most likely there is training available that is oriented toward performance. The subject of performance often comes up at conferences as well. + +Learn from the Community How They Monitor Performance +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Some members of the Dataverse community are likely users of newish tools like the ELK stack (Elasticsearch, Logstash, and Kibana), the TICK stack (Telegraph InfluxDB Chronograph and Kapacitor), GoAccess, Prometheus, Graphite, and more we haven't even heard of. In the :doc:`/admin/monitoring` section of the Admin Guide, we already encourage the community to share findings, but we could dedicate time to this topic at our annual meeting or community calls. + +Teach the Community to Do Performance Testing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We have a worldwide community of developers. We should do what we can in the form of documentation and other resources to help them develop performant code. + +Conclusion +---------- + +Given its long history, Dataverse has encountered many performance problems over the years. 
The core team is conversant in how to make the app more performant, but investment in learning additional tools and best practices would likely yield dividends. We should automate our performance testing, catching more problems before code is merged. diff --git a/doc/sphinx-guides/source/developers/remote-users.rst b/doc/sphinx-guides/source/developers/remote-users.rst index 4a517c1beb2..38b3edab772 100755 --- a/doc/sphinx-guides/source/developers/remote-users.rst +++ b/doc/sphinx-guides/source/developers/remote-users.rst @@ -1,6 +1,6 @@ -==================== -Shibboleth and OAuth -==================== +========================== +Shibboleth, OAuth and OIDC +========================== .. contents:: |toctitle| :local: @@ -8,9 +8,9 @@ Shibboleth and OAuth Shibboleth and OAuth -------------------- -If you are working on anything related to users, please keep in mind that your changes will likely affect Shibboleth and OAuth users. For some background on user accounts in Dataverse, see "Auth Modes: Local vs. Remote vs. Both" in the :doc:`/installation/config` section of the Installation Guide. +If you are working on anything related to users, please keep in mind that your changes will likely affect Shibboleth and OAuth users. For some background on user accounts in the Dataverse Software, see :ref:`auth-modes` section of Configuration in the Installation Guide. 
-Rather than setting up Shibboleth on your laptop, developers are advised to simply add a value to their database to enable Shibboleth "dev mode" like this: +Rather than setting up Shibboleth on your laptop, developers are advised to add the Shibboleth auth provider (see "Add the Shibboleth Authentication Provider to Your Dataverse Installation" at :doc:`/installation/shibboleth`) and add a value to their database to enable Shibboleth "dev mode" like this: ``curl http://localhost:8080/api/admin/settings/:DebugShibAccountType -X PUT -d RANDOM`` @@ -26,8 +26,52 @@ In addition to setting up OAuth on your laptop for real per above, you can also For a list of possible values, please "find usages" on the settings key above and look at the enum. -Now when you go to http://localhost:8080/oauth2/firstLogin.xhtml you should be prompted to create a Shibboleth account. +Now when you go to http://localhost:8080/oauth2/firstLogin.xhtml you should be prompted to create an OAuth account. ---- -Previous: :doc:`unf/index` | Next: :doc:`geospatial` +.. _oidc-dev: + +OpenID Connect (OIDC) +--------------------- + +STOP! ``oidc-keycloak-auth-provider.json`` was changed from http://localhost:8090 to http://keycloak.mydomain.com:8090 to test :ref:`bearer-tokens`. In addition, ``docker-compose-dev.yml`` in the root of the repo was updated to start up Keycloak. To use these, you should add ``127.0.0.1 keycloak.mydomain.com`` to your ``/etc/hosts`` file. If you'd like to use the Docker Compose setup described below (``conf/keycloak/docker-compose.yml``), you should revert the change to ``oidc-keycloak-auth-provider.json``. + +If you are working on the OpenID Connect (OIDC) user authentication flow, you do not need to connect to a remote provider (as explained in :doc:`/installation/oidc`) to test this feature. Instead, you can use the available configuration that allows you to run a test Keycloak OIDC identity management service locally through a Docker container. + +(Please note! 
The client secret (``94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8``) is hard-coded in ``test-realm.json`` and ``oidc-keycloak-auth-provider.json``. Do not use this config in production! This is only for developers.) + +You can find this configuration in ``conf/keycloak``. There are two options available in this directory to run a Keycloak container: bash script or docker-compose. + +To run the container via bash script, execute the following command (from the ``conf/keycloak`` directory): + +``./run-keycloak.sh`` + +The script will create a Keycloak container or restart it if the container was already created and stopped. Once the script is executed, Keycloak should be accessible at http://localhost:8090/ + +Now load the configuration defined in ``oidc-keycloak-auth-provider.json`` into your Dataverse installation to enable Keycloak as an authentication provider. + +``curl -X POST -H 'Content-type: application/json' --upload-file oidc-keycloak-auth-provider.json http://localhost:8080/api/admin/authenticationProviders`` + +You should see the new provider, called "OIDC-Keycloak", under "Other options" on the Log In page. + +You should be able to log into Keycloak with one of the following credentials: + +.. 
list-table:: + + * - Username + - Password + * - admin + - admin + * - curator + - curator + * - user + - user + * - affiliate + - affiliate + +In case you want to stop and remove the Keycloak container, just run the other available bash script: + +``./rm-keycloak.sh`` + +Note: to log in to the Keycloak admin console, use ``kcadmin:kcpassword``. diff --git a/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst b/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst new file mode 100644 index 00000000000..a8f87f13375 --- /dev/null +++ b/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst @@ -0,0 +1,299 @@ +Direct DataFile Upload/Replace API +================================== + +The direct DataFile Upload API is used internally to support direct upload of files to S3 storage and by tools such as the DVUploader. + +.. contents:: |toctitle| + :local: + +Overview +-------- + +Direct upload involves a series of three activities, each involving interaction with the server for a Dataverse installation: + +* Requesting initiation of a transfer from the server +* Use of the pre-signed URL(s) returned in that call to perform an upload/multipart-upload of the file to S3 +* A call to the server to register the file/files as part of the dataset/replace a file in the dataset or to cancel the transfer + +This API is only enabled when a Dataset is configured with a data store supporting direct S3 upload. +Administrators should be aware that partial transfers, where a client starts uploading the file/parts of the file and does not contact the server to complete/cancel the transfer, will result in data stored in S3 that is not referenced in the Dataverse installation (i.e. it should be considered temporary and deleted). + + +Requesting Direct Upload of a DataFile +-------------------------------------- +To initiate a transfer of a file to S3, make a call to the Dataverse installation indicating the size of the file to upload. 
The response will include one or more pre-signed URLs that allow the client to transfer the file. Pre-signed URLs include a short-lived token authorizing the action represented by the URL. + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV + export SIZE=1000000000 + + curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/uploadurls?persistentId=$PERSISTENT_IDENTIFIER&size=$SIZE" + +The response to this call, assuming direct uploads are enabled, will be one of two forms: + +Single URL: when the file is smaller than the size at which uploads must be broken into multiple parts + +.. code-block:: bash + + { + "status":"OK", + "data":{ + "url":"...", + "partSize":1073741824, + "storageIdentifier":"s3://demo-dataverse-bucket:177883619b8-892ca9f7112e" + } + } + +Multiple URLs: when the file must be uploaded in multiple parts. The part size is set by the Dataverse installation and, for AWS-based storage, ranges from 5 MB to 5 GB + +.. code-block:: bash + + { + "status":"OK", + "data":{ + "urls":{ + "1":"...", + "2":"...", + "3":"...", + "4":"...", + "5":"..." + }, + "abort":"/api/datasets/mpupload?...", + "complete":"/api/datasets/mpupload?...", + "partSize":1073741824, + "storageIdentifier":"s3://demo-dataverse-bucket:177883b000e-49cedef268ac" + } + } + +The call will return a 400 (BAD REQUEST) response if the file is larger than what is allowed by the :ref:`:MaxFileUploadSizeInBytes` setting and/or a quota (see :doc:`/admin/collectionquotas`). + +In the example responses above, the URLs, which are very long, have been omitted. These URLs reference the S3 server and the specific object identifier that will be used, starting with, for example, https://demo-dataverse-bucket.s3.amazonaws.com/10.5072/FK2FOQPJS/177883b000e-49cedef268ac?... + +.. 
_direct-upload-to-s3: + +Upload Files to S3 +------------------ + +The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file. + +In the single part case, only one call to the supplied URL is required: + +.. code-block:: bash + + curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T <filename> "<supplied url>" + +Or, if you have disabled S3 tagging (see :ref:`s3-tagging`), you should omit the header like this: + +.. code-block:: bash + + curl -i -X PUT -T <filename> "<supplied url>" + +Note that without the ``-i`` flag, you should not expect any output from the command above. With the ``-i`` flag, you should expect to see a "200 OK" response. + +In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a slice of the total file, with the last part containing the remaining bytes. +The responses from the S3 server for these calls will include the 'eTag' for the uploaded part. + +To successfully conclude the multipart upload, the client must call the 'complete' URI, sending a JSON object including the part eTags: + +.. code-block:: bash + + curl -X PUT "$SERVER_URL/api/datasets/mpload?..." -d '{"1":"<eTag1>","2":"<eTag2>","3":"<eTag3>","4":"<eTag4>","5":"<eTag5>"}' + +If the client is unable to complete the multipart upload, it should call the abort URL: + +.. code-block:: bash + + curl -X DELETE "$SERVER_URL/api/datasets/mpload?..." + +.. note:: + If you encounter an ``HTTP 501 Not Implemented`` error, ensure the ``Content-Length`` header is correctly set to the file or chunk size. This issue may arise when streaming files or chunks asynchronously to S3 via ``PUT`` requests, particularly if the library or tool you're using doesn't set the ``Content-Length`` header automatically. + +.. 
_direct-add-to-dataset-api: + +Adding the Uploaded File to the Dataset +--------------------------------------- + +Once the file exists in the s3 bucket, a final API call is needed to add it to the Dataset. This call is the same call used to upload a file to a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter. +jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, the jsonData object must also include values for: + +* "storageIdentifier" - String, as specified in prior calls +* "fileName" - String +* "mimeType" - String +* fixity/checksum: either: + + * "md5Hash" - String with MD5 hash value, or + * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings + +The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512 + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV + export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}" + + curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA" + +Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide. 
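As a rough client-side sketch, the jsonData object above can be assembled programmatically before being passed as the ``jsonData`` form parameter. The helper name and its defaults here are ours, for illustration only; the required field names follow the list above:

```python
import json


def build_json_data(storage_identifier, file_name, mime_type,
                    checksum_type, checksum_value, **optional):
    """Assemble the jsonData object for the /add call after a direct upload.

    Required keys follow the documentation above. Optional keys such as
    description, directoryLabel, categories, and restrict can be passed
    as keyword arguments.
    """
    data = {
        "storageIdentifier": storage_identifier,
        "fileName": file_name,
        "mimeType": mime_type,
        # the "checksum" form; "md5Hash" is the simpler alternative for MD5
        "checksum": {"@type": checksum_type, "@value": checksum_value},
    }
    data.update(optional)
    return json.dumps(data)
```

The resulting string would then be sent as the ``-F "jsonData=..."`` parameter of the curl call shown above.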
+With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above. + +To Add Multiple Uploaded Files to the Dataset +--------------------------------------------- + +Once the files exist in the S3 bucket, a final API call is needed to add all the files to the Dataset. In this API call, additional metadata is added using the "jsonData" parameter. +jsonData for this call is an array of objects that normally include information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, the jsonData object must also include values for: + +* "description" - A description of the file +* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset +* "storageIdentifier" - String +* "fileName" - String +* "mimeType" - String +* fixity/checksum: either: + + * "md5Hash" - String with MD5 hash value, or + * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings + +The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512. + +.. 
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV + export JSON_DATA="[{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}, \ + {'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53', 'fileName':'file2.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123789'}}]" + + curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA" + +Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide. +With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above. + +Replacing an Existing File in the Dataset +----------------------------------------- + +Once the file exists in the S3 bucket, a final API call is needed to register it as a replacement of an existing file. This call is the same call used to replace a file in a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter. 
+jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, whether to allow the mimetype to change (forceReplace=true), etc. For direct uploads, the jsonData object must include values for: + +* "storageIdentifier" - String, as specified in prior calls +* "fileName" - String +* "mimeType" - String +* fixity/checksum: either: + + * "md5Hash" - String with MD5 hash value, or + * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings + +The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512. +Note that the API call does not validate that the file matches the hash value supplied. If a Dataverse instance is configured to validate file fixity hashes at publication time, a mismatch would be caught at that time and cause publication to fail. + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export FILE_IDENTIFIER=5072 + export JSON_DATA='{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "forceReplace":"true", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}' + + curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA" + +Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide. 
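Because the server does not check the hash at upload time, a client can guard against a corrupted transfer by computing the fixity value locally before calling the replace API and comparing it with what it sends in jsonData. A minimal sketch (the helper name is ours; ``sha1`` is shown, but any algorithm listed above that ``hashlib`` supports works the same way):

```python
import hashlib


def local_fixity_value(path, algorithm="sha1"):
    """Compute the checksum to be sent as the "@value" field of jsonData,
    reading the file in chunks to keep memory use constant."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing this value against the bytes actually uploaded to S3 (e.g. by re-downloading or checking the eTag where applicable) is left to the client; at minimum, computing the hash from the same local file that was PUT to S3 avoids sending a stale or mistyped value.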
+With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above. + +Replacing Multiple Existing Files in the Dataset +------------------------------------------------ + +Once the replacement files exist in the S3 bucket, a final API call is needed to register them as replacements for existing files. In this API call, additional metadata is added using the "jsonData" parameter. +jsonData for this call is an array of objects that normally include information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, the jsonData object must include some additional values: + +* "fileToReplaceId" - the id of the file being replaced +* "forceReplace" - whether to replace a file with one of a different mimetype (optional, default is false) +* "description" - A description of the file +* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset +* "storageIdentifier" - String +* "fileName" - String +* "mimeType" - String +* fixity/checksum: either: + + * "md5Hash" - String with MD5 hash value, or + * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings + + +The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512. + +.. 
code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
+  export JSON_DATA='[{"fileToReplaceId": 10, "description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}},{"fileToReplaceId": 11, "forceReplace": true, "description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'
+
+  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/replaceFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
+
+The JSON object returned as a response from this API call includes a "data" object that indicates how many of the file replacements succeeded and provides per-file error messages for those that fail, e.g.
+
+..
code-block:: json
+
+  {
+    "status": "OK",
+    "data": {
+      "Files": [
+        {
+          "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42",
+          "errorMessage": "Bad Request:The file to replace does not belong to this dataset.",
+          "fileDetails": {
+            "fileToReplaceId": 10,
+            "description": "My description.",
+            "directoryLabel": "data/subdir1",
+            "categories": [
+              "Data"
+            ],
+            "restrict": "false",
+            "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42",
+            "fileName": "file1.Bin",
+            "mimeType": "application/octet-stream",
+            "checksum": {
+              "@type": "SHA-1",
+              "@value": "123456"
+            }
+          }
+        },
+        {
+          "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53",
+          "successMessage": "Replaced successfully in the dataset",
+          "fileDetails": {
+            "description": "My description.",
+            "label": "file2.txt",
+            "restricted": false,
+            "directoryLabel": "data/subdir1",
+            "categories": [
+              "Data"
+            ],
+            "dataFile": {
+              "persistentId": "",
+              "pidURL": "",
+              "filename": "file2.txt",
+              "contentType": "text/plain",
+              "filesize": 2407,
+              "description": "My description.",
+              "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53",
+              "rootDataFileId": 11,
+              "previousDataFileId": 11,
+              "checksum": {
+                "type": "SHA-1",
+                "value": "123789"
+              }
+            }
+          }
+        }
+      ],
+      "Result": {
+        "Total number of files": 2,
+        "Number of files successfully replaced": 1
+      }
+    }
+  }
+
+
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
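Clients often need to check which replacements succeeded. As an illustration only (the helper below is hypothetical, not part of Dataverse or its client libraries), the response structure above can be split into successes and failures like this:

```python
def summarize_replace_response(response):
    """Split a parsed /replaceFiles response body into successes and failures.

    Entries carrying an "errorMessage" key failed; the rest succeeded.
    """
    successes, failures = [], []
    for entry in response["data"]["Files"]:
        if "errorMessage" in entry:
            failures.append((entry["storageIdentifier"], entry["errorMessage"]))
        else:
            successes.append(entry["storageIdentifier"])
    result = response["data"]["Result"]
    return {
        "successes": successes,
        "failures": failures,
        "total": result["Total number of files"],
        "replaced": result["Number of files successfully replaced"],
    }
```

A caller would pass the parsed JSON body of the API response and retry or report the failed storage identifiers.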
+With current S3 stores, the object identifier must be in the correct bucket for the store, must include the PID authority/identifier of the parent dataset, and must be guaranteed unique. The supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
diff --git a/doc/sphinx-guides/source/developers/search-services.rst b/doc/sphinx-guides/source/developers/search-services.rst
new file mode 100644
index 00000000000..9a124babeb0
--- /dev/null
+++ b/doc/sphinx-guides/source/developers/search-services.rst
@@ -0,0 +1,141 @@
+Search Services
+===============
+
+Dataverse supports configurable search services, allowing developers to integrate additional search engines dynamically. This guide outlines the design and provides details on how to use the interfaces and classes involved.
+
+Design Overview
+---------------
+The configurable search services feature is designed to allow:
+
+1. Dynamic addition of new search engines
+2. Configuration of the Dataverse UI to use a specified search engine
+3. Use of different search engines via the API
+4. Discovery of installed search engines
+
+Key Components
+--------------
+
+1. SearchService Interface
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+The ``SearchService`` interface is the core of the configurable search services. It defines the methods that any search engine implementation must provide. (The methods below are accurate as of this writing.)
+
+..
code-block:: java
+
+    public interface SearchService {
+        String getServiceName();
+        String getDisplayName();
+
+        SolrQueryResponse search(DataverseRequest dataverseRequest, List<Dataverse> dataverses, String query,
+                List<String> filterQueries, String sortField, String sortOrder, int paginationStart,
+                boolean onlyDatatRelatedToMe, int numResultsPerPage, boolean retrieveEntities, String geoPoint,
+                String geoRadius, boolean addFacets, boolean addHighlights) throws SearchException;
+
+        default void setSolrSearchService(SearchService solrSearchService) {}
+    }
+
+The interface allows you to provide a service name and display name, and to respond to the same search parameters that are normally sent to the Solr search engine.
+
+The ``setSolrSearchService`` method is used by Dataverse to give your class a reference to the ``SolrSearchService``, allowing your class to perform Solr queries as needed. (See the external search service classes described below for an example.)
+
+2. ConfigurableSearchService Interface
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``ConfigurableSearchService`` interface extends the ``SearchService`` interface and adds a method for Dataverse to set the ``SettingsServiceBean``. This allows search services to be configurable through Dataverse settings.
+
+.. code-block:: java
+
+    public interface ConfigurableSearchService extends SearchService {
+        void setSettingsService(SettingsServiceBean settingsService);
+    }
+
+The ``GetExternalSearchServiceBean`` and ``PostExternalSearchServiceBean`` classes provide a use case for this.
+
+3. JVM Options for Search Configuration
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Dataverse uses two JVM options to configure the search functionality:
+
+- ``dataverse.search.services.directory``: Specifies the local directory where jar files with search engines (classes implementing the ``SearchService`` interface) can be found. Dataverse will dynamically load engines from this directory.
+
+- ``dataverse.search.default-service``: The ``serviceName`` of the service that should be used in the Dataverse UI.
+
+Example configuration:
+
+.. code-block:: bash
+
+  ./asadmin create-jvm-options "-Ddataverse.search.services.directory=/var/lib/dataverse/searchServices"
+  ./asadmin create-jvm-options "-Ddataverse.search.default-service=solr"
+
+Remember to restart your Payara server after modifying these JVM options for the changes to take effect.
+
+4. Using Different Search Engines via API
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The loaded search services can be discovered using the ``/api/search/services`` endpoint.
+
+Queries can be made to different engines by including the optional ``search_service=`` query parameter.
+
+Use of these endpoints is described for end users in the API Guide under :ref:`search-services`.
+
+Available Search Services
+-------------------------
+
+The class definitions for four example search services are included in the Dataverse repository.
+They are not included in the Dataverse .war file but can be built as three separate .jar files using
+
+.. code-block:: bash
+
+  mvn clean package -DskipTests=true -Pexternal-search-get -Pexternal-search-post
+
+or
+
+.. code-block:: bash
+
+  mvn clean package -DskipTests=true -Ptrivial-search-examples
+
+1. GetExternalSearchServiceBean
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+2. PostExternalSearchServiceBean
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These classes implement the ``ConfigurableSearchService`` interface.
+They make a GET or POST call (respectively) to an external search engine that must return a JSON array of objects with "PID" (preferred) or "DOI" and "Distance" keys.
+The query sent to the external engine uses the same query parameters as the Dataverse search API (GET) or carries a JSON payload with those keys (POST).
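To make the expected response format concrete, here is a hypothetical sketch (the function name and the ranking inputs are our own, not part of Dataverse) of how an external service could produce the JSON array of "PID"/"Distance" objects described above, ordered smallest distance first:

```python
import json


def external_search_response(distances):
    """Produce the JSON array an external search engine is expected to
    return: objects with "PID" and "Distance" keys, where a smaller
    distance indicates a more relevant dataset."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return json.dumps([{"PID": pid, "Distance": d} for pid, d in ranked])
```

A real service would compute the distances from the incoming query parameters; this sketch only shows the shape of the payload.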
+The results they return are then searched for using the Solr search engine, which enforces access control and provides the standard formatting expected by the Dataverse UI and API.
+The distance values are used to order the results, smallest distances first.
+
+They can be configured via two settings each:
+
+- GET
+
+  - :GetExternalSearchUrl - the URL to send GET search queries to
+  - :GetExternalSearchName - the display name to use for this configuration
+
+- POST
+
+  - :PostExternalSearchUrl - the URL to send POST search queries to
+  - :PostExternalSearchName - the display name to use for this configuration
+
+As these classes use PIDs as identifiers, they cannot reference collections or, unless file PIDs are enabled, files.
+Similar classes, or extensions of these classes, could search by database IDs instead to support the additional types.
+
+3. GoldenOldiesSearchServiceBean
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+4. OddlyEnoughSearchServiceBean
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These classes implement the ``SearchService`` interface.
+They are intended only as code examples and simple tests of the design and are not intended for production use.
+The former simply replaces the user query with a query for entities with a db id < 1000. It demonstrates how a class can leverage the Solr engine and achieve results solely by modifying/replacing the user query.
+The latter only returns hits from the user's query that also have an odd database id. Since the filtering in the class changes the number of total hits available and pagination, this class demonstrates one way a developer can adjust those aspects of the Solr response.
+
+Notes
+-----
+
+1. Unless you use the Solr engine to provide access control, you must implement proper access control in your search engine.
+2. The design currently limits search results to be in the format returned by Solr and the hits are expected to be collections, datasets, or files - other classes are not supported.
+3.
Search services could be designed to completely replace Solr or to just support certain use cases (e.g. the external search classes only handling datasets). +4. While search services can be deployed as independent jar files, they currently import multiple Dataverse classes and, unlike exporters, cannot be built using just the Dataverse SPI. +5. As with other experimental features, we expect the ``SearchService`` interface may change over time as we learn about how people use it. Please keep in touch if you are developing search services. + diff --git a/doc/sphinx-guides/source/developers/security.rst b/doc/sphinx-guides/source/developers/security.rst new file mode 100755 index 00000000000..09b80a4c840 --- /dev/null +++ b/doc/sphinx-guides/source/developers/security.rst @@ -0,0 +1,34 @@ +======== +Security +======== + +This section describes security practices and procedures for the Dataverse team. + +.. contents:: |toctitle| + :local: + +Intake of Security Issues +------------------------- + +As described under :ref:`reporting-security-issues`, we encourage the community to email security@dataverse.org if they have any security concerns. These emails go into our private ticket tracker (RT_). + +.. _RT: https://help.hmdc.harvard.edu + +We use a private GitHub issue tracker at https://github.com/IQSS/dataverse-security/issues for security issues. + +Sending Security Notices +------------------------ + +When drafting the security notice, it might be helpful to look at `previous examples`_. + +.. _previous examples: https://drive.google.com/drive/folders/0B_qMYwdHFZghaDZIU2hWQnBDZVE?resourcekey=0-SYjuhCohAIM7_pmysVc3Xg&usp=sharing + +Gather email addresses from the following sources (these are also described under :ref:`ongoing-security` in the Installation Guide): + +- "contact_email" in the `public installation spreadsheet`_ +- "Other Security Contacts" in the `private installation spreadsheet`_ + +Once you have the emails, include them as bcc. + +.. 
_public installation spreadsheet: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0 +.. _private installation spreadsheet: https://docs.google.com/spreadsheets/d/1EWDwsj6eptQ7nEr-loLvdU7I6Tm2ljAplfNSVWR42i0/edit?usp=sharing diff --git a/doc/sphinx-guides/source/developers/selinux.rst b/doc/sphinx-guides/source/developers/selinux.rst index d7f5b0d7519..1d3d01610fe 100644 --- a/doc/sphinx-guides/source/developers/selinux.rst +++ b/doc/sphinx-guides/source/developers/selinux.rst @@ -8,7 +8,7 @@ SELinux Introduction ------------ -The ``shibboleth.te`` file below that is mentioned in the :doc:`/installation/shibboleth` section of the Installation Guide was created on CentOS 6 as part of https://github.com/IQSS/dataverse/issues/3406 but may need to be revised for future versions of RHEL/CentOS (pull requests welcome!). The file is versioned with the docs and can be found in the following location: +The ``shibboleth.te`` file below that was mentioned in the :doc:`/installation/shibboleth` section of the Installation Guide was created on CentOS 6 as part of https://github.com/IQSS/dataverse/issues/3406 but may need to be revised for future versions of RHEL/CentOS (pull requests welcome!). The file is versioned with the docs and can be found in the following location: ``doc/sphinx-guides/source/_static/installation/files/etc/selinux/targeted/src/policy/domains/misc/shibboleth.te`` @@ -44,7 +44,7 @@ Use ``semodule -l | grep shibboleth`` to see if the ``shibboleth.te`` rules are Exercising SELinux denials ~~~~~~~~~~~~~~~~~~~~~~~~~~ -As of this writing, there are two optional components of Dataverse that are known not to work with SELinux out of the box with SELinux: Shibboleth and rApache. +As of this writing, the only component of the Dataverse Software which is known not to work with SELinux out of the box is Shibboleth. 
We will be exercising SELinux denials with Shibboleth, and the SELinux-related issues are expected out of the box:

@@ -109,7 +109,3 @@ Once your updated SELinux rules are in place, try logging in with Shibboleth aga

Keep iterating until it works and then create a pull request based on your updated file. Good luck!

Many thanks to Bill Horka from IQSS for his assistance in explaining how to construct a SELinux Type Enforcement (TE) file!
-
-----
-
-Previous: :doc:`geospatial`
diff --git a/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst b/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst
index 07f65f6828a..409242101b8 100644
--- a/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst
+++ b/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst
@@ -2,7 +2,7 @@
SQL Upgrade Scripts
===================

-The database schema for Dataverse is constantly evolving and we have adopted a tool called Flyway to help keep your development environment up to date and in working order. As you make changes to the database schema (changes to ``@Entity`` classes), you must write SQL upgrade scripts when needed and follow Flyway file naming conventions.
+The database schema for the Dataverse Software is constantly evolving and we have adopted a tool called Flyway to help keep your development environment up to date and in working order. As you make changes to the database schema (changes to ``@Entity`` classes), you must write SQL upgrade scripts when needed and follow Flyway file naming conventions.

.. contents:: |toctitle|
	:local:

@@ -17,20 +17,24 @@ In the past (before adopting Flyway) we used to keep SQL upgrade scripts in ``sc

How to Determine if You Need to Create a SQL Upgrade Script
-----------------------------------------------------------

-If you are creating a new database table (which maps to an ``@Entity`` in JPA), you do not need to create or update a SQL upgrade script.
The reason for this is that we use ``create-tables`` in ``src/main/resources/META-INF/persistence.xml`` so that new tables are automatically created by Glassfish when you deploy your war file.
+If you are creating a new database table (which maps to an ``@Entity`` in JPA), you do not need to create or update a SQL upgrade script. The reason for this is that we use ``create-tables`` in ``src/main/resources/META-INF/persistence.xml`` so that new tables are automatically created by the app server when you deploy your war file.

If you are doing anything other than creating a new database table such as adding a column to an existing table, you must create or update a SQL upgrade script.

+.. _create-sql-script:
+
How to Create a SQL Upgrade Script
----------------------------------

We assume you have already read the :doc:`version-control` section and have been keeping your feature branch up to date with the "develop" branch.

-Create a new file called something like ``V4.11.0.1__5565-sanitize-directory-labels.sql`` in the ``src/main/resources/db/migration`` directory. Use a version like "4.11.0.1" in the example above where the previously released version was 4.11, ensuring that the version number is unique. Note that this is not the version that you expect the code changes to be included in (4.12 in this example). For the "description" you should the name of your branch, which should include the GitHub issue you are working on, as in the example above. To read more about Flyway file naming conventions, see https://flywaydb.org/documentation/migrations#naming
+Create a new SQL file in the ``src/main/resources/db/migration`` directory and put a short, meaningful comment at the top. Make the filename something like ``V6.1.0.1.sql``. In this example ``6.1`` represents the current version of Dataverse, with the last digit representing the number of the script for that version.
(The zero in this example is a placeholder in case the current version has a third number like ``6.1.1``.) Should a newer version be merged while you work on your pull request (PR), you must update your script to the next available number such as ``V6.1.0.2.sql``. + +Previously, we used longer, more descriptive file naming conventions supported by Flyway. However, this approach occasionally led to inadvertent merging of multiple scripts with the same version, such as ``V6.0.0.1__0000-wonderful-pr.sql`` and ``V6.0.0.1__0001-lovely-pr.sql`` where ``V6.0.0.1`` must be unique. After careful consideration, we agreed to adopt the convention mentioned above for naming files. This helps us detect conflicts before merging a PR, preventing the develop branch from being undeployable due to a Flyway conflict. For more information on Flyway file naming conventions, see https://documentation.red-gate.com/fd/migrations-184127470.html The SQL migration script you wrote will be part of the war file and executed when the war file is deployed. To see a history of Flyway database migrations that have been applied, look at the ``flyway_schema_history`` table. -As with any task related to Dataverse development, if you need any help writing SQL upgrade scripts, please reach out using any of the channels mentioned under "Getting Help" in the :doc:`intro` section. +As with any task related to the development of the Dataverse Software, if you need any help writing SQL upgrade scripts, please reach out using any of the channels mentioned under "Getting Help" in the :doc:`intro` section. Troubleshooting --------------- @@ -38,10 +42,6 @@ Troubleshooting Renaming SQL Upgrade Scripts ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Please note that if you need to rename your script (because a new version of Dataverse was released, for example), you will see the error "FlywayException: Validate failed: Detected applied migration not resolved locally" when you attempt to deploy and deployment will fail. 
+Please note that if you need to rename your script (because a new version of the Dataverse Software was released, for example), you will see the error "FlywayException: Validate failed: Detected applied migration not resolved locally" when you attempt to deploy and deployment will fail. To resolve this problem, delete the old migration from the ``flyway_schema_history`` table and attempt to redeploy. - ----- - -Previous: :doc:`version-control` | Next: :doc:`testing` diff --git a/doc/sphinx-guides/source/developers/testing.rst b/doc/sphinx-guides/source/developers/testing.rst index fc3910ac36a..1690864d453 100755 --- a/doc/sphinx-guides/source/developers/testing.rst +++ b/doc/sphinx-guides/source/developers/testing.rst @@ -2,10 +2,10 @@ Testing ======= -In order to keep our codebase healthy, the Dataverse project encourages developers to write automated tests in the form of unit tests and integration tests. We also welcome ideas for how to improve our automated testing. +In order to keep our codebase healthy, the Dataverse Project encourages developers to write automated tests in the form of unit tests and integration tests. We also welcome ideas for how to improve our automated testing. .. contents:: |toctitle| - :local: + :local: The Health of a Codebase ------------------------ @@ -23,34 +23,38 @@ Testing in Depth `Security in depth `_ might mean that your castle has a moat as well as high walls. Likewise, when testing, you should consider testing a various layers of the stack using both unit tests and integration tests. -When writing tests, you may find it helpful to first map out which functions of your code you want to test, and then write a functional unit test for each which can later comprise a larger integration test. +When writing tests, you may find it helpful to first map out which functions of your code you want to test, and then write a functional unit test for each which can later comprise a larger integration test. 
Unit Tests ---------- Creating unit tests for your code is a helpful way to test what you've built piece by piece. -Unit tests can be executed without runtime dependencies on PostgreSQL, Solr, or any other external system. They are the lowest level of testing and are executed constantly on developers' laptops as part of the build process and via continous integration services in the cloud. +Unit tests can be executed without runtime dependencies on PostgreSQL, Solr, or any other external system. They are the lowest level of testing and are executed constantly on developers' laptops as part of the build process and via continuous integration services in the cloud. -A unit test should execute an operation of your code in a controlled fashion. You must make an assertion of what the expected response gives back. It's important to test optimistic output and assertions (the "happy path"), as well as unexpected input that leads to failure conditions. Know how your program should handle anticipated errors/exceptions and confirm with your test(s) that it does so properly. +A unit test should execute an operation of your code in a controlled fashion. You must make an assertion of what the expected response gives back. It's important to test optimistic output and assertions (the "happy path"), as well as unexpected input that leads to failure conditions. Know how your program should handle anticipated errors/exceptions and confirm with your test(s) that it does so properly. Unit Test Automation Overview ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -We use a variety of tools to write, execute, and measure the code coverage of unit tests, including Maven, JUnit, Jacoco, GitHub, Travis, and Coveralls. We'll explain the role of each tool below, but here's an overview of what you can expect from the automation we've set up. +We use a variety of tools to write, execute, and measure the code coverage of unit tests, including Maven, JUnit, Jacoco, GitHub, and Coveralls. 
We'll explain the role of each tool below, but here's an overview of what you can expect from the automation we've set up. -As you prepare to make a pull request, as described in the :doc:`version-control` section, you will be working on a new branch you create from the "develop" branch. Let's say your branch is called ``1012-private-url``. As you work, you are constantly invoking Maven to build the war file. When you do a "clean and build" in Netbeans, Maven runs all the unit tests (anything ending with ``Test.java``) and the runs the results through a tool called Jacoco that calculates code coverage. When you push your branch to GitHub and make a pull request, a web service called Travis CI runs Maven and Jacoco on your branch and pushes the results to Coveralls, which is a web service that tracks changes to code coverage over time. - -To make this more concrete, observe that https://github.com/IQSS/dataverse/pull/3111 has comments from a GitHub user called ``coveralls`` saying things like "Coverage increased (+0.5%) to 5.547% when pulling dd6ceb1 on 1012-private-url into be5b26e on develop." Clicking on the comment should lead you to a URL such as https://coveralls.io/builds/7013870 which shows how code coverage has gone up or down. That page links to a page such as https://travis-ci.org/IQSS/dataverse/builds/144840165 which shows the build on the Travis side that pushed the results ton Coveralls. +As you prepare to make a pull request, as described in the :doc:`version-control` section, you will be working on a new branch you create from the "develop" branch. Let's say your branch is called ``1012-private-url``. As you work, you are constantly invoking Maven to build the war file. When you do a "clean and build" in Netbeans, Maven runs all the unit tests (anything ending with ``Test.java``) and then runs the results through a tool called Jacoco that calculates code coverage. 
When you push your branch to GitHub and make a pull request, GitHub Actions runs Maven and Jacoco on your branch and pushes the results to Coveralls, which is a web service that tracks changes to code coverage over time. Note that we have configured Coveralls to not mark small decreases in code coverage as a failure. You can find the Coveralls reports at https://coveralls.io/github/IQSS/dataverse The main takeaway should be that we care about unit testing enough to measure the changes to code coverage over time using automation. Now let's talk about how you can help keep our code coverage up by writing unit tests with JUnit. Writing Unit Tests with JUnit ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -We are aware that there are newer testing tools such as TestNG, but we use `JUnit `_ because it's tried and true. +We are aware that there are newer testing tools such as TestNG, but we use `JUnit `_ because it's tried and true. +We support JUnit 5 based testing and require new tests written with it. +(Since Dataverse 6.0, we migrated all of our tests formerly based on JUnit 4.) -If writing tests is new to you, poke around existing unit tests which all end in ``Test.java`` and live under ``src/test``. Each test is annotated with ``@Test`` and should have at least one assertion which specifies the expected result. In Netbeans, you can run all the tests in it by clicking "Run" -> "Test File". From the test file, you should be able to navigate to the code that's being tested by right-clicking on the file and clicking "Navigate" -> "Go to Test/Tested class". Likewise, from the code, you should be able to use the same "Navigate" menu to go to the tests. +If writing tests is new to you, poke around existing unit tests which all end in ``Test.java`` and live under ``src/test``. +Each test is annotated with ``@Test`` and should have at least one assertion which specifies the expected result. +In Netbeans, you can run all the tests in it by clicking "Run" -> "Test File". 
+From the test file, you should be able to navigate to the code that's being tested by right-clicking on the file and clicking "Navigate" -> "Go to Test/Tested class". +Likewise, from the code, you should be able to use the same "Navigate" menu to go to the tests. NOTE: Please remember when writing tests checking possibly localized outputs to check against ``en_US.UTF-8`` and ``UTC`` l10n strings! @@ -60,16 +64,60 @@ Refactoring Code to Make It Unit-Testable Existing code is not necessarily written in a way that lends itself to easy testing. Generally speaking, it is difficult to write unit tests for both JSF "backing" beans (which end in ``Page.java``) and "service" beans (which end in ``Service.java``) because they require the database to be running in order to test them. If service beans can be exercised via API they can be tested with integration tests (described below) but a good technique for making the logic testable it to move code to "util beans" (which end in ``Util.java``) that operate on Plain Old Java Objects (POJOs). ``PrivateUrlUtil.java`` is a good example of moving logic from ``PrivateUrlServiceBean.java`` to a "util" bean to make the code testable. -Parameterized Tests and JUnit Theories -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Often times you will want to test a method multiple times with similar values. In order to avoid test bloat (writing a test for every data combination), JUnit offers Data-driven unit tests with ``Parameterized.class`` and ``Theories.class``. This allows a test to be run for each set of defined data values. For reference, take a look at issue https://github.com/IQSS/dataverse/issues/5619 . +Parameterized Tests +^^^^^^^^^^^^^^^^^^^ + +Often times you will want to test a method multiple times with similar values. +In order to avoid test bloat (writing a test for every data combination), +JUnit offers Data-driven unit tests. This allows a test to be run for each set +of defined data values. 
+ +JUnit 5 offers great parameterized testing. Some guidance how to write those: + +- https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests +- https://www.baeldung.com/parameterized-tests-junit-5 +- https://blog.codefx.org/libraries/junit-5-parameterized-tests/ +- See also many examples in our codebase. + +Note that JUnit 5 also offers support for custom test parameter resolvers. This enables keeping tests cleaner, +as preparation might happen within some extension and the test code is more focused on the actual testing. +See https://junit.org/junit5/docs/current/user-guide/#extensions-parameter-resolution for more information. + +JUnit 5 Test Helper Extensions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Our codebase provides little helpers to ease dealing with state during tests. +Some tests might need to change something which should be restored after the test ran. + +For unit tests, the most interesting part is to set a JVM setting just for the current test or a whole test class. +(Which might be an inner class, too!). Please make use of the ``@JvmSetting(key = JvmSettings.XXX, value = "")`` +annotation and also make sure to annotate the test class with ``@LocalJvmSettings``. + +Inspired by JUnit's ``@MethodSource`` annotation, you may use ``@JvmSetting(key = JvmSettings.XXX, method = "zzz")`` +to reference a static method located in the same test class by name (i. e. ``private static String zzz() {}``) to allow +retrieving dynamic data instead of String constants only. (Note the requirement for a *static* method!) + +If you want to delete a setting, simply provide a ``null`` value. This can be used to override a class-wide setting +or some other default that is present for some reason. + +To set arbitrary system properties for the current test, a similar extension ``@SystemProperty(key = "", value = "")`` +has been added. (Note: it does not support method references.) 
+
+Both extensions ensure that the global state of system properties does not interfere across
+test executions. Tests using these extensions will be executed serially.
+
+This settings helper may be extended at a later time to manipulate settings in a remote instance during integration
+or end-to-end testing. Stay tuned!

Observing Changes to Code Coverage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once you've written some tests, you're probably wondering how much you've helped to increase the code coverage. In Netbeans, do a "clean and build." Then, under the "Projects" tab, right-click "dataverse" and click "Code Coverage" -> "Show Report". For each Java file you have open, you should be able to see the percentage of code that is covered by tests and every line in the file should be either green or red. Green indicates that the line is being exercised by a unit test and red indicates that it is not.

-In addition to seeing code coverage in Netbeans, you can also see code coverage reports by opening ``target/site/jacoco/index.html`` in your browser.
+In addition to seeing code coverage in Netbeans, you can also see code coverage reports by opening ``target/site/jacoco-X-test-coverage-report/index.html`` in your browser.
+Depending on the report type you want to look at, let ``X`` be one of ``unit``, ``integration`` or ``merged``.
+"Merged" will display combined coverage of both unit and integration tests, but does not currently cover API tests.
+
Testing Commands
^^^^^^^^^^^^^^^^

@@ -85,105 +133,159 @@ In addition, there is a writeup on "The Testable Command" at https://github.com/

Running Non-Essential (Excluded) Unit Tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-
All unit tests (that have not been annotated with ``@Ignore``), including these non-essential tests, are run from continuous integration systems such as Jenkins and Travis CI with the following ``mvn`` command that invokes a non-default profile:
+You should be aware that some unit tests have been deemed "non-essential" and have been annotated with ``@Tag(Tags.NOT_ESSENTIAL_UNITTESTS)`` and are excluded from the "dev" Maven profile, which is the default profile.
+All unit tests (that have not been annotated with ``@Disabled``), including these non-essential tests, are run from continuous integration systems such as Jenkins and GitHub Actions with the following ``mvn`` command that invokes a non-default profile:

``mvn test -P all-unit-tests``

-Typically https://travis-ci.org/IQSS/dataverse will show a higher number of unit tests executed because it uses the profile above.
+Generally speaking, unit tests have been flagged as non-essential because they are slow or because they require an Internet connection.
+You should not feel obligated to run these tests continuously but you can use the ``mvn`` command above to run them.
+To iterate on the unit test in Netbeans and execute it with "Run -> Test File", you must temporarily comment out the annotation flagging the test as non-essential.

-Generally speaking, unit tests have been flagged as non-essential because they are slow or because they require an Internet connection. You should not feel obligated to run these tests continuously but you can use the ``mvn`` command above to run them. To iterate on the unit test in Netbeans and execute it with "Run -> Test File", you must temporarily comment out the annotation flagging the test as non-essential.

+.. _integration-tests:

Integration Tests
-----------------

-Unit tests are fantastic for low level testing of logic but aren't especially real-world-applicable because they do not exercise Dataverse as it runs in production with a database and other runtime dependencies.
We test in-depth by also writing integration tests to exercise a running system.
+Unit tests are fantastic for low-level testing of logic but aren't especially real-world-applicable because they do not exercise the Dataverse Software as it runs in production with a database and other runtime dependencies. We test in-depth by also writing integration tests to exercise a running system.

-Unfortunately, the term "integration tests" can mean different things to different people. For our purposes, an integration test has the following qualities:
+Unfortunately, the term "integration tests" can mean different things to
+different people. For our purposes, an integration test can have two flavors:

-- Integration tests exercise Dataverse APIs.
-- Integration tests are not automatically on developers' laptops.
-- Integration tests operate on an installation of Dataverse that is running and able to talk to both PostgreSQL and Solr.
-- Integration tests are written using REST Assured.
+1. Be an API Test:

-Running the full API test suite using Docker
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+   - Exercises the Dataverse Software APIs.
+   - Does not run automatically on developers' laptops.
+   - Operates on a Dataverse installation that is running and able to talk to both PostgreSQL and Solr.
+   - Is written using REST Assured.

-To run the full suite of integration tests on your laptop, we recommend using the "all in one" Docker configuration described in ``conf/docker-aio/readme.txt`` in the root of the repo.
+2. Be a `Testcontainers `__ Test:

-Alternatively, you can run tests against Glassfish running on your laptop by following the "getting set up" steps below.

+   - Runs any dependent services as containers via the Testcontainers API.
+   - Is written as a JUnit test, together with everything necessary to run it.
+   - Makes use of the Testcontainers framework.
+   - Is able to run anywhere Docker is available (Podman support is under construction).
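Stripped of the framework, the second flavor boils down to a start–test–stop lifecycle that Testcontainers drives for you. A plain-Java sketch of that idea (illustrative only; no real containers or Testcontainers APIs involved — a real test declares a container field and lets the framework run it):

```java
/**
 * Plain-Java sketch of the lifecycle a Testcontainers test delegates to the
 * framework: start a dependency, run the test against it, always tear it down.
 * FakeContainer is a hypothetical stand-in for a real container handle.
 */
public class ContainerLifecycleSketch {

    /** Stand-in for a container; Testcontainers would effectively `docker run` on start(). */
    static class FakeContainer implements AutoCloseable {
        private boolean running;
        void start() { running = true; }
        boolean isRunning() { return running; }
        @Override public void close() { running = false; } // teardown happens pass or fail
    }

    /** The "test body": talks to the dependency and reports success. */
    static boolean runTestAgainst(FakeContainer dependency) {
        return dependency.isRunning();
    }

    public static void main(String[] args) {
        // try-with-resources mirrors the guaranteed cleanup the framework provides
        try (FakeContainer postgres = new FakeContainer()) {
            postgres.start();
            System.out.println(runTestAgainst(postgres)); // prints "true"
        }
    }
}
```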
-Getting Set Up to Run REST Assured Tests -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Running the Full API Test Suite Using EC2 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Prerequisite:** To run the API test suite in an EC2 instance you should first follow the steps in the :doc:`deployment` section to get set up with the AWS binary to launch EC2 instances. If you're here because you just want to spin up a branch, you'll still want to follow the AWS deployment setup steps, but may find the `ec2-create README.md `_ Quick Start section helpful. + +You may always retrieve a current copy of the ec2-create-instance.sh script and accompanying group_var.yml file from the `dataverse-ansible repo `_. Since we want to run the test suite, let's grab the group_vars used by Jenkins: -Unit tests are run automatically on every build, but dev environments and servers require special setup to run REST Assured tests. In short, Dataverse needs to be placed into an insecure mode that allows arbitrary users and datasets to be created and destroyed. This differs greatly from the out-of-the-box behavior of Dataverse, which we strive to keep secure for sysadmins installing the software for their institutions in a production environment. +- `ec2-create-instance.sh `_ +- `jenkins.yml `_ -The :doc:`dev-environment` section currently refers developers here for advice on getting set up to run REST Assured tests, but we'd like to add some sort of "dev" flag to the installer to put Dataverse in "insecure" mode, with lots of scary warnings that this dev mode should not be used in production. +Edit ``jenkins.yml`` to set the desired GitHub repo and branch, and to adjust any other options to meet your needs: -The instructions below assume a relatively static dev environment on a Mac. There is a newer "all in one" Docker-based approach documented in the :doc:`/developers/containers` section under "Docker" that you may like to play with as well. 
+- ``dataverse_repo: https://github.com/IQSS/dataverse.git``
+- ``dataverse_branch: develop``
+- ``dataverse.api.test_suite: true``
+- ``dataverse.unittests.enabled: true``
+- ``dataverse.sampledata.enabled: true``
+
+If you wish, you may pass the script a ``-l`` flag with a local relative path in which the script will `copy various logs `_ at the end of the test suite for your review.
+
+Finally, run the script:
+
+.. code-block:: bash
+
+  $ ./ec2-create-instance.sh -g jenkins.yml -l log_dir
+
+Running the Full API Test Suite Using Docker
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To run the full suite of integration tests on your laptop, we recommend running Dataverse and its dependencies in Docker, as explained in the :doc:`/container/dev-usage` section of the Container Guide. This environment provides additional services (such as S3) that are used in testing.
+
+Running the APIs Without Docker (Classic Dev Env)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While it is possible to run a good number of API tests without using Docker in our :doc:`classic-dev-env`, we are transitioning toward including additional services (such as S3) in our Dockerized development environment (:doc:`/container/dev-usage`), so you will probably find it more convenient to use it instead.
+
+Unit tests are run automatically on every build, but dev environments and servers require special setup to run API (REST Assured) tests. In short, the Dataverse software needs to be placed into an insecure mode that allows arbitrary users and datasets to be created and destroyed (this is done automatically in the Dockerized environment, and by the steps described below). This differs greatly from the out-of-the-box behavior of the Dataverse software, which we strive to keep secure for sysadmins installing the software for their institutions in a production environment.
The Burrito Key
^^^^^^^^^^^^^^^

-For reasons that have been lost to the mists of time, Dataverse really wants you to to have a burrito. Specifically, if you're trying to run REST Assured tests and see the error "Dataverse config issue: No API key defined for built in user management", you must run the following curl command (or make an equivalent change to your database):
+For reasons that have been lost to the mists of time, the Dataverse software really wants you to have a burrito. Specifically, if you're trying to run REST Assured tests and see the error "Dataverse config issue: No API key defined for built in user management", you must run the following curl command (or make an equivalent change to your database):

``curl -X PUT -d 'burrito' http://localhost:8080/api/admin/settings/BuiltinUsers.KEY``

-Without this "burrito" key in place, REST Assured will not be able to create users. We create users to create objects we want to test, such as dataverses, datasets, and files.
+Without this "burrito" key in place, REST Assured will not be able to create users. We create users to create objects we want to test, such as collections, datasets, and files.

-Root Dataverse Permissions
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+Root Collection Permissions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^

-In your browser, log in as dataverseAdmin (password: admin) and click the "Edit" button for your root dataverse. Navigate to Permissions, then the Edit Access button. Under "Who can add to this dataverse?" choose "Anyone with a dataverse account can add sub dataverses" if it isn't set to this already.
+In your browser, log in as dataverseAdmin (password: admin) and click the "Edit" button for your root collection. Navigate to Permissions, then the Edit Access button. Under "Who can add to this collection?" choose "Anyone with a Dataverse installation account can add sub collections and datasets" if it isn't set to this already.
Alternatively, this same step can be done with this script: ``scripts/search/tests/grant-authusers-add-on-root`` -Publish Root Dataverse -^^^^^^^^^^^^^^^^^^^^^^ +Publish Root Collection +^^^^^^^^^^^^^^^^^^^^^^^ -The root dataverse must be published for some of the REST Assured tests to run. +The root collection must be published for some of the REST Assured tests to run. dataverse.siteUrl ^^^^^^^^^^^^^^^^^ -When run locally (as opposed to a remote server), some of the REST Assured tests require the ``dataverse.siteUrl`` JVM option to be set to ``http://localhost:8080``. See "JVM Options" under the :doc:`/installation/config` section of the Installation Guide for advice changing JVM options. First you should check to check your JVM options with: +When run locally (as opposed to a remote server), some of the REST Assured tests require the ``dataverse.siteUrl`` JVM option to be set to ``http://localhost:8080``. See :ref:`jvm-options` section in the Installation Guide for advice changing JVM options. 
First you should check your JVM options with:

``./asadmin list-jvm-options | egrep 'dataverse|doi'``

If ``dataverse.siteUrl`` is absent, you can add it with:

-``./asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8080"``
+``./asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8080"``
+
+dataverse.oai.server.maxidentifiers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The OAI Harvesting tests require that the paging limit for ListIdentifiers be set to 2, in order to be able to trigger this paging behavior without having to create and export too many datasets:
+
+``./asadmin create-jvm-options "-Ddataverse.oai.server.maxidentifiers=2"``
+
+dataverse.oai.server.maxrecords
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Identifier Generation
+The OAI Harvesting tests require that the paging limit for ListRecords be set to 2, in order to be able to trigger this paging behavior without having to create and export too many datasets:
+
+``./asadmin create-jvm-options "-Ddataverse.oai.server.maxrecords=2"``
+
+Identifier Generation
^^^^^^^^^^^^^^^^^^^^^

``DatasetsIT.java`` exercises the feature where the "identifier" of a DOI can be a digit and requires a sequence to be added to your database. See ``:IdentifierGenerationStyle`` under the :doc:`/installation/config` section for adding this sequence to your installation of PostgreSQL.

-Writing Integration Tests with REST Assured
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Writing API Tests with REST Assured
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Before writing any new REST Assured tests, you should get the tests to pass in an existing REST Assured test file. ``BuiltinUsersIT.java`` is relatively small and requires less setup than other test files.
+Before writing any new REST Assured tests, you should get the tests to pass in an existing REST Assured test file. ``BuiltinUsersIT.java`` is relatively small and requires less setup than other test files. You do not have to reinvent the wheel.
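A typical API test follows a create–exercise–assert–clean up rhythm. Here is a stubbed plain-Java sketch of that shape (the helpers and in-memory "server state" are hypothetical stand-ins; real tests use REST Assured and the ``UtilIT`` helpers):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Sketch of the create/exercise/assert/clean-up rhythm of an API test (stubs only, not the real UtilIT API). */
public class ApiTestRhythmSketch {

    // Stand-in for server-side state; a real test talks to a running installation instead.
    static final Set<String> users = new HashSet<>();

    /** Cf. UtilIT.createRandomUser(): create a throwaway fixture with a random name. */
    static String createRandomUser() {
        String username = "user-" + UUID.randomUUID();
        users.add(username);
        return username;
    }

    /** Cf. UtilIT.deleteUser(): clean up; returns an HTTP-ish status code. */
    static int deleteUser(String username) {
        return users.remove(username) ? 200 : 404;
    }

    public static void main(String[] args) {
        String username = createRandomUser();      // arrange: create fixtures
        try {
            // act + assert: a real test would call the API and check the status code
            if (!users.contains(username)) throw new AssertionError("user should exist");
        } finally {
            int status = deleteUser(username);     // clean up, even if assertions failed
            if (status != 200) throw new AssertionError("cleanup failed: " + status);
        }
    }
}
```

Putting the cleanup in a ``finally`` block keeps a failing assertion from leaving fixtures behind that break later runs.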
There are many useful methods you can call in your own tests -- especially within UtilIT.java -- when you need your test to create and/or interact with generated accounts, files, datasets, etc. Similar methods can subsequently delete them to get them out of your way as desired before the test has concluded. For example, if you’re testing your code’s operations with user accounts, the method ``UtilIT.createRandomUser();`` can generate an account for your test to work with. The same account can then be deleted by your program by calling the ``UtilIT.deleteUser();`` method on the imaginary friend your test generated. -Remember, it’s only a test (and it's not graded)! Some guidelines to bear in mind: +Remember, it’s only a test (and it's not graded)! Some guidelines to bear in mind: - Map out which logical functions you want to test - Understand what’s being tested and ensure it’s repeatable - Assert the conditions of success / return values for each operation - * A useful resource would be `HTTP status codes `_ + * A useful resource would be `HTTP status codes `_ - Let the code do the labor; automate everything that happens when you run your test file. +- If you need to test an optional service (S3, etc.), add it to our docker compose file. See :doc:`/container/dev-usage`. - Just as with any development, if you’re stuck: ask for help! -To execute existing integration tests on your local Dataverse, a helpful command line tool to use is `Maven `_. You should have Maven installed as per the `Development Environment `_ guide, but if not it’s easily done via Homebrew: ``brew install maven``. +To execute existing integration tests on your local Dataverse installation from the command line, use Maven. You should have Maven installed as per :doc:`dev-environment`, but if not it's easily done via Homebrew: ``brew install maven``. + +Once installed, you may run commands with ``mvn [options] [] []``. 
+
++ If you want to run just one particular API test class:
-Once installed, you may run commands with ``mvn [options] [] []``.
+  ``mvn test -Dtest=UsersIT``
-+ If you want to run just one particular API test, it’s as easy as you think:
++ If you want to run just one particular API test method,
-  ``mvn test -Dtest=FileRecordJobIT``
+  ``mvn test -Dtest=UsersIT#testMergeAccounts``
+ To run more than one test at a time, separate by commas:
@@ -193,38 +295,84 @@ Once installed, you may run commands with ``mvn [options] [] [`.
+If you are adding a new test class, be sure to add it to :download:`tests/integration-tests.txt <../../../../tests/integration-tests.txt>` so that our automated testing knows about it.

-Measuring Coverage of Integration Tests
----------------------------------------

-Measuring the code coverage of integration tests with Jacoco requires several steps. In order to make these steps clear we'll use "/usr/local/glassfish4" as the Glassfish directory and "glassfish" as the Glassfish Unix user.
+Writing and Using a Testcontainers Test
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Add jacocoagent.jar to Glassfish
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Most scenarios of integration testing involve having dependent services running.
+This is where `Testcontainers `__ kicks in by
+providing a JUnit interface to drive them before and after executing your tests.
+
+Test scenarios are endless. Some examples are migration scripts, persistence,
+storage adapters, etc.
+
+To run a test with Testcontainers, you will need to write a JUnit 5 test.
+`The upstream project provides some documentation about this. `_
+
+Please make sure to:
+
+1. End your test class with ``IT``
+2. Annotate the test class with two tags:
+
+   .. code:: java
+
+      /** A very minimal example for a Testcontainers integration test class.
*/
+      @Testcontainers(disabledWithoutDocker = true)
+      @Tag(edu.harvard.iq.dataverse.util.testing.Tags.INTEGRATION_TEST)
+      @Tag(edu.harvard.iq.dataverse.util.testing.Tags.USES_TESTCONTAINERS)
+      class MyExampleIT { /* ... */ }
+
+If using upstream modules, e.g. for PostgreSQL or similar, you will need to add
+a dependency to ``pom.xml`` if not present. `See the PostgreSQL module example. `_
+
+To run these tests, simply call out to Maven:
-In order to get code coverage reports out of Glassfish we'll be adding jacocoagent.jar to the Glassfish "lib" directory.
+.. code::
+
+   mvn verify
+
+Notes:
+
+1. Remember to have Docker ready to serve, or these tests will fail.
+2. You can skip running unit tests by adding ``-DskipUnitTests``.
+3. You can choose to ignore tests with Testcontainers by adding ``-Dit.groups='integration & !testcontainers'``.
+   Learn more about `filter expressions in the JUnit 5 guide `_.
+
+
+Measuring Coverage of API Tests
+-------------------------------
+
+Measuring the code coverage of API tests with Jacoco requires several steps. In order to make these steps clear we'll use "/usr/local/payara6" as the Payara directory and "dataverse" as the Payara Unix user.
+
+Please note that this was tested under Glassfish 4 but it is hoped that the same steps will work with Payara.
+
+Add jacocoagent.jar to Payara
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to get code coverage reports out of Payara we'll be adding jacocoagent.jar to the Payara "lib" directory.

First, we need to download Jacoco. Look in pom.xml to determine which version of Jacoco we are using. As of this writing we are using 0.8.1 so in the example below we download the Jacoco zip from https://github.com/jacoco/jacoco/releases/tag/v0.8.1

-Note that we are running the following commands as the user "glassfish". In short, we stop Glassfish, add the Jacoco jar file, and start up Glassfish again.
+Note that we are running the following commands as the user "dataverse".
In short, we stop Payara, add the Jacoco jar file, and start up Payara again. .. code-block:: bash - su - glassfish - cd /home/glassfish + su - dataverse + cd /home/dataverse mkdir -p local/jacoco-0.8.1 cd local/jacoco-0.8.1 wget https://github.com/jacoco/jacoco/releases/download/v0.8.1/jacoco-0.8.1.zip unzip jacoco-0.8.1.zip - /usr/local/glassfish4/bin/asadmin stop-domain - cp /home/glassfish/local/jacoco-0.8.1/lib/jacocoagent.jar /usr/local/glassfish4/glassfish/lib - /usr/local/glassfish4/bin/asadmin start-domain + /usr/local/payara6/bin/asadmin stop-domain + cp /home/dataverse/local/jacoco-0.8.1/lib/jacocoagent.jar /usr/local/payara6/glassfish/lib + /usr/local/payara6/bin/asadmin start-domain Add jacococli.jar to the WAR File ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -As the "glassfish" user download :download:`instrument_war_jacoco.bash <../_static/util/instrument_war_jacoco.bash>` (or skip ahead to the "git clone" step to get the script that way) and give it two arguments: +As the "dataverse" user download :download:`instrument_war_jacoco.bash <../_static/util/instrument_war_jacoco.bash>` (or skip ahead to the "git clone" step to get the script that way) and give it two arguments: - path to your pristine WAR file - path to the new WAR file the script will create with jacococli.jar in it @@ -238,40 +386,41 @@ Deploy the Instrumented WAR File Please note that you'll want to undeploy the old WAR file first, if necessary. -Run this as the "glassfish" user. +Run this as the "dataverse" user. .. code-block:: bash - /usr/local/glassfish4/bin/asadmin deploy dataverse-jacoco.war + /usr/local/payara6/bin/asadmin deploy dataverse-jacoco.war -Note that after deployment the file "/usr/local/glassfish4/glassfish/domains/domain1/config/jacoco.exec" exists and is empty. +Note that after deployment the file "/usr/local/payara6/glassfish/domains/domain1/config/jacoco.exec" exists and is empty. 
-Run Integration Tests -~~~~~~~~~~~~~~~~~~~~~ +Run API Tests to Determine Code Coverage +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Note that even though you see "docker-aio" in the command below, we assume you are not necessarily running the test suite within Docker. (Some day we'll probably move this script to another directory.) For this reason, we pass the URL with the normal port (8080) that Glassfish runs on to the ``run-test-suite.sh`` script. +Note that if you are looking for how to run API tests generally, you should refer to :ref:`integration-tests`. -Note that "/usr/local/glassfish4/glassfish/domains/domain1/config/jacoco.exec" will become non-empty after you stop and start Glassfish. You must stop and start Glassfish before every run of the integration test suite. +Note that "/usr/local/payara6/glassfish/domains/domain1/config/jacoco.exec" will become non-empty after you stop and start Payara. You must stop and start Payara before every run of the integration test suite. .. code-block:: bash - /usr/local/glassfish4/bin/asadmin stop-domain - /usr/local/glassfish4/bin/asadmin start-domain + /usr/local/payara6/bin/asadmin stop-domain + /usr/local/payara6/bin/asadmin start-domain git clone https://github.com/IQSS/dataverse.git cd dataverse - conf/docker-aio/run-test-suite.sh http://localhost:8080 + TESTS=$(` and a `GitHub webhook `; build output is viewable at https://travis-ci.org/IQSS/dataverse/builds +The Dataverse Project currently makes use of two Continuous Integration platforms, Jenkins and GitHub Actions. Our Jenkins config is a work in progress and may be viewed at https://github.com/IQSS/dataverse-jenkins/ A corresponding GitHub webhook is required. Build output is viewable at https://jenkins.dataverse.org/ +GitHub Actions jobs can be found in ``.github/workflows``. + As always, pull requests to improve our continuous integration configurations are welcome. 
Enhance build time by caching dependencies ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In the future, CI builds in ephemeral build environments and Docker builds can benefit from caching all dependencies and plugins. -As Dataverse is a huge project, build times can be enhanced by avoiding re-downloading everything when the Maven POM is unchanged. +As the Dataverse Project is a huge project, build times can be enhanced by avoiding re-downloading everything when the Maven POM is unchanged. To seed the cache, use the following Maven goal before using Maven in (optional) offline mode in your scripts: .. code:: shell @@ -324,41 +477,15 @@ reduced anyway. You will obviously have to utilize caching functionality of your CI service or do proper Docker layering. -The Phoenix Server -~~~~~~~~~~~~~~~~~~ - -How the Phoenix Tests Work -^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A server at http://phoenix.dataverse.org has been set up to test the latest code from the develop branch. Testing is done using chained builds of Jenkins jobs: - -- A war file is built from the latest code in develop: https://build.hmdc.harvard.edu:8443/job/phoenix.dataverse.org-build-develop/ -- The resulting war file is depoyed to the Phoenix server: https://build.hmdc.harvard.edu:8443/job/phoenix.dataverse.org-deploy-develop/ -- REST Assured Tests are run across the wire from the Jenkins server to the Phoenix server: https://build.hmdc.harvard.edu:8443/job/phoenix.dataverse.org-apitest-develop/ - -How to Run the Phoenix Tests -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -- Take a quick look at http://phoenix.dataverse.org to make sure the server is up and running Dataverse. If it's down, fix it. -- Log into Jenkins and click "Build Now" at https://build.hmdc.harvard.edu:8443/job/phoenix.dataverse.org-build-develop/ -- Wait for all three chained Jenkins jobs to complete and note if they passed or failed. If you see a failure, open a GitHub issue or at least get the attention of some developers. 
-
-List of Tests Run Against the Phoenix Server
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-We haven't thought much about a good way to publicly list the "IT" classes that are executed against the phoenix server. (Currently your best bet is to look at the ``Executing Maven`` line at the top of the "Full Log" of "Console Output" of ``phoenix.dataverse.org-apitest-develop`` Jenkins job mentioned above.) We endeavor to keep the list of tests in the "all-in-one" Docker environment described above in sync with the list of tests configured in Jenkins. That is to say, refer to :download:`run-test-suite.sh <../../../../conf/docker-aio/run-test-suite.sh>` mentioned in ``conf/docker-aio/readme.txt`` for the current list of IT tests that are expected to pass. Here's a dump of that file:
-
-.. literalinclude:: ../../../../conf/docker-aio/run-test-suite.sh
-
Accessibility Testing
---------------------

Accessibility Policy
~~~~~~~~~~~~~~~~~~~~

-Dataverse aims to improve the user experience for those with disabilities, and are in the process of following the recommendations of the `Harvard University Digital Accessibility Policy `__, which use the Worldwide Web Consortium’s Web Content Accessibility Guidelines version 2.1, Level AA Conformance (WCAG 2.1 Level AA) as the standard.
+The Dataverse Project aims to improve the user experience for those with disabilities, and is in the process of following the recommendations of the `Harvard University Digital Accessibility Policy `__, which uses the Worldwide Web Consortium’s Web Content Accessibility Guidelines version 2.1, Level AA Conformance (WCAG 2.1 Level AA) as the standard.
-To report an accessibility issue with Dataverse, you can create a new issue in our GitHub repo at: https://github.com/IQSS/dataverse/issues/ +To report an accessibility issue with the Dataverse Software, you can create a new issue in our GitHub repo at: https://github.com/IQSS/dataverse/issues/ Accessibility Tools ~~~~~~~~~~~~~~~~~~~ @@ -372,14 +499,14 @@ There are browser developer tools such as the `Wave toolbar '_ asking for ideas from the community, and discussion at 'this GitHub issue. '_ +We'd like to make improvements to our automated testing. See also 'this thread from our mailing list '_ asking for ideas from the community, and discussion at 'this GitHub issue. '_ Future Work on Unit Tests ~~~~~~~~~~~~~~~~~~~~~~~~~ - Review pull requests from @bencomp for ideas for approaches to testing: https://github.com/IQSS/dataverse/pulls?q=is%3Apr+author%3Abencomp - Come up with a way to test commands: http://irclog.iq.harvard.edu/dataverse/2015-11-04#i_26750 -- Test EJBs using Arquillian, embedded Glassfish, or similar. @bmckinney kicked the tires on Arquillian at https://github.com/bmckinney/bio-dataverse/commit/2f243b1db1ca704a42cd0a5de329083763b7c37a +- Test EJBs using Arquillian, embedded app servers, or similar. 
@bmckinney kicked the tires on Arquillian at https://github.com/bmckinney/bio-dataverse/commit/2f243b1db1ca704a42cd0a5de329083763b7c37a Future Work on Integration Tests ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -387,8 +514,7 @@ Future Work on Integration Tests - Automate testing of dataverse-client-python: https://github.com/IQSS/dataverse-client-python/issues/10 - Work with @leeper on testing the R client: https://github.com/IQSS/dataverse-client-r - Review and attempt to implement "API Test Checklist" from @kcondon at https://docs.google.com/document/d/199Oq1YwQ4pYCguaeW48bIN28QAitSk63NbPYxJHCCAE/edit?usp=sharing -- Attempt to use @openscholar approach for running integration tests using Travis https://github.com/openscholar/openscholar/blob/SCHOLAR-3.x/.travis.yml (probably requires using Ubuntu rather than CentOS) -- Generate code coverage reports for **integration** tests: https://github.com/pkainulainen/maven-examples/issues/3 and http://www.petrikainulainen.net/programming/maven/creating-code-coverage-reports-for-unit-and-integration-tests-with-the-jacoco-maven-plugin/ +- Generate code coverage reports for **integration** tests: https://github.com/pkainulainen/maven-examples/issues/3 and https://www.petrikainulainen.net/programming/maven/creating-code-coverage-reports-for-unit-and-integration-tests-with-the-jacoco-maven-plugin/ - Consistent logging of API Tests. Show test name at the beginning and end and status codes returned. - expected passing and known/expected failing integration tests: https://github.com/IQSS/dataverse/issues/4438 @@ -400,22 +526,16 @@ Browser-Based Testing Installation Testing ~~~~~~~~~~~~~~~~~~~~ -- Run `vagrant up` on a server to test the installer: http://guides.dataverse.org/en/latest/developers/tools.html#vagrant . We haven't been able to get this working in Travis: https://travis-ci.org/IQSS/dataverse/builds/96292683 . 
Perhaps it would be possible to use AWS as a provider from Vagrant judging from https://circleci.com/gh/critical-alert/circleci-vagrant/6 . -- Work with @lwo to automate testing of https://github.com/IQSS/dataverse-puppet . Consider using Travis: https://github.com/IQSS/dataverse-puppet/issues/10 -- Work with @donsizemore to automate testing of https://github.com/IQSS/dataverse-ansible with Travis or similar. +- Work with @donsizemore to automate testing of https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible Future Work on Load/Performance Testing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Clean up and copy stress tests code, config, and docs into main repo from https://github.com/IQSS/dataverse-helper-scripts/tree/master/src/stress_tests -- Marcel Duran created a command-line wrapper for the WebPagetest API that can be used to test performance in your continuous integration pipeline (TAP, Jenkins, Travis-CI, etc): https://github.com/marcelduran/webpagetest-api/wiki/Test-Specs#jenkins-integration +- Marcel Duran created a command-line wrapper for the WebPagetest API that can be used to test performance in your continuous integration pipeline (TAP, Jenkins, etc.): https://github.com/marcelduran/webpagetest-api/wiki/Test-Specs#jenkins-integration - Create top-down checklist, building off the "API Test Coverage" spreadsheet at https://github.com/IQSS/dataverse/issues/3358#issuecomment-256400776 Future Work on Accessibility Testing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Using https://github.com/IQSS/dataverse-ansible and hooks available from accessibily testing tools, automate the running of accessibility tools on PRs so that developers will receive quicker feedback on proposed code changes that reduce the accessibility of the application. 
- ----- - -Previous: :doc:`sql-upgrade-scripts` | Next: :doc:`documentation` +- Using https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible and hooks available from accessibility testing tools, automate the running of accessibility tools on PRs so that developers will receive quicker feedback on proposed code changes that reduce the accessibility of the application. diff --git a/doc/sphinx-guides/source/developers/tips.rst b/doc/sphinx-guides/source/developers/tips.rst index 9084f3d1993..9295f3a8d12 100755 --- a/doc/sphinx-guides/source/developers/tips.rst +++ b/doc/sphinx-guides/source/developers/tips.rst @@ -2,7 +2,7 @@ Tips ==== -If you just followed the steps in :doc:`dev-environment` for the first time, you will need to get set up to deploy code to Glassfish. Below you'll find other tips as well. +If you just followed the steps in :doc:`classic-dev-env` for the first time, you will need to get set up to deploy code to your app server. Below you'll find other tips as well. .. contents:: |toctitle| :local: @@ -10,40 +10,38 @@ If you just followed the steps in :doc:`dev-environment` for the first time, you Iterating on Code and Redeploying --------------------------------- -When you followed the steps in the :doc:`dev-environment` section, the war file was deployed to Glassfish by the ``install`` script. That's fine but once you're ready to make a change to the code you will need to get comfortable with undeploying and redeploying code (a war file) to Glassfish. +When you followed the steps in the :doc:`classic-dev-env` section, the war file was deployed to Payara by the Dataverse Software installation script. That's fine but once you're ready to make a change to the code you will need to get comfortable with undeploying and redeploying code (a war file) to Payara. 
-It's certainly possible to manage deployment and undeployment of the war file via the command line using the ``asadmin`` command that ships with Glassfish (that's what the ``install`` script uses and the steps are documented below), but we recommend getting set up with and IDE such as Netbeans to manage deployment for you. +It's certainly possible to manage deployment and undeployment of the war file via the command line using the ``asadmin`` command that ships with Payara (that's what the Dataverse Software installation script uses and the steps are documented below), but we recommend getting set up with an IDE such as Netbeans to manage deployment for you. -Undeploy the war File from the ``install`` Script -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Undeploy the war File from the Dataverse Software Installation Script +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Because the initial deployment of the war file was done outside of Netbeans by the ``install`` script, it's a good idea to undeploy that war file to give Netbeans a clean slate to work with. +Because the initial deployment of the war file was done outside of Netbeans by the Dataverse Software installation script, it's a good idea to undeploy that war file to give Netbeans a clean slate to work with. -Assuming you installed Glassfish in ``/usr/local/glassfish4``, run the following ``asadmin`` command to see the version of Dataverse that the ``install`` script deployed: +Assuming you installed Payara in ``/usr/local/payara6``, run the following ``asadmin`` command to see the version of the Dataverse Software that the Dataverse Software installation script deployed: -``/usr/local/glassfish4/bin/asadmin list-applications`` +``/usr/local/payara6/bin/asadmin list-applications`` -You will probably see something like ``dataverse-4.8.5 `` as the output. To undeploy, use whichever version you see like this: +You will probably see something like ``dataverse-5.0 `` as the output. 
To undeploy, use whichever version you see like this: -``/usr/local/glassfish4/bin/asadmin undeploy dataverse-4.8.5`` +``/usr/local/payara6/bin/asadmin undeploy dataverse-5.0`` -Now that Glassfish doesn't have anything deployed, we can proceed with getting Netbeans set up to deploy the code. +Now that Payara doesn't have anything deployed, we can proceed with getting Netbeans set up to deploy the code. -Add Glassfish 4.1 as a Server in Netbeans -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Add Payara as a Server in Netbeans +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Dataverse only works with a specific version of Glassfish (see https://github.com/IQSS/dataverse/issues/2628 ) so you need to make sure Netbeans is deploying to that version rather than a newer version of Glassfish that may have come bundled with Netbeans. +Launch Netbeans and click "Tools" and then "Servers". Click "Add Server" and select "Payara Server" and set the installation location to ``/usr/local/payara6``. The defaults are fine so you can click "Next" and "Finish". -Launch Netbeans and click "Tools" and then "Servers". Click "Add Server" and select "Glassfish Server" and set the installation location to ``/usr/local/glassfish4``. The default are fine so you can click "Next" and "Finish". If you are running Netbeans 8.2 and there is already a bundled version of Glassfish, you should probably remove it because (again) you need a specific version of Glassfish (4.1 as of this writing). +Please note that if you are on a Mac, Netbeans may be unable to start Payara due to proxy settings in Netbeans. Go to the "General" tab in Netbeans preferences and click "Test connection" to see if you are affected. If you get a green checkmark, you're all set. If you get a red exclamation mark, change "Proxy Settings" to "No Proxy" and retest. 
A more complicated answer having to do with changing network settings is available at https://discussions.apple.com/thread/7680039?answerId=30715103022#30715103022 and the bug is also described at https://netbeans.org/bugzilla/show_bug.cgi?id=268076 -Please note that if you are on a Mac, Netbeans may be unable to start Glassfish due to proxy settings in Netbeans. Go to the "General" tab in Netbeans preferences and click "Test connection" to see if you are affected. If you get a green checkmark, you're all set. If you get a red exclamation mark, change "Proxy Settings" to "No Proxy" and retest. A more complicated answer having to do with changing network settings is available at https://discussions.apple.com/thread/7680039?answerId=30715103022#30715103022 and the bug is also described at https://netbeans.org/bugzilla/show_bug.cgi?id=268076 +At this point you can manage Payara using Netbeans. Click "Window" and then "Services". Expand "Servers" and right-click Payara to stop and then start it so that it appears in the Output window. Note that you can expand "Payara" and "Applications" to see if any applications are deployed. -At this point you can manage Glassfish using Netbeans. Click "Window" and then "Services". Expand "Servers" and right-click Glassfish to stop and then start it so that it appears in the Output window. Note that you can expand "Glassfish" and "Applications" to see if any applications are deployed. +Ensure that the Dataverse Software Will Be Deployed to Payara +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Ensure that Dataverse Will Be Deployed to Glassfish 4.1 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Click "Window" and then "Projects". Click "File" and then "Project Properties (dataverse)". Click "Run" and change "Server" from "No Server Selected" to your installation of Glassfish 4.1. Click OK. +Click "Window" and then "Projects". Click "File" and then "Project Properties (dataverse)". 
Click "Run" and change "Server" from "No Server Selected" to your installation of Payara. Click OK. .. _custom_build_num_script: @@ -60,6 +58,8 @@ From the root of the git repo, run the following command to set the build number This should update or place a file at ``src/main/java/BuildNumber.properties``. +(See also :ref:`auto-custom-build-number` for other ways of changing the build number.) + Then, from Netbeans, click "Run" and then "Clean and Build Project (dataverse)". After this completes successfully, click "Run" and then "Run Project (dataverse)" Confirm the Change Was Deployed @@ -80,61 +80,135 @@ Netbeans Connector Chrome Extension For faster iteration while working on JSF pages, it is highly recommended that you install the Netbeans Connector Chrome Extension listed in the :doc:`tools` section. When you save XHTML or CSS files, you will see the changes immediately. Hipsters call this "hot reloading". :) +Thumbnails +---------- + +In order for thumbnails to be generated for PDFs, you need to install ImageMagick and configure Dataverse to use the ``convert`` binary. + +Assuming you're using Homebrew: + +``brew install imagemagick`` + +Then configure the JVM option mentioned in :ref:`install-imagemagick` to the path to ``convert`` which for Homebrew is usually ``/usr/local/bin/convert``. + Database Schema Exploration --------------------------- -With over 100 tables, the Dataverse PostgreSQL database ("dvndb") can be somewhat daunting for newcomers. Here are some tips for coming up to speed. (See also the :doc:`sql-upgrade-scripts` section.) +With over 100 tables, the Dataverse PostgreSQL database can be somewhat daunting for newcomers. Here are some tips for coming up to speed. (See also the :doc:`sql-upgrade-scripts` section.) + +.. _db-name-creds: + +Database Name and Credentials +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The default database name and credentials depend on how you set up your dev environment. + ..
list-table:: + :header-rows: 1 + :align: left + + * - MPCONFIG Key + - Docker + - Classic + * - dataverse.db.name + - ``dataverse`` + - ``dvndb`` + * - dataverse.db.user + - ``dataverse`` + - ``dvnapp`` + * - dataverse.db.password + - ``secret`` + - ``secret`` + +Here's an example of using these credentials from within the PostgreSQL container (see :doc:`/container/index`): + +.. code-block:: bash + + pdurbin@beamish dataverse % docker exec -it postgres-1 bash + root@postgres:/# export PGPASSWORD=secret + root@postgres:/# psql -h localhost -U dataverse dataverse + psql (16.3 (Debian 16.3-1.pgdg120+1)) + Type "help" for help. + + dataverse=# select id,alias from dataverse limit 1; + id | alias + ----+------- + 1 | root + (1 row) + +See also :ref:`database-persistence` in the Installation Guide. pgAdmin -~~~~~~~~ +~~~~~~~ + +If you followed the :doc:`classic-dev-env` section, we had you install pgAdmin, which can help you explore the tables and execute SQL commands. It's also listed in the :doc:`tools` section. -Back in the :doc:`dev-environment` section, we had you install pgAdmin, which can help you explore the tables and execute SQL commands. It's also listed in the :doc:`tools` section. +.. _schemaspy: SchemaSpy ~~~~~~~~~ SchemaSpy is a tool that creates a website of entity-relationship diagrams based on your database. 
-As part of our build process for running integration tests against the latest code in the "develop" branch, we drop the database on the "phoenix" server, recreate the database by deploying the latest war file, and run SchemaSpy to create the following site: http://phoenix.dataverse.org/schemaspy/latest/relationships.html +As part of our release process (:ref:`update-schemaspy`), we run SchemaSpy and publish the output at https://guides.dataverse.org/en/latest/schemaspy/index.html and (for example) https://guides.dataverse.org/en/6.6/schemaspy/index.html -To run this command on your laptop, download SchemaSpy and take a look at the syntax in ``scripts/deploy/phoenix.dataverse.org/post`` +To run SchemaSpy locally, you can try something like this (after downloading the jars from https://github.com/schemaspy/schemaspy/releases and https://jdbc.postgresql.org/download/): -To read more about the phoenix server, see the :doc:`testing` section. +``java -jar /tmp/schemaspy-6.2.4.jar -t pgsql -host localhost -db dvndb -u postgres -p secret -s public -dp /tmp/postgresql-42.7.5.jar -o /tmp/latest`` + +See also :ref:`db-name-creds`. Deploying With ``asadmin`` -------------------------- Sometimes you want to deploy code without using Netbeans or from the command line on a server you have ssh'ed into. -For the ``asadmin`` commands below, we assume you have already changed directories to ``/usr/local/glassfish4/glassfish/bin`` or wherever you have installed Glassfish. +For the ``asadmin`` commands below, we assume you have already changed directories to ``/usr/local/payara6/glassfish/bin`` or wherever you have installed Payara. There are four steps to this process: 1. Build the war file: ``mvn package`` -2. Check which version of Dataverse is deployed: ``./asadmin list-applications`` -3. Undeploy the Dataverse application (if necessary): ``./asadmin undeploy dataverse-VERSION`` +2. Check which version of the Dataverse Software is deployed: ``./asadmin list-applications`` +3. 
Undeploy the Dataverse Software (if necessary): ``./asadmin undeploy dataverse-VERSION`` 4. Copy the war file to the server (if necessary) 5. Deploy the new code: ``./asadmin deploy /path/to/dataverse-VERSION.war`` -Running the Dataverse ``install`` Script in Non-Interactive Mode ----------------------------------------------------------------- +Running the Dataverse Software Installation Script in Non-Interactive Mode +-------------------------------------------------------------------------- Rather than running the installer in "interactive" mode, it's possible to put the values in a file. See "non-interactive mode" in the :doc:`/installation/installation-main` section of the Installation Guide. -Preventing Glassfish from Phoning Home --------------------------------------- +Preventing Payara from Phoning Home +----------------------------------- -By default, Glassfish reports analytics information. The administration guide suggests this can be disabled with ``./asadmin create-jvm-options -Dcom.sun.enterprise.tools.admingui.NO_NETWORK=true``, should this be found to be undesirable for development purposes. +By default, Glassfish reports analytics information. The administration guide suggests this can be disabled with ``./asadmin create-jvm-options -Dcom.sun.enterprise.tools.admingui.NO_NETWORK=true``, should this be found to be undesirable for development purposes. It is unknown if Payara phones home or not. Solr ---- .. TODO: This section should be moved into a dedicated guide about Solr for developers. It should be extended with - information about the way Solr is used within Dataverse, ideally explaining concepts and links to upstream docs. + information about the way Solr is used within the Dataverse Software, ideally explaining concepts and links to upstream docs. 
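For command-line experiments against the dev Solr instance, here is a hedged sketch of building a wildcard query URL from the shell (``localhost:8983`` and the ``collection1`` core are the dev-environment defaults assumed in this guide):

```shell
# Build a Solr select URL for a wildcard search against the dev core.
# localhost:8983 and "collection1" are assumed dev defaults.
query='*:*'
# Minimal URL-encoding for this query (':' -> '%3A'):
encoded=$(printf '%s' "$query" | sed 's/:/%3A/g')
url="http://localhost:8983/solr/collection1/select?q=${encoded}&wt=json&indent=true"
echo "$url"
# curl -s "$url"   # run this once Solr is up to see the JSON response
```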
+ +Once some Dataverse collections, datasets, and files have been created and indexed, you can experiment with searches directly from Solr at http://localhost:8983/solr/#/collection1/query and look at the JSON output of searches, such as this wildcard search: http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true . You can also get JSON output of static fields Solr knows about: http://localhost:8983/solr/collection1/schema/fields + +You can simply double-click "start.jar" rather than running ``java -jar start.jar`` from the command line. Figuring out how to stop Solr after double-clicking it is an exercise for the reader. + +.. _update-solr-schema-dev: + +Updating the Solr Schema (Developers) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Both developers and sysadmins need to update the Solr schema from time to time. One difference is that developers will be committing changes to ``conf/solr/schema.xml`` in git. To prevent cross-platform differences in the git history, when running the ``update-fields.sh`` script, we ask all developers to run the script from within Docker. (See :doc:`/container/configbaker-image` for more on the image we'll use below.) + +.. code-block:: + + curl http://localhost:8080/api/admin/index/solr/schema | docker run -i --rm -v ./docker-dev-volumes/solr/data:/var/solr gdcc/configbaker:unstable update-fields.sh /var/solr/data/collection1/conf/schema.xml -Once some dataverses, datasets, and files have been created and indexed, you can experiment with searches directly from Solr at http://localhost:8983/solr/#/collection1/query and look at the JSON output of searches, such as this wildcard search: http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true . 
You can also get JSON output of static fields Solr knows about: http://localhost:8983/solr/collection1/schema/fields + cp docker-dev-volumes/solr/data/data/collection1/conf/schema.xml conf/solr/schema.xml -You can simply double-click "start.jar" rather that running ``java -jar start.jar`` from the command line. Figuring out how to stop Solr after double-clicking it is an exercise for the reader. +At this point you can do a ``git diff`` and see if your changes make sense before committing. + +Sysadmins are welcome to run ``update-fields.sh`` however they like. See :ref:`update-solr-schema` in the Admin Guide and :ref:`additional-metadata-blocks` in the Container Guide for details. Git --- @@ -155,6 +229,8 @@ Git on Mac On a Mac, you won't have git installed unless you have "Command Line Developer Tools" installed but running ``git clone`` for the first time will prompt you to install them. +.. _auto-custom-build-number: + Automation of Custom Build Number on Webpage ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -164,14 +240,75 @@ commit id in your test deployment webpages on the bottom right corner next to th When you prefer manual updates, there is another script, see above: :ref:`custom_build_num_script`. +An alternative to that is using *MicroProfile Config* and set the option ``dataverse.build`` via a system property, +environment variable (``DATAVERSE_BUILD``) or `one of the other config sources +`__. + +You could even override the version itself with the option ``dataverse.version`` in the same way, which is usually +picked up from a build time source. + +See also discussion of version numbers in :ref:`run-build-create-war`. + Sample Data ----------- -You may want to populate your **non-production** installation(s) of Dataverse with sample data. You have a couple options: +You may want to populate your **non-production** Dataverse installations with sample data. -- Code in https://github.com/IQSS/dataverse-sample-data (recommended). 
This set of sample data includes several common data types, data subsetted from production datasets in dataverse.harvard.edu, datasets with file hierarchy, and more. -- Scripts called from ``scripts/deploy/phoenix.dataverse.org/post``. +https://github.com/IQSS/dataverse-sample-data includes several common data types, data subsetted from production datasets in dataverse.harvard.edu, datasets with file hierarchy, and more. ----- +Switching from Glassfish to Payara +---------------------------------- + +If you already have a working dev environment with Glassfish and want to switch to Payara, you must do the following: + +- Copy the "domain1" directory from Glassfish to Payara. + +UI Pages Development +-------------------- + +While most of the information in this guide focuses on service and backing beans ("the back end") development in Java, working on JSF/Primefaces xhtml pages presents its own unique challenges. + +.. _avoid-efficiency-issues-with-render-logic-expressions: + +Avoiding Inefficiencies in JSF Render Logic +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is important to keep in mind that the expressions in JSF ``rendered=`` attributes may be evaluated **multiple** times. So it is crucial not to use any expressions that require database lookups, or otherwise take any appreciable amount of time and resources. Render attributes should exclusively contain calls to methods in backing beans or caching service wrappers that perform any real work on the first call only, then keep returning the cached result on all the consecutive calls. This way it is irrelevant how many times PrimeFaces may need to call the method as any effect on the performance will be negligible. + +If you are ever in doubt as to how many times the method in your render logic expression is called, you can simply add a logging statement to the method in question. 
Or you can simply err on the side of assuming that it's going to be called a lot, and ensure that any repeated calls are not expensive to process. + +The simplest, trivial example would be a direct call to a method in SystemConfig service bean. For example, + +`` was including the f:metadata farther down in the tree rather than as a direct child of the view. +As of Payara 6.2025.2, it is not clear that this error was resulting in changes to UI behavior, but the error messages were in the log. +If you see these errors, this note and the examples in the PR will hopefully provide some insight as to how to fix them. diff --git a/doc/sphinx-guides/source/developers/tools.rst b/doc/sphinx-guides/source/developers/tools.rst index 767a4a91694..b0ade669d52 100755 --- a/doc/sphinx-guides/source/developers/tools.rst +++ b/doc/sphinx-guides/source/developers/tools.rst @@ -2,11 +2,16 @@ Tools ===== -These are handy tools for your :doc:`/developers/dev-environment/`. +These are handy tools for your :doc:`dev-environment`. .. contents:: |toctitle| :local: +Tools for Faster Deployment ++++++++++++++++++++++++++++ + +See :ref:`ide-trigger-code-deploy` in the Container Guide. + Netbeans Connector Chrome Extension +++++++++++++++++++++++++++++++++++ @@ -18,43 +23,30 @@ Unfortunately, while the Netbeans Connector Chrome Extension used to "just work" pgAdmin +++++++ -You probably installed pgAdmin when following the steps in the :doc:`dev-environment` section but if not, you can download it from https://www.pgadmin.org +You may have installed pgAdmin when following the steps in the :doc:`classic-dev-env` section but if not, you can download it from https://www.pgadmin.org Maven +++++ With Maven installed you can run ``mvn package`` and ``mvn test`` from the command line. It can be downloaded from https://maven.apache.org -Vagrant -+++++++ - -Vagrant allows you to spin up a virtual machine running Dataverse on your development workstation.
You'll need to install Vagrant from https://www.vagrantup.com and VirtualBox from https://www.virtualbox.org. - -We assume you have already cloned the repo from https://github.com/IQSS/dataverse as explained in the :doc:`/developers/dev-environment` section. - -From the root of the git repo (where the ``Vagrantfile`` is), run ``vagrant up`` and eventually you should be able to reach an installation of Dataverse at http://localhost:8888 (the ``forwarded_port`` indicated in the ``Vagrantfile``). - -Please note that running ``vagrant up`` for the first time should run the ``downloads/download.sh`` script for you to download required software such as Glassfish and Solr and any patches. However, these dependencies change over time so it's a place to look if ``vagrant up`` was working but later fails. - -On Windows if you see an error like ``/usr/bin/perl^M: bad interpreter`` you might need to run ``dos2unix`` on the installation scripts. - PlantUML ++++++++ -PlantUML is used to create diagrams in the guides and other places. Download it from http://plantuml.com and check out an example script at https://github.com/IQSS/dataverse/blob/v4.6.1/doc/Architecture/components.sh . Note that for this script to work, you'll need the ``dot`` program, which can be installed on Mac with ``brew install graphviz``. +PlantUML is used to create diagrams in the guides and other places. Download it from https://plantuml.com and check out an example script at https://github.com/IQSS/dataverse/blob/v4.6.1/doc/Architecture/components.sh . Note that for this script to work, you'll need the ``dot`` program, which can be installed on Mac with ``brew install graphviz``. 
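To smoke-test a PlantUML setup without the full components script, a minimal sketch (the jar name and output path are assumptions; most diagram types also need graphviz installed):

```shell
# Write a tiny PlantUML source file and (optionally) render it to SVG.
cat > /tmp/example.puml <<'EOF'
@startuml
actor Depositor
participant "Dataverse" as DV
Depositor -> DV : upload file
@enduml
EOF
# Assumes plantuml.jar was downloaded from plantuml.com:
# java -jar plantuml.jar -tsvg /tmp/example.puml   # writes /tmp/example.svg
```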
Eclipse Memory Analyzer Tool (MAT) ++++++++++++++++++++++++++++++++++ The Memory Analyzer Tool (MAT) from Eclipse can help you analyze heap dumps, showing you "leak suspects" such as seen at https://github.com/payara/Payara/issues/350#issuecomment-115262625 -It can be downloaded from http://www.eclipse.org/mat +It can be downloaded from https://www.eclipse.org/mat -If the heap dump provided to you was created with ``gcore`` (such as with ``gcore -o /tmp/gf.core $glassfish_pid``) rather than ``jmap``, you will need to convert the file before you can open it in MAT. Using ``gf.core.13849`` as example of the original 33 GB file, here is how you could convert it into a 26 GB ``gf.core.13849.hprof`` file. Please note that this operation took almost 90 minutes: +If the heap dump provided to you was created with ``gcore`` (such as with ``gcore -o /tmp/app.core $app_pid``) rather than ``jmap``, you will need to convert the file before you can open it in MAT. Using ``app.core.13849`` as example of the original 33 GB file, here is how you could convert it into a 26 GB ``app.core.13849.hprof`` file. Please note that this operation took almost 90 minutes: -``/usr/java7/bin/jmap -dump:format=b,file=gf.core.13849.hprof /usr/java7/bin/java gf.core.13849`` +``/usr/java7/bin/jmap -dump:format=b,file=app.core.13849.hprof /usr/java7/bin/java app.core.13849`` -A file of this size may not "just work" in MAT. When you attempt to open it you may see something like "An internal error occurred during: "Parsing heap dump from '/tmp/heapdumps/gf.core.13849.hprof'". Java heap space". If so, you will need to increase the memory allocated to MAT. On Mac OS X, this can be done by editing ``MemoryAnalyzer.app/Contents/MacOS/MemoryAnalyzer.ini`` and increasing the value "-Xmx1024m" until it's high enough to open the file. See also http://wiki.eclipse.org/index.php/MemoryAnalyzer/FAQ#Out_of_Memory_Error_while_Running_the_Memory_Analyzer +A file of this size may not "just work" in MAT. 
When you attempt to open it you may see something like "An internal error occurred during: "Parsing heap dump from '/tmp/heapdumps/app.core.13849.hprof'". Java heap space". If so, you will need to increase the memory allocated to MAT. On Mac OS X, this can be done by editing ``MemoryAnalyzer.app/Contents/MacOS/MemoryAnalyzer.ini`` and increasing the value "-Xmx1024m" until it's high enough to open the file. See also https://wiki.eclipse.org/index.php/MemoryAnalyzer/FAQ#Out_of_Memory_Error_while_Running_the_Memory_Analyzer PageKite ++++++++ @@ -69,9 +61,9 @@ Sign up at https://pagekite.net and follow the installation instructions or simp The first time you run ``./pagekite.py`` a file at ``~/.pagekite.rc`` will be created. You can edit this file to configure PageKite to serve up port 8080 -(the default GlassFish HTTP port) or the port of your choosing. +(the default app server HTTP port) or the port of your choosing. -According to https://pagekite.net/support/free-for-foss/ PageKite (very generously!) offers free accounts to developers writing software the meets http://opensource.org/docs/definition.php such as Dataverse. +According to https://pagekite.net/support/free-for-foss/ PageKite (very generously!) offers free accounts to developers writing software that meets https://opensource.org/docs/definition.php such as the Dataverse Project. MSV +++ @@ -109,7 +101,7 @@ Download SonarQube from https://www.sonarqube.org and start look in the `bin` di -Dsonar.test.exclusions='src/test/**,src/main/webapp/resources/**' \ -Dsonar.issuesReport.html.enable=true \ -Dsonar.issuesReport.html.location='sonar-issues-report.html' \ - -Dsonar.jacoco.reportPath=target/jacoco.exec + -Dsonar.jacoco.reportPath=target/coverage-reports/jacoco-unit.exec Once the analysis is complete, you should be able to access http://localhost:9000/dashboard?id=edu.harvard.iq%3Adataverse to see the report.
To learn about resource leaks, for example, click on "Bugs", the "Tag", then "leak" or "Rule", then "Resources should be closed". @@ -129,7 +121,7 @@ Look for "RESOURCE_LEAK", for example. lsof ++++ -If file descriptors are not closed, eventually the open but unused resources can cause problems with system (glassfish in particular) stability. +If file descriptors are not closed, eventually the open but unused resources can cause problems with system (app servers in particular) stability. Static analysis and heap dumps are not always sufficient to identify the sources of these problems. For a quick sanity check, it can be helpful to check that the number of file descriptors does not increase after a request has finished processing. @@ -145,6 +137,14 @@ For example... would be consistent with a file descriptor leak on the dataset page. +JProfiler ++++++++++ + +Tracking down resource drainage, bottlenecks etc gets easier using a profiler. + +We thank EJ Technologies for granting us a free open source project license for their Java profiler +`JProfiler `_. + jmap and jstat ++++++++++++++ @@ -153,12 +153,12 @@ jmap allows you to look at the contents of the java heap. It can be used to crea .. code-block:: bash - $ jmap -histo + $ jmap -histo will output a list of all classes, sorted by the number of instances of each individual class, with the size in bytes. This can be very useful when looking for memory leaks in the application. Another useful tool is ``jstat``, that can be used in combination with ``jmap`` to monitor the effectiveness of garbage collection in reclaiming allocated memory. 
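When eyeballing ``jmap -histo`` output by hand gets tedious, a tiny filter helps. This is a hedged sketch: the four-column layout (rank, instances, bytes, class name) is typical ``jmap -histo`` output, but verify it against your JVM before relying on it.

```shell
# Print the instance count for a given class from `jmap -histo <pid>`
# output supplied on stdin. Assumed column layout: rank, instances,
# bytes, class name.
count_instances() {
  awk -v c="$1" '$4 == c { print $2 }'
}

# Example use (app_pid is your app server's process id):
# jmap -histo "$app_pid" | count_instances edu.harvard.iq.dataverse.Dataset
```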
-In the example script below we stress running Dataverse applicatione with GET requests to a specific dataverse page, use ``jmap`` to see how many Dataverse, Dataset and DataFile class object get allocated, then run ``jstat`` to see how the numbers are affected by both "Young Generation" and "Full" garbage collection runs (``YGC`` and ``FGC`` respectively): +In the example script below we stress running the Dataverse Software application with GET requests to a specific page in a Dataverse installation, use ``jmap`` to see how many Dataverse collection, Dataset and DataFile class objects get allocated, then run ``jstat`` to see how the numbers are affected by both "Young Generation" and "Full" garbage collection runs (``YGC`` and ``FGC`` respectively): (This is script is provided **as an example only**! You will have to experiment and expand it to suit any specific needs and any specific problem you may be trying to diagnose, and this is just to give an idea of how to go about it) @@ -166,12 +166,12 @@ In the example script below we stress running Dataverse applicatione with GET re #!/bin/sh - # the script takes the numeric id of the glassfish process as the command line argument: + # the script takes the numeric id of the app server process as the command line argument: id=$1 while : do - # Access the dataverse xxx 10 times in a row: + # Access the Dataverse collection xxx 10 times in a row: for ((i = 0; i < 10; i++)) do # hide the output, standard and stderr:
output, what to look for: -First, look at the numbers in the jmap output. In the example above, you can immediately see, after the first three iterations, that every 10 dataverse page loads results in the increase of the number of Dataset classes by 160. I.e., each page load leaves 16 of these on the heap. We can also see that each of the 10 page load cycles increased the heap by roughly 3GB; that each cycle resulted in a couple of YG (young generation) garbage collections, and in the old generation allocation being almost 70% full. These numbers in the example are clearly quite high and are an indication of some problematic memory allocation by the dataverse page - if this is the result of something you have added to the page, you probably would want to investigate and fix it. However, overly generous memory use **is not the same as a leak** necessarily. What you want to see now is how much of this allocation can be reclaimed by "Full GC". If all of it gets freed by ``FGC``, it is not the end of the world (even though you do not want your system to spend too much time running ``FGC``; it costs CPU cycles, and actually freezes the application while it's in progress!). It is however a **really** serious problem, if you determine that a growing portion of the old. gen. memory (``"O"`` in the ``jmap`` output) is not getting freed, even by ``FGC``. This *is* a real leak now, i.e. something is leaving behind some objects that are still referenced and thus off limits to garbage collector. So look for the lines where the ``FGC`` counter is incremented. For example, the first ``FGC`` in the example output above: +First, look at the numbers in the jmap output. In the example above, you can immediately see, after the first three iterations, that every 10 Dataverse installation page loads results in the increase of the number of Dataset classes by 160. I.e., each page load leaves 16 of these on the heap. 
We can also see that each of the 10 page load cycles increased the heap by roughly 3GB; that each cycle resulted in a couple of YG (young generation) garbage collections, and in the old generation allocation being almost 70% full. These numbers in the example are clearly quite high and are an indication of some problematic memory allocation by the Dataverse installation page - if this is the result of something you have added to the page, you probably would want to investigate and fix it. However, overly generous memory use **is not the same as a leak** necessarily. What you want to see now is how much of this allocation can be reclaimed by "Full GC". If all of it gets freed by ``FGC``, it is not the end of the world (even though you do not want your system to spend too much time running ``FGC``; it costs CPU cycles, and actually freezes the application while it's in progress!). It is however a **really** serious problem, if you determine that a growing portion of the old. gen. memory (``"O"`` in the ``jmap`` output) is not getting freed, even by ``FGC``. This *is* a real leak now, i.e. something is leaving behind some objects that are still referenced and thus off limits to garbage collector. So look for the lines where the ``FGC`` counter is incremented. For example, the first ``FGC`` in the example output above: .. code-block:: none @@ -255,7 +255,7 @@ First, look at the numbers in the jmap output. In the example above, you can imm 0.00 100.00 71.95 20.12 ? 22 25.034 1 4.455 29.489 Wed Aug 14 23:21:40 EDT 2019 -We can see that the first ``FGC`` resulted in reducing the ``"O"`` by almost 7GB, from 15GB down to 8GB (from 88% to 20% full). The number of Dataset classes has not budged at all - it has grown by the same 160 objects as before (very suspicious!). To complicate matters, ``FGC`` does not **guarantee** to free everything that can be freed - it will balance how much the system needs memory vs. 
how much it is willing to spend in terms of CPU cycles performing GC (remember, the application freezes while ``FGC`` is running!). So you should not assume that the "20% full" number above means that you have 20% of your stack already wasted and unrecoverable. Instead, look for the next **minium** value of ``"O"``; then for the next, etc. Now compare these consecutive miniums. With the above test (this is an output of a real experiment, a particularly memory-hungry feature added to the dataverse page), the minimums sequence (of old. gen. usage, in %) was looking as follows: +We can see that the first ``FGC`` resulted in reducing the ``"O"`` by almost 7GB, from 15GB down to 8GB (from 88% to 20% full). The number of Dataset classes has not budged at all - it has grown by the same 160 objects as before (very suspicious!). To complicate matters, ``FGC`` does not **guarantee** to free everything that can be freed - it will balance how much the system needs memory vs. how much it is willing to spend in terms of CPU cycles performing GC (remember, the application freezes while ``FGC`` is running!). So you should not assume that the "20% full" number above means that you have 20% of your heap already wasted and unrecoverable. Instead, look for the next **minimum** value of ``"O"``; then for the next, etc. Now compare these consecutive minimums. With the above test (this is an output of a real experiment, a particularly memory-hungry feature added to the Dataverse installation page), the minimums sequence (of old gen. usage, in %) looked as follows: .. code-block:: none @@ -274,10 +274,3 @@ We can see that the first ``FGC`` resulted in reducing the ``"O"`` by almost 7GB etc. ... It is clearly growing - so now we can conclude that indeed something there is using memory in a way that's not recoverable, and this is a clear problem.
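Comparing consecutive post-``FGC`` minimums can be scripted rather than eyeballed. Below is a minimal sketch, assuming the standard ``jstat -gcutil`` column order (``O`` in the 4th field, ``FGC`` in the 9th); sample data is inlined for illustration, but in a live session you would pipe ``jstat -gcutil <pid> 1000`` into the same ``awk`` program:

```shell
# Print the old-gen occupancy ("O", 4th column) each time the FGC
# counter (9th column) increases, i.e. the value right after a Full GC.
# If the printed minimums keep growing, you are likely looking at a real leak.
awk 'NR > 1 && $9 > fgc { print $4; fgc = $9 }' <<'EOF'
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00 100.00  71.95  88.00  95.00  90.00   22    25.034     0    0.000  25.034
  0.00 100.00  71.95  20.12  95.00  90.00   22    25.034     1    4.455  29.489
  0.00 100.00  80.00  75.00  95.00  90.00   25    27.000     1    4.455  31.455
  0.00 100.00  80.00  23.75  95.00  90.00   25    27.000     2    8.900  35.900
EOF
# prints: 20.12 then 23.75
```

With the sample above the sequence 20.12, 23.75 is already trending upward, which matches the "clear problem" conclusion described in the text.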
- - - - ----- - -Previous: :doc:`making-releases` | Next: :doc:`unf/index` diff --git a/doc/sphinx-guides/source/developers/troubleshooting.rst b/doc/sphinx-guides/source/developers/troubleshooting.rst index ec49b442016..2c437ca8b2e 100755 --- a/doc/sphinx-guides/source/developers/troubleshooting.rst +++ b/doc/sphinx-guides/source/developers/troubleshooting.rst @@ -2,7 +2,7 @@ Troubleshooting =============== -Over in the :doc:`dev-environment` section we described the "happy path" of when everything goes right as you set up your Dataverse development environment. Here are some common problems and solutions for when things go wrong. +Over in the :doc:`classic-dev-env` section we described the "happy path" of when everything goes right as you set up your Dataverse Software development environment. Here are some common problems and solutions for when things go wrong. .. contents:: |toctitle| :local: @@ -14,7 +14,7 @@ For unknown reasons, Netbeans will sometimes change the following line under ``s ``/`` -Sometimes Netbeans will change ``/`` to ``/dataverse``. Sometimes it will delete the line entirely. Either way, you will see very strange behavior when attempting to click around Dataverse in a browser. The homepage will load but icons will be missing. Any other page will fail to load entirely and you'll see a Glassfish error. +Sometimes Netbeans will change ``/`` to ``/dataverse``. Sometimes it will delete the line entirely. Either way, you will see very strange behavior when attempting to click around the Dataverse installation in a browser. The homepage will load but icons will be missing. Any other page will fail to load entirely and you'll see an app server error. The solution is to put the file back to how it was before Netbeans touched it. If anyone knows of an open Netbeans bug about this, please let us know. 
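One way to "put the file back" after Netbeans touches it is to let git discard the local modification. This is only a sketch: the descriptor path below is an assumption (adjust it to the file mentioned above in your checkout), and the guards make it a no-op outside a git work tree or when the file is unchanged.

```shell
# Hypothetical path to the descriptor Netbeans tends to modify; adjust as needed.
FILE="src/main/webapp/WEB-INF/glassfish-web.xml"

# Only act when inside a git work tree and the file has local changes.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1 \
   && ! git diff --quiet -- "$FILE" 2>/dev/null; then
    echo "$FILE was modified locally; restoring the committed version"
    git checkout -- "$FILE"
fi
```

Running this after a confusing session restores whatever was last committed, which is exactly the "how it was before Netbeans touched it" state.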
@@ -35,13 +35,13 @@ You can check the current SMTP server with the ``asadmin`` command: ``./asadmin get server.resources.mail-resource.mail/notifyMailSession.host`` -This command helps verify what host your domain is using to send mail. Even if it's the correct hostname, you may still need to adjust settings. If all else fails, there are some free SMTP service options available such as Gmail and MailGun. This can be configured from the GlassFish console or the command line. +This command helps verify what host your domain is using to send mail. Even if it's the correct hostname, you may still need to adjust settings. If all else fails, there are some free SMTP service options available such as Gmail and MailGun. This can be configured from the Payara console or the command line. -1. First, navigate to your Glassfish admin console: http://localhost:4848 +1. First, navigate to your Payara admin console: http://localhost:4848 2. From the left-side panel, select **JavaMail Sessions** 3. You should see one session named **mail/notifyMailSession** -- click on that. -From this window you can modify certain fields of your Dataverse's notifyMailSession, which is the JavaMail session for outgoing system email (such as on user signup or data publication). Two of the most important fields we need are: +From this window you can modify certain fields of your Dataverse installation's notifyMailSession, which is the JavaMail session for outgoing system email (such as on user sign up or data publication). Two of the most important fields we need are: - **Mail Host:** The DNS name of the default mail server (e.g. smtp.gmail.com) - **Default User:** The username provided to your Mail Host when you connect to it (e.g. johndoe@gmail.com) @@ -50,7 +50,7 @@ Most of the other defaults can safely be left as is. **Default Sender Address** If your user credentials for the SMTP server require a password, you'll need to configure some **Additional Properties** at the bottom. 
-**IMPORTANT:** Before continuing, it's highly recommended that your Default User account does NOT use a password you share with other accounts, as one of the additional properties includes entering the Default User's password (without concealing it on screen). For smtp.gmail.com you can safely use an `app password `_ or create an extra Gmail account for use with your Dataverse dev environment. +**IMPORTANT:** Before continuing, it's highly recommended that your Default User account does NOT use a password you share with other accounts, as one of the additional properties includes entering the Default User's password (without concealing it on screen). For smtp.gmail.com you can safely use an `app password `_ or create an extra Gmail account for use with your Dataverse Software development environment. Authenticating yourself to a Mail Host can be tricky. As an example, we'll walk through setting up our JavaMail Session to use smtp.gmail.com as a host by way of SSL on port 465. Use the Add Property button to generate a blank property for each name/value pair. @@ -67,16 +67,16 @@ mail.smtp.socketFactory.class javax.net.ssl.SSLSocketFactory **\*WARNING**: Entering a password here will *not* conceal it on-screen. It’s recommended to use an *app password* (for smtp.gmail.com users) or utilize a dedicated/non-personal user account with SMTP server auths so that you do not risk compromising your password. -Save these changes at the top of the page and restart your Glassfish server to try it out. +Save these changes at the top of the page and restart your app server to try it out. The mail session can also be set from command line. To use this method, you will need to delete your notifyMailSession and create a new one. 
See the below example: - Delete: ``./asadmin delete-javamail-resource mail/MyMailSession`` - Create (remove brackets and replace the variables inside): ``./asadmin create-javamail-resource --mailhost [smtp.gmail.com] --mailuser [test\@test\.com] --fromaddress [test\@test\.com] --property mail.smtp.auth=[true]:mail.smtp.password=[password]:mail.smtp.port=[465]:mail.smtp.socketFactory.port=[465]:mail.smtp.socketFactory.fallback=[false]:mail.smtp.socketFactory.class=[javax.net.ssl.SSLSocketFactory] mail/notifyMailSession`` -These properties can be tailored to your own preferred mail service, but if all else fails these settings work fine with Dataverse development environments for your localhost. +These properties can be tailored to your own preferred mail service, but if all else fails these settings work fine with Dataverse Software development environments for your localhost. -+ If you're seeing a "Relay access denied" error in your Glassfish logs when your app attempts to send an email, double check your user/password credentials for the Mail Host you're using. ++ If you're seeing a "Relay access denied" error in your app server logs when the Dataverse installation attempts to send an email, double check your user/password credentials for the Mail Host you're using. + If you're seeing a "Connection refused" / similar error upon email sending, try another port. 
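Because ``--property`` packs every name/value pair into one long colon-separated string, typos are easy to make. A sketch (the shell variable names here are our own, not ``asadmin`` options) that assembles the string piece by piece before passing it on, mirroring the SSL-on-465 example above:

```shell
# Build the colon-separated --property string for create-javamail-resource
# from individual variables, so each setting stays readable.
MAIL_PASSWORD="changeme"   # use an app password, never one shared with other accounts
MAIL_PORT=465
PROPS="mail.smtp.auth=true"
PROPS="$PROPS:mail.smtp.password=$MAIL_PASSWORD"
PROPS="$PROPS:mail.smtp.port=$MAIL_PORT"
PROPS="$PROPS:mail.smtp.socketFactory.port=$MAIL_PORT"
PROPS="$PROPS:mail.smtp.socketFactory.fallback=false"
PROPS="$PROPS:mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory"
echo "$PROPS"
# then: ./asadmin create-javamail-resource ... --property "$PROPS" mail/notifyMailSession
```

Note that a literal colon inside a value (for example in a password) would confuse both this simple concatenation and the ``--property`` parsing, so choose credentials accordingly.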
As another example, here is how to create a Mail Host via command line for Amazon SES: @@ -84,19 +84,23 @@ As another example, here is how to create a Mail Host via command line for Amazo - Delete: ``./asadmin delete-javamail-resource mail/MyMailSession`` - Create (remove brackets and replace the variables inside): ``./asadmin create-javamail-resource --mailhost email-smtp.us-east-1.amazonaws.com --mailuser [test\@test\.com] --fromaddress [test\@test\.com] --transprotocol aws --transprotocolclass com.amazonaws.services.simpleemail.AWSJavaMailTransport --property mail.smtp.auth=true:mail.smtp.user=[aws_access_key]:mail.smtp.password=[aws_secret_key]:mail.transport.protocol=smtp:mail.smtp.port=587:mail.smtp.starttls.enable=true mail/notifyMailSession`` +.. _rebuilding-dev-environment: + Rebuilding Your Dev Environment ------------------------------- -If you have an old copy of the database and old Solr data and want to start fresh, here are the recommended steps: +A script called :download:`dev-rebuild.sh <../../../../scripts/dev/dev-rebuild.sh>` is available that does the following: -- drop your old database -- clear out your existing Solr index: ``scripts/search/clear`` -- run the installer script above - it will create the db, deploy the app, populate the db with reference data and run all the scripts that create the domain metadata fields. You no longer need to perform these steps separately. -- confirm you are using the latest Dataverse-specific Solr schema.xml and included XML files (schema_dv_cmb_[copies|fields].xml) -- confirm http://localhost:8080 is up -- If you want to set some dataset-specific facets, go to the root dataverse (or any dataverse; the selections can be inherited) and click "General Information" and make choices under "Select Facets". There is a ticket to automate this: https://github.com/IQSS/dataverse/issues/619 +- Drops the database. +- Clears out Solr.
+- Deletes all data files uploaded by users (assuming you are using the default directory). +- Deploys the war file located in the ``target`` directory. +- Runs ``setup-all.sh`` in insecure mode so tests will pass. +- Runs post-install SQL statements. +- Publishes the root Dataverse collection. +- Adjusts permissions on the root Dataverse collection so tests will pass. -You may also find https://github.com/IQSS/dataverse/blob/develop/scripts/deploy/phoenix.dataverse.org/deploy and related scripts interesting because they demonstrate how we have at least partially automated the process of tearing down a Dataverse installation and having it rise again, hence the name "phoenix." See also "Fresh Reinstall" in the :doc:`/installation/installation-main` section of the Installation Guide. +To execute the script, make sure you have built a war file already and then ``cd`` to the root of the source tree and run ``scripts/dev/dev-rebuild.sh``. Feedback on this script is welcome! DataCite -------- @@ -106,7 +110,3 @@ If you are seeing ``Response code: 400, [url] domain of URL is not allowed`` it' ``./asadmin delete-jvm-options '-Ddataverse.siteUrl=http\://localhost\:8080'``
Starting -with Dataverse 2.0 and throughout the 3.* lifecycle, UNF v.5 -(implemented in Java) was used. Dataverse 4.0 uses the latest release, +with Dataverse Software 2.0 and throughout the 3.* lifecycle, UNF v.5 +(implemented in Java) was used. Dataverse Software 4.0 uses the latest release, UNF v.6. Two parallel implementation, in R and Java, will be available, for cross-validation. -Learn more: Micah Altman and Gary King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine, 13. Publisher’s Version Copy at http://j.mp/2ovSzoT +Learn more: Micah Altman and Gary King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine, 13. Publisher’s Version Copy at https://j.mp/2ovSzoT **Contents:** @@ -37,7 +37,3 @@ Learn more: Micah Altman and Gary King. 2007. “A Proposed Standard for the Sch unf-v3 unf-v5 unf-v6 - ----- - -Previous: :doc:`/developers/tools` | Next: :doc:`/developers/remote-users` diff --git a/doc/sphinx-guides/source/developers/unf/unf-v3.rst b/doc/sphinx-guides/source/developers/unf/unf-v3.rst index 3f0018d7fa5..98c07b398e0 100644 --- a/doc/sphinx-guides/source/developers/unf/unf-v3.rst +++ b/doc/sphinx-guides/source/developers/unf/unf-v3.rst @@ -34,11 +34,11 @@ For example, the number pi at five digits is represented as -3.1415e+, and the n 1. Terminate character strings representing nonmissing values with a POSIX end-of-line character. -2. Encode each character string with `Unicode bit encoding `_. Versions 3 through 4 use UTF-32BE; Version 4.1 uses UTF-8. +2. Encode each character string with `Unicode bit encoding `_. Versions 3 through 4 use UTF-32BE; Version 4.1 uses UTF-8. 3. Combine the vector of character strings into a single sequence, with each character string separated by a POSIX end-of-line character and a null byte. -4. Compute a hash on the resulting sequence using the standard MD5 hashing algorithm for Version 3 and using `SHA256 `_ for Version 4. 
The resulting hash is `base64 `_ encoded to support readability. +4. Compute a hash on the resulting sequence using the standard MD5 hashing algorithm for Version 3 and using `SHA256 `_ for Version 4. The resulting hash is `base64 `_ encoded to support readability. 5. Calculate the UNF for each lower-level data object, using a consistent UNF version and level of precision across the individual UNFs being combined. @@ -49,4 +49,4 @@ For example, the number pi at five digits is represented as -3.1415e+, and the n 8. Combine UNFs from multiple variables to form a single UNF for an entire data frame, and then combine UNFs for a set of data frames to form a single UNF that represents an entire research study. Learn more: -Software for computing UNFs is available in an R Module, which includes a Windows standalone tool and code for Stata and SAS languages. Also see the following for more details: Micah Altman and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data," D-Lib Magazine, Vol. 13, No. 3/4 (March). (Abstract: `HTML `_ | Article: `PDF `_) +Software for computing UNFs is available in an R Module, which includes a Windows standalone tool and code for Stata and SAS languages. Also see the following for more details: Micah Altman and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data," D-Lib Magazine, Vol. 13, No. 3/4 (March). (Abstract: `HTML `_ | Article: `PDF `_) diff --git a/doc/sphinx-guides/source/developers/unf/unf-v5.rst b/doc/sphinx-guides/source/developers/unf/unf-v5.rst index 29b4556b1f9..664ebe61da9 100644 --- a/doc/sphinx-guides/source/developers/unf/unf-v5.rst +++ b/doc/sphinx-guides/source/developers/unf/unf-v5.rst @@ -8,8 +8,8 @@ UNF Version 5 **Important Update:** -UNF Version 5 has been in use by the Dataverse project since 2009. It was built into every version of the DVN, starting with 2.0 and up to 3.6.2. However, some problems were recently found in that implementation. 
Namely, in certain cases data normalization is not implemented fully to the spec. UNF signatures it generates are still reasonably strong statistically; however, this means that at least some of our signatures are not independently verifiable. I.e., if somebody fully implements their own version of UNF calculator, for certain datasets it would calculate signatures different from those generated by the DVN. Unless of course they implement it with the exact same bugs as ours. +UNF Version 5 has been in use by the Dataverse Project since 2009. It was built into every version of the Dataverse Software, starting with 2.0 and up to 3.6.2. However, some problems were recently found in that implementation. Namely, in certain cases data normalization is not implemented fully to the spec. UNF signatures it generates are still reasonably strong statistically; however, this means that at least some of our signatures are not independently verifiable. I.e., if somebody fully implements their own version of UNF calculator, for certain datasets it would calculate signatures different from those generated by the DVN. Unless of course they implement it with the exact same bugs as ours. -To address this, the Project is about to release UNF Version 6. The release date is still being discussed. It may coincide with the release of Dataverse 4.0. Alternatively, the production version of DVN 3.6.3 may get upgraded to use UNF v6 prior to that. This will be announced shortly. In the process, we are solving another problem with UNF v5 - this time we've made an effort to offer very implementer-friendly documentation that describes the algorithm fully and unambiguously. So if you are interested in implementing your own version of a UNF calculator, (something we would like to encourage!) please proceed directly to the Version 6 documentation. +To address this, the Project is about to release UNF Version 6. The release date is still being discussed.
It may coincide with the release of Dataverse Software 4.0. Alternatively, the production release of DVN 3.6.3 may get upgraded to use UNF v6 prior to that. This will be announced shortly. In the process, we are solving another problem with UNF v5 - this time we've made an effort to offer very implementer-friendly documentation that describes the algorithm fully and unambiguously. So if you are interested in implementing your own version of a UNF calculator, (something we would like to encourage!) please proceed directly to the Version 6 documentation. Going forward, we are going to offer a preserved version of the Version 5 library and, possibly, an online UNF v5 calculator, for the purposes of validating vectors and data sets for which published Version 5 UNFs exist. diff --git a/doc/sphinx-guides/source/developers/unf/unf-v6.rst b/doc/sphinx-guides/source/developers/unf/unf-v6.rst index b5c898a9a08..b2495ff3dd9 100644 --- a/doc/sphinx-guides/source/developers/unf/unf-v6.rst +++ b/doc/sphinx-guides/source/developers/unf/unf-v6.rst @@ -156,11 +156,11 @@ For example, to specify a non-default precision the parameter it is specified us | Allowed values are {``128`` , ``192`` , ``196`` , ``256``} with ``128`` being the default. | ``R1`` - **truncate** numeric values to ``N`` digits, **instead of rounding**, as previously described. -`Dr. Micah Altman's classic UNF v5 paper `_ mentions another optional parameter ``T###``, for specifying rounding of date and time values (implemented as stripping the values of entire components - fractional seconds, seconds, minutes, hours... etc., progressively) - but it doesn't specify its syntax. It is left as an exercise for a curious reader to contact the author and work out the details, if so desired. (Not implemented in UNF Version 6 by the Dataverse Project). +`Dr. 
Micah Altman's classic UNF v5 paper `_ mentions another optional parameter ``T###``, for specifying rounding of date and time values (implemented as stripping the values of entire components - fractional seconds, seconds, minutes, hours... etc., progressively) - but it doesn't specify its syntax. It is left as an exercise for a curious reader to contact the author and work out the details, if so desired. (Not implemented in UNF Version 6 by the Dataverse Project). Note: we do not recommend truncating character strings at fewer bytes than the default ``128`` (the ``X`` parameter). At the very least this number **must** be high enough so that the printable UNFs of individual variables or files are not truncated, when calculating combined UNFs of files or datasets, respectively. -It should also be noted that the Dataverse application never calculates UNFs with any non-default parameters. And we are not aware of anyone else actually doing so. If you are considering creating your own implementation of the UNF, it may be worth trying to create a simplified, defaults-only version first. Such an implementation would be sufficient to independently verify Dataverse-produced UNFs, among other things. +It should also be noted that the Dataverse Software never calculates UNFs with any non-default parameters. And we are not aware of anyone else actually doing so. If you are considering creating your own implementation of the UNF, it may be worth trying to create a simplified, defaults-only version first. Such an implementation would be sufficient to independently verify Dataverse Software-produced UNFs, among other things. .. 
_note2: diff --git a/doc/sphinx-guides/source/developers/version-control.rst b/doc/sphinx-guides/source/developers/version-control.rst index 618468392b7..fad8cac1400 100644 --- a/doc/sphinx-guides/source/developers/version-control.rst +++ b/doc/sphinx-guides/source/developers/version-control.rst @@ -2,16 +2,16 @@ Version Control ================== -The Dataverse Project uses git for version control and GitHub for hosting. On this page we'll explain where to find the code, our branching strategey, advice on how to make a pull request, and other git tips. +The Dataverse Project uses git for version control and GitHub for hosting. On this page we'll explain where to find the code, our branching strategy, advice on how to make a pull request, and other git tips. .. contents:: |toctitle| :local: -Where to Find the Dataverse Code --------------------------------- +Where to Find the Dataverse Software Code +----------------------------------------- -The main Dataverse code at https://github.com/IQSS/dataverse but as explained in the :doc:`intro` section under "Related Projects", there are many other code bases you can hack on if you wish! +The main Dataverse Software code is available at https://github.com/IQSS/dataverse but as explained in the :doc:`intro` section under "Related Projects", there are many other code bases you can hack on if you wish! Branching Strategy ------------------ @@ -19,12 +19,12 @@ Branching Strategy Goals ~~~~~ -The goals of the Dataverse branching strategy are: +The goals of the Dataverse Software branching strategy are: - allow for concurrent development - only ship stable code -We follow a simplified "git flow" model described at http://nvie.com/posts/a-successful-git-branching-model/ involving a "master" branch, a "develop" branch, and feature branches such as "1234-bug-fix". 
+We follow a simplified "git flow" model described at https://nvie.com/posts/a-successful-git-branching-model/ involving a "master" branch, a "develop" branch, and feature branches such as "1234-bug-fix". Branches ~~~~~~~~ @@ -32,7 +32,9 @@ Branches The "master" Branch ******************* -The "`master `_" branch represents released versions of Dataverse. As mentioned in the :doc:`making-releases` section, at release time we update the master branch to include all the code for that release. Commits are never made directly to master. Rather, master is updated only when we merge code into it from the "develop" branch. +The "`master `_" branch represents released versions of the Dataverse Software. As mentioned in the :doc:`making-releases` section, at release time we update the master branch to include all the code for that release. Commits are never made directly to master. Rather, master is updated only when we merge code into it from the "develop" branch. + +.. _develop-branch: The "develop" Branch ******************** @@ -46,6 +48,13 @@ Feature branches are used for both developing features and fixing bugs. They are "3728-doc-apipolicy-fix" is an example of a fine name for your feature branch. It tells us that you are addressing https://github.com/IQSS/dataverse/issues/3728 and the "slug" is short, descriptive, and starts with the issue number. +Hotfix Branches +*************** + +Hotfix branches are described under :doc:`making-releases`. + +.. _how-to-make-a-pull-request: + How to Make a Pull Request -------------------------- @@ -58,47 +67,197 @@ The example of creating a pull request below has to do with fixing an important Find or Create a GitHub Issue ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -For guidance on which issue to work on, please ask! Also, see https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md +An issue represents a bug (unexpected behavior) or a new feature in Dataverse. We'll use the issue number in the branch we create for our pull request. 
+ +.. _finding-github-issues-to-work-on: + +Finding GitHub Issues to Work On +******************************** + +Assuming this is your first contribution to Dataverse, you should start with something small. The following issue labels might be helpful in your search: + +- `good first issue `_ (these appear at https://github.com/IQSS/dataverse/contribute ) +- `hacktoberfest `_ +- `Help Wanted: Code `_ +- `Help Wanted: Documentation `_ + +For guidance on which issue to work on, please ask! :ref:`getting-help-developers` explains how to get in touch. + +Creating GitHub Issues to Work On +********************************* -Let's say you want to tackle https://github.com/IQSS/dataverse/issues/3728 which points out a typo in a page of Dataverse's documentation. +You are very welcome to create a GitHub issue to work on. However, for significant changes, please reach out (see :ref:`getting-help-developers`) to make sure the team and community agree with the proposed change. + +For small changes and especially typo fixes, please don't worry about reaching out first. + +Communicate Which Issue You Are Working On +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In the issue you can simply leave a comment to say you're working on it. If you tell us your GitHub username we are happy to add you to the "read only" team at https://github.com/orgs/IQSS/teams/dataverse-readonly/members so that we can assign the issue to you while you're working on it. You can also tell us if you'd like to be added to the `Dataverse Community Contributors spreadsheet `_. -Create a New Branch off the develop Branch +.. _create-branch-for-pr: + +Create a New Branch Off the develop Branch ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Always create your feature branch from the latest code in develop, pulling the latest code if necessary. 
As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing, and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") than have special meaning in Unix shells. +Always create your feature branch from the latest code in develop, pulling the latest code if necessary. As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing (e.g. `#3728 `_) and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") that have special meaning in Unix shells. Please do not call your branch "develop" as it can cause maintainers :ref:`trouble `. Commit Your Change to Your New Branch ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Making a commit (or several commits) to that branch. Ideally the first line of your commit message includes the number of the issue you are addressing, such as ``Fixed BlockedApiPolicy #3728``. +For each commit to that branch, try to include the issue number along with a summary in the first line of the commit message, such as ``Fixed BlockedApiPolicy #3728``. You are welcome to write longer descriptions in the body as well! + +.. _writing-release-note-snippets: + +Writing a Release Note Snippet +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We highly value your insight as a contributor when it comes to describing your work in our release notes. Not every pull request will be mentioned in release notes but most are. + +As described at :ref:`write-release-notes`, at release time we compile together release note "snippets" into the final release notes. + +Here's how to add a release note snippet to your pull request: + +- Create a Markdown file under ``doc/release-notes``.
You can reuse the name of your branch and append ".md" to it, e.g. ``3728-doc-apipolicy-fix.md`` +- Edit the snippet to include anything you think should be mentioned in the release notes. Please include the following if they apply: + + - Descriptions of new features or bugs fixed, including a link to the HTML preview of the docs you wrote (e.g. https://dataverse-guide--9939.org.readthedocs.build/en/9939/installation/config.html#smtp-email-configuration ) and the phrase "For more information, see #3728" (the issue number). If you know the PR number, you can add that too. + - New configuration settings + - Upgrade instructions + - Etc. + +Release note snippets do not need to be long. For a new feature, a single line description might be enough. Please note that your release note will likely be edited (expanded or shortened) when the final release notes are being created. Push Your Branch to GitHub ~~~~~~~~~~~~~~~~~~~~~~~~~~ -Push your feature branch to your fork of Dataverse. Your git command may look something like ``git push origin 3728-doc-apipolicy-fix``. +Push your feature branch to your fork of the Dataverse Software. Your git command may look something like ``git push origin 3728-doc-apipolicy-fix``. Make a Pull Request ~~~~~~~~~~~~~~~~~~~ -Make a pull request to get approval to merge your changes into the develop branch. Note that once a pull request is created, we'll remove the corresponding issue from our kanban board so that we're only tracking one card. +Make a pull request to get approval to merge your changes into the develop branch. + +Feedback on the pull request template we use is welcome! + +Here's an example of a pull request for issue #9729: https://github.com/IQSS/dataverse/pull/10474 + +Replace Issue with Pull Request +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If the pull request closes an issue that has been prioritized, someone from the core team will do the following: + +- Move the open issue to the "Done" column of the `project board`_.
We do this to track only one card, the pull request, on the project board. Merging the pull request will close the issue because we use the "closes #1234" `keyword `_ . +- Copy all labels from the issue to the pull request with the exception of the "size" label. +- Add a size label to the pull request that reflects the amount of review and QA time needed. +- Move the pull request to the "Ready for Review" column. -Feedback on the pull request template we use is welcome! Here's an example of a pull request for issue #3827: https://github.com/IQSS/dataverse/pull/3827 +.. _project board: https://github.com/orgs/IQSS/projects/34 Make Sure Your Pull Request Has Been Advanced to Code Review ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Now that you've made your pull request, your goal is to make sure it appears in the "Code Review" column at https://github.com/orgs/IQSS/projects/2. +Now that you've made your pull request, your goal is to make sure it appears in the "Code Review" column on the `project board`_. + +Look at :ref:`getting-help-developers` for various ways to reach out to developers who have enough access to the GitHub repo to move your issue and pull request to the "Code Review" column. + +Summary of Git commands +~~~~~~~~~~~~~~~~~~~~~~~ + +This section provides sequences of Git commands for three scenarios: + +* preparing the first pull request, when the IQSS Dataverse Software repository and the forked repository are identical +* creating an additional pull request some time later, when the IQSS Dataverse Software repository is ahead of the forked repository +* keeping your code base synchronized with the current state of the develop branch while your pull request is in review + +In the examples we use 123-COOL-FEATURE as the name of the feature branch, and https://github.com/YOUR_NAME/dataverse.git as your forked repository's URL. In practice, modify both accordingly.
+ +**1st scenario: preparing the first pull request** + +.. code-block:: bash + + # clone Dataverse at GitHub.com ... then + + git clone https://github.com/YOUR_NAME/dataverse.git dataverse_fork + cd dataverse_fork + + # create a new branch locally for the pull request + git checkout -b 123-COOL-FEATURE + + # working on the branch ... then commit changes + git commit -am "#123 explanation of changes" + + # upload the new branch to https://github.com/YOUR_NAME/dataverse + git push -u origin 123-COOL-FEATURE + + # ... then create pull request at github.com/YOUR_NAME/dataverse + + +**2nd scenario: preparing another pull request some months later** + +.. code-block:: bash + + # register IQSS Dataverse repo + git remote add upstream https://github.com/IQSS/dataverse.git + + git checkout develop + + # update local develop branch from https://github.com/IQSS/dataverse + git fetch upstream develop + git rebase upstream/develop + + # update remote develop branch at https://github.com/YOUR_NAME/dataverse + git push + + # create a new branch locally for the pull request + git checkout -b 123-COOL-FEATURE + + # work on the branch and commit changes + git commit -am "#123 explanation of changes" + + # upload the new branch to https://github.com/YOUR_NAME/dataverse + git push -u origin 123-COOL-FEATURE + + # ... then create pull request at github.com/YOUR_NAME/dataverse + + +**3rd scenario: synchronize your branch with the develop branch** + +.. code-block:: bash + + git checkout develop + + # update local develop branch from https://github.com/IQSS/dataverse + git fetch upstream develop + git rebase upstream/develop + + # update remote develop branch at https://github.com/YOUR_NAME/dataverse + git push + + # change to the already existing feature branch + git checkout 123-COOL-FEATURE + + # merge changes of develop to the feature branch + git merge develop + + # check if there are conflicts; if there are, follow the next commands, otherwise skip to the next block + # 1.
fix the relevant files (including testing) + # 2. commit changes + git add . + git commit + + # update remote feature branch at https://github.com/YOUR_NAME/dataverse + git push -Look at https://github.com/IQSS/dataverse/blob/master/CONTRIBUTING.md for various ways to reach out to developers who have enough access to the GitHub repo to move your issue and pull request to the "Code Review" column. How to Resolve Conflicts in Your Pull Request --------------------------------------------- Unfortunately, pull requests can quickly become "stale" and unmergeable as other pull requests are merged into the develop branch ahead of you. This is completely normal, and often occurs because other developers made their pull requests before you did. -The Dataverse team may ping you to ask you to merge the latest from the develop branch into your branch and resolve merge conflicts. If this sounds daunting, please just say so and we will assist you. +The Dataverse Project team may ping you to ask you to merge the latest from the develop branch into your branch and resolve merge conflicts. If this sounds daunting, please just say so and we will assist you. If you'd like to resolve the merge conflicts yourself, here are some steps to do so that make use of GitHub Desktop and Netbeans. @@ -110,7 +269,7 @@ If you'd like to resolve the merge conflicts yourself, here are some steps to do **In Netbeans:** -4. Click Window -> Favorites and open your local Dataverse project folder in the Favorites panel. +4. Click Window -> Favorites and open your local Dataverse Software project folder in the Favorites panel. 5. In this file browser, you can follow the red cylinder icon to find files with merge conflicts. 6. Double click the red merge conflicted file. 7. Right click on the red tab for that file and select Git -> Resolve Conflicts. @@ -123,26 +282,59 @@ If you'd like to resolve the merge conflicts yourself, here are some steps to do **In GitHub Issues:** -11.
Leave a comment for the Dataverse team that you have resolved the merge conflicts. +11. Leave a comment for the Dataverse Project team that you have resolved the merge conflicts. Adding Commits to a Pull Request from a Fork -------------------------------------------- By default, when a pull request is made from a fork, "Allow edits from maintainers" is checked as explained at https://help.github.com/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork/ -This is a nice feature of GitHub because it means that the core dev team for Dataverse can make small (or even large) changes to a pull request from a contributor to help the pull request along on its way to QA and being merged. +This is a nice feature of GitHub because it means that the core dev team for the Dataverse Project can make small (or even large) changes to a pull request from a contributor to help the pull request along on its way to QA and being merged. -GitHub documents how to make changes to a fork at https://help.github.com/articles/committing-changes-to-a-pull-request-branch-created-from-a-fork/ but as of this writing the steps involve making a new clone of the repo. This works but you might find it more convenient to add a "remote" to your existing clone. The example below uses the fork at https://github.com/OdumInstitute/dataverse and the branch ``4709-postgresql_96`` but the technique can be applied to any fork and branch: +GitHub documents how to make changes to a fork at https://help.github.com/articles/committing-changes-to-a-pull-request-branch-created-from-a-fork/ but as of this writing the steps involve making a new clone of the repo. This works but you might find it more convenient to add a "remote" to your existing clone. The example below uses the fork at https://github.com/uncch-rdmc/dataverse and the branch ``4709-postgresql_96`` but the technique can be applied to any fork and branch: .. 
code-block:: bash - git remote add OdumInstitute git@github.com:OdumInstitute/dataverse.git - git fetch OdumInstitute + git remote add uncch-rdmc git@github.com:uncch-rdmc/dataverse.git + git fetch uncch-rdmc git checkout 4709-postgresql_96 vim path/to/file.txt git commit - git push OdumInstitute 4709-postgresql_96 + git push uncch-rdmc 4709-postgresql_96 + +.. _develop-into-develop: + +Handling a Pull Request from a "develop" Branch +----------------------------------------------- + +Note: this is something only maintainers of Dataverse need to worry about, typically. + +From time to time a pull request comes in from a fork of Dataverse that uses "develop" as the branch behind the PR. (We've started asking contributors not to do this. See :ref:`create-branch-for-pr`.) This is problematic because the "develop" branch is the main integration branch for the project. (See :ref:`develop-branch`.) ----- +If the PR is perfect and can be merged as-is, no problem. Just merge it. However, if you would like to push commits to the PR, you are likely to run into trouble with multiple "develop" branches locally. + +The following is a list of commands oriented toward the simple task of merging the latest from the "develop" branch into the PR but the same technique can be used to push other commits to the PR as well. In this example the PR is coming from username "coder123" on GitHub. At a high level, what we're doing is working in a safe place (/tmp) away from our normal copy of the repo. We clone the main repo from IQSS, check out coder123's version of "develop" (called "dev2" or "false develop"), merge the real "develop" into it, and push to the PR. + +If there's a better way to do this, please get in touch! + +..
code-block:: bash -Previous: :doc:`troubleshooting` | Next: :doc:`sql-upgrade-scripts` + # do all this in /tmp away from your normal code + cd /tmp + git clone git@github.com:IQSS/dataverse.git + cd dataverse + git remote add coder123 git@github.com:coder123/dataverse.git + git fetch coder123 + # check out coder123's "develop" to a branch with a different name ("dev2") + git checkout coder123/develop -b dev2 + # merge IQSS "develop" into coder123's "develop" ("dev2") + git merge origin/develop + # delete the IQSS "develop" branch locally (!) + git branch -d develop + # checkout "dev2" (false "develop") as "develop" for now + git checkout -b develop + # push the false "develop" to coder123's fork (to the PR) + git push coder123 develop + cd .. + # delete the tmp space (done! \o/) + rm -rf /tmp/dataverse diff --git a/doc/sphinx-guides/source/developers/windows.rst b/doc/sphinx-guides/source/developers/windows.rst index 0386713a161..3af844805b2 100755 --- a/doc/sphinx-guides/source/developers/windows.rst +++ b/doc/sphinx-guides/source/developers/windows.rst @@ -2,142 +2,112 @@ Windows Development =================== -Development on Windows is not well supported, unfortunately. You will have a much easier time if you develop on Mac or Linux as described under :doc:`dev-environment` section. - -If you want to try using Windows for Dataverse development, your best best is to use Vagrant, as described below. Minishift is also an option. These instructions were tested on Windows 10. - .. contents:: |toctitle| - :local: - -Running Dataverse in Vagrant ----------------------------- + :local: -Install Vagrant -~~~~~~~~~~~~~~~ +Running Dataverse in Windows WSL +-------------------------------- -Download and install Vagrant from https://www.vagrantup.com +The simplest method to run Dataverse in Windows 10 and 11 is using Docker and Windows Subsystem for Linux (WSL) - specifically WSL 2. +Once Docker and WSL are installed, you can follow the :ref:`quickstart instructions `. 
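As a quick sanity check before following the quickstart, you can confirm from PowerShell that your distribution runs under WSL 2 (a sketch; the ``wsl`` flags here are from Microsoft's WSL command-line tool):

.. code-block:: powershell

   # List installed distributions; the VERSION column should read 2
   # for the distribution you use with Docker Desktop.
   wsl --list --verbose

If a distribution shows version 1, Docker Desktop's WSL integration will not work with it.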
-Vagrant advises you to reboot but let's install VirtualBox first. +Please note: these instructions have not been extensively tested. They have been found to work with the Ubuntu-24.04 distribution for WSL. If you find any problems, please open an issue at https://github.com/IQSS/dataverse/issues and/or submit a PR to update this guide. -Install VirtualBox -~~~~~~~~~~~~~~~~~~ +Install Docker Desktop +~~~~~~~~~~~~~~~~~~~~~~ -Download and install VirtualBox from https://www.virtualbox.org +Follow the directions at https://www.docker.com to install Docker Desktop on Windows. If prompted, turn on WSL 2 during installation. -Note that we saw an error saying "Oracle VM VirtualBox 5.2.8 Setup Wizard ended prematurely" but then we re-ran the installer and it seemed to work. +Settings you may need in Docker Desktop: -Reboot -~~~~~~ +* **General/Expose daemon on tcp://localhost:2375 without TLS**: true +* **General/Use the WSL 2 based engine**: true +* **General/Add the \*.docker.internal names to the host's /etc/hosts file (Requires password)**: true +* **Resources/WSL Integration/Enable integration with my default WSL distro**: true +* **Resources/WSL Integration/Enable integration with additional distros**: select any you run Dataverse in -Again, Vagrant asks you to reboot, so go ahead. - -Install Git +Install WSL ~~~~~~~~~~~ +If you install Docker Desktop, you should already have WSL installed. If not, or if you wish to add an additional Linux distribution, open PowerShell. -Download and install Git from https://git-scm.com - -Configure Git to use Unix Line Endings -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Launch Git Bash and run the following commands: - -``git config --global core.autocrlf input`` - -Pro tip: Use Shift-Insert to paste into Git Bash. - -See also https://help.github.com/articles/dealing-with-line-endings/ - -If you skip this step you are likely to see the following error when you run ``vagrant up``. +If WSL itself is not installed run: + +.. 
code-block:: powershell + + wsl --install -``/tmp/vagrant-shell: ./install: /usr/bin/perl^M: bad interpreter: No such file or directory`` +For use with Docker, you should use WSL v2 - run: -Clone Git Repo -~~~~~~~~~~~~~~ +.. code-block:: powershell + + wsl --set-default-version 2 -From Git Bash, run the following command: +Install a specific Linux distribution. To see the list of possible distributions: -``git clone https://github.com/IQSS/dataverse.git`` +.. code-block:: powershell -vagrant up -~~~~~~~~~~ + wsl --list --online -From Git Bash, run the following commands: +Choose the distribution you would like. Then run the following command. These instructions were tested with ``Ubuntu 24.04 LTS``. -``cd dataverse`` +.. code-block:: powershell -The ``dataverse`` directory you changed is the one you just cloned. Vagrant will operate on a file called ``Vagrantfile``. + wsl --install -d -``vagrant up`` +You will be asked to create an initial Linux user. -After a long while you hopefully will have Dataverse installed at http://localhost:8888 +.. note:: + Using wsl --set-version to upgrade an existing distribution from WSL 1 to WSL 2 may not work - installing a new distribution using WSL 2 is recommended. -Running Dataverse in Minishift ------------------------------- - -Minishift is a dev environment for OpenShift, which is Red Hat's distribution of Kubernetes. The :doc:`containers` section contains much more detail but the essential steps for using Minishift on Windows are described below. - -Install VirtualBox -~~~~~~~~~~~~~~~~~~ - -Download and install VirtualBox from https://www.virtualbox.org - -Install Git +Prepare WSL ~~~~~~~~~~~ -Download and install Git from https://git-scm.com - -Install Minishift -~~~~~~~~~~~~~~~~~ - -Download Minishift from https://docs.openshift.org/latest/minishift/getting-started/installing.html . It should be a zip file. 
- -From Git Bash: +Once you have WSL installed, you will need Java and Maven working inside WSL. How you go about this will depend on the Linux distribution you installed in WSL. -``cd ~/Downloads`` +Here is an example using SDKMAN, which is not required, but it is recommended for managing Java and other SDKs. -``unzip minishift*.zip`` +.. code-block:: bash -``mkdir ~/bin`` + sudo apt update + sudo apt install zip -``cp minishift*/minishift.exe ~/bin`` +.. code-block:: bash -Clone Git Repo -~~~~~~~~~~~~~~ + sudo apt update + sudo apt install unzip -From Git Bash, run the following commands: +.. code-block:: bash -``git config --global core.autocrlf input`` + curl -s "https://get.sdkman.io" | bash + source "$HOME/.sdkman/bin/sdkman-init.sh" -``git clone https://github.com/IQSS/dataverse.git`` +.. code-block:: bash -Start Minishift VM and Run Dataverse -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + sdk install java 17.0.7-tem -``minishift start --vm-driver=virtualbox --memory=8GB`` +.. code-block:: bash -``eval $(minishift oc-env)`` + sdk install maven -``oc new-project project1`` - -``cd ~/dataverse`` +Install Dataverse +~~~~~~~~~~~~~~~~~ -``oc new-app conf/openshift/openshift.json`` +Open a Linux terminal (e.g. use Windows Terminal and open a tab for the Linux distribution you selected). Then install Dataverse in WSL following the :ref:`quickstart instructions `. You should then have a working Dataverse instance. -``minishift console`` +We strongly recommend that you clone the Dataverse repository from WSL, not from Windows. This will ensure that builds are much faster. -This should open a web browser. In Microsoft Edge we saw ``INET_E_RESOURCE_NOT_FOUND`` so if you see that, try Chrome instead. A cert error is expected. Log in with the username "developer" and any password such as "asdf". +IDEs for Dataverse in Windows +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Under "Overview" you should see a URL that has "dataverse-project1" in it.
You should be able to click it and log into Dataverse with the username "dataverseAdmin" and the password "admin". +You can use your favorite editor or IDE to edit Dataverse project files. Files in WSL are accessible from Windows for editing using the path ``\\wsl.localhost``. Your Linux distribution files should also be visible in File Explorer under the This PC/Linux entry. -Improving Windows Support -------------------------- +.. note:: FYI: For the best performance, it is recommended, with WSL 2, to store Dataverse files in the WSL/Linux file system and to access them from there with your Windows-based IDE (versus storing Dataverse files in your Windows file system and trying to run maven and build from Linux - access to /mnt/c files using WSL 2 is slow). -Windows Subsystem for Linux -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +pgAdmin in Windows for Dataverse +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -We have been unable to get Windows Subsystem for Linux (WSL) to work. We tried following the steps at https://docs.microsoft.com/en-us/windows/wsl/install-win10 but the "Get" button was greyed out when we went to download Ubuntu. +You can access the Dataverse database from Windows. -Discussion and Feedback -~~~~~~~~~~~~~~~~~~~~~~~ +Install pgAdmin from https://www.pgadmin.org/download/pgadmin-4-windows/ -For more discussion of Windows support for Dataverse development see our community list thread `"Do you want to develop on Windows?" `_ We would be happy to inconrporate feedback from Windows developers into this page. The :doc:`documentation` section describes how. +In pgAdmin, register a server using ``127.0.0.1`` with port ``5432``. For the database name, username, and password, see :ref:`db-name-creds`. Now you will be able to access, monitor, and update the Dataverse database. 
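As an alternative to pgAdmin, the same connection details work from the command line. A sketch (the database name and user below are placeholders; substitute the actual values from :ref:`db-name-creds`):

.. code-block:: bash

   # Connect from a WSL shell or from psql installed on Windows;
   # "dataverse" is used here as a hypothetical user and database name.
   psql -h 127.0.0.1 -p 5432 -U dataverse dataverse

You will be prompted for the password configured for the database user.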
diff --git a/doc/sphinx-guides/source/developers/workflows.rst b/doc/sphinx-guides/source/developers/workflows.rst index b9090b86be3..38ca6f4e141 100644 --- a/doc/sphinx-guides/source/developers/workflows.rst +++ b/doc/sphinx-guides/source/developers/workflows.rst @@ -1,7 +1,7 @@ Workflows -================ +========= -Dataverse has a flexible workflow mechanism that can be used to trigger actions before and after Dataset publication. +The Dataverse Software has a flexible workflow mechanism that can be used to trigger actions before and after Dataset publication. .. contents:: |toctitle| :local: @@ -10,35 +10,37 @@ Dataverse has a flexible workflow mechanism that can be used to trigger actions Introduction ------------ -Dataverse can perform two sequences of actions when datasets are published: one prior to publishing (marked by a ``PrePublishDataset`` trigger), and one after the publication has succeeded (``PostPublishDataset``). The pre-publish workflow is useful for having an external system prepare a dataset for being publicly accessed (a possibly lengthy activity that requires moving files around, uploading videos to a streaming server, etc.), or to start an approval process. A post-publish workflow might be used for sending notifications about the newly published dataset. +The Dataverse Software can perform two sequences of actions when datasets are published: one prior to publishing (marked by a ``PrePublishDataset`` trigger), and one after the publication has succeeded (``PostPublishDataset``). The pre-publish workflow is useful for having an external system prepare a dataset for being publicly accessed (a possibly lengthy activity that requires moving files around, uploading videos to a streaming server, etc.), or to start an approval process. A post-publish workflow might be used for sending notifications about the newly published dataset. -Workflow steps are created using *step providers*. 
Dataverse ships with an internal step provider that offers some basic functionality, and with the ability to load 3rd party step providers. This allows installations to implement functionality they need without changing the Dataverse source code. +Workflow steps are created using *step providers*. The Dataverse Software ships with an internal step provider that offers some basic functionality, and with the ability to load 3rd party step providers (currently disabled). This allows installations to implement functionality they need without changing the Dataverse Software source code. -Steps can be internal (say, writing some data to the log) or external. External steps involve Dataverse sending a request to an external system, and waiting for the system to reply. The wait period is arbitrary, and so allows the external system unbounded operation time. This is useful, e.g., for steps that require human intervension, such as manual approval of a dataset publication. +Steps can be internal (say, writing some data to the log) or external. External steps involve the Dataverse Software sending a request to an external system, and waiting for the system to reply. The wait period is arbitrary, and so allows the external system unbounded operation time. This is useful, e.g., for steps that require human intervention, such as manual approval of a dataset publication. -The external system reports the step result back to dataverse, by sending a HTTP ``POST`` command to ``api/workflows/{invocation-id}``. The body of the request is passed to the paused step for further processing. +The external system reports the step result back to the Dataverse installation, by sending a HTTP ``POST`` command to ``api/workflows/{invocation-id}`` with Content-Type: text/plain. The body of the request is passed to the paused step for further processing. -If a step in a workflow fails, Dataverse make an effort to roll back all the steps that preceded it. 
Some actions, such as writing to the log, cannot be rolled back. If such an action has a public external effect (e.g. send an EMail to a mailing list) it is advisable to put it in the post-release workflow. +Steps can define messages to send to the log and to users. If defined, the message to users is sent as a user notification (creating an email and showing in the user notification tab) and will show once for the given user if/when they view the relevant dataset page. The latter provides a means for the asynchronous workflow execution to report success or failure analogous to the way the publication and other processes report on the page. + +If a step in a workflow fails, the Dataverse installation makes an effort to roll back all the steps that preceded it. Some actions, such as writing to the log, cannot be rolled back. If such an action has a public external effect (e.g. send an EMail to a mailing list) it is advisable to put it in the post-release workflow. .. tip:: - For invoking external systems using a REST api, Dataverse's internal step - provider offers a step for sending and receiving customizable HTTP requests. - It's called *http/sr*, and is detailed below. + For invoking external systems using a REST api, the Dataverse Software's internal step + provider offers two steps for sending and receiving customizable HTTP requests. + *http/sr* and *http/authExt*, detailed below, with the latter able to use the API to make changes to the dataset being processed. (Both lock the dataset to prevent other processes from changing the dataset between the time the step is launched to when the external process responds to the Dataverse instance.) Administration ~~~~~~~~~~~~~~ -A Dataverse instance stores a set of workflows in its database. Workflows can be managed using the ``api/admin/workflows/`` endpoints of the :doc:`/api/native-api`. Sample workflow files are available in ``scripts/api/data/workflows``. 
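As an illustration only, a workflow file along these lines pairs a name with a list of steps (a minimal sketch using the ``log`` step described under Available Steps below; consult the sample files in ``scripts/api/data/workflows`` for the authoritative format):

.. code-block:: json

   {
     "name": "Example pre-publish workflow",
     "steps": [
       {
         "provider": ":internal",
         "stepType": "log"
       }
     ]
   }

Steps run in the order listed; each step names its provider and step type.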
+A Dataverse installation stores a set of workflows in its database. Workflows can be managed using the ``api/admin/workflows/`` endpoints of the :doc:`/api/native-api`. Sample workflow files are available in ``scripts/api/data/workflows``. At the moment, defining a workflow for each trigger is done for the entire instance, using the endpoint ``api/admin/workflows/default/«trigger type»``. -In order to prevent unauthorized resuming of workflows, Dataverse maintains a "white list" of IP addresses from which resume requests are honored. This list is maintained using the ``/api/admin/workflows/ip-whitelist`` endpoint of the :doc:`/api/native-api`. By default, Dataverse honors resume requests from localhost only (``127.0.0.1;::1``), so set-ups that use a single server work with no additional configuration. +In order to prevent unauthorized resuming of workflows, the Dataverse installation maintains a "white list" of IP addresses from which resume requests are honored. This list is maintained using the ``/api/admin/workflows/ip-whitelist`` endpoint of the :doc:`/api/native-api`. By default, the Dataverse installation honors resume requests from localhost only (``127.0.0.1;::1``), so set-ups that use a single server work with no additional configuration. Available Steps ~~~~~~~~~~~~~~~ -Dataverse has an internal step provider, whose id is ``:internal``. It offers the following steps: +The Dataverse Software has an internal step provider, whose id is ``:internal``. It offers the following steps: log +++ @@ -60,7 +62,8 @@ A step that writes data about the current workflow invocation to the instance lo pause +++++ -A step that pauses the workflow. The workflow is paused until a POST request is sent to ``/api/workflows/{invocation-id}``. +A step that pauses the workflow. The workflow is paused until a POST request is sent to ``/api/workflows/{invocation-id}``. Sending 'fail' in the POST body (Content-type:text/plain) will trigger a failure and workflow rollback.
All other responses are considered as successes. +The pause step is intended for testing - the invocationId required to end the pause is only available in the log (and database). Adding a parameter (see log step) with the key/value "authorized":"true" will allow the invocationId to be used as a credential as with the http/authext step below. .. code:: json @@ -69,11 +72,29 @@ A step that pauses the workflow. The workflow is paused until a POST request is "stepType":"pause" } +pause/message ++++++++++++++ + +A variant of the pause step that pauses the workflow and allows the external process to send a success/failure message. The workflow is paused until a POST request is sent to ``/api/workflows/{invocation-id}``. +The response in the POST body (Content-type:application/json) should be a json object (the same as for the http/authext step) containing: +- "status" - can be "success" or "failure" +- "reason" - a message that will be logged +- "message" - a message to send to the user that will be sent as a notification and as a banner on the relevant dataset page. +An unparsable response will be considered a failure that will be logged with no user message. (See the http/authext step for an example POST call) + +.. code:: json + + { + "provider":":internal", + "stepType":"pause/message" + } + http/sr +++++++ A step that sends a HTTP request to an external system, and then waits for a response. The response has to match a regular expression specified in the step parameters. The url, content type, and message body can use data from the workflow context, using a simple markup language. This step has specific parameters for rollback. +The workflow is restarted when the external system replies with a POST request to ``/api/workflows/{invocation-id}``. Responses starting with "OK" (Content-type:text/plain) are considered successes. Other responses will be considered failures and trigger workflow rollback. ..
code:: json @@ -103,12 +124,63 @@ Available variables are: * ``majorVersion`` * ``releaseStatus`` +http/authext +++++++++++++ + +Similar to the *http/sr* step. A step that sends a HTTP request to an external system, and then waits for a response. The receiver can use the invocationId of the workflow in lieu of an api key to perform work on behalf of the user launching the workflow. +The invocationId must be sent as an 'X-Dataverse-invocationId' HTTP Header or as an ?invocationId= query parameter. *Note that any external process started using this step then has the ability to access a Dataverse instance via the API as the user.* +Once this step completes and responds, the invocationId is invalidated and will not allow further access. + +The url, content type, and message body can use data from the workflow context, using a simple markup language. This step has specific parameters for rollback. +The workflow is restarted when the external system replies with a POST request to ``/api/workflows/{invocation-id}`` (Content-Type: application/json). + +The response is expected to be a json object with three keys: +- "status" - can be "success" or "failure" +- "reason" - a message that will be logged +- "message" - a message to send to the user that will be sent as a notification and as a banner on the relevant dataset page. + +.. code-block:: bash + + export INVOCATION_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export MESSAGE='{"status":"success", "reason":"Workflow completed in 10 seconds", "message":"An external workflow to virus check your data was successfully run prior to publication of your data"}' + + curl -H 'Content-Type:application/json' -X POST -d "$MESSAGE" "$SERVER_URL/api/workflows/$INVOCATION_ID" +..
code:: json + + { + "provider":":internal", + "stepType":"http/authext", + "parameters": { + "url":"http://localhost:5050/dump/${invocationId}", + "method":"POST", + "contentType":"text/plain", + "body":"START RELEASE ${dataset.id} as ${dataset.displayName}", + "rollbackUrl":"http://localhost:5050/dump/${invocationId}", + "rollbackMethod":"DELETE ${dataset.id}" + } + } + +Available variables are: + +* ``invocationId`` +* ``dataset.id`` +* ``dataset.identifier`` +* ``dataset.globalId`` +* ``dataset.displayName`` +* ``dataset.citation`` +* ``minorVersion`` +* ``majorVersion`` +* ``releaseStatus`` + + archiver ++++++++ -A step that sends an archival copy of a Dataset Version to a configured archiver, e.g. the DuraCloud interface of Chronopolis. See the `DuraCloud/Chronopolis Integration documentation `_ for further detail. +A step that sends an archival copy of a Dataset Version to a configured archiver, e.g. the DuraCloud interface of Chronopolis. See :ref:`rda-bagit-archiving` for further detail. -Note - the example step includes two settings required for any archiver and three (DuraCloud*) that are specific to DuraCloud. +Note - the example step includes two settings required for any archiver, three (DuraCloud*) that are specific to DuraCloud, and the optional BagGeneratorThreads setting that controls parallelism when creating the Bag. .. code:: json @@ -124,7 +196,36 @@ Note - the example step includes two settings required for any archiver and thre ":ArchiverSettings": "string", ":DuraCloudHost":"string", ":DuraCloudPort":"string", - ":DuraCloudContext":"string" + ":DuraCloudContext":"string", + ":BagGeneratorThreads":"string" + } + } + + +ldnannounce ++++++++++++ + +An experimental step that sends a Linked Data Notification (LDN) message to a specific LDN Inbox announcing the publication/availability of a dataset meeting certain criteria. 
+ +The two parameters are: +* ``:LDNAnnounceRequiredFields`` - a list of metadata fields that must exist to trigger the message. Currently, the message also includes the values for these fields but future versions may only send the dataset's persistent identifier (making the receiver responsible for making a call-back to get any metadata). +* ``:LDNTarget`` - a JSON object containing an ``inbox`` key whose value is the URL of the target LDN inbox to which messages should be sent, e.g. ``{"id": "https://dashv7-dev.lib.harvard.edu","inbox": "https://dashv7-api-dev.lib.harvard.edu/server/ldn/inbox","type": "Service"}``. + +The supported message format is described by `our preliminary specification `_. The format is expected to change in the near future to match the standard for relationship announcements being developed as part of `the COAR Notify Project `_. + + +.. code:: json + + + { + "provider":":internal", + "stepType":"ldnannounce", + "parameters": { + "stepName":"LDN Announce" + }, + "requiredSettings": { + ":LDNAnnounceRequiredFields": "string", + ":LDNTarget": "string" } } diff --git a/doc/sphinx-guides/source/index.rst b/doc/sphinx-guides/source/index.rst index 5ab11da54e6..0b1d85718b2 100755 --- a/doc/sphinx-guides/source/index.rst +++ b/doc/sphinx-guides/source/index.rst @@ -6,7 +6,7 @@ Dataverse Documentation v. |version| ==================================== -These documentation guides are for the |version| version of Dataverse. To find guides belonging to previous versions, :ref:`guides_versions` has a list of all available versions. +These documentation guides are for the |version| version of Dataverse. To find guides belonging to previous or future versions, :ref:`guides_versions` has a list of all available versions. .. toctree:: :glob: @@ -15,34 +15,40 @@ These documentation guides are for the |version| version of Dataverse.
To find g user/index admin/index + ai/index api/index installation/index + contributor/index developers/index + container/index style/index + qa/index.md How the Guides Are Organized -============================ +---------------------------- The guides are documentation that explain how to use Dataverse, which are divided into the following sections: User Guide, -Installation Guide, Developer Guide, API Guide and Style Guide. The User Guide is further divided into primary activities: finding & using +Installation Guide, Developer Guide, API Guide, Style Guide and Container Guide. +The User Guide is further divided into primary activities: finding & using data, adding Datasets, administering dataverses or Datasets, and Dataset exploration/visualizations. Details on all of the above tasks can be found in the Users Guide. The Installation Guide is for people or organizations who want to host their -own Dataverse. The Developer Guide contains instructions for +own Dataverse. The Container Guide gives information on how to deploy Dataverse with containers. +The Developer Guide contains instructions for people who want to contribute to the Open Source Dataverse project or who want to modify the code to suit their own needs. Finally, the API Guide is for Developers that work on other applications and are interested in connecting with Dataverse through our APIs. Other Resources -=============== +--------------- **Dataverse Project Site** Additional information about the Dataverse Project itself including presentations, information about upcoming releases, data management and citation, and announcements can be found at -`http://dataverse.org/ `__ +`https://dataverse.org/ `__ **User Group** @@ -65,14 +71,12 @@ The support email address is `support@dataverse.org `__ -or use `GitHub pull requests `__, +or use `GitHub pull requests `__, if you have some code, scripts or documentation that you'd like to share. 
-If you have a **security issue** to report, please email `security@dataverse.org `__. +If you have a **security issue** to report, please email `security@dataverse.org `__. See also :ref:`reporting-security-issues`. Indices and Tables -================== +------------------ -* :ref:`genindex` -* :ref:`modindex` * :ref:`search` diff --git a/doc/sphinx-guides/source/installation/advanced.rst b/doc/sphinx-guides/source/installation/advanced.rst index a1f559af57d..bee289ecd5b 100644 --- a/doc/sphinx-guides/source/installation/advanced.rst +++ b/doc/sphinx-guides/source/installation/advanced.rst @@ -7,22 +7,24 @@ Advanced installations are not officially supported but here we are at least doc .. contents:: |toctitle| :local: -Multiple Glassfish Servers --------------------------- +.. _multiple-app-servers: -You should be conscious of the following when running multiple Glassfish servers. +Multiple App Servers +-------------------- -- Only one Glassfish server can be the dedicated timer server, as explained in the :doc:`/admin/timers` section of the Admin Guide. -- When users upload a logo or footer for their dataverse using the "theme" feature described in the :doc:`/user/dataverse-management` section of the User Guide, these logos are stored only on the Glassfish server the user happend to be on when uploading the logo. By default these logos and footers are written to the directory ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/logos``. -- When a sitemp is created by a Glassfish server it is written to the filesystem of just that Glassfish server. By default the sitemap is written to the directory ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/sitemap``. -- If Make Data Count is used, its raw logs must be copied from each Glassfish server to single instance of Counter Processor. See also the ``:MDCLogPath`` database setting in the :doc:`config` section of this guide and the :doc:`/admin/make-data-count` section of the Admin Guide. 
-- Dataset draft version logging occurs separately on each Glassfish server. See "Edit Draft Versions Logging" in the :doc:`/admin/monitoring` section of the Admin Guide for details. -- Password aliases (``db_password_alias``, etc.) are stored per Glassfish server. +You should be conscious of the following when running multiple app servers. -Detecting Which Glassfish Server a User Is On +++++++++++++++++++++++++++++++++++++++++++++ +- Only one app server can be the dedicated timer server, as explained in the :doc:`/admin/timers` section of the Admin Guide. +- When users upload a logo or footer for their Dataverse collection using the "theme" feature described in the :doc:`/user/dataverse-management` section of the User Guide, these logos are stored only on the app server the user happened to be on when uploading the logo. By default these logos and footers are written to the directory ``/usr/local/payara6/glassfish/domains/domain1/docroot/logos``. +- When a sitemap is created by an app server it is written to the filesystem of just that app server. By default the sitemap is written to the directory ``/usr/local/payara6/glassfish/domains/domain1/docroot/sitemap``. +- If Make Data Count is used, its raw logs must be copied from each app server to a single instance of Counter Processor. See also the :ref:`:MDCLogPath` section in the Configuration section of this guide and the :doc:`/admin/make-data-count` section of the Admin Guide. +- Dataset draft version logging occurs separately on each app server. See the :ref:`edit-draft-versions-logging` section in the Monitoring section of the Admin Guide for details. +- Password aliases (``dataverse.db.password``, etc.) are stored per app server. -If you have successfully installed multiple Glassfish servers behind a load balancer you might like to know which server a user has landed on. 
A straightforward solution is to place a file called ``host.txt`` in a directory that is served up by Apache such as ``/var/www/html`` and then configure Apache not to proxy requests to ``/host.txt`` to Glassfish. Here are some example commands on RHEL/CentOS 7 that accomplish this:: +Detecting Which App Server a User Is On ++++++++++++++++++++++++++++++++++++++++ + +If you have successfully installed multiple app servers behind a load balancer you might like to know which server a user has landed on. A straightforward solution is to place a file called ``host.txt`` in a directory that is served up by Apache such as ``/var/www/html`` and then configure Apache not to proxy requests to ``/host.txt`` to the app server. Here are some example commands on RHEL/derivatives that accomplish this:: [root@server1 ~]# vim /etc/httpd/conf.d/ssl.conf [root@server1 ~]# grep host.txt /etc/httpd/conf.d/ssl.conf @@ -32,6 +34,114 @@ If you have successfully installed multiple Glassfish servers behind a load bala [root@server1 ~]# curl https://dataverse.example.edu/host.txt server1.example.edu -You would repeat the steps above for all of your Glassfish servers. If users seem to be having a problem with a particular server, you can ask them to visit https://dataverse.example.edu/host.txt and let you know what they see there (e.g. "server1.example.edu") to help you know which server to troubleshoot. +You would repeat the steps above for all of your app servers. If users seem to be having a problem with a particular server, you can ask them to visit https://dataverse.example.edu/host.txt and let you know what they see there (e.g. "server1.example.edu") to help you know which server to troubleshoot. + +Please note that :ref:`network-ports` under the Configuration section has more information on fronting your app server with Apache. The :doc:`shibboleth` section talks about the use of ``ProxyPassMatch``. 
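Beyond spot-checking with ``curl``, the ``host.txt`` responses can be tallied over a number of requests to see how the load balancer is distributing traffic. A minimal sketch with a hypothetical helper, written so the HTTP call is supplied by the caller (e.g. built on ``urllib.request``) and can be stubbed out offline:

```python
from collections import Counter

def tally_servers(fetch, attempts=10):
    """Tally which app server answered each request to /host.txt.

    `fetch` is any zero-argument callable returning the response body,
    e.g. one built on urllib.request against https://dataverse.example.edu/host.txt.
    """
    return Counter(fetch().strip() for _ in range(attempts))

# Offline demonstration with a stub standing in for real HTTP requests:
responses = iter(["server1.example.edu\n", "server2.example.edu\n", "server1.example.edu\n"])
print(tally_servers(lambda: next(responses), attempts=3))
```

A heavily skewed tally may indicate sticky sessions or an unhealthy backend.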
+ +Licensing +--------- + +Dataverse allows superusers to specify the list of allowed licenses, to define which license is the default, to decide whether users can instead define custom terms, and to mark obsolete licenses as "inactive" to stop further use of them. +These can be accomplished using the :ref:`native API ` and the :ref:`:AllowCustomTermsOfUse <:AllowCustomTermsOfUse>` setting. See also :ref:`license-config`. + +.. _standardizing-custom-licenses: + +Standardizing Custom Licenses ++++++++++++++++++++++++++++++ + +In addition, if many datasets use the same set of Custom Terms, it may make sense to create and register a standard license including those terms. Doing this would include: + +- Creating and posting an external document that includes the custom terms, i.e. an HTML document with sections corresponding to the terms fields that are used. +- Defining a name, short description, URL (where it is posted), and optionally an icon URL for this license. +- Using the Dataverse API to register the new license as one of the options available in your installation. +- Using the API to make sure the license is active and deciding whether the license should also be the default. +- Once the license is registered with Dataverse, making an SQL update to change datasets/versions using that license to reference it instead of having their own copy of those custom terms. 
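The registration payload for the API step above can be prepared as a small JSON document. A sketch only; the field names used here (``name``, ``shortDescription``, ``uri``, ``iconUrl``, ``active``) are assumptions to be verified against the licenses section of the native API guide:

```python
import json

# Sketch of a license definition for registration via the licenses API.
# Field names are assumptions based on the attributes listed in the steps above.
license_def = {
    "name": "My Institutional Terms 1.0",
    "shortDescription": "Standardized version of our previously custom terms.",
    "uri": "https://example.edu/licenses/institutional-terms-1.0.html",  # where the terms are posted
    "iconUrl": "https://example.edu/licenses/icon.png",  # optional
    "active": True,
}

payload = json.dumps(license_def, indent=2)
print(payload)
```

A superuser would then POST this document to the licenses endpoint and, if desired, mark the new license as the default.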
+ +The benefits of this approach are: + +- usability: the license can be selected for new datasets without allowing custom terms and without users having to cut/paste terms or collection administrators having to configure templates with those terms +- efficiency: custom terms are stored per dataset whereas a license is registered once and all uses of it refer to the same object and external URL +- security: with the license terms maintained external to Dataverse, users cannot edit specific terms and curators do not need to check for edits + +Once a standardized version of your Custom Terms is registered as a license, an SQL update like the following can be used to have datasets use it: + +:: + + UPDATE termsofuseandaccess + SET license_id = (SELECT license.id FROM license WHERE license.name = ''), termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null + WHERE termsofuseandaccess.termsofuse LIKE '%%'; + +Optional Components +------------------- + +.. _zipdownloader: + +Standalone "Zipper" Service Tool +++++++++++++++++++++++++++++++++ + +As of Dataverse Software 5.0 we offer an **experimental** optimization for the multi-file, download-as-zip functionality. +If this option (``:CustomZipDownloadServiceUrl``) is enabled, instead of enforcing the size limit on multi-file zipped +downloads (as normally specified by the option ``:ZipDownloadLimit``), we attempt to serve all the files that the user +requested (that they are authorized to download), but the request is redirected to a standalone zipper service running +as a cgi-bin executable under Apache. This moves these potentially long-running jobs completely outside the Application Server (Payara), and prevents worker threads from becoming locked serving them. 
Since zipping is also a CPU-intensive task, it is possible to have +this service running on a different host system, freeing the cycles on the main Application Server. (The system running +the service needs to have access to the database as well as to the storage filesystem and/or S3 bucket). + +Please consult the `README at scripts/zipdownload `_ +in the Dataverse Software 5.0+ source tree for more information. + +To install: + +1. Follow the instructions in the file above to build ``zipdownloader-0.0.1.jar``. Please note that the package name and + the version were changed as of the release 5.10, as part of an overall cleanup and reorganization of the project + tree. In the releases 5.0-5.9 it existed under the name ``ZipDownloadService-v1.0.0``. (A pre-built jar file was + distributed under that name as part of the 5.0 release on GitHub. Aside from the name change, there have been no + changes in the functionality of the tool). +2. Copy it, together with the shell script :download:`cgi-bin/zipdownload <../../../../scripts/zipdownload/cgi-bin/zipdownload>` + to the ``cgi-bin`` directory of the chosen Apache server (``/var/www/cgi-bin`` is the standard location). +3. Make sure the shell script (``zipdownload``) is executable, and edit it to configure the database access credentials. + Do note that the executable does not need access to the entire Dataverse installation database. A security-conscious + admin can create a dedicated database user with access to just one table: ``CUSTOMZIPSERVICEREQUEST``. + +You may need to make extra Apache configuration changes to make sure ``/cgi-bin/zipdownload`` is accessible from the outside. +For example, if this is the same Apache that's in front of your Dataverse installation Payara instance, you will need to +add another pass-through statement to your configuration: + +``ProxyPassMatch ^/cgi-bin/zipdownload !`` + +Test this by accessing it directly at ``/cgi-bin/zipdownload``. You should get a ``404 No such download job!``. 
+If instead you are getting an "internal server error", this may be an SELinux issue; try ``setenforce Permissive``. +If you are getting a generic Dataverse collection "not found" page, review the ``ProxyPassMatch`` rule you have added. + +To activate in your Dataverse installation:: + + curl -X PUT -d '/cgi-bin/zipdownload' http://localhost:8080/api/admin/settings/:CustomZipDownloadServiceUrl + +.. _external-exporters: + +External Metadata Exporters ++++++++++++++++++++++++++++ + +Dataverse 5.14+ supports the configuration of external metadata exporters (just "external exporters" or "exporters" for short) as a way to add additional metadata export formats or replace built-in formats. For a list of built-in formats, see :ref:`metadata-export-formats` in the User Guide. + +This should be considered an **experimental** capability in that the mechanism is expected to evolve and using it may require additional effort when upgrading to new Dataverse versions. + +Enabling External Exporters +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Use the :ref:`dataverse.spi.exporters.directory` configuration option to specify a directory from which external exporters (JAR files) should be loaded. + +.. _inventory-of-external-exporters: + +Inventory of External Exporters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For a list of external exporters, see the README at https://github.com/gdcc/dataverse-exporters. To highlight a few: + +- Croissant +- RO-Crate + +Developing New Exporters +^^^^^^^^^^^^^^^^^^^^^^^^ -Please note that "Network Ports" under the :doc:`config` section has more information on fronting Glassfish with Apache. The :doc:`shibboleth` section talks about the use of ``ProxyPassMatch``. +See :doc:`/developers/metadataexport` for details about how to develop new exporters. 
diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index 0a66a010ac9..f12a5d7dc7d 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -1,16 +1,17 @@ -============= Configuration ============= -Now that you've successfully logged into Dataverse with a superuser account after going through a basic :doc:`installation-main`, you'll need to secure and configure your installation. +Now that you've successfully logged into your Dataverse installation with a superuser account after going through a basic :doc:`installation-main`, you'll need to secure and configure your installation. -Settings within Dataverse itself are managed via JVM options or by manipulating values in the ``setting`` table directly or through API calls. +Settings within your Dataverse installation itself are managed via JVM options or by manipulating values in the ``setting`` table directly or through API calls. -Once you have finished securing and configuring your Dataverse installation, you may proceed to the :doc:`/admin/index` for more information on the ongoing administration of a Dataverse installation. Advanced configuration topics are covered in the :doc:`r-rapache-tworavens`, :doc:`shibboleth` and :doc:`oauth2` sections. +Once you have finished securing and configuring your Dataverse installation, you may proceed to the :doc:`/admin/index` for more information on the ongoing administration of a Dataverse installation. Advanced configuration topics are covered in the :doc:`shibboleth` and :doc:`oauth2` sections. .. contents:: |toctitle| :local: +.. 
_securing-your-installation: + Securing Your Installation -------------------------- @@ -24,18 +25,75 @@ The default password for the "dataverseAdmin" superuser account is "admin", as m Blocking API Endpoints ++++++++++++++++++++++ -The :doc:`/api/native-api` contains a useful but potentially dangerous API endpoint called "admin" that allows you to change system settings, make ordinary users into superusers, and more. The "builtin-users" endpoint lets people create a local/builtin user account if they know the key defined in :ref:`BuiltinUsers.KEY`. The endpoint "test" is not used but is where testing code maybe be added (see https://github.com/IQSS/dataverse/issues/4137 ). +The :doc:`/api/native-api` contains a useful but potentially dangerous set of API endpoints called "admin" that allows you to change system settings, make ordinary users into superusers, and more. The "builtin-users" endpoints let admins do tasks such as creating a local/builtin user account if they know the key defined in :ref:`BuiltinUsers.KEY`. + +By default in the code, most of these API endpoints can be operated on remotely and a number of endpoints do not require authentication. However, the endpoints "admin" and "builtin-users" are limited to localhost out of the box by the installer, using the JvmSettings :ref:`dataverse.api.blocked.endpoints` and :ref:`dataverse.api.blocked.policy`. + +.. note:: + The database settings :ref:`:BlockedApiEndpoints` and :ref:`:BlockedApiPolicy` are deprecated and will be removed in a future version. Please use the JvmSettings mentioned above instead. + +It is **very important** to keep the block in place for the "admin" endpoint, and to leave the "builtin-users" endpoint blocked unless you need to access it remotely. Documentation for the "admin" endpoint is spread across the :doc:`/api/native-api` section of the API Guide and the :doc:`/admin/index`. 
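The effect of the default localhost-only policy for these endpoints can be illustrated with a short sketch; this models the observable behavior, not the code Dataverse actually runs:

```python
import ipaddress

# Models the observable effect of blocking "admin" and "builtin-users" with a
# localhost-only policy. Not Dataverse's implementation.
BLOCKED_PREFIXES = ("/api/admin", "/api/builtin-users")

def request_allowed(path, client_ip):
    """Blocked endpoints are only reachable from loopback addresses."""
    if path.startswith(BLOCKED_PREFIXES):
        return ipaddress.ip_address(client_ip).is_loopback
    return True

print(request_allowed("/api/admin/settings", "127.0.0.1"))   # True
print(request_allowed("/api/admin/settings", "203.0.113.7")) # False
print(request_allowed("/api/datasets/1", "203.0.113.7"))     # True
```

This is also why proxy misconfigurations matter: if the proxy makes every request appear to come from localhost, the policy above allows everything.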
+ +Given how important it is to avoid exposing the "admin" and "builtin-users" APIs, sites using a proxy, e.g. Apache or Nginx, should also consider blocking them through rules in the proxy. +The following examples may be useful: + +Apache/Httpd Rule: + +Rewrite lines added to /etc/httpd/conf.d/ssl.conf. They can be the first lines inserted after the RewriteEngine On statement: -By default, most APIs can be operated on remotely and a number of endpoints do not require authentication. The endpoints "admin" and "test" are limited to localhost out of the box by the settings :ref:`:BlockedApiEndpoints` and :ref:`:BlockedApiPolicy`. .. code-block:: apache -It is very important to keep the block in place for the "admin" endpoint (and at least consider blocking "builtin-users"). Please note that documentation for the "admin" endpoint is spread across the :doc:`/api/native-api` section of the API Guide and the :doc:`/admin/index`. + RewriteRule ^/api/(admin|builtin-users) - [R=403,L] + RewriteRule ^/api/(v[0-9]*)/(admin|builtin-users) - [R=403,L] -It's also possible to prevent file uploads via API by adjusting the :ref:`:UploadMethods` database setting. +Nginx Configuration Rule: + +.. code-block:: nginx + + location ~ ^/api/(admin|v1/admin|builtin-users|v1/builtin-users) { + deny all; + return 403; + } + +If you are using a load balancer or a reverse proxy, there are some additional considerations. If no additional configurations are made and the upstream is configured to redirect to localhost, the API will be accessible from the outside, as your installation will register localhost as the origin of any requests to the endpoints "admin" and "builtin-users". 
To prevent this, you have two options: + +- If your upstream is configured to redirect to localhost, you will need to set the :ref:`JVM option ` to one of the following values ``%client.name% %datetime% %request% %status% %response.length% %header.referer% %header.x-forwarded-for%`` and configure from the load balancer side the chosen header to populate with the client IP address. + +- Another solution is to set the upstream to the client IP address. In this case no further configuration is needed. + +For more information on configuring blocked API endpoints, see :ref:`dataverse.api.blocked.endpoints` and :ref:`dataverse.api.blocked.policy` in the JvmSettings documentation. + +.. note:: + It's also possible to prevent file uploads via API by adjusting the :ref:`:UploadMethods` database setting. Forcing HTTPS +++++++++++++ -To avoid having your users send credentials in the clear, it's strongly recommended to force all web traffic to go through HTTPS (port 443) rather than HTTP (port 80). The ease with which one can install a valid SSL cert into Apache compared with the same operation in Glassfish might be a compelling enough reason to front Glassfish with Apache. In addition, Apache can be configured to rewrite HTTP to HTTPS with rules such as those found at https://wiki.apache.org/httpd/RewriteHTTPToHTTPS or in the section on :doc:`shibboleth`. +To avoid having your users send credentials in the clear, it's strongly recommended to force all web traffic to go through HTTPS (port 443) rather than HTTP (port 80). The ease with which one can install a valid SSL cert into Apache compared with the same operation in Payara might be a compelling enough reason to front Payara with Apache. In addition, Apache can be configured to rewrite HTTP to HTTPS with rules such as those found at https://wiki.apache.org/httpd/RewriteHTTPToHTTPS or in the section on :doc:`shibboleth`. + + +.. 
_user-ip-addresses-proxy-security: + +Recording User IP Addresses ++++++++++++++++++++++++++++ + +By default, the Dataverse installation captures the IP address from which requests originate. This is used for multiple purposes including controlling access to the admin API, IP-based user groups and Make Data Count reporting. When the Dataverse installation is configured behind a proxy such as a load balancer, this default setup may not capture the correct IP address. In this case all the incoming requests will be logged in the access logs, MDC logs etc., as if they are all coming from the IP address(es) of the load balancer itself. Proxies usually save the original address in an added HTTP header, from which it can be extracted. For example, AWS LB records the "true" original address in the standard ``X-Forwarded-For`` header. If your Dataverse installation is running behind an IP-masking proxy, but you would like to use IP groups, or record the true geographical location of the incoming requests with Make Data Count, you may enable the IP address lookup from the proxy header using the JVM option ``dataverse.useripaddresssourceheader``, described further below. + +Before doing so, however, you must absolutely **consider the security risks involved**! This option must be enabled **only** on a Dataverse installation that is in fact fully behind a proxy that properly, and consistently, adds the ``X-Forwarded-For`` (or a similar) header to every request it forwards. Consider the implications of activating this option on a Dataverse installation that is not running behind a proxy, *or running behind one, but still accessible from the insecure locations bypassing the proxy*: Anyone can now add the header above to an incoming request, supplying an arbitrary IP address that the Dataverse installation will trust as the true origin of the call. This gives an attacker an easy way to, for example, get into a privileged IP group. 
The implications could be even more severe if an attacker were able to pretend to be coming from ``localhost``, if a Dataverse installation is configured to trust localhost connections for unrestricted access to the admin API! We have addressed this by making it so that Dataverse installation should never accept ``localhost``, ``127.0.0.1``, ``0:0:0:0:0:0:0:1`` etc. when supplied in such a header. But if you have reasons to still find this risk unacceptable, you may want to consider turning open localhost access to the API off (See :ref:`Securing Your Installation ` for more information.) + +This is how to verify that your proxy or load balancer, etc. is handling the originating address headers properly and securely: Make sure access logging is enabled in your application server (Payara) configuration. (```` in the ``domain.xml``). Add the address header to the access log format. For example, on a system behind AWS ELB, you may want to use something like ``%client.name% %datetime% %request% %status% %response.length% %header.referer% %header.x-forwarded-for%``. Once enabled, access the Dataverse installation from outside the LB. You should now see the real IP address of your remote client in the access log. For example, something like: +``"1.2.3.4" "01/Jun/2020:12:00:00 -0500" "GET /dataverse.xhtml HTTP/1.1" 200 81082 "NULL-REFERER" "128.64.32.16"`` + +In this example, ``128.64.32.16`` is your remote address (that you should verify), and ``1.2.3.4`` is the address of your LB. If you're not seeing your remote address in the log, do not activate the JVM option! Also, verify that all the entries in the log have this header populated. The only entries in the access log that you should be seeing without this header (logged as ``"NULL-HEADER-X-FORWARDED-FOR"``) are local requests, made from localhost, etc. 
In this case, since the request is not coming through the proxy, the local IP address should be logged as the primary one (as the first value in the log entry, ``%client.name%``). If you see any requests coming in from remote, insecure subnets without this header - do not use the JVM option! + +Once you are ready, enable the :ref:`JVM option `. Verify that the remote locations are properly tracked in your MDC metrics, and/or your IP groups are working. As a final test, if your Dataverse installation is allowing unrestricted localhost access to the admin API, imitate an attack in which a malicious request is pretending to be coming from ``127.0.0.1``. Try the following from a remote, insecure location: + +``curl https://your.dataverse.edu/api/admin/settings --header "X-FORWARDED-FOR: 127.0.0.1"`` + +First of all, confirm that access is denied! If you are in fact able to access the settings api from a location outside the proxy, **something is seriously wrong**, so please let us know, and stop using the JVM option. Otherwise check the access log entry for the header value. What you should see is something like ``"127.0.0.1, 128.64.32.16"``. Where the second address should be the real IP of your remote client. The fact that the "fake" ``127.0.0.1`` you sent over is present in the header is perfectly ok. This is the proper proxy behavior - it preserves any incoming values in the ``X-Forwarded-Header``, if supplied, and adds the detected incoming address to it, *on the right*. It is only this rightmost comma-separated value that Dataverse installation should ever be using. + +Still feel like activating this option in your configuration? - Have fun and be safe! + .. _PrivacyConsiderations: @@ -45,23 +103,69 @@ Privacy Considerations Email Privacy ^^^^^^^^^^^^^ -Out of the box, Dataverse will list email addresses of the contacts for datasets when users visit a dataset page and click the "Export Metadata" button. 
Additionally, out of the box, Dataverse will list email addresses of dataverse contacts via API (see :ref:`View a Dataverse ` in the :doc:`/api/native-api` section of the API Guide). If you would like to exclude these email addresses from export, set :ref:`:ExcludeEmailFromExport <:ExcludeEmailFromExport>` to true. +Out of the box, your Dataverse installation will list email addresses of the contacts for datasets when users visit a dataset page and click the "Export Metadata" button. Additionally, out of the box, the Dataverse installation will list email addresses of Dataverse collection contacts via API (see :ref:`View a Dataverse Collection ` in the :doc:`/api/native-api` section of the API Guide). If you would like to exclude these email addresses from export, set :ref:`:ExcludeEmailFromExport <:ExcludeEmailFromExport>` to true. Additional Recommendations ++++++++++++++++++++++++++ -Run Glassfish as a User Other Than Root -+++++++++++++++++++++++++++++++++++++++ -See the Glassfish section of :doc:`prerequisites` for details and init scripts for running Glassfish as non-root. +Run Payara as a User Other Than Root +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +See the :ref:`payara` section of :doc:`prerequisites` for details and init scripts for running Payara as non-root. + +Related to this is that you should remove ``/root/.payara/pass`` to ensure that Payara isn't ever accidentally started as root. Without the password, Payara won't be able to start as root, which is a good thing. + +.. _secure-password-storage: + +Secure Password Storage +^^^^^^^^^^^^^^^^^^^^^^^ -Related to this is that you should remove ``/root/.glassfish/pass`` to ensure that Glassfish isn't ever accidentally started as root. Without the password, Glassfish won't be able to start as root, which is a good thing. +In development or demo scenarios, we suggest not to store passwords in files permanently. 
+We recommend the use of at least environment variables or production-grade mechanisms to supply passwords. + +In a production setup, permanently storing passwords as plaintext should be avoided at all costs. +Environment variables are dangerous in shared environments and containers, as they may be easily exploited; we suggest not to use them. +Depending on your deployment model and environment, you can make use of the following techniques to securely store and access passwords. + +**Password Aliases** + +A `password alias`_ allows you to have a plaintext reference to an encrypted password stored on the server, with the alias being used wherever the password is needed. +This method is especially useful in a classic deployment, as it does not require any external secrets management. + +Password aliases are consumable as a MicroProfile Config source and can be referenced by their name in a `property expression`_. +You may also reference them within a `variable substitution`_, e.g. in your ``domain.xml``. + +Creation example for an alias named *my.alias.name*: + +.. code-block:: shell + + echo "AS_ADMIN_ALIASPASSWORD=changeme" > /tmp/p.txt + asadmin create-password-alias --passwordfile "/tmp/p.txt" "my.alias.name" + rm /tmp/p.txt + +Note: omitting the ``--passwordfile`` parameter allows creating the alias in an interactive fashion with a prompt. + +**Secrets Files** + +Payara has a built-in MicroProfile Config source to consume values from files in a directory on your filesystem. +This `directory config source`_ is most useful and secure with external secrets management in place, temporarily mounting cleartext passwords as files. +Examples are Kubernetes / OpenShift `Secrets `_ or tools like `Vault Agent `_. + +Please follow the `directory config source`_ documentation to learn about its usage. 
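To make the idea behind the directory config source concrete: each property name corresponds to a file in the secrets directory, and the file's content is the value. A minimal sketch of that lookup (the precise name-mapping rules are defined by Payara, so consult its documentation):

```python
import tempfile
from pathlib import Path

def read_secret(secrets_dir, property_name):
    """Resolve a property the way a directory config source does:
    the file named after the property holds the (cleartext) value."""
    return (Path(secrets_dir) / property_name).read_text().strip()

# Demonstration with a temporary directory standing in for a mounted secret:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "dataverse.db.password").write_text("changeme\n")
    print(read_secret(d, "dataverse.db.password"))  # changeme
```

Because the file exists only while the secret is mounted, nothing needs to be persisted in ``domain.xml`` or the environment.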
+ +**Cloud Providers** + +Running Dataverse on a cloud platform or running an external secret management system like `Vault `_ enables accessing secrets without any intermediate storage of cleartext. +Obviously this is the most secure option for any deployment model, but it may require more resources to set up and maintain - your mileage may vary. + +Take a look at `cloud sources`_ shipped with Payara to learn about their usage. Enforce Strong Passwords for User Accounts -++++++++++++++++++++++++++++++++++++++++++ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Dataverse only stores passwords (as salted hash, and using a strong hashing algorithm) for "builtin" users. You can increase the password complexity rules to meet your security needs. If you have configured your Dataverse installation to allow login from remote authentication providers such as Shibboleth, ORCID, GitHub or Google, you do not have any control over those remote providers' password complexity rules. See the "Auth Modes: Local vs. Remote vs. Both" section below for more on login options. +Your Dataverse installation only stores passwords (as salted hash, and using a strong hashing algorithm) for "builtin" users. You can increase the password complexity rules to meet your security needs. If you have configured your Dataverse installation to allow login from remote authentication providers such as Shibboleth, ORCID, GitHub or Google, you do not have any control over those remote providers' password complexity rules. See the :ref:`auth-modes` section below for more on login options. -Even if you are satisfied with the out-of-the-box password complexity rules Dataverse ships with, for the "dataverseAdmin" account you should use a strong password so the hash cannot easily be cracked through dictionary attacks. 
+Even if you are satisfied with the out-of-the-box password complexity rules the Dataverse Software ships with, for the "dataverseAdmin" account you should use a strong password so the hash cannot easily be cracked through dictionary attacks.
 
 Password complexity rules for "builtin" accounts can be adjusted with a variety of settings documented below. Here's a list:
 
@@ -74,79 +178,532 @@ Password complexity rules for "builtin" accounts can be adjusted with a variety
 - :ref:`:PVGoodStrength`
 - :ref:`:PVCustomPasswordResetAlertMessage`
 
+.. _samesite-cookie-attribute:
+
+SameSite Cookie Attribute
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The SameSite cookie attribute is defined in an upcoming revision to `RFC 6265 `_ (HTTP State Management Mechanism) called `6265bis `_ ("bis" meaning "repeated"). The possible values are "None", "Lax", and "Strict". "Strict" is intended to help prevent Cross-Site Request Forgery (CSRF) attacks, as described in the RFC proposal and an OWASP `cheatsheet `_. We don't recommend "None" for security reasons.
+
+By default, Payara doesn't send the SameSite cookie attribute, which browsers should interpret as "Lax" according to `MDN `_.
+Dataverse installations are explicitly set to "Lax" out of the box by the installer (in the case of a "classic" installation) or through the base image (in the case of a Docker installation). For classic, see :ref:`http.cookie-same-site-value` and :ref:`http.cookie-same-site-enabled` for how to change the values. For Docker, you must rebuild the :doc:`base image `. See also Payara's `documentation `_ for the settings above.
+
+To inspect cookie attributes like SameSite, you can use ``curl -s -I http://localhost:8080 | grep JSESSIONID``, for example, looking for the "Set-Cookie" header.
+
+.. _ongoing-security:
+
+Ongoing Security of Your Installation
++++++++++++++++++++++++++++++++++++++
+
+Like any application, you should keep up-to-date with patches to both the Dataverse software and the platform (usually Linux) it runs on. Dataverse releases are announced on the dataverse-community_ mailing list, the Dataverse blog_, and in chat.dataverse.org_.
+
+.. _dataverse-community: https://groups.google.com/g/dataverse-community
+.. _blog: https://dataverse.org/blog
+.. _chat.dataverse.org: https://chat.dataverse.org
+
+In addition to these public channels, you can subscribe to receive security notices via email from the Dataverse team. These notices are sent to the ``contact_email`` in the installation spreadsheet_ and you can open an issue in the dataverse-installations_ repo to add or change the contact email. Security notices are also sent to people and organizations that prefer to remain anonymous. To be added to this private list, please email support@dataverse.org.
+
+.. _spreadsheet: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0
+.. _dataverse-installations: https://github.com/IQSS/dataverse-installations
+
+For additional details about security practices by the Dataverse team, see the :doc:`/developers/security` section of the Developer Guide.
+
+.. _reporting-security-issues:
+
+Reporting Security Issues
++++++++++++++++++++++++++
+
+If you have a security issue to report, please email it to security@dataverse.org.
+
+.. _network-ports:
+
 Network Ports
 -------------
 
-Remember how under "Decisions to Make" in the :doc:`prep` section we mentioned you'll need to make a decision about whether or not to introduce a proxy in front of Dataverse such as Apache or nginx? The time has come to make that decision.
-
-The need to redirect port HTTP (port 80) to HTTPS (port 443) for security has already been mentioned above and the fact that Glassfish puts these services on 8080 and 8181, respectively, was touched on in the :doc:`installation-main` section. In production, you don't want to tell your users to use Dataverse on ports 8080 and 8181. You should have them use the standard HTTPS port, which is 443.
+Remember how under "Decisions to Make" in the :doc:`prep` section we mentioned you'll need to make a decision about whether or not to introduce a proxy in front of the Dataverse Software such as Apache or nginx? The time has come to make that decision.
 
-Your decision to proxy or not should primarily be driven by which features of Dataverse you'd like to use. If you'd like to use Shibboleth, the decision is easy because proxying or "fronting" Glassfish with Apache is required. The details are covered in the :doc:`shibboleth` section.
+The need to redirect HTTP (port 80) to HTTPS (port 443) for security has already been mentioned above and the fact that Payara puts these services on 8080 and 8181, respectively, was touched on in the :doc:`installation-main` section. In production, you don't want to tell your users to use your Dataverse installation on ports 8080 and 8181. You should have them use the standard HTTPS port, which is 443.
 
-If you'd like to use TwoRavens, you should also consider fronting with Apache because you will be required to install an Apache anyway to make use of the rApache module. For details, see the :doc:`r-rapache-tworavens` section.
+Your decision to proxy or not should primarily be driven by which features of the Dataverse Software you'd like to use. If you'd like to use Shibboleth, the decision is easy because proxying or "fronting" Payara with Apache is required. The details are covered in the :doc:`shibboleth` section.
-Even if you have no interest in Shibboleth nor TwoRavens, you may want to front Dataverse with Apache or nginx to simply the process of installing SSL certificates. There are many tutorials on the Internet for adding certs to Apache, including a some `notes used by the Dataverse team `_, but the process of adding a certificate to Glassfish is arduous and not for the faint of heart. The Dataverse team cannot provide much help with adding certificates to Glassfish beyond linking to `tips `_ on the web.
+Even if you have no interest in Shibboleth, you may want to front your Dataverse installation with Apache or nginx to simplify the process of installing SSL certificates. There are many tutorials on the Internet for adding certs to Apache, including some `notes used by the Dataverse Project team `_, but the process of adding a certificate to Payara is arduous and not for the faint of heart. The Dataverse Project team cannot provide much help with adding certificates to Payara beyond linking to `tips `_ on the web.
 
-Still not convinced you should put Glassfish behind another web server? Even if you manage to get your SSL certificate into Glassfish, how are you going to run Glassfish on low ports such as 80 and 443? Are you going to run Glassfish as root? Bad idea. This is a security risk. Under "Additional Recommendations" under "Securing Your Installation" above you are advised to configure Glassfish to run as a user other than root. (The Dataverse team will close https://github.com/IQSS/dataverse/issues/1934 after updating the Glassfish init script provided in the :doc:`prerequisites` section to not require root.)
+Still not convinced you should put Payara behind another web server? Even if you manage to get your SSL certificate into Payara, how are you going to run Payara on low ports such as 80 and 443? Are you going to run Payara as root? Bad idea. This is a security risk.
Under "Additional Recommendations" under "Securing Your Installation" above you are advised to configure Payara to run as a user other than root.
 
 There's also the issue of serving a production-ready version of robots.txt. By using a proxy such as Apache, this is a one-time "set it and forget it" step as explained below in the "Going Live" section.
 
-If you are convinced you'd like to try fronting Glassfish with Apache, the :doc:`shibboleth` section should be good resource for you.
+If you are convinced you'd like to try fronting Payara with Apache, the :doc:`shibboleth` section should be a good resource for you.
 
-If you really don't want to front Glassfish with any proxy (not recommended), you can configure Glassfish to run HTTPS on port 443 like this:
+If you really don't want to front Payara with any proxy (not recommended), you can configure Payara to run HTTPS on port 443 like this:
 
 ``./asadmin set server-config.network-config.network-listeners.network-listener.http-listener-2.port=443``
 
-What about port 80? Even if you don't front Dataverse with Apache, you may want to let Apache run on port 80 just to rewrite HTTP to HTTPS as described above. You can use a similar command as above to change the HTTP port that Glassfish uses from 8080 to 80 (substitute ``http-listener-1.port=80``). Glassfish can be used to enforce HTTPS on its own without Apache, but configuring this is an exercise for the reader. Answers here may be helpful: http://stackoverflow.com/questions/25122025/glassfish-v4-java-7-port-unification-error-not-able-to-redirect-http-to
+What about port 80? Even if you don't front your Dataverse installation with Apache, you may want to let Apache run on port 80 just to rewrite HTTP to HTTPS as described above. You can use a similar command as above to change the HTTP port that Payara uses from 8080 to 80 (substitute ``http-listener-1.port=80``). Payara can be used to enforce HTTPS on its own without Apache, but configuring this is an exercise for the reader.
Answers here may be helpful: https://stackoverflow.com/questions/25122025/glassfish-v4-java-7-port-unification-error-not-able-to-redirect-http-to -If you are running an installation with Apache and Glassfish on the same server, and would like to restrict Glassfish from responding to any requests to port 8080 from external hosts (in other words, not through Apache), you can restrict the AJP listener to localhost only with: +If you are running an installation with Apache and Payara on the same server, and would like to restrict Payara from responding to any requests to port 8080 from external hosts (in other words, not through Apache), you can restrict the AJP listener to localhost only with: ``./asadmin set server-config.network-config.network-listeners.network-listener.http-listener-1.address=127.0.0.1`` You should **NOT** use the configuration option above if you are running in a load-balanced environment, or otherwise have the web server on a different host than the application server. -Root Dataverse Permissions --------------------------- +Root Dataverse Collection Permissions +------------------------------------- -The user who creates a dataverse is given the "Admin" role on that dataverse. The root dataverse is created automatically for you by the installer and the "Admin" is the superuser account ("dataverseAdmin") we used in the :doc:`installation-main` section to confirm that we can log in. These next steps of configuring the root dataverse require the "Admin" role on the root dataverse, but not the much more powerful superuser attribute. In short, users with the "Admin" role are subject to the permission system. A superuser, on the other hand, completely bypasses the permission system. You can give non-superusers the "Admin" role on the root dataverse if you'd like them to configure the root dataverse. +The user who creates a Dataverse collection is given the "Admin" role on that Dataverse collection. 
The root Dataverse collection is created automatically for you by the installer and the "Admin" is the superuser account ("dataverseAdmin") we used in the :doc:`installation-main` section to confirm that we can log in. These next steps of configuring the root Dataverse collection require the "Admin" role on the root Dataverse collection, but not the much more powerful superuser attribute. In short, users with the "Admin" role are subject to the permission system. A superuser, on the other hand, completely bypasses the permission system. You can give non-superusers the "Admin" role on the root Dataverse collection if you'd like them to configure the root Dataverse collection.
 
-In order for non-superusers to start creating dataverses or datasets, you need click "Edit" then "Permissions" and make choices about which users can add dataverses or datasets within the root dataverse. (There is an API endpoint for this operation as well.) Again, the user who creates a dataverse will be granted the "Admin" role on that dataverse. Non-superusers who are not "Admin" on the root dataverse will not be able to to do anything useful until the root dataverse has been published.
+In order for non-superusers to start creating Dataverse collections or datasets, you need to click "Edit" then "Permissions" and make choices about which users can add Dataverse collections or datasets within the root Dataverse collection. (There is an API endpoint for this operation as well.) Again, the user who creates a Dataverse collection will be granted the "Admin" role on that Dataverse collection. Non-superusers who are not "Admin" on the root Dataverse collection will not be able to do anything useful until the root Dataverse collection has been published.
 
-As the person installing Dataverse you may or may not be a local metadata expert.
You may want to have others sign up for accounts and grant them the "Admin" role at the root dataverse to configure metadata fields, templates, browse/search facets, guestbooks, etc. For more on these topics, consult the :doc:`/user/dataverse-management` section of the User Guide. +As the person installing the Dataverse Software, you may or may not be a local metadata expert. You may want to have others sign up for accounts and grant them the "Admin" role at the root Dataverse collection to configure metadata fields, templates, browse/search facets, guestbooks, etc. For more on these topics, consult the :doc:`/user/dataverse-management` section of the User Guide. + +.. _pids-configuration: Persistent Identifiers and Publishing Datasets ---------------------------------------------- -Persistent identifiers are a required and integral part of the Dataverse platform. They provide a URL that is guaranteed to resolve to the datasets or files they represent. Dataverse currently supports creating identifiers using DOI and Handle. +Persistent identifiers (PIDs) are a required and integral part of the Dataverse Software. They provide a URL that is +guaranteed to resolve to the datasets or files they represent. The Dataverse Software currently supports creating +identifiers using any of several PID types. The most appropriate PIDs for public data are DOIs (e.g., provided by +DataCite or EZID) and Handles. Dataverse also supports PermaLinks which could be useful for intranet or catalog use +cases. A DOI provider called "FAKE" is recommended only for testing and development purposes. -By default, the installer configures a default DOI namespace (10.5072) with DataCite as the registration provider. Please note that as of the release 4.9.3, we can no longer use EZID as the provider. Unlike EZID, DataCite requires that you register for a test account, configured with your own prefix (please contact support@datacite.org). 
Once you receive the login name, password, and prefix for the account, configure the credentials in your domain.xml, as the following two JVM options:: +Dataverse can be configured with one or more PID providers, each of which can mint and manage PIDs with a given protocol +(e.g., doi, handle, permalink) using a specific service provider/account (e.g. with DataCite, EZId, or HandleNet) +to manage an authority/shoulder combination, aka a "prefix" (PermaLinks also support custom separator characters as part of the prefix), +along with an optional list of individual PIDs (with different authority/shoulders) than can be managed with that account. - -Ddoi.username=... - -Ddoi.password=... +Dataverse automatically manages assigning PIDs and making them findable when datasets are published. There are also :ref:`API calls that +allow updating the PID target URLs and metadata of already-published datasets manually if needed `, e.g. if a Dataverse instance is +moved to a new URL or when the software is updated to generate additional metadata or address schema changes at the PID service. -and restart Glassfish. The prefix can be configured via the API (where it is referred to as "Authority"): +Note that while some forms of PIDs (Handles, PermaLinks) are technically case sensitive, common practice is to avoid creating PIDs that differ only by case. +Dataverse treats PIDs of all types as case-insensitive (as DOIs are by definition). This means that Dataverse will find datasets (in search, to display dataset pages, etc.) +when the PIDs entered do not match the case of the original but will have a problem if two PIDs that differ only by case exist in one instance. 
-``curl -X PUT -d 10.xxxx http://localhost:8080/api/admin/settings/:Authority``
+Testing PID Providers
++++++++++++++++++++++
-Once this is done, you will be able to publish datasets and files, but the persistent identifiers will not be citable, and they will only resolve from the DataCite test environment (and then only if the Dataverse from which you published them is accessible - DOIs minted from your laptop will not resolve). Note that any datasets or files created using the test configuration cannot be directly migrated and would need to be created again once a valid DOI namespace is configured.
+By default, the installer configures the Fake DOI provider as the registration provider. Unlike other DOI Providers, the Fake Provider does not involve any
+external resolution service and is not appropriate for use beyond development and testing. You may wish instead to test with
+PermaLinks or with a DataCite test account (which uses DataCite's test infrastructure and will help assure your Dataverse instance can make network connections to DataCite).
+DataCite requires that you register for a test account, which will have a username, password, and your own prefix (please contact support@datacite.org for a test account.
+You may wish to `contact the GDCC `_ instead - GDCC is able to provide DataCite accounts with a group discount and can also provide test accounts.).
 
-To properly configure persistent identifiers for a production installation, an account and associated namespace must be acquired for a fee from a DOI or HDL provider. **DataCite** (https://www.datacite.org) is the recommended DOI provider (see https://dataverse.org/global-dataverse-community-consortium for more on joining DataCite) but **EZID** (http://ezid.cdlib.org) is an option for the University of California according to https://www.cdlib.org/cdlinfo/2017/08/04/ezid-doi-service-is-evolving/ . **Handle.Net** (https://www.handle.net) is the HDL provider.
+Once you receive the login name, password, and prefix for the account,
+configure the credentials as described below.
 
-Once you have your DOI or Handle account credentials and a namespace, configure Dataverse to use them using the JVM options and database settings below.
 
+Alternatively, you may wish to configure other providers for testing:
 
-Configuring Dataverse for DOIs
-++++++++++++++++++++++++++++++
 
+- EZID is available to University of California scholars and researchers. Testing can be done using the authority 10.5072 and shoulder FK2 with the "apitest" account (contact EZID for credentials) or an institutional account. Configuration in Dataverse is then analogous to using DataCite.
+
+- The PermaLink provider, like the FAKE DOI provider, does not involve an external account.
+  Unlike the Fake DOI provider, the PermaLink provider creates PIDs that begin with "perma:", making it clearer that they are not DOIs,
+  and that do resolve to the local dataset/file page in Dataverse, making them useful for some production use cases. See :ref:`permalinks` and (for the FAKE DOI provider) the :doc:`/developers/dev-environment` section of the Developer Guide.
 
-By default Dataverse attempts to register DOIs for each dataset and file under a test authority, though you must apply for your own credentials as explained above.
 
+Provider-specific configuration is described below.
 
-Here are the configuration options for DOIs:
 
+Once all is configured, you will be able to publish datasets and files, but **the persistent identifiers will not be citable**
+as they, with the exception of PermaLinks, will not redirect to your dataset page in Dataverse.
 
-**JVM Options:**
 
+Note that any datasets or files created using a test configuration cannot be directly migrated to a production PID provider
+and would need to be created again once valid PID Provider(s) are configured.
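Tying the pieces together, here is a hedged sketch of how a single test provider might be configured purely via MicroProfile environment variables, following the settings described below; the provider id ``datacite1`` and all values are illustrative placeholders:

```shell
# Sketch: one DataCite test provider, id "datacite1" (illustrative), set
# via environment variables. Per the MicroProfile mapping, dots in the
# config keys become underscores and the names are uppercased.
export DATAVERSE_PID_PROVIDERS="datacite1"            # dataverse.pid.providers
export DATAVERSE_PID_DEFAULT_PROVIDER="datacite1"     # dataverse.pid.default-provider
export DATAVERSE_PID_DATACITE1_TYPE="datacite"        # dataverse.pid.datacite1.type
export DATAVERSE_PID_DATACITE1_LABEL="DataCite Test"  # dataverse.pid.datacite1.label
export DATAVERSE_PID_DATACITE1_AUTHORITY="10.5072"    # dataverse.pid.datacite1.authority
export DATAVERSE_PID_DATACITE1_SHOULDER="FK2"         # dataverse.pid.datacite1.shoulder
```

The DataCite username and password should be supplied via one of the more secure mechanisms discussed above (a password alias or the directory config source), not plain environment variables.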
+
+Once you are done testing, to properly configure persistent identifiers for a production installation, an account and associated namespace (e.g. authority/shoulder) must be
+acquired for a fee from a DOI or HDL provider. (As noted above, PermaLinks may be appropriate for intranet and catalog use cases.)
+**DataCite** (https://www.datacite.org) is the recommended DOI provider
+(see https://dataversecommunity.global for more on joining DataCite through the Global Dataverse Community Consortium) but **EZID**
+(http://ezid.cdlib.org) is an option for the University of California according to
+https://www.cdlib.org/cdlinfo/2017/08/04/ezid-doi-service-is-evolving/ .
+**Handle.Net** (https://www.handle.net) is the HDL provider.
+
+Once you have your DOI or Handle account credentials and a prefix, configure your Dataverse installation
+using the settings below.
+
+
+Configuring PID Providers
++++++++++++++++++++++++++
+
+There are two required global settings to configure PID providers - the list of ids of providers and which one of those should be the default.
+Per-provider settings are also required - some that are common to all types and some type specific. All of these settings are defined
+to be compatible with the MicroProfile specification, which means that
+
+1. Any of these settings can be set via system properties (see :ref:`jvm-options` for how to do this), environment variables, or other
+   MicroProfile Config mechanisms supported by the app server.
+   `See Payara docs for supported sources `_.
+2. Remember to protect your secrets. For passwords, use an environment variable (bare minimum), a password alias named the same
+   as the key (OK) or use the `"dir config source" of Payara `_ (best).
+
+   Alias creation example:
+
+   .. code-block:: shell
+
+      echo "AS_ADMIN_ALIASPASSWORD=changeme" > /tmp/p.txt
+      asadmin create-password-alias --passwordfile /tmp/p.txt dataverse.pid.datacite1.datacite.password
+      rm /tmp/p.txt
+
+3. Environment variables follow the key, replacing any dot, colon, dash, etc. with an underscore "_" and uppercasing all
+   letters. Example: ``dataverse.pid.default-provider`` -> ``DATAVERSE_PID_DEFAULT_PROVIDER``
+
+Global Settings
+^^^^^^^^^^^^^^^
+
+The following global settings are used to configure PID Providers in the Dataverse software (the first two are required):
+
+.. _dataverse.pid.providers:
+
+dataverse.pid.providers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A comma-separated list of the ids of the PID providers to use. IDs should be simple unique text strings, e.g. datacite1, perma1, etc.
+IDs are used to scope the provider-specific settings but are not directly visible to users.
+
+.. _dataverse.pid.default-provider:
+
+dataverse.pid.default-provider
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ID of the default PID provider to use.
+
+.. _dataverse.spi.pidproviders.directory:
+
+dataverse.spi.pidproviders.directory
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The path to the directory where JAR files containing additional types of PID Providers can be added.
+Dataverse includes providers that support DOIs (DataCite, EZId, or FAKE), Handles, and PermaLinks.
+PID provider jar files added to this directory can replace any of these or add new PID Providers.
+
+Per-Provider Settings
+^^^^^^^^^^^^^^^^^^^^^
+
+Each Provider listed by id in the dataverse.pid.providers setting must be configured with the following common settings and any settings that are specific to the provider type.
+
+.. _dataverse.pid.*.type:
+
+dataverse.pid.*.type
+^^^^^^^^^^^^^^^^^^^^
+
+The Provider type, currently one of ``datacite``, ``ezid``, ``FAKE``, ``hdl``, or ``perma``. The type defines which protocol a service supports (DOI, Handle, or PermaLink) and, for DOI Providers, which
+DOI service is used.
+
+.. _dataverse.pid.*.label:
+
+dataverse.pid.*.label
+^^^^^^^^^^^^^^^^^^^^^
+
+A human-readable label for the provider.
+
+.. _dataverse.pid.*.authority:
+
+dataverse.pid.*.authority
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _dataverse.pid.*.shoulder:
+
+dataverse.pid.*.shoulder
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+In general, PIDs are of the form ``:/*`` where ``*`` is the portion unique to an individual PID. PID Providers must define
+the authority and shoulder (with the protocol defined by the ``dataverse.pid.*.type`` setting) that defines the set of existing PIDs they can manage and the prefix they can use when minting new PIDs.
+(Often an account with a PID service provider will be limited to using a single authority/shoulder. If your PID service provider account allows more than one combination that you wish to use in Dataverse, configure multiple PID Providers, one for each combination.)
+
+.. _dataverse.pid.*.identifier-generation-style:
+
+dataverse.pid.*.identifier-generation-style
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, PID Providers in Dataverse generate a random 6 character string,
+pre-pended by the Shoulder if set, to use as the identifier for a Dataset.
+Set this to ``storedProcGenerated`` to generate instead a custom *unique*
+identifier (again pre-pended by the Shoulder if set) through a database
+stored procedure or function (the assumed default setting is ``randomString``).
+When using the ``storedProcGenerated`` setting, a stored procedure or function must be created in
+the database.
+
+As a first example, the script below (downloadable
+:download:`here `) produces
+sequential numerical values. You may need to make some changes to suit your
+system setup, see the comments for more information:
+
+.. literalinclude:: ../_static/util/createsequence.sql
+   :language: plpgsql
+
+As a second example, the script below (downloadable
+:download:`here `) produces
+sequential 8 character identifiers from a base36 representation of current
+timestamp.
+
+.. literalinclude:: ../_static/util/identifier_from_timestamp.sql
+   :language: plpgsql
+
+Note that the SQL in these example scripts is Postgres-specific.
+If necessary, it can be reimplemented in any other SQL flavor - the standard
+JPA code in the application simply expects the database to have a saved
+function ("stored procedure") named ``generateIdentifierFromStoredProcedure()``
+returning a single ``varchar`` value.
+
+Please note that this setting interacts with the ``dataverse.pid.*.datafile-pid-format``
+setting below to determine how datafile identifiers are generated.
+
+
+.. _dataverse.pid.*.datafile-pid-format:
+
+dataverse.pid.*.datafile-pid-format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This setting controls the way that the "identifier" component of a file's
+persistent identifier (PID) relates to the PID of its "parent" dataset - for a given PID Provider.
+
+By default, the identifier for a file is dependent on its parent dataset.
+For example, if the identifier of a dataset is "TJCLKP", the identifier for
+a file within that dataset will consist of the parent dataset's identifier
+followed by a slash ("/"), followed by a random 6 character string,
+yielding "TJCLKP/MLGWJO". Identifiers in this format are what you should
+expect if you leave ``dataverse.pid.*.datafile-pid-format`` undefined or set it to
+``DEPENDENT`` and have not changed the ``dataverse.pid.*.identifier-generation-style``
+setting from its default.
+
+Alternatively, the identifier for File PIDs can be configured to be
+independent of Dataset PIDs using the setting ``INDEPENDENT``.
+In this case, file PIDs will not contain the PIDs of their parent datasets,
+and their PIDs will be generated the exact same way that datasets' PIDs are,
+based on the ``dataverse.pid.*.identifier-generation-style`` setting described above
+(random 6 character strings or custom unique identifiers through a stored
+procedure, pre-pended by any shoulder).
+
+The chart below shows examples from each possible combination of parameters
+from the two settings.
``dataverse.pid.*.identifier-generation-style`` can be either +``randomString`` (the default) or ``storedProcGenerated`` and +``dataverse.pid.*.datafile-pid-format`` can be either ``DEPENDENT`` (the default) or +``INDEPENDENT``. In the examples below the "identifier" for the dataset is +"TJCLKP" for ``randomString`` and "100001" for ``storedProcGenerated`` (when +using sequential numerical values, as described in +:ref:`dataverse.pid.*.identifier-generation-style` above), or "krby26qt" for +``storedProcGenerated`` (when using base36 timestamps, as described in +:ref:`dataverse.pid.*.identifier-generation-style` above). + ++-----------------+---------------+----------------------+---------------------+ +| | randomString | storedProcGenerated | storedProcGenerated | +| | | | | +| | | (sequential numbers) | (base36 timestamps) | ++=================+===============+======================+=====================+ +| **DEPENDENT** | TJCLKP/MLGWJO | 100001/1 | krby26qt/1 | ++-----------------+---------------+----------------------+---------------------+ +| **INDEPENDENT** | MLGWJO | 100002 | krby27pz | ++-----------------+---------------+----------------------+---------------------+ + +As seen above, in cases where ``dataverse.pid.*.identifier-generation-style`` is set to +``storedProcGenerated`` and ``dataverse.pid.*.datafile-pid-format`` is set to ``DEPENDENT``, +each file within a dataset will be assigned a number *within* that dataset +starting with "1". + +Otherwise, if ``dataverse.pid.*.datafile-pid-format`` is set to ``INDEPENDENT``, each file +within the dataset is assigned with a new PID which is the next available +identifier provided from the database stored procedure. In our example: +"100002" when using sequential numbers or "krby27pz" when using base36 +timestamps. + +.. _dataverse.pid.*.managed-list: + +dataverse.pid.*.managed-list +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
_dataverse.pid.*.excluded-list:
+
+dataverse.pid.*.excluded-list
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+With at least some PID services, it is possible for the authority (permission) to manage specific individual PIDs
+to be transferred between accounts. To handle these cases, the individual PIDs, written in the
+standard format, e.g. doi:10.5072/FK2ABCDEF, can be added to the comma-separated ``managed`` or ``excluded`` list
+for a given provider. For entries on the ``managed-list``, Dataverse will assume this PID
+Provider/account can update the metadata and landing URL for the PID at the service provider
+(even though it does not match the provider's authority/shoulder settings). Conversely,
+Dataverse will assume that PIDs on the ``excluded-list`` cannot be managed/updated by this provider
+(even though they match the provider's authority/shoulder settings). These settings are optional
+with the default assumption that these lists are empty.
+
+.. _dataverse.pid.*.datacite:
+
+DataCite-specific Settings
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.datacite.mds-api-url
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+dataverse.pid.*.datacite.rest-api-url
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+dataverse.pid.*.datacite.username
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+dataverse.pid.*.datacite.password
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+PID Providers of type ``datacite`` require four additional parameters that define how the provider connects to DataCite.
+DataCite has two APIs that are used in Dataverse:
+
+The base URL of the `DataCite MDS API `_,
+used to mint and manage DOIs. Current valid values for ``dataverse.pid.*.datacite.mds-api-url`` are "https://mds.datacite.org" (production) and "https://mds.test.datacite.org" (testing, the default).
+
+The `DataCite REST API `_ is also used, for :ref:`PIDs API ` information retrieval and :doc:`/admin/make-data-count`.
Current valid values for ``dataverse.pid.*.datacite.rest-api-url`` are "https://api.datacite.org" (production) and "https://api.test.datacite.org" (testing, the default).

DataCite uses HTTP Basic authentication for Fabrica and their APIs. You need to provide
the same credentials (``username``, ``password``) to the Dataverse software so that it can mint and manage DOIs for you.
As noted above, you should use one of the more secure options for setting the password.

CrossRef-specific Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.crossref.url
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataverse.pid.*.crossref.rest-api-url
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataverse.pid.*.crossref.username
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataverse.pid.*.crossref.password
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataverse.pid.*.crossref.depositor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataverse.pid.*.crossref.depositor-email
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CrossRef is an experimental provider.
PID Providers of type ``crossref`` require six additional parameters that define how the provider connects to CrossRef.
CrossRef has two APIs that are used in Dataverse:

The base URL of the CrossRef deposit service is used to mint and manage DOIs. The current valid value for ``dataverse.pid.*.crossref.url`` is "https://doi.crossref.org" (production), and for ``dataverse.pid.*.crossref.rest-api-url`` it is "https://api.crossref.org" (production). For example:

``dataverse.pid.*.crossref.username=crusername``

``dataverse.pid.*.crossref.password=secret``

``dataverse.pid.*.crossref.depositor=xyz``

``dataverse.pid.*.crossref.depositor-email=xyz@example.com``

CrossRef uses HTTP Basic authentication.
XML files can be POSTed to the CrossRef deposit URL, where they are added to a submission queue to await processing.
The CrossRef REST API allows searching and reuse of CrossRef members' metadata.
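Putting the six parameters together, a hypothetical CrossRef provider with id ``crossref1`` might be configured along these lines (a sketch; all values are placeholders):

.. code-block:: bash

    # "crossref1" is a hypothetical provider id; all values below are placeholders.
    # Note: colons in values passed to create-jvm-options must be escaped with "\:".
    asadmin create-jvm-options "-Ddataverse.pid.crossref1.crossref.url=https\://doi.crossref.org"
    asadmin create-jvm-options "-Ddataverse.pid.crossref1.crossref.rest-api-url=https\://api.crossref.org"
    asadmin create-jvm-options "-Ddataverse.pid.crossref1.crossref.username=crusername"
    asadmin create-jvm-options "-Ddataverse.pid.crossref1.crossref.depositor=xyz"
    asadmin create-jvm-options "-Ddataverse.pid.crossref1.crossref.depositor-email=xyz@example.com"
    # For the password, prefer a secure mechanism (e.g. a password alias) over a plain JVM option.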
You need to provide the same credentials (``username``, ``password``) to the Dataverse software so that it can mint and manage DOIs for you.
As noted above, you should use one of the more secure options for setting the password.
The depositor and depositor email are used for the generation and distribution of depositor reports.

.. _dataverse.pid.*.ezid:

EZId-specific Settings
^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.ezid.api-url
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataverse.pid.*.ezid.username
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataverse.pid.*.ezid.password
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note that use of EZId is limited primarily to University of California institutions. If you have an EZId account,
you will need to configure the ``api-url`` and your account ``username`` and ``password``. As above, you should use one of the more secure
options for setting the password.

.. _dataverse.pid.*.permalink:

PermaLink-specific Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.permalink.base-url
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.permalink.separator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PermaLinks are a simple PID option intended for intranet and catalog use cases. They can be used without an external service or
be configured with the ``base-url`` of a resolution service. PermaLinks also allow a custom ``separator`` to be used.

Note:

- If you configure ``base-url``, it should include a "/" after the hostname, like this: ``https://demo.dataverse.org/``.
- When using multiple PermaLink providers, you should avoid ambiguous authority/separator/shoulder combinations that would result in the same overall prefix.
- Configuring PermaLink providers differing only by their separator values is not supported.
- In general, PermaLink authority/shoulder values should be alphanumeric. For other cases, admins may need to consider the potential impact of special characters in S3 storage identifiers, resolver URLs, exports, etc.

.. _dataverse.pid.*.handlenet:

Handle-specific Settings
^^^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.handlenet.index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.handlenet.independent-service
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.handlenet.auth-handle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.handlenet.key.path
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

dataverse.pid.*.handlenet.key.passphrase
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note: If you are **minting your own handles** and plan to set up your own handle service, please refer to the Handle.Net documentation.

Configure your Handle.Net ``index`` to be used when registering new persistent
identifiers. Defaults to ``300``.

Indices are used to separate concerns within the Handle system. To add data to
an index, authentication is mandatory. See also chapter 1.4 "Authentication" of
the Handle.Net Technical Documentation.

Handle.Net servers use a public key authentication method where the public key
is stored in a handle itself and the matching private key is provided from a
key file. Typically, the absolute path to this file ends like ``handle/svr_1/admpriv.bin``.
The key file may (and should) be encrypted with a passphrase (used for
encryption with AES-128). See
also chapter 1.4 "Authentication" of the Handle.Net Technical Documentation.

Provide an absolute ``key.path`` to the private key file used to authenticate requests to your
Handle.Net server.

Provide a ``key.passphrase`` to decrypt the private key file at ``dataverse.pid.*.handlenet.key.path``.

Set ``independent-service`` to ``true`` if you want to use a Handle service which is set up to work "independently" (no communication with the Global Handle Registry).
By default this setting is false.

Set ``auth-handle`` to the handle containing the public key, to be used on a global handle service when the public key is NOT stored in the default handle.
This setting is optional.
If the public key is, for instance, stored in the handle ``21.T12996/USER01``, ``auth-handle`` should be set to this value.

.. _pids-doi-configuration:

Backward-compatibility for Single PID Provider Installations
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

While using the PID Provider configuration settings described above is recommended, Dataverse installations
using only a single PID Provider can use the settings below instead. In general, these legacy settings mirror
those above, except that they do not include a PID Provider id.

Configuring Your Dataverse Installation for a Single DOI Provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here are the configuration options for DOIs:

**JVM Options for DataCite:**

- :ref:`dataverse.pid.datacite.mds-api-url`
- :ref:`dataverse.pid.datacite.rest-api-url`
- :ref:`dataverse.pid.datacite.username`
- :ref:`dataverse.pid.datacite.password`

**JVM Options for EZID:**

As stated above, with very few exceptions (e.g. University of California), you will not be able to use
this provider.

- :ref:`dataverse.pid.ezid.api-url`
- :ref:`dataverse.pid.ezid.username`
- :ref:`dataverse.pid.ezid.password`

**Database Settings:**

- :ref:`:Protocol <:Protocol>`
- :ref:`:Authority <:Authority>`
- :ref:`:Shoulder <:Shoulder>`
- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)

.. _pids-handle-configuration:

Configuring Your Dataverse Installation for a Single Handle Provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here are the configuration options for handles. Most notably, you need to
change the ``:Protocol`` setting, as it defaults to DOI usage.

**JVM Options:**

- :ref:`dataverse.pid.handlenet.key.path`
- :ref:`dataverse.pid.handlenet.key.passphrase`
- :ref:`dataverse.pid.handlenet.index`

**Database Settings:**

- :ref:`:Protocol <:Protocol>`
- :ref:`:Authority <:Authority>`
- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:IndependentHandleService <:IndependentHandleService>` (optional)
- :ref:`:HandleAuthHandle <:HandleAuthHandle>` (optional)

Note: If you are **minting your own handles** and plan to set up your own handle service, please refer to the Handle.Net documentation.

.. _permalinks:

Configuring Your Dataverse Installation for a Single PermaLink Provider
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here are the configuration options for PermaLinks:

**JVM Options:**

- :ref:`dataverse.pid.permalink.base-url`

**Database Settings:**

- :ref:`:Protocol <:Protocol>`
- :ref:`:Authority <:Authority>`
- :ref:`:Shoulder <:Shoulder>`
- :ref:`:IdentifierGenerationStyle <:IdentifierGenerationStyle>` (optional)
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)

You must restart Payara after making changes to these settings.

.. _auth-modes:

Auth Modes: Local vs. Remote vs. Both
-------------------------------------

There are three valid configurations or modes for authenticating users to your Dataverse installation:

Local Only Auth
+++++++++++++++

Out of the box, your Dataverse installation is configured in "local only" mode. The "dataverseAdmin" superuser account mentioned in the :doc:`/installation/installation-main` section is an example of a local account. Internally, these accounts are called "builtin" because they are built in to the Dataverse Software application itself.

Both Local and Remote Auth
++++++++++++++++++++++++++

The ``authenticationproviderrow`` database table controls which "authentication providers" are available within a Dataverse installation. Out of the box, a single row with an id of "builtin" will be present. For each user in a Dataverse installation, the ``authenticateduserlookup`` table will have a value under ``authenticationproviderid`` that matches this id. For example, the default "dataverseAdmin" user will have the value "builtin" under ``authenticationproviderid``. Why is this important? Users are tied to a specific authentication provider, but conversion mechanisms are available to switch a user from one authentication provider to the other. As explained in the :doc:`/user/account` section of the User Guide, a graphical workflow is provided for end users to convert from the "builtin" authentication provider to a remote provider. Conversion from a remote authentication provider to the builtin provider can be performed by a sysadmin with access to the "admin" API. See the :doc:`/api/native-api` section of the API Guide for how to list users and authentication providers as JSON.
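For example (a sketch with placeholder values; the ``admin`` endpoint is typically restricted, so a superuser's API token or localhost access is assumed), the configured providers can be listed like this:

.. code-block:: bash

    # Placeholder values; substitute your own server URL and a superuser's API token
    export SERVER_URL=https://demo.dataverse.org
    export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

    # Returns the configured authentication providers as JSON
    curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/admin/authenticationProviders"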
Adding and enabling a second authentication provider (:ref:`native-api-add-auth-provider` and :ref:`api-toggle-auth-provider`) will result in the Log In page showing additional providers for your users to choose from. By default, the Log In page will show the "builtin" provider, but you can adjust this via the :ref:`conf-default-auth-provider` configuration option. Further customization can be achieved by setting :ref:`conf-allow-signup` to "false", thus preventing users from creating local accounts via the web interface. Please note that local accounts can also be created through the API by enabling the ``builtin-users`` endpoint (:ref:`:BlockedApiEndpoints`) and setting the ``BuiltinUsers.KEY`` database setting (:ref:`BuiltinUsers.KEY`).

To configure Shibboleth, see the :doc:`shibboleth` section; to configure OAuth, see the :doc:`oauth2` section.

Remote Only Auth
++++++++++++++++

As for the "Remote only" authentication mode, it means that:

- Shibboleth or OAuth has been enabled.
- ``:AllowSignUp`` is set to "false" to prevent users from creating local accounts via the web interface.
- ``:DefaultAuthProvider`` has been set to use the desired authentication provider.
- The "builtin" authentication provider has been disabled (:ref:`api-toggle-auth-provider`). Note that disabling the "builtin" authentication provider means that the API endpoint for converting an account from a remote auth provider will not work. Converting directly from one remote authentication provider to another (i.e. from GitHub to Google) is not supported. Conversion from remote is always to "builtin"; then the user initiates a conversion from "builtin" to remote. Note that, longer term, the plan is to permit multiple login options for the same Dataverse installation account per https://github.com/IQSS/dataverse/issues/3487 (so all this talk of conversion will be moot), but for now users can only use a single login option, as explained in the :doc:`/user/account` section of the User Guide. In short, "remote only" might work for you if you only plan to use a single remote authentication provider, such that no conversion between remote authentication providers will be necessary.

.. _bearer-token-auth:

Bearer Token Authentication
---------------------------

Bearer tokens are defined in `RFC 6750`_ and can be used as an alternative to API tokens. This is an experimental feature hidden behind a feature flag.

.. _RFC 6750: https://tools.ietf.org/html/rfc6750

To enable bearer tokens, you must install and configure Keycloak (for now, see :ref:`oidc-dev` in the Developer Guide) and enable ``api-bearer-auth`` under :ref:`feature-flags`.
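Once enabled, a bearer token is passed in the standard ``Authorization`` header defined by RFC 6750. A sketch with placeholder values (the token itself must be obtained from your OIDC provider):

.. code-block:: bash

    # Placeholder values; obtain a real access token from your OIDC provider (e.g. Keycloak)
    export SERVER_URL=https://demo.dataverse.org
    export BEARER_TOKEN=...

    # Authenticate an API call with the bearer token instead of an API token
    curl -H "Authorization: Bearer $BEARER_TOKEN" "$SERVER_URL/api/users/:me"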
You can test that bearer tokens are working by following the example under :ref:`bearer-tokens` in the API Guide.

.. _smtp-config:

SMTP/Email Configuration
------------------------

The installer prompts you for some basic options to configure your Dataverse installation to send email using your SMTP server, but in many cases extra configuration may be necessary.

Make sure :ref:`dataverse.mail.system-email` has been set. Email will not be sent without it, and a hint about this fact will be logged.
If you want to separate system email from your support team's email, take a look at :ref:`dataverse.mail.support-email`.

Then check the list of commonly used settings at the top of :ref:`dataverse.mail.mta`.

If you have trouble, consider turning on debugging with :ref:`dataverse.mail.debug`.

.. _database-persistence:

Database Persistence
--------------------

The Dataverse software uses a PostgreSQL database to store the objects users create.
You can configure basic and advanced settings for the PostgreSQL database connection with the help of the
MicroProfile Config API.

Basic Database Settings
+++++++++++++++++++++++

1. Any of these settings can be set via system properties (see :ref:`jvm-options` starting at :ref:`dataverse.db.name`), environment variables, or other
   MicroProfile Config mechanisms supported by the app server.
   See the Payara documentation for supported sources.
2. Remember to protect your secrets.
   See :ref:`secure-password-storage` for more information.
3. Environment variables follow the key, replacing any dot, colon, dash, etc. with an underscore "_" and using all uppercase
   letters. Example: ``dataverse.db.host`` -> ``DATAVERSE_DB_HOST``

.. list-table::
    :widths: 15 60 25
    :header-rows: 1
    :align: left

    * - MPCONFIG Key
      - Description
      - Default
    * - dataverse.db.host
      - The PostgreSQL server to connect to.
      - ``localhost``
    * - dataverse.db.port
      - The PostgreSQL server port to connect to.
      - ``5432``
    * - dataverse.db.user
      - The PostgreSQL user name to connect with.
      - | ``dataverse``
        | (installer sets to ``dvnapp``)
    * - dataverse.db.password
      - The PostgreSQL user's password to connect with.

        **Please note the safety advisory above.**
      - *No default*
    * - dataverse.db.name
      - The PostgreSQL database name to use for the Dataverse installation.
      - | ``dataverse``
        | (installer sets to ``dvndb``)
    * - dataverse.db.parameters
      - Connection parameters, such as ``sslmode=require``. See the Postgres JDBC documentation.
        Note: you don't need to provide the initial "?".
      - *Empty string*

Advanced Database Settings
++++++++++++++++++++++++++

The following options are useful in many scenarios. You might be interested in debug output during development or
in monitoring performance in production.

You can find more details within the Payara docs:

- Payara User Guide: Connection Pool Configuration
- Payara Technical Documentation: Advanced Connection Pool Configuration

Connection Validation
^^^^^^^^^^^^^^^^^^^^^

.. list-table::
    :widths: 15 60 25
    :header-rows: 1
    :align: left

    * - MPCONFIG Key
      - Description
      - Default
    * - dataverse.db.is-connection-validation-required
      - ``true``: Validate connections, allowing the server to reconnect in case of failure.
      - false
    * - dataverse.db.connection-validation-method
      - | The method of connection validation:
        | ``table|autocommit|meta-data|custom-validation``.
      - *Empty string*
    * - dataverse.db.validation-table-name
      - The name of the table used for validation if the validation method is set to ``table``.
      - *Empty string*
    * - dataverse.db.validation-classname
      - The name of the custom class used for validation if the ``validation-method`` is set to ``custom-validation``.
      - *Empty string*
    * - dataverse.db.validate-atmost-once-period-in-seconds
      - Specifies the time interval in seconds between successive requests to validate a connection at most once.
      - ``0`` (disabled)

Connection & Statement Leaks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
    :widths: 15 60 25
    :header-rows: 1
    :align: left

    * - MPCONFIG Key
      - Description
      - Default
    * - dataverse.db.connection-leak-timeout-in-seconds
      - Specify the timeout after which connections count as "leaked".
      - ``0`` (disabled)
    * - dataverse.db.connection-leak-reclaim
      - If enabled, a leaked connection will be reclaimed by the pool after the connection leak timeout occurs.
      - ``false``
    * - dataverse.db.statement-leak-timeout-in-seconds
      - Specify the timeout after which statements should be considered "leaked".
      - ``0`` (disabled)
    * - dataverse.db.statement-leak-reclaim
      - If enabled, a leaked statement will be reclaimed by the pool after the statement leak timeout occurs.
      - ``false``

Logging & Slow Performance
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
    :widths: 15 60 25
    :header-rows: 1
    :align: left

    * - MPCONFIG Key
      - Description
      - Default
    * - dataverse.db.statement-timeout-in-seconds
      - Timeout property of a connection to enable termination of abnormally long-running queries.
      - ``-1`` (disabled)
    * - dataverse.db.slow-query-threshold-in-seconds
      - SQL queries that exceed this time in seconds will be logged.
      - ``-1`` (disabled)
    * - dataverse.db.log-jdbc-calls
      - When set to true, all JDBC calls will be logged, allowing tracing of all JDBC interactions, including SQL.
      - ``false``

Database Configuration Tips
+++++++++++++++++++++++++++

In this section you can find some example scenarios of advanced configuration for the database connection that can improve service performance and availability.

Database Connection Recovery
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Consider the following scenario: if there is no advanced configuration for the database connection and the Dataverse server loses that connection (for example, if the database host is down), the server will remain "dead" even after the database server is back to normal.
The only solution to recover the Dataverse installation would then be to restart the service. To avoid this situation, the following settings can be used to configure validation of the database connection.
This way, the database connection can be automatically recovered after a failure, improving server availability. For a Docker installation, it is suggested to create an init.d script so that if the container needs to be recreated, these settings will always be configured.

.. code-block:: bash

    # Enable database connection validation
    asadmin create-jvm-options "-Ddataverse.db.is-connection-validation-required=true"
    # Configure a database table as the validation method
    asadmin create-jvm-options "-Ddataverse.db.connection-validation-method=table"
    # Use the "setting" table for connection validation (any table can be used)
    asadmin create-jvm-options "-Ddataverse.db.validation-table-name=setting"
    # Validate at most once every 60 seconds (other values may be used)
    asadmin create-jvm-options "-Ddataverse.db.validate-atmost-once-period-in-seconds=60"

.. _file-storage:

File Storage
------------

By default, a Dataverse installation stores all data files (files uploaded by end users) on the filesystem at ``/usr/local/payara6/glassfish/domains/domain1/files``. This path can vary based on answers you gave to the installer (see the :ref:`dataverse-installer` section of the Installation Guide) or can be changed afterward by reconfiguring the ``dataverse.files.<id>.directory`` JVM option described below.

A Dataverse installation can alternately store files in a Swift or S3-compatible object store, or on a Globus endpoint, and can now be configured to support multiple stores at once. With a multi-store configuration, the location for new files can be controlled on a per-Dataverse-collection basis.

A Dataverse installation may also be configured to reference some files (e.g. large and/or sensitive data) stored in a web- or Globus-accessible trusted remote store.

A Dataverse installation can be configured to allow out-of-band upload by setting the ``dataverse.files.<id>.upload-out-of-band`` JVM option to ``true``.
By default, Dataverse supports uploading files via the :ref:`add-file-api`. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server).
With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the "Adding the Uploaded file to the Dataset" API call (described in the :doc:`/developers/s3-direct-upload-api` page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.

The following sections describe how to set up various types of stores and how to configure multiple stores.

Multi-store Basics
++++++++++++++++++

To support multiple stores, a Dataverse installation now requires an id, type, and label for each store (even for a single-store configuration). These are configured by defining two required JVM options:

.. code-block:: none

    ./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.<id>.type=<type>"
    ./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.<id>.label=<label>"