
Stable/mitaka#8

Open
ykmmxm wants to merge 8664 commits into turbonomic:stable/mitaka from
ykmmxm:stable/mitaka

Conversation


@ykmmxm ykmmxm commented Apr 12, 2018

Added a turbonomic_target_address parameter to reduce OpenStack configuration complexity

Zuul and others added 30 commits October 19, 2017 04:58
Starting with the Pike release, reporting VCPU/memory/disk is no longer required.
However, we used VCPU to check if a node is available, so nodes without VCPU in
their properties were always ignored. This patch changes the logic to use the existing
_node_resources_unavailable call.

This change also fixes another related issue: when disk or memory are missing from
properties, the virt driver tries to report zero max_unit for them, which is not
allowed by placement.

Change-Id: I1bbfc152189252c5c45e6153695a802d17b76690
Closes-Bug: #1723423
(cherry picked from commit b25928d)
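A minimal sketch of the availability check described above; _node_resources_unavailable is the only name taken from the commit message, and the node fields are assumptions:

    # Sketch: decide node availability from an explicit "unavailable" check
    # rather than from the presence of VCPU in the node properties.

    def _node_resources_unavailable(node):
        # A node is unavailable if it is in maintenance or in a bad state
        # (assumed, illustrative criteria).
        return node.get('maintenance', False) or node.get('power_state') == 'error'

    def node_is_available(node):
        # Old logic (problematic): nodes without 'vcpus' were always ignored.
        #   return bool(node['properties'].get('vcpus'))
        # New logic: rely on the explicit unavailability check instead.
        return not _node_resources_unavailable(node)

    node = {'properties': {}, 'maintenance': False, 'power_state': 'power on'}
    print(node_is_available(node))  # True, even though 'vcpus' is not reported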
An OSError will lead the instance to the ERROR state; changing it to
MigrationPreCheckError leaves the instance status unchanged.

Also, modify some test cases to make unit testing easier.

Closes-Bug: 1694636

Change-Id: I3286c32ca205ffd2d5d1aaab88cc96699476e410
(cherry picked from commit cb565d9)
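A minimal sketch of the exception translation, with a stand-in exception class and a hypothetical pre-check function (not the actual driver code):

    # Sketch: translate OSError into a pre-check error so the instance is not
    # pushed into the ERROR state by an unhandled exception.

    class MigrationPreCheckError(Exception):
        """Stand-in for nova.exception.MigrationPreCheckError."""

    def check_shared_storage(path):
        try:
            with open(path) as f:  # may raise OSError
                return f.read()
        except OSError as e:
            # Re-raise as a pre-check failure; the caller treats this as a
            # recoverable condition instead of setting the instance to ERROR.
            raise MigrationPreCheckError('shared storage check failed: %s' % e)

    try:
        check_shared_storage('/nonexistent/check_file')
    except MigrationPreCheckError as e:
        print('pre-check failed, instance state unchanged:', e)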
The BDM has no uuid attribute so the debug message in here
would result in an AttributeError. This has been around since
the creation of this object, and the debug log message was
probably copied from the Instance object.

This was only exposed in Pike when this code started
lazy-loading the instance field:

  I1dc54a38f02bb48921bcbc4c2fdcc2c946e783c1

So this change fixes that bug and adds tests for obj_load_attr.

Change-Id: I8b55227b1530a76c2f396c035384abd89237d936
Closes-Bug: #1726871
(cherry picked from commit 1ca191f)
Replace the ocata config-reference URLs with
URLs in each project repo.

Change-Id: I48d7c77a6e0eaaf0efe66f848f45ae99007577e1
Closes-Bug: #1715545
(cherry picked from commit 2fce8a1)
As part of the docs migration from openstack-manuals to
nova in the pike release we missed the config-drive docs.

This change does the following:

1. Imports the config-drive doc into the user guide.
2. Fixes a broken link to the metadata service in the doc.
3. Removes a note about liberty being the current release.
4. Adds a link in the API reference parameters to actually
   point at the document we have in tree now, which is
   otherwise not very discoverable as the main index does
   not link to this page (or the user index for that matter).

Partial-Bug: #1714017
Closes-Bug: #1720873

Change-Id: I1d54e1f5a1a94e9821efad99b7fa430bd8fece0a
(cherry picked from commit 59bd2f6)
This imports the "provide-user-data-to-instances" page
from the old openstack-manuals user guide.

Since we don't have a glossary, the :term: link is removed
and replaced with just giving the glossary definition as
the first part of the doc.

Change-Id: Iae70d9b53d6cefb3bcb107fe68499cccb71fc15e
Partial-Bug: #1714017
(cherry picked from commit 3fc8538)
One of the things this commit:

    commit 14c38ac
    Author: Kashyap Chamarthy <kchamart@redhat.com>
    Date:   Thu Jul 20 19:01:23 2017 +0200

        libvirt: Post-migration, set cache value for Cinder volume(s)

    [...]

did was to supposedly remove "duplicate" calls to _set_cache_mode().

But that came back to bite us.

Now, while Cinder volumes have their cache value handled correctly
during migration, the commit referred to above (14c38ac) introduced a
regression because it disregards the 'disk_cachemodes' Nova config
parameter altogether for boot disks -- i.e. even if a user set the
cache mode to 'writeback', it is ignored and 'none' is set
unconditionally.

Add the _set_cache_mode() calls back in _get_guest_storage_config().

Co-Authored-By: melanie witt <melwittt@gmail.com>

Closes-Bug: #1727558

Change-Id: I7370cc2942a6c8c51ab5355b50a9e5666cca042e
(cherry picked from commit 24e79bc)
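A simplified sketch of the fix's idea: apply the configured disk_cachemodes value when building each guest disk config instead of leaving a hard-coded default. The class, option layout, and values below are illustrative, not the actual libvirt driver code:

    # Sketch: honour a configured cache mode for boot disks instead of
    # unconditionally using 'none'.

    CONF_DISK_CACHEMODES = {'file': 'writeback'}  # e.g. disk_cachemodes=["file=writeback"]

    class GuestDisk:
        def __init__(self, source_type):
            self.source_type = source_type
            self.driver_cache = 'none'            # previous hard-coded default

    def _set_cache_mode(disk):
        # Look up the configured cache mode for this disk's source type.
        disk.driver_cache = CONF_DISK_CACHEMODES.get(disk.source_type,
                                                     disk.driver_cache)

    def _get_guest_storage_config(disks):
        for disk in disks:
            _set_cache_mode(disk)                 # the call the fix restores
        return disks

    for d in _get_guest_storage_config([GuestDisk('file'), GuestDisk('network')]):
        print(d.source_type, d.driver_cache)      # file writeback / network none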
In I6ddcaaca37fc5387c2d2e9f51c67ea9e85acb5c5 we forgot to update the
legacy filter properties dictionary so the requested target wasn't
passed to the scheduler when evacuating.
Adding a functional test for verifying the behaviour.

NOTE(sbauza): The issue has been incidentally fixed in Pike by
I434af8e4ad991ac114dd67d66797a562d16bafe2 so the regression test just
verifies that the expected behaviour works.
The Newton and Ocata backports will be slightly different from that one as we
need to verify that host3 will be preferred eventually over host2.

Related-Bug: #1702454

Change-Id: Id9adb10d2ef821c8b61d8f1d5dc9dd66ec7aaac8
(cherry picked from commit e0e2e065a495b4fa9ebdec987c935e3c83118c46)
When we added the requested_destination field for the RequestSpec object
in Newton, we forgot to pass it to the legacy dictionary when wanting to
use scheduler methods not yet supporting the NovaObject.
As a consequence, when we were transforming the RequestSpec object into a
tuple of (request_spec, filter_props) dicts and then rehydrating a new
RequestSpec object using those dicts, the newly created object was not
keeping that requested_destination field from the original.

Change-Id: Iba0b88172e9a3bfd4f216dd364d70f7e01c60ee2
Closes-Bug: #1702454
(cherry picked from commit 69bef428bd555bb31f43db6ca9c21db8aeb9007e)
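A small sketch of the round-trip problem these two changes address: if the legacy filter-properties dict does not carry requested_destination, a RequestSpec rebuilt from the dicts silently loses it. The field names follow the commit messages; everything else is illustrative:

    # Sketch: round-trip a request spec through legacy dicts without dropping
    # the requested_destination field.

    class RequestSpec:
        def __init__(self, flavor, requested_destination=None):
            self.flavor = flavor
            self.requested_destination = requested_destination

        def to_legacy_filter_properties(self):
            props = {'flavor': self.flavor}
            # The bug: forgetting this meant the target host was never
            # passed to the scheduler when evacuating.
            if self.requested_destination is not None:
                props['requested_destination'] = self.requested_destination
            return props

        @classmethod
        def from_primitives(cls, filter_properties):
            return cls(filter_properties['flavor'],
                       filter_properties.get('requested_destination'))

    spec = RequestSpec('m1.small', requested_destination='host3')
    rebuilt = RequestSpec.from_primitives(spec.to_legacy_filter_properties())
    print(rebuilt.requested_destination)  # host3, not None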
…tance

If we're calling build_request_spec in conductor.rebuild_instance,
it's because we are evacuating and the instance is so old it does
not have a request spec. We need the request_spec to pass to the
scheduler to pick a destination host for the evacuation.

For evacuate, nova-api does not pass any image reference parameters,
and even if it did, those are image IDs, not an image meta dict that
build_request_spec expects, so this code has just always been wrong.

This change fixes the problem by passing a primitive version of
the instance.image_meta which build_request_spec will then return
back to conductor and that gets used to build a RequestSpec object
from primitives.

It's important to use the correct image meta so that the scheduler
can properly filter hosts using things like the
AggregateImagePropertiesIsolation and ImagePropertiesFilter filters.

Change-Id: I0c8ce65016287de7be921c312493667a8c7f762e
Closes-Bug: #1727855
(cherry picked from commit d2690d6)
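A hedged sketch of the idea: pass a plain-dict primitive of the instance's image metadata, rather than an image ID, so the spec the scheduler sees carries the image properties the filters need. The function and class here are stand-ins, not nova's actual build_request_spec:

    # Sketch: feed image *metadata* (a primitive dict), not an image ID, into
    # the request-spec builder so image-property filters can work.

    def build_request_spec(image_meta, instances):
        # The builder expects an image meta dict, not an image UUID.
        assert isinstance(image_meta, dict)
        return {'image': image_meta, 'num_instances': len(instances)}

    class Instance:
        image_meta = {'properties': {'hw_architecture': 'x86_64'}}

        def image_meta_primitive(self):
            # Stand-in for converting the versioned object to plain dicts.
            return dict(self.image_meta)

    instance = Instance()
    spec = build_request_spec(instance.image_meta_primitive(), [instance])
    print(spec['image']['properties'])  # usable by ImagePropertiesFilter-style logic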
The bandwidth param, previously set outside of the guest object's
"migrate" method, has to be set inside it to avoid duplicating that
option.

(cherry picked from commit c212ad2)

Backported to avoid a minor merge conflict backporting change
I9b545ca8, and because it addresses a related issue calling
migrateToURI3.

Change-Id: I8a37753dea8eca7b26466f17dfbdc184c48c24c5
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@redhat.com>
If we specify block migration, but there are no disks which actually
require block migration we call libvirt's migrateToURI3() with
VIR_MIGRATE_NON_SHARED_INC in flags and an empty migrate_disks in
params. Libvirt interprets this to be the default block migration
behaviour of "block migrate all writeable disks". However,
migrate_disks may only be empty because we filtered attached volumes
out of it, in which case libvirt will block migrate attached volumes.
This is a data corruptor.

This change addresses the issue at the point we call migrateToURI3().
As we never want the default block migration behaviour, we can safely
remove the flag if the list of disks to migrate is empty.

(cherry picked from commit ea9bf52)

nova/tests/unit/virt/libvirt/test_driver.py:
  Explicitly asserts byte string destination_xml in
  _test_live_migration_block_migration_flags. Not required in master
  due to change I85cd9a90.

Change-Id: I9b545ca8aa6dd7b41ddea2d333190c9fbed19bc1
Resolves-bug: #1719362
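A standalone sketch of the flag handling described above; the constant is a stand-in for libvirt's VIR_MIGRATE_NON_SHARED_INC and the function is illustrative:

    # Sketch: never hand libvirt an "incremental block migration" flag together
    # with an empty disk list, since libvirt would then block-migrate every
    # writeable disk, including attached volumes.

    VIR_MIGRATE_NON_SHARED_INC = 1 << 7  # stand-in for libvirt's constant

    def adjust_migration_flags(flags, migrate_disks):
        if not migrate_disks:
            # Nothing actually needs block migration; strip the flag so libvirt
            # does not fall back to "block migrate all writeable disks".
            flags &= ~VIR_MIGRATE_NON_SHARED_INC
        return flags

    flags = VIR_MIGRATE_NON_SHARED_INC | 0x1          # e.g. combined with LIVE
    print(adjust_migration_flags(flags, migrate_disks=[]))       # flag removed
    print(adjust_migration_flags(flags, migrate_disks=['vdb']))  # flag kept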
When confirming a resize, the libvirt driver on the source host checks
to see if the instance base directory (which contains the domain xml
files, etc) exists and if the root disk image does not, it removes the
instance base directory.

However, the root image disk won't exist on local storage for a
volume-backed instance and if the instance base directory is on shared
storage, e.g. NFS or Ceph, between the source and destination host, the
instance base directory is incorrectly deleted.

This adds a check to see if the instance is volume-backed when checking
to see if the instance base directory should be removed from the source
host when confirming a resize.

Change-Id: I29fac80d08baf64bf69e54cf673e55123174de2a
Closes-Bug: #1728603
(cherry picked from commit f02afc6)
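A minimal sketch of the extra check, assuming illustrative paths and a boolean volume-backed flag rather than the driver's real helpers:

    # Sketch: only remove the instance base directory on the source host if the
    # instance is not volume-backed.

    import os

    def should_delete_instance_dir(instance_dir, root_disk, is_volume_backed):
        # Old check: base dir exists and the root disk image does not.
        # New check: additionally require that the instance is image-backed,
        # otherwise a shared (NFS/Ceph) base dir could be deleted by mistake.
        return (os.path.exists(instance_dir)
                and not os.path.exists(root_disk)
                and not is_volume_backed)

    print(should_delete_instance_dir('/tmp', '/tmp/disk-that-does-not-exist',
                                     is_volume_backed=True))  # False: keep the dir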
When we notice that an instance was deleted after scheduling, we punt on
instance creation. When that happens, the scheduler will have created
allocations already so we need to delete those to avoid leaking resources.

Related-Bug: #1679750
Change-Id: I54806fe43257528fbec7d44c841ee4abb14c9dff
(cherry picked from commit 57a3af6)
The resource tracker's _remove_deleted_instances_allocations() assumes that
InstanceNotFound means that an instance was deleted. That's not quite accurate,
as we would also see that in the window between creating allocations and actually
creating the instance in the cell database. So, as written, the code will
delete allocations for those instances before they are created.

This change makes us look up the instance with read_deleted=yes, and if we find
it with deleted=True, then we do the allocation removal. This does mean that
someone running a full DB archive at the instant an instance is deleted in some
way that didn't result in allocation removal as well could leak those. However,
we can log that (unlikely) situation.

Closes-Bug: #1729371

Conflicts:
      nova/compute/resource_tracker.py
      nova/tests/unit/compute/test_resource_tracker.py

NOTE(mriedem): Conflicts were due to not having change
1ff1310 or change
e3b7f43 in Pike.

Change-Id: I4482ac2ecf8e07c197fd24c520b7f11fd5a10945
(cherry picked from commit d176175)
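A hedged sketch of the new check: look the instance up including deleted rows and only drop the allocations if it really is deleted. The lookup and removal callables are stand-ins for the real DB and placement calls:

    # Sketch: before dropping allocations for an instance the resource tracker
    # cannot find, confirm the instance is actually deleted.

    def remove_allocations_if_deleted(lookup_instance, delete_allocations, uuid):
        instance = lookup_instance(uuid, read_deleted='yes')
        if instance is None:
            # Not in the cell DB yet: likely the window between creating the
            # allocations and creating the instance record -- do nothing.
            return False
        if instance.get('deleted'):
            delete_allocations(uuid)
            return True
        return False

    db = {'abc': {'uuid': 'abc', 'deleted': True}}
    removed = remove_allocations_if_deleted(
        lambda u, read_deleted: db.get(u),
        lambda u: print('removed allocations for', u),
        'abc')
    print(removed)  # True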
The hide_server_addresses extension is looking up the cached
instance based on what the user provided for the server id,
which may not match what is used to cache the instance for the
request. For example, a request with upper-case server uuid
could be found in a mysql-backed system because mysql is
case insensitive by default, but the instance is keyed off the
server id from the DB, which is lower-case, so we'll fail
to look up the instance in the cache if the IDs don't match.

There is no test for this because it turns out it's actually
really hard to recreate this since it requires running with a
mysql backend to recreate the case insensitive check, which
isn't going to work with sqlite. Given how trivial this fix is,
creating a big mysql recreate test is not worth it.

Change-Id: I09b288aa2ad9969800a3cd26c675b002c6c9f638
Closes-Bug: #1693335
(cherry picked from commit ecfb65c)
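A tiny sketch of the underlying idea: key the per-request instance cache by a canonical (lower-case) UUID so an upper-case ID still hits the cache. The cache structure is illustrative, not the extension's actual code:

    # Sketch: normalize the server UUID before caching and before lookup.

    instance_cache = {}

    def cache_instance(instance):
        instance_cache[instance['uuid'].lower()] = instance

    def get_cached_instance(server_id):
        return instance_cache.get(server_id.lower())

    cache_instance({'uuid': 'a81cab3a-6ba3-4f1a-a1e9-6a0d4c0a2f11', 'host': 'cmp1'})
    print(get_cached_instance('A81CAB3A-6BA3-4F1A-A1E9-6A0D4C0A2F11')['host'])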
Zuul and others added 28 commits March 21, 2018 21:42
When a server build fails on a selected compute host, the compute
service will cast to conductor which calls the scheduler to select
another host to attempt the build if retries are not exhausted.

With commit 08d24b7, if retries
are exhausted or the scheduler raises NoValidHost, conductor will
deallocate networking for the instance. In the case of neutron, this
means unbinding any ports that the user provided with the server
create request and deleting any ports that nova-compute created during
the allocate_for_instance() operation during server build.

When an instance is deleted, its networking is deallocated in the same
way - unbind pre-existing ports, delete ports that nova created.

The problem is when rescheduling from a failed host, if we successfully
reschedule and build on a secondary host, any ports created from the
original host are not cleaned up until the instance is deleted. For
Ironic or SR-IOV ports, those are always deallocated.

The ComputeDriver.deallocate_networks_on_reschedule() method defaults
to False just so that the Ironic driver could override it, but really
we should always cleanup neutron ports before rescheduling.

Looking over bug report history, there are some mentions of different
networking backends handling reschedules with multiple ports differently,
in that sometimes it works and sometimes it fails. Regardless of the
networking backend, however, we are at worst taking up port quota for
the tenant for ports that will not be bound to whatever host the instance
ends up on.

There could also be legacy reasons for this behavior with nova-network,
so that is side-stepped here by just restricting this check to whether
or not neutron is being used. When we eventually remove nova-network we
can then also remove the deallocate_networks_on_reschedule() method and
SR-IOV check.

NOTE(mriedem): There are a couple of changes to the unit test for code
that didn't exist in Pike, due to the change for alternate hosts
Iae904afb6cb4fcea8bb27741d774ffbe986a5fb4 and the change to pass the
request spec to conductor Ie5233bd481013413f12e55201588d37a9688ae78.

Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
Closes-Bug: #1597596
(cherry picked from commit 3a503a8)
(cherry picked from commit 9203326)
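A short sketch of the decision described above; the function and its parameters are illustrative stand-ins for the driver hook and the neutron check:

    # Sketch: decide whether to clean up networking before a reschedule. With
    # neutron we always clean up, instead of relying solely on the per-driver
    # deallocate_networks_on_reschedule() hook.

    def should_deallocate_networks(using_neutron, driver_wants_deallocation):
        # Previously only Ironic (via the driver hook) triggered cleanup; now any
        # neutron-backed deployment unbinds/deletes ports before rescheduling so
        # ports created on the failed host do not linger and eat port quota.
        return using_neutron or driver_wants_deallocation

    print(should_deallocate_networks(using_neutron=True,
                                     driver_wants_deallocation=False))   # True
    print(should_deallocate_networks(using_neutron=False,
                                     driver_wants_deallocation=False))   # False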
Related-Bug: #1746509

Change-Id: I6f8f88e448c2c5d4b1f09d68b03d1b8763cb8ae8
(cherry picked from commit 464985a)
(cherry picked from commit 07a1cbb)
The _make_instance_list method is used to make an InstanceList object
out of database dict-like instance objects. It's possible while making
the list that the various _from_db_object methods that are called might
do their own database writes.

Currently, we're calling _make_instance_list nested inside of a 'reader'
database transaction context and we hit the error:

  TypeError: Can't upgrade a READER transaction to a WRITER
  mid-transaction

during the _make_instance_list call if anything tries to do a database
write. The scenario encountered was after an upgrade to Pike, older
service records without UUIDs were attempted to be updated with UUIDs
upon access, and that access happened to be during an instance list,
so it failed when trying to write the service UUID while nested inside
the 'reader' database transaction context.

This simply moves the _make_instance_list method call out from the
@db.select_db_reader_mode decorated _get_by_filters_impl method to the
get_by_filters method to remove the nesting.

Closes-Bug: #1746509

Change-Id: Ifadf408802cc15eb9769d2dc1fc920426bb7fc20
(cherry picked from commit b1ed92c)
(cherry picked from commit 22b2a8e)
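A simplified sketch of the transaction problem: fetch rows inside the read-only context, but build the objects (which may trigger writes, e.g. backfilling a service UUID) outside of it. The reader context here is a toy stand-in for the real database transaction context:

    # Sketch: object construction that may write must not run under a reader
    # transaction.

    from contextlib import contextmanager

    writes_allowed = True

    @contextmanager
    def reader():
        global writes_allowed
        writes_allowed = False  # a READER context cannot be upgraded to WRITER
        try:
            yield
        finally:
            writes_allowed = True

    def make_instance_list(rows):
        if not writes_allowed:
            raise TypeError("Can't upgrade a READER transaction to a WRITER "
                            "mid-transaction")
        return [dict(r, backfilled=True) for r in rows]

    # Fixed flow: only the raw fetch happens under the reader context.
    with reader():
        rows = [{'uuid': 'abc'}]
    print(make_instance_list(rows))  # object construction is safe out here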
As of now, if the VM task_state is not None and a user tries to
force-delete an instance, they get an HTTP 500 error and the instance
deletion doesn't progress.

This is not the case when the user calls the delete API instead of the
force-delete API, even if the VM task_state is not None.

Fixed the issue by allowing force-delete to delete an instance whose
task_state is other than None.

Change-Id: Ida1a9d8761cec9585f031ec25e5692b8bb55661e
Closes-Bug: #1741000
(cherry picked from commit 0d2031a)
There are some cases where cpuset_reserved is set to None in
InstanceNUMATopology by the _numa_fit_instance_cell() function in
hardware.py. However, the libvirt driver treats the cpuset_reserved
value as an iterable when it constructs the XML configuration.

To avoid the risk of an error in the libvirt driver, this patch adds
a check that the value is not None before adding the CPUs for
emulator threads.

Change-Id: Iab3d950c4f4138118ac6a9fd98407eaadcb24d9e
Closes-Bug: #1746674
(cherry picked from commit 24d9e06)
(cherry picked from commit 2dc4d7a)
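A minimal sketch of the guard, with an illustrative helper rather than the actual driver function:

    # Sketch: guard against a None cpuset_reserved before iterating it while
    # building the emulator-thread CPU set.

    def emulator_thread_cpus(cpuset_reserved):
        cpus = set()
        # The fix: only iterate when the value is actually set.
        if cpuset_reserved is not None:
            for cpu in cpuset_reserved:
                cpus.add(cpu)
        return cpus

    print(emulator_thread_cpus({0, 1}))  # {0, 1}
    print(emulator_thread_cpus(None))    # set(), instead of a TypeError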
Change I11746d1ea996a0f18b7c54b4c9c21df58cc4714b changed the
behavior of the API and conductor when rebuilding an instance
with a new image such that the image is run through the scheduler
filters again to see if it will work on the existing host that
the instance is running on.

As a result, conductor started passing 'scheduled_node' to the
compute which was using it for logic to tell if a claim should be
attempted. We don't need to do a claim for a rebuild since we're
on the same host.

This removes the scheduled_node logic from the claim code, as we
should only ever attempt a claim if we're evacuating, which we
can determine based on the 'recreate' parameter.

Conflicts:
      nova/compute/manager.py

NOTE(mriedem): The conflict is due to change
I0883c2ba1989c5d5a46e23bcbcda53598707bcbc in Queens.

Change-Id: I7fde8ce9dea16679e76b0cb2db1427aeeec0c222
Closes-Bug: #1750618
(cherry picked from commit a390290)
(cherry picked from commit 3c5e519)
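A one-function sketch of the simplified claim logic; the function name and parameter are illustrative:

    # Sketch: during rebuild, only attempt a resource claim when evacuating to
    # a different host; a same-host rebuild needs no new claim.

    def needs_claim_for_rebuild(recreate):
        # Old logic keyed off whether conductor passed a scheduled_node, which
        # started happening for ordinary rebuilds with a new image as well.
        # New logic: the 'recreate' flag (evacuate) is the only case that claims.
        return bool(recreate)

    print(needs_claim_for_rebuild(recreate=False))  # rebuild on same host: no claim
    print(needs_claim_for_rebuild(recreate=True))   # evacuate: claim on target host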
under the [DEFAULT] section
------------------------------------------------------------
driver = nova.scheduler.turbonomic_scheduler.TurbonomicScheduler
scheduler_driver = nova.scheduler.turbonomic_scheduler.TurbonomicScheduler


This wasn't changed in the Mitaka branch - why is this showing up as a change?
This exists only on the Pike branch - Did you create the patch on the right branch?

2) Add turbonomic_driver to <Python 2.7>/site-packages/nova-16.1.0-py2.7.egg-info/entry_points.txt:
turbonomic_scheduler = nova.scheduler.turbonomic_scheduler:TurbonomicScheduler
2) scheduler_driver should be enabled across all regions; turbonomic_target_address must be equal to the address specified
by the customer while discovering the target, e.g. a target consisting of RegionOne (X.X.X.10) and RegionTwo (X.X.X.11)
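Putting the notes above together, a hedged sketch of what the resulting per-region configuration might look like. The entry-point group name and the placement of turbonomic_target_address under [DEFAULT] are assumptions, and the addresses are placeholders:

    # /etc/nova/nova.conf on the RegionOne controller (illustrative)
    [DEFAULT]
    driver = nova.scheduler.turbonomic_scheduler.TurbonomicScheduler
    scheduler_driver = nova.scheduler.turbonomic_scheduler.TurbonomicScheduler
    # Address the customer used when discovering this target in Turbonomic
    turbonomic_target_address = X.X.X.10

    # entry_points.txt of the installed nova egg-info (group name assumed)
    [nova.scheduler.driver]
    turbonomic_scheduler = nova.scheduler.turbonomic_scheduler:TurbonomicScheduler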


while discovering the target -> while discovering the target in Turbonomic

@viveknandavanam

Can you also work on resolving the conflicts mentioned above?
We can work together as there are commits that were made for other issues.
