
Stable/mitaka#8

Open
ykmmxm wants to merge 8664 commits into turbonomic:stable/mitaka from
ykmmxm:stable/mitaka

Conversation


@ykmmxm ykmmxm commented Apr 12, 2018

Added a turbonomic_target_address parameter to reduce OpenStack configuration complexity

Zuul and others added 30 commits October 19, 2017 04:58
Starting with the Pike release, reporting VCPU/memory/disk is no longer required.
However, we used VCPU to check if a node is available, so nodes without VCPU in
their properties were always ignored. This patch changes the logic to use the existing
_node_resources_unavailable call.

This change also fixes another related issue: when disk or memory are missing from
properties, the virt driver tries to report zero max_unit for them, which is not
allowed by placement.

Change-Id: I1bbfc152189252c5c45e6153695a802d17b76690
Closes-Bug: #1723423
(cherry picked from commit b25928d)
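A minimal sketch of the availability check described above; _node_resources_unavailable is the only name taken from the commit message, and the node fields are assumptions:

    # Sketch: decide node availability from an explicit "unavailable" check
    # rather than from the presence of VCPU in the node properties.

    def _node_resources_unavailable(node):
        # A node is unavailable if it is in maintenance or in a bad state
        # (assumed, illustrative criteria).
        return node.get('maintenance', False) or node.get('power_state') == 'error'

    def node_is_available(node):
        # Old logic (problematic): nodes without 'vcpus' were always ignored.
        #   return bool(node['properties'].get('vcpus'))
        # New logic: rely on the explicit unavailability check instead.
        return not _node_resources_unavailable(node)

    node = {'properties': {}, 'maintenance': False, 'power_state': 'power on'}
    print(node_is_available(node))  # True, even though 'vcpus' is not reported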
An OSError will lead the instance to the ERROR state; changing it to
MigrationPreCheckError leaves the instance status unchanged.

Also, modify some test cases to make unit testing easier.

Closes-Bug: 1694636

Change-Id: I3286c32ca205ffd2d5d1aaab88cc96699476e410
(cherry picked from commit cb565d9)
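A minimal sketch of the exception translation, with a stand-in exception class and a hypothetical pre-check function (not the actual driver code):

    # Sketch: translate OSError into a pre-check error so the instance is not
    # pushed into the ERROR state by an unhandled exception.

    class MigrationPreCheckError(Exception):
        """Stand-in for nova.exception.MigrationPreCheckError."""

    def check_shared_storage(path):
        try:
            with open(path) as f:  # may raise OSError
                return f.read()
        except OSError as e:
            # Re-raise as a pre-check failure; the caller treats this as a
            # recoverable condition instead of setting the instance to ERROR.
            raise MigrationPreCheckError('shared storage check failed: %s' % e)

    try:
        check_shared_storage('/nonexistent/check_file')
    except MigrationPreCheckError as e:
        print('pre-check failed, instance state unchanged:', e)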
The BDM has no uuid attribute so the debug message in here
would result in an AttributeError. This has been around since
the creation of this object, and the debug log message was
probably copied from the Instance object.

This was only exposed in Pike when this code started
lazy-loading the instance field:

  I1dc54a38f02bb48921bcbc4c2fdcc2c946e783c1

So this change fixes that bug and adds tests for obj_load_attr.

Change-Id: I8b55227b1530a76c2f396c035384abd89237d936
Closes-Bug: #1726871
(cherry picked from commit 1ca191f)
Replace the ocata config-reference URLs with
URLs in each project repo.

Change-Id: I48d7c77a6e0eaaf0efe66f848f45ae99007577e1
Closes-Bug: #1715545
(cherry picked from commit 2fce8a1)
As part of the docs migration from openstack-manuals to
nova in the pike release we missed the config-drive docs.

This change does the following:

1. Imports the config-drive doc into the user guide.
2. Fixes a broken link to the metadata service in the doc.
3. Removes a note about liberty being the current release.
4. Adds a link in the API reference parameters to actually
   point at the document we have in tree now, which is
   otherwise not very discoverable as the main index does
   not link to this page (or the user index for that matter).

Partial-Bug: #1714017
Closes-Bug: #1720873

Change-Id: I1d54e1f5a1a94e9821efad99b7fa430bd8fece0a
(cherry picked from commit 59bd2f6)
This imports the "provide-user-data-to-instances" page
from the old openstack-manuals user guide.

Since we don't have a glossary, the :term: link is removed
and replaced with just giving the glossary definition as
the first part of the doc.

Change-Id: Iae70d9b53d6cefb3bcb107fe68499cccb71fc15e
Partial-Bug: #1714017
(cherry picked from commit 3fc8538)
One of the things this commit:

    commit 14c38ac
    Author: Kashyap Chamarthy <kchamart@redhat.com>
    Date:   Thu Jul 20 19:01:23 2017 +0200

        libvirt: Post-migration, set cache value for Cinder volume(s)

    [...]

did was to supposedly remove "duplicate" calls to _set_cache_mode().

But that came back to bite us.

Now, while Cinder volumes have their cache value handled correctly
during migration, the commit referred to above (14c38ac) introduced a
regression because it disregards the 'disk_cachemodes' Nova config
parameter altogether for boot disks -- i.e. even if a user set the
cache mode to 'writeback', it is ignored and 'none' is set
unconditionally.

Add the _set_cache_mode() calls back in _get_guest_storage_config().

Co-Authored-By: melanie witt <melwittt@gmail.com>

Closes-Bug: #1727558

Change-Id: I7370cc2942a6c8c51ab5355b50a9e5666cca042e
(cherry picked from commit 24e79bc)
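A simplified sketch of the fix's idea: apply the configured disk_cachemodes value when building each guest disk config instead of leaving a hard-coded default. The class, option layout, and values below are illustrative, not the actual libvirt driver code:

    # Sketch: honour a configured cache mode for boot disks instead of
    # unconditionally using 'none'.

    CONF_DISK_CACHEMODES = {'file': 'writeback'}  # e.g. disk_cachemodes=["file=writeback"]

    class GuestDisk:
        def __init__(self, source_type):
            self.source_type = source_type
            self.driver_cache = 'none'            # previous hard-coded default

    def _set_cache_mode(disk):
        # Look up the configured cache mode for this disk's source type.
        disk.driver_cache = CONF_DISK_CACHEMODES.get(disk.source_type,
                                                     disk.driver_cache)

    def _get_guest_storage_config(disks):
        for disk in disks:
            _set_cache_mode(disk)                 # the call the fix restores
        return disks

    for d in _get_guest_storage_config([GuestDisk('file'), GuestDisk('network')]):
        print(d.source_type, d.driver_cache)      # file writeback / network none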
In I6ddcaaca37fc5387c2d2e9f51c67ea9e85acb5c5 we forgot to update the
legacy filter properties dictionary so the requested target wasn't
passed to the scheduler when evacuating.
Adding a functional test for verifying the behaviour.

NOTE(sbauza): The issue has been incidentally fixed in Pike by
I434af8e4ad991ac114dd67d66797a562d16bafe2 so the regression test just
verifies that the expected behaviour works.
The Newton and Ocata backports will be slightly different from that one as we
need to verify that host3 will be preferred eventually over host2.

Related-Bug: #1702454

Change-Id: Id9adb10d2ef821c8b61d8f1d5dc9dd66ec7aaac8
(cherry picked from commit e0e2e065a495b4fa9ebdec987c935e3c83118c46)
When we added the requested_destination field for the RequestSpec object
in Newton, we forgot to pass it to the legacy dictionary when wanting to
use scheduler methods not yet supporting the NovaObject.
As a consequence, when we were transforming the RequestSpec object into a
tuple of (request_spec, filter_props) dicts and then rehydrating a new
RequestSpec object using those dicts, the newly created object was not
keeping that requested_destination field from the original.

Change-Id: Iba0b88172e9a3bfd4f216dd364d70f7e01c60ee2
Closes-Bug: #1702454
(cherry picked from commit 69bef428bd555bb31f43db6ca9c21db8aeb9007e)
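A small sketch of the round-trip problem these two changes address: if the legacy filter-properties dict does not carry requested_destination, a RequestSpec rebuilt from the dicts silently loses it. The field names follow the commit messages; everything else is illustrative:

    # Sketch: round-trip a request spec through legacy dicts without dropping
    # the requested_destination field.

    class RequestSpec:
        def __init__(self, flavor, requested_destination=None):
            self.flavor = flavor
            self.requested_destination = requested_destination

        def to_legacy_filter_properties(self):
            props = {'flavor': self.flavor}
            # The bug: forgetting this meant the target host was never
            # passed to the scheduler when evacuating.
            if self.requested_destination is not None:
                props['requested_destination'] = self.requested_destination
            return props

        @classmethod
        def from_primitives(cls, filter_properties):
            return cls(filter_properties['flavor'],
                       filter_properties.get('requested_destination'))

    spec = RequestSpec('m1.small', requested_destination='host3')
    rebuilt = RequestSpec.from_primitives(spec.to_legacy_filter_properties())
    print(rebuilt.requested_destination)  # host3, not None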
…tance

If we're calling build_request_spec in conductor.rebuild_instance,
it's because we are evacuating and the instance is so old it does
not have a request spec. We need the request_spec to pass to the
scheduler to pick a destination host for the evacuation.

For evacuate, nova-api does not pass any image reference parameters,
and even if it did, those are image IDs, not an image meta dict that
build_request_spec expects, so this code has just always been wrong.

This change fixes the problem by passing a primitive version of
the instance.image_meta which build_request_spec will then return
back to conductor and that gets used to build a RequestSpec object
from primitives.

It's important to use the correct image meta so that the scheduler
can properly filter hosts using things like the
AggregateImagePropertiesIsolation and ImagePropertiesFilter filters.

Change-Id: I0c8ce65016287de7be921c312493667a8c7f762e
Closes-Bug: #1727855
(cherry picked from commit d2690d6)
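A hedged sketch of the idea: pass a plain-dict primitive of the instance's image metadata, rather than an image ID, so the spec the scheduler sees carries the image properties the filters need. The function and class here are stand-ins, not nova's actual build_request_spec:

    # Sketch: feed image *metadata* (a primitive dict), not an image ID, into
    # the request-spec builder so image-property filters can work.

    def build_request_spec(image_meta, instances):
        # The builder expects an image meta dict, not an image UUID.
        assert isinstance(image_meta, dict)
        return {'image': image_meta, 'num_instances': len(instances)}

    class Instance:
        image_meta = {'properties': {'hw_architecture': 'x86_64'}}

        def image_meta_primitive(self):
            # Stand-in for converting the versioned object to plain dicts.
            return dict(self.image_meta)

    instance = Instance()
    spec = build_request_spec(instance.image_meta_primitive(), [instance])
    print(spec['image']['properties'])  # usable by ImagePropertiesFilter-style logic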
The bandwidth param, previously set outside of the guest object's
"migrate" method, has to be set inside it to avoid duplicating that
option.

(cherry picked from commit c212ad2)

Backported to avoid a minor merge conflict backporting change
I9b545ca8, and because it addresses a related issue calling
migrateToURI3.

Change-Id: I8a37753dea8eca7b26466f17dfbdc184c48c24c5
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@redhat.com>
If we specify block migration, but there are no disks which actually
require block migration we call libvirt's migrateToURI3() with
VIR_MIGRATE_NON_SHARED_INC in flags and an empty migrate_disks in
params. Libvirt interprets this to be the default block migration
behaviour of "block migrate all writeable disks". However,
migrate_disks may only be empty because we filtered attached volumes
out of it, in which case libvirt will block migrate attached volumes.
This is a data corruptor.

This change addresses the issue at the point we call migrateToURI3().
As we never want the default block migration behaviour, we can safely
remove the flag if the list of disks to migrate is empty.

(cherry picked from commit ea9bf52)

nova/tests/unit/virt/libvirt/test_driver.py:
  Explicitly asserts byte string destination_xml in
  _test_live_migration_block_migration_flags. Not required in master
  due to change I85cd9a90.

Change-Id: I9b545ca8aa6dd7b41ddea2d333190c9fbed19bc1
Resolves-bug: #1719362
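A standalone sketch of the flag handling described above; the constant is a stand-in for libvirt's VIR_MIGRATE_NON_SHARED_INC and the function is illustrative:

    # Sketch: never hand libvirt an "incremental block migration" flag together
    # with an empty disk list, since libvirt would then block-migrate every
    # writeable disk, including attached volumes.

    VIR_MIGRATE_NON_SHARED_INC = 1 << 7  # stand-in for libvirt's constant

    def adjust_migration_flags(flags, migrate_disks):
        if not migrate_disks:
            # Nothing actually needs block migration; strip the flag so libvirt
            # does not fall back to "block migrate all writeable disks".
            flags &= ~VIR_MIGRATE_NON_SHARED_INC
        return flags

    flags = VIR_MIGRATE_NON_SHARED_INC | 0x1          # e.g. combined with LIVE
    print(adjust_migration_flags(flags, migrate_disks=[]))       # flag removed
    print(adjust_migration_flags(flags, migrate_disks=['vdb']))  # flag kept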
When confirming a resize, the libvirt driver on the source host checks
to see if the instance base directory (which contains the domain xml
files, etc) exists and if the root disk image does not, it removes the
instance base directory.

However, the root image disk won't exist on local storage for a
volume-backed instance and if the instance base directory is on shared
storage, e.g. NFS or Ceph, between the source and destination host, the
instance base directory is incorrectly deleted.

This adds a check to see if the instance is volume-backed when checking
to see if the instance base directory should be removed from the source
host when confirming a resize.

Change-Id: I29fac80d08baf64bf69e54cf673e55123174de2a
Closes-Bug: #1728603
(cherry picked from commit f02afc6)
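A minimal sketch of the extra check, assuming illustrative paths and a boolean volume-backed flag rather than the driver's real helpers:

    # Sketch: only remove the instance base directory on the source host if the
    # instance is not volume-backed.

    import os

    def should_delete_instance_dir(instance_dir, root_disk, is_volume_backed):
        # Old check: base dir exists and the root disk image does not.
        # New check: additionally require that the instance is image-backed,
        # otherwise a shared (NFS/Ceph) base dir could be deleted by mistake.
        return (os.path.exists(instance_dir)
                and not os.path.exists(root_disk)
                and not is_volume_backed)

    print(should_delete_instance_dir('/tmp', '/tmp/disk-that-does-not-exist',
                                     is_volume_backed=True))  # False: keep the dir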
When we notice that an instance was deleted after scheduling, we punt on
instance creation. When that happens, the scheduler will have created
allocations already so we need to delete those to avoid leaking resources.

Related-Bug: #1679750
Change-Id: I54806fe43257528fbec7d44c841ee4abb14c9dff
(cherry picked from commit 57a3af6)
The resource tracker's _remove_deleted_instances_allocations() assumes that
InstanceNotFound means that an instance was deleted. That's not quite accurate,
as we would also see that in the window between creating allocations and actually
creating the instance in the cell database. So, as written, the code will
delete allocations for those instances before they are created.

This change makes us look up the instance with read_deleted=yes, and if we find
it with deleted=True, then we do the allocation removal. This does mean that
someone running a full DB archive at the instant an instance is deleted in some
way that didn't result in allocation removal as well could leak those. However,
we can log that (unlikely) situation.

Closes-Bug: #1729371

Conflicts:
      nova/compute/resource_tracker.py
      nova/tests/unit/compute/test_resource_tracker.py

NOTE(mriedem): Conflicts were due to not having change
1ff1310 or change
e3b7f43 in Pike.

Change-Id: I4482ac2ecf8e07c197fd24c520b7f11fd5a10945
(cherry picked from commit d176175)
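A hedged sketch of the new check: look the instance up including deleted rows and only drop the allocations if it really is deleted. The lookup and removal callables are stand-ins for the real DB and placement calls:

    # Sketch: before dropping allocations for an instance the resource tracker
    # cannot find, confirm the instance is actually deleted.

    def remove_allocations_if_deleted(lookup_instance, delete_allocations, uuid):
        instance = lookup_instance(uuid, read_deleted='yes')
        if instance is None:
            # Not in the cell DB yet: likely the window between creating the
            # allocations and creating the instance record -- do nothing.
            return False
        if instance.get('deleted'):
            delete_allocations(uuid)
            return True
        return False

    db = {'abc': {'uuid': 'abc', 'deleted': True}}
    removed = remove_allocations_if_deleted(
        lambda u, read_deleted: db.get(u),
        lambda u: print('removed allocations for', u),
        'abc')
    print(removed)  # True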
The hide_server_addresses extension is looking up the cached
instance based on what the user provided for the server id,
which may not match what is used to cache the instance for the
request. For example, a request with upper-case server uuid
could be found in a mysql-backed system because mysql is
case insensitive by default, but the instance is keyed off the
server id from the DB, which is lower-case, so we'll fail
to look up the instance in the cache if the IDs don't match.

There is no test for this because it turns out it's actually
really hard to recreate this since it requires running with a
mysql backend to recreate the case insensitive check, which
isn't going to work with sqlite. Given how trivial this fix is,
creating a big mysql recreate test is not worth it.

Change-Id: I09b288aa2ad9969800a3cd26c675b002c6c9f638
Closes-Bug: #1693335
(cherry picked from commit ecfb65c)
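A tiny sketch of the underlying idea: key the per-request instance cache by a canonical (lower-case) UUID so an upper-case ID still hits the cache. The cache structure is illustrative, not the extension's actual code:

    # Sketch: normalize the server UUID before caching and before lookup.

    instance_cache = {}

    def cache_instance(instance):
        instance_cache[instance['uuid'].lower()] = instance

    def get_cached_instance(server_id):
        return instance_cache.get(server_id.lower())

    cache_instance({'uuid': 'a81cab3a-6ba3-4f1a-a1e9-6a0d4c0a2f11', 'host': 'cmp1'})
    print(get_cached_instance('A81CAB3A-6BA3-4F1A-A1E9-6A0D4C0A2F11')['host'])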
Zuul and others added 28 commits March 21, 2018 21:42
When a server build fails on a selected compute host, the compute
service will cast to conductor which calls the scheduler to select
another host to attempt the build if retries are not exhausted.

With commit 08d24b7, if retries
are exhausted or the scheduler raises NoValidHost, conductor will
deallocate networking for the instance. In the case of neutron, this
means unbinding any ports that the user provided with the server
create request and deleting any ports that nova-compute created during
the allocate_for_instance() operation during server build.

When an instance is deleted, its networking is deallocated in the same
way - unbind pre-existing ports, delete ports that nova created.

The problem is when rescheduling from a failed host, if we successfully
reschedule and build on a secondary host, any ports created from the
original host are not cleaned up until the instance is deleted. For
Ironic or SR-IOV ports, those are always deallocated.

The ComputeDriver.deallocate_networks_on_reschedule() method defaults
to False just so that the Ironic driver could override it, but really
we should always cleanup neutron ports before rescheduling.

Looking over bug report history, there are some mentions of different
networking backends handling reschedules with multiple ports differently,
in that sometimes it works and sometimes it fails. Regardless of the
networking backend, however, we are at worst taking up port quota for
the tenant for ports that will not be bound to whatever host the instance
ends up on.

There could also be legacy reasons for this behavior with nova-network,
so that is side-stepped here by just restricting this check to whether
or not neutron is being used. When we eventually remove nova-network we
can then also remove the deallocate_networks_on_reschedule() method and
SR-IOV check.

NOTE(mriedem): There are a couple of changes to the unit test for code
that didn't exist in Pike, due to the change for alternate hosts
Iae904afb6cb4fcea8bb27741d774ffbe986a5fb4 and the change to pass the
request spec to conductor Ie5233bd481013413f12e55201588d37a9688ae78.

Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
Closes-Bug: #1597596
(cherry picked from commit 3a503a8)
(cherry picked from commit 9203326)
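A short sketch of the decision described above; the function and its parameters are illustrative stand-ins for the driver hook and the neutron check:

    # Sketch: decide whether to clean up networking before a reschedule. With
    # neutron we always clean up, instead of relying solely on the per-driver
    # deallocate_networks_on_reschedule() hook.

    def should_deallocate_networks(using_neutron, driver_wants_deallocation):
        # Previously only Ironic (via the driver hook) triggered cleanup; now any
        # neutron-backed deployment unbinds/deletes ports before rescheduling so
        # ports created on the failed host do not linger and eat port quota.
        return using_neutron or driver_wants_deallocation

    print(should_deallocate_networks(using_neutron=True,
                                     driver_wants_deallocation=False))   # True
    print(should_deallocate_networks(using_neutron=False,
                                     driver_wants_deallocation=False))   # False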
Related-Bug: #1746509

Change-Id: I6f8f88e448c2c5d4b1f09d68b03d1b8763cb8ae8
(cherry picked from commit 464985a)
(cherry picked from commit 07a1cbb)
The _make_instance_list method is used to make an InstanceList object
out of database dict-like instance objects. It's possible while making
the list that the various _from_db_object methods that are called might
do their own database writes.

Currently, we're calling _make_instance_list nested inside of a 'reader'
database transaction context and we hit the error:

  TypeError: Can't upgrade a READER transaction to a WRITER
  mid-transaction

during the _make_instance_list call if anything tries to do a database
write. The scenario encountered was after an upgrade to Pike, older
service records without UUIDs were attempted to be updated with UUIDs
upon access, and that access happened to be during an instance list,
so it failed when trying to write the service UUID while nested inside
the 'reader' database transaction context.

This simply moves the _make_instance_list method call out from the
@db.select_db_reader_mode decorated _get_by_filters_impl method to the
get_by_filters method to remove the nesting.

Closes-Bug: #1746509

Change-Id: Ifadf408802cc15eb9769d2dc1fc920426bb7fc20
(cherry picked from commit b1ed92c)
(cherry picked from commit 22b2a8e)
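A simplified sketch of the transaction problem: fetch rows inside the read-only context, but build the objects (which may trigger writes, e.g. backfilling a service UUID) outside of it. The reader context here is a toy stand-in for the real database transaction context:

    # Sketch: object construction that may write must not run under a reader
    # transaction.

    from contextlib import contextmanager

    writes_allowed = True

    @contextmanager
    def reader():
        global writes_allowed
        writes_allowed = False  # a READER context cannot be upgraded to WRITER
        try:
            yield
        finally:
            writes_allowed = True

    def make_instance_list(rows):
        if not writes_allowed:
            raise TypeError("Can't upgrade a READER transaction to a WRITER "
                            "mid-transaction")
        return [dict(r, backfilled=True) for r in rows]

    # Fixed flow: only the raw fetch happens under the reader context.
    with reader():
        rows = [{'uuid': 'abc'}]
    print(make_instance_list(rows))  # object construction is safe out here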
As of now, if the VM task_state is not None and a user tries to
force-delete an instance, they get an HTTP 500 error and the instance
deletion doesn't progress.

This is not the case when the user calls the delete API instead of the
force-delete API, even if the VM task_state is not None.

Fixed the issue by allowing force-delete to delete an instance whose
task_state is other than None.

Change-Id: Ida1a9d8761cec9585f031ec25e5692b8bb55661e
Closes-Bug: #1741000
(cherry picked from commit 0d2031a)
There are some cases where cpuset_reserved is set to None in
InstanceNUMATopology by the _numa_fit_instance_cell() function in
hardware.py. However, the libvirt driver treats the cpuset_reserved
value as an iterable when it constructs the XML configuration.

To avoid the risk of an error in the libvirt driver, this patch adds
a check that the value is not None before adding the CPUs for
emulator threads.

Change-Id: Iab3d950c4f4138118ac6a9fd98407eaadcb24d9e
Closes-Bug: #1746674
(cherry picked from commit 24d9e06)
(cherry picked from commit 2dc4d7a)
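A minimal sketch of the guard, with an illustrative helper rather than the actual driver function:

    # Sketch: guard against a None cpuset_reserved before iterating it while
    # building the emulator-thread CPU set.

    def emulator_thread_cpus(cpuset_reserved):
        cpus = set()
        # The fix: only iterate when the value is actually set.
        if cpuset_reserved is not None:
            for cpu in cpuset_reserved:
                cpus.add(cpu)
        return cpus

    print(emulator_thread_cpus({0, 1}))  # {0, 1}
    print(emulator_thread_cpus(None))    # set(), instead of a TypeError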
Change I11746d1ea996a0f18b7c54b4c9c21df58cc4714b changed the
behavior of the API and conductor when rebuilding an instance
with a new image such that the image is run through the scheduler
filters again to see if it will work on the existing host that
the instance is running on.

As a result, conductor started passing 'scheduled_node' to the
compute which was using it for logic to tell if a claim should be
attempted. We don't need to do a claim for a rebuild since we're
on the same host.

This removes the scheduled_node logic from the claim code, as we
should only ever attempt a claim if we're evacuating, which we
can determine based on the 'recreate' parameter.

Conflicts:
      nova/compute/manager.py

NOTE(mriedem): The conflict is due to change
I0883c2ba1989c5d5a46e23bcbcda53598707bcbc in Queens.

Change-Id: I7fde8ce9dea16679e76b0cb2db1427aeeec0c222
Closes-Bug: #1750618
(cherry picked from commit a390290)
(cherry picked from commit 3c5e519)
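A one-function sketch of the simplified claim logic; the function name and parameter are illustrative:

    # Sketch: during rebuild, only attempt a resource claim when evacuating to
    # a different host; a same-host rebuild needs no new claim.

    def needs_claim_for_rebuild(recreate):
        # Old logic keyed off whether conductor passed a scheduled_node, which
        # started happening for ordinary rebuilds with a new image as well.
        # New logic: the 'recreate' flag (evacuate) is the only case that claims.
        return bool(recreate)

    print(needs_claim_for_rebuild(recreate=False))  # rebuild on same host: no claim
    print(needs_claim_for_rebuild(recreate=True))   # evacuate: claim on target host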
under the [DEFAULT] section
------------------------------------------------------------
driver = nova.scheduler.turbonomic_scheduler.TurbonomicScheduler
scheduler_driver = nova.scheduler.turbonomic_scheduler.TurbonomicScheduler


This wasn't changed in the Mitaka branch - why is this showing up as a change?
This exists only on the Pike branch - Did you create the patch on the right branch?

2) Add turbonomic_driver to <Python 2.7>/site-packages/nova-16.1.0-py2.7.egg-info/entry_points.txt:
turbonomic_scheduler = nova.scheduler.turbonomic_scheduler:TurbonomicScheduler
2) scheduler_driver should be enabled across all regions; turbonomic_target_address must be equal to the address specified
by the customer while discovering the target, e.g. a target consisting of RegionOne (X.X.X.10) and RegionTwo (X.X.X.11)
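Putting the notes above together, a hedged sketch of what the resulting per-region configuration might look like. The entry-point group name and the placement of turbonomic_target_address under [DEFAULT] are assumptions, and the addresses are placeholders:

    # /etc/nova/nova.conf on the RegionOne controller (illustrative)
    [DEFAULT]
    driver = nova.scheduler.turbonomic_scheduler.TurbonomicScheduler
    scheduler_driver = nova.scheduler.turbonomic_scheduler.TurbonomicScheduler
    # Address the customer used when discovering this target in Turbonomic
    turbonomic_target_address = X.X.X.10

    # entry_points.txt of the installed nova egg-info (group name assumed)
    [nova.scheduler.driver]
    turbonomic_scheduler = nova.scheduler.turbonomic_scheduler:TurbonomicScheduler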


while discovering the target -> while discovering the target in Turbonomic

@viveknandavanam

Can you also work on resolving the conflicts mentioned above?
We can work together as there are commits that were made for other issues.
