Skip to content

Conversation

@EyanWeissbluth
Copy link
Contributor

Added some of the functions I created for the analysis.

Copy link
Collaborator

@adampbeardsley adampbeardsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @EyanWeissbluth . I added a handful of suggestions to improve it.

Comment on lines 75 to 80
:param lon: longitude
:param lat: latitude
:param ele: elevation
:param time: date and time
:param RA: Right Ascension
:param Dec: Declination
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work including a useful doc string! (As you can see with other functions, we haven't been great about doing that.)
Could you add a little more info about these, specifically what format they should be passed in? E.g., should lon be in degrees? Hours? should it be a float or a string? You should be able to get a hint from the docstring for the function you use it in. For example, lon is passed into EarthLocation.from_geodetic(), which hopefully specifies these things.

f0_shifted = f0_shifted.to(u.GHz)
v = doppler(f0_shifted,f0)
v_adjustment = v.to(u.km/u.second)
print(v_adjustment)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally with functions like these you don't want it printing stuff to the screen unless specified. I recommend having an optional keyword verbose which defaults to False, but if set to True, it will print it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it is helpful to have it print out the v_adjustment by default. Similar to the bootcamp, I want to explain what the output actually means. What are your thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by explaining what the output means (I wasn't at the bootcamp). I think the difference here is packaged code vs a tutorial, and you're doing both. The code that goes here is the packaged code and should behave more like a library, which is generally quiet unless asked to speak. The reason for this is that a) print statements are very slow, so you don't want them in underlying code, and b) you want to avoid a mess of statements showing up to a user when they don't know what is being printed.

I think the solution is that for the packaged code, you include an option to print it, and in the tutorial you do explicitly print it, and talk about what it means. (Also, the doc string should include the description of what it is.)


def LSR_shift(lon, lat, ele, time, RA, Dec):
"""
Identifies the exact postion at which the observations were taken and corrects for the Local Standard of Rest.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this description is precisely what the function does. You give it the precise location (and pointing and time), and it finds v_adjustment, which I think is the motion of the observer in that direction relative to the LSR? The actually correction is done later when you subtract v_adjustment from your Doppler velocities.

Copy link
Contributor Author

@EyanWeissbluth EyanWeissbluth Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

 """
    By providing the exact position, pointing, and time, the code below calculates prints out the adjusted velocity from the observation position to the Local Standard of Rest. Use notaion provided below. 
    
    :param lon: longitude - in degrees
    :param lat: latitude - in degrees
    :param ele: elevation - in meters
    :param time: date and time - ex: '2023-06-19T08:00:00'
    :param RA: Right Ascension - ex: '00h88m00.0s'
    :param Dec: Declination - ex: '00d88m00.0s'
    """

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that looks good.

e = 2.71828
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
a1 = a2 = a3 = a4 = b1 = b2 = b3 = b4 = c1 = c2 = c3 = c4 = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of repeated stuff here. I bet we can simplify the code by using arrays and for loops. e.g.:

ncurves = 4
line = []
a = b = c = np.ones(ncurves)
model = a * (e ** -(((x - b)**2) / (2 * (c**2))))
for i, m in enumerate(model):
    line.append(ax.plot(x, m))
line.append(ax.plot(x, np.sum(m, axis=0)))

I think I goofed up an indexing thing, but this is the general idea.

Copy link
Contributor Author

@EyanWeissbluth EyanWeissbluth Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better?

def interactive_4_gauss_plot(unique_x):
    """
    Creates an interactive plot that contains 4 separate Gaussian curves along with some of those curves.
    
    param unique_x: x values of overlapping CHART data
    """
    x = unique_x
    e = 2.71828  
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    a = b = c = [1]*4

    lines = [ax.plot(x, (a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2))))))[0] for i in range(4)]
    lines.append(ax.plot(x, sum([a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2)))) for i in range(4)]))[0])

    sliders = [FloatSlider(min=-100, max=100, step=1, value=1) for _ in range(12)]
    colors = ['black']*4 + ['red']
    color_dropdowns = [Dropdown(options=['blue', 'green', 'red', 'cyan', 'magenta', 'yellow', 'black'], value=colors[i]) for i in range(5)]

    def update(a1=1, b1=1, c1=1, a2=1, b2=1, c2=1, a3=1, b3=1, c3=1, a4=1, b4=1, c4=1,
               color1='black', color2='black', color3='black', color4='black', color5='red'):
        a = [a1,a2,a3,a4]
        b = [b1,b2,b3,b4]
        c = [c1,c2,c3,c4]
        for i in range(4):
            lines[i].set_ydata((a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2))))))
        lines[4].set_ydata(sum([a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2)))) for i in range(4)]))
        for i in range(5):
            lines[i].set_color([color1,color2,color3,color4,color5][i])
        fig.canvas.draw_idle()

    interact(update,
             a1=sliders[0], b1=sliders[1], c1=sliders[2],
             a2=sliders[3], b2=sliders[4], c2=sliders[5],
             a3=sliders[6], b3=sliders[7], c3=sliders[8],
             a4=sliders[9], b4=sliders[10], c4=sliders[11],
             color1=color_dropdowns[0], color2=color_dropdowns[1],
             color3=color_dropdowns[2], color4=color_dropdowns[3],
             color5=color_dropdowns[4])
     
    return ax

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better. You can finish the job by just having a, b, c, and colors be the parameters in the update function - no need to pass them in one-by-one only to stuff them together again. You could even be clever and just pass in sliders directly and reference the relevant parts. For example, the first for loop in update might look like:

for i in range(4):
    lines[i].set_ydata((sliders[i]*(e**-(((x-sliders[4+i])**2) / (2*(sliders[8+i]**2))))))

Not sure that would be more readable though... hmm...

return unique_x, avg_y


def interactive_plot(unique_x):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we name this something more specific? It's not just creating an arbitrary interactive plot, it's making a 4-gaussian model plot.

return ax


def goodness_of_fit(unique_x, combined_gauss, avg_y):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is how you used it, the function itself doesn't need to only apply to overlapping chart data, combined gaussians, and averaged y-data. Really this is any version of x, model, and ydata.

data = np.concatenate((x_data.reshape(-1,1), y_data.reshape(-1,1)), axis=1)


mask = (gauss[:,0] >= -100) & (gauss[:,0] <= 100)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this doing?

gauss_masked = gauss[mask]
data_masked = data[mask]

chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not always 0? gauss_masked[:, 0] is the masked unique_x, which is the same as data_masked[:, 0], I think

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it doesn't look like it's used beyond calculating p_value_x, which isn't used again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Chart data outside of the interval [-100,100] diverges which impacts the goodness of fit calculation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you responded to the wrong comment here. Maybe add a comment to the code so it's more clear why you're masking outside [-100, 100]

data_masked = data[mask]

chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0])
chi_squared_statistic_y = np.sum((y_gauss - y_data)**2 / y_data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test you're doing isn't quite right. This is the Pearson chi-squared test, used for categorical data (e.g. how many red jelly beans are in the bag?) Instead, we want to look at the reduced chi-squared statistic:
https://en.wikipedia.org/wiki/Reduced_chi-squared_statistic
Notice the denominator here has the noise, not the expected value. I think with enough statistics you can show the Pearson's version is a special case of the the more general one I linked above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this closer to what you were looking for?

x_observed = np.array(unique_x)
y_observed = np.array(combined_gauss)
x_expected = np.array(unique_x) * 2
y_expected = np.array(avg_y) * 2

observed = np.concatenate((x_observed.reshape(-1,1), y_observed.reshape(-1,1)), axis=1)
expected = np.concatenate((x_expected.reshape(-1,1), y_expected.reshape(-1,1)), axis=1)

mask = (observed[:,0] >= -100) & (observed[:,0] <= 100)
observed_masked = observed[mask]
expected_masked = expected[mask]

chi_squared_statistic_x = np.sum((observed_masked[:,0] - expected_masked[:,0])**2 / expected_masked[:,0])
chi_squared_statistic_y = np.sum((y_observed - y_expected)**2 / y_expected)

p_value_x = chi2.sf(chi_squared_statistic_x, len(observed_masked[:,0]) - 1)
p_value_y = chi2.sf(chi_squared_statistic_y, len(y_observed) - 1)

chi_squared_statistic_xy = chi_squared_statistic_x + chi_squared_statistic_y
degrees_of_freedom_xy = len(observed) - 2
reduced_chi_squared_statistic_xy = chi_squared_statistic_xy / degrees_of_freedom_xy
p_value_xy = chi2.sf(reduced_chi_squared_statistic_xy, degrees_of_freedom_xy)

print("Reduced Chi-squared Statistic for xy: ", reduced_chi_squared_statistic_xy)
print("P-value for xy: ", p_value_xy)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially. Could you commit your changes to the actual code so we can comment on individual lines?

p_value_x = chi2.sf(chi_squared_statistic_x, len(gauss_masked[:,0]) - 1)
p_value_y = chi2.sf(chi_squared_statistic_y, len(y_gauss) - 1)

print("P-value for y: ", p_value_y)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like before, let's add a verbose option, otherwise return a value to the user.

chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0])
chi_squared_statistic_y = np.sum((y_gauss - y_data)**2 / y_data)

p_value_x = chi2.sf(chi_squared_statistic_x, len(gauss_masked[:,0]) - 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed one thing... chi2 isn't defined in this file. Was probably imported in your notebook, but you need to import it in the context here (if you end up using it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from scipy.stats import chi2 is needed

Please add comments to this.
Please add comments to this one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants