Update analysis.py #199

EyanWeissbluth · 2023-07-20T05:16:10Z

Added some of the functions I created for the analysis.

adampbeardsley

Nice work @EyanWeissbluth . I added a handful of suggestions to improve it.

adampbeardsley · 2023-07-20T14:53:46Z

src/chart/analysis.py

+    :param lon: longitude 
+    :param lat: latitude
+    :param ele: elevation
+    :param time: date and time
+    :param RA: Right Ascension
+    :param Dec: Declination


Nice work including a useful doc string! (As you can see with other functions, we haven't been great about doing that.)
Could you add a little more info about these, specifically what format they should be passed in? E.g., should lon be in degrees? Hours? should it be a float or a string? You should be able to get a hint from the docstring for the function you use it in. For example, lon is passed into EarthLocation.from_geodetic(), which hopefully specifies these things.

adampbeardsley · 2023-07-20T14:55:32Z

src/chart/analysis.py

+    f0_shifted = f0_shifted.to(u.GHz)
+    v = doppler(f0_shifted,f0)
+    v_adjustment = v.to(u.km/u.second)
+    print(v_adjustment)


Generally with functions like these you don't want it printing stuff to the screen unless specified. I recommend having an optional keyword verbose which defaults to False, but if set to True, it will print it.

I do think it is helpful to have it print out the v_adjustment by default. Similar to the bootcamp, I want to explain what the output actually means. What are your thoughts?

I'm not sure what you mean by explaining what the output means (I wasn't at the bootcamp). I think the difference here is packaged code vs a tutorial, and you're doing both. The code that goes here is the packaged code and should behave more like a library, which is generally quiet unless asked to speak. The reason for this is that a) print statements are very slow, so you don't want them in underlying code, and b) you want to avoid a mess of statements showing up to a user when they don't know what is being printed.

I think the solution is that for the packaged code, you include an option to print it, and in the tutorial you do explicitly print it, and talk about what it means. (Also, the doc string should include the description of what it is.)

adampbeardsley · 2023-07-20T14:57:03Z

src/chart/analysis.py

+
+def LSR_shift(lon, lat, ele, time, RA, Dec):
+    """
+    Identifies the exact postion at which the observations were taken and corrects for the Local Standard of Rest.


I'm not sure this description is precisely what the function does. You give it the precise location (and pointing and time), and it finds v_adjustment, which I think is the motion of the observer in that direction relative to the LSR? The actually correction is done later when you subtract v_adjustment from your Doppler velocities.

What about this?

""" By providing the exact position, pointing, and time, the code below calculates prints out the adjusted velocity from the observation position to the Local Standard of Rest. Use notaion provided below. :param lon: longitude - in degrees :param lat: latitude - in degrees :param ele: elevation - in meters :param time: date and time - ex: '2023-06-19T08:00:00' :param RA: Right Ascension - ex: '00h88m00.0s' :param Dec: Declination - ex: '00d88m00.0s' """

Yes, that looks good.

adampbeardsley · 2023-07-20T15:08:22Z

src/chart/analysis.py

+    e = 2.71828  
+    fig = plt.figure()
+    ax = fig.add_subplot(1, 1, 1)
+    a1 = a2 = a3 = a4 = b1 = b2 = b3 = b4 = c1 = c2 = c3 = c4 = 1


There's a lot of repeated stuff here. I bet we can simplify the code by using arrays and for loops. e.g.:

ncurves = 4 line = [] a = b = c = np.ones(ncurves) model = a * (e ** -(((x - b)**2) / (2 * (c**2)))) for i, m in enumerate(model): line.append(ax.plot(x, m)) line.append(ax.plot(x, np.sum(m, axis=0)))

I think I goofed up an indexing thing, but this is the general idea.

Would this be better?

def interactive_4_gauss_plot(unique_x): """ Creates an interactive plot that contains 4 separate Gaussian curves along with some of those curves. param unique_x: x values of overlapping CHART data """ x = unique_x e = 2.71828 fig = plt.figure() ax = fig.add_subplot(1, 1, 1) a = b = c = [1]*4 lines = [ax.plot(x, (a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2))))))[0] for i in range(4)] lines.append(ax.plot(x, sum([a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2)))) for i in range(4)]))[0]) sliders = [FloatSlider(min=-100, max=100, step=1, value=1) for _ in range(12)] colors = ['black']*4 + ['red'] color_dropdowns = [Dropdown(options=['blue', 'green', 'red', 'cyan', 'magenta', 'yellow', 'black'], value=colors[i]) for i in range(5)] def update(a1=1, b1=1, c1=1, a2=1, b2=1, c2=1, a3=1, b3=1, c3=1, a4=1, b4=1, c4=1, color1='black', color2='black', color3='black', color4='black', color5='red'): a = [a1,a2,a3,a4] b = [b1,b2,b3,b4] c = [c1,c2,c3,c4] for i in range(4): lines[i].set_ydata((a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2)))))) lines[4].set_ydata(sum([a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2)))) for i in range(4)])) for i in range(5): lines[i].set_color([color1,color2,color3,color4,color5][i]) fig.canvas.draw_idle() interact(update, a1=sliders[0], b1=sliders[1], c1=sliders[2], a2=sliders[3], b2=sliders[4], c2=sliders[5], a3=sliders[6], b3=sliders[7], c3=sliders[8], a4=sliders[9], b4=sliders[10], c4=sliders[11], color1=color_dropdowns[0], color2=color_dropdowns[1], color3=color_dropdowns[2], color4=color_dropdowns[3], color5=color_dropdowns[4]) return ax

Looks much better. You can finish the job by just having a, b, c, and colors be the parameters in the update function - no need to pass them in one-by-one only to stuff them together again. You could even be clever and just pass in sliders directly and reference the relevant parts. For example, the first for loop in update might look like:

for i in range(4): lines[i].set_ydata((sliders[i]*(e**-(((x-sliders[4+i])**2) / (2*(sliders[8+i]**2))))))

Not sure that would be more readable though... hmm...

adampbeardsley · 2023-07-20T15:09:22Z

src/chart/analysis.py

+    return unique_x, avg_y
+
+
+def interactive_plot(unique_x):


Could we name this something more specific? It's not just creating an arbitrary interactive plot, it's making a 4-gaussian model plot.

adampbeardsley · 2023-07-20T15:10:41Z

src/chart/analysis.py

+    return ax
+
+
+def goodness_of_fit(unique_x, combined_gauss, avg_y):


Even though this is how you used it, the function itself doesn't need to only apply to overlapping chart data, combined gaussians, and averaged y-data. Really this is any version of x, model, and ydata.

adampbeardsley · 2023-07-20T15:12:00Z

src/chart/analysis.py

+    data = np.concatenate((x_data.reshape(-1,1), y_data.reshape(-1,1)), axis=1)
+
+
+    mask = (gauss[:,0] >= -100) & (gauss[:,0] <= 100)


what is this doing?

adampbeardsley · 2023-07-20T15:14:11Z

src/chart/analysis.py

+    gauss_masked = gauss[mask]
+    data_masked = data[mask]
+
+    chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0])


Is this not always 0? gauss_masked[:, 0] is the masked unique_x, which is the same as data_masked[:, 0], I think

Also, it doesn't look like it's used beyond calculating p_value_x, which isn't used again.

The Chart data outside of the interval [-100,100] diverges which impacts the goodness of fit calculation.

I think you responded to the wrong comment here. Maybe add a comment to the code so it's more clear why you're masking outside [-100, 100]

adampbeardsley · 2023-07-20T15:27:50Z

src/chart/analysis.py

+    data_masked = data[mask]
+
+    chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0])
+    chi_squared_statistic_y = np.sum((y_gauss - y_data)**2 / y_data)


I think the test you're doing isn't quite right. This is the Pearson chi-squared test, used for categorical data (e.g. how many red jelly beans are in the bag?) Instead, we want to look at the reduced chi-squared statistic:
https://en.wikipedia.org/wiki/Reduced_chi-squared_statistic
Notice the denominator here has the noise, not the expected value. I think with enough statistics you can show the Pearson's version is a special case of the the more general one I linked above.

Is this closer to what you were looking for?

x_observed = np.array(unique_x) y_observed = np.array(combined_gauss) x_expected = np.array(unique_x) * 2 y_expected = np.array(avg_y) * 2 observed = np.concatenate((x_observed.reshape(-1,1), y_observed.reshape(-1,1)), axis=1) expected = np.concatenate((x_expected.reshape(-1,1), y_expected.reshape(-1,1)), axis=1) mask = (observed[:,0] >= -100) & (observed[:,0] <= 100) observed_masked = observed[mask] expected_masked = expected[mask] chi_squared_statistic_x = np.sum((observed_masked[:,0] - expected_masked[:,0])**2 / expected_masked[:,0]) chi_squared_statistic_y = np.sum((y_observed - y_expected)**2 / y_expected) p_value_x = chi2.sf(chi_squared_statistic_x, len(observed_masked[:,0]) - 1) p_value_y = chi2.sf(chi_squared_statistic_y, len(y_observed) - 1) chi_squared_statistic_xy = chi_squared_statistic_x + chi_squared_statistic_y degrees_of_freedom_xy = len(observed) - 2 reduced_chi_squared_statistic_xy = chi_squared_statistic_xy / degrees_of_freedom_xy p_value_xy = chi2.sf(reduced_chi_squared_statistic_xy, degrees_of_freedom_xy) print("Reduced Chi-squared Statistic for xy: ", reduced_chi_squared_statistic_xy) print("P-value for xy: ", p_value_xy)

Partially. Could you commit your changes to the actual code so we can comment on individual lines?

adampbeardsley · 2023-07-20T15:28:08Z

src/chart/analysis.py

+    p_value_x = chi2.sf(chi_squared_statistic_x, len(gauss_masked[:,0]) - 1)
+    p_value_y = chi2.sf(chi_squared_statistic_y, len(y_gauss) - 1)
+
+    print("P-value for y: ", p_value_y)


Like before, let's add a verbose option, otherwise return a value to the user.

adampbeardsley · 2023-07-20T15:29:37Z

src/chart/analysis.py

+    chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0])
+    chi_squared_statistic_y = np.sum((y_gauss - y_data)**2 / y_data)
+
+    p_value_x = chi2.sf(chi_squared_statistic_x, len(gauss_masked[:,0]) - 1)


Missed one thing... chi2 isn't defined in this file. Was probably imported in your notebook, but you need to import it in the context here (if you end up using it)

from scipy.stats import chi2 is needed

Please add comments to this.

Please add comments to this one.

Update analysis.py

d04a665

adampbeardsley reviewed Jul 20, 2023

View reviewed changes

EyanWeissbluth requested review from adampbeardsley and lmberkhout July 20, 2023 21:51

EyanWeissbluth added 3 commits July 25, 2023 13:58

Update analysis.py

e4a9ffd

Update analysis.py

f8f9a78

Please add comments to this.

Update analysis.py

de9233e

Please add comments to this one.

		return ax


		def goodness_of_fit(unique_x, combined_gauss, avg_y):

		data = np.concatenate((x_data.reshape(-1,1), y_data.reshape(-1,1)), axis=1)


		mask = (gauss[:,0] >= -100) & (gauss[:,0] <= 100)

Update analysis.py #199

Are you sure you want to change the base?

Update analysis.py #199

Uh oh!

Conversation

EyanWeissbluth commented Jul 20, 2023

Uh oh!

adampbeardsley left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

EyanWeissbluth Jul 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

EyanWeissbluth Jul 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

EyanWeissbluth Jul 21, 2023 •

edited

Loading

EyanWeissbluth Jul 21, 2023 •

edited

Loading