-
Notifications
You must be signed in to change notification settings - Fork 9
Update analysis.py #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: Eyan's_CHART_analysis_functions
Are you sure you want to change the base?
Update analysis.py #199
Conversation
adampbeardsley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @EyanWeissbluth . I added a handful of suggestions to improve it.
src/chart/analysis.py
Outdated
| :param lon: longitude | ||
| :param lat: latitude | ||
| :param ele: elevation | ||
| :param time: date and time | ||
| :param RA: Right Ascension | ||
| :param Dec: Declination |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work including a useful doc string! (As you can see with other functions, we haven't been great about doing that.)
Could you add a little more info about these, specifically what format they should be passed in? E.g., should lon be in degrees? Hours? should it be a float or a string? You should be able to get a hint from the docstring for the function you use it in. For example, lon is passed into EarthLocation.from_geodetic(), which hopefully specifies these things.
src/chart/analysis.py
Outdated
| f0_shifted = f0_shifted.to(u.GHz) | ||
| v = doppler(f0_shifted,f0) | ||
| v_adjustment = v.to(u.km/u.second) | ||
| print(v_adjustment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally with functions like these you don't want it printing stuff to the screen unless specified. I recommend having an optional keyword verbose which defaults to False, but if set to True, it will print it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think it is helpful to have it print out the v_adjustment by default. Similar to the bootcamp, I want to explain what the output actually means. What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by explaining what the output means (I wasn't at the bootcamp). I think the difference here is packaged code vs a tutorial, and you're doing both. The code that goes here is the packaged code and should behave more like a library, which is generally quiet unless asked to speak. The reason for this is that a) print statements are very slow, so you don't want them in underlying code, and b) you want to avoid a mess of statements showing up to a user when they don't know what is being printed.
I think the solution is that for the packaged code, you include an option to print it, and in the tutorial you do explicitly print it, and talk about what it means. (Also, the doc string should include the description of what it is.)
src/chart/analysis.py
Outdated
|
|
||
| def LSR_shift(lon, lat, ele, time, RA, Dec): | ||
| """ | ||
| Identifies the exact postion at which the observations were taken and corrects for the Local Standard of Rest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this description is precisely what the function does. You give it the precise location (and pointing and time), and it finds v_adjustment, which I think is the motion of the observer in that direction relative to the LSR? The actually correction is done later when you subtract v_adjustment from your Doppler velocities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this?
"""
By providing the exact position, pointing, and time, the code below calculates prints out the adjusted velocity from the observation position to the Local Standard of Rest. Use notaion provided below.
:param lon: longitude - in degrees
:param lat: latitude - in degrees
:param ele: elevation - in meters
:param time: date and time - ex: '2023-06-19T08:00:00'
:param RA: Right Ascension - ex: '00h88m00.0s'
:param Dec: Declination - ex: '00d88m00.0s'
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that looks good.
src/chart/analysis.py
Outdated
| e = 2.71828 | ||
| fig = plt.figure() | ||
| ax = fig.add_subplot(1, 1, 1) | ||
| a1 = a2 = a3 = a4 = b1 = b2 = b3 = b4 = c1 = c2 = c3 = c4 = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of repeated stuff here. I bet we can simplify the code by using arrays and for loops. e.g.:
ncurves = 4
line = []
a = b = c = np.ones(ncurves)
model = a * (e ** -(((x - b)**2) / (2 * (c**2))))
for i, m in enumerate(model):
line.append(ax.plot(x, m))
line.append(ax.plot(x, np.sum(m, axis=0)))
I think I goofed up an indexing thing, but this is the general idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be better?
def interactive_4_gauss_plot(unique_x):
"""
Creates an interactive plot that contains 4 separate Gaussian curves along with some of those curves.
param unique_x: x values of overlapping CHART data
"""
x = unique_x
e = 2.71828
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
a = b = c = [1]*4
lines = [ax.plot(x, (a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2))))))[0] for i in range(4)]
lines.append(ax.plot(x, sum([a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2)))) for i in range(4)]))[0])
sliders = [FloatSlider(min=-100, max=100, step=1, value=1) for _ in range(12)]
colors = ['black']*4 + ['red']
color_dropdowns = [Dropdown(options=['blue', 'green', 'red', 'cyan', 'magenta', 'yellow', 'black'], value=colors[i]) for i in range(5)]
def update(a1=1, b1=1, c1=1, a2=1, b2=1, c2=1, a3=1, b3=1, c3=1, a4=1, b4=1, c4=1,
color1='black', color2='black', color3='black', color4='black', color5='red'):
a = [a1,a2,a3,a4]
b = [b1,b2,b3,b4]
c = [c1,c2,c3,c4]
for i in range(4):
lines[i].set_ydata((a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2))))))
lines[4].set_ydata(sum([a[i]*(e ** -(((x-b[i])**2) / (2*(c[i]**2)))) for i in range(4)]))
for i in range(5):
lines[i].set_color([color1,color2,color3,color4,color5][i])
fig.canvas.draw_idle()
interact(update,
a1=sliders[0], b1=sliders[1], c1=sliders[2],
a2=sliders[3], b2=sliders[4], c2=sliders[5],
a3=sliders[6], b3=sliders[7], c3=sliders[8],
a4=sliders[9], b4=sliders[10], c4=sliders[11],
color1=color_dropdowns[0], color2=color_dropdowns[1],
color3=color_dropdowns[2], color4=color_dropdowns[3],
color5=color_dropdowns[4])
return ax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better. You can finish the job by just having a, b, c, and colors be the parameters in the update function - no need to pass them in one-by-one only to stuff them together again. You could even be clever and just pass in sliders directly and reference the relevant parts. For example, the first for loop in update might look like:
for i in range(4):
lines[i].set_ydata((sliders[i]*(e**-(((x-sliders[4+i])**2) / (2*(sliders[8+i]**2))))))
Not sure that would be more readable though... hmm...
| return unique_x, avg_y | ||
|
|
||
|
|
||
| def interactive_plot(unique_x): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we name this something more specific? It's not just creating an arbitrary interactive plot, it's making a 4-gaussian model plot.
| return ax | ||
|
|
||
|
|
||
| def goodness_of_fit(unique_x, combined_gauss, avg_y): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this is how you used it, the function itself doesn't need to only apply to overlapping chart data, combined gaussians, and averaged y-data. Really this is any version of x, model, and ydata.
src/chart/analysis.py
Outdated
| data = np.concatenate((x_data.reshape(-1,1), y_data.reshape(-1,1)), axis=1) | ||
|
|
||
|
|
||
| mask = (gauss[:,0] >= -100) & (gauss[:,0] <= 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this doing?
src/chart/analysis.py
Outdated
| gauss_masked = gauss[mask] | ||
| data_masked = data[mask] | ||
|
|
||
| chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not always 0? gauss_masked[:, 0] is the masked unique_x, which is the same as data_masked[:, 0], I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it doesn't look like it's used beyond calculating p_value_x, which isn't used again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Chart data outside of the interval [-100,100] diverges which impacts the goodness of fit calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you responded to the wrong comment here. Maybe add a comment to the code so it's more clear why you're masking outside [-100, 100]
src/chart/analysis.py
Outdated
| data_masked = data[mask] | ||
|
|
||
| chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0]) | ||
| chi_squared_statistic_y = np.sum((y_gauss - y_data)**2 / y_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test you're doing isn't quite right. This is the Pearson chi-squared test, used for categorical data (e.g. how many red jelly beans are in the bag?) Instead, we want to look at the reduced chi-squared statistic:
https://en.wikipedia.org/wiki/Reduced_chi-squared_statistic
Notice the denominator here has the noise, not the expected value. I think with enough statistics you can show the Pearson's version is a special case of the the more general one I linked above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this closer to what you were looking for?
x_observed = np.array(unique_x)
y_observed = np.array(combined_gauss)
x_expected = np.array(unique_x) * 2
y_expected = np.array(avg_y) * 2
observed = np.concatenate((x_observed.reshape(-1,1), y_observed.reshape(-1,1)), axis=1)
expected = np.concatenate((x_expected.reshape(-1,1), y_expected.reshape(-1,1)), axis=1)
mask = (observed[:,0] >= -100) & (observed[:,0] <= 100)
observed_masked = observed[mask]
expected_masked = expected[mask]
chi_squared_statistic_x = np.sum((observed_masked[:,0] - expected_masked[:,0])**2 / expected_masked[:,0])
chi_squared_statistic_y = np.sum((y_observed - y_expected)**2 / y_expected)
p_value_x = chi2.sf(chi_squared_statistic_x, len(observed_masked[:,0]) - 1)
p_value_y = chi2.sf(chi_squared_statistic_y, len(y_observed) - 1)
chi_squared_statistic_xy = chi_squared_statistic_x + chi_squared_statistic_y
degrees_of_freedom_xy = len(observed) - 2
reduced_chi_squared_statistic_xy = chi_squared_statistic_xy / degrees_of_freedom_xy
p_value_xy = chi2.sf(reduced_chi_squared_statistic_xy, degrees_of_freedom_xy)
print("Reduced Chi-squared Statistic for xy: ", reduced_chi_squared_statistic_xy)
print("P-value for xy: ", p_value_xy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially. Could you commit your changes to the actual code so we can comment on individual lines?
src/chart/analysis.py
Outdated
| p_value_x = chi2.sf(chi_squared_statistic_x, len(gauss_masked[:,0]) - 1) | ||
| p_value_y = chi2.sf(chi_squared_statistic_y, len(y_gauss) - 1) | ||
|
|
||
| print("P-value for y: ", p_value_y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like before, let's add a verbose option, otherwise return a value to the user.
src/chart/analysis.py
Outdated
| chi_squared_statistic_x = np.sum((gauss_masked[:,0] - data_masked[:,0])**2 / data_masked[:,0]) | ||
| chi_squared_statistic_y = np.sum((y_gauss - y_data)**2 / y_data) | ||
|
|
||
| p_value_x = chi2.sf(chi_squared_statistic_x, len(gauss_masked[:,0]) - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed one thing... chi2 isn't defined in this file. Was probably imported in your notebook, but you need to import it in the context here (if you end up using it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from scipy.stats import chi2 is needed
Please add comments to this.
Please add comments to this one.
Added some of the functions I created for the analysis.