Binary file added .DS_Store
Binary file not shown.
29 changes: 29 additions & 0 deletions ideas/MiniProject-Idea1.html
@@ -0,0 +1,29 @@
<!DOCTYPE html><html><head><meta charset="utf-8"><title>MiniProject-Idea1</title><style></style></head><body id="preview">
<h1 class="code-line" data-line-start=0 data-line-end=1 ><a id="What_health_searches_are_gaining_attention_in_each_state_according_to_Google_Trends_How_does_this_interest_associate_with_statelevel_obesity_rate_0"></a>What health searches are gaining attention in each state, according to Google Trends? How is this interest associated with state-level obesity rates?</h1>
<h3 class="code-line" data-line-start=1 data-line-end=2 ><a id="_by_Tu_Tong__1"></a><em>by Tu Tong</em></h3>
<h2 class="code-line" data-line-start=3 data-line-end=4 ><a id="Data_Sources_3"></a>Data Sources</h2>
<p class="has-line-data" data-line-start="5" data-line-end="6">For this mini project, I plan to use Google Trends (<a href="https://trends.google.com/trends">https://trends.google.com/trends</a>) to retrieve time-series search-interest data for a set of terms across U.S. states. Specifically, I would collect data for the following search terms (the list is not finalized):</p>
<ul>
<li class="has-line-data" data-line-start="7" data-line-end="9">
<p class="has-line-data" data-line-start="7" data-line-end="8">Diet: “cheap food”, “organic food”, “supplements”, “healthy foods”, “nutrition”, “calorie”, “vegetable”, “low carb”, “protein”</p>
</li>
<li class="has-line-data" data-line-start="9" data-line-end="11">
<p class="has-line-data" data-line-start="9" data-line-end="10">Behavioral: “gym near me”, “weight loss”, “exercises at home”, “meal prep”</p>
</li>
</ul>
<p class="has-line-data" data-line-start="11" data-line-end="12">For the obesity rate, I am considering either the Prevalence of Obesity Based on Self-Reported Weight and Height by State, which is provided as a CSV file by <a href="https://www.cdc.gov/obesity/data-and-statistics/adult-obesity-prevalence-maps.html">CDC Adult Obesity Prevalence Maps</a>, or the state obesity rates published on <a href="https://www.statista.com/statistics/378988/us-obesity-rate-by-state/#:~:text=Data%20for%20obesity%20rates%20by%20state%20show,mostly%20found%20in%20the%20Southern%20United%20States.">Statista</a>. Both sources are from 2024.</p>
<p class="has-line-data" data-line-start="13" data-line-end="14">After exploratory data analysis, I would use an ML model to predict state obesity rates from people’s interest in health. The features would be the relative search interest in each term by state (Google Trends reports a normalized 0 to 100 index rather than raw search counts), and the target variable is the state-level obesity rate.</p>
<h2 class="code-line" data-line-start=15 data-line-end=16 ><a id="Data_Retrieval_15"></a>Data Retrieval</h2>
<p class="has-line-data" data-line-start="16" data-line-end="17">At the moment, I am finalizing which keywords are relevant to my topic before retrieving the data. I am also considering looking at data across multiple years.</p>
<h2 class="code-line" data-line-start=18 data-line-end=19 ><a id="Model_18"></a>Model</h2>
<p class="has-line-data" data-line-start="20" data-line-end="21">I am considering using either Random Forest or XGBoost to predict state obesity rates from the search interest in the terms listed above.</p>
<h2 class="code-line" data-line-start=24 data-line-end=25 ><a id="Implications_for_stakeholders_24"></a>Implications for stakeholders</h2>
<p class="has-line-data" data-line-start="26" data-line-end="27">Results from this project could help people who work in public health (e.g., the CDC and health policymakers) better understand how interest in health relates to health outcomes. The project would also show which health topics are gaining popularity in each state, and whether awareness and search trends are associated with health behaviors and outcomes.</p>
<h2 class="code-line" data-line-start=30 data-line-end=31 ><a id="Ethical_legal_societal_implications_30"></a>Ethical, legal, societal implications</h2>
<p class="has-line-data" data-line-start="32" data-line-end="33">It is hard to determine what drives improvements in health outcomes, because health is shaped by social and economic context. While studies in this area often focus on external factors, this project concentrates on the interest and curiosity people express about health. It uses machine learning for prediction and to examine whether public interest in health signals health outcomes. The results would reveal associations and correlations, not causal effects. Because Google Trends data is aggregated, the results do not indicate individual behavior or responsibility; instead, they offer a view of social trends related to health.</p>
<hr>
<h2 class="code-line" data-line-start=35 data-line-end=36 ><a id="License_35"></a>License</h2>
<p class="has-line-data" data-line-start="37" data-line-end="38">MIT</p>
<p class="has-line-data" data-line-start="39" data-line-end="40"><strong>Free Software, Hell Yeah!</strong></p>

</body></html>
63 changes: 63 additions & 0 deletions ideas/MiniProject-Idea1.md
@@ -0,0 +1,63 @@
# What health searches are gaining attention in each state, according to Google Trends? How is this interest associated with state-level obesity rates?
### _by Tu Tong_

## Data Sources

For this mini project, I plan to use Google Trends (https://trends.google.com/trends) to retrieve time-series search-interest data for a set of terms across U.S. states. Specifically, I would collect data for the following search terms (the list is not finalized):

- Diet: “cheap food”, “organic food”, “supplements”, “healthy foods”, “nutrition”, “calorie”, “vegetable”, “low carb”, “protein”

- Behavioral: “gym near me”, “weight loss”, “exercises at home”, “meal prep”

For the obesity rate, I am considering either the Prevalence of Obesity Based on Self-Reported Weight and Height by State, which is provided as a CSV file by [CDC Adult Obesity Prevalence Maps](https://www.cdc.gov/obesity/data-and-statistics/adult-obesity-prevalence-maps.html), or the state obesity rates published on [Statista](https://www.statista.com/statistics/378988/us-obesity-rate-by-state/#:~:text=Data%20for%20obesity%20rates%20by%20state%20show,mostly%20found%20in%20the%20Southern%20United%20States.). Both sources are from 2024.

After exploratory data analysis, I would use an ML model to predict state obesity rates from people's interest in health. The features would be the relative search interest in each term by state (Google Trends reports a normalized 0 to 100 index rather than raw search counts), and the target variable is the state-level obesity rate.
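
Below is a minimal sketch of the data-assembly and exploratory step. It assumes that the Google Trends interest values have already been exported as `{term: {state: interest}}` and that the obesity rates are available as `{state: rate}` (for example, loaded from the CDC CSV). The helper names `build_dataset`, `pearson`, and `term_correlations` are hypothetical and do not come from any library. Pearson correlation is used here as one simple exploratory measure. It describes association only.

```python
"""Sketch: assemble a state-level feature table and check per-term correlations.

Assumed input shapes (hypothetical, not from any library):
  interest: {term: {state: search_interest}}
  obesity:  {state: obesity_rate}
"""
from math import sqrt


def build_dataset(interest, obesity):
    """Return (states, terms, X, y) for states present in every source."""
    terms = sorted(interest)
    # Keep only states that have a value for every term and an obesity rate.
    states = sorted(
        s for s in obesity
        if all(s in interest[t] for t in terms)
    )
    X = [[interest[t][s] for t in terms] for s in states]  # one row per state
    y = [obesity[s] for s in states]                        # target vector
    return states, terms, X, y


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (sx * sy)


def term_correlations(terms, X, y):
    """Correlation between each term's search interest and the obesity rate."""
    return {t: pearson([row[j] for row in X], y) for j, t in enumerate(terms)}
```

Pearson's r is sensitive to outliers. With only about 50 states, a rank-based (Spearman) correlation is a reasonable cross-check.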

## Data Retrieval
At the moment, I am finalizing which keywords are relevant to my topic before retrieving the data. I am also considering looking at data across multiple years.
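
Google Trends scales values from 0 to 100 within each request, so numbers from separate requests are not directly comparable. A common workaround is to include one shared "anchor" term in every batch and then rescale each batch so that the anchor values match. The sketch below assumes the five-term comparison limit of the Google Trends web interface and assumes that each request yields one value per term for the same region and time window. Fetching is left out, and the function names are hypothetical.

```python
"""Sketch: batch keywords around a shared anchor and rescale batches to one scale.

Assumptions: at most 5 terms per comparison request, and each request returns
{term: value} on Google Trends' 0-100 scale for the same region/time window.
"""

MAX_TERMS_PER_REQUEST = 5  # assumed Google Trends comparison limit


def make_batches(terms, anchor, size=MAX_TERMS_PER_REQUEST):
    """Group terms so every batch holds the anchor plus up to size-1 others."""
    others = [t for t in terms if t != anchor]
    step = size - 1
    return [[anchor] + others[i:i + step] for i in range(0, len(others), step)]


def rescale_batches(batch_results, anchor):
    """Put all batches on the scale of the first batch via the shared anchor.

    batch_results: list of {term: value} dicts, one per request.
    """
    reference = batch_results[0][anchor]
    combined = {}
    for result in batch_results:
        if result[anchor] == 0:
            raise ValueError("anchor has zero interest; choose a more popular anchor")
        factor = reference / result[anchor]
        for term, value in result.items():
            # Keep the first value seen for the anchor (the reference batch).
            combined.setdefault(term, value * factor)
    return combined
```

Google Trends reports integer values, so this rescaling inherits the anchor's rounding error. An anchor with moderate, stable interest therefore tends to work better than a very rare one or a very dominant one. If multiple years are pulled, the same rescaling would be applied within each year's requests.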

## Model

I am considering using either Random Forest or XGBoost to predict state obesity rates from the search interest in the terms listed above.
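
With roughly one row per state (about 50 rows), a single train/test split would give a noisy estimate of model quality. One option is leave-one-out cross-validation, with each candidate model compared against a baseline that always predicts the mean. The sketch below assumes that a "fit" function takes `(X, y)` and returns a `predict(row)` callable. The interface and the names `mean_baseline` and `loo_mae` are hypothetical. A Random Forest or XGBoost regressor could be wrapped to match this interface.

```python
"""Sketch: leave-one-out evaluation of a state-level model against a mean baseline.

Assumed interface (hypothetical): fit(X, y) -> predict(row) -> float.
"""


def mean_baseline(X, y):
    """Baseline 'model': always predict the mean of the training targets."""
    mean = sum(y) / len(y)
    return lambda row: mean


def loo_mae(fit, X, y):
    """Mean absolute error under leave-one-out cross-validation."""
    errors = []
    for i in range(len(y)):
        # Train on every state except state i, then predict state i.
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(X_train, y_train)
        errors.append(abs(predict(X[i]) - y[i]))
    return sum(errors) / len(errors)


# Example wrapper (requires scikit-learn; not executed here):
# from sklearn.ensemble import RandomForestRegressor
# def fit_rf(X, y):
#     model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
#     return lambda row: model.predict([row])[0]
```

A model is only useful here if its leave-one-out error is clearly lower than that of `mean_baseline`. With so few rows, that comparison matters more than the choice between Random Forest and XGBoost.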



## Implications for stakeholders

Results from this project could help people who work in public health (e.g., the CDC and health policymakers) better understand how interest in health relates to health outcomes. The project would also show which health topics are gaining popularity in each state, and whether awareness and search trends are associated with health behaviors and outcomes.



## Ethical, legal, societal implications

It is hard to determine what drives improvements in health outcomes, because health is shaped by social and economic context. While studies in this area often focus on external factors, this project concentrates on the interest and curiosity people express about health. It uses machine learning for prediction and to examine whether public interest in health signals health outcomes. The results would reveal associations and correlations, not causal effects. Because Google Trends data is aggregated, the results do not indicate individual behavior or responsibility; instead, they offer a view of social trends related to health.

----
## License

MIT

**Free Software, Hell Yeah!**

12 changes: 12 additions & 0 deletions presentations/libs/header-attrs-2.23/header-attrs.js
@@ -0,0 +1,12 @@
// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
// be compatible with the behavior of Pandoc < 2.8).
document.addEventListener('DOMContentLoaded', function(e) {
var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
var i, h, a;
for (i = 0; i < hs.length; i++) {
h = hs[i];
if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6
a = h.attributes;
while (a.length > 0) h.removeAttribute(a[0].name);
}
});
12 changes: 12 additions & 0 deletions presentations/libs/header-attrs/header-attrs.js
@@ -0,0 +1,12 @@
// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
// be compatible with the behavior of Pandoc < 2.8).
document.addEventListener('DOMContentLoaded', function(e) {
var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
var i, h, a;
for (i = 0; i < hs.length; i++) {
h = hs[i];
if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6
a = h.attributes;
while (a.length > 0) h.removeAttribute(a[0].name);
}
});
10 changes: 10 additions & 0 deletions presentations/libs/remark-css-0.0.1/default-fonts.css
@@ -0,0 +1,10 @@
@import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
@import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
@import url(https://fonts.googleapis.com/css?family=Source+Code+Pro:400,700);

body { font-family: 'Droid Serif', 'Palatino Linotype', 'Book Antiqua', Palatino, 'Microsoft YaHei', 'Songti SC', serif; }
h1, h2, h3 {
font-family: 'Yanone Kaffeesatz';
font-weight: normal;
}
.remark-code, .remark-inline-code { font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace; }
72 changes: 72 additions & 0 deletions presentations/libs/remark-css-0.0.1/default.css
@@ -0,0 +1,72 @@
a, a > code {
color: rgb(249, 38, 114);
text-decoration: none;
}
.footnote {
position: absolute;
bottom: 3em;
padding-right: 4em;
font-size: 90%;
}
.remark-code-line-highlighted { background-color: #ffff88; }

.inverse {
background-color: #272822;
color: #d6d6d6;
text-shadow: 0 0 20px #333;
}
.inverse h1, .inverse h2, .inverse h3 {
color: #f3f3f3;
}
/* Two-column layout */
.left-column {
color: #777;
width: 20%;
height: 92%;
float: left;
}
.left-column h2:last-of-type, .left-column h3:last-child {
color: #000;
}
.right-column {
width: 75%;
float: right;
padding-top: 1em;
}
.pull-left {
float: left;
width: 47%;
}
.pull-right {
float: right;
width: 47%;
}
.pull-right + * {
clear: both;
}
img, video, iframe {
max-width: 100%;
}
blockquote {
border-left: solid 5px lightgray;
padding-left: 1em;
}
.remark-slide table {
margin: auto;
border-top: 1px solid #666;
border-bottom: 1px solid #666;
}
.remark-slide table thead th { border-bottom: 1px solid #ddd; }
th, td { padding: 5px; }
.remark-slide thead, .remark-slide tfoot, .remark-slide tr:nth-child(even) { background: #eee }

@page { margin: 0; }
@media print {
.remark-slide-scaler {
width: 100% !important;
height: 100% !important;
transform: scale(1) !important;
top: 0 !important;
left: 0 !important;
}
}
10 changes: 10 additions & 0 deletions presentations/libs/remark-css/default-fonts.css
@@ -0,0 +1,10 @@
@import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
@import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
@import url(https://fonts.googleapis.com/css?family=Source+Code+Pro:400,700);

body { font-family: 'Droid Serif', 'Palatino Linotype', 'Book Antiqua', Palatino, 'Microsoft YaHei', 'Songti SC', serif; }
h1, h2, h3 {
font-family: 'Yanone Kaffeesatz';
font-weight: normal;
}
.remark-code, .remark-inline-code { font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace; }
72 changes: 72 additions & 0 deletions presentations/libs/remark-css/default.css
@@ -0,0 +1,72 @@
a, a > code {
color: rgb(249, 38, 114);
text-decoration: none;
}
.footnote {
position: absolute;
bottom: 3em;
padding-right: 4em;
font-size: 90%;
}
.remark-code-line-highlighted { background-color: #ffff88; }

.inverse {
background-color: #272822;
color: #d6d6d6;
text-shadow: 0 0 20px #333;
}
.inverse h1, .inverse h2, .inverse h3 {
color: #f3f3f3;
}
/* Two-column layout */
.left-column {
color: #777;
width: 20%;
height: 92%;
float: left;
}
.left-column h2:last-of-type, .left-column h3:last-child {
color: #000;
}
.right-column {
width: 75%;
float: right;
padding-top: 1em;
}
.pull-left {
float: left;
width: 47%;
}
.pull-right {
float: right;
width: 47%;
}
.pull-right + * {
clear: both;
}
img, video, iframe {
max-width: 100%;
}
blockquote {
border-left: solid 5px lightgray;
padding-left: 1em;
}
.remark-slide table {
margin: auto;
border-top: 1px solid #666;
border-bottom: 1px solid #666;
}
.remark-slide table thead th { border-bottom: 1px solid #ddd; }
th, td { padding: 5px; }
.remark-slide thead, .remark-slide tfoot, .remark-slide tr:nth-child(even) { background: #eee }

@page { margin: 0; }
@media print {
.remark-slide-scaler {
width: 100% !important;
height: 100% !important;
transform: scale(1) !important;
top: 0 !important;
left: 0 !important;
}
}
18 changes: 18 additions & 0 deletions presentations/test-xaringan.Rmd
@@ -0,0 +1,18 @@
---
title: "Test slide"
subtitle: "⚔<br/>with xaringan"
author: "TU"
institute: "Dickinson College"
date: "2026/01/23 (updated: `r Sys.Date()`)"
output:
xaringan::moon_reader:
lib_dir: libs
nature:
highlightStyle: github
highlightLines: true
countIncrementalSlides: false

---
class: center, middle

# This is Tu's test slide