Hi! I’m a PhD student in political geography with strong interests in geostatistics. However, neither my own nor any neighboring university explicitly offers such courses. I have a strong empirical background—probability theory and statistical inference, causal inference, machine learning, etc., as well as various programming courses. Does anyone have an idea of what sort of courses might help me build toward geostatistics in the absence of explicit offerings? I figured stochastic processes, time series, and network theory might be helpful, but I’d appreciate any further insight. Thanks!
Hi all, I’m working with some temperature data and wanted to see if anyone has recommendations for my situation. I am comparing a rare historical data set, with only single mean/max/min values for each month, to full records taken continuously over the past 10 years. Assume the data are directly comparable. We basically want to see if the most recent 10 years differ from the one historical year, with the hypothesis that the recent years would be warmer in most months.
Obviously having a single year’s worth of data isn’t ideal, but it is valuable for the system I’m studying as there are no comparable records elsewhere.
So far I have calculated the same metrics for the recent data, modeled them with a 95% confidence interval, and visually checked which historical data points fall outside that interval. From this it appears some months are warmer in the recent years than in the historical year, but I can’t really say much beyond qualitative statements.
I’m not well versed in Bayesian models, but is there a better way to compare these data and get some actual quantitative results? Or should I just stick to a more qualitative approach given the data’s limitations?
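One small refinement that may firm this up: each historical value is a single year’s observation, not an estimated mean, so a prediction interval (which adds the year-to-year spread to the uncertainty in the mean) is arguably the right yardstick, rather than a confidence interval for the mean. A minimal sketch with made-up July temperatures (the t-value 2.262 assumes 95% coverage and df = 9):

```python
import math
import statistics

def prediction_interval_95(values, t=2.262):
    """95% prediction interval for a single new observation,
    assuming roughly normal year-to-year variation (t for df = 9)."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)
    half = t * s * math.sqrt(1 + 1 / n)
    return m - half, m + half

# made-up recent July means for 10 years, plus one historical July value
recent_july = [21.3, 22.1, 20.8, 23.0, 21.9, 22.4, 21.1, 22.8, 21.6, 22.2]
historical_july = 19.4

lo, hi = prediction_interval_95(recent_july)
print(historical_july < lo)  # True: the historical year falls below the interval
```

Since twelve months are being compared at once, a multiple-comparisons adjustment (e.g., Bonferroni on the interval level) may also be worth considering.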
The data were pulled from patient record charts; the sample size is over 20,000. There is so much missing data that some columns are entirely blank and others have only 10 values. I can think of several reasons for the missingness: our study population is young, so many patients don’t have those diseases and the doctors never entered them in the records, or the people who pulled the data forgot some fields. For now, all I can do is ask the people who pulled the data.
In the meantime, what else should I do to handle categorical missing data? The majority are Y/N binary variables.
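While waiting to hear back, it can help to quantify the missingness per column and, for the Y/N fields, keep missing as an explicit third level rather than dropping rows — if blanks often mean "condition absent, so never recorded", the missingness itself is informative, and whether a blank can safely be recoded to "N" is exactly the question for the people who pulled the data. A sketch with hypothetical column names:

```python
# summarize missingness and recode Y/N fields with an explicit "Missing"
# level (column names here are hypothetical)
records = [
    {"diabetes": "N", "smoker": "Y"},
    {"diabetes": None, "smoker": "N"},
    {"diabetes": "Y", "smoker": None},
]

def missing_rate(records, col):
    """Fraction of records where the column is blank."""
    vals = [r.get(col) for r in records]
    return sum(v is None for v in vals) / len(vals)

def recode(records, col):
    """Keep missingness visible as its own category instead of deleting rows."""
    return [r.get(col) if r.get(col) is not None else "Missing" for r in records]

print(missing_rate(records, "diabetes"))  # 0.333...
print(recode(records, "smoker"))          # ['Y', 'N', 'Missing']
```

Columns that are entirely or almost entirely blank (10 of 20,000) carry essentially no information and are usually dropped rather than imputed.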
I’d like to model a dependent variable that is a continuous value between 0 and 1. It is not a count proportion; it’s the concentration of a drug needed to kill a disease.
I tried the following/my thoughts:
1. GLM with binomial family and logit link, i.e. logistic regression. The logit link, which keeps the mean in (0, 1), makes sense, but the binomial random component does not, since my error is continuous and must be bounded between 0 and 1. As you can see, the in-sample prediction on the training data looks quite different from the real label distribution. Also, there is a strange offset in my prediction.
2. Beta regression also does not work, because I have labels that are exactly 0.
3. I tried different links and different regularizers, with and without an intercept, but nothing seems to work.
Hence, I would be happy if someone could help me with the following questions:
1. Does it make sense to have a *compound* model, i.e. one model that predicts if the response is 0 or >0, and in case of >0 another model that predicts the label using a beta regression?
2. Logistic regression seems to be the go-to approach when labels are binary. However, mine are not. How can I determine a proper family for my problem?
3. Are GLMs not useful at all for my problem?
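On question 2: for a continuous response in [0, 1] with exact zeros, one standard option is the "fractional logit" of Papke and Wooldridge — fit the binomial GLM with logit link anyway, treating the binomial likelihood as a quasi-likelihood. The point estimates are consistent for the conditional mean even though y is not binary, and exact 0s and 1s are allowed; the two-part/hurdle model from question 1 (logistic for zero vs. nonzero, beta regression for the positives) is a standard alternative. A stdlib-only sketch on simulated data, just to show the estimating equations:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_fractional_logit(x, y, lr=0.5, steps=3000):
    """Quasi-MLE for E[y|x] = sigmoid(b0 + b1*x) with y in [0, 1].
    Solves the Bernoulli score equations sum((y - mu) * [1, x]) = 0
    by gradient ascent; y does not need to be binary."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            r = yi - sigmoid(b0 + b1 * xi)
            g0 += r
            g1 += r * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# simulated data with true mean sigmoid(1.5*x - 0.5); noise clipped to [0, 1]
random.seed(0)
x = [random.uniform(-2, 2) for _ in range(400)]
y = [min(1.0, max(0.0, sigmoid(1.5 * xi - 0.5) + random.gauss(0, 0.05))) for xi in x]
b0, b1 = fit_fractional_logit(x, y)
print(b0, b1)  # should land near -0.5 and 1.5
```

In practice a library GLM (e.g., a binomial family with robust standard errors) does the same thing; the point is that the binomial "family" here is only a working likelihood for the mean, so GLMs are not useless for this problem.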
Preface: It’s been a while since I’ve last needed to use data analysis and my memory has come up blank. I’m using SPSS for statistical analysis so any help that also uses this software would be much appreciated. Apologies if I have forgotten the correct lingo, again it’s been a while.
Context: I have a large set of mainly nominal data spread over a range of variables. I’ve performed a chi-squared test to check for significant variables, then an ANOVA on each of those variables to determine which other variables interact significantly with them. I now want to find out which factors (levels) within each variable interact with each other significantly. How can I do this?
Hey everyone, I have a Cox survival regression model, and due to non-linear martingale residuals I had to add extra variables that are quadratic and cubic transformations of a variable already in the model. How do I interpret such variables? As an example, I have:
>                  exp(coef)  exp(-coef)  lower .95  upper .95
>axil_nodes            1.332      0.7506     1.1861     1.4965
>I(axil_nodes^2)       0.986      1.0142     0.9782     0.9938
>I(axil_nodes^3)       1.000      0.9998     1.0001     1.0003
I get that normally, increasing ‘axil_nodes’ by one would multiply the hazard by 1.332, but how do I interpret it if transformations of this variable appear as other covariates in the same model?
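With polynomial terms, no single row of that table is "the" effect of axil_nodes any more: a one-unit increase moves the linear, squared, and cubed terms together, so the hazard ratio depends on the starting value. Assuming the three rows are axil_nodes, I(axil_nodes^2), and I(axil_nodes^3), a sketch that recovers the coefficients from the exp(coef) column and combines them:

```python
import math

# coefficients recovered from the posted exp(coef) column
# (assumed order: axil_nodes, axil_nodes^2, axil_nodes^3)
b1 = math.log(1.332)
b2 = math.log(0.986)
b3 = math.log(1.000)

def hr_one_unit(x):
    """Hazard ratio for going from x to x+1 nodes when the linear
    predictor contains b1*x + b2*x^2 + b3*x^3."""
    delta = b1 + b2 * ((x + 1) ** 2 - x ** 2) + b3 * ((x + 1) ** 3 - x ** 3)
    return math.exp(delta)

print(round(hr_one_unit(0), 3))   # 1.313 — going from 0 to 1 node
print(round(hr_one_unit(10), 3))  # 0.991 — going from 10 to 11 nodes
```

So the effect is no longer a constant 1.332: it is above 1 at low node counts and shrinks toward (and below) 1 as the count grows, which is exactly what the negative quadratic term encodes. Plotting the hazard ratio across the observed range of axil_nodes is usually the clearest way to report it.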
Hi, doing my psychology dissertation and it’s due so soon and I’m so stuck!!
Looking at predictors of theory of mind performance. IV/predictor variables were: executive function, five personality traits, autistic traits, and empathy. I was going to enter the IVs that are significantly correlated with my DV.
4 IVs correlated with my DV, but some of these are significantly correlated with each other and with some of my other IVs that did NOT correlate with my DV.
e.g. TOM significantly correlated with agreeableness, but agreeableness also significantly correlated with: empathy, EF, extraversion, autistic traits and neuroticism
Do I put all the IVs that correlate with each other into a regression so I am controlling for them? The problem is that would give me 8 predictors, not including age or gender.
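Before deciding on the basis of pairwise significance alone, it may help to check how severe the collinearity actually is. For two predictors the variance inflation factor reduces to 1/(1 − r²), and a common rough rule of thumb treats VIF above about 5–10 as problematic; SPSS can report VIFs via the collinearity-diagnostics option in linear regression. A hand-rolled sketch with made-up agreeableness/empathy scores:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical scores: agreeableness and empathy tend to move together
agree   = [3, 4, 5, 4, 2, 5, 3, 4]
empathy = [2, 4, 5, 5, 2, 4, 3, 4]

r = pearson_r(agree, empathy)
vif = 1 / (1 - r ** 2)  # VIF for one predictor regressed on the other
print(round(r, 2))      # 0.84
print(round(vif, 2))    # 3.44
```

A VIF around 3 would suggest the correlated predictors can usually coexist in one model; with roughly 8 predictors, the bigger constraint is sample size (a common heuristic is at least 10–15 cases per predictor).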
For my final year of undergrad, I’m debating between taking either a second sequence on mathematical analysis vs a graduate sequence in numerical analysis. Any suggestions on which I should choose?
I have already completed a two-course sequence on real analysis. Hence, I have a strong understanding of real analysis, but not at the level of Rudin, which I hear is extremely helpful for graduate coursework in statistics.
On the other hand, I have also taken an undergrad course in numerical analysis. The first course in the graduate numerical analysis sequence seems to have a lot of overlap with the undergrad course I took, although it goes into a bit more detail. The second course in the graduate sequence explores completely new topics and is much more advanced. This material would be very helpful in applied statistics, machine learning, and data science.
While it is possible for me to take both sequences in my final year, that would be *a lot* of work, seeing as the mathematical analysis class often demands 20+ hours per week and the numerical analysis class is at the graduate level and thus moves really fast and has lots of assignments. Hence, I want to avoid taking them both together if possible. One possibility is that I take the first course from the mathematical analysis sequence in fall and then skip to the second course of the numerical analysis sequence in spring, since the first graduate numerical analysis course overlaps heavily with the one I already took in undergrad. That would inevitably leave some gaps in my knowledge, though, and I would also miss the second half of Rudin’s analysis, which covers very valuable topics.
Any thoughts on this? I would appreciate any input.
P.s. I’m a math/stat and CS major aiming for a PhD in statistics focusing on applications.
Hey Reddit! I’ve been working with some NFL / NBA data for betting purposes, but I kept encountering the same problem in a few different contexts: getting as granular as I wanted to get massively reduced the sample size at each point, introducing a lot of noise / variance. I came up with a smoothing technique that seems logical and seems to tell a much more coherent story from the data, but I’m always wary of creating my own methodology, so I wanted to see what smarter statistical minds had to say. Here’s one example:
If the market projection for an NBA player is 7.5 rebounds, but there’s a straggling 8.5 somewhere, how much juice is it worth paying to play under 8.5? Put differently, if a player has a 50% chance to go under 7.5 rebounds, what are the chances they go under 8.5 rebounds, i.e. what are the chances they land on exactly 8 rebounds?
I gathered the 2022 data for the 31st – 40th rebounders. Here’s some of the data for Al Horford, as an example:
Mean = 7.7, Median = 7, SD = 2.9, Total Games = 69
4 rebounds – 9 games
5 rebounds – 7 games
6 rebounds – 7 games
7 rebounds – 11 games
8 rebounds – 7 games
9 rebounds – 5 games
10 rebounds – 9 games
11 rebounds – 7 games
While we could reasonably expect the data to be left- or right-skewed, there’s no reason to expect that Horford – or any other player who plays consistent minutes – should have more rebounds at a data point further from his median than at one closer to his median. It seems patently ridiculous to estimate that Horford should land on 8 rebounds and on 11 rebounds 10.1% of the time each, given that 8 is already above both his mean and median. My assumption is that the sample size simply isn’t large enough at each data point to coalesce on the true probability; indeed, every player I’ve looked at has some obvious outliers.
In an attempt to solve this problem, I came up with the following smoothing technique, which I’d like your feedback on:
To estimate the probability of landing on 6 rebounds, I added the number of games Horford landed on 4, 5, 6, 7, or 8 rebounds, and divided by 5. To estimate the probability of landing on 7 rebounds, I added the number of games Horford landed on 5, 6, 7, 8, or 9 rebounds, and divided by 5. And so on and so forth. Are there any flaws with this approach? I tried this process on a random set of data in a different context and confirmed that, at least in that one case, it didn’t artificially increase the correlation. **If all you had was Al Horford’s 2022 data and you wanted to make your best guess as to his true likelihood of getting exactly 8 rebounds, would you prefer to use the actual %, the smoothed %, or something else?** I also used Excel’s NORM.S.DIST function, so that could be another option.
Personally, I think the smoothed % provides a better projection than the actual %. However, I averaged the data together for the 31st through 40th rebounders, and at this point, the sample size is much larger at each point, and the overall curve for the actual % of those 10 guys landing on a certain number is pretty smooth. I’m inclined to think that with 10x the sample size, the actual % provides a better general projection than the smoothed %. That said, for rebounders 1 through 10, there’s still one big outlier right around the median, which is the number I’m most concerned with because that’s the number those players will likely be projected around. So, same question as before – with the data from 10 players combined, do you think the actual %, the smoothed %, or something else, would provide the most accurate projection? I realize this might be hard to answer without more context / the full data set. I’m happy to answer any questions or provide the excel doc, if there’s a way to do that through Reddit. Mostly, I want to make sure there’s not some glaring problem with that smoothing technique, because it’s proved illuminating in a number of different contexts. Thanks so much in advance for any insight!
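For what it’s worth, the procedure described above is a known technique: a moving average with a uniform kernel over the count histogram (a simple form of kernel smoothing), so it’s not an ad-hoc invention — it trades variance for bias in the usual way. One caveat: near the edges of the tabulated range, bins outside the table contribute zero, which biases the smoothed estimate downward. A sketch on the Horford counts:

```python
# 5-point moving-average smoother over the 2022 Horford rebound counts
counts = {4: 9, 5: 7, 6: 7, 7: 11, 8: 7, 9: 5, 10: 9, 11: 7}
total = 69  # total games; some fall outside the 4-11 range shown

def smoothed_prob(k, window=2):
    """Estimated P(exactly k rebounds): average the counts in the
    bins k-window .. k+window, then divide by total games."""
    neighbors = [counts.get(k + d, 0) for d in range(-window, window + 1)]
    return (sum(neighbors) / len(neighbors)) / total

print(round(smoothed_prob(8), 3))   # 0.113 (smoothed)
print(round(counts[8] / total, 3))  # 0.101 (raw)
```

A parametric alternative worth comparing: fit a count distribution (e.g., Poisson or negative binomial) to all 69 games and read P(X = 8) off the fitted model — that pools the entire sample into two or three parameters instead of five bins, and has no edge-bias problem.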