I’m trying to add a bit of statistics to a manuscript where the two groups in the data are obviously different.

I have two groups of students: those who went to tutoring and those who did not.

I wanted to run a chi-squared test on each class to see whether there was a significant difference in grade outcomes between the two groups.

The class sizes are large with 100+ students and the distributions between groups are very different.

I have whole letter grade data. The problem is that for some groups in some classes there is a zero count for D’s or F’s.

To resolve this I merged grades into wider bins. I have three arrangements:

1. Good grades, Moderate grades, and Bad grades

2. Successful in Course and Unsuccessful in Course

3. Received an A and Did not receive an A
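To make the three arrangements concrete, here is a minimal sketch of the binning in Python. The letter-to-bin mappings (for example, A and B counting as "Good") and the counts are assumptions made up for illustration, since the exact cut-offs are not stated above.

```python
from collections import Counter

# Hypothetical mappings: which letters fall in which bin is an assumption.
ARRANGEMENTS = {
    "good/moderate/bad": {
        "A": "Good", "B": "Good", "C": "Moderate", "D": "Bad", "F": "Bad",
    },
    "successful/unsuccessful": {
        "A": "Successful", "B": "Successful", "C": "Successful",
        "D": "Unsuccessful", "F": "Unsuccessful",
    },
    "A/not A": {
        "A": "A", "B": "Not A", "C": "Not A", "D": "Not A", "F": "Not A",
    },
}


def collapse(counts, mapping):
    """Sum letter-grade counts into the coarser bins given by mapping."""
    binned = Counter()
    for letter, n in counts.items():
        binned[mapping[letter]] += n
    return dict(binned)


# Made-up counts for one group in one class; note the zero count for F.
tutored = {"A": 40, "B": 35, "C": 20, "D": 5, "F": 0}

for name, mapping in ARRANGEMENTS.items():
    print(name, collapse(tutored, mapping))
```

Each arrangement removes the empty F cell by merging it with a neighbouring grade, at the cost of discarding some detail about the grade distribution.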

All three tests indicated that grade outcome depends on tutoring attendance at the 99% confidence level.
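For reference, the kind of test described here can be sketched as follows, using only the Python standard library. The counts are made up for illustration, the function name `chi_squared` is hypothetical, and this is the plain Pearson statistic with no continuity correction, so it may differ slightly from what a statistics package reports for a 2x2 table.

```python
def chi_squared(table):
    """Return (statistic, degrees_of_freedom, expected) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


# Critical values of the chi-squared distribution at alpha = 0.01
CRITICAL_01 = {1: 6.635, 2: 9.210}

# Made-up counts: rows = tutored, not tutored; columns = successful, unsuccessful
observed = [[95, 5], [70, 30]]

stat, dof, expected = chi_squared(observed)
significant = stat > CRITICAL_01[dof]
print(f"chi2 = {stat:.2f}, dof = {dof}, significant at the 1% level: {significant}")
```

The `expected` table is returned as well because the usual validity check for this test looks at the expected counts in each cell rather than the observed ones.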

My question is: would it be okay to report all three, or is there some reason some of these arrangements would be less suitable than others?

The point of the paper is to suggest, among other things, that tutoring is good for student performance.

I have no formal education in stats, so I’d really appreciate any help. 🙂