Introduction
There will be a group project associated with this course. It provides an opportunity for students to practice formulating a data science question, consider what data and analysis would be needed to test a hypothesis associated with the question, locate and analyze that real-world data, come to a conclusion, and communicate the results in a scientific report format.
Group formation
The project will start in week 2, but we will begin forming groups immediately since time is short in the summer session. We will provide a team signup sheet where you can tell us who your chosen team members are or ask to be assigned to a team. If you have not found a group by Thursday of week 1, we will assign you. If you are assigned to a team and want to change groups after meeting it, do so as early as possible; we will provide a procedure (a secondary Google form for changing groups).
Group size
Groups should ideally consist of 4-5 students, though we have had single-member groups and groups as large as 7. The extremes pose challenges. With too few students, it is a lot of work for fewer people. With too many, it is difficult for everyone to get enough involvement. If you decide on fewer than 4-5 students, we suggest you plan early and choose an attainable question, dataset and analysis. If you decide on a larger team, you need to be very organized about who is doing which task and ensure that everyone is given a significant enough role. You may need to choose a more complex problem or put more into the analysis.
Project structure
The entire project will be turned in to a group GitHub repository that we will create for you and give you access to. You will practice version control there, keep your data there, and turn in all the pieces there in the form of Jupyter notebooks and a final video.
The project will consist of multiple checkpoints to help you break the task down into manageable pieces, culminating in a final report, a video presentation that you record, and a group review form for giving feedback on each other's participation. It might look intimidating, but each element actually helps you draft your final report. By the time you get to the second checkpoint, you will have a rough draft complete and will just need to refine it in the last week of the class. We will also be helping with the last assignment to reduce the time demand, so you can put the most effort into the project. The pieces are as follows:
Previous quarter examples
You can see some examples of previous work here: https://github.com/COGS108/FinalProjects-Fa21 If you open any of the group_xx-xxxx repos and look for 'finalproject' notebooks, you will find examples. I will add specific ones later; the link above was chosen so that you can search that particular quarter for whatever interests you.
Description of project parts
1. Previous project review (Due Friday 7/14/2023 at 11:59pm)
Here you will review 2 previous projects as a team and submit one Google form that asks a few questions about each project. The questions are designed to orient you to the project and give you a sense of what you are going to do. They will also get you to think about: 1) What will the final report look like in terms of structure? 2) What are some of the questions you can ask? 3) How will you go about answering those questions? We recommend that each of you individually look over the projects and questions and make notes, then come together as a group to put together the final submission. However, if it works better for you to do it all as a team, that is up to you. You will be surprised at what you will be able to formulate and refine. While it helps to see what has been done, don't be afraid to think outside the box; this is what good scientists and engineers do in order to take human knowledge and achievement in new directions!
Requirements:
Note: If you would like, you can select a different project from previous COGS108 projects in our GitHub repo or in the standard COGS108 course repo, but don't spend much time searching given the timing of the assignment.
2. Project proposal (Friday 7/14/23 at 11:59pm)
In this part of the project you will, following the format linked here (available in the projects GitHub repo for COGS108 here),
It is understood that this early on all of this is a draft, and that we have a short time to put it together, but the more effort that goes into this part, the more easily you will cruise through the rest. The idea is to get far enough to have a sense of whether your question makes sense and is answerable with data, or needs refinement. We will look at your submissions and provide feedback to help you refine the question and your plans for how you will work with and analyze the data (not tear apart what you submit). Asking the right question can mean the difference between a smooth project and a project that needs many significant changes (such as a totally different dataset). Given summer session timing, we have not yet done a great deal of review of data science technique, and we have kept this in mind in planning the project. You will add further specifics as you go and some aspects will change, which is why we have a data checkpoint and an exploratory data analysis checkpoint later.
Requirements:
3. Checkpoint 1: Data (Friday 7/21/23 at 11:59pm):
For this checkpoint you should have located one or more datasets associated with your project, accessed them, and performed most or all of your data wrangling and cleaning in order to get the data into a usable form for your analysis. You will write up a short description of the data (where it comes from, what it represents, and how it is structured) and include your code, well commented. You are not necessarily expected to have performed much analysis at this point, though you should have done some basic visualization or otherwise shown that the data is going to be useful for your question/hypothesis. You may decide at this time that you need more data or different data. To complete the checkpoint, if you need more data you should have a good idea of where you can get it; perhaps you already have it and have not completed the secondary cleaning and wrangling, but have the framework from the first dataset. The main idea is: get your data, get it wrangled so you can operate on it, and take a quick look at it. Plots are not required, and certainly no significant analysis is. But try to have a sense of whether this is the dataset you will be sticking with. If you need more data, you should have put significant work into getting it, even if it is too late to fully incorporate it into your checkpoint.
Data description requirements: it is structured (or you can make it structured), what it contains, and any characteristics needed for your data science project
Requirements:
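As an illustration only (not part of the official requirements), the Checkpoint 1 workflow described above, which is to get the data, wrangle it into a usable form and take a quick look, might be sketched as follows in a notebook. This sketch assumes pandas is available; the inline `RAW_CSV` text, its column names, and the `load_and_clean` helper are all hypothetical stand-ins for whatever dataset and cleaning steps your project actually needs.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a downloaded CSV file.
RAW_CSV = """Participant ID,Age ,Hours Slept,Score
1,21,7.5,88
2,19,,92
3,n/a,6.0,75
3,n/a,6.0,75
4,23,8.0,
"""


def load_and_clean(csv_text: str) -> pd.DataFrame:
    """Read raw CSV text and return a tidy DataFrame (illustrative steps only)."""
    # Treat the placeholder string "n/a" as a missing value while reading.
    df = pd.read_csv(io.StringIO(csv_text), na_values=["n/a"])
    # Standardize column names: trim whitespace, lowercase, use underscores.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Remove exact duplicate rows.
    df = df.drop_duplicates()
    # Drop rows missing the outcome variable this hypothetical project needs.
    df = df.dropna(subset=["score"])
    return df.reset_index(drop=True)


clean = load_and_clean(RAW_CSV)

# A quick look at the result: size, column types, and remaining missing values.
print(clean.shape)
print(clean.dtypes)
print(clean.isna().sum())
```

Which rows to drop and which missing values to keep depends on your question; here only the outcome column is required to be present, and the remaining gaps are left visible so they can be described in the write-up.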
4. Checkpoint 2: EDA (Friday 7/28/23 at 11:59pm):
Now that you have your data, the goal for this checkpoint is to work towards rejecting or failing to reject the null hypothesis and to gain insight into your data science question. Start by exploring the basic statistics of your data: central tendency and variability. If the data has a normal distribution you can use standard statistics; otherwise you can explore with nonparametric statistics. Then, given those insights, produce various visualizations, tables or other ways to 'look at' the data. Finally, as you probably have an idea of what type of modeling you would like to do, execute at least a good portion of the modeling: regression, curve fits, or, if you are doing machine learning, get as far as possible on that. It is understood that you may expand this for the final report; that is why we call it 'exploratory' instead of 'super final done from every angle' or similar. Keep in mind that the more you explore your data, the better the picture you will have in your head of the final statements you can make about it. Overall, consider this a first rough draft of the report, in which some sections, such as the results and the conclusion, might not be fully written. You want to explore as far as possible and have a good idea of the last bits of modeling you need to do in order to make and support a statement regarding your question and hypothesis.
Requirements:
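As an illustration only (not part of the official requirements), the first exploration steps described for Checkpoint 2 could look like the sketch below: summarize central tendency and variability for two groups, then compare them with a nonparametric test. The permutation test is just one possible nonparametric choice, picked here because it needs only the Python standard library; the two groups, their values, and the helper names `summarize` and `permutation_test` are hypothetical.

```python
import random
import statistics


def summarize(values):
    """Return basic measures of central tendency and variability."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Makes no normality assumption: group labels are shuffled repeatedly to see
    how often a difference at least as large as the observed one arises by chance.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (count + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups (made-up numbers for illustration).
group_a = [6.1, 7.0, 6.8, 7.4, 6.5, 7.2, 6.9, 7.1]
group_b = [5.2, 5.9, 6.0, 5.5, 6.3, 5.8, 5.1, 6.2]

print(summarize(group_a))
print(summarize(group_b))

p_value = permutation_test(group_a, group_b)
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value here would be evidence against the null hypothesis of no difference between the groups; in a real project you would pair a test like this with plots and choose the method that fits your data and question.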
5. Final report and video: Report due Friday 8/4/2023 at 11:59pm
For now, please reference the final project template. Essentially you will go from the EDA to the final modeling (if you are modeling), write your results and conclusion sections, and refine the draft you had at the EDA checkpoint. Correct spelling and grammar, clean up the code, and include comments. The conclusion is where you consider the scientific hourglass shape: a broad starting point narrows down to a hypothesis, then the breadth increases again until the conclusion, where you make your statements about the implications and future work.
Report Requirements:
Final report video (due by Saturday at 12 noon, so we can create a list for people to watch and do video reviews for EC)
Video instructions: a 5-8 minute video introducing your project, data, results and conclusion. A set of slides is encouraged but not required; you can present in whatever way is most effective for your project. The video should be planned, however, so outline it first.
Requirements:
Group peer review form here
Here you will find several links to open datasets from a variety of sources. You can also locate other datasets yourself or use your own datasets, provided they are stripped of personally identifiable information (PII) and you have the rights to use them.
More info to come...