Introduction
There will be a group project associated with this course. It gives students an opportunity to practice formulating a question and hypothesis related to modeling and data analysis, consider what data and analysis would be needed to test that hypothesis, locate and analyze real-world data, come to a conclusion, and communicate the results in a scientific report format. The focus is on the data, modeling, and analysis: the reasoning behind model selection, where it can take us in the future, and how it can be applied to scientific and engineering work.

Group formation
The project will start in week 2 (with the paper review), but we will begin forming groups immediately since time is short in the summer session. We will provide a team signup sheet where you can tell us who your chosen team members are or ask to be assigned. If you have not found a group by Thursday of week 1, we will assign you to one. If you are assigned and, after meeting your team, want to change groups, do so as early as possible; we will provide a procedure (a secondary Google form for changing groups).

Group size
Groups should ideally consist of 4-5 students, though we have had single-member groups and groups as large as 7. The extremes pose challenges. With too few students, it is a lot of work for each person. With too many, it is difficult for everyone to get enough involvement. If you decide on fewer than 4-5, we suggest you plan early and choose an attainable question, dataset, and analysis. If you decide on a larger team, you need to be very organized about who is doing which task and ensure that everyone has a significant enough role. You may need to choose a more complex problem or put more into the analysis.

Project structure
The entire project will be turned in to a group GitHub repository that we will create for you and give you access to.
You will practice version control there, keep your data there, and turn in all the pieces there in the form of Jupyter notebooks and a final video. The project consists of multiple checkpoints to help you break the task into manageable pieces, culminating in a final report, a video presentation that you record, and a group review form for giving feedback on each other's participation. It might look intimidating, but each element actually helps you draft your final report. By the time you get to the second checkpoint, you will have a rough draft complete and will just need to refine it in the last week of the class. We will also be helping with the last assignment to reduce the time demand, so you can put the most effort into the project. The pieces are as follows:
Previous quarter examples
We will provide some examples of similar projects, along with the old final project, which was an individual project but provides guidance on the modeling and data analysis path.
Description of project parts

1. Previous project review (Due Sat 7/15/2023 at 11:59pm)
Here you will review 2 papers as a team and submit one Google form that asks a few questions about each paper. The questions are designed to orient you to the project and give you a sense of what you are going to do. They will also get you to think about 1) What will the final report look like in terms of structure? 2) What are some of the questions you can ask? 3) How will you go about answering those questions through the modeling and data analysis structure? We recommend that each of you individually look over the papers and questions and make notes, then come together as a group to put together the final submission. However, if it works better for you to do it all as a team, that is up to you. You will be surprised at what you will be able to formulate and refine. While it helps to see what has been done, don't be afraid to think outside the box; this is what good scientists and engineers do in order to take human knowledge and achievement in new directions!
Requirements:

2. Project proposal and checkpoint 1: Data (Sat. 7/15/23 at 11:59pm)
In this part of the project you will, following the format linked here (available in the projects GitHub repo for COGS109 here),
It is understood that, this early on, all of this is a draft and that we have a short time to put it together, but the more effort that goes into this part, the more easily you will cruise through the rest. The idea is to get far enough to have a sense of whether your question makes sense and is answerable with data, or needs refinement. We will look at your submissions and provide feedback to help you refine the question and your plans for working with and analyzing the data (not tear apart what you submit). Asking the right question can mean the difference between a smooth project and a project that needs many significant changes (such as a totally different dataset). Given summer session timing, we have not yet done a great deal of review of modeling and data analysis techniques (at least not the modeling part), and we have kept this in mind in planning the project. You will add further specifics as you go and some aspects will change, which is why we have an exploratory data analysis checkpoint later. Details of the data checkpoint portion:
You will not necessarily have performed much analysis at this point, though you should have done some basic visualization or otherwise shown that the data is going to be useful for your question/hypothesis. You may decide at this time that you need more data or different data. To complete the checkpoint, if you need more data you should have a good idea of where you can get it; perhaps you already have it but have not completed the secondary cleaning and wrangling, and have the framework from the first dataset. The main idea is: get your data, get it wrangled so you can operate on it, and take a quick look at it. Plots are not required, and certainly no significant analysis is. But try to have a sense of whether this is the dataset you will be sticking with. If you need more data, you should have put significant work into getting it, even if it is too late to fully incorporate it into your checkpoint.
Data description requirements: it is structured (or you can make it structured), what it contains, and any characteristics needed for your data science project
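As a rough illustration of "get your data, get it wrangled, take a quick look," here is a minimal sketch using only the Python standard library. The CSV text, column names, and the helpers `load_rows` and `quick_look` are all hypothetical, invented for this example; your own notebook would read your actual dataset and may well use other tools.

```python
import csv
import io

# Hypothetical stand-in for a real file; in a notebook you would open your own CSV.
RAW = """subject_id,age,reaction_ms
1,21,312
2,,298
3,19,
4,23,305
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def quick_look(rows):
    """Return the row count, column names, and number of empty cells per column."""
    columns = list(rows[0].keys()) if rows else []
    missing = {col: sum(1 for r in rows if r[col] in ("", None)) for col in columns}
    return {"n_rows": len(rows), "columns": columns, "missing": missing}

rows = load_rows(RAW)
summary = quick_look(rows)
print(summary)
```

A summary like this is often enough to tell whether the dataset is structured, what it contains, and how much cleaning is still ahead of you.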
Requirements:

3. Checkpoint 2: EDA (Tues. 8/1 at 11:59pm)
Now that you have your data, the goal for this checkpoint is to work towards rejecting or failing to reject the null hypothesis and to gain insight into your data science question. Start by exploring the basic statistics of your data: central tendency and variability. If the data are approximately normally distributed you can use standard (parametric) statistics; otherwise you can explore with nonparametric statistics. Then, given those insights, produce visualizations, tables, or other ways to 'look at' the data. Finally, as you probably have an idea of what type of modeling you would like to do, execute at least a good portion of the modeling: regression, curve fits, or, if you are doing machine learning, as much of that as possible. It is understood that you may expand this for the final report; that is why we call it 'exploratory' instead of 'super final done from every angle' or similar. Keep in mind that the more you explore your data, the better the picture you will have of the final statements you can make about it. Overall, consider this a first rough draft of the report, potentially with missing sections: for example, the conclusion and results sections might not be fully written. You want to explore as far as possible and have a good idea of the last bits of modeling you need in order to make and support a statement regarding your question and hypothesis.
Requirements:
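To illustrate the kind of exploration this checkpoint describes (this is an example, not one of the requirements), here is a minimal standard-library sketch: summary statistics for central tendency and variability, a rough skewness value as an informal hint about normality, and an ordinary least-squares line fit. The data values and the helper names `describe`, `sample_skewness`, and `fit_line` are hypothetical. For a formal normality check you would typically reach for a dedicated test such as Shapiro-Wilk (`scipy.stats.shapiro` in SciPy) rather than skewness alone.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]

def describe(values):
    """Central tendency and variability for one numeric variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,                     # robust spread, useful for skewed data
    }

def sample_skewness(values):
    """Rough asymmetry measure; values far from 0 suggest a non-normal shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

stats_hours = describe(hours)
skew_scores = sample_skewness(scores)
slope, intercept = fit_line(hours, scores)
print(stats_hours, skew_scores, slope, intercept)
```

If the mean and median differ noticeably or the skewness is far from zero, that is a cue to lean on the median, the IQR, and nonparametric methods instead.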

4. Final report (Sat. 8/5 at 5pm)
Requirements:

4B. Final report video portion (Sat. 8/4 at 5pm)
Video instructions: record a 5-8 minute video introducing your project, data, results, and conclusion. It can be a set of slides (encouraged, but not a hard requirement), or you can present in whatever way is most effective for your project. It should be planned out, however, so outline it first.
Requirements:

4C. Group member review (Sat. 8/5 at 5pm)
You will review each other's participation in the project. We will provide a Google form; please also include a summary in your report. This has an impact on each other's grades, and we will consider the comments when deciding individuals' final project grades.

4D. Extra credit video reviews (Mon 8/7 at 5pm)
Watch the video presentation summaries of the other group projects and then submit a Google form survey answering various questions. Each form will not take long, and you receive 0.5% per video review, up to 6%. You'll gain additional experience by learning about the techniques other groups used and the challenges they faced.

Here you will find several links to open datasets from a variety of sources. You can also locate other datasets yourself, or use your own datasets if they are stripped of personally identifiable information (PII) and you have the rights to use them.
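If you do bring your own dataset, one simple way to strip PII is to drop the identifying columns before the data ever reaches your repository. The sketch below assumes a CSV with hypothetical column names (`name`, `email`, `phone`); which fields count as PII depends on your dataset, and removing direct identifiers alone does not guarantee that individuals cannot be re-identified from the remaining columns.

```python
import csv
import io

# Hypothetical column names; adjust to whatever identifies people in your data.
PII_COLUMNS = {"name", "email", "phone"}

def strip_pii(rows, pii_columns=PII_COLUMNS):
    """Return copies of the rows with the listed PII columns removed."""
    return [{k: v for k, v in row.items() if k not in pii_columns} for row in rows]

# Invented example data standing in for a real file.
raw = "name,email,age,score\nAda,ada@example.com,21,88\nLin,lin@example.com,19,92\n"
rows = list(csv.DictReader(io.StringIO(raw)))
clean = strip_pii(rows)
print(clean)
```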
More info to come...