Should you teach Python or R for data science?

Last week, I published a post titled Lessons learned from teaching an 11-week data science course, detailing my experiences and recommendations from teaching General Assembly's 66-hour introductory data science course. In the comments, I received the following question:

I'm part of a team developing a course, with NSF support, in data science. The course will have no prerequisites and will be targeted for non-technical majors, with a goal to show how useful data science can be in their own area. Some of the modules we are developing include, for example, data cleansing, data mining, relational databases and NoSQL data stores. We are considering as tools the statistical environment R and Python and will likely develop two versions of this course. For now, we'd appreciate your sense of the relative merits of those two environments. We are hoping to get a sense of what would be more appropriate for computer and non-computer science students, so if you have a sense of what colleagues that you know would prefer, that also would be helpful.

That's an excellent question! It doesn't have a simple answer (in my opinion) because both languages are great for data science, but one might be better than the other depending on your students and your priorities.

At General Assembly in DC, we currently teach the course entirely in Python, though we used to teach it in both R and Python. I also mentor data science students in R, and I'm a teaching assistant for online courses in both R and Python. I enjoy using both languages, though I have a slight personal preference for Python specifically because of its machine learning capabilities (more details below).

Here are some questions that might help you (as educators or curriculum developers) assess which language is a better fit for your students:

Do your students have experience programming in other languages?

If your students have some programming experience, Python may be the better choice because its syntax is more similar to that of other languages, whereas many programmers find R's syntax unintuitive. If your students don't have any programming experience, I think both languages have an equivalent learning curve, though many people would argue that Python is easier to learn because its code reads more like regular human language.

Do your students want to go into academia or industry?

In academia, especially in the field of statistics, R is much more widely used than Python. In industry, the data science trend is slowly moving from R towards Python. One contributing factor is that companies using a Python-based application stack can more easily integrate a data scientist who writes Python code, since that eliminates a key hurdle in "productionizing" a data scientist's work.

Are you teaching "machine learning" or "statistical learning"?

The line between these two terms is blurry, but machine learning is concerned primarily with predictive accuracy over model interpretability, whereas statistical learning places a greater priority on interpretability and statistical inference. To some extent, R "assumes" that you are performing statistical learning and makes it easy to assess and diagnose your models. By contrast, scikit-learn, by far the most popular machine learning package for Python, is more concerned with predictive accuracy.