
CST 383

CST 383 - Introduction to Data Science

In data science, data analysis and machine learning techniques are applied to visualize data, understand trends, and make predictions. In this course students will learn how to obtain data, preprocess it, apply machine learning methods, and visualize the results. A student who completes the course will have enough theoretical knowledge, and enough skill with modern statistical programming languages and their libraries, to define and perform complete data science projects.

Prerequisite(s)/Corequisite(s): (Prereq: CST 238 with a C- or better)

Typically Offered: Fall, Spring

Units: 4

My Experience in CST-383 - Introduction to Data Science:

This is another class that was both informative and invaluable, and the topic is one I literally didn't even know about until I took the class.

I got to do some very useful and educational experimenting with Data Science in Python, NumPy, Pandas, Seaborn, MatPlotLib, and SciKit Learn.

The thing is, I didn't know about any of that stuff until this class, at all! It's exactly what I would have been doing all this time if I had known about it, so it's almost good I didn't!

Data Science is super powerful. Ever since I was a kid I have written computer programs to help me figure stuff out. Over the years I've used Java with console output, Swing GUI program control, and the Java 2D drawing API for rendering graphics. HyperCard could help visualize things, Excel (Gnumeric) can make graphs of huge-ish data sets, and JavaScript can get at JSON and draw with CSS graphics, HTML5 graphics, and text graphics. C and C++ can do fast computations with algorithms.

But none of that compares to Python plus NumPy (which is largely written in C) with Pandas, Seaborn and MatPlotLib graphing, and Machine Learning with SciKit Learn. The system can help you wrangle data sets the size of which you would not believe, and without writing loops: the loops are built into the NumPy and Pandas operations. df[df["age"] > 5]["age"].mean() gives you the average age of everyone over 5 years old in your dataframe, even if it holds millions of people. That's it. That one blip loops over everything, finds everyone over five, computes the average, and spits it out, all in optimized compiled code. By rough example comparison, the same computation written as a regular Python loop runs on the order of 50 times slower, generally speaking; the exact factor depends on the data and the operation.
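As a rough sketch of that one-liner (not code from the class), here is the filter-and-average done both ways. The data is made up: the `age` column is just a million random integers, and the names `vectorized_mean` and `loop_mean` are only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: one million random ages from 0 to 99 (not real people).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"age": rng.integers(0, 100, size=1_000_000)})

# Vectorized version: the comparison builds a True/False mask for every row,
# the mask selects the matching rows, and mean() averages what is left.
vectorized_mean = df[df["age"] > 5]["age"].mean()

# The same job as a plain Python loop, for comparison.
total = 0
count = 0
for age in df["age"].tolist():
    if age > 5:
        total += age
        count += 1
loop_mean = total / count

# Both approaches should agree on the answer.
print(vectorized_mean, loop_mean)
```

Wrapping each version in `time.perf_counter()` calls is one way to see the speed difference on your own machine; the ratio will vary with the data and the hardware.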

So the power at your fingertips is enormous, compounded by the enormousness of the data set you're using, compounded again by the fact that after you wrangle the data you can throw it at Machine Learning algorithms that will train on the data and then make predictions for data they have not seen before!!! For example, predicting home values from home data!!! Like, this class showed me how to replace myself in the real estate appraisal world with programming skills I didn't have until now.
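The train-then-predict idea can be sketched in a few lines with SciKit Learn (imported as `sklearn`). This is only an illustration, not the class project: the homes are synthetic, the pricing rule is invented, and the column names `square_feet`, `bedrooms`, and `price` are assumptions. Linear regression is used here simply because it is one of the simplest models to show.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic homes: sizes and bedroom counts are random, not real listings.
rng = np.random.default_rng(seed=42)
n_homes = 500
homes = pd.DataFrame({
    "square_feet": rng.uniform(600, 3500, size=n_homes),
    "bedrooms": rng.integers(1, 6, size=n_homes),
})

# Invented pricing rule plus random noise, so there is a pattern to learn.
homes["price"] = (
    50_000
    + 200 * homes["square_feet"]
    + 10_000 * homes["bedrooms"]
    + rng.normal(0, 20_000, size=n_homes)
)

X = homes[["square_feet", "bedrooms"]]
y = homes["price"]

# Hold back 20% of the homes so the model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train on the training homes.
model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out homes (1.0 would be a perfect fit).
r2 = model.score(X_test, y_test)

# Predict the price of a home that is not in the data set at all.
new_home = pd.DataFrame({"square_feet": [1800], "bedrooms": [3]})
predicted_price = model.predict(new_home)[0]

print(f"R^2 on held-out homes: {r2:.3f}")
print(f"Predicted price: ${predicted_price:,.0f}")
```

Real appraisal data would be much messier than this, so most of the work would go into the wrangling that happens before `fit` is called.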

The way it works is: Python with NumPy wrangles the data, Pandas gives the data labels, Seaborn and MatPlotLib draw the graphs, and SciKit Learn has the Machine Learning algorithms.
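Here is a minimal sketch of the first three pieces of that division of labor working together. The numbers are made up, and the column names and the output file name `homes_scatter.png` are assumptions for illustration. Seaborn is left out to keep the example short; it is built on top of MatPlotLib.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file instead of opening a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy: the raw numbers, with no labels attached (made-up values).
raw = np.array([
    [1200, 350_000],
    [1500, 420_000],
    [900, 280_000],
    [2000, 560_000],
])

# Pandas: the same numbers, now with column labels you can refer to by name.
homes = pd.DataFrame(raw, columns=["square_feet", "price"])
homes["price_per_sqft"] = homes["price"] / homes["square_feet"]

# MatPlotLib: draw a scatter plot of the labeled columns and save it.
fig, ax = plt.subplots()
ax.scatter(homes["square_feet"], homes["price"])
ax.set_xlabel("Square feet")
ax.set_ylabel("Price ($)")
ax.set_title("Made-up home prices")
fig.savefig("homes_scatter.png")
```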

So I'm super excited about this knowledge and literally cannot wait to put it to use. I am going to take it and run with it, but I have a lot of responsibilities between me and programming time, so I am working up to that. I'm not going to show a lot of code here: first of all, there isn't much, because it's such a powerful language; second of all, I'm five classes past it now, so I don't want to botch the syntax; and I don't want a bunch of stuff that can be horked later. So just to be safe I'll just show a few graphs we made.

Here are links to my journals:

Links:

I put some explanations in my journals as I came across them.


[Images: Data Science charts made in the class, plus an exciting pie chart out of AccountBlaster in action]

List of paradigms I practiced during the class:
  • Python
  • Python Object Oriented Programming (for fun, not part of the class)
  • Python Group Programming
  • NumPy
  • Pandas (Labels for data)
  • Seaborn (graphics, graphs)
  • MatPlotLib (graphics, graphs)
  • SciKit Learn (Machine Learning Training on Data & Data Predictions)
  • more

Thank You CSUMB for the very insightful and very educational class! The material is extremely broadly applicable and useful, as well as extremely well presented. On top of that, I didn't even know about Data Science until I took the class. The class ended in mid-2022.
