Blog

  • Recommend the MIT “missing Computer Science semester” lectures

    If you code using GUIs or a mouse, go do the MIT “The Missing Semester of Your CS Education” course as soon as possible.

    I have been coding off and on for more than 20 years now. I have used Linux desktop systems, both on the recommendation of colleagues and to work more smoothly with them. I use the shell fairly regularly. And yet, just 10 minutes into the first lecture, I had already learned a few things. This missing information both makes me more efficient and gives me a structured understanding of why some things I have tried worked well and others not so much.

    One caveat is that passively listening to the lectures is not enough, even though it is easy to do. My one complaint about the videos is that they sometimes move on from the typed commands too quickly when I have a question about spacing, capitalization, or the direction of a slash. Working through the examples, and pausing the video until I have played with variations, really helps me see how things behave on my own machine. I like this much more than some of the preloaded, artificial environments I have used to learn coding online. I can learn that way, but when something does not work in your own environment, it is hard to know where the problem is.

  • Visual Encyclopedia of Chemical Engineering

    While looking for something totally different, I ran across this awesome resource from my alma mater, the University of Michigan. Picking a flow meter for your next project? Trying to remember the difference between an adsorber and an absorber? Curious how similar the unit operations are for making distilled spirits and a pharmaceutical active ingredient (API)? Look no further!

  • Micro tutorial: Calculating Weighted Averages with Weighting applied to Multiple Columns in a Pandas Dataframe

    I am recording this solution here because, when I searched for how to do this, most answers explained how to weight one pandas column by another column, which I had already done successfully on my own. For me, the trick was applying one weighting column across several columns. For example, if I mix multiple batches into one tank, I might want the weighted average of every attribute I am tracking for that lot.

    import pandas as pd

    # make a list of all the column names to which the weighting will be applied
    var_cols = ['chiral_purity', 'water_content', '[impurity]']

    # dfA is the master dataframe holding all batch data; within each group, every property
    # is weighted by InputWeight, i.e. how much of that batch was added to the tank
    Output_Lot = dfA.groupby('InputBatch').apply(
        lambda x: pd.Series([sum(x[v] * x['InputWeight']) / sum(x['InputWeight']) for v in var_cols])
    )

    # use the input column names with the suffix "_mix" to indicate the weighting has been performed
    Output_Lot.columns = [e + '_mix' for e in var_cols]

    This solution was inspired by this post.
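    The same weighted averages can also be computed without a per-group `apply`: multiply all property columns by the weight in one vectorized step, sum per group, then divide by the summed weights. This is a minimal sketch under stated assumptions, not the method from the post above: the example data, the helper name `weighted_means`, and the `OutputLot` column (standing in for whichever column identifies each mixed lot) are all hypothetical.

```python
import pandas as pd

# Hypothetical example data: each row is one input batch added to a tank.
# 'OutputLot' (the mixed lot each batch went into) and all values are made up.
dfA = pd.DataFrame({
    'OutputLot':     ['L1', 'L1', 'L2', 'L2', 'L2'],
    'InputBatch':    ['B1', 'B2', 'B3', 'B4', 'B5'],
    'InputWeight':   [100.0, 300.0, 50.0, 50.0, 100.0],
    'chiral_purity': [99.0, 98.0, 97.0, 99.0, 98.0],
    'water_content': [0.5, 0.1, 0.2, 0.4, 0.3],
})

var_cols = ['chiral_purity', 'water_content']


def weighted_means(df, group_col, weight_col, cols):
    """Weighted mean of each column in `cols` within each group of `group_col`."""
    # Multiply every property column by the row's weight in one vectorized step.
    weighted = df[cols].mul(df[weight_col], axis=0)
    # Sum the weighted values and the weights per group.
    num = weighted.groupby(df[group_col]).sum()
    den = df.groupby(group_col)[weight_col].sum()
    # Divide each group's row of sums by that group's total weight.
    out = num.div(den, axis=0)
    # Same "_mix" suffix convention as in the post.
    out.columns = [c + '_mix' for c in cols]
    return out


Output_Lot = weighted_means(dfA, 'OutputLot', 'InputWeight', var_cols)
print(Output_Lot)
```

    For lot L1, chiral purity works out to (100·99 + 300·98) / 400 = 98.25. One design caveat: pandas `sum()` skips NaN, so a batch with a missing property value still contributes its weight to the denominator; drop or fill missing values first if that matters for your data.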

  • Micro tutorial: Manually back up a WordPress site hosted on Dreamhost

    Before updating, WordPress always reminds you to have a backup in case the automatic update fails and you need to restore your site. On a grammar note, Wikipedia handily explains: “The verb form, referring to the process of doing so, is ‘back up’, whereas the noun and adjective form is ‘backup’.” To back up a site, you need to back up two things: the database and the files.

    To back up the site database, use phpMyAdmin. Log in to your Dreamhost panel, navigate to Websites/MySQL Databases, and scroll down. The top of the page is for adding a new database, which is not what you want right now because you are backing up an existing one. When you get to this heading, you are in the right place:

    (Screenshot: Dreamhost panel, Websites/MySQL Databases, “Hostnames for this MySQL server” heading)

    Under this heading, find the site you are backing up in the list of hostnames (of the form sql.sitename.com) and click the phpMyAdmin link under the Web Administration heading. It opens in a new tab. If you have multiple databases for this site, click the one you want to back up. By default, a database opens in the Structure tab; click the Export tab at the top. Technically, you could export in whatever format you want, but you probably want SQL. When you click the Go button, it might show you some options, but then it will start downloading a file called sitename.sql. This is your database backup.

    Now, to back up your files, use the shell to ssh into sitename.com.

    ssh username@sitename.com
    ls  # see where you are; cd to the directory that contains the sitename.com folder
    tar -czf sitename.com.$(date +%y.%m.%d).tar.gz sitename.com  # compress the site folder into a dated (yy.mm.dd) archive
    ls  # run again to make sure you see the file you just created
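    If you would rather script the archive step (for example, to run it on a schedule), the `tar -czf` command can be reproduced with Python's standard-library `tarfile` module. This is a minimal sketch, not part of the workflow above: the function name `archive_site` is hypothetical, and it assumes Python 3 is available wherever you run it and that `sitename.com` is the placeholder folder name used in this post.

```python
import datetime
import tarfile
from pathlib import Path


def archive_site(site_dir, dest_dir=".", today=None):
    """Create <site>.<yy.mm.dd>.tar.gz from site_dir, like `tar -czf`."""
    site = Path(site_dir)
    today = today or datetime.date.today()
    # Dated file name, e.g. sitename.com.21.03.07.tar.gz
    out = Path(dest_dir) / f"{site.name}.{today:%y.%m.%d}.tar.gz"
    # "w:gz" writes a gzip-compressed tar archive (the -c and -z in tar -czf)
    with tarfile.open(out, "w:gz") as tar:
        # arcname stores paths relative to the site folder's parent, as tar does
        tar.add(site, arcname=site.name)
    return out
```

    Run from the directory that contains the site folder, `archive_site("sitename.com")` writes the dated archive to the current directory and returns its path.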

    That’s it. Now you can update your site by clicking the upgrade button on your WordPress dashboard. This time the update worked just fine for me, so I did not need the backup. For the time it does not work, stay tuned for my post about restoring from a backup. When that time comes, future me will want this link to Rob’s post on restoring his site from a backup.

  • Computation & Biology

    There is a lot of cool stuff going on at the interface of math and biology. This post keeps track of the corner of this world that my colleagues at Forschungszentrum Jülich (in English, Jülich Research Center, named for the town where it is located) have been contributing to.

    One example, championed by Dr. Eric von Lieres, is CADET (Chromatography Analysis and Design Toolkit), a collection of fast and accurate mathematical model solvers, especially for strongly coupled partial differential algebraic equations. Typical bioprocessing applications for this family of mathematical models include chromatography, filtration, crystallization, and fermentation.

    Another interesting thing going on at FZJ is research on automated bioprocessing. This includes both FOOMB, a Python-based Framework for Object-Oriented Modeling of Bioprocesses, and the mibiolab’s contract research and knowledge transfer. It’s fun to see the labs of the future in development.

    Finally, Rtlive tracks the reproduction number, or “R” value, a measure of how quickly (how exponentially) COVID is spreading, in a variety of locations.

    I look forward to seeing how these projects progress.

  • Hello world!

    Welcome to Element of Science! This site has gone through several iterations and currently serves as a place to capture and share interesting bits about exploring the world through data.

    Here is a link to an inspiring 2005 TED talk by mathematician Peter Donnelly.

    This talk is important to me for 3 reasons:

    1. It discusses, in a humorous way, how communication gaps can arise when talking to an audience not already familiar with your work. “Modelling genes (jeans)” is a great job description that gets a different reaction than “data scientist” or “bioinformatician”.
    2. It reminds us that analyzing data is a specialization and, like all specializations, it requires judgment that is not necessarily intuitive to a layperson.
    3. It reminds us that there can be terrible real-world consequences when we get data analysis and the related conclusions wrong. The world needs more contextual understanding to go with the piles of data we collect and create. If the assumptions are violated or the data is not clean and applicable, we are by definition working from an inappropriate model.

    How can we apply these lessons in our everyday work?

    1. Seek common ground and, when possible, communicate in multiple modes (e.g., words and pictures).
    2. Never stop learning and continually question what we don’t know.
    3. Be explicit about our assumptions and which data counts.