-
Performing array intersection with Bash
Posted on October 12th, 2010 No commentsI am currently working on a project to deploy new website builds to a small number of servers. I needed something simple and reliable that could be built in a very short period of time. I decided to whip something up in bash with the intent of refining it in Python later.
As I began to write this code, I realized that it probably would have been quicker to do it in Python from the start. I decided to stick with bash as somewhat of an academic exercise. The vast majority of these deployment scripts were trivial; check the code out of git, create a manifest, package it up, spew it to the servers, etc, etc. The problem came during the last step. We decided to use a symlink to point to the active build out of a number of builds that could be available on the server at any given time. Since all of our servers should be running the exact same version of the build, it makes sense that I should only allow a user of my deployment scripts to link a build which exists on all servers. But how do you accomplish this in bash?
In most other languages, you have access to numerous array helping functions that allow you to perform intersects, uniqs, and merges. My goal was to do the same thing in bash without forking out to any external binary. So how do you ensure that a particular thing exists on N number of servers? Here it is:
function in_array() { local x ENTRY=$1 shift 1 ARRAY=( "$@" ) [ -z "${ARRAY}" ] && return 1 [ -z "${ENTRY}" ] && return 1 for x in ${ARRAY[@]}; do [ "${x}" == "${ENTRY}" ] && return 0 done return 1 } MASTER=() CURRENT=() FIRST=1 for SERVER in ${SERVERS}; do # collect all builds from server and populate CURRENT list COMMAND="${LS} -1fd ${WEBROOT}/${SITE}.*" BUILDS=`${SSH} ${SSHOPTS} root@${SERVER} "${COMMAND}"` for BUILD in ${BUILDS}; do CURRENT=( ${CURRENT[@]-} ${BUILD} ) done # if this is our first time around, copy CURRENT to MASTER if [ ${FIRST} -eq 1 ]; then MASTER=( ${CURRENT[@]} ) FIRST=0 fi # now we do a compare between MASTER and CURRENT to see what builds # are common INTERSECT=() for ENTRY in ${CURRENT[@]}; do in_array "${ENTRY}" "${MASTER[@]}" RET=$? if [ "${RET}" -eq 0 ]; then INTERSECT=( ${INTERSECT[@]-} ${ENTRY} ) fi done MASTER=( ${INTERSECT[@]} ) # clear the CURRENT array CURRENT=() done
Let me take a moment to explain the code above:
- In order to check for array intersection, you need an in_array() function
- The first argument as the “needle” and the second is the “haystack”
- We verify that both parameters were passed
- We simply loop through the haystack checking for the needle
- If we find it, return success. Otherwise, eventually return false
- We need to loop through each server eventually, but we’ll start with the first one
- Run an SSH command to get a listing of builds
- Populate an array ($CURRENT) with the builds that were found
- Since the first server has no previous server to compare with, so we just copy it to $MASTER
- We then loop to the 2nd server, and put the result of getting builds into $CURRENT
- Now that we have the first server’s builds in $MASTER, we perform an intersect with $CURRENT
- We realize the need for an $INTERSECT array to hold the intersections found above
- $INTERSECT becomes $MASTER since it only contains similar builds from the 1st and 2nd server
- Looping to the 3rd server, we get the builds and put them in $CURRENT
- Since $MASTER contains only the similar builds thus far, we again compare it with $CURRENT
- The intersect can now be used to compare against builds on the 4th server, and so on
- Once you finish looping through all servers, your $MASTER should contain only similar builds
There are a few guides out there which show you how to do this via forking, but I thought someone may appreciate the elegance of using 100% bash to accomplish this. I hope this helps someone else out there!
- In order to check for array intersection, you need an in_array() function


