Cron is perhaps the most widely used scheduling tool in the data engineering community. It has been battle-tested over decades on Linux and Unix platforms. While cron is an excellent option for scheduling tasks, the format of a cron expression takes some getting used to. This short tutorial breaks down how to tell cron when to execute your scripts and tasks.
Formatting a Cron Expression
The first thing you’ll see in a crontab entry is a set of 5 fields, each separated by whitespace. (Some cron-like schedulers, such as Quartz, add a seconds field at the front, but standard cron does not.) For example, this common cron expression tells cron to run the command that follows it at 5 minutes past the hour, every hour:
5 * * * *
But how? Let’s look at each of the 5 fields, with acceptable values in parentheses:
- Minute of the hour (* or 0-59)
- Hour of the day (* or 0-23)
- Day of the month (* or 1-31)
- Month of the year (* or 1-12)
- Day of the week (* or 0-6 with 0 being Sunday)
In our example, we’re setting the minute of the hour to 5, which means cron will run the given command at 5 past each hour. This is because only the “minute of the hour” field is set to a value, while all other fields are set to *, a wildcard meaning “all”.
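To make the field order concrete, here is a minimal Python sketch that splits an expression into the five fields listed above. The function name and the dictionary keys are illustrative choices, not part of cron itself.

```python
# Field names in the order cron reads them (labels chosen for this sketch).
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_expression(expression):
    """Split a five-field cron expression into a dict keyed by field name."""
    parts = expression.split()  # fields are separated by whitespace
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


if __name__ == "__main__":
    for name, value in split_cron_expression("5 * * * *").items():
        print(f"{name}: {value}")
```

Only the minute field carries a value here; the other four come back as the wildcard.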
What if we only wanted to run it at 5 past midnight? All we need to do is change the hour from * to 0 (midnight is the zero hour in 24-hour time). For example, if we wanted to run a bash script called “myscript.sh” at 12:05 AM each day, we would add the following entry to our crontab. Note that cron evaluates the schedule in the system’s time zone, so this is 12:05 AM UTC only if the server is set to UTC:
5 0 * * * myscript.sh
The same rules apply for the other fields.
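One way to check your reading of an expression is to test it against a specific date and time. The sketch below is a simplified matcher that only understands plain numbers and *, which is all we have covered so far. The function names are hypothetical. It also assumes the common cron behavior that when both day fields are restricted, a match on either one is enough.

```python
from datetime import datetime


def field_matches(field, value):
    """A field matches if it is the wildcard or equals the value."""
    return field == "*" or int(field) == value


def matches(expression, moment):
    """Return True if the expression would fire at the given datetime."""
    minute, hour, day_of_month, month, day_of_week = expression.split()
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    time_ok = (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(month, moment.month)
    )
    dom_ok = field_matches(day_of_month, moment.day)
    dow_ok = field_matches(day_of_week, cron_weekday)
    if day_of_month != "*" and day_of_week != "*":
        # Assumption: both day fields restricted means either may match.
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return time_ok and day_ok


if __name__ == "__main__":
    print(matches("5 0 * * *", datetime(2024, 1, 1, 0, 5)))
    print(matches("5 0 * * *", datetime(2024, 1, 1, 1, 5)))
```

The first call checks 00:05 and the second checks 01:05 on the same day, so only the first satisfies “5 0 * * *”.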
Cron also has what is called a “step value”, which is written with a slash (“/”). You can use the format “*/#” to tell cron to run at every “#”th value of the specified field. Confused? It’s actually simpler than it sounds. For example, say you want to run myscript.sh not just at 5 minutes past every hour, but every 5 minutes throughout the day. Here, you can use a step value of 5 in the minutes field:
*/5 * * * * myscript.sh
Note that the minutes value is now */5, which tells cron to run your script at every 5-minute interval. With the rest of the values set to * (all), your script will be executed at 00:00, 00:05, 00:10, and so on all the way through 23:55. You can use a step value in other fields as well. Here’s how you’d run your script every 4 hours, every day (at 00:00, 04:00, 08:00, 12:00, 16:00 and 20:00):
0 */4 * * * myscript.sh
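To see exactly which values a step produces, here is a small Python sketch that expands the “*/#” form into a list. The function name is hypothetical, and the low and high bounds are passed in by the caller (0 to 59 for minutes, 0 to 23 for hours). It deliberately handles only the wildcard-with-step form described above.

```python
def expand_step(field, low, high):
    """Expand a '*/N' field into the values it selects between low and high."""
    base, step_text = field.split("/")
    if base != "*":
        raise ValueError("this sketch only handles the '*/N' form")
    step = int(step_text)
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Counting starts at the lowest allowed value of the field.
    return list(range(low, high + 1, step))


if __name__ == "__main__":
    print(expand_step("*/5", 0, 59))  # minutes selected by */5
    print(expand_step("*/4", 0, 23))  # hours selected by */4
```

Because counting starts at the bottom of the field’s range, a step that does not divide the range evenly leaves a shorter gap before the cycle restarts.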
Lists of Values
You can use a comma (“,”) to specify a set of two or more values in a given field. Perhaps you want to run your script at 5 minutes, 25 minutes, and 50 minutes past every hour. Here’s how:
5,25,50 * * * * myscript.sh
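A list field is just a set of allowed values. As a rough illustration, this Python sketch (with a hypothetical function name) turns a comma-separated field into a sorted list and rejects values outside the bounds the caller supplies.

```python
def expand_list(field, low, high):
    """Expand a comma-separated field such as '5,25,50' into sorted values."""
    values = sorted({int(item) for item in field.split(",")})
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}-{high}")
    return values


if __name__ == "__main__":
    print(expand_list("5,25,50", 0, 59))
```

The order in which you write the values does not matter, and repeating a value has no extra effect.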
Ranges of Values
Finally, cron allows you to use a dash (“-”) to specify a range of values. Ranges are inclusive at both ends. For example, the following statement executes myscript.sh at 10 minutes past the hour, every hour from 6 AM through 11 AM. In other words, 6:10, 7:10, 8:10, 9:10, 10:10 and 11:10.
10 6-11 * * * myscript.sh
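The inclusive range can be sketched in Python as well. The helper names below are made up for illustration. The second function pairs a fixed minute with a range in the hour field and lists the resulting run times in 24-hour format.

```python
def expand_range(field):
    """Expand a range field such as '6-11' into its values, both ends included."""
    start_text, end_text = field.split("-")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError("range start must not exceed range end")
    return list(range(start, end + 1))


def run_times(minute, hour_field):
    """List HH:MM run times for a fixed minute and a range in the hour field."""
    return [f"{hour:02d}:{minute:02d}" for hour in expand_range(hour_field)]


if __name__ == "__main__":
    print(run_times(10, "6-11"))
```

For the entry above, this yields six run times, one for each hour from 6 through 11.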
There’s a reason why cron has withstood the test of time. Don’t let the cryptic syntax throw you off. Once you get the hang of it, you’ll find it’s a simple and reliable way to schedule all sorts of tasks.
Cover image credit: https://pixabay.com/users/openclipart-vectors-30363/