Projects

Analyzing Square Item Sales in a Matrix Spreadsheet

If you engage in any direct-to-consumer agriculture, you’ll need a point of sale system to process credit card payments, track sales, etc. Square is a popular choice for many small businesses and is what we’ve used with our strawberry/blueberry patch.

While Square’s built-in reports provide decent insight into overall performance, I had trouble drilling down to item-specific performance across time dimensions like hour of day, day of week, and week of year.

To guide decisions like setting business hours, I created a spreadsheet to answer three questions:

  • How many sales occurred in each hour of an average week (all weeks overlaid)?
  • Was there a shift in popular hours as the season progressed?
  • What was the typical pattern of sales during the first weekend of a crop? How does that compare to the second, third, etc., or to holidays?

I additionally created two item-specific reports to show the most popular hours of a day or days of a week for every item.

Spreadsheet Setup

I created this in Google Sheets. You can add a blank copy of this spreadsheet to your drive here

Download CSV Data from Square

  1. From your Square dashboard, select Transactions.
  2. Pick your timeframe for the data. The spreadsheet will allow you to filter by time period later, so unless you have a good reason to limit the input data, select an entire year at a time (one year is the maximum for a single export).
  3. On the right side, click Export, and select Item Detail CSV.
  4. Repeat this for each additional year of data you want to be able to query.

Importing CSV Data to Spreadsheet

1. If you haven’t already, add a copy of the spreadsheet to your Google Drive via this link

When you open the spreadsheet, you should land on the Raw Data sheet. Columns A – J are formula helper columns, so insert the Square data starting at column K.

2. With K2 selected, go to File > Import > Upload, and select the CSV you downloaded from Square.

3. After the file uploads, you should see the Import file modal. For import location, select “Replace data at selected cell”. Everything else can be left as is. Click Import data.

You’ll notice that the first row of the inserted data contains column labels. Double check that these map to the existing labels in the top row (but don’t delete this row yet, it has formulas that need to be copied). Square has not changed their column format in the two years since I first made this spreadsheet, but you never know. ¯\_(ツ)_/¯

4. You’ll need to copy the formulas in A2 – J2 to all the rows below. The most reliable way to do this at scale is to select A2 – J2 and copy. Then select A3, scroll all the way to the bottom of the sheet, hold Shift, and select the bottom cell in column J. Now do a regular paste. This applies the formulas across all of those cells without having to drag and scroll down potentially thousands of rows.

5. Delete the header row from the inserted data (row 2).

6. Repeat this process for each CSV you need to import, and remember to delete the column headings each time.

Go to the Query Page sheet when you’re finished. If all has gone well, you should see each year you’ve imported under the Year list, and you should see your product categories and items/price points under the respective lists. Now you’re ready to use the querying page and reports.

Using the Spreadsheet

Query page- start here

Query Page is what you’d expect- a landing page to select filters on what data you want to view. The checkbox to the left of each item includes/excludes it from the data. Depending on the size of your dataset and the kind of device you’re viewing from, filters can take a bit to apply, although I’m usually under a 5 second digest time with my 19,000 row dataset on a decent laptop.

The Metric dropdown controls what numbers are used in the reports: count, quantity, revenue, or transactions.

Once you’ve picked the metric you want to see and selected/unselected your filters, you can visit any of the subsequent report pages and those filters will already be applied.

These reports are all two-dimensional matrices with conditional formatting across the matrix to highlight highs/lows. There is some repetition and overlap between reports, but I’ll step through each report so you can see what each does well.

Day of Week / Hour of Day

Note that the hours in these column headings refer to the hour portion of the time. For example, 2 PM accounts for all sales from 2:00 – 2:59 PM.
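If you’re curious how that bucketing works, it boils down to the HOUR() function applied to the transaction time- here’s a minimal sketch, assuming a hypothetical timestamp in cell K2 (not necessarily where your Square time column lands):

=HOUR(K2)

This returns 14 for any time from 2:00 to 2:59 PM, which is what the 2 PM column groups on.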

Day of Year / Hour of Day

The left axis in this view is first aligned by day of week and week of year. Then, after the fact, an Approximate Date is calculated as an average calendar date. In our business, “a Saturday in late May” means more than a literal date like “May 27th”. I’d guess this is the case for many businesses, and thus being weekday-centric makes sense for fair year-over-year comparisons.

Week of Year / Day of Week

Think of this like a calendar with columns as days and rows as weeks. The same date fuzziness is at play here since these are aligned by week of year. The start/end dates listed on the left are averages, not extremes.

Product / Hour of Day

Product / Day of Week

Expansion

The report pages all reference the hidden Filtered Data sheet, which is generated by applying the Query Page filters to the Raw Data sheet. If you’d like to build your own reports, referencing this Filtered Data sheet will allow you to piggyback off all the built-in querying.

Pivot Table Weirdness

In this project I discovered how pivot tables in Google Sheets misbehave when the source data is changed. Specifically, if you have conditional filters applied, then delete and insert fresh data to the data range, the pivot table will not process the data until you remove and reapply the filters. I ended up rewriting my report pages (which originally used pivot tables) with a manual SUMIFS matrix so that it would reliably work when copied and populated with someone’s new data. This also drastically sped up the digest time of queries (I had load times of 60+ seconds on my 19,000 row dataset, down to under 5 seconds with the bare SUMIFS matrices).
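For reference, each cell in those report matrices boils down to a single SUMIFS() call. Here’s a minimal sketch (the Filtered Data sheet name is real, but the column letters and criteria layout are illustrative, not the spreadsheet’s exact structure):

=SUMIFS('Filtered Data'!$D:$D, 'Filtered Data'!$B:$B, $A2, 'Filtered Data'!$C:$C, B$1)

where column D holds whichever metric is being summed, column B holds the day of week, column C holds the hour, $A2 is the row label, and B$1 is the column heading. Drag that across the matrix and the absolute/relative references take care of the rest.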

I never went back to test a pivot table on the Filtered Data sheet as that method of filtering postdated the use of pivot tables, but you may encounter problems on reloads.

Tutorials

Tips for Locating Drainage Tile Lines via Google Earth

For several years we’ve had a wet area in a field which we suspect is due to a collapsed clay tile line. My dad had attempted to locate the line once before without any luck. Last year I overheard a tip about using satellite photos to locate lines and decided to try to find it again while we were doing some other tiling work. I regret to admit that we were still unable to locate the line in a timely manner and had to move on to other jobs, but I learned a few tips that could be useful in this area.

I’m using the desktop version of Google Earth Pro, which is now free. You can download it here. From what I can tell, the browser version of Earth doesn’t support historical satellite imagery.

Find the Right Year

The most recent year available in a standard Google Maps satellite view may not show you much, but Google Earth allows you to view historical satellite photos from various years and months. After browsing through the available imagery, I discovered that imagery from March 2005 makes lines stand out clearly for my area.

This of course assumes the tile you’re looking for has been there for a couple decades, but presumably you’d have maps for anything done recently.

Translate to a Guidance Display

In some cases, you may be able to just measure in from a field boundary or a fixed object on the map and use that to locate the line in person with a tape measure. If that won’t work, the following steps outline how to translate what you see in Google Earth to lines on your guidance display.

Overlay measuring lines

Using the ruler tool in Google Earth, lay down measure lines on the tile lines. You can squint, zoom in and out, and play with the A/B points until you have it centered to what looks correct. Then save this line in a folder in your Saved Places. Even if you don’t plan to export the latitude and longitude data, these lines can still be helpful for visualizing the tile map.

Copy Lines to Guidance System

Put all your saved tile lines in a folder in your Saved/Temporary Places, then right click on the folder and select Save Place As. (If you use a guidance system that can natively import KMZ/KML files, great! That will be much more straightforward and you can skip the rest of this.) If not, and you want to access the lat/long data manually, save the file as a KML.

Open the KML file in a simple text editor like Notepad (Windows) or TextEdit (macOS). Scroll down to the second half of the file and look for the lat/long values inside a <coordinates> tag. For a two-point line, each <coordinates> element contains six numbers (comma-separated within each point, with the two points separated by a space), in the following order:

  1. Point A longitude
  2. Point A latitude
  3. Point A altitude (in meters)
  4. Point B longitude
  5. Point B latitude
  6. Point B altitude (in meters)

This might look intimidating, but it’s fairly liberating in a way; you can paste these lat/long points into whatever software you want or you could even key them directly into a display if you didn’t have very many.
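If you’d rather not retype numbers by hand, you can also paste a coordinate string into a spreadsheet and split it apart. A minimal sketch in Google Sheets, assuming the raw text from one <coordinates> element is pasted into A1:

=SPLIT(A1, ", ")

By default SPLIT() treats each character in the delimiter as a separator, so splitting on both the comma and the space drops each of the six numbers into its own cell.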

We use John Deere displays, so I created a flag (line type) in Land Manager for each of these lines, then pasted in the lat/long.

Note: when making a flag line, you’ll have to just click somewhere to make a 2 point line, then you can go back later and paste in the proper lat/long for each point.
Flag lines on 2630 map

Calibrate with a Known Point

If you want to take lat/long coordinates from a satellite photo and go to them in the real world, there are at least two variables that impact the accuracy of this: 1) how accurately the satellite photo was stitched, and 2) how much drift is being experienced by the receiver.

To account for this, find something nearby in the real world that appears clearly in the satellite photo and is unambiguously in the same location today.

Good candidates:

  • Concrete pads, well covers, etc
  • Fence posts
  • Road intersections

Building corners generally wouldn’t work well because you can’t sit directly next to / atop them and still have a wide view of the sky. You also run the risk of parallax distortion depending on how much of a side view the satellite captured and how tall the building is. (Watch for this on fence posts as well).

Concrete pad corner which I used

Mark this feature with lines in Google Earth using the same historical photo you used to mark the tile lines and bring those to your display as was shown earlier. Now align yourself so the drawbar on the display is at the reference point and compare your offset in the physical world.

2630 map showing my drawbar directly over the line corner
In the real world, I’m about 6′ too far south and 7′ too far west.

Once you know your offset, you should be able to drive to what the screen shows as correct and measure back to the true location accordingly. Depending on how quickly your GPS corrections drift over time, you may need to do this immediately. In our case we’re just using SF1, so I went directly to the tile line after getting this offset measurement and started placing flags.

“Says the guy who couldn’t find his tile line”

As I mentioned earlier, I can’t prove that this method works because I wasn’t actually able to locate the line we were looking for. But it seems really solid in theory, and as we all know, when something works in theory, it works in practice.

Let me know if you have any luck with this method or have any better/simpler ways of locating lines.

Tutorials

How to Manually Log Partial Coverage Maps in John Deere Operations Center

Despite John Deere’s sincerest wishes, not every piece of equipment that covers our fields has a GreenStar display in it. Not only do we have a mixed fleet of guidance systems, but getting raw GreenStar files from custom applicators is difficult or impossible.

John Deere Operations Center does not allow the import of third party shapefiles as a coverage map for Field Analyzer or Variety Locator. Last year, the Operations Center mobile app gained the ability to manually log field operations which automatically applies a single variety/product across an entire field for a given date/time. It’s crude and simple, but at least allows something to be logged without the requirement for a GreenStar display to be involved. I used this for our herbicide sprays since our sprayer does not have a GreenStar display and it works well given that we cover the whole field with the same rate of product and typically do so within a day or two.

Where I ran into an issue was with a check strip from an aerial fungicide application. This ran perpendicularly across a corn variety test plot, and I was sent a screenshot of the mapping output from AgriSmart’s Flight Plan Online. This got me thinking of a creative way to force the Operations Center mobile app to generate a partial coverage map so I could more easily do yield comparisons in the fall.

Overview of what you’ll need to do:

  1. Create a field boundary that is the shape of only the area applied. Set it as the active field boundary.
  2. Manually log an operation on that field in the mobile app.
  3. Wait for the coverage map to finish processing, then change the active boundary back to the full field boundary.

1. Modify Field Boundary

Field boundary in Operations Center (left) vs screenshot of coverage map sent by aerial applicator (right)

Above is the screenshot of the coverage map I was given. These are 80′ passes, and judging from the field to the north, 4 swaths’ worth were not sprayed: a section ~320′ in from the east side, excluding the 80′ that was sprayed along the entire south end.

I used Google Earth to measure out this block and get latitude and longitude points for the 3 corners.

I added that as a Flag line in Operations Center (Land Manager) with a Flag Category of “Coverage Boundary” just for the sake of not junking up the tile hole or rock list. Note: you’ll have to click to lay down whatever points you’ll need first, then you can circle back and get the latitude/longitude correct.

Next, make a copy of your field boundary and name it something distinctive.

Make sure your flag line is visible (it also helps to have your old boundary invisible), then edit the new boundary and drag the edges to the flag line.

Make sure you check the box for “This is the active boundary”.

Once you have saved an active boundary that is the shape of what you want to apply, you’re ready to head to the mobile app.

2. Log operation in mobile app

⚠️ Field boundaries seem to cache for some period of time on the device and don’t reload when you open/close the app, even when clearing it from memory. To ensure that the mobile app has the most recent field boundary, log out of the app and back in.

From the Home tab, tap the + circle to add an operation.

When you select the field, make sure the shape looks correct for the modified boundary.

Select your operation type and applicable products

Set the date/time and tap Save.

Eventually, you should see a single operation on the home tab.

Once you can confirm from the Map tab that the coverage map has been created for the modified boundary, you’re all set! All that’s left to do is some clean up work to get the boundary back to normal.

3. Clean up boundaries in Operations Center

Return to Land Manager in the web version of Operations Center and archive the temporary boundary you created. You may consider leaving the flag line and syncing it to your display in the fall if you would like to see the application line from the cab.

Open the edit pane on your original boundary and check the box to make it the active boundary.

Lastly, given the caching of field boundaries by the mobile app, it’s a good idea to log out of the app and back in to return to the original field boundary.

Open the map view for a victory lap- you should see a complete boundary but a partial coverage map. 👍

Legal disclaimer: all screenshots and functionality shown are of public versions of the software accessed via a fresh account.

Tutorials

Loading GreenStar Setup Files from a Mac

John Deere Data Manager is a Windows-only program that simplifies the process of copying setup files to the proper location on a data card or flash drive. Since this is not available for macOS, Mac users have to manually copy the files to the proper location. This is a very simple process but one that I wasn’t able to find good documentation for when I first did it, so let me add an article to the internet explaining it succinctly.

⚠️ There is currently a bug when downloading from Files in Operations Center via Safari. Instead of downloading the zip file you intended, you’ll get download.html, which just contains the text “not found”. You can either use Chrome in the meantime, or you can download the setup files via the File Details tab of the Setup File Creator (in the bottom section of the page there should be a list of previous setup files from the last 12 months; click on the green circle next to the file to download. This still works in Safari.)

Flash Drive Format

Most flash drives come pre-formatted as FAT, which is old but very compatible and fine for small files. In the event you need to reformat the drive via Disk Utility on macOS, select MS-DOS (FAT) and make sure you select Master Boot Record if given the option for a partition scheme. I had a hair-pulling experience with a 2630 never recognizing a flash drive that was partitioned with GUID Partition Map. Here’s Apple’s guide for formatting a disk with Disk Utility.

GreenStar 3 2630 / GreenStar 4 (4600, 4640, 4240, 4200)

Files are loaded to these displays on a flash drive via the USB port (if you’re reading this article I’ll assume you don’t have wireless data transfer). Here’s the top portion of how the file structure on the flash drive will need to end up:

  • [Top level of USB drive]
    • GS3_2630 ← equivalent of zip output
      • <Name of Setup File>
        • RCD
          • so on and so forth
    • ProductLayers ← will be present if you have Variety Locator data. Should be peer of GS3_2630

When you unzip the zip file you download from Files, you should have a folder called GS3_2630. If you only need that one setup file on the flash drive, you can copy that GS3_2630 folder to the top level of the flash drive (and ProductLayers if applicable) and you’ll be good to go.

I often have several setup files on the same flash drive in a season, so I don’t want to overwrite the entire GS3_2630 folder. In this case, copy the <Name of Setup File> folder from the GS3_2630 folder of the zip output to the GS3_2630 folder on the flash drive.

Since ProductLayers is at the root of the drive and a peer to GS3_2630, you won’t be able to keep multiple Variety Locator configurations on the same drive.

Current state of my flash drive with multiple setup files in GS3_2630 folder.

GreenStar 3 CommandCenter

This works identically to the 2630, just a different name for the top level folder.

  • [Top level of USB drive]
    • Command_Center ← equivalent of zip output
      • <Name of Setup File>
        • RCD
          • so on and so forth

Again, you can copy the entire Command_Center folder to the drive, or just copy individual setup folders within it to retain multiple setups at the same time.

GreenStar 2 2600 Display

The 2600 is the oddball format since there’s no compartmentalization between internal and external storage; you just pull the compact flash card out, connect it to a card reader, and write files directly to it. Below is the file structure for the card.

  • <Top level of compact flash card>
    • RCD ← equivalent of zip output
    • Programs
    • Fonts
    • etc

The zip file you download from Files will unzip as an RCD folder. That can be placed directly in the top level of the data card, overwriting the existing RCD folder. If you want to back up the previous RCD folder, copy it off elsewhere first.

GreenStar 2 1800 Display

Despite its age, the 1800 works just like the GreenStar 3 displays and uses a USB drive as the means of transporting data to the internal storage.

  • <Top level of USB drive>
    • GS2_1800 ← equivalent of zip output
      • <Name of Setup File>
        • RCD
          • so on and so forth

I don’t have hands on experience with the 1800 to verify the entire process, but I did confirm that this folder structure is what’s created by Data Manager on Windows when loading a setup file to an external drive.

Projects

Localizing University Basis Data Using Regression Analysis

One of the first resources I found in my quest to quantify the local basis market was this dataset from the University of Illinois. It provides historical basis and cash price data for 3 futures months in 7 regions of Illinois going back to the 1970s.

I haven’t found any single region in this dataset to be particularly accurate for my location on its own. Perhaps I’m on the edge of two or three regions, I thought. This got me thinking- what if I had the actual weekly basis for my local market for just a year or two and combined that with the U of I regional levels reported for the same weeks? Could I use that period of concurrent data to calibrate a regression model to reasonably approximate many more years of historical basis for free?

I ended up buying historical data for my area before I could fully test and answer this. While I won’t save myself any money on a basis data purchase, I can at least test the theory to see if it would have worked and give others an idea of the feasibility of doing this in their own local markets.

Regression Analysis

If you’re unfamiliar with regression analysis, essentially you provide the algorithm a series of input variables (basis levels for the 7 regions of Illinois) and a single output to fit to (the basis level of your area), and it computes a formula- a weighted combination of a, b, c, d, e, f, and g, plus a fixed offset- that best approximates y. Then you can run that formula against the historical data to estimate what those basis levels would have been.

We can do this using Frontline Solvers’ popular XLMiner Analysis Toolpak for Excel and Google Sheets. Here’s a helpful YouTube tutorial for using the XLMiner Toolpak in Google Sheets and understanding regression theory, and here’s a similar video for Excel.

Input Data Formatting

You’ll want to reformat the data with each region’s basis level for a given contract in separate columns (Columns E-K below), then a column for your locally observed values on/near those same dates (Column N). To keep things simple starting out, I decided to make one model that includes December, March, and July rather than treating each contract month as a distinct model. To account for local-to-regional changes that could occur through a marketing year, I generated Column L, and since anything fed into this analysis needs to be numerical, I called it “Delivery Months after Sept”, where December is 3, March is 6, and July is 10. I also created a “Days Until Delivery” field (Column M) that could help it spot relationship changes that may occur as delivery approaches.
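As a rough sketch of the date math behind Column M (cell references here are illustrative, not my actual layout)- say the observation date is in A2 and the contract’s delivery date is in B2:

=B2-A2

Dates subtract directly to a number of days in both Google Sheets and Excel, which covers “Days Until Delivery”; the “Delivery Months after Sept” value in Column L is just a constant per contract month (3, 6, or 10).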

Always keep in mind: what matters in this model isn’t what happens to the actual basis, but the difference between the local basis and what is reported in the regional data.

⚠️ One quirk with the Google Sheets version- after setting your input and output cells and clicking the OK button, nothing will happen. There’s no indication that the analysis is running, and you won’t even see a CPU increase on your computer since this runs in Google’s cloud. So be patient, but keep in mind there is a hidden 6 minute timeout for queries (this isn’t shown anywhere in the UI; I had to look in the developer console to see the API call timing). If your queries are routinely taking several minutes, set a timer when you click OK and know that if your output doesn’t appear within 6 minutes you’ll need to reduce the input data or use Excel. In my case, I switched to Excel partway through the project because I was hitting this limit and the 6 minute feedback loop was too slow to play around effectively. Even on large datasets (plus running in a Windows virtual machine) I found Excel’s regression output to be nearly instantaneous, so there’s something bizarrely slow about however Google is executing this.

Results & Model Refinement

For my input I used corn basis data for 2018, 2019, and 2020 (where the year is the growing season), which provided 9 contract months and 198 observations of overlapping data between the U of I data and my actual basis records. Some of this data I collected on my own, and some was part of my GeoGrain data purchase.

Here’s the regression output for the complete dataset:

The data we care most about is the intercept (the fixed offset) and the coefficients (the values we multiply the input columns by). Also note the Adjusted R Squared value in the box above- this is a metric for measuring the accuracy of the model, and the “adjustment” applies a penalty for each additional input variable, since even random data can appear spuriously correlated by chance. Therefore, we can play with adding and removing inputs in an attempt to maximize the Adjusted R Squared.
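For reference, the adjustment is the standard formula Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1), where n is the number of observations and k is the number of input variables- so an added input only helps the adjusted figure if it improves the plain R Squared by more than that penalty.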

0.597 is not great- that means this model can only explain about 60% of the movement seen in actual basis data by using the provided inputs. I played around with adding and removing different columns from the input:

Data | Observations | Adjusted R Squared
All Regions | 198 | 0.541
All Regions + Month of Marketing Year | 198 | 0.596
All Regions + Days to Delivery | 198 | 0.541
All Regions + Month of Mkt Year + Days to Delivery | 198 | 0.597
Regions 2, 6, 7 + Month of Mkt Year + Days to Delivery | 198 | 0.537
Regions 1, 3, 4 + Month of Mkt Year + Days to Delivery | 198 | 0.488
All Data except Region 3 | 198 | 0.588
December Only | 95 | 0.540
March Only | 62 | 0.900
July Only | 41 | 0.935
December Only- limited to 41 rows | 41 | 0.816

Looking at this, you’d assume everything is crap except the month-specific runs for March and July. However, a good part of those gains appears to be attributable to the smaller sample size, as shown by the December run limited to the last 41 rows- a fairer comparison to the July data.

Graphing Historical Accuracy

While the error metrics output by this analysis are quantitative and helpful for official calculations, my human brain finds it helpful to see a graphical comparison of the regression model vs the real historical data.

I created a spreadsheet to query the U of I data alongside my actual basis history to graph a comparison for a given contract month and a given regression model to visualize the accuracy. I’m using the “All Regions + Month of Marketing Year” model values from above as it had effectively the highest Adjusted R Square given the full set of observations.

In case it’s not obvious how you would use the regression output together with the U of I data to calculate an expected value, here’s a pseudo-formula:

= intercept + (region1UofIValue * region1RegressionCoefficient) + (region2UofIValue * region2RegressionCoefficient) + ...  + (monthOfMktYear * monthOfMktYearRegressionCoefficient)
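In a spreadsheet, SUMPRODUCT() keeps that from getting unwieldy. A minimal sketch, assuming a week’s region values plus the month-of-marketing-year column sit in E2:L2, the matching coefficients are pasted in the same order in $P$2:$W$2, and the intercept is in $O$2 (all of these cell locations are illustrative):

=$O$2 + SUMPRODUCT(E2:L2, $P$2:$W$2)

Copy that down one row per week and you have the model’s estimated basis for the whole history.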

Now here are graphs for 2017-2019 contracts using that model:

Here’s how I would describe this model so far:

“60% of the time it works every time”

Keep in mind, the whole point of a regression model is to train it on a subset of data, then run that model on data it has never seen before and expect reasonable results. Given that this model was created using 2018-2020 data, 2017’s large errors seem concerning and could indicate futility in creating a timeless regression model with just a few years of data.

To take a stab at quantifying this performance, I’ll add a column to calculate the difference between the model and the actual value each week, then take the simple average of those differences across the contract. This is a very crude measure and probably not statistically rigorous, but it could provide a clue that this model won’t work well on historical data.
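Concretely, each weekly row of that comparison is just a subtraction- a sketch, assuming the model’s estimate sits in a hypothetical cell C2 and the actual basis in D2, and taking the absolute value so over- and under-estimates don’t cancel out:

=ABS(C2-D2)

AVERAGE() over those weekly values then gives the per-contract numbers shown below.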

Contract Month | Average Difference
July 2021 | <insufficient data>
March 2021 | <insufficient data>
December 2020 | 0.05
July 2020 | 0.03
March 2020 | 0.04
December 2019 | 0.03
July 2019 | 0.05
March 2019 | 0.03
December 2018 | 0.04
– – – – – – – – – – – – – – – – –
July 2018 | 0.04
March 2018 | 0.05
December 2017 | 0.08
July 2017 | 0.15
March 2017 | 0.06
December 2016 | 0.08
July 2016 | 0.10
March 2016 | 0.06
December 2015 | 0.07
July 2015 | 0.07
March 2015 | 0.04
December 2014 | 0.07

The dotted line represents the end of the data used to train the model. If we take a simple average of the six contracts above the line, it is 4.3 cents. The six contracts below the line average 7.6 cents.

Is this a definitive or statistically formal conclusion? No. But does it bring you comfort if you wanted to trust the historical estimation without having the actual data to spot check against? No again.

To provide a visual example, here’s December of 2016 which is mathematically only off by an average of $0.08.

How much more wrong could this seasonal shape be?

Conclusion: creating a single model with 2-3 years of data does not consistently approximate history.

Month-Specific Model

Remember the month-specific models with Adjusted R Squared values in the 0.80s and 0.90s? Let’s see if the month-specific models do better on their corresponding historical months.

Contract Month | Average Difference (Old Model) | Average Difference (Month-Model)
July 2021 | <insufficient data> | <insufficient data>
March 2021 | <insufficient data> | <insufficient data>
December 2020 | 0.05 | 0.03
July 2020 | 0.03 | 0.02
March 2020 | 0.04 | 0.01
December 2019 | 0.03 | 0.03
July 2019 | 0.05 | 0.02
March 2019 | 0.03 | 0.02
December 2018 | 0.04 | 0.03
– – – – – – – – – – – – – – – – – – – – – – – –
July 2018 | 0.04 | 0.04
March 2018 | 0.05 | 0.03
December 2017 | 0.08 | 0.05
July 2017 | 0.15 | 0.11
March 2017 | 0.06 | 0.06
December 2016 | 0.08 | 0.05
July 2016 | 0.10 | 0.08
March 2016 | 0.06 | 0.08
December 2015 | 0.07 | 0.04
July 2015 | 0.07 | 0.04
March 2015 | 0.04 | 0.03
December 2014 | 0.07 | 0.05

Well, I guess it is better. ¯\_(ツ)_/¯ Using the month-specific model shaved off about 2 cents of error overall. The egregious December 2016 chart is also moving in the right direction although still not catching the mid-season hump.

Even with this improvement, it’s still hard to recommend this practice as a free means of obtaining reasonably accurate local basis history. What does it mean that July 2017’s model is off by an average of $0.11? In a normal distribution, the average is equal to the median, and thus half of all occurrences fall above that line. If we assume the error to be roughly normally distributed, that would mean around half of the weeks were off by more than 11 cents. You could say this level of error is rare in the data, but it has still been shown to occur, so if you did this blind you’d have no great way of knowing whether what you were looking at was even within 10 cents of the actual history. In my basis market at least, 10 cents can be 30%-50% of a seasonal high/low cycle and is not negligible.

Conclusion: don’t bet the farm on this method either. If you need historical basis data, you probably need it to be reliable, so just spend a few hundred dollars and buy it.

Last Ditch Exploration- Regress ALL the Available Data

Remember how Mythbusters would determine that something wouldn’t work, but for fun, replicate the result? That’s not technically what I’m doing here, but it feels like that mood.

Before I can call this done, let’s regression analyze the full history of purchased basis data on a month-by-month basis to see if the R Squared or historical performance ever improves by much. I’ll continue to use the basis data for corn at M&M Service for consistency with my other runs. It also has the most data (solid futures from 2010-2020) relative to other elevators or terminals I have history for.

Here are the top results of those runs:

Month | Number of Observations | Adjusted R Squared | Standard Error
December | 329 | 0.313 | 0.049
March | 247 | 0.342 | 0.055
July | 175 | 0.455 | 0.116
All Months Together | 751 | 0.305 | 0.078

Final Disclaimer: Don’t Trust my Findings

It’s been a solid 6-7 years since my last statistics class and I know I’m throwing a lot of numbers around completely out of context. Writing this article has given me a greater appreciation for the disciplined work in academic statistics of being consistent in methodology and knowing when a figure is statistically significant or not. I’m out here knowing enough to be dangerous and moonlighting as an amateur statistician.

It’s also possible that other input variables could improve the model beyond the ones I added. What if you added a variable for “years after 2000” or “years after 2010” that would account for the ways regional data is reflected differently over time? What about day of the calendar year? How do soybeans or wheat compare to corn? The possibilities are endless, but my desire to bark up this tree for another week is not, especially when I already have the data I personally need going forward.

I’d be delighted for feedback about any egregious statistical error I committed here, or to hear if anyone gets different results doing this analysis in their local market.

Tutorials

The Fastest Way to Locate 100 Herbicide Labels

We recently discovered that the Enlist tank mix we intended to post-spray on our beans was not label approved after running into some gumming issues. Of course, what does “label approved” even mean for Enlist, where the label doesn’t include any tank mix products and those are instead offloaded to a list on the website, where a person could, let’s say, accidentally be looking at the Enlist One tank mix list back in March and miss the fine print at the top showing that he was on the Enlist One and not the Enlist Duo webpage.

But I digress.

I wanted to definitively comb through all of the chemicals on the Enlist Duo tank mix page and find any other chemicals that contained metolachlor to replace our generic Dual.

A person who knew their chemicals well could probably look down this list and come to a faster conclusion, but for me, using a technical solution was the next best thing.

1. Paste chemical list into Google Sheets

The list formatting of the Enlist webpage caused a blank row to be inserted between every chemical; a simple alphabetical sort fixes this by pushing the blank rows to the bottom.

2. Create label Google search URL via cell formula

When you visit google.com, type a search query in the box, and hit enter, you’re directed to a results page that encodes the search term in the URL. This is simple, well known, and has been Google’s URL format for as long as I can remember.

https://www.google.com/search?q=YOUR+SEARCH+TERM

This makes it possible to share the link to a Google search page and also allowed sites like Let Me Google That for You to land users on a working Google page.

Spaces in a Google query are replaced with a + when encoded in the URL, so we need to run the SUBSTITUTE() function on the chemical names. It’s also helpful to first wrap the chemical name in TRIM() to trim off any spaces at the beginning or end of the name.

=SUBSTITUTE(TRIM(A1)," ","+")

This will turn Moccasin II Plus into Moccasin+II+Plus

We can append this function to the base Google URL (which never changes) using the & operator.

="https://www.google.com/search?q="&SUBSTITUTE(TRIM(A1)," ","+")

This will now generate a link that runs the contents of cell A1 through a Google search, such as https://www.google.com/search?q=Moccasin+II+Plus

Getting close! A few things to append to the search to increase the likelihood of having a label on the first result.

  • Add the words herbicide and label to the end of the search term.
    • I actually didn’t have the word herbicide in mine and realized later I should have added this. There was a big difference between the shipping-themed results for “express label” vs “express herbicide label”
  • Filter to PDF document types, using the type:pdf parameter

Note: the colon in the PDF parameter must be encoded to %3A for the URL.

Append those extra parameters to the end, and you have the final formula

="https://www.google.com/search?q="&SUBSTITUTE(TRIM(A1)," ","+")&"+herbicide+label+type%3Apdf"

3. Bulk open search URLs

You should now have a column full of valid Google search pages.

Drag to select a handful of these links (maximum of 50), right click, select Open Links, and hold onto your butts.

Here are a few pixels showing this in action.

You should now have a browser tab with a search results page for each chemical selected. In most cases, the label will be the top result since we’re only looking for PDFs.

I also added an extra column to my spreadsheet with a checkbox validation type so I could check off after I reviewed each label. It helped a lot that they were in alphabetical order and that the tabs opened in the same order as the list.

Bottom Line

I created this spreadsheet at 6:42 PM and I viewed the last of 99 labels at 7:14 PM. That 32 minutes includes all the setup and formula creation, along with jotting notes on some active ingredients. If you were on a decent internet connection and wanted to glance through active ingredients on a big list of labels, I think you could easily push 8-10 labels a minute.

This also may not be the exact thing you need to do, but knowing how to automate the opening of Google search pages can come in handy for all kinds of research.

Tutorials

Fixing Trimble EZ Guide Plus “Low Accuracy No Corrections” in 2021

⚠️This guide is out of date after the 2022 changes to WAAS satellites 135 and 138. Click here for an updated guide.

The Trimble EZ Guide Plus was our farm’s first entrance to precision guidance in the mid-2000s. While we’ve since moved several tractors to a GreenStar autosteer system, the lightbars have stuck around for spraying and some odd tillage for their simplicity and affordability.

The last few years, keeping these running has felt like a stubborn insubordination to obsolescence as dealers and neighbors tease us for using such old guidance; but here we are, adjusting satellites and finding some runway to squeeze a bit more life out of what we have.

Assumptions

The scope of this guide is pretty narrow: we had one lightbar with working corrections and one without. I changed all the corrections to match the working one and that fixed the issue. So this is a recipe that I superstitiously follow without fully understanding the theory behind specific satellites. For example, you probably don’t need to worry about MSAS (it looks to be a Japanese corrections network), but who am I to question success?

Our equipment:

  • EZ Guide Plus
  • v4.11
    • I’m told this is the last version made for these
  • Hurricane L1 Antenna

Instructions

  1. Navigate to Config menu > GPS > Corrections > Edit
    • You can get to the Config menu by scrolling all the way down on the icons along the right side of the run screen. It’s a wrench icon.
  2. You should see Edit WAAS, Edit EGNOS, etc. You’ll need to enter each of these pages and set them to the following values.
    • In each of these pages, scrolling up or down will change the value in the dropdown, pressing OK will advance you down the list.
  3. Edit WAAS
    • PRN-122: Off
    • POR: Off
    • PRN-135: Off
    • PRN-138: On
  4. Edit EGNOS
    • AOR-E: Heed Hlth
    • IOR: Off
    • IOR-W: Heed Hlth
    • ARTEMIS: Off
  5. Edit MSAS
    • MTSAT-1: Ignore Hlth
    • MTSAT-2: Ignore Hlth
  6. Edit Other
    • (everything should be off)

Back up and out of the menu, and see what happens. ¯\_(ツ)_/¯

Best of luck!

Projects

The Hottest (and Coldest) Farm Data Tool: IOT Temperature Logger

This project started while my wife and I were managing spring frost protection in our strawberry patch and trying to better understand how the temperature experienced at the bloom level compares to the air temperature reported by local weather stations on Weather Underground. In the year that has passed since building this temperature logger for that purpose, I’ve encountered several other scenarios where remotely monitoring a temperature can be really helpful:

  • Air temperature in a well pump house during a cold snap
  • Field soil temperature during planting season
  • Air temperature in a high tunnel to guide venting decisions

Whatever you might use this for, the fundamentals are application-agnostic, and the microcontroller board from Particle used in this project can be connected to a wide array of sensors if you’d like to expand down the road. This tutorial lays out the basics to get up and running with a single temperature probe.

Waterproof DS18B20 temperature probe lying amongst strawberry plants

Particle: Internet-Connected Arduino

If you’re not familiar, Arduino is an open source, single board microcontroller with numerous general purpose input/output (GPIO) pins that allow for easy interfacing between the physical world and code. It might seem rudimentary, but working at the physical/virtual boundary can be really satisfying; of course code can make an object on screen move, and of course touching a motor’s wiring to a battery can make it move, but the ability to use code to manipulate something in the physical world is a special kind of thrill. Absent expansion modules, an Arduino is a standalone board whose software is loaded via a USB cable and cannot natively publish data to the internet.

Particle essentially takes the Arduino hardware, adds a wifi (or cellular) internet connection, and marries it to their cloud software to create an easy to use, internet connected microcontroller. Not only does this make it easier to upload your code, but you can publish data back to the Particle Cloud, where it can trigger other web services.

Components

What you’ll need for this project:

  • Particle development board
    • I used a Photon, which is the older wifi-only board (now replaced by Argon). Particle has since added a free cellular plan for up to 100,000 device interactions per month (~135/hour), plenty for simple data logging. Unless you know you’ll always be near a wifi network, I’d now recommend a cellular-capable model like the Boron.
  • Outdoor temperature sensor probe
  • 4.7 kΩ resistor
  • Breadboard
  • External antenna (wifi or cellular, depending on your Particle hardware)
    • I used these 2.4 GHz antennas for wifi on the Photon. The threaded base and nut make them easy to install through an enclosure wall.
  • Enclosure
    • I used a plastic ammo case which worked well
  • Power supply
    • I used a USB charger block

Additional supplies to consider:

  • A weather-resistant way to connect the power supply:
    • I ran some lamp wire through a small hole in the bottom of the box, then added plugs to each end.
  • Extension to the temperature probe wire
    • I used Cat5 cable
  • Do you need to keep the box upright and out of mud/water?
    • I added some 4″ bolts as legs/stakes to secure the box upright and still ~2″ off the ground. This also allowed me to run the cables out of the bottom where they wouldn’t be subjected directly to rain
  • Silicone/caulk to seal holes

Setup

Particle has a great Quickstart guide for each model of board, so I won’t reinvent the wheel explaining it here.

Also spend some time on Particle’s Hardware Examples page to acclimate yourself with the setup() and loop() structure of the code, how to read and write GPIO pins, the use of delays, etc.

Hardware Configuration

Schematic
Breadboard wiring with jumper cables to consolidate probe wiring in 3 adjacent pins. Note: the Cat5 wiring I used to extend the probe didn’t color-coordinate perfectly. I chose to map Brown=Red, White/Yellow=Yellow, Blue=Black.
The black wire running out of the bottom right corner of the Particle board is the antenna.

As a practical matter, you’ll also want to consider how the probe wire is entering the box and how to provide strain relief to prevent it from (easily) pulling out of the breadboard. In my case, I made one large arc with the Cat5 cable which was anchored to the wall with several zip ties. I sealed these zip tie holes as well as the cable holes on the bottom with silicone.

Software Configuration

First, we’ll need to read the temperature from the probe and publish it to Particle’s cloud. Secondly, we’ll need to configure the Particle cloud to publish that data to a ThingSpeak dashboard for collection and graphing.

More buck-passing: I followed this tutorial from Particle, but I’ll list all my steps here as they diverge in the use of ThingSpeak.

Particle

The DS18B20 thermometer requires use of the 1-Wire protocol, which is understood by the author in theory if not in practice. 1-Wire’s setup and error checking takes up the majority of the code.

Paste the following code into your Web IDE:

// This #include statement was automatically added by the Particle IDE.
#include <OneWire.h>


OneWire ds = OneWire(D4);  // 1-wire signal on pin D4

unsigned long lastUpdate = 0;

float lastTemp;

void setup() {
  Serial.begin(9600);
  
  pinMode(D7, OUTPUT);
}


void loop(void) {
  byte i;
  byte present = 0;
  byte type_s;
  byte data[12];
  byte addr[8];
  float celsius, fahrenheit;

  if ( !ds.search(addr)) {
    Serial.println("No more addresses.");
    Serial.println();
    ds.reset_search();
    delay(250);
    return;
  }

  // The order is changed a bit in this example
  // first the returned address is printed

  Serial.print("ROM =");
  for( i = 0; i < 8; i++) {
    Serial.write(' ');
    Serial.print(addr[i], HEX);
  }

  // second the CRC is checked, on fail,
  // print error and just return to try again

  if (OneWire::crc8(addr, 7) != addr[7]) {
      Serial.println("CRC is not valid!");
      return;
  }
  Serial.println();

  // we have a good address at this point
  // what kind of chip do we have?
  // we will set a type_s value for known types or just return

  // the first ROM byte indicates which chip
  switch (addr[0]) {
    case 0x10:
      Serial.println("  Chip = DS1820/DS18S20");
      type_s = 1;
      break;
    case 0x28:
      Serial.println("  Chip = DS18B20");
      type_s = 0;
      break;
    case 0x22:
      Serial.println("  Chip = DS1822");
      type_s = 0;
      break;
    case 0x26:
      Serial.println("  Chip = DS2438");
      type_s = 2;
      break;
    default:
      Serial.println("Unknown device type.");
      return;
  }

  // this device has temp so let's read it

  ds.reset();               // first clear the 1-wire bus
  ds.select(addr);          // now select the device we just found
  // ds.write(0x44, 1);     // tell it to start a conversion, with parasite power on at the end
  ds.write(0x44, 0);        // or start conversion in powered mode (bus finishes low)

  // just wait a second while the conversion takes place
  // different chips have different conversion times, check the specs, 1 sec is worst case + 250ms
  // you could also communicate with other devices if you like but you would need
  // to already know their address to select them.

  delay(1000);     // maybe 750ms is enough, maybe not, wait 1 sec for conversion

  // we might do a ds.depower() (parasite) here, but the reset will take care of it.

  // first make sure current values are in the scratch pad

  present = ds.reset();
  ds.select(addr);
  ds.write(0xB8,0);         // Recall Memory 0
  ds.write(0x00,0);         // Recall Memory 0

  // now read the scratch pad

  present = ds.reset();
  ds.select(addr);
  ds.write(0xBE,0);         // Read Scratchpad
  if (type_s == 2) {
    ds.write(0x00,0);       // The DS2438 needs a page# to read
  }

  // transfer and print the values

  Serial.print("  Data = ");
  Serial.print(present, HEX);
  Serial.print(" ");
  for ( i = 0; i < 9; i++) {           // we need 9 bytes
    data[i] = ds.read();
    Serial.print(data[i], HEX);
    Serial.print(" ");
  }
  Serial.print(" CRC=");
  Serial.print(OneWire::crc8(data, 8), HEX);
  Serial.println();

  // Convert the data to actual temperature
  // because the result is a 16 bit signed integer, it should
  // be stored to an "int16_t" type, which is always 16 bits
  // even when compiled on a 32 bit processor.
  int16_t raw = (data[1] << 8) | data[0];
  if (type_s == 2) raw = (data[2] << 8) | data[1];
  byte cfg = (data[4] & 0x60);

  switch (type_s) {
    case 1:
      raw = raw << 3; // 9 bit resolution default
      if (data[7] == 0x10) {
        // "count remain" gives full 12 bit resolution
        raw = (raw & 0xFFF0) + 12 - data[6];
      }
      celsius = (float)raw * 0.0625;
      break;
    case 0:
      // at lower res, the low bits are undefined, so let's zero them
      if (cfg == 0x00) raw = raw & ~7;  // 9 bit resolution, 93.75 ms
      if (cfg == 0x20) raw = raw & ~3; // 10 bit res, 187.5 ms
      if (cfg == 0x40) raw = raw & ~1; // 11 bit res, 375 ms
      // default is 12 bit resolution, 750 ms conversion time
      celsius = (float)raw * 0.0625;
      break;

    case 2:
      data[1] = (data[1] >> 3) & 0x1f;
      if (data[2] > 127) {
        celsius = (float)data[2] - ((float)data[1] * .03125);
      }else{
        celsius = (float)data[2] + ((float)data[1] * .03125);
      }
  }

  // remove random errors
  if((((celsius <= 0 && celsius > -1) && lastTemp > 5)) || celsius > 125) {
      celsius = lastTemp;
  }

  fahrenheit = celsius * 1.8 + 32.0;
  lastTemp = celsius;
  Serial.print("  Temperature = ");
  Serial.print(celsius);
  Serial.print(" Celsius, ");
  Serial.print(fahrenheit);
  Serial.println(" Fahrenheit");

  // now that we have the readings, we can publish them to the cloud
  String temperature = String(fahrenheit); // store temp in "temperature" string
  digitalWrite(D7, HIGH);
  Particle.publish("temperature", temperature, PRIVATE); // publish to cloud
  delay(100);
  digitalWrite(D7, LOW);
  delay((60*1000)-100); // 60 second delay
}

Several things to note:

  • D4 is declared as the data pin in line 5.
  • The first argument in Particle.publish() can be any string label you choose. This event name is what you’ll use in the Particle cloud to identify this piece of data.
  • D7 on the Photon I’m using is the onboard LED. I set it to blink for at least 100 milliseconds each time a temperature is published. This helps in debugging, as you can verify by watching the device whether a temperature reading is being taken.
  • I’m publishing a temperature every 60 seconds. The delay function takes in milliseconds, hence the 60 * 1000. To be petty, I subtract 100 milliseconds from the delay total to account for the 100 milliseconds of D7 illumination I added above.

Once you flash the above code to your Particle device, navigate to the Events tab in the Particle console. You should start seeing data published at whatever interval and name you configured.

Software- ThingSpeak

Once the data is publishing to Particle’s cloud, you can go many places with it. I chose to use ThingSpeak so I could easily view the current temperature from a browser and see a simple historical graph.

Set up a free ThingSpeak account at thingspeak.com

Create a new Channel and label Field 1 as temperature

After creating the channel, click on API Keys. Make note of the Channel ID and Write API Key.

Back at the Particle Console, click on the Integrations tab.

Click New Integration, and select Webhook.

Webhook fields:

  • Event Name: the event name to trigger this web hook- whatever you set as the first argument in the Particle.publish() function
  • URL: https://api.thingspeak.com/update
  • Request Type: POST
  • Request Format: Web Form
  • Device: can be left to Any, unless you only want specific devices in your cloud to trigger this webhook

Click on Advanced settings, and under Form Fields, select Custom.

Add the following pairs:

  • api_key > your api key
  • field1 > {{{PARTICLE_EVENT_VALUE}}}
Completed webhook form

Save this and navigate back to ThingSpeak. You should start seeing data publishing to your channel.

ThingSpeak Data Display Options

A ThingSpeak channel can be populated with Visualizations and Widgets which can be added with the corresponding buttons at the top of the channel. A widget is a simple indicator of the last known data value, while visualizations allow for historical data, *ahem*, visualization. You can also create your own MatLab visualizations if you want to do more sophisticated analysis or charting.

Chart Visualization

The Field 1 Chart should be added to the channel automatically- if not, click Add Visualization and choose the Field 1 Chart. This visualization can be customized by clicking the pencil icon in the upper right corner and has a lot of configurable options:

  • Title: chart title that appears directly above graph (there’s no way to change the “Field 1 Chart” in the blue bar)
  • X-Axis: text label for x axis
  • Y-Axis: text label for y axis
  • Color: hex color code for chart line/points
  • Background: hex color code for chart background
  • Type: choice between:
    • Line (shown in my example)
    • Bar
    • Column
    • Spline
    • Step
    • Not sure what Spline and Step do as they look the same as the Line graph, so you’ll have to play around and see what you prefer
  • Dynamic: not sure what this does, I see no effect when toggling
  • Days: how many days of historical data to show (seems to be capped at 7 days regardless of larger values being entered)
  • Results: limit the chart to a certain number of data points. If you collect data once a minute, limiting to 60 results would give you one hour of data. Leave this field blank to not apply result limit.
  • Timescale / Average / Median / Sum:
    • These are mutually exclusive ways of aggregating the data together into fewer graph points. Timescale and Average seem to be the same, so effectively you’re choosing between the average, median, and sum of every X number of points. This simplifies the graph compared to showing 7 days worth of minutely data.
    • I like to use Median with 60 points to get a median hourly value. I chose median because occasionally, for unknown reasons, the Particle will report temperatures of several thousand degrees, which is obviously an error. When using average I would see spikes in any hour with such an error, whereas median eliminates that pull.
  • Rounding: number of decimal places to display when clicking on a point
  • Data Min: The minimum value that should be included in the data set. Any data point lower than this won’t be graphed. This also helps with the bizarre temperature errors I mentioned above, so I often set this safely below the coldest temperature that would be plausible for my application
  • Data Max: Opposite of data min, sets an upper bound for included data
  • Y-Axis Min: Impacts the visual rendering of the y axis. Leave blank to automatically adjust the axis according to the data being graphed.
  • Y-Axis Max: Opposite of Y-Axis Min, sets upper bound for y-axis rendering

Numeric Display Widget

The numeric display will simply show the last recorded value as well as an indicator for how old that data is. I made this the first widget on the channel so I can see right away what the “current” value is.

This widget has several configurable options, like the name that will appear in the blue title bar, which field is being displayed, and how often the widget should refresh with new data. I set this to 15 seconds which is plenty often for data that should only change every 60 seconds. You can also round to a certain number of decimal places.

Gauge Widget

The gauge widget adds some visual context for the data and allows you to customize numerical ranges and colors associated with ranges. Here’s an example of a gauge with color ranges for freezing or near freezing temperatures.

Configurable options for this include the widget title, minimum and maximum gauge range, and start/stop values for color bands.

Lamp Widget

This is the most limited display type, but I guess it’s cute. It allows you to turn a virtual light bulb on or off above/below a given threshold. The logic is very similar to the gauge but simpler, so I won’t spend time re-explaining the functionality.

Adding Additional Sensors

Every step in this data pipeline (the Particle hardware, the Particle Cloud, and ThingSpeak’s channels) is ready to accept multiple pieces of data if you want to add more sensors to the box.

It’s easy to imagine some of the additional data that could be added to the board and collected.

In the future, I’d like to buy a Boron board with cellular connectivity and make a standalone box with a solar panel and battery that could be flexibly placed anywhere. I’m also interested in the idea of wiring the board to a handful of plugs through the side of the box so I could easily add and remove sensors based on the needs of the use case. I’ll keep you posted if I do this.

If you have any use case ideas or want to share your own project or implementation, please leave a comment or get in touch!

Quick Tips

Quick Tip: Using a Smartwatch to View a Remote Camera

There are a handful of reasons I enjoy farming with an Apple Watch: viewing notifications and taking phone calls with dirty hands, glancing at the weather forecast with the turn of a wrist, and quantifying my activity level on physically active days. Perhaps the most fun is using the camera remote app as a remotely viewable camera.

Camera Remote app, center

The camera remote app is designed for exactly what the name says- triggering the iPhone’s camera remotely, like you would for a self portrait. The interface displays a live view of the camera with a trigger button and timer controls. Because the live view fills the screen to a reasonable size and can be viewed anywhere within the watch’s operating range (up to 50′ line of sight in my tests), this is a great way to add a camera at an important angle and watch it on your wrist.

Obvious use case: backup camera

I mostly use this as a backup camera when I’m alone without a feedback buddy. The first time I used this was actually while hooking up to a gooseneck trailer; my truck toolbox obscures a direct view of the ball socket, so while I could horizontally center from the cab, I found myself getting in and out of the truck to achieve the proper front/back alignment. The wheel well provided an ideal place to prop the phone at an angle and the signal strength was sufficient going through a toolbox and the back of the truck cab.

Rolling the digital crown on the Apple Watch will zoom the camera, and you’ll see a zoom indicator at the top of the screen. If you’re precise about your camera placement, this could allow for narrowing in on a specific detail.

Other uses

One of the most practical non-backup use cases I’ve encountered is starting the irrigation system at the orchard, where we have a gas powered pump near the pond intake and a pressure gauge at the stand pipe. Different types of drip line require different operating pressures, and it can take a few minutes to push the air out of the system. Pointing the phone at the stand pipe pressure gauge and staying stationed at the pump lets me head off any huge pressure fluctuations and saves a few feedback trips between the pond edge and the stand pipe up the bank.

Limitations

  • Sometimes it can be challenging to position your phone at the proper angle, especially in a place it won’t get damaged
  • This is a digital video stream, and thus is subject to lag and caching. It’s best to move a bit, then wait for the view on the watch to update to get a feel for the lag and make sure you’re not seeing an outdated view.
  • There is at least a ~20 second fixed cost to opening the app, making sure it connects, and placing the camera in the right location. Consider the needs of the job and how much time you save doing this vs walking back and forth.

Non-Apple Watches

I have no direct experience with other brands of smart watches. I’m sure this type of functionality is possible through third party apps and may even be available out of the box via a similar remote app. If anyone has experience with other watches, let us know in the comments below.

Projects

Tools for Quantifying Historical Basis

Early in 2020, I was facing my first year of managing my own acreage and subsequently marketing the grain (what a roller coaster I was in for), so I became a sponge for any advice I could find. I spent some time reading through Nick Horob’s thoughts on the Harvest Profit blog and joined the Grain Market Discussion Facebook group. One theme I repeatedly encountered was the importance of separating the futures and basis portions of a sale, which is enabled by “understanding your local historical basis patterns”. I had no clue what was typical in our area, so I jumped on the challenge of obtaining a data set to help with this.

Over a year later this is still a work in progress, but I’ll report what I’ve learned thus far in my quest to quantify the local basis market.

Free Data Sources

USDA Data

The USDA’s Ag Marketing Service has basis data available via this query tool for the following locations:

  • Augusta, AR
  • Blytheville, AR
  • DeWitt, AR
  • Dermott, AR
  • Des Arc, AR
  • Helena, AR
  • Jonesboro, AR
  • Little Rock, AR
  • Old Town/Elaine, AR
  • Osceola, AR
  • Pendleton, AR
  • Pine Bluff, AR
  • Stuttgart, AR
  • West Memphis, AR
  • Wheatley, AR
  • Wynne, AR
  • East Iowa, IA
  • Iowa, IA
  • Mississippi River Northern Iowa, IA
  • Mississippi River Southern Iowa, IA
  • Mississippi River Southern Minn., IA
  • North East Iowa, IA
  • North West Iowa, IA
  • Southern Minnesota, IA
  • West Iowa, IA
  • Central Illinois, IL
  • Illinois River North of Peoria, IL
  • Illinois River South of Peoria, IL
  • Central Kansas, KS
  • Western Kansas, KS
  • Louisiana Gulf, LA
  • Duluth, MN
  • Minneapolis, MN
  • Minneapolis – Duluth, MN
  • Minnesota, MN
  • Kansas City, MO
  • Nebraska, NE
  • Portland, OR
  • Memphis, TN

Technically, this data does include some forward basis, as the Delivery Period column contains a few rows for “New Crop”, but most of the data I’ve seen is “Cash” (spot price), so it won’t provide a sense of how basis changes as the delivery period approaches. The geographic coverage is also hit or miss- congrats to everyone who lives in Arkansas; those of us in the large and ambiguous area of “Central Illinois” envy your data granularity.

Bottom line: if you farm near an unambiguous location and just need a continuous chart, this would be a good data source for you.

University of Illinois Regional Data

One of the first resources I came across was this pair of regional datasets (corn, soybeans) from the University of Illinois, which go back to the 1970s. Basis is tracked across 7 regions in Illinois, which does a slightly better job of getting close to your local market:

  • Northern
  • Western
  • N. Central
  • S. Central
  • Wabash
  • W.S. West
  • L. Egypt

Unlike the continuous spot data from the USDA, this data tracks 3 futures delivery periods (December, March, July) starting in January of the production year. Because the July delivery period falls the summer after harvest, that works out to roughly 18 months, or as much as ~80 weeks, of July basis history, for example.

This regional data is likely to be more smoothed out than what you’d see at any specific elevator, so it’s challenging to make truly quantitative decisions with it directly. I’ve attempted to calibrate this data to my local market using regression analysis and haven’t been especially comforted by the accuracy of the results. I’d only recommend this dataset as a way of generalizing the shape of the seasonal trend without reading much into the actual numeric levels.
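
For anyone curious what that calibration looked like, the sketch below is the general idea: fit my local basis as a linear function of the regional average, then judge the fit by how far off the predictions land. The numbers here are made up for illustration; the real inputs were matched weekly observations.

```python
# Rough sketch of calibrating regional basis to a local elevator with a
# least-squares line. Values are placeholders, not real market data.
import numpy as np

regional = np.array([-0.35, -0.30, -0.28, -0.22, -0.15, -0.10])  # regional avg basis ($/bu)
local    = np.array([-0.42, -0.38, -0.33, -0.30, -0.20, -0.16])  # my elevator's basis ($/bu)

slope, intercept = np.polyfit(regional, local, 1)  # local ≈ slope * regional + intercept
predicted = slope * regional + intercept
typical_miss = np.abs(local - predicted).mean()

print(f"local ≈ {slope:.2f} * regional {intercept:+.2f}")
print(f"typical miss: {typical_miss:.3f} $/bu")
```

If the typical miss turns out to be bigger than the basis moves you’re trying to trade around, the calibration isn’t buying you much, which is roughly where I landed.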

Bottom line: this data shows how basis changes as delivery approaches in several contract months, but is limited to Illinois and the regional average isn’t likely precise to your local elevators.

Purdue University Crop Basis Tool

For those looking for similar regional data not limited to Illinois, this web-based dashboard from Purdue University offers similar data for Indiana, Ohio, Michigan, and Illinois. It’s also very user-friendly and lets you dig into the data from a web browser without opening a spreadsheet.

I did a bit of spot checking and found the values for my county to be way off, so I think it suffers from the same issues as the U of I data in terms of regional smoothing and an offset. It’s also easy to get a false sense of precision given that you select your county, but the tool just looks up the corresponding region.

Bottom line: this is the most user-friendly free tool I’ve found for viewing basis data and comparing it to historical averages, but it’s limited to 4 states and still regionally averaged, so it won’t match values at your local elevators.

DIY Data Collection

What’s the surefire way to get 5 years of data on anything? Start collecting it today, and you’ll have it in 5 years. This is the slow-go approach, but one I’ve seen repeated by other farmers who track basis- I’ve asked around assuming somebody had a great off-the-shelf dataset, and often you’ll find people just check prices weekly and dump them into a spreadsheet.

My process is to take phone screenshots of cash bids twice a week (I chose Monday and Thursday). This is a very fast and easy way of capturing the data that I can keep up with even in the busy season. I then go through these screenshots when I have time each week or so and type the data into a spreadsheet. This only takes me about 10 minutes a week to populate since the contract months and locations don’t change that frequently and can be copied and pasted from the previous entry.
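
If you’re setting up your own sheet, the only derived number the entry step really needs is the basis itself: cash bid minus the futures price for the contract the bid references. Here’s a small sketch of the row format I use, with hypothetical field names and made-up prices:

```python
# Sketch of one row of a basis log and its single derived value.
# Field names and prices are made up for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass
class BasisObservation:
    observed: date       # the Monday or Thursday the screenshot was taken
    location: str        # elevator or terminal name
    commodity: str       # "corn" or "soybeans"
    delivery: str        # contract month the bid references, e.g. "Dec 2021"
    cash_bid: float      # posted cash bid, $/bu
    futures: float       # that contract's futures price at the same time, $/bu

    @property
    def basis(self) -> float:
        return round(self.cash_bid - self.futures, 4)

obs = BasisObservation(date(2021, 10, 4), "Local Elevator", "corn",
                       "Dec 2021", 5.12, 5.37)
print(f"{obs.location} {obs.delivery}: basis {obs.basis:+.2f}")  # -0.25
```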

Another route you could go (depending on your elevators’ participation) is to sign up for emailed cash bids. In my case, not everywhere I wanted to track offered this, so if I was going to screenshot one location, I might as well do that for the whole process. (If you go that route, you may want to set up a separate email inbox and a forwarding rule to keep the volume out of your main inbox.)

Based on this normalized data, I set up a few pages with pivot tables and charts. My primary goal is to understand when to make cash sales vs HTAs, and when to then set the basis on those HTAs; that’s what I hope to visualize with the main chart below.

You’ll also notice a few points from past years- I dug out old cash sale records for our farm and backed into the basis by looking up the board price for that day on these historical futures charts. The time of day of each sale wasn’t documented, so there could be some slippage, but I excluded any days with a lot of market volatility, figuring it’s better to have less data than wrong data.

I also created this page to chart basis spreads between any two locations- in my case most often between a local elevator and a river terminal.
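
The spread page is just an alignment-and-subtraction exercise. Here’s a small pandas sketch of the same arithmetic, where the column names and values are placeholders for whatever your raw-data sheet contains:

```python
# Sketch: line up two locations' basis observations by date and contract,
# then subtract to get the spread. All values are placeholders.
import pandas as pd

raw = pd.DataFrame({
    "date":     ["2021-10-04", "2021-10-04", "2021-10-07", "2021-10-07"],
    "location": ["Local Elevator", "River Terminal"] * 2,
    "delivery": ["Dec 2021"] * 4,
    "basis":    [-0.25, -0.05, -0.23, -0.02],
})

wide = raw.pivot_table(index=["date", "delivery"], columns="location", values="basis")
wide["spread"] = wide["Local Elevator"] - wide["River Terminal"]
print(wide)  # spreads of -0.20 and -0.21: how much more the river is paying
```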

The great thing about having your own data is the flexibility to query it however you want. I’m sure as needs change I’ll make different types of reports and queries that answer questions I have yet to ask.

Here’s a link to a blank copy of my Google Sheet for you to use and branch off of as you wish. Things might look funky at first without data, but once you populate the Raw Data sheet a bit, all the queries and dropdowns should work.

Bottom line: You won’t see quick results, but collecting your own basis data is free, not that hard to do, and ensures the data is precise to your area. It’s also the ultimate insurance policy against changes in the availability of an outside data set.

Paid Data Sources

GeoGrain- Dashboard Subscription

GeoGrain is a paid subscription dashboard that tracks cash bids and basis levels at an elevator-specific level. I found most of the elevators in my area listed there and had decent luck viewing historical data.

One of GeoGrain’s strengths is the Optimizer, a tool within the app where you can plug in various trucking expenses and get real-time recommendations on where to most effectively deliver grain. You can also compare an elevator’s basis to the regional average.
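
I don’t know GeoGrain’s exact formula, but the core idea behind an optimizer like this is straightforward: net the trucking cost out of each cash bid and rank the destinations. A hedged sketch with made-up bids, mileages, and cost assumptions:

```python
# Not GeoGrain's actual method -- just the general net-of-freight idea.
# All bids, mileages, and cost figures below are placeholders.
def net_price(cash_bid: float, miles_one_way: float,
              cost_per_loaded_mile: float = 4.00,
              bushels_per_load: float = 1000.0) -> float:
    """Cash bid minus round-trip trucking cost spread over one load, $/bu."""
    trucking_per_bu = (miles_one_way * 2 * cost_per_loaded_mile) / bushels_per_load
    return cash_bid - trucking_per_bu

destinations = {
    "Local Elevator": (5.12, 8),    # (cash bid $/bu, one-way miles)
    "River Terminal": (5.32, 42),
}

for name, (bid, miles) in sorted(destinations.items(),
                                 key=lambda kv: net_price(*kv[1]), reverse=True):
    print(f"{name}: {net_price(bid, miles):.2f} net $/bu")
```

In this made-up case the local house wins despite the lower posted bid, which is exactly the kind of call a tool like the Optimizer is meant to automate.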

This would be great for someone who wants to shop around for the best basis with every sale or spot-sold truckload. In my case, I’m more curious about setting basis for HTA contracts I have at a few elevators I regularly work with. You can also view current basis relative to a simple average of the last X years, or to a regional average within X miles; both are configurable in your account settings. I’d personally like to see a bit more granularity in the historical comparison- for example, picking the specific years to compare against that match a common theme (export-driven bull market year, drought year, planting scare year, etc.), or doing some technical analysis on the basis chart.

GeoGrain does offer a 14-day free trial, so definitely check it out and see if it will work for you.

Bottom line: for $50/month, GeoGrain gives you elevator-specific basis levels (current and historical) in a web dashboard. The reports are geared toward the seller who shops around, but they may not provide enough in-depth querying on any one specific merchant.

GeoGrain- Historical Data Purchase

In addition to their dashboard service, GeoGrain also sells historical datasets for use in your own analysis tools. The first ~300,000 days of data costs a flat $500, then there’s a variable fee for data sets larger than that.

This is daily data and includes the basis and cash price (although I was told it only included basis when I purchased, so perhaps cash price is not guaranteed for all locations). Figuring 252 trading days in a year, 6 delivery periods bid at a time (let’s say), and 2 commodities, an elevator would produce about 3,000 data points per year (252 × 6 × 2 ≈ 3,000). I doubt they have 100 years of data on any elevator, so to make the best use of your $500 you’ll want to select several locations, and you can certainly gain a lot of geographic insight from doing so.

I did end up buying historical data this way and imported it into my DIY data entry spreadsheet to make my historical comparisons come alive. I haven’t yet found a preferred technical analysis method for making quantitative decisions against this data, but that’s my next step.

Bottom line: if you want to analyze basis in your own tool and it’s not worth waiting to acquire it yourself, this is a straightforward process for buying historical data.

DTN ProphetX

ProphetX is primarily a trading and charting suite, but they have been advertising an upcoming feature for basis charts and county averages. This is part of ProphetX Platinum, which runs $420/month according to a sales representative I spoke with. That doesn’t pencil out well for my intended use, especially compared to the one-time purchase above, so I didn’t pursue it much further.

What works for you?

If there’s anything you should take from this, it’s that I’m still learning and always on the lookout for a better option. If you have a different insight about anything in this article, I’d genuinely like to learn about it and incorporate it in my practices. Leave a comment below or get in touch with me.