
I am using make for the first time to run a series of scripts. My directory tree looks like this:

project
├── data
│   └── run1
│       └── pass
│            ├── 0
│            ├── 1
│            └── 2
├── include
│   └── variables.mk
├── Makefile
└── scripts
    └── operations.sh

I have one dataset, run1, which has multiple dirs in pass, all of which hold txt files. The operations.sh script uses a method that recursively searches a named dir (here pass). I expect more data (run2, run3 etc), and would like to be able to process them in the same way when the data is available. To this end I use include on variables.mk (not sure if this is appropriate but it works fine), defining $(INPUT_RUNS), which I will simply update as new runs arrive.

I have written a test Makefile

include $(CURDIR)/include/variables.mk
DATA_DIRS := $(addprefix $(CURDIR)/data/, $(foreach r, $(INPUT_RUNS), $(r))/pass)
OUT_DIRS := $(addprefix $(CURDIR)/analysis/, $(foreach r, $(INPUT_RUNS), $(r)))

##targets 
all: operations_run 

##operations
operations_run: $(OUT_DIRS) $(DATA_DIRS)
	mkdir -p $</operations
	sh scripts/operations.sh $</operations $(DATA_DIRS)

This specifies a set of dirs (data, analysis per run). I can then make a target with which to run operations.sh. This works fine. But it doesn't actually use make properly to my mind. I want to make the output, and then if rerunning make, not regenerate the output if no part of the data or analysis has changed.

My question therefore: generally, a target is a file. The operations.sh script runs a method not developed by me, and has particular rules about input and output (both are dirs as seen). I would like to make the target a set of files produced by operations.sh. I would like it to work something like

%.output.txt: $(DATA_DIRS)
	sh operations.sh $< > $@

I think I understand how to use % to name the dependencies, though I haven't tested this. Can I give $(DATA_DIRS) as a dependency while making the target files? Conceptually, I have no idea where to start on that aspect.

Any help is very much appreciated.

  • It would be a lot simpler if you told us something about the changes that are possible in "the data or analysis". Do you mean that txt files can be added, deleted and modified, or is it more than that?
    – Beta
    Commented Jul 20, 2017 at 1:49
  • note that make is peculiar about spaces in functions. The spaces after commas will be treated as additional empty input. e.g. I think DATA_DIRS := $(addprefix $(CURDIR)/data/,$(foreach r,$(INPUT_RUNS),$(r))/pass) is what you intend.
    – bdecaf
    Commented Jul 20, 2017 at 6:38
  • @beta, in the analysis, the txt files are reformatted by the script which checks for 'barcode' strings and sorts lines in txt into one of multiple (usually 8 but up to 12) dirs accordingly. The initial input txt are unchanged. While I have one run currently (run1) I expect more but that data will be in the same format and undergo the same analysis. Commented Jul 20, 2017 at 7:54
  • @bdecaf I had included a test to echo the dirs, and they have no spaces, but it is something to bear in mind. Commented Jul 20, 2017 at 7:54
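
A side note on the list construction discussed in these comments, offered as a sketch rather than a correction of anyone's working setup: in both the question's version and the commented version, /pass sits outside the foreach, so it is appended once to the end of the whole list. That is invisible with a single run, but with several runs only the last one would get the /pass suffix. One way to avoid this is addsuffix. The run names below are hypothetical, and the show target is a helper added only for inspection.

```makefile
# Hypothetical run list; in the question this comes from include/variables.mk.
INPUT_RUNS := run1 run2 run3

# Append /pass to every run, not only the last one.
DATA_DIRS := $(addsuffix /pass,$(addprefix $(CURDIR)/data/,$(INPUT_RUNS)))
OUT_DIRS  := $(addprefix $(CURDIR)/analysis/,$(INPUT_RUNS))

# Helper target (an assumption, not in the question) to inspect the lists.
.PHONY: show
show:
	@echo "DATA_DIRS = $(DATA_DIRS)"
	@echo "OUT_DIRS  = $(OUT_DIRS)"
```

Running `make show` prints both lists, which makes it easy to confirm that each run has its own data and analysis path.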

2 Answers

1

I think the rule should be like:

$(CURDIR)/analysis/%: $(CURDIR)/data/%/pass
	sh operations.sh $< $@

This gives one rule per output directory, assuming operations.sh takes an input directory and an output directory.

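However, make may then "think" the output directory was altered for unrelated reasons (for example, temporary files left by a viewer), because a directory's timestamp changes whenever entries are added to it or removed from it. Personally I prefer to "stamp" completion manually.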

$(CURDIR)/analysis/%/.done: $(CURDIR)/data/%/pass
	sh operations.sh $< $(@:/.done=)
	touch $@

This will put an empty .done file in the outdir with the timestamp of last successful creation.

and

operations_run: $(addsuffix /.done,$(OUT_DIRS))

to run the whole set.
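
Put together, the pieces above could look roughly like the following. This is a sketch under assumptions, not a tested drop-in: the run list is hypothetical, the script path scripts/operations.sh is taken from the question's tree, the argument order (input directory, then output directory) follows this answer's supposition, and the mkdir line is added in case the script does not create its output directory. $(@D) is make's built-in shorthand for the directory part of the target, which is equivalent here to $(@:/.done=).

```makefile
# Hypothetical run list; in the question this comes from include/variables.mk.
INPUT_RUNS := run1

# One stamp file per run, e.g. <project>/analysis/run1/.done
STAMPS := $(addsuffix /.done,$(addprefix $(CURDIR)/analysis/,$(INPUT_RUNS)))

# These names are commands, not files.
.PHONY: all operations_run
all: operations_run
operations_run: $(STAMPS)

# $< is the run's pass directory, $(@D) is the directory holding the stamp.
# Assumed argument order: input directory, then output directory.
$(CURDIR)/analysis/%/.done: $(CURDIR)/data/%/pass
	mkdir -p $(@D)
	sh scripts/operations.sh $< $(@D)
	touch $@
```

Because touch only runs if the script exits successfully, a failed run leaves no stamp behind and is retried on the next make. Adding run2 to INPUT_RUNS later builds only the new run, since the stamp for run1 already exists.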

  • This method makes sense to me. Just to clarify: I presume $(@:/.done=) removes /.done from the target allowing the script to take target dir as input (nice tip). The operations_run target is then added to all: rule to make all aware of what to work on. I had to change addpostfix to addprefix for some reason before it would work (?!), but it does now. Can I ask, is it common to stamp a file like this? I can see myself using it as a standard when I am not using files as targets. Thanks for your help. Commented Jul 20, 2017 at 12:05
  • stupid me. it was addsuffix. I will fix it in the answer. Personally I use this touch often when my scripts don't produce a simple result. The only downside I see is that it may get confusing. Whether it is common I don't dare to say - I still think about it as hack.
    – bdecaf
    Commented Jul 20, 2017 at 12:29
  • and yes - you are right about $(@:/.done=). There would be several ways to do it, but none is intuitive.
    – bdecaf
    Commented Jul 20, 2017 at 12:32
1

If you do not know what the real inputs of your script are, it will be difficult to tell make whether a particular target must be re-done or not. Make compares last modification times of target files and prerequisite files. Directories are more difficult to use for this analysis because a directory's last modification time has a different meaning: it changes when files are added or deleted, not when the content of a file changes. You should first understand what the real inputs of your script are and then express dependencies between output files and input files.
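
If the input files can be enumerated, one possible way to express that dependency is GNU make's secondary expansion, which lets the prerequisite list be computed per run from the pattern stem. The sketch below assumes GNU make, a find command on the PATH, file names without whitespace, a hypothetical run list, and the same stamp-file convention and argument order (input, then output) used in the other answer.

```makefile
# Hypothetical run list; in the question this comes from include/variables.mk.
INPUT_RUNS := run1

.PHONY: all
all: $(INPUT_RUNS:%=analysis/%/.done)

# Expand prerequisite lists a second time, when the stem is known.
.SECONDEXPANSION:

# $$* is the stem (the run name) during the second expansion, so every
# regular file under data/<run>/pass becomes a prerequisite of the stamp.
# Assumed argument order: input directory, then output directory.
analysis/%/.done: $$(shell find data/$$*/pass -type f)
	mkdir -p $(@D)
	sh scripts/operations.sh data/$*/pass $(@D)
	touch $@
```

With this, a run is redone when any of its input files is newer than its stamp, which covers edited files and, in the usual case, newly added ones. A deleted input file is not noticed, because it simply drops out of the prerequisite list.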

Note: you can tell make to build the list of directories to process with something like:

DATA_DIRS := $(shell find $(CURDIR)/data -type d -name pass)
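
Building on that find call, the run names and per-run targets could also be derived instead of maintained by hand. This is a sketch that assumes every pass directory sits directly under data/<run>/ and that no directory named pass is nested deeper; the STAMPS name and the .done convention are borrowed from the other answer, and the show target is a hypothetical helper.

```makefile
# Discover the existing pass directories, e.g. <project>/data/run1/pass
DATA_DIRS := $(shell find $(CURDIR)/data -type d -name pass)

# Strip the fixed prefix and suffix to recover the run names (run1, run2, ...).
INPUT_RUNS := $(patsubst $(CURDIR)/data/%/pass,%,$(DATA_DIRS))

# One stamp file per discovered run.
STAMPS := $(addsuffix /.done,$(addprefix $(CURDIR)/analysis/,$(INPUT_RUNS)))

# Helper target (an assumption) to check what was discovered.
.PHONY: show
show:
	@echo "runs   = $(INPUT_RUNS)"
	@echo "stamps = $(STAMPS)"
```

The trade-off is that new runs are picked up as soon as their directories appear, even if the data is still being copied in, whereas an explicit list in variables.mk leaves that decision to the user.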
  • I suppose I do not know the real inputs; there are a lot of files and they vary between different runs. Once the run is complete those data do not change, however. I do know that the script will return a set of dirs, and within them there are a set of files; again, once written these will not change, and can be cat'd together to make a single file from which I hope to proceed in the Makefile. I think that the solution of @bdecaf makes sense, as it is 'run' based in the first instance, and adding the .done file means the target is a file (which is sensible). Commented Jul 20, 2017 at 11:11
  • Neat tip also on using shell call to define extant dirs for $(DATA_DIRS) Commented Jul 20, 2017 at 11:12
