Monday, August 17, 2009

Splitting a directory based on first character in filename

This is a small bash script I whipped up:

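#!/bin/bash

# Start a new directory once the current group holds at least this many files.
limit=26

# Move the files whose names start with any of the selected initials
# (either case) into a directory named after the range, e.g. "B-C".
function do_move
{
    if [ -n "$selected" ]; then
        if [ "$first" == "$last" ]; then
            dir="$first"
        else
            dir="$first-$last"
        fi
        mkdir "$dir"
        for j in $selected; do
            j2=$(echo "$j" | tr "[:upper:]" "[:lower:]")
            for k in $j* $j2*; do
                # Skip the target directory itself and anything that is not a regular file.
                if [ "$k" != "$dir" ] && [ -f "$k" ]; then
                    mv -- "$k" "$dir/"
                fi
            done
        done
    fi
}

# Unique, upper-cased first characters of everything in the current directory.
initials=$(for i in *; do echo "$i" | tr "[:lower:]" "[:upper:]" | sed "s/^\(.\).*/\1/"; done | sort | uniq)

cur=0
first=""
selected=""

# Greedily add initials to the current group until it reaches $limit files.
for i in $initials; do
    if [ "$first" == "" ]; then
        first=$i
    fi
    selected="$selected $i"
    numfiles=$(ls | grep -ic "^$i")
    (( cur = cur + numfiles ))
    last=$i
    if [ "$cur" -ge "$limit" ]; then
        do_move
        cur=0
        first=""
        selected=""
    fi
done

# Move whatever is left in the final, possibly smaller, group.
do_move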


It splits a directory into smaller directories based on the first character in the filename. An example:

$ for i in `seq 1 1000`; do touch `pwgen -n -c 10 1`; done
$ dirsplit
$ ls
A B-C D-E F-G H-I J-K L-M N-O P Q-S T U V-W X-Y Z
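
To check how evenly the files ended up, something like the following could be run in the split directory. This is only a sketch: count_per_dir is a hypothetical helper name, not part of the script above, and it assumes filenames without embedded newlines.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): print each
# subdirectory of the current directory with the number of regular
# files directly inside it.
count_per_dir() {
    local d n
    for d in */; do
        # If there are no subdirectories the glob stays literal; skip it.
        [ -d "$d" ] || continue
        # tr strips the padding that some wc implementations add.
        n=$(find "$d" -maxdepth 1 -type f | wc -l | tr -d ' ')
        printf '%s %s\n' "${d%/}" "$n"
    done
}
```

Calling count_per_dir right after dirsplit prints one line per directory, which makes it easy to spot groups that ended up well above $limit.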


As you can see, it merges consecutive initials into a range (such as B-C) when a single initial has too few files to fill a directory on its own.
$limit, defined at the top of the script, is the number of files a group must reach before the script starts another directory. It is not a hard limit: a directory can end up with more than $limit files, and no hard limit is possible unless you start splitting on more than the first character. In any case, adjust as necessary.
The algorithm currently used is greedy and likely quite non-optimal, but it works for me.
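
The greedy grouping can be looked at in isolation from the file handling. The sketch below is an illustration rather than the script itself: group_initials is a hypothetical function that takes the limit followed by made-up INITIAL:COUNT pairs and prints the directory names the grouping would produce.

```shell
#!/bin/bash

# Hypothetical helper: greedy grouping on precomputed counts.
# Usage: group_initials LIMIT INITIAL:COUNT...
group_initials() {
    local limit=$1
    shift
    local first="" last="" cur=0 pair initial count
    for pair in "$@"; do
        initial=${pair%%:*}   # text before the colon
        count=${pair##*:}     # text after the colon
        if [ -z "$first" ]; then
            first=$initial
        fi
        last=$initial
        cur=$(( cur + count ))
        # Close the group as soon as it holds at least $limit files.
        if (( cur >= limit )); then
            if [ "$first" = "$last" ]; then echo "$first"; else echo "$first-$last"; fi
            first=""
            cur=0
        fi
    done
    # Flush the final group, which may be smaller than $limit.
    if [ -n "$first" ]; then
        if [ "$first" = "$last" ]; then echo "$first"; else echo "$first-$last"; fi
    fi
}

# Example with invented counts; prints A, B-C and D on separate lines.
group_initials 26 A:30 B:10 C:20 D:5
```

The example also shows why $limit is only a threshold: A gets its own directory with 30 files, and B-C ends up with 30 as well, because a group is only closed after the initial that pushes it to the limit has been added.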

I wrote the script to split directories with many files on the CF card of my NES PowerPak (which is a fabulous creation, but that's another story).
