To keep track of which files it has already seen, Splunk stores a checksum of the first 256 bytes of each file. This is usually plenty, as most files start with a log message, which is almost guaranteed to be unique. It breaks down when several files on the same server share the same first 256 bytes.
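The collision is easy to reproduce outside Splunk. The following is a minimal sketch, not Splunk's actual implementation: `zlib.crc32` stands in for whatever checksum Splunk uses internally, and `head_checksum`, `banner`, and the two sample log bodies are hypothetical names and data made up for illustration.

```python
import zlib

HEAD_BYTES = 256  # number of leading bytes that get checksummed, per the text above


def head_checksum(data: bytes, length: int = HEAD_BYTES) -> int:
    """Checksum of the first `length` bytes only (CRC32 is a stand-in here)."""
    return zlib.crc32(data[:length])


# Hypothetical product banner, padded so that it is longer than 256 bytes.
banner = (
    b"=" * 64 + b"\n"
    + b"== Great product version 1.2 brought to you by Great company ==\n"
    + b"== Server kernel version 3.2.1 ==\n"
    + b"=" * 128 + b"\n"
)

# Two different log files (made-up content) that start with the same banner.
log_a = banner + b"2012-01-01 00:00:01 service started\n"
log_b = banner + b"2012-01-02 09:30:00 user login failed\n"

same_head = head_checksum(log_a) == head_checksum(log_b)
same_content = log_a == log_b
print(same_head, same_content)
```

The two files differ, yet their head checksums match, because everything that distinguishes them sits after byte 256. A tracker that relies on this checksum alone cannot tell them apart.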
I have seen two cases where this happens:
- The first case is when logs start with a common header containing information about the product version, for instance:

      ================================================================
      == Great product version 1.2 brought to you by Great company ==
      == Server kernel version 3.2.1 ==

- The second case is when a server writes many thousands of files with low time resolution, ...