CREG Journal Search Engine
- Development
  Version 0v28, build date Sun 12-Oct-2025 11:25:33, by
  David Gibson
 
- Warning: This page
  is under development and is likely to contain strange bugs and poor
  documentation. Please report anything odd or incorrect. Please
  include, in your report, the date and time of your search, your IP
  address (which is reportedly 216.73.216.211)
  and a description of what happened. Just saying "it didn't work" is not
  particularly helpful!
  
 
   
DEVELOPMENT NOTE: If you think something odd is
  happening, try forcing a page refresh by typing (probably) CTRL-SHIFT-R - but
  check your browser documentation. Reason: Your browser might be relying
  on old cached copies of JS or CSS files that have been modified recently,
  during development, but which your browser has decided not to download.
  Browsers are capricious like that.  
  - Plain Text Searches
 
  - Wildcard Searches
 
  - Boolean Searches
 
  - Regular Expression Searches
 
  - The Search Algorithms
 
  - Regex Conversions
 
 
  - 
	 
Plain Text Searches 
	 What you type is what is searched for, but please note the following
		exceptions... 
	 
		- When you submit your search expression, any space characters are
		  converted to underscores (_) for on-screen clarity. This means that you cannot
		  search for an underscore using a plain text search.
 
		- Similarly, < and > are converted to underscores, to prevent
		  cross-site scripting attacks. This means that you cannot search for < or >.
 
		- Spaces (and underscores) in your search expression are interpreted as
		  matching any number of consecutive spaces. This is so that your search will not
		  be spoiled if the database accidentally happens to contain two spaces between
		  words. You can change this behaviour using the checkbox in Advanced Settings,
		  above.
 
		- In a plain text search, you cannot search for non-7-bit ASCII
		  characters (e.g. accented characters and symbols such as
		  ±½²³). These are stored in the database as HTML
		  Character Entities and - if you need to find them - you should use a wildcard
		  search.
 
		- The database contains HTML tags and Character Entities. Your search
		  will look inside these items, because it is faster not to exclude them,
		  but this could lead to strange results. You can change this behaviour using the
		  checkboxes in Advanced Settings for Do not search inside HTML tags and
		  Do not search inside Character Entities. You can also avoid the display
		  of strangely-formatted results by selecting the option Do not tag matched
		  text.
 
	    
  - 
	 
Wildcard Searches 
	 A Wildcard search works like a Plain Text search but,
		additionally... 
	 
		- In a wildcard search the characters ? and * have a
		  special meaning. ? matches a single character; * matches a string
		  of any characters, but is prioritised to be as short as possible. 
 
		- In other respects this search is the same as a plain text search.
		  
 
		- For the wildcard *, "as short as possible" means that if the
		  string being searched was, for example, "electric field and magnetic field"
		  then the search term elec*field would match "electric field" rather than
		  "electric field and magnetic field".
 
	    
  - 
	 
Boolean Searches 
	  Not yet implemented, but you
		may be able to achieve a similar result with an appropriate Regular Expression.
		 
	 A Boolean search allows you to combine Wildcard search strings
		with the logical operators NOT, AND, OR, XOR and
		IMP (IMPLIES), and to group them with parentheses, ( and
		). 
	 
		- XOR is the EXCLUSIVE OR operator, where aaa XOR bbb is equivalent
		  to (aaa AND NOT bbb) OR (bbb AND NOT aaa). 
 
		- IMP is the IMPLIES operator, where A IMP B is
		  equivalent to B OR NOT A.
 
		- In this implementation, the search string and the operators must each
		  be separated by one or more spaces but you can still use spaces inside
		  your search strings. If you need to use a space at the beginning or end of your
		  search string you should enter it as an underscore instead. (See note on spaces
		  in the Plain Text Search notes above).
 
		- You cannot use ( or ) in your search string unless you
		  select the advanced option Allow ( and ) inside Boolean search
		  strings. If you select that option you must ensure, if you also use
		  ( and ) to group your search expressions, that you separate these
		  'group separators' from the search strings and the other operators using
		  spaces.
 
		- You can use the operator keywords (AND, OR, etc) in your
		  search string, provided they are not bounded by spaces.
 
		- Unlike some Boolean searches, this one does not execute with a
		  simple left-to-right evaluation. Instead, the operators have a precedence
		  ranking which, in high-to-low order, is NOT, then AND and
		  IMP, then OR and XOR. As an example, aaa ddd OR bbb AND
		  ccc would execute as aaa ddd OR ( bbb AND ccc ) rather than the
		  left-to-right execution of ( aaa ddd OR bbb ) AND ccc.
 
		- For a Boolean search, each search term is matched by a separate parse
		  of the database, so a complex search with many search strings could be
		  slow.
 
		- Because of the structure of the database, a Boolean search is
		  potentially more likely than a Plain Text search to produce strangely-formatted
		  results. You can avoid this by selecting the option Do not tag matched
		  text.
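
	 The two equivalences quoted above for XOR and IMP can be checked with a small truth table. This is an illustrative Python sketch only (the search engine itself is PHP-based, and Boolean searches are not yet implemented); the function names xor and imp are hypothetical.

```python
from itertools import product


def xor(a: bool, b: bool) -> bool:
    # (aaa AND NOT bbb) OR (bbb AND NOT aaa)
    return (a and not b) or (b and not a)


def imp(a: bool, b: bool) -> bool:
    # A IMP B is equivalent to B OR NOT A
    return b or not a


# Print the full truth table for both operators.
for a, b in product([False, True], repeat=2):
    print(f"A={a!s:5} B={b!s:5}  A XOR B={xor(a, b)!s:5}  A IMP B={imp(a, b)}")
```

	 Note that IMP is only FALSE when the first term matches and the second does not.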
 
	   
  - 
	 
Regular Expression Searches 
	 Unlike Plain Text, Wildcard and Boolean searches, your search string
		is interpreted directly, as a Regular Expression - but see the note on spaces,
		below. 
	 
		- Regular Expression searches use PHP-style expressions (which are
		  PCRE-based).
 
		- Your search string is delimited using / characters. If you
		  include a / character in your search string, it will be escaped with
		  \.
 
		-  The mode modifier i will be appended if you have specified a
		  case-insensitive search.
 
		- When you submit your search expression it is trimmed to remove
		  leading and trailing spaces so, to search for such a space, you should use the
		  RegEx syntax \s.
 
	  
	 Regular Expression (RegEx) searches are very powerful, but they are only
		suitable for experts. In particular, you may need to know something about the
		database structure in order to use a regular expression to best advantage. You
		can do some advanced searches using RegExs. Examples... 
	 
		- To locate all characters outside the printable 7-bit ASCII range, which
		  need to be converted to HTML entities, use the search expression [^\s!-~]
 
		- To locate any & characters that have not been entered as
		  the HTML entity &amp;, use the search expression
		  &(?!.{0,6}?(;|=))
 
		- To locate all HTML Character Entities, use the search expression
		  &#?\w+?;
 
		- < and > are converted to underscores before the search expression is
		  executed, to prevent cross-site scripting attacks. This means that you
		  cannot search for < or >.
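
	 The three example expressions above can be tried out offline. The following is a sketch in Python, whose re module accepts these particular expressions unchanged; the sample string is invented for illustration, and PHP's byte-oriented matching may treat multi-byte characters differently from Python's Unicode strings.

```python
import re

# Hypothetical sample record: a bare ampersand, an 8-bit character,
# two Character Entities and a URL query string.
sample = "R&D caf\u00e9, fish &amp; chips &#233; viewtopic.php?f=27&t=1203"

# Characters outside the printable 7-bit ASCII range.
non_ascii = re.findall(r"[^\s!-~]", sample)

# Ampersands that are neither the start of an entity nor part of a query string.
bare_amp = [m.start() for m in re.finditer(r"&(?!.{0,6}?(;|=))", sample)]

# All HTML Character Entities, named or numeric.
entities = re.findall(r"&#?\w+?;", sample)

print(non_ascii)
print(bare_amp)
print(entities)
```

	 In this sample only the ampersand in "R&D" is reported as a bare ampersand; the one in the query string is skipped because it is followed, within a few characters, by =.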
 
		
	    
  - 
	 
The Search Algorithms 
	 The Titles and Abstracts are searched separately. The search scores as a
		hit if... 
	 
		- The search string was found in the Title AND Search
		  Titles was selected, OR...
 
		- The search string was found in the Abstract AND Search
		  Abstracts was selected
 
	  
	 Clicking the Negate the Search Result box causes the result of
		the above logical test to be inverted. This means that if you elect to search
		Titles AND Abstracts then, to be scored a hit, the search term must not
		appear in either. 
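
	 The hit test and its negation can be written as a single logical expression. This is a minimal Python sketch of the rule as described above; the function and parameter names are hypothetical, not taken from the site's code.

```python
def is_hit(in_title: bool, in_abstract: bool,
           search_titles: bool, search_abstracts: bool,
           negate: bool = False) -> bool:
    # A hit if the string was found in a field that was selected for searching.
    found = (in_title and search_titles) or (in_abstract and search_abstracts)
    # "Negate the Search Result" inverts the outcome of the whole test.
    return (not found) if negate else found
```

	 With both Titles and Abstracts selected and the result negated, is_hit(...) is only TRUE when the term appears in neither field.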
	 Entries in the database are in HTML-compatible text. That is, they
		include escaped 'Character Entities' and HTML tags (in particular the Anchor
		tag). For speed, your search will look inside tags and entities, but
		this can lead to strange results. For searches other than RegExs, you have the
		option to exclude tags and entities from the search by using the checkboxes in
		Advanced Settings for Do not search inside HTML tags and Do not
		search inside Character Entities. You can also avoid the display of
		strangely-formatted results by selecting the option Do not tag matched
		text. 
	 A couple of examples will illustrate this... 
	 
		-  If you do a plain text search for hp your search will include
		  matches within the string phpBB that appears inside some HTML tags. If
		  you click on such a result, the hyperlink will not work because the matched
		  string has been replaced by HTML code to display the match in red.
 
		-  If you do a plain text search for cut your search will
		  include matches within the string acute that appears inside some
		  Character Entities. The matched string will be replaced by HTML code to display
		  the match in red, so it will no longer function correctly as a Character
		  Entity, and will display as (e.g.) &eacute; instead of é. 
 
	  
	 Some further technical details... 
	 
		- Inside a tag means "inside the < and
		  > symbols"; not what appears in between a start tag and
		  its matched end tag. Do not search inside... is interpreted as
		  meaning a search string must not finish inside a tag (or entity). That
		  is clearly not exactly what the description implies, and you cannot
		  (easily) search for a string that encompasses an HTML tag. It would be possible
		  to strip the tags out before searching - and this might be a future
		  option.
 
		- In the database, the Titles are followed by the page numbers in
		  parentheses. A Title Search does not search the page numbers.
 
		- References to the CREG Forum. Title
		  records can contain a reference to the CREG Forum. Search for cregf to
		  display these records. Technical Spec. Titles can contain text
		  like [cregf:viewtopic.php?f=27&t=1203]. The text after
		  'cregf' is removed from the Title before it is searched or displayed. The text
		  after ':' is appended to 'https://forums.british-caving.org.uk/' to form the
		  URL. The phrase must begin '[cregf' and end with ']'.
 
		  - The shortcut URL to the CREG forum used to be bcra.org.uk/cregf/ but it has been
		  altered (12-Oct-2025) to /cregforum, to circumvent a potential problem with 302 redirects.
		  In browsers without a cached value, /cregf/ should work correctly.
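
	 The handling of a forum reference can be sketched as follows. This is an illustrative Python version, not the site's PHP; it assumes that the bare marker [cregf] is what remains in the Title after the text following 'cregf' is removed, and the names FORUM_BASE, CREGF and split_forum_reference are hypothetical.

```python
import re

FORUM_BASE = "https://forums.british-caving.org.uk/"

# The phrase must begin '[cregf' and end with ']'; the part after ':' is optional.
CREGF = re.compile(r"\[cregf(?::([^\]]*))?\]")


def split_forum_reference(title: str):
    """Return (title with the reference reduced to its marker, forum URL or None)."""
    m = CREGF.search(title)
    if m is None:
        return title, None
    target = m.group(1)
    url = FORUM_BASE + target if target else None
    # Assumption: only the marker is kept, so a search for 'cregf' still finds the record.
    cleaned = title[:m.start()] + "[cregf]" + title[m.end():]
    return cleaned, url


print(split_forum_reference("Radio location notes [cregf:viewtopic.php?f=27&t=1203]"))
```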
 
	  
	 For a search other than a Regular Expression search, your search
		string, and any options you specify, are converted into the appropriate regular
		expression which is then used for the search. A list of the conversion
		operations applied to non-RegEx searches is given in RegEx Conversions,
		below. For a regular expression search, you are expected to specify the search
		string precisely, including any arcane terms to tailor the search to work a
		particular way. 
	 Boolean searches have a more complicated algorithm than the other
		types of search, which proceeds as follows. 
	 
		- Unless you have selected the option to Allow ( and
		  ) inside Boolean search strings your search text is searched and all
		  instances of ( and ) will have spaces inserted before and after
		  them.
 
		- The search string is then parsed and separated into 'tokens', using
		  'space' as a delimiter. Each token thus represents a search string (or part of a
		  search string) or an operator. If you have selected the option to
		  Allow ( and ) inside Boolean search strings you must
		  ensure that all uses of ( and ) outside a search expression have
		  a space before and after them.
 
		- The tokens are then examined, in turn. If a token matches an operator
		  exactly then it is treated as an operator, else it is treated as a
		  string. Two adjacent tokens that are both strings are joined together into a
		  single string, with a single space between them. The sequence of tokens is
		  checked for syntax errors.
 
		- The search expression is re-ordered into Reverse Polish Notation, the
		  form that computer languages commonly use internally to process expressions.
		  This re-ordering takes into account the rules of operator precedence.
 
		- The search expression is then parsed for a fourth time, converting
		  each search string into a regular expression as described under Wildcard
		  searches, above, and RegEx Conversions, below.
 
		- Additionally, the individual search terms are combined into a single
		  regular expression, $match, using an OR syntax, which is saved for later
		  use, should there be a match.
 
		- The complete parsed and processed search expression is then displayed
		  (as a debugging aid) and passed to the search engine.
 
		- The search engine examines and executes each token in turn, placing
		  the logical result of the operation on a stack.
 
		- If the option Negate the search result has been specified the
		  logical result of the search is inverted.
 
		- If the result is TRUE then the result is prepared for
		  outputting to the screen.
 
		- Unless the option Do not tag matched text has been selected,
		  the $match expression assembled earlier is used to modify the printable
		  result to highlight all the search terms.
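
	 The tokenising, re-ordering and stack evaluation steps above can be sketched as follows. This is an illustrative Python version only: it uses the well-known shunting-yard method for the conversion to Reverse Polish Notation, which may differ from the eventual PHP implementation, and it tests each search string with a simple escaped, case-insensitive match rather than the full conversion described under RegEx Conversions. The function names are hypothetical.

```python
import re

# Precedence as listed in the notes: NOT highest, then AND/IMP, then OR/XOR.
PRECEDENCE = {"NOT": 3, "AND": 2, "IMP": 2, "OR": 1, "XOR": 1}


def tokenise(expr):
    """Space out parentheses, split on spaces, and join adjacent search strings."""
    expr = expr.replace("(", " ( ").replace(")", " ) ")
    tokens, prev_was_string = [], False
    for tok in expr.split():
        is_string = tok not in PRECEDENCE and tok not in ("(", ")")
        if is_string and prev_was_string:
            tokens[-1] += " " + tok      # two adjacent strings become one
        else:
            tokens.append(tok)
        prev_was_string = is_string
    return tokens


def to_rpn(tokens):
    """Shunting-yard conversion to Reverse Polish Notation."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()                  # discard the matching "("
        elif tok in PRECEDENCE:
            if tok != "NOT":             # prefix NOT never pops on arrival
                while (stack and stack[-1] != "("
                       and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                    output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)           # a search string
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


def evaluate(rpn, text):
    """Execute each token in turn, keeping logical results on a stack."""
    stack = []
    for tok in rpn:
        if tok == "NOT":
            stack.append(not stack.pop())
        elif tok in PRECEDENCE:
            b, a = stack.pop(), stack.pop()
            if tok == "AND":
                stack.append(a and b)
            elif tok == "OR":
                stack.append(a or b)
            elif tok == "XOR":
                stack.append(a != b)
            else:                        # IMP: B OR NOT A
                stack.append(b or not a)
        else:
            found = re.search(re.escape(tok), text, re.IGNORECASE)
            stack.append(found is not None)
    return stack.pop()


rpn = to_rpn(tokenise("aaa ddd OR bbb AND ccc"))
print(rpn)
```

	 For the example used earlier, the re-ordered expression places AND before OR, so bbb AND ccc is evaluated first.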
 
	   
  - 
	 
RegEx Conversions 
	 Plain Text searches 
	 Plain-text searches are converted to Regular Expressions before the
		search is executed. The sequence of operations is as follows. 
	 
		- Spaces are converted to underscores before the expression is
		  submitted
 
		- < and > are converted to underscores.
 
		- All characters that have a special meaning in RegExs are escaped with
		  \
 
		- The characters  & " £ are replaced with
		  their HTML entities
 
		- _ is replaced with \s+ so that the search matches a
		  string of spaces. This behaviour can be modified by an Advanced Setting
 
		- Hyphen is replaced by (\-|–) so that the search
		  matches a hyphen or an en-dash. This behaviour can be modified by an Advanced
		  Setting
 
		- If you specified Match Whole Words then the RegEx is bounded
		  by the metacharacter \b for 'word boundary'
 
		- If you specified Do not search inside HTML tags the RegEx
		  phrase (?![^<]*?>) is appended to your search so that matches
		  inside an HTML tag are ignored.
 
		- If you specified Do not search inside Character Entities the
		  RegEx phrase (?![^&]*?;;) is appended to your search so that matches
		  inside an HTML entity are ignored. For this to work, the database is
		  temporarily altered to replace the single terminating ; of an Entity, by
		  ;;. This apparent 'botch' is considered the simplest way of performing
		  the match, because the alternative method of using a RegEx lookbehind
		  function is tediously long-winded. 
 
		- If you did not specify Match Case then the mode modifier i is
		  appended to the regular expression, for 'case-insensitive matching'
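
	 The sequence of operations above can be sketched as a single function. This is an illustrative Python version, not the site's PHP code: the set of escaped characters (META), the entity names &amp;, &quot; and &pound;, and the function names are assumptions, and the Advanced Settings that modify the space and hyphen handling are omitted.

```python
import re

META = r"\.^$*+?()[]{}|/"   # characters escaped in this sketch (assumed set)


def plain_text_to_regex(query, whole_words=False, skip_tags=False,
                        skip_entities=False):
    q = re.sub(r"[ <>]", "_", query.strip())              # spaces, < and > become _
    q = "".join("\\" + c if c in META else c for c in q)  # escape metacharacters
    for char, entity in (("&", "&amp;"), ('"', "&quot;"), ("£", "&pound;")):
        q = q.replace(char, entity)                       # match the stored entities
    q = re.sub(r"_+", lambda _m: r"\s+", q)               # any run of spaces
    q = q.replace("-", r"(\-|–)")                         # hyphen or en-dash
    if whole_words:
        q = r"\b" + q + r"\b"
    if skip_tags:
        q += r"(?![^<]*?>)"                               # not followed by a closing >
    if skip_entities:
        q += r"(?![^&]*?;;)"                              # not followed by a doubled ;
    return q


def search(query, record, match_case=False, **options):
    if options.get("skip_entities"):
        # Temporarily double the terminating ; of each entity, as described above.
        record = re.sub(r"(&#?\w+?);", r"\1;;", record)
    flags = 0 if match_case else re.IGNORECASE
    pattern = plain_text_to_regex(query, **options)
    return [m.group(0) for m in re.finditer(pattern, record, flags)]


print(plain_text_to_regex("electric field"))
print(search("hp", '<a href="phpBB/x">hp radio</a>', skip_tags=True))
print(search("cut", "caf&eacute; short cut", skip_entities=True))
```

	 The last two calls reproduce the hp and cut examples given under The Search Algorithms: with the relevant option set, the match inside the tag or entity is rejected and only the match in the visible text remains.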
 
	  
	 Wildcard and Boolean searches All the above, plus...
	 
		-  The wildcard ? is replaced by (.|&#?\w+?;) so that
		  it matches any single character or a Character Entity
 
		- The wildcard * is replaced by .*? to specify a search
		  for a string of any characters, but one which is prioritised to be as short as
		  possible.
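
	 The two wildcard replacements can be illustrated in isolation. This Python sketch handles only ? and *, escaping every other character; the remaining plain text conversions are left out for brevity, and the function name wildcard_to_regex is hypothetical.

```python
import re


def wildcard_to_regex(query):
    parts = []
    for ch in query:
        if ch == "?":
            parts.append(r"(.|&#?\w+?;)")   # one character or one Character Entity
        elif ch == "*":
            parts.append(".*?")             # any string, as short as possible
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


text = "electric field and magnetic field"
shortest = re.search(wildcard_to_regex("elec*field"), text).group(0)
print(shortest)

# ? matches either a plain character or a whole entity.
print(bool(re.fullmatch(wildcard_to_regex("caf?"), "cafe")))
print(bool(re.fullmatch(wildcard_to_regex("caf?"), "caf&eacute;")))
```

	 Because * becomes the lazy form .*? rather than the greedy .*, the first search stops at the first occurrence of "field".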
 
	  
	 RegEx searches 
	 
		- The / character, which has a special meaning in a RegEx, is
		  escaped with \
 
		- If you did not specify Match Case then the mode modifier i is
		  appended to the Regular Expression, for 'case-insensitive matching'
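
	 The assembly of the final PHP-style expression can be sketched as plain string handling. This Python snippet only builds the delimited string; it assumes the / delimiters described under Regular Expression Searches, does not check whether a / was already escaped by the user, and the function name to_php_style_regex is hypothetical.

```python
def to_php_style_regex(expr: str, match_case: bool = False) -> str:
    body = expr.strip(" ")            # leading and trailing spaces are trimmed
    body = body.replace("/", "\\/")   # escape the delimiter character
    modifier = "" if match_case else "i"
    return "/" + body + "/" + modifier


print(to_php_style_regex(" viewtopic.php?f=27/x "))
print(to_php_style_regex(r"&#?\w+?;", match_case=True))
```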
 
	   
    
   Some Preset Searches
Some special searches 
The following list is intended mostly for 'debugging', but feel free to try
  them. 
    
   Known Issues / Things to Do
Software Issues 
  - The Special Searches for double-quotes and &-in-tags do not
	 populate the search box properly, although the search works. This is probably
	 because unbalanced quotation marks or entities are not being properly
	 escaped in URLs.
 
 
Database Issues 
  - Page numbers: are not given for the earlier journals - the
	 database needs updating
 
  - 8-bit characters: in the database need replacing with Character
	 Entities
 
  - Tagging of Authors: From j99, authors' names are tagged with
	 <SPAN CLASS="author">. This should be extended back through all
	 issues.
 
  - Some HTML tags could be replaced by entities: Consider replacing
	 HTML tags for <sup> with entities, or improve the RegEx so that it searches
	 them as it would an entity. Related: why do I not use <sup> in CKS
	 database; but use CSS instead?
 
  - Articles containing corrections and updates. Check whether
	 searching for "corrections" brings up all published corrections. Check Julie's
	 database for this. Also, check her notes of 'associated articles' to see if it
	 can be incorporated, and a special search for "updates, corrections and related
	 articles" introduced, perhaps?
 
 
Future Development 
  - Consider translating 8-bit chars in Search String to HTML entities. Or,
	 at least, flagging them to the user and suggesting a wildcard search
 
  - Add Booleans. Use a precedence stack to convert to RPN.
 
  - Finish debugging the new feature that puts tabs between title items, and
	 extend it to all listings. This feature is not yet advertised to the user.
	 
 
  - Consider adding option to strip HTML tags before searching
 
  - Consider changing code so that the "Using RegEx" text isn't put in the
	 SPAN "searchReport" innerHTML until the search is complete. This is just to
	 tidy the HTML output and make it easier to debug
 
  - Change database files - we no longer need to list all the links, now that
	 covers.php handles this. Still handy to list the links to raw data files I
	 suppose - or do that via a query string, e.g. mode=raw
 
 
Additions, Bugs, Corrections (not necessarily a complete list) 
  - 12-Nov-2017 Version 0v11: Bug correction: Ampersand not converting
	 to Entity in Plain Text search. Correction made to search.php; forgot that
	 & is not converted by preg_quote()
 
  - 12-Nov-2017 Version 0v11: Layout change: Added <SPAN
	 CLASS="keepTogether"> to keep INPUT items on same line as their text, in
	 list of search options. This necessitated adding a parentNode clause in their
	 ONCLICKS. Updated /pub/popup.css
 
  - 12-Nov-2017 Version 0v11: New Feature: Added Do not search
	 inside Character Entities by using a 'botch' - see ;; above. This
	 seems the simplest way to do it though, because lookbehind (for matching the
	 opening & of an Entity) requires a fixed length search term.
 
  - 14-Nov-2017 Version 0v13: Documentation: revised notes for Boolean
	 operations (although they are still not implemented). Program: various changes
	 to comments in search.html and format_creg.php. Moved PHP error handling into
	 separate function ( for which see test mechanism at the end of printdata()).
	 
 
  - 14-Nov-2017 Version 0v15: HTML: added #results to submitted FORM
	 so that it jumps down to start of results when search is complete. Also added
	 popup info box (position: fixed) that duplicates the info in "Results of
	 Search", and which disappears when results are complete. This makes it easier
	 to inspect the regex (as displayed) during a long or buggy search.
 
  - 15-Nov-2017 Version 0v16: HTML: added notes on boolean searches.
	 Added feature to limit search to a range of years.
 
  - 16-Nov-2017 Version 0v17: Corrected bug due to accidentally wiped
	 code in format_creg for handling title and abs checkboxes. Updated 'years'
	 facility to give default string '(All)' and to move some operations out of the
	 For loop in printallData
 
  - 20-Nov-2017 Version 0v19: Searches for 'cregf' string now handled
	 better. This required a change to the database structure. See References
	 to the CREG Forum, in the Help notes above.
 
  - 21-Nov-2017 Version 0v19 Documentation: revised
 
  - 27-Nov-2017 Version 0v21 __search updated.
	 Contents.php edited to show links to local copies of PDFs when run under
	 localhost. format_creg updated for stripping of HTML pre-amble and
	 adding new pre-amble if it does not exist. Modified .htaccess and
	 contents.php to give new format for representing links to data. Updated
	 pub/popup.css. 
 
  - 27-Nov-2017 Version 0v21 Documentation: update to
	 pub/dataformat and to
	 database.html
 
  - 28-Nov-2017 Version 0v22 Layout changes to search.html. Added
	 showDOI. Added reset date to tooltip for downloads counter. Added tabs to
	 separate title items. See further work
 
  - 30-Nov-2017 Version 0v23 Version number is now PHP variable. Added
	 logging of search requests. Associated changes to docstore log files
 
  - 30-Nov-2017 Version 0v24 Sandbox handling corrected for
	 searches.
 
  - 03-Dec-2017 Version 0v25 Format updating now also includes
	 conversion to HTML Entities, but these features are only enabled at Localhost,
	 because of file permission and character set issues. Further corrections to
	 data files, for HTML entities - both 8-bit chars and one file with rogue
	 double-quotes
 
  - 04-Dec-2017 Version 0v26 Corrected contents.php to remove bad type
	 conversion when testing for 'cregf', which was preventing CKS from finding 000
	 files. Changed covers.php to display new unique URLs instead of query strings.
	 Changed name of search Text Box in this file, to deter bots; renamed $v box to
	 'search' to use as a flag, to avoid needing to update other files. Amended
	 search.*, log_search* and fetch_logs accordingly.
 
  - 08-Dec-2017 Version 0v27 Preset searches for development changed
	 to use ONMOUSEOVER to build URL, so search engines cannot follow the links.
	 Added notes about searching for authors via authors.html
 
  - 08-Dec-2017 Version 0v28 < and > are now converted to underscores before any other processing, to prevent cross-site scripting attacks. This means that you cannot
	 search for < or >. In practice, it would be OK to search for < and > in a plain text search, so I might alter this behaviour again later.
 
    
 
  Users please note: that, for debugging purposes, all search requests
are logged. The logged data includes the client IP address as reported to our
web server. 
 Development notes to self: Reminder: files are in /bookshop/,
pub/cregj/ and /pub/php/.