CREG Journal

CREG Journal Search Engine

Development Version 0v28, build date Mon 31-May-2021 14:14:20, by David Gibson
Warning: This page is under development and is likely to contain strange bugs and poor documentation. Please report anything odd or incorrect. Please include, in your report, the date and time of your search, your IP address (which is reportedly 216.73.216.82) and a description of what happened. Just saying "it didn't work" is not particularly helpful!

This facility searches the Contents Lists for the CREG journal. There are some 1500 records in the database, going back to the first issue in 1988. Each record usually represents a single article, but there are also brief 'headline' records that refer to a complete journal and, occasionally, there are records that refer to the CREG Forum, rather than the CREG journal. Click here for further info.

To see the listings for a single journal see CREG Journal
To start a search based on an author, type their name in the search box, or see List of Authors (That list might not be available yet)

Type a search string and hit return, or click the SEARCH button

For HELP scroll down to Help with Searching

Advanced Settings. These override the default search settings, but are ignored (and are cleared) for a Regular Expression search. Scroll down to Help with Searching for further information.

	Single space does not match multiple spaces
	Hyphen does not match en-dash
	Do not search inside HTML tags
	Do not search inside Character Entities
	Allow ( and ) inside Boolean search strings. Note. If you select this option you must ensure, if you also use ( and ) to group your search expressions, that these 'group separators' are separated from the search strings and the other operators using spaces

Results of Search (still searching...)


Help With Searching

DEVELOPMENT NOTE: If you think something odd is happening, try forcing a page refresh by typing (probably) CTRL-SHIFT-R - but check your browser documentation. Reason: Your browser might be relying on old cached copies of JS or CSS files that have been modified recently, during development, but which your browser has decided not to download. Browsers are capricious like that.

Plain Text Searches
Wildcard Searches
Boolean Searches
Regular Expression Searches
The Search Algorithms
Regex Conversions

Plain Text Searches

What you type is what is searched for, but please note the following exceptions...
- When you submit your search expression, any space characters are converted to underscores (_) for on-screen clarity. This means that you cannot search for an underscore using a plain text search.
- Similarly, < and > are converted to underscore. This is to prevent cross-site scripting attacks. This conversion means that you cannot search for < or >.
- Spaces (and underscores) in your search expression are interpreted as matching any number of consecutive spaces. This is so that your search will not be spoiled if the database accidentally happens to contain two spaces between words. You can change this behaviour using the checkbox in Advanced Settings, above.
- In a plain text search, you cannot search for non-7-bit ascii characters (e.g. accented characters and symbols such as ±½²³). These are stored in the database as HTML Character Entities and - if you need to find them - you should use a wildcard search.
- The database contains HTML tags and Character Entities. Your search will look inside these items, because it is faster not to exclude them, but this could lead to strange results. You can change this behaviour using the checkboxes in Advanced Settings for Do not search inside HTML tags and Do not search inside Character Entities. You can also avoid the display of strangely-formatted results by selecting the option Do not tag matched text.
Wildcard Searches

A Wildcard search works like a Plain Text search but, additionally...
- In a wildcard search the characters ? and * have a special meaning. ? matches a single character; * matches a string of any characters, but is prioritised to be as short as possible.
- In other respects this search is the same as a plain text search.
- For the wildcard *, "as short as possible" means that if the string being searched was, for example, "electric field and magnetic field" then the search term elec*field would match "electric field" rather than "electric field and magnetic field".
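The difference between the "as short as possible" match and an ordinary greedy match can be seen in a short sketch. Python's re module is used here purely for illustration (the search engine itself uses PHP regular expressions), and the variable names are invented for this example.

```python
import re

text = "electric field and magnetic field"

# Lazy form, as described for the * wildcard: stop at the first "field".
lazy = re.search(r"elec.*?field", text)

# Greedy form, shown only for contrast: run on to the last "field".
greedy = re.search(r"elec.*field", text)

print(lazy.group(0))
print(greedy.group(0))
```

The lazy form stops at the first "field"; the greedy form carries on to the last one.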
Boolean Searches

Not yet implemented, but you may be able to achieve a similar result with an appropriate Regular Expression.

A Boolean search allows you to combine Wildcard search strings with the logical operators NOT, AND, OR, XOR and IMP (IMPLIES), and to group them with parentheses, ( and ).
- XOR is the EXCLUSIVE OR operator, where aaa XOR bbb is equivalent to (aaa AND NOT bbb) OR (bbb AND NOT aaa).
- IMP is the IMPLIES operator, where A IMP B is equivalent to B OR NOT A.
- In this implementation, the search string and the operators must each be separated by one or more spaces but you can still use spaces inside your search strings. If you need to use a space at the beginning or end of your search string you should enter it as an underscore instead. (See note on spaces in the Plain Text Search notes above).
- You cannot use ( or ) in your search string unless you select the advanced option Allow ( and ) inside Boolean search strings. If you select that option you must ensure, if you also use ( and ) to group your search expressions, that you separate these 'group separators' from the search strings and the other operators using spaces.
- You can use the operator keywords (AND, OR, etc) in your search string, provided they are not bounded by spaces.
- Unlike some Boolean searches, this one does not execute with a simple left-to-right evaluation. Instead, the operators have a precedence ranking which, in high-to-low order, is NOT, AND & IMP, OR & XOR. As an example, aaa ddd OR bbb AND ccc would execute as aaa ddd OR ( bbb AND ccc ) rather than the left-to-right execution of ( aaa ddd OR bbb ) AND ccc.
- For a Boolean search, each search term is matched by a separate parse of the database, so a complex search with many search strings could be slow.
- Because of the structure of the database, a Boolean search is potentially more likely than a Plain Text search to produce strangely-formatted results. You can avoid this by selecting the option Do not tag matched text
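The equivalences quoted above for XOR and IMP can be checked by tabulating them. This is only an illustrative Python sketch; the function names xor and imp are hypothetical and are not taken from the search engine's code (Boolean searches are, in any case, not yet implemented).

```python
from itertools import product

def xor(a, b):
    # (aaa AND NOT bbb) OR (bbb AND NOT aaa)
    return (a and not b) or (b and not a)

def imp(a, b):
    # A IMP B is equivalent to B OR NOT A
    return b or not a

# Print the truth table: A, B, A XOR B, A IMP B
for a, b in product([False, True], repeat=2):
    print(a, b, xor(a, b), imp(a, b))
```

Note that IMP is false only when the first term matches and the second does not.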
Regular Expression Searches

Unlike Plain Text, Wildcard and Boolean searches, your search string is interpreted directly, as a Regular Expression - but see the note on spaces, below.
- Regular Expression searches use PHP-style expressions (which are PCRE-based).
- Your search string is delimited using / characters. If you include a / character in your search string, it will be escaped with \.
- The mode modifier i will be appended if you have specified a case-insensitive search.
- When you submit your search expression it is trimmed to remove leading and trailing spaces so, to search for such a space, you should use the RegEx syntax \s (which matches any whitespace character).
Regular Expression (RegEx) searches are very powerful, but they are only suitable for experts. In particular, you may need to know something about the database structure in order to use a regular expression to best advantage. You can do some advanced searches using RegExs. Examples...
- To locate all characters outside the printable 7-bit ASCII range, which need to be converted to HTML entities, use the search expression [^\s!-~]
- To locate any & characters that have not been entered as an HTML entity &amp;, use the search expression &(?!.{0,6}?(;|=))
- To locate all HTML Character Entities, use the search expression &#?\w+?;
- < and > are converted to underscore before the search expression is executed. This is to prevent cross-site scripting attacks. This conversion means that you cannot search for < or >.
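The three example expressions above can be tried out on a short test string. The sketch below uses Python's re module as a stand-in for PHP (the expressions themselves are unchanged); the sample text and variable names are invented for illustration, and Python's \s and \w are Unicode-aware by default, which may differ slightly from the PHP behaviour.

```python
import re

# Hypothetical record text: one 8-bit character, one bare &, three entities.
sample = "Caf\u00e9 &amp; bar: R&D costs &pound;5 &#177; 1"

# Characters that are neither whitespace nor printable 7-bit ASCII.
non_ascii = re.findall(r"[^\s!-~]", sample)

# Positions of & characters not followed (within 6 characters) by ; or =
bare_amp = [m.start() for m in re.finditer(r"&(?!.{0,6}?(;|=))", sample)]

# All HTML Character Entities, named or numeric.
entities = [m.group(0) for m in re.finditer(r"&#?\w+?;", sample)]

print(non_ascii)
print(bare_amp)
print(entities)
```

Only the & in "R&D" is reported as a bare ampersand, because each of the others is followed by a terminating ; within six characters.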
The Search Algorithms

The Titles and Abstracts are searched separately. The search scores as a hit if...
- The search string was found in the Title AND Search Titles was selected, OR...
- The search string was found in the Abstract AND Search Abstracts was selected
Clicking the Negate the Search Result box causes the result of the above logical test to be inverted. This means that if you elect to search Titles AND Abstracts then, to be scored a hit, the search term must not appear in either.
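The hit test described above amounts to a small logical expression. The following Python sketch is an illustration only; the function and parameter names are hypothetical and do not come from search.php.

```python
def is_hit(in_title, in_abstract, search_titles, search_abstracts, negate=False):
    """Return True if a record counts as a hit under the rules above."""
    hit = (in_title and search_titles) or (in_abstract and search_abstracts)
    # "Negate the Search Result" inverts the outcome of the whole test.
    return (not hit) if negate else hit

# Term found in the title only, both Titles and Abstracts selected.
print(is_hit(True, False, True, True))
# Same record with the result negated: it is no longer a hit.
print(is_hit(True, False, True, True, negate=True))
# Negated search where the term appears in neither field: a hit.
print(is_hit(False, False, True, True, negate=True))
```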

Entries in the database are in HTML-compatible text. That is, they include escaped 'Character Entities' and HTML tags (in particular the Anchor tag). For speed, your search will look inside tags and entities, but this can lead to strange results. For searches other than RegExs, you have the option to exclude tags and entities from the search by using the checkboxes in Advanced Settings for Do not search inside HTML tags and Do not search inside Character Entities. You can also avoid the display of strangely-formatted results by selecting the option Do not tag matched text.

A couple of examples will illustrate this...
1. If you do a plain text search for hp your search will include matches within the string phpBB that appears inside some HTML tags. If you click on such a result, the hyperlink will not work because the matched string has been replaced by HTML code to display the match in red.
2. If you do a plain text search for cut your search will include matches within the string acute that appears inside some Character Entities. The matched string will be replaced by HTML code to display the match in red, so it will no longer function correctly as a Character Entity, and will display as (e.g.) &eacute; instead of é.
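Both examples can be reproduced with a short sketch that applies the two lookahead phrases listed under RegEx Conversions for the options Do not search inside HTML tags and Do not search inside Character Entities. It is written in Python for illustration only; the record text, the helper count() and the way the terminating ; is doubled are all assumptions, not the engine's actual code.

```python
import re

TAG_GUARD = r"(?![^<]*?>)"      # appended for Do not search inside HTML tags
ENTITY_GUARD = r"(?![^&]*?;;)"  # appended for Do not search inside Character Entities

# Hypothetical record, invented for this example.
record = 'Caf&eacute; shortcut <a href="phpBB3/viewtopic.php">php notes</a>'

def count(pattern, text):
    """Number of non-overlapping matches of pattern in text."""
    return len(re.findall(pattern, text))

hp_plain = count("hp", record)                # includes matches inside the tag
hp_guarded = count("hp" + TAG_GUARD, record)  # tag contents are skipped

# Assumed way of doubling the terminating ; of each entity before searching.
doubled = re.sub(r"(&#?\w+?);", r"\1;;", record)
cut_plain = count("cut", doubled)                    # includes the match in &eacute;;
cut_guarded = count("cut" + ENTITY_GUARD, doubled)   # entity contents are skipped

print(hp_plain, hp_guarded)
print(cut_plain, cut_guarded)
```

With the guards appended, only the matches in the visible text ("php notes" and "shortcut") survive.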
Some further technical details...
1. Inside a tag means "inside the < and > symbols"; not what appears in between a start tag and its matched end tag. Do not search inside... is interpreted as meaning a search string must not finish inside a tag (or entity). That is clearly not exactly what the description implies, and you cannot (easily) search for a string that encompasses an HTML tag. It would be possible to strip the tags out before searching - and this might be a future option.
2. In the database, the Titles are followed by the page numbers in parentheses. A Title Search does not search the page numbers.
3. References to the CREG Forum. Title records can contain a reference to the CREG Forum. Search for cregf to display these records. Technical Spec. Titles can contain text like [cregf:viewtopic.php?f=27&t=1203]. The text after 'cregf' is removed from the Title before it is searched or displayed. The text after ':' is appended to 'http://british-caving.org.uk/phpBB3/' to form the URL. The phrase must begin '[cregf' and end with ']'.
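The URL-forming rule in item 3 can be sketched as follows. This is a Python illustration under stated assumptions: the regular expression, the function name forum_url and the example title are all hypothetical, and the sketch does not attempt to reproduce how the engine edits the Title for display.

```python
import re

FORUM_BASE = "http://british-caving.org.uk/phpBB3/"

# The phrase must begin '[cregf' and end with ']'; the text after ':' is the path.
CREGF = re.compile(r"\[cregf(?::([^\]]*))?\]")

def forum_url(title):
    """Return the forum URL referenced in a title, or None if there is none."""
    match = CREGF.search(title)
    if match is None or not match.group(1):
        return None
    return FORUM_BASE + match.group(1)

print(forum_url("Example title [cregf:viewtopic.php?f=27&t=1203]"))
print(forum_url("Title with no forum reference"))
```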
For a search other than a Regular Expression search, your search string, and any options you specify, are converted into the appropriate regular expression which is then used for the search. A list of the conversion operations applied to non-RegEx searches is given in RegEx Conversions, below. For a regular expression search, you are expected to specify the search string precisely, including any arcane terms to tailor the search to work a particular way.

Boolean searches have a more complicated algorithm than the other types of search, which proceeds as follows.
- Unless you have selected the option to Allow ( and ) inside Boolean search strings your search text is searched and all instances of ( and ) will have spaces inserted before and after them.
- The search string is then parsed and separated into 'tokens', using 'space' as a delimiter. Each token thus represents a search string (or part of a search string) or an operator. If you have selected the option to Allow ( and ) inside Boolean search strings you must ensure that all uses of ( and ) outside a search expression have a space before and after them.
- The tokens are then examined, in turn. If a token matches an operator exactly then it is treated as an operator, else it is treated as a string. Two adjacent tokens that are both strings are joined together into a single string, with a single space between them. The sequence of tokens is checked for syntax errors.
- The search expression is re-ordered into Reverse Polish Notation, which computer languages use internally to process expressions. Additionally, this takes into account the rules of operator precedence.
- The search expression is then parsed for a fourth time, converting each search string into a regular expression as described under Wildcard searches, above, and RegEx Conversions, below.
- Additionally, the individual search terms are combined into a single regular expression, $match, using an OR syntax, which is saved for later use, should there be a match.
- The complete parsed and processed search expression is then displayed (as a debugging aid) and passed to the search engine.
- The search engine examines and executes each token in turn, placing the logical result of the operation on a stack.
- If the option Negate the search result has been specified the logical result of the search is inverted.
- If the result is TRUE then the result is prepared for outputting to the screen.
- Unless the option Do not tag matched text has been selected, the $match expression assembled earlier is used to modify the printable result to highlight all the search terms.
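The tokenising, re-ordering and stack-evaluation steps above can be sketched in a few lines. This is a minimal Python illustration, not the engine's code (Boolean searches are not yet implemented): it uses the standard shunting-yard method for the conversion to Reverse Polish Notation, assumes the default behaviour of padding ( and ) with spaces, does no syntax checking, and stands in a simple case-insensitive substring test for the per-term regular expression. All function names are hypothetical.

```python
PRECEDENCE = {"NOT": 3, "AND": 2, "IMP": 2, "OR": 1, "XOR": 1}

def tokenise(expr):
    """Split on spaces, then join adjacent search strings with one space."""
    expr = expr.replace("(", " ( ").replace(")", " ) ")
    tokens, last_was_string = [], False
    for word in expr.split():
        is_string = word not in PRECEDENCE and word not in ("(", ")")
        if is_string and last_was_string:
            tokens[-1] += " " + word
        else:
            tokens.append(word)
        last_was_string = is_string
    return tokens

def to_rpn(tokens):
    """Re-order the tokens into Reverse Polish Notation (shunting-yard)."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        elif tok in PRECEDENCE:
            # NOT is a prefix operator so it pops nothing; binary operators
            # pop any stacked operators of equal or higher rank first.
            while (tok != "NOT" and stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return output

def evaluate(rpn, record):
    """Execute each token in turn, keeping logical results on a stack."""
    stack = []
    for tok in rpn:
        if tok == "NOT":
            stack.append(not stack.pop())
        elif tok in PRECEDENCE:
            b, a = stack.pop(), stack.pop()
            if tok == "AND":
                stack.append(a and b)
            elif tok == "OR":
                stack.append(a or b)
            elif tok == "XOR":
                stack.append(a != b)
            else:  # IMP: A IMP B is B OR NOT A
                stack.append(b or not a)
        else:
            # Stand-in for the real per-term regular expression match.
            stack.append(tok.lower() in record.lower())
    return stack.pop()

rpn = to_rpn(tokenise("aaa ddd OR bbb AND ccc"))
print(rpn)
print(evaluate(rpn, "xx aaa ddd yy"))
```

The example expression is the one used in the Boolean Searches notes: AND binds more tightly than OR, so the RPN form places AND before OR.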
RegEx Conversions

Plain Text searches

Plain-text searches are converted to Regular Expressions before the search is executed. The sequence of operations is as follows.
- Spaces are converted to underscores before the expression is submitted
- < and > are converted to underscore.
- All characters that have a special meaning in RegExs are escaped with \
- The characters & " £ are replaced with their HTML entities
- _ is replaced with \s+ so that the search matches a string of spaces. This behaviour can be modified by an Advanced Setting
- Hyphen is replaced by (\-|–) so that the search matches a hyphen or an en-dash. This behaviour can be modified by an Advanced Setting
- If you specified Match Whole Words then the RegEx is bounded by the metacharacter \b for 'word boundary'
- If you specified Do not search inside HTML tags the RegEx phrase (?![^<]*?>) is appended to your search so that matches inside an HTML tag are ignored.
- If you specified Do not search inside Character Entities the RegEx phrase (?![^&]*?;;) is appended to your search so that matches inside an HTML entity are ignored. For this to work, the database is temporarily altered to replace the single terminating ; of an Entity, by ;;. This apparent 'botch' is considered the simplest way of performing the match, because the alternative method of using a RegEx lookbehind function is tediously long-winded.
- If you did not specify Match Case then the mode modifier i is appended to the regular expression, for 'case-insensitive matching'
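The basic conversion steps in the list above can be sketched as a single function. This is a Python approximation for illustration, not a copy of search.php: the function name is hypothetical, the entity names &quot; and &pound; are assumed (the notes do not say whether named or numeric entities are used), the en-dash is written literally as shown above, and the tag and entity lookaheads are left out. Case-insensitive matching is represented by Python's re.IGNORECASE flag in place of the i modifier.

```python
import re

# Assumed entity names for the characters & " and the pound sign.
ENTITIES = {"&": "&amp;", '"': "&quot;", "\u00a3": "&pound;"}

def plain_text_to_regex(query, whole_words=False):
    """Build a regular expression from a plain text search string."""
    parts = []
    for ch in query.strip():
        if ch in " _<>":
            # Spaces, < and > all become underscores, and _ becomes \s+
            parts.append(r"\s+")
        elif ch == "-":
            # Hyphen matches a hyphen or an en-dash
            parts.append("(\\-|\u2013)")
        elif ch in ENTITIES:
            parts.append(ENTITIES[ch])
        else:
            # Escape anything with a special meaning in a RegEx
            parts.append(re.escape(ch))
    pattern = "".join(parts)
    if whole_words:
        pattern = r"\b" + pattern + r"\b"
    return pattern

pattern = plain_text_to_regex("radio-location R&D")
print(pattern)

# Match Case not selected, so the search is case-insensitive.
found = re.search(pattern, "Radio\u2013location   r&amp;d trials", re.IGNORECASE)
print(found is not None)
```

The sample text contains an en-dash and three consecutive spaces, both of which the converted expression is meant to tolerate.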
Wildcard and Boolean searches
All the above, plus...
- The wildcard ? is replaced by (.|&#?\w+?;) so that it matches any single character or a Character Entity
- The wildcard * is replaced by .*? to specify a search for a string of any characters, but one which is prioritised to be as short as possible.
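The effect of the ? replacement is easiest to see when the single 'character' in the database is actually a Character Entity. The Python sketch below covers only the two wildcard substitutions (the plain text steps, such as the handling of spaces, are omitted for brevity); the function name and sample strings are invented for illustration.

```python
import re

def wildcard_to_regex(query):
    """Replace ? and * as described above; escape everything else."""
    out = []
    for ch in query:
        if ch == "?":
            out.append(r"(.|&#?\w+?;)")  # any single character, or an entity
        elif ch == "*":
            out.append(".*?")            # as short as possible
        else:
            out.append(re.escape(ch))
    return "".join(out)

pattern = wildcard_to_regex("caf?s")
print(pattern)
print(re.search(pattern, "Internet cafes").group(0))
print(re.search(pattern, "Internet caf&eacute;s").group(0))
```

In the second search the plain . alternative fails (it would have to match the whole of &eacute; as one character), so the entity alternative is used instead.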
RegEx searches
- The / character, which has a special meaning in a RegEx is escaped with \
- If you did not specify Match Case then the mode modifier i is appended to the Regular Expression, for 'case-insensitive matching'

Some Preset Searches

Known Issues / Things to Do

Software Issues

The Special Searches for double-quotes and &amp-in-tags don't populate the search box properly, although the searches work. Something to do with the unbalanced or unescaped quotation marks or entities not being properly escaped in URLs ... whatever.

Database Issues

Page numbers: are not given for the earlier journals - the database needs updating
8-bit characters: in the database need replacing with Character Entities
Tagging of Authors: From j99, authors' names are tagged with <SPAN CLASS="author">. This should be extended back through all issues.
Some HTML tags could be replaced by entities: Consider replacing HTML tags for <sup> with entities, or improve regEx so that it searches them as it would an entity. Related: why do I not use <sup> in CKS database; but use CSS instead?
Articles containing corrections and updates. Check whether searching for "corrections" brings up all published corrections. Check Julie's database for this. Also, check her notes of 'associated articles' to see if it can be incorporated, and a special search for "updates, corrections and related articles" introduced, perhaps?

Future Development

Consider translating 8-bit chars in Search String to HTML entities. Or, at least, flagging them to user and suggesting he use a wildcard
Add Booleans. Use a precedence stack to convert to RP.
Finish debugging the new feature that puts tabs between title items, and extend it to all listings. This feature is not yet advertised to the user.
Consider adding option to strip HTML tags before searching
Consider changing code so that the "Using RegEx" text isn't put in the SPAN "searchReport" innerHTML until the search is complete. This is just to tidy the HTML output and make it easier to debug
Change database files - we no longer need to list all the links, now that covers.php handles this. Still handy to list the links to raw data files I suppose - or do that via a query string, e.g. mode=raw

Additions, Bugs, Corrections (not necessarily a complete list)

12-Nov-2017 Version 0v11: Bug correction: Ampersand not converting to Entity in Plain Text search. Correction made to search.php; forgot that & is not converted by preg_quote()
12-Nov-2017 Version 0v11: Layout change: Added <SPAN CLASS="keepTogether"> to keep INPUT items on same line as their text, in list of search options. This necessitated adding a parentNode clause in their ONCLICKS. Updated /pub/popup.css
12-Nov-2017 Version 0v11: New Feature: Added Do not search inside Character Entities by using a 'botch' - see ;; above. This seems the simplest way to do it though, because lookbehind (for matching the opening & of an Entity) requires a fixed length search term.
14-Nov-2017 Version 0v13: Documentation: revised notes for Boolean operations (although they are still not implemented). Program: various changes to comments in search.html and format_creg.php. Moved PHP error handling into separate function ( for which see test mechanism at the end of printdata()).
14-Nov-2017 Version 0v15: HTML: added #results to submitted FORM so that it jumps down to start of results when search is complete. Also added popup info box (position: fixed) that duplicates the info in "Results of Search", and which disappears when results are complete. This makes it easier to inspect the regex (as displayed) during a long or buggy search.
15-Nov-2017 Version 0v16: HTML: added notes on boolean searches. Added feature to limit search to a range of years.
16-Nov-2017 Version 0v17: Corrected bug due to accidentally wiped code in format_creg for handling title and abs checkboxes. Updated 'years' facility to give default string '(All)' and to move some operations out of the For loop in printallData
20-Nov-2017 Version 0v19: Searches for 'cregf' string now handled better. This required a change to the database structure. See References to the CREG Forum, in the Help notes above.
21-Nov-2017 Version 0v19 Documentation: revised
27-Nov-2017 Version 0v21 __search updated. Contents.php edited to show links to local copies of PDFs when run under localhost. format_creg updated for stripping of HTML pre-amble and adding new pre-amble if it does not exist. Modified .htaccess and contents.php to give new format for representing links to data. Updated pub/popup.css.
27-Nov-2017 Version 0v21 Documentation: update to pub/dataformat and to database.html
28-Nov-2017 Version 0v22 Layout changes to search.html. Added showDOI. Added reset date to tooltip for downloads counter. Added tabs to separate title items. See further work
30-Nov-2017 Version 0v23 Version number is now PHP variable. Added logging of search requests. Associated changes to docstore log files
30-Nov-2017 Version 0v24 Sandbox handling corrected for searches.
03-Dec-2017 Version 0v25 Format updating now also includes conversion to HTML Entities, but these features are only enabled at Localhost, because of file permission and character set issues. Further corrections to data files, for HTML entities - both 8-bit chars and one file with rogue double-quotes
04-Dec-2017 Version 0v26 Corrected contents.php to remove bad type conversion when testing for 'cregf', which was preventing CKS from finding 000 files. Changed covers.php to display new unique URLs instead of query strings. Changed name of search Text Box in this file, to deter bots; renamed $v box to 'search' to use as a flag, to avoid needing to update other files. Amended search.*, log_search* and fetch_logs accordingly.
08-Dec-2017 Version 0v27 Preset searches for development changed to use ONMOUSEOVER to build URL, so search engines cannot follow the links. Added notes about searching for authors via authors.html
08-Dec-2017 Version 0v28 < and > are now converted to underscore before any other processing. This is to prevent cross-site scripting attacks. This conversion means that you cannot search for < or >. In practice, it would be OK to search for < and > in a plain text search, so I might alter this behaviour again later.


BCRA is a UK registered charity and is a constituent body of the British Caving Association, undertaking charitable activities on behalf of the BCA.

BCRA publishes a range of periodicals and books. Click here for further information.

Searching

To Search our pages using Google, type a search string in the box at the top of the page and hit your Return key

You can also search our publications catalogue at the British Caving Library

The CREG Journal Search Engine is a new, powerful search engine which will, sometime, be extended to cover Cave & Karst Science.

We have a keyword search facility on our Cave Science Indexes pages but this may be rather out-of-date.

For staff use: Link to Database

Show/Hide download figures next to each item (if available and non-zero; you might need to refresh page first). Counters last reset on Thu 03-Jan-2019 17:29:28 +00:00. The figures are non-unique click-throughs.

Users please note: that, for debugging purposes, all search requests are logged. The logged data includes the client IP address as reported to our web server.

Development notes to self: Reminder: files are in /bookshop/, pub/cregj/ and/pub/php/ run at BCRA | run at Localhost | location.reload(true)

CREG Journal (ISSN 1361-4800)
