===== Fetching Remote Wikka Content =====
{{lastedit}}
>>**""FetchRemote"" v.0.6** available for testing
Download the [[http://wikka.openformats.org/fetchremote.phps source]] and save it as:
##actions/fetchremote.php##
''Feedback is welcome!''

**See also:**
~-SyndicatingWikka
>>===""FetchRemote"" Action===
Version 0.7

__Note:__ JavaWoman has done a **huge** amount of work improving and debugging the link rewrite engine, which now works almost perfectly. Hope she won't mind if I post here the 0.7 'debugging version' of the code ;)

==What it does==
~-Connects to the main Wikka server and fetches Wikka Documentation Pages. --- A "raw" handler must be available on the main Wikka server, in order to produce raw wikka-formatted content with header and footer stripped.
~-Displays an error message if remote pages do not exist on the server or if a connection is not available.
~-Parses the fetched page and rewrites internal links as links to fetchable pages.
~-Prints the fetched page locally, together with a header.
~-Allows fetched pages to be safely stored on the Wikka client.
~-If a page with the same name already exists on the Wikka client, a "see local version" button is displayed instead of the "download" button.

==How to use it==
Simply add ##""{{fetchremote}}""## in one of your pages.
You can specify a starting page by adding: ##""{{fetchremote page="HomePage"}}""##

==Notes==
~-Basically, the idea is to make the main Wikka site work as a //server// providing wikka-formatted content to //Wikka-clients//. There are several advantages in this approach, compared to merely fetching HTML:
~~1) the fetched content integrates seamlessly with the layout and structure of the Wikka-client;
~~1) the user can choose to download a fetched page locally, making it available on their own Wikka site.
~-No MySQL connection to the central database is needed, provided that a method exists for retrieving pure page content with the header and footer stripped;
~-Remote fetching of pages through ##fopen()## must be allowed by PHP (the ##allow_url_fopen## setting, which is enabled by default).

== Long-term development ideas ==
The potential utility of such a plugin is pretty large. Just think of scenarios in which central Mother-wikis distribute //wiki-formatted content// to Child-wikis. Providing up-to-date documentation is only one of the possible uses of this plugin.

//And now, for something completely different//
#### Imagine that the set of patterns used by the rewrite engine to format the local version of the fetched page were user-configurable and extended beyond link formatting. One day, we could have a plugin to retrieve content from remote 'non-wikka-powered' wikis, translate the wiki content into Wikka syntax and seamlessly integrate/save it locally. Sounds exciting, doesn't it? :) ####

==The code (##actions/fetchremote.php##) ==
__Note__ I had to modify a line in the code below because it contained two "%" in a row (which broke the code display on this page). Before testing this code please remove the space I added between the two "%":

**as posted below:** ##""define('PATTERN_CODE', '% %.*?% %'); # ignore code block""##
**after removing the spaces:** ##""define('PATTERN_CODE', '%%.*?%%'); # ignore code block""##

%%(php)
<?php
/* don't rewrite! just prevent CamelCase rewriting here */

// pattern defines
// NOTE: (initial) REs for URL taken from wakka.php formatter - same potential problems.. - now adapted since there WERE indeed problems!

// string to mark "don't replace me" camel words and other strings with
define('IGNOREMARKER', '!!!');
define('PATTERN_IGNOREMARKER', '!!!');	# @@@ PHP function to escape for RE?
// patterns to be ignored for rewriting define('PATTERN_IGNORE', PATTERN_IGNOREMARKER.'.*?'.PATTERN_IGNOREMARKER); # string "marked up" to be ignored //Note: REMOVE spaces between % % in the following line before using the plugin define('PATTERN_CODE', '% %.*?% %'); # ignore code block define('PATTERN_LITERAL', '"".*?""'); # ignore Wikka literal define('PATTERN_ACTION', '{{(?!image).*?}}'); # ignore action _except_ image define('PATTERN_ATTRIB', '\b(\w*?\s*)(=\s?"[^\n]*?"|=\s?\'[^\n]*?\')'); # attributes (HTML, action) #define('PATTERN_URL', '\b[a-z]+:\/\/\S+'); # copied from formatter define('PATTERN_URL', '[a-z]+:\/\/\S+'); # copied from formatter - adapted #define('PATTERN_URL2', '^([a-z]+:\/\/\S+?)([^[:alnum:]^\/])?$'); # copied from formatter define('PATTERN_URL2', '\b[a-z]+:\/\/[[:alnum:]][-_[:alnum:]\/@:\.,_\?&;=]+[-_[:alnum:]\/\?&;=]'); # copied from formatter - adapted to recognize more URLs, @@@ not perfect yet define('PATTERN_INTERWIKI', '\b[A-Zƒ÷‹][A-Za-zƒ÷‹?‰ˆ¸]+[:](?![=_])\S*\b'); # copied from formatter define('PATTERN_FORCEDURL', '\[\[(?!")'.PATTERN_URL2.'(\s+(.*?))?\]\]'); # forced link with URL (ignore) @@@ (?!") still needed?? // regex pattern for forced links: accept "internal pages" (camelwords) on remote server but ignore URLs define('PATTERN_FORCED', '\[\[(?!")([^\s\/\]]+)(\s+(.*?))?\]\]'); # forced link not with URL (rewrite) @@@ (?!") still needed?? // regex patterns to recognize a "CamelWord" #define('PATTERN_CAMELWORD', '[A-Z]+[a-z]+[A-Z][A-Za-z0-9]+'); # @@@ make equivalent to formatter (see below) #define('PATTERN_CAMELWORD', '\b[A-Zƒ÷‹]+[a-z?‰ˆ¸]+[A-Z0-9ƒ÷‹][A-Za-z0-9ƒ÷‹?‰ˆ¸]*\b'); # copied from formatter but removed brackets define('PATTERN_CAMELWORD', '[A-Zƒ÷‹]+[a-z?‰ˆ¸]+[A-Z0-9ƒ÷‹][A-Za-z0-9ƒ÷‹?‰ˆ¸]*'); # copied from formatter but removed brackets #define('PATTERN_FREECAMEL', '(\s*)('.PATTERN_CAMELWORD.')'); # @@@ not needed? 
leave for now // regex pattern to recognize an image link (imaghe links with URLs are left to the formatter) define('PATTERN_IMGLINK', 'link="('.PATTERN_CAMELWORD.')"'); /* problems solved so far forced links: - a forced link like [[MHM]] just disappeared (see CreateNewPage) - forced links of the form [[WikiName]]s are misinterpreted (mangled result) (example on WikkaBugsResolved "Interwiki is broken") - some URLs (in forced links) not recognized but should be ignored (see DarTar) - forced links on NotifyOnChange not recognized at all (caused by the credits in (single) [] ?) => No: solution: single LinkRewrite! camelwords: - JsnX not recognised (see WikkaBugsResolved) => incorrect RE - Words like Mod040fSmartPageTitles not recognized (see WikkaBugsResolved) => incorrect RE ignores: - ignore literals ""[[double bracket]]"" or ""WikiWord"" were rewritten when they shouldn't be (see also CreateNewPage) - ignore code blocks (may contain forced links or WikiWords) - ignore URLs that contain camelwords => simply ignore URLs - URL with embedded camelword on its own on a line: URL not recognized (see Mod039fMindMapMod) => error in preg_replace_callback RE - ignore InterWiki links (see WikiName for an example; better xmp at WikkaBugsResolved "Interwiki is broken") - code not recognized on LoggedUsersHomepage and RedirectOnLogin => solution: single LinkRewrite! - literal not recognized on LoggedUsersHomepage (Camel matched first on ""IntraNet"" - why?) => solution: single LinkRewrite! - interwiki links broken again in single-function rewrite (see WikkaBugsResolved "Interwiki is broken") => clumsy fix with extra function - OrphanedPages shows error message: "Unknown action; an action name can consist only of US-ASCII characters and/or digits." but no page names at all... 
=> add ignore for actions other: - code blocks may disappear or be broken (see FeedbackAction for an example) => incorrect code block ignore; RE must be match over multiple lines */ /* outstanding problems - rewritten image links show up as external links - unavoidable, I think: the image does link to an external URL after rewriting! (see AddingLinks for an example) */ /* list of important TEST pages - CreateNewPage - forced link without description (test correct RE and matching elements for forced links) - literals that should be ignored (including literal containing URL containing camelword) - NotifyOnChange - more forced links (test not getting confused by extra [] around forced links) - WikkaBugsResolved - forced links of the form [[WikiName]]s - see "Interwiki is broken" - InterWiki links - camelwords like JsnX and Mod040fSmartPageTitles (test correct RE for camelwords) - DarTar - forced links with external URLs (test not rewriting such forced links) - LoggedUsersHomepage - literals to be ignored (such as ""IntraNet"") as well as code blocks - FeedbackAction - code blocks (containing camelwords, literals and forced links) to be ignored - FreeMind - forced link with URL containing underscore (test correct URL RE) - Mod039fMindMapMod - lone camelword on one line followed by lone URL with camelword on next line (test URL RE and preg_replace_callback RE) - OrphanedPages - action (with camelword!) should be ignored - AddingLinks - image actions should NOT be ignored */ /* (possible) server-side bugs - XBUG: problem with googleform on UsingActions => cause: bug in googleform itself! => REPORTED on WikkaBugs - XBUG? OrphanedPages - shown directly starts with an "orphan" '12Action!' (does not exist) followed by page names; database problem? 
*/ // SET DEFAULTS $remote_server_root = 'http://wikka.jsnx.com/'; # set remote server root //$remote_server_root = "http://test/wikka-1.1.5.0/wikka.php?wakka="; # debug server $defaultpage = 'WikkaDocumentation'; # define default page to be fetched if (isset($page)) $defaultpage = $page; # pick up action parameter if (isset($_REQUEST['page'])) $defaultpage = $_REQUEST['page']; # pick up URL parameter $page = $defaultpage; # ready to roll // PERFORM REDIRECTIONS // redirect to main documentation page if ($_POST['action'] == 'Return to Wikka Documentation') $this->Redirect($this->GetPageTag()); // redirect to Wikka homepage on disconnection if ($_POST['action'] == 'Disconnect') $this->Redirect($this->GetConfigValue('root_page')); // switch to local version of the page if ($_POST['action'] == 'See local version') $this->Redirect($page); // automatically redirect to local page if it exists // NOTE: the use of this feature is discouraged since it traps users 'locally' // and prevents them from accessing recently updated versions of the Wikka documentation //if ($this->LoadPage($page)) $this->Redirect($page); // SET HEADER & FORM ELEMENTS // header style // to be replaced by a CSS selector in the definitive version $style = 'text-align: center; margin: 30px 25%; border: 1px dotted #333; background-color: #EEE; padding: 5px;'; // build form chunks $form_local = ''; # i18n $form_main = ''; # i18n $form_disconnect = ''; # i18n $form_page = ''; $form_download = ''; # i18n // TRY TO CONNECT $remote_page = fopen($remote_server_root.$page."/raw", "r"); if (!$remote_page) { // NO CONNECTION AVAILABLE echo $this->Format('=====Wikka Documentation===== --- Visit the **[[http://wikka.jsnx.com/WikkaDocumentation Wikka Documentation Project]]** --- --- '); // if a local version of the starting page is available: if ($this->LoadPage($page)) print $this->FormOpen().$form_local.$this->FormClose(); } else { // CONNECTION ESTABLISHED // fetch raw content of remote page while 
(!feof($remote_page)) { $content .= fgets($remote_page, 1024); } if (!$content) { // missing or empty page: show error message $header = 'Sorry, **'; $header .= '""'.$page.'""'; $header .= '** cannot be found on the [['.$remote_server_root.$page.' Wikka server]]! --- --- '; $form = $this->FormOpen().$form_page; $form .= ($this->LoadPage($page)) ? $form_local : ''; $form .= $form_main.$this->FormClose(); } else { // START LINK-REWRITING ENGINE // define callback functions // mark strings to be ignored for rewriting function MarkIgnore($things) { /* DEBUG - remove later if ('' != $things[0]) { echo '
START MarkIgnore - $things:
';
	print_r($things);
	echo '
'; } /**/ $thing = $things[0]; // ignore things BEFORE looking at forced links or camels if ( // s modifier to match over multiple lines // i modifier to make case-insensitive preg_match('/'.PATTERN_CODE.'/s',$thing) # ignore code block || preg_match('/'.PATTERN_LITERAL.'/s',$thing) # ignore literals || preg_match('/'.PATTERN_ACTION.'/is',$thing) # ignore actions (keywords are case-insensitive and may be camelword!) || preg_match('/'.PATTERN_INTERWIKI.'/',$thing) # ignore Interwiki links ) { /* DEBUG - remove later echo 'CODE, LITERAL or INTERWIKI match: {'.htmlspecialchars($thing).'}
'; /**/ $output = IGNOREMARKER.$thing.IGNOREMARKER; # mark to be ignored } // ignore attributes except in image (action) links - MUST come before checking URLs elseif (preg_match('/'.PATTERN_ATTRIB.'/',$thing,$matches)) { /* DEBUG - remove later echo '
ATTRIB match:
';
print_r($matches);
echo '
'; /**/ if ('link' != $matches[1]) { $output = $matches[1].IGNOREMARKER.$matches[2].IGNOREMARKER; /* DEBUG - remove later echo 'ATTRIB output: {'.htmlspecialchars($output).'}
'; /**/ } else { $output = $thing; /* DEBUG - remove later echo 'ATTRIB output in image link: {'.htmlspecialchars($output).'}
'; /**/ } } // ignore forced links with URLs and 'free' URLs elseif ( preg_match('/'.PATTERN_FORCEDURL.'/', $thing) # ignore forced links with URLs || preg_match('/'.PATTERN_URL2.'/', $thing) # ignore URLs ) { /* DEBUG - remove later if (preg_match('/'.PATTERN_FORCEDURL.'/', $thing)) { echo '
FORCEDURL or URL match:
';
	echo htmlspecialchars($thing);
	echo '
'; } /**/ $output = IGNOREMARKER.$thing.IGNOREMARKER; # mark to be ignored /* DEBUG - remove later echo 'REWRITE IGNORE (FORCED) URL - output: {'.htmlentities($output).'}

'; /**/ } /* DEBUG - remove later echo 'IGNORE - output: {'.htmlentities($output).'}

'; /**/ return $output; } // rewrite links (unless in a to be ignored string) function RewriteLink($things) { /* DEBUG - remove later if ('' != $things[0]) { echo '
START RewriteLink - $things:
';
	print_r($things);
	echo '
'; } /**/ global $wakka; $thing = $things[0]; if (preg_match('/'.PATTERN_IGNORE.'/s',$thing)) # already marked as ignore: nothing to do { /* DEBUG - remove later echo 'IGNORE match: {'.htmlspecialchars($thing).'}
'; /**/ $output = $thing; } // rewrite forced (non-URL) links elseif (preg_match('/'.PATTERN_FORCED.'/',$thing,$matches)) { /* DEBUG - remove later echo '
FORCED match:
';
print_r($matches);
echo '
'; /**/ if (isset($matches[3])) #$linktext = preg_replace('/'.PATTERN_CAMELWORD.'/', IGNOREMARKER."$0".IGNOREMARKER, $matches[3]); $linktext = $matches[3]; else $linktext = $matches[1]; # use name for forced link without a description (like [[MHM]]) $output = IGNOREMARKER.'""'.$linktext.'""'.IGNOREMARKER; /* DEBUG - remove later echo 'REWRITE FORCED - output: {'.htmlentities($output).'}

'; /**/ } // rewrite image links - MUST come before rewriting Camelwords! elseif (preg_match('/'.PATTERN_IMGLINK.'/',$thing,$matches)) { /* DEBUG - remove later echo '
IMGLINK match:
';
print_r($matches);
echo '
'; /**/ $output = 'link="'.$wakka->Href('','',"page=".$matches[1]).'"'; /* DEBUG - remove later/ echo 'REWRITE IMGLINK - output: {'.htmlspecialchars($output).'}

'; /**/ } // rewrite Camelwords elseif (preg_match('/'.PATTERN_CAMELWORD.'/',$thing,$matches)) { /* DEBUG - remove later echo '
CAMEL match:
';
print_r($matches);
echo '
'; /**/ #$output = $matches[1].'""'.$matches[2].'""';`# freecamel $output = '""'.$matches[0].'""'; # camelword /* DEBUG - remove later/ echo 'REWRITE CAMEL - output: {'.htmlentities($output).'}

';
/**/
			}
			// nothing to do
			else
			{
				$output = $thing;
			}
			return $output;
		}

		// 1) mark things to be ignored for rewriting (formatter will take care of these when necessary)
		$content = preg_replace_callback('/'.
			PATTERN_CODE.
			'|'.
			PATTERN_LITERAL.
			'|'.
			PATTERN_ACTION.
			'|'.
			PATTERN_INTERWIKI.
			'|'.
			PATTERN_FORCEDURL.
			'|'.
			PATTERN_URL.
			'|'.
			PATTERN_ATTRIB.
			'/s', 'MarkIgnore', $content);
/* DEBUG (!) - remove later
echo '
content before rewriting links:
'; echo '{
'.htmlspecialchars($content).'
}
';
/**/
		// 2) rewrite links (unless to be ignored)
		$content = preg_replace_callback('/'.
			PATTERN_IGNORE.		# needed to be able to skip strings to be ignored
			'|'.
			PATTERN_FORCED.		# rewrite
			'|'.
			PATTERN_IMGLINK.	# rewrite
			'|'.
			PATTERN_CAMELWORD.	# rewrite
			'/s', 'RewriteLink', $content);
/* DEBUG - remove later
echo '
content before cleaning up ignore markers:
'; echo '{
'.htmlspecialchars($content).'
}
';
/**/
		// 3) strip "ignore markers" from content
		$content = str_replace(IGNOREMARKER, '', $content);
/* DEBUG - remove later
echo '
content after cleaning up ignore markers:
'; echo '{
'.htmlspecialchars($content).'
}
';
/**/
		if ("Download this page" == $_POST['action'])	# i18n
		{
			// SAVING FETCHED PAGE
			if ($this->LoadPage($page))
			{
				// local page with this name already exists => display error message
				// in the future we might show a form to ask if the local version should be overwritten
				$header = 'Sorry, a page named **[['.$page.']]** already exists on this site! --- ';	# i18n
				$form = $this->FormOpen().$form_main.$form_disconnect.$this->FormClose();
			}
			else
			{
				// local page does not exist => proceed
				// write page to database and display message
				$note = "fetched from the Wikka server";	# i18n
				$this->SavePage($page, $content, $note);
				$header = 'This page is now available on your site! --- --- ';	# i18n
				$form = $this->FormOpen().$form_page.$form_local.$form_main.$this->FormClose();
			}
		}
		else
		{
			// display default header & form
			# @@@ i18n!!
			$header = 'You are currently browsing: **';
			$header .= '""'.$page.'""';
			$header .= '** --- from the **[['.$this->GetPageTag().' Wikka Documentation Project]]** --- ';
			$header .= '(fetched from the [['.$remote_server_root.$page.' Wikka server]])';
			$form = $this->FormOpen().$form_page;
			$form .= ($this->LoadPage($page)) ? $form_local : $form_download;
			$form .= $form_disconnect.$this->FormClose();
		}
	}
/* DEBUG - remove later
echo '
content after defining form:
'; echo '{
'.$content.'
}
'; /**/ // PRINT HEADER AND CONTENT print '
'.$this->Format($header).$form.'
'.$this->Format($content);
}

// CLOSE CONNECTION
if ($remote_page) fclose($remote_page);	# only close a connection that was actually opened
?>
%%
-- DarTar

''The code contains references to ""HelpInfo"" which has now disappeared and been replaced by WikkaDocumentation - I haven't updated your code here, but I //am// updating the copy I'm working on... --JavaWoman''
~done -- DarTar

''Thanks for posting this code, DarTar! (No I don't mind.) A few notes about testing this though:''
~1)''Note my just-added comment - somehow some of the regular expressions (copied from the formatter as noted, sometimes changed minimally) have become changed in transit. Compare with the corresponding code in ./formatters/wakka.php!''
~1)''You see a lot of little comment blocks starting with the line'' ##/* DEBUG - remove later## ''and ending with'' ##""/**/""## ''. They are intended to trace the inner workings of the rewrite engine while it does its work. Each of these traces can be "turned on" simply by adding'' ## */## ''to the first line so it reads'' ##/* DEBUG - remove later */## ''. Don't do that for all of them at once: you'd end up with a huge amount of output - rather, pick and choose to concentrate on a particular aspect of the link rewriting. Simply remove the'' ## */## ''from the first line of the block again to suppress the debug output, but leave the lines in place so you can later turn them on again.''
~1)''Finally, a lot still needs to be done... most of the work now was on the actual rewriting guts of the action. There are still matters of code organization, internationalization (preparation) and other things to address - but work on those aspects is pretty futile until the rewrite engine itself works properly.''

''Have fun testing! --JavaWoman''

----
CategoryDevelopmentActions
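
==Sketch: the mark / rewrite / strip sequence==
The three numbered steps at the end of the action (mark what must be ignored, rewrite what is left, strip the markers) may be easier to follow in isolation. What follows is only a minimal sketch under stated assumptions, not the action's actual code: the function ##sketch_rewrite_links()##, the constant ##SKETCH_MARKER##, the simplified ASCII-only patterns and the ##?page=## link target are all made up for this illustration. It also uses ##preg_quote()##, the standard PHP function for escaping a string for use in a regular expression, which might be one answer to the "PHP function to escape for RE?" question in the code above.

%%(php)
<?php
// Sketch only: hypothetical names and cut-down patterns, not the patterns used by the action.
define('SKETCH_MARKER', '!!!');

function sketch_rewrite_links($content)
{
	// preg_quote() escapes RE metacharacters so the marker is safe inside a pattern
	$marker  = preg_quote(SKETCH_MARKER, '/');
	$ignore  = $marker.'.*?'.$marker;                 // anything already marked
	$literal = '"".*?""';                             // Wikka literal
	$forced  = '\[\[([^\s\/\]]+)(?:\s+(.*?))?\]\]';   // forced link to a page (no URL)
	$camel   = '[A-Z]+[a-z]+[A-Z0-9][A-Za-z0-9]*';    // ASCII-only camelword (assumption)

	// 1) mark literals so that the rewrite pass leaves them alone
	$content = preg_replace_callback('/'.$literal.'/s', function ($m) {
		return SKETCH_MARKER.$m[0].SKETCH_MARKER;
	}, $content);

	// 2) rewrite forced links and camelwords; the "ignore" alternative comes first,
	//    so a marked string is consumed as a whole and returned untouched
	$content = preg_replace_callback('/'.$ignore.'|'.$forced.'|'.$camel.'/s', function ($m) {
		if (strpos($m[0], SKETCH_MARKER) === 0) {
			return $m[0];                             // already marked: nothing to do
		}
		if (isset($m[1]) && $m[1] !== '') {           // forced link
			$text = (isset($m[2]) && $m[2] !== '') ? $m[2] : $m[1];
			return '[[?page='.$m[1].' '.$text.']]';   // hypothetical link target
		}
		return '[[?page='.$m[0].' '.$m[0].']]';       // camelword
	}, $content);

	// 3) strip the markers again
	return str_replace(SKETCH_MARKER, '', $content);
}

// Example:
//   sketch_rewrite_links('See HomePage, [[MHM]] and ""IntraNet"".')
// returns
//   'See [[?page=HomePage HomePage]], [[?page=MHM MHM]] and ""IntraNet"".'
%%

Putting the "ignore" alternative first in the second pattern is what keeps the camelword inside the literal from being rewritten: the whole marked string is matched in one go, so the camelword pattern never gets to see it.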