

Proposal for a Metadata transmission standard between XR and traditional computers/laptops

I propose that we need a simple way to transfer whatever is in a document’s metadata, as well as the document itself, irrespective of contents. I further propose that this simply be Visual-Meta wrapped in JSON.

• The traditional computer host application reads all Visual-Meta from a document and wraps it in JSON for transmission.

• The webXR application receives the information and parses the Visual-Meta, just as the traditional computer application would (code is available for this).

This removes the need for transcoding. An additional dummy file (PDF) acts as the Library, storing all Library view and layout information, also in the form of Visual-Meta, and is transferred in the same way. It will simply be called Library.pdf. The information it contains is to be designed as a group; the file will include list data for 2D devices and 3D layout information for XR.
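As a sketch of what this could look like in practice, here is a minimal example of wrapping an untouched Visual-Meta block in JSON on the host side. The field names (documentName, visualMeta) are illustrative assumptions, not part of the proposal:

```swift
import Foundation

// Illustrative payload only: the field names are assumptions, not part of the proposal.
struct VisualMetaPayload: Codable {
    let documentName: String   // e.g. "Library.pdf" or the document being transferred
    let visualMeta: String     // the raw Visual-Meta block, passed through untouched
}

func makePayload(documentName: String, visualMeta: String) throws -> Data {
    // Wrap the Visual-Meta text in JSON for transmission; the webXR side
    // parses the Visual-Meta itself exactly as a desktop application would.
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted]
    return try encoder.encode(VisualMetaPayload(documentName: documentName, visualMeta: visualMeta))
}
```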

The user’s laptop needs to be online, with WebDAV running in the Library software (in our basic implementation that is Reader), to sync with the webXR system. Once done, the user can put the laptop to sleep. Sync should be attempted continuously so that the laptop automatically reconnects and receives data from the headset.
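A minimal sketch of that continuous reconnection loop, assuming a hypothetical attemptSync() that performs the actual WebDAV exchange; the function name and retry interval are illustrative only:

```swift
import Foundation

// Hypothetical: performs the WebDAV exchange with the Library software
// and returns true if the sync succeeded.
func attemptSync() async -> Bool {
    // ... contact the WebDAV endpoint, exchange Visual-Meta payloads ...
    return false
}

// Retry indefinitely so the laptop reconnects automatically and receives
// data from the headset as soon as it becomes reachable again.
func keepSyncing(retryInterval: TimeInterval = 30) async {
    while true {
        _ = await attemptSync()
        try? await Task.sleep(nanoseconds: UInt64(retryInterval * 1_000_000_000))
    }
}
```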

A vision of near-future Extended Reality

You are sitting at your desk and put on your XR glasses. You make a fist and move your hand up, as though you are pulling up a sheet, and a map of the earth appears out of the ground, centred on your location. You pull it up further and it shapes itself to fit comfortably on your desk, where you can scale it and move around it with intuitive gestures.

This is an idea for gestures as ‘shortcuts’, similar to how we can use keyboard shortcuts or gestures on our computers to do specific things instantly, instead of going through menus.

Imagine further that if you stand up, the map covers your room, and if you gesture in a circle, your room disappears and the map stretches to the horizon. You use your hands to move anywhere on earth and if you like, you can scale the earth to fit in your hand so you can then look around the solar system, our galaxy and perhaps beyond, eventually holding Laniakea in your hands.

Picture this as a default you get from your preferred vendor of XR glasses, but at any scale of looking at the world you can open your hand, palm facing towards you, and choose any other model for that scale, time or location. In other words, your virtual desktop is infinite, but you can always choose to view a different version of anywhere, at any time.

Ears and Eyes

I think that, just as people today understand that someone wearing headphones while talking to them is likely not in Noise Cancellation mode but in a Transparency mode, and that the headphones do not necessarily fully block out sound, it will soon be understood that AR glasses only augment when required.

This Journal

As of December 2023 I am writing my thoughts in the Journal in Author (cmd-J in Author gives instant access), which is one single long Author document. I write a new entry under a level 1 heading and, when ready to post, I copy the heading, select the body of the post/thought and use Liquid to post to WordPress, which is quick and simple, since the body text is posted automatically and I just have to paste or type the title. I then set the Category to ‘Journal’ and off I go.

For those who might wonder, PDF does not enter into this, though I might produce quarterly PDF Journals or include them annually in the Book.

I am also considering starting a document similar to the Journal for transcripts of our Monday meetings and posting them in a similar way. We’ll see… 

A Spatial Vision for Author & Reader

Work on porting the software Author and Reader to visionOS is necessarily explorative. These are my personal notes to guide the research, based on experience and personal preference, to be revised as we build experiments to experience what it actually feels like and learn what really works in this new environment. A design constraint is that the user should be able to move seamlessly between working in spatial and traditional environments. The practical opportunities for coding in visionOS, and the resources available, are not unconstrained, so we have had to decide which problem to solve first. Since space is the key difference with this medium, we have decided to go large and build a Defined Concepts Map, as already implemented in Author for macOS.

  

Initial Problem to Solve: Mapping Concepts

Author supports the user in defining concepts which can be viewed on a Map, something which testing shows can be more powerful in an expanded space where the Map can cover a larger area.

Integration with other views can include the ability for the user to toggle the Map between being a background or foreground element relative to views such as the main document view and the document outline:

Above, schematic view of basic Author views. Below, same Author views tested in visionOS:

 

Layout Tests

Layouts have been prototyped and experimented with so they can be experienced in currently available headsets, such as the Quest Pro:

 

Interaction Test

A test implementation by Brandel is available to try on any headset. To use it, open the following link in Google Chrome, then drag an Author document (.liquid) onto the bar at the top left. You will then see the Map layout and can interact with it. You can also choose to view this in your VR headset, in which case you will need to visit the same location from the headset to interact with the Map: https://t.co/nEIoUpiUsW

 

Currently implemented Map in macOS

The Defined Concepts Map in Author. The potential of porting it to visionOS is truly powerful. A key aspect of the Map is that connections are not all shown all the time, which would produce a cluttered view, but only when nodes are selected, allowing for a more fine-grained understanding of how the concepts relate, as shown below:
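A minimal sketch of this selection-driven filtering; the Connection type and property names are illustrative, not Author’s actual data model:

```swift
import Foundation

// Illustrative types; not Author's actual data model.
struct Connection {
    let fromNodeID: String
    let toNodeID: String
}

// Show only the connections that touch a currently selected node,
// keeping the Map uncluttered until the user selects something.
func visibleConnections(all: [Connection], selectedNodeIDs: Set<String>) -> [Connection] {
    guard !selectedNodeIDs.isEmpty else { return [] }
    return all.filter {
        selectedNodeIDs.contains($0.fromNodeID) || selectedNodeIDs.contains($0.toNodeID)
    }
}
```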

    

Multiple Maps

It is not trivial, but it is relatively easy, to create a Map with a large number of nodes. The issue becomes one of controlling the views.

The integration of different Maps, both in terms of data access and use and in terms of user interaction, is expected to be the main challenge: integrating Maps such as those listed below raises questions such as whether, in visionOS, they should be accessed as windows, volumes or spaces, or, ideally, any one the user chooses.

  • Defined Concepts
  • Library of References
  • Journal Research Notes
  • User’s articles
  • Online Resources

  

Interactions & Connections

Below is an illustration of several such Maps. A key design challenge will be how to move between them, how to see connections, how to change layouts and views, and how they can be nested:

Below is a set of commands for the user to specify what should be seen in a Map, for consideration as to what options should be visible in spatial environments:

  

Openness & Metadata Interoperability

The defined concepts are currently exported as Glossaries, and work should be done to explore including the layout of the Map view on export. Such a Map view should be extractable from documents when reading and in the Library, so there needs to be a visual and interaction means for the user to easily understand the different Maps and what scale they relate to.

Early Tests

  

Discussion of further Work Modes

    

Authoring

Writing a short document should be relatively similar to writing on a laptop or desktop.

Writing a longer document becomes more an issue of editing, moving elements around and seeing how they relate. This can likely benefit from innovative views in a larger space, going further than Author’s fold views, perhaps to being able to move sections off into separate displays to work on them while they remain part of the whole.

  

Reading

Finding what to read in a document can likely benefit from innovative views of the document, going further than what we have done in Reader, including showing all pages on a very large ‘wall’ display.

Reading pages or screens of text in this environment should be optimised for what is most comfortable for the eyes.

  

Library

A Library which supports innovative views may also benefit from a fully immersive space in order for the user to be able to arrange documents/books in space using layouts with specific themes/topics etc.

In order for a Library to have open books, so to speak, the metadata needs to be readily available. Visual-Meta as well as JSON and other means will be investigated to support this.

    

Research Journal/Notes

Another challenge is how to augment the user’s ability to take notes and then find their notes later, including notes on documents for citing them.

This is probably one area where spatial environments can really shine, with easily manipulable ‘murder walls’ and other types of views being deployed.

University of Southampton, and I was a former teacher of the year 2014 at London College of Communication.

  

  

Frode Hegland
June 2023

 

 

the future of text, the future of thought, is richly multidimensional

  

  

   

VR

There is no question in my mind that over the next few years headsets for VR and AR will become commonplace with anyone who works with text.

Issues to overcome include reducing the screen door effect so that reading text becomes pleasant; weight and comfort for longer sessions; cost; and, maybe most importantly and surprisingly not yet solved, instant access to the user’s information landscape and colleagues (though I do expect Apple to make some major advances here).

Opportunities initially include space, space and more space to think and work. Beyond this there is almost limitless potential for text and information interaction in general, something we have only started to scratch the virtual surface of.

The importance of today: I feel quite strongly that it will be important to develop this entirely new, all-encompassing environment as well as possible, and not only based on the initial commercial explorations of big tech companies. This is why I have been experimenting to experience, and hosting dialog on, the future of text in VR through the Lab.

‘Ask AI’ in Reader

Because AI queries while writing are likely quite different from those while reading, we have added AI directly to Reader.

Goal: Give users the power of AI when reading to help them evaluate & understand the material and how it connects to other knowledge in their field.

  

Issue Command

The user ctrl-clicks on selected text and a new command, ‘Ask AI’, appears at the very top of the menu:

  

This menu has sub items, as listed here (subject to editing):

  • ‘What is this’ (W)
  • ‘Show me more examples’ (S)
  • ‘What does this relate to’ (R)
  • ‘Show me counter examples’ (C)
  • ‘Is this correct?’ (I)
  • ‘Explain the concept of’ (E)
  • ‘Create a timeline of this’ (T)
  • ‘Discuss the causes and effects of’ 
  • ‘Edit’ (which opens Reader’s Preferences to allow the user to design their own)
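As a minimal sketch of how these menu items could map to prompt prefixes combined with the selected text; the exact wording and the shortcut-keyed structure are assumptions, not the shipped implementation:

```swift
import Foundation

// Illustrative mapping of shortcut keys to prompt prefixes; the exact
// wording sent to the model is an assumption.
let askAIPrompts: [String: String] = [
    "W": "What is this: ",
    "S": "Show me more examples of: ",
    "R": "What does this relate to: ",
    "C": "Show me counter examples of: ",
    "I": "Is this correct: ",
    "E": "Explain the concept of: ",
    "T": "Create a timeline of this: "
]

// The selected text is appended to the chosen prefix to form the query.
func buildQuery(shortcut: String, selectedText: String) -> String? {
    guard let prefix = askAIPrompts[shortcut] else { return nil }
    return prefix + selectedText
}
```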

    

Results

The results are shown in a floating window where the query text appears in a box, as below, with the results beneath it. The text can be interacted with to copy it. To dismiss, simply close the window.

  

Preferences

Users will need to supply their own OpenAI API keys in Preferences (the same approach as for AI in Liquid and MacGPT), where they can also turn commands on and off.
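A minimal sketch of sending the query to OpenAI’s chat completions endpoint with the user-supplied key; response parsing and error handling are omitted, and the model name is only an example:

```swift
import Foundation

// Minimal sketch: send the query with the user's own API key.
// The model name is an example; response parsing is omitted.
func askOpenAI(query: String, apiKey: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "gpt-4",
        "messages": [["role": "user", "content": query]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    return data
}
```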

Clicking [+] produces this dialog, same as clicking [Edit] for a current command:

  

  

‘Ask’ GPT in Liquid

After getting the MacGPT app, which is essentially toolbar access to GPT, and after user requests, I realised this can’t be that hard to implement in Liquid, so here is the plan: use Liquid as an interface to take whatever the user comes across or writes and, in a few clicks, send it to an AI system (GPT) with a custom prompt, to help students, teachers and general knowledge workers alike:

  

Interaction

A new top-level command in Liquid called ‘Ask’, with the shortcut (A), sends selected text to ChatGPT with an associated prompt:

The sub menu contains options to choose a prompt/how to preface the selected text (not all are on/visible by default):

  • ‘What is’ (W)
  • ‘Write in academic language’ (A)
  • ‘Show me more examples’ (S)
  • ‘What relates to’ (R)
  • ‘Show me counter examples to’ (C)
  • ‘Is this correct?’ (I)
  • ‘Check for plagiarism’ (P)
  • ‘Explain the concept of’ (E)
  • ‘Create a timeline of’ (T)
  • ‘Discuss the causes and effects of’
  • ‘Create a quiz with 5 multiple choice questions that assess students’ understanding of’
  • ‘Edit’ (which opens Liquid’s Preferences to allow the user to design their own)

  

Results

Since the API can be slow, as can be seen when using MacGPT and other interfaces, there will be a flashing cursor while waiting for the results. If it is easier to produce the results in a web view, then we will do that.

Note, as in the error for 1980, AI is not at the stage where it can be trusted to always be correct, and maybe it never will be. Nevertheless, it is a tool, and users need to learn how to use it, including checking what it produces:

Development note: This should ideally be presented in a non-full screen, floating window, for the user to dismiss when done or leave open.

  

Preferences/Key (how it works)

Here the user will be able to customise and make their own preface text/prompts: enter a name, a shortcut and the full text of the prompt/preface text to send to ChatGPT:

Preferences is also where users add their own API keys for GPT, inspired by how MacGPT does it, along with an option to choose the model.

On first use of an AI service, Liquid will show a dialog asking for the API key. If dismissed, it will simply ask again on the next attempt.
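As a minimal sketch of what a user-defined prompt entry could look like in Preferences; the property names and storage key are assumptions:

```swift
import Foundation

// Illustrative record of a user-defined prompt; property names
// and the storage key are assumptions.
struct CustomPrompt: Codable {
    var name: String        // shown in the 'Ask' sub menu
    var shortcut: String    // e.g. "W"
    var prefaceText: String // prepended to the selected text before sending
    var isVisible: Bool     // not all prompts are on/visible by default
}

// Persist the user's prompts alongside their API key and chosen model.
func savePrompts(_ prompts: [CustomPrompt]) throws {
    let data = try JSONEncoder().encode(prompts)
    UserDefaults.standard.set(data, forKey: "LiquidAskPrompts")
}
```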

Future updates should let the user choose other AI models, including Google Bard.

  

Notes on longer prompts

Some of the actual prompts will be longer than indicated above. This will need some basic experimenting. For example:

Check for plagiarism: I want you to act as a plagiarism checker. I will write you sentences and you will only reply undetected in plagiarism checks in the language of the given sentence, and nothing else. Do not write explanations on replies. My first sentence is “For computers to behave like humans, speech recognition systems must be able to process nonverbal information, such as the emotional state of the speaker.”

Increased Space

Although the act of writing is an intimate affair, where even a 13″ laptop screen can be ideal, allowing the author to focus, the act of editing and constructing a large document and thinking about connections can benefit from a larger display.

The 27″ Apple Studio Display really does provide some more space to see and to think.

It is almost like XR in scale, though of course there is no third dimension. It was the act of working in VR that really showed me how more space helps, however. If the current headsets were less likely to lose connection to my MacBook and had less screen door effect, I might not have needed to purchase this screen, and I would have had the benefit of an even more flexible and portable workspace.

I went from this when working in the Map view in Author:

showing workspace of 13″

to this on the Studio Display:

showing workspace of 27″