6 posts tagged with "jupyter"

Datalayer adding GPU to Anaconda Notebooks

· 6 min read
Eléonore Charles
Product Manager

We are thrilled to announce our collaboration with Anaconda, a leader in Data Science and AI platforms. This partnership marks a step forward in our mission to democratize access to high-performance computing resources for Data Scientists and AI Engineers.

Anaconda offers Anaconda Notebooks, a cloud-based service that allows data scientists to use Jupyter Notebooks without the hassle of local environment setup. Through our collaboration, we are enhancing this platform with Datalayer's Remote Runtime technology, bringing seamless GPU access directly to Anaconda Notebooks users.

Why Remote Runtimes and GPUs Matter

In traditional Jupyter Notebook setups, all computations occur locally on a user's machine or a cloud instance. While this setup works well for small to medium-sized tasks, scaling these tasks to handle massive datasets, complex deep learning models, or resource-intensive simulations requires more powerful hardware, such as Graphics Processing Units (GPUs).

GPUs are game-changers for data science and AI because they can parallelize computations, drastically speeding up processes like neural network training, image processing, and large-scale data analytics. However, setting up a local or cloud environment with GPU support can be technically challenging and time-consuming, especially for non-experts.
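
To make the speedup concrete, below is a minimal micro-benchmark sketch (assuming PyTorch built with CUDA support; absolute timings vary widely by hardware):

# Illustrative micro-benchmark: the same large matrix multiplication
# timed on CPU and on GPU. Assumes PyTorch with CUDA support.
import time
import torch

x = torch.randn(4096, 4096)

start = time.perf_counter()
_ = x @ x
cpu_time = time.perf_counter() - start

if torch.cuda.is_available():
    x_gpu = x.cuda()
    _ = x_gpu @ x_gpu            # warm-up: the first CUDA call includes setup cost
    torch.cuda.synchronize()
    start = time.perf_counter()
    _ = x_gpu @ x_gpu
    torch.cuda.synchronize()     # wait for the GPU kernel to finish
    gpu_time = time.perf_counter() - start
    print(f"CPU: {cpu_time:.3f}s  GPU: {gpu_time:.3f}s")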

By upgrading Anaconda Notebooks with Datalayer's Remote Runtime technology, the heavy lifting is done behind the scenes, allowing Anaconda users to focus on what matters most: their data science tasks.

How Datalayer Supercharges Anaconda Notebooks

One of the core advantages of Anaconda Notebooks is its ease of use. Users can quickly launch Jupyter Notebooks with all the libraries and environments they need without the hassle of local configuration. The collaboration with Datalayer builds on this strength, making it incredibly easy for Anaconda Notebooks users to access remote GPU-powered Runtimes.

Users can launch GPU Runtimes directly from the Anaconda Notebooks Jupyter Launcher and switch their Jupyter Notebook to a GPU Runtime with a single click.
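
Once a notebook is attached to a GPU Runtime, a quick sanity check confirms that the kernel actually sees the GPU. A minimal sketch, assuming PyTorch is installed in the Runtime environment:

import torch

# Verify that the Runtime exposes a CUDA device.
if torch.cuda.is_available():
    print(f"GPU detected: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected; this kernel is running on CPU.")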

info

Anaconda Notebooks runs on an Anaconda-managed JupyterHub, while Datalayer Runtimes run on a separate Kubernetes cluster with IAM (Identity and Access Management) and Credits (usage) integrations.

Architecture Diagram

Benefits for Anaconda Notebooks Users

The collaboration between Datalayer and Anaconda offers several key benefits to the platform's existing and future user base:

  • Enhanced Performance: Users now have access to powerful GPUs without having to manage the underlying infrastructure. This enhancement translates to faster computations and the ability to handle more complex tasks.

  • Cost-Effective Scaling: By leveraging Remote Runtimes, users only consume GPU resources when needed. They can switch between CPU and GPU Runtimes based on the task, optimizing both performance and cost.

  • User-Friendly: The familiar Anaconda Notebooks interface remains the same, with the added option of GPU Runtimes. No additional learning curve or configuration is required, making it accessible even for non-technical users.

  • Broader Use Cases: With GPU support, Anaconda Notebooks users can now tackle a wider range of projects. From deep learning models and complex simulations to high-dimensional data processing, the possibilities have expanded dramatically.

Datalayer provides one-click access to scalable GPU infrastructure, enabling Data Scientists and AI Engineers at all levels to run even the most advanced AI and ML tasks, integrated with the Jupyter Notebook where they are already working.

Jack Evans, Sr. Product Manager

A Whitelabelled Variant for Any Business

Datalayer Runtimes are available to any company as a whitelabelled variant.

Integrating managed deployment of Datalayer with your existing Jupyter solution brings a significant advantage for its operators: it allows quick and straightforward installation of a JupyterLab extension and services on Kubernetes, without requiring additional development. This streamlines operations and enables operators to focus on managing the infrastructure, free from the complexities of configuration.

Reach out for more information on how to integrate Datalayer on your Kubernetes cluster and add Runtimes to your existing Jupyter solution.

Conclusion

Our partnership with Anaconda puts the power of high-performance computing at the fingertips of Anaconda users, while preserving the simplicity and ease of use that Anaconda Notebooks is known for. This collaboration goes beyond simply boosting computational power; it democratizes access to essential tools, empowering Data Scientists and AI Engineers around the world to achieve more, faster, and with greater efficiency. By breaking down barriers, Anaconda and Datalayer are enabling Data Scientists and AI Engineers to unlock their full potential, paving the way for new innovations.

This Beta availability was announced at the latest NVIDIA GTC event. Looking ahead, we plan to refine this solution further by enhancing the user interface and incorporating feedback from early users. Additionally, we aim to integrate the GPU Runtime feature into the Anaconda Toolbox.

To learn how to access this feature, visit the official Anaconda GPU Runtimes documentation as well as this Anaconda blog post.

You can register on the Beta waiting list via this link.


Datalayer Private Beta

· 4 min read
Eléonore Charles
Product Manager

We are super excited to announce that Datalayer is entering Private Beta! After months of development, we are now inviting those who signed up on our waiting list to experience our solution first-hand.

How to Join the Beta?

If you registered on our waiting list, keep an eye on your inbox: invitations are being sent out now! We're thrilled to have you on board as part of this exclusive group, helping us shape the future of Datalayer.

But don't worry if you haven't signed up yet—there are still limited spots available. Simply register on the waiting list to secure your spot in the private beta.

Why Join the Beta?

This is your opportunity to get early access to the cutting-edge features of Datalayer, and we need your help to make it even better. Your experience and feedback will be invaluable in helping us fine-tune the product, optimize performance, and add features that truly meet your needs. It would be great to have you on board and we can't wait to hear your thoughts!

As a beta user, you'll enjoy:

  • Free credits to try out Remote Kernels.
  • Direct support from our team to ensure a smooth experience.
  • Direct influence on the future development of Datalayer through your feedback.

What Can Datalayer Bring You?

Datalayer simplifies access to powerful computing resources (GPU or CPU) for data scientists and AI engineers. Whether you're training models or running large-scale simulations, you can seamlessly scale your workflows without changing your code or setup.

Key Benefits

  • Effortless Remote Kernel Access: Seamlessly connect to high-performance Remote Kernels from JupyterLab, VS Code, or via the CLI. Switch kernels with just a few clicks to run your code on powerful machines, without altering your workflow or setup.
  • Flexible and Simple Setup: Avoid the complexity of configuration changes or workflow disruption. Launch Remote Kernels effortlessly and scale your data science or AI workflows with ease, whether you're working on notebooks or scripts.
  • Optimized Resource Usage: Gain control over resource allocation by running specific notebook cells on Remote Kernels only when needed. This precision helps minimize resource consumption and maximize efficiency.
  • Flexible Credits-Based Model: Enjoy a pay-as-you-go credits system that adapts to your needs. With transparent usage tracking and detailed reports, you'll only pay for the resources you use, making it a cost-effective solution for scaling your projects.

Learn more about Datalayer's features on our user documentation and online SaaS.


GPU Acceleration for Jupyter Cells

· 7 min read
Eléonore Charles
Product Manager

In the realm of AI, data science, and machine learning, Jupyter Notebooks are highly valued for their interactive capabilities, enabling users to develop with immediate feedback and iterative experimentation.

However, as models grow in complexity and datasets expand, the need for powerful computational resources becomes critical. Traditional setups often require significant adjustments or sacrifices, such as migrating code to different platforms or dealing with cumbersome configurations to access GPUs. Additionally, often only a small portion of the code requires GPU acceleration, while the rest can run efficiently on local resources.

What if you could selectively run resource-intensive cells on powerful remote GPUs while keeping the rest of your workflow local? That's exactly what the Datalayer Cell Kernels feature enables. Datalayer works as an extension of the Jupyter ecosystem. With this innovative approach, you can optimize your costs without disrupting your established processes.

We're excited to show you how it works.

The Power of Selective Remote Execution

Datalayer Cell Kernels introduce a game-changing capability: the ability to run specific cells on remote GPUs while keeping the rest of your notebook local. This selective approach offers several advantages:

  1. Cost Optimization: Only use expensive GPU resources when absolutely necessary.
  2. Performance Boost: Accelerate computationally intensive tasks without slowing down your entire workflow.
  3. Flexibility: Seamlessly switch between local and remote execution as needed.

Let's dive into a practical example to see how this works. We'll demonstrate this hybrid approach using a sentiment analysis task with Google's Gemma-2 model.

Create the LLM Prompt

We start by creating our prompt locally. This part of the notebook runs on your local machine:

prompt = """
Analyze the following customer reviews and provide a structured JSON response for each review. Each response should contain:

- "review_id": A unique identifier for each review.
- "themes": A dictionary where each key is a theme or topic mentioned in the review, and each value is the sentiment associated with that theme (positive, negative, or neutral).

Format your response as a JSON array where each element is a JSON object corresponding to one review. Ensure that the JSON structure is clear and easily parseable.

Customer Reviews:

1. "I love the smartphone's performance and speed, but the battery drains quickly."
2. "The smartphone's camera quality is top-notch, but the battery life could be better."
3. "The display on this smartphone is vibrant and clear, but the battery doesn't last as long as I'd like."
4. "The customer support was helpful when my smartphone had issues with the battery draining quickly. The camera is ok, not good nor bad."

Respond in this format:
[
  {
    "review_id": "1",
    "themes": {
      "...": "...",
      ...
    }
  },
  ...
]
"""

Analyse Topics and Sentiment on Remote GPU

Now, here's where we leverage the remote GPU. This cell contains the code to perform sentiment analysis using the Gemma-2 model and the Hugging Face Transformers library. We'll switch to the Remote Kernel for just this cell:

from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Login to Hugging Face
login(token="HF_TOKEN")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Prepare the prompt
chat = [{"role": "user", "content": prompt},]

# Generate the prompt and perform inference
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=2000)

# Decode the response, excluding the input prompt from the output
prompt_length = inputs.shape[1]
response = tokenizer.decode(outputs[0][prompt_length:])

By executing only this cell remotely, we're optimizing our use of GPU resources. This targeted approach allows us to tap into powerful computing capabilities precisely when we need them, without the overhead of running our entire notebook on a remote machine.

To execute this cell on a remote GPU, you just have to select the remote environment for this cell.

This is done with just a few clicks, as shown below:

With a simple selection from the cell dropdown, you can seamlessly transition from local to remote execution.

info

Using a Tesla V100S-PCIE-32GB GPU, the sentiment analysis task completes in about 10 seconds on average, at a throughput of roughly 19 tokens per second.

The model was pre-downloaded in the remote environment. This was done to eliminate download time. Datalayer lets you customize your computing environment to match your exact needs. Choose your hardware specifications and install the libraries and models you require.
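
For instance, one way to pre-fetch model weights when preparing such an environment is the Hugging Face Hub client; this is a sketch under that assumption, not necessarily how the Datalayer environment was built:

# Pre-fetch the model weights so the notebook does not pay the
# download cost at run time. Gemma is a gated model, so a valid
# Hugging Face token is required (placeholder below).
from huggingface_hub import snapshot_download

snapshot_download("google/gemma-2-2b-it", token="HF_TOKEN")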

Datalayer Cell Kernels allow you to manage variable transfers between your local and remote environments. You can easily configure which variables should be passed from your local setup to the Remote Kernel and vice versa, as illustrated below:

This ensures that your remote computations have access to the data they need and that your local environment can utilize the results of remote processing.

info

Variable transfers are currently limited in practice to 7 MB of data. This limit is expected to increase in the future, and the option to add data to the remote environment will also be introduced.

To help you monitor and optimize your resource usage, Datalayer provides a clear and intuitive interface for viewing Remote Kernel usage.

Process and Visualize Results Locally

We switch back to local execution for processing and visualizing the results. This is the processed list of themes and sentiments extracted from the reviews by the Gemma-2 model:

[
  {
    'review_id': '1',
    'themes': {'performance': 'positive', 'speed': 'positive', 'battery': 'negative'}
  },
  {
    'review_id': '2',
    'themes': {'camera': 'positive', 'battery': 'negative'}
  },
  {
    'review_id': '3',
    'themes': {'display': 'positive', 'battery': 'negative'}
  },
  {
    'review_id': '4',
    'themes': {'customer support': 'positive', 'camera': 'neutral', 'battery': 'negative'}
  }
]

And below is a visualization of the theme and sentiment distribution across the reviews:
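
As an illustration of how such a chart can be produced locally, here is a minimal sketch; the regex-based extraction of the JSON array from the model response is an assumption, not necessarily what the original notebook does:

# Parse the model's response string and plot how often each
# sentiment appears across review themes.
import json
import re
from collections import Counter

import matplotlib.pyplot as plt

# Extract the JSON array from the raw model output (assumed format).
match = re.search(r"\[.*\]", response, re.DOTALL)
results = json.loads(match.group(0))

# Count sentiments over every theme of every review.
sentiments = Counter(
    sentiment
    for review in results
    for sentiment in review["themes"].values()
)

plt.bar(list(sentiments.keys()), list(sentiments.values()))
plt.title("Sentiment distribution across review themes")
plt.xlabel("Sentiment")
plt.ylabel("Number of theme mentions")
plt.show()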

Key Takeaways

Datalayer Cell Kernels allow you to selectively run specific cells on remote GPUs. This hybrid approach optimizes both performance and cost by using remote resources only when necessary. Complex tasks like sentiment analysis with large language models become more accessible and efficient.

Check out the full notebook example, then sign up on the Datalayer waiting list today to be among the first to experience the future of hybrid Jupyter workflows!


Datalayer 0.0.6, a more React.js Jupyter

· 8 min read
Eric Charles
Datalayer Founder

We are thrilled to announce the 0.0.6 release of Datalayer. This release improves the data analytics user and developer experience with Jupyter React, a JavaScript library that makes React.js a first-class citizen in the Jupyter ecosystem.

Jupyter React is built on top of JupyterLab, which is actively developed and aims to be the next default notebook for Python data scientists. However, some users still prefer the classic notebook, and JupyterLab is not yet mainstream... The following points can be identified as sources of that hesitation:

  1. The user interface is intimidating and quite complicated. An initiative to strip down the user interface has been taken with Retrolab, but the result still looks pretty much like JupyterLab, without visible value compared to the classic notebook. Users will even lose some beloved features like their preferred keyboard shortcuts, VIM mode, performance...
  2. The extensions ecosystem is rich, but breaking changes in the core of JupyterLab have made the overall ecosystem fragile and subject to failures on installation.
  3. The overall performance (startup time, loading large notebooks, switching tabs...) is known to be degraded in JupyterLab.
  4. The recently merged realtime collaboration feature is not usable on its own without a real multi-user authentication and authorization system.
  5. For developers, the Lumino widget toolkit that backs the JupyterLab user interface is hard to use and feels more like a Qt toolkit than a modern JavaScript framework such as React.js, Vue.js, or Svelte...

Jupyter React Widgets Gallery

Towards a cloud native Jupyter

· 5 min read
Eric Charles
Datalayer Founder

All Data Scientists know the story... Install the well-known Jupyter Classic or JupyterLab Notebook on a local PC or laptop, pip install some Python libraries like pandas..., download some datasets, and finally start analysing with a notebook in isolation. There are a few pain points here:

  1. Setting up the tools is hard and time-consuming. You have to install Python, Jupyter, and add the libraries you need. Conda environments or Docker containers can mitigate the pain to some extent, but in the end these are yet more tools to set up and manage.
  2. At some point, Data Scientists want to collaborate with teammates or share some results, but each of them is on their own island with no easy way to break the silo. Realtime collaboration features have recently been merged into JupyterLab, but they are only the beginning and miss fundamental building blocks like identity, authorization...
  3. The analysis is not easily reproducible. The setup you have done on a particular Windows platform is completely different from the setup another collaborator may have done on macOS.

More Cloud-native

Hence the need for a better solution. At Datalayer we think that a more Cloud-native Jupyter can help remove those pain points. In other words, we embrace the infrastructure provided by cloud providers like GCloud, AWS, Azure... and build on top of it to provide more power to the Data Scientist.

Cloud native computing is an approach in software development that utilizes cloud computing to "build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds."

Wikipedia https://en.wikipedia.org/wiki/Cloud_native_computing


A new start with Jupyter

· 2 min read
Eric Charles
Datalayer Founder

Since our last blog post in January 2018, we have changed the Datalayer architecture significantly. Back in 2018, we had chosen Apache Zeppelin for its good integration with Big Data frameworks like Apache Spark and completely replaced its existing Angular.js user interface with a home-brewed React.js implementation to integrate with the Kubernetes Control Plane. While rolling out more and more features on top of our former version 0.0.1, we were intrigued in February 2018 when JupyterLab was announced to be ready for users. Earlier, in July 2016, JupyterLab had been positioned as the next generation of the Jupyter Notebook.
